Cluster schedulers
Agenda
• What is a cluster scheduler and why would one need it?

• Cluster scheduler architectures

• Specifics of YARN, Kubernetes, Mesos and Nomad:

• Architecture

• Specific features / positioning

• Pros and cons
What is a cluster scheduler?
Do I really need it?
• Software component (monolith or distributed) with two major functions:

• Allocate resources on node(s) for incoming workload

• Maintain task lifecycle on allocated resources (distribute, run, keep up, shut down)

• Cluster scheduler is different from application scheduler

• You need one (and are probably already using one) if you run a distributed application

• You need a real one if you run more than one application and need some elasticity
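The two core functions above can be sketched as a toy scheduler (all class and method names here are hypothetical, for illustration only):

```python
# Toy sketch of a cluster scheduler's two core functions:
# (1) allocate resources on a node, (2) manage task lifecycle.

class Node:
    def __init__(self, name, cpus, mem_mb):
        self.name, self.cpus, self.mem_mb = name, cpus, mem_mb

class Scheduler:
    def __init__(self, nodes):
        self.nodes = nodes          # nodes with free resources
        self.tasks = {}             # task id -> (node, state)

    def allocate(self, task_id, cpus, mem_mb):
        """Function 1: find a node with enough free resources."""
        for node in self.nodes:
            if node.cpus >= cpus and node.mem_mb >= mem_mb:
                node.cpus -= cpus
                node.mem_mb -= mem_mb
                self.tasks[task_id] = (node, "RUNNING")
                return node.name
        return None                 # no capacity: queue or reject

    def shutdown(self, task_id, cpus, mem_mb):
        """Function 2 (simplified): lifecycle transition, freeing resources."""
        node, _ = self.tasks[task_id]
        node.cpus += cpus
        node.mem_mb += mem_mb
        self.tasks[task_id] = (node, "FINISHED")

sched = Scheduler([Node("n1", cpus=4, mem_mb=8192)])
assert sched.allocate("t1", 2, 4096) == "n1"
assert sched.allocate("t2", 4, 1024) is None   # only 2 cpus left on n1
```

A real scheduler also handles failure detection, restarts, and placement constraints; this only shows the shape of the problem.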
Monolith architecture
• Scheduler is a single process that controls everything about workloads

• Examples: Hadoop JobTracker, Kubernetes (kube-scheduler)

• Simple initial implementation

• Hard to implement different requirements for different workloads
* Picture source: http://www.firmament.io/blog/scheduler-architectures.html
Two-level architecture
• Task lifecycle is separated from resource allocation

• Examples: YARN (you have to see it), Mesos

• Easy to add different types of applications

• Hard to implement anti-interference measures and priority-based cross-application preemption
* Picture source: http://www.firmament.io/blog/scheduler-architectures.html
Shared-state architecture
• Each scheduler (i.e. per application type) maintains its own view of the cluster state and commits changes as transactions (which can succeed or fail)

• Example: Nomad

• State synchronisation has to be done
* Picture source: http://www.firmament.io/blog/scheduler-architectures.html
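The optimistic, transactional commit described above can be sketched as follows (a minimal model, not any real scheduler's API):

```python
# Sketch of shared-state scheduling: each scheduler works against its
# own snapshot of cluster state and commits a transaction that fails
# if the authoritative state changed underneath it.

class ClusterState:
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.version = 0

    def snapshot(self):
        return (self.version, self.free_cpus)

    def commit(self, snapshot_version, cpus_wanted):
        """Atomically claim cpus; fail on version conflict (stale view)."""
        if snapshot_version != self.version or cpus_wanted > self.free_cpus:
            return False            # conflict: scheduler must re-sync and retry
        self.free_cpus -= cpus_wanted
        self.version += 1
        return True

state = ClusterState(free_cpus=8)
v1, _ = state.snapshot()            # scheduler A snapshots the state
v2, _ = state.snapshot()            # scheduler B snapshots the same state
assert state.commit(v1, 4) is True  # A commits first
assert state.commit(v2, 4) is False # B's transaction fails: its view is stale
```

The failed committer re-reads the state and retries, which is exactly the synchronisation cost the last bullet refers to.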
Distributed architecture
• No centralised resource allocation, simplified model

• Example: Sparrow

• Has great advantages for fine-grained tasks randomly distributed across a large cluster

• Any synchronisation (e.g. to avoid interference) is hard
* Picture source: http://www.firmament.io/blog/scheduler-architectures.html
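Sparrow's core trick is random sampling: probe a few workers per task and pick the least loaded one, with no global state at all. A toy sketch (names hypothetical):

```python
# Sparrow-style distributed placement: probe a few random workers and
# pick the one with the shortest queue ("power of two choices").

import random

def place_task(worker_queue_lengths, probes=2, rng=random):
    """Probe `probes` random workers; return the least-loaded one's index."""
    sampled = rng.sample(range(len(worker_queue_lengths)), probes)
    return min(sampled, key=lambda w: worker_queue_lengths[w])

# With two workers and two probes, both are always sampled,
# so the least-loaded worker is always chosen:
assert place_task([10, 0], probes=2) == 1
```

Because each placement decision is independent and local, this scales very well, but it also shows why cross-task coordination (the last bullet) has nowhere to live.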
YARN: Yet Another Resource Negotiator
History
• MapReduce JobTracker generalisation (decoupled into Resource Manager and Application Master), one of the two parts of “Hadoop”

• Resource allocation based on requests

• Works fine with large containers and batch processes, not so much with fine-grained tasks / services

• All Hadoop frameworks have first-class support for YARN (MRv2, Pig, Hive, Spark)

• Supports pluggable schedulers (cluster-level), containerisation
Architecture
* Picture source: Apache Hadoop Website
Specific features / issues
• Pluggable “queue management” scheduler:

• FairScheduler: memory-fair by default, with an optional DRF policy per queue

• CapacityScheduler: pluggable resource calculator; DominantResourceCalculator supports CPU and memory

• Data locality support possible (e.g. MRv2)

• Preemption: across queues and within queues (2.8.0/3.0.0)

• Kerberos authentication, ACLs on queue and cluster

• Awful metrics system, no support for metric collection from “frameworks”

• No volume management
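The Dominant Resource Fairness idea behind the DominantResourceCalculator can be illustrated with made-up numbers: a queue's dominant share is its largest fractional use of any one resource, and the scheduler favours the queue with the smallest dominant share.

```python
# DRF sketch with invented queue names and capacities.

TOTAL = {"cpu": 100, "mem_gb": 400}

def dominant_share(used):
    """Largest fraction of any single resource this queue consumes."""
    return max(used[r] / TOTAL[r] for r in TOTAL)

queues = {
    "analytics": {"cpu": 30, "mem_gb": 40},   # cpu-dominant: 0.30
    "etl":       {"cpu": 10, "mem_gb": 160},  # memory-dominant: 0.40
}

next_queue = min(queues, key=lambda q: dominant_share(queues[q]))
assert dominant_share(queues["analytics"]) == 0.30
assert dominant_share(queues["etl"]) == 0.40
assert next_queue == "analytics"   # lowest dominant share is served next
```

This is why DRF handles mixed CPU-heavy and memory-heavy queues more sensibly than memory-only fairness.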
Google Kubernetes
History
• Kubernetes grew out of Google’s internal “Borg” project

• Initially: a greenfield implementation of container orchestration targeted at services

• kube-scheduler is a small part of what K8s does 

• Best for microservices in the cloud

• Huge momentum

• Very ops friendly; Google is dogfooding it (Google Container Engine is upstream K8s)
Architecture
* Picture source: Wikipedia
Specific features
• Pod / Controllers / Services 

• Controllers: ReplicaSets / StatefulSets / DaemonSets

• Volumes!

• Resources, oversubscription and QoS

• Service Discovery / Load Balancing

• Secrets

• Authentication / Authorizations / Admission Controls

• Monitoring: Heapster / cAdvisor

• Federation!

• …
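The "Resources, oversubscription and QoS" bullet is worth unpacking: Kubernetes derives a pod's QoS class from its resource requests and limits. A simplified single-container sketch of the rules:

```python
# Simplified Kubernetes QoS classification (single-container pod):
#   Guaranteed  - requests == limits, set for both cpu and memory
#   BestEffort  - no requests or limits at all
#   Burstable   - anything in between

def qos_class(requests, limits):
    if not requests and not limits:
        return "BestEffort"
    if requests and requests == limits and set(requests) == {"cpu", "memory"}:
        return "Guaranteed"
    return "Burstable"

assert qos_class({"cpu": "500m", "memory": "1Gi"},
                 {"cpu": "500m", "memory": "1Gi"}) == "Guaranteed"
assert qos_class({}, {}) == "BestEffort"
assert qos_class({"cpu": "250m"}, {"cpu": "500m"}) == "Burstable"
```

Under node pressure, BestEffort pods are evicted first and Guaranteed pods last, which is how oversubscription stays safe(ish).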
Issues
• Many concepts, hard to master and reason about (e.g. controllers are like schedulers, but not really)

• The monolithic kube-scheduler can be slow

• No IO isolation, so not suitable for analytical workloads on large on-premise clusters

• No real enterprise support (that I know of)
Apache Mesos
History
• UC Berkeley 2009, Apache top-level project 2013

• Clean two-level architecture implementation

• Resource allocation based on offers

• Initially part of the BDAS stack, targeted at Big Data first (Apache Spark was a proof of concept for Mesos)

• Popularised by Mesosphere in DC/OS product
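The offer-based model mentioned above inverts YARN's request flow: the master offers resources, and each framework's scheduler accepts or declines. A toy sketch (these classes are illustrative, not the Mesos API):

```python
# Toy model of Mesos-style offer-based allocation.

class Master:
    def __init__(self, agent_resources):
        self.agents = agent_resources       # agent name -> free cpus

    def make_offer(self, agent):
        """Offer an agent's free resources to a framework scheduler."""
        return {"agent": agent, "cpus": self.agents[agent]}

class FrameworkScheduler:
    def __init__(self, cpus_needed):
        self.cpus_needed = cpus_needed

    def resource_offer(self, offer):
        """Accept the offer if it covers our need, otherwise decline."""
        if offer["cpus"] >= self.cpus_needed:
            return ("ACCEPT", self.cpus_needed)
        return ("DECLINE", 0)

master = Master({"agent1": 2, "agent2": 8})
framework = FrameworkScheduler(cpus_needed=4)
assert framework.resource_offer(master.make_offer("agent1")) == ("DECLINE", 0)
assert framework.resource_offer(master.make_offer("agent2")) == ("ACCEPT", 4)
```

Declined offers go back into the pool and are offered to other frameworks, which is what makes the two-level split clean.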
Architecture
* Picture source: Apache Mesos Website
Specific features
• Flexible in terms of the resources that can be allocated: cpus, memory, disks / volumes, gpus

• Pluggable: schedulers (called frameworks), containerizers, loggers, networking (CNI/libnetwork)

• Oversubscription, revocable resources, quotas

• Some volume management

• Very rough around the edges
Framework support
• Although it’s very common these days to run X on Y, Mesos is the leader in hosting other systems

• It’s really easy to develop a Mesos framework

• Some examples:

• Marathon/Aurora for container orchestration (some people have even tried K8s, but that is too much)

• HDFS/Kafka/NoSQL DBs - if you like to live on the edge

• Jenkins/Artifactory/Gitlab

• Spark/TF/Flink/Storm
Real world example
Hashicorp Nomad
History
• 2015, developed by Hashicorp

• Shared-state architecture (service/batch/system schedulers); first and foremost a Docker scheduler

• Depends on other Hashicorp tools: Consul, Vault
Architecture
* Picture source: Nomad Website
Specific features & issues
• Multi-DC and multi-region support based on a gossip protocol

• Service/batch/system schedulers

• No authorisation, only basic TLS on communication

• No volume management

• No IO isolation

• Preemption?
Q & A
