By Alvin Bao, Alex Petrov, Jennifer Lai, Aidan Sherr, and Samartha Chandrashekar
As part of the journey to make Netflix’s compute infrastructure more Kubernetes-native, we have leaned into incorporating components from the Kubernetes ecosystem into our container platform, Titus. One example is our use of Kueue, a cloud-native job queueing system for batch workloads, which has largely replaced the custom queuing and scheduling logic in our homegrown managed batch solution, Compute Managed Batch (CMB). In this post, we give an overview of what motivated the migration, how we migrated tens of millions of batch jobs to Kueue, and what Kueue enables us to offer as a Compute platform.
Brief Overview of CMB and Titus
CMB is a managed batch solution that allows users and applications to execute and manage workloads that run to completion. Using a tenant hierarchy, workloads are managed and queued with ordered execution via priorities, and capacity is managed on a per-tenant basis. Workloads submitted to CMB are then run on Titus. The features of Titus relevant to CMB are workload federation across multiple cells (Kubernetes clusters) and federated capacity reservations. This means CMB can talk to a single Titus endpoint to get or submit workloads and update capacity reservations without having to worry about the underlying cell/cluster topology.
CMB Tenant Hierarchy
Tenants provide a grouping mechanism for jobs submitted on behalf of certain organizations, platforms, or applications. Users can create and organize tenants however best fits their organization or use case. For example, an organization may use a single tenant across multiple applications or a complex hierarchical structure that matches its team and application ownership structure.
Tenants are associated with a capacity configuration. The capacity configuration defines the amount of compute capacity available to the tenant and provides certain guarantees around isolation from other tenants. The capacity configuration contains a weight (used for fair sharing) and resource dimensions.
There are two types of tenants in CMB:
- Internal Tenants: meant to facilitate the creation of a tree of tenants. An internal tenant’s children can be both internal and leaf tenants. Internal tenants themselves don’t accept work and thus do not have associated queues.
- Leaf Tenants: can accept work and have queues associated with them. Leaf tenants can’t have any children.
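The two tenant types can be pictured as a small tree structure. The sketch below is illustrative only: the `Tenant` class, its field names, and the tenant names are assumptions made for this example and are not CMB’s actual schema or API.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """Hypothetical tenant node; names and fields are illustrative, not CMB's schema."""
    name: str
    is_leaf: bool
    children: list[Tenant] = field(default_factory=list)
    queue: deque[str] = field(default_factory=deque)  # only leaf tenants use this

    def add_child(self, child: Tenant) -> None:
        # Leaf tenants cannot have children.
        if self.is_leaf:
            raise ValueError(f"leaf tenant {self.name!r} cannot have children")
        self.children.append(child)

    def submit(self, job_id: str) -> None:
        # Internal tenants do not accept work and have no queue.
        if not self.is_leaf:
            raise ValueError(f"internal tenant {self.name!r} does not accept work")
        self.queue.append(job_id)

    def leaves(self) -> list[Tenant]:
        """Return every leaf tenant in this subtree, depth first."""
        if self.is_leaf:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Example hierarchy: org -> media -> encoding, and org -> analytics.
org = Tenant("org", is_leaf=False)
media = Tenant("media", is_leaf=False)
encoding = Tenant("encoding", is_leaf=True)
analytics = Tenant("analytics", is_leaf=True)
org.add_child(media)
media.add_child(encoding)
org.add_child(analytics)
encoding.submit("job-1")
```

Only the leaves carry queues, so submitting to `org` or `media` raises an error, while `org.leaves()` enumerates the tenants that can actually receive work.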
With regard to capacity configuration, tenants can use two types of capacity:
Reserved Capacity
For internal tenants, if a user specifies reserved capacity, it’s fair-shared across the subtree and usable by the leaf tenants under that internal tenant.
For leaf tenants, if a user specifies reserved capacity, it partitions capacity within the hierarchy so that other tenants can’t reserve the same resources. These reserved resources are not shared with any other tenant, ensuring throughput for a given leaf tenant.
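As a rough illustration of how a weight could drive fair sharing of an internal tenant’s reserved capacity, here is a minimal sketch of a proportional split. This post does not describe CMB’s actual algorithm, so the function `weighted_shares`, the tenant names, and the numbers are all assumptions. A real fair-share scheduler would typically also redistribute capacity that a tenant is not using.

```python
def weighted_shares(capacity: float, weights: dict[str, float]) -> dict[str, float]:
    """Split `capacity` across tenants in proportion to their weights.

    Simplification: demand is ignored, so a tenant's unused share is not
    handed to its siblings.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one tenant needs a positive weight")
    return {name: capacity * weight / total for name, weight in weights.items()}


# 120 reserved CPUs on an internal tenant, shared by two leaf tenants (3:1 weights).
shares = weighted_shares(120, {"encoding": 3, "analytics": 1})
```

With these hypothetical weights, `encoding` is entitled to three quarters of the reservation and `analytics` to the remaining quarter.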
Shared Capacity
The Compute team maintains a global pool of shared capacity that any tenant can burst into, in addition to its reserved capacity. Reservations are not required to use CMB, so a tenant can run entirely on shared capacity. The pool is fair-shared across tenants, but in CMB this applied only at admission: CMB had no preemption, so once a job was admitted, it ran to completion regardless of shifts in fair-share demand.
Kueue changes the semantics for both types of capacity, as covered in the Fair Sharing and Preemption section.
Here is an example of what a tenant hierarchy looks like:

CMB User/Application Workload Submission Flow

CMB User/Application Tenant Management Flow

Why Kueue?
CMB was created in 2018, before or alongside many of the open-source batch compute offerings available today. Over the years, as the Kubernetes ecosystem has evolved, many of the features that CMB provided or strived to provide have been included in these open-source projects, e.g., fair sharing, hierarchical tenants, capacity management, and priority queuing. In addition, it became increasingly cumbersome to develop new features such as preemption when CMB was so far removed from the underlying Kubernetes cluster.
The team took a look at what it would take to modernize our batch abstraction and settled on Kueue for the following reasons:
- Unlike other options such as YuniKorn or Volcano, Kueue doesn’t replace pod scheduling by the kube-scheduler, allowing integration with existing Titus scheduling profiles. Replacing Titus scheduler profiles can fragment job placement, potentially harming efficiency.
- Adoption momentum and pace of innovation.
- Kueue supports multi-tenant quota management over heterogeneous hardware.
- Kueue can operate on primitives such as v1.Pod and batch/v1.Job, and also supports higher-level abstractions such as RayJob / RayCluster for future extensibility.
- Kueue has native features that the team would have liked to implement in CMB, such as preemption, all-or-nothing scheduling, and topology-aware scheduling.
Migrating to Kueue
This initiative of migrating CMB workloads to Kueue became known as Netflix Batch. The key tenets of our migration were the following:
- Migration should require zero lift for CMB end users and be completely transparent to them
- No regressions in container launch rate and overall max throughput
- Replace CMB queuing and scheduling with Kueue
Netflix Batch User/Application Workload Submission Flow

The key difference between the old and new flows is that we defer queuing and scheduling to Kueue, which runs in each Kueue-enabled Titus cell. Titus federation routes the job to Kueue cells using our custom Kueue router.
Netflix Batch User/Application Tenant Management Flow

For us as operators, the migration was as simple as clicking a button on a tenant in our UI (as shown in the example above). This also allows us to easily roll back changes if there are issues.
Under the hood, this enrollment converts internal tenants to Cohorts and leaf tenants to a ClusterQueue + LocalQueue. The capacity configuration on a given tenant is converted into resource flavors and nominal quotas. The architecture for this looks as follows:
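Separately from the architecture diagram, the tenant-to-Kueue mapping described above can be sketched in a few lines of Python. This is a simplified plan of which objects would be created, not valid Kueue manifests and not the actual enrollment code. The `Tenant` record, the `to_kueue_plan` function, the `-cq`/`-lq` naming scheme, the single `default-flavor`, and the choice to carry an internal tenant’s reservation on its Cohort are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tenant:
    """Hypothetical CMB tenant record; field names are illustrative."""
    name: str
    is_leaf: bool
    parent: Optional[str] = None
    reserved: dict[str, int] = field(default_factory=dict)  # e.g. {"cpu": 40}


def to_kueue_plan(tenants: list[Tenant], flavor: str = "default-flavor") -> list[dict]:
    """Return a simplified list of Kueue objects to create for the given tenants.

    The dict keys are shorthand for this sketch, not real manifest fields.
    """
    plan: list[dict] = [{"kind": "ResourceFlavor", "name": flavor}]
    for tenant in tenants:
        quota = {"flavor": flavor, "nominal_quota": dict(tenant.reserved)}
        if tenant.is_leaf:
            # Leaf tenant -> ClusterQueue (quota) + LocalQueue (submission entry point).
            cq_name = f"{tenant.name}-cq"
            plan.append({"kind": "ClusterQueue", "name": cq_name,
                         "cohort": tenant.parent, **quota})
            plan.append({"kind": "LocalQueue", "name": f"{tenant.name}-lq",
                         "cluster_queue": cq_name})
        else:
            # Internal tenant -> Cohort, keeping its place in the hierarchy.
            plan.append({"kind": "Cohort", "name": tenant.name,
                         "parent": tenant.parent, **quota})
    return plan


plan = to_kueue_plan([
    Tenant("studio", is_leaf=False, reserved={"cpu": 100}),
    Tenant("encoding", is_leaf=True, parent="studio", reserved={"cpu": 40}),
])
kinds = [obj["kind"] for obj in plan]
```

Keeping the mapping this mechanical is what makes a one-click enrollment, and an equally simple rollback, plausible: the tenant hierarchy remains the source of truth and the Kueue objects are derived from it.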

Lessons Learned
- Maintaining API parity with the existing system (vs. exposing a new API surface) and migrating the underlying components as a first step derisked the project by unstacking bets while also ensuring we didn’t disrupt the customer experience.
- Don’t wait until the end to migrate the most complex use case. We decided early on to migrate our largest and most complex customer first. This allowed us to build confidence that we could later migrate other customers to Netflix Batch without issues, and resulted in the production migration lasting only 4 weeks.
- We had to run Kueue with much higher QPS, Burst, and groupKindConcurrency than the default configuration to meet our throughput needs. This was derisked early on by running load tests in a development environment that mimics Titus.
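To make the last lesson more concrete: QPS and Burst are assumed here to be the usual Kubernetes client rate-limit settings, which are commonly enforced with a token bucket, while groupKindConcurrency is assumed to control how many reconcile workers run in parallel per resource kind. The sketch below models only the token bucket. The `TokenBucket` class and the numbers are hypothetical and are not Kueue’s or Netflix’s actual values.

```python
class TokenBucket:
    """Minimal token bucket: `qps` tokens are refilled per second, up to `burst`."""

    def __init__(self, qps: float, burst: int, now: float = 0.0) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Take one token if available; `now` is a timestamp in seconds."""
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# With qps=2 and burst=3, three back-to-back API calls pass and the fourth is throttled.
bucket = TokenBucket(qps=2, burst=3)
first_four = [bucket.try_acquire(now=0.0) for _ in range(4)]
# Half a second later exactly one token has been refilled.
after_half_second = [bucket.try_acquire(now=0.5) for _ in range(2)]
```

Under this model, a controller admitting many workloads at once is capped by the refill rate as soon as the initial burst is spent, which is why defaults tuned for small clusters can become a throughput bottleneck.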
Current State of Kueue at Netflix
Kueue is fully rolled out in production, managing tens of millions of batch workloads. In the future, we’re looking at options to enroll more of Titus batch workloads into this more managed experience. We have also productionized more fair sharing and preemption to make better use of reserved capacity. In addition, our learnings are being leveraged by other internal teams, including those building Kubernetes-native training infrastructure, to inform their job scheduling and queuing configurations.
Fair Sharing and Preemption
With Kueue, Preemption-based Fair Sharing allows Netflix Batch to maintain reservation semantics while lending resources to other tenants when those reservations are not in use. In addition, preemption allows Netflix Batch to preempt lower-priority workloads for higher-priority workloads. For our customers, this means that tenants can use more idle capacity from reservations, submit more jobs without the risk of starvation, and have quicker turnaround times for business-critical workloads.
An example preemption configuration on a ClusterQueue that we’d be using is as follows:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
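One way to read the two policy values above is as a filter over running workloads. The sketch below is a deliberately simplified model, not Kueue’s actual preemption algorithm. It assumes every queue is in the same cohort and that the pending workload fits within its own ClusterQueue’s nominal quota, and it ignores how Kueue orders candidates and picks a minimal set. The `Running` record and `preemption_candidates` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Running:
    """Hypothetical view of an admitted workload."""
    name: str
    cluster_queue: str
    priority: int
    queue_is_borrowing: bool  # its ClusterQueue is using more than its nominal quota


def preemption_candidates(pending_queue: str, pending_priority: int,
                          running: list[Running]) -> list[str]:
    """Return names of workloads the pending workload could preempt in this model."""
    candidates = []
    for workload in running:
        same_queue = workload.cluster_queue == pending_queue
        if same_queue and workload.priority < pending_priority:
            # withinClusterQueue: LowerPriority
            candidates.append(workload.name)
        elif not same_queue and workload.queue_is_borrowing:
            # reclaimWithinCohort: Any (priority is not considered)
            candidates.append(workload.name)
    return candidates


running = [
    Running("a-low", "team-a-cq", priority=10, queue_is_borrowing=False),
    Running("a-high", "team-a-cq", priority=100, queue_is_borrowing=False),
    Running("b-job", "team-b-cq", priority=100, queue_is_borrowing=True),
    Running("c-job", "team-c-cq", priority=10, queue_is_borrowing=False),
]
candidates = preemption_candidates("team-a-cq", pending_priority=50, running=running)
```

In this model, `team-a-cq` can evict its own lower-priority job and reclaim capacity from the borrowing `team-b-cq` even though that job has a higher priority, while `team-c-cq`, which stays within its own quota, is left untouched.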
With these features deployed, Compute has seen a significant increase in average resource utilization.
Acknowledgement
This work wouldn’t have been possible without the great work of the entire Compute team at Netflix.
How Netflix Simplified Batch Compute with Kueue was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.