How Temporal Powers Reliable Cloud Operations at Netflix

Team Entertainer | December 15, 2025 | Updated: December 16, 2025 | 14 min read

By Jacob Meyers and Rob Zienert

Temporal is a Durable Execution platform which allows you to write code “as if failures don’t exist”. It’s become increasingly critical to Netflix since its initial adoption in 2021, with users ranging from the operators of our Open Connect global CDN to our Live reliability teams now depending on Temporal to operate their business-critical services. In this post, I’ll give a high-level overview of what Temporal offers users, the problems we were experiencing operating Spinnaker that motivated its initial adoption at Netflix, and how Temporal helped us reduce the number of transient deployment failures at Netflix from 4% to 0.0001%.

A Crash Course on (some of) Spinnaker

Spinnaker is a multi-cloud continuous delivery platform that powers the vast majority of Netflix’s software deployments. It’s composed of several (mostly nautical themed) microservices. Let’s double-click on two in particular to understand the problems we were facing that led us to adopt Temporal.

If you’re completely new to Spinnaker, Spinnaker’s fundamental tool for deployments is the Pipeline. A Pipeline is composed of a sequence of steps called Stages, which themselves can be decomposed into one or more Tasks, or other Stages. An example deployment pipeline for a production service may consist of these stages: Find Image -> Run Smoke Tests -> Run Canary -> Deploy to us-east-2 -> Wait -> Deploy to us-east-1.

An example Spinnaker Pipeline for a Netflix service
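
To make the Pipeline, Stage and Task nesting concrete, here is a minimal sketch in plain Java. The `Stage` class, its fields, and the stage and task names are hypothetical stand-ins for illustration, not Spinnaker’s actual model; the point is only that a Stage can hold both Tasks and nested Stages, and that executing it means unrolling that tree into an ordered list of Tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a Stage holds its own Tasks plus nested child Stages.
final class Stage {
    final String name;
    final List<String> tasks;
    final List<Stage> children;

    Stage(String name, List<String> tasks, List<Stage> children) {
        this.name = name;
        this.tasks = tasks;
        this.children = children;
    }

    // Depth-first unrolling of a Stage into the flat list of Tasks to execute.
    List<String> unroll() {
        List<String> out = new ArrayList<>();
        for (String task : tasks) {
            out.add(name + "/" + task);
        }
        for (Stage child : children) {
            out.addAll(child.unroll());
        }
        return out;
    }
}

final class PipelineSketch {
    // Builds a made-up "deploy" Stage with one Task and one nested Stage.
    static Stage example() {
        Stage resize = new Stage("resizeServerGroup",
                Arrays.asList("resize", "waitForCapacity"), new ArrayList<Stage>());
        return new Stage("deploy",
                Arrays.asList("createServerGroup"), Arrays.asList(resize));
    }
}
```

Calling `PipelineSketch.example().unroll()` yields the parent Stage’s Task first, followed by the Tasks of the nested Stage.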

Pipeline configuration is extremely flexible. You can have Stages run completely serially, one after another, or you can have a mix of concurrent and serial Stages. Stages can also be executed conditionally based on the result of previous stages. This brings us to our first Spinnaker service: Orca. Orca is the orca-stration engine of Spinnaker. It’s responsible for managing the execution of the Stages and Tasks that a Pipeline unrolls into and coordinating with other Spinnaker services to actually execute them.

One of those collaborating services is called Clouddriver. In the example Pipeline above, some of the Stages will require interfacing with cloud infrastructure. For example, the canary deployment involves creating ephemeral hosts to run an experiment, and a full deployment of a new version of the service may involve spinning up new servers and then tearing down the old ones. We call these sorts of operations that mutate cloud infrastructure Cloud Operations. Clouddriver’s job is to decompose and execute Cloud Operations sent to it by Orca as part of a deployment. Cloud Operations sent from Orca to Clouddriver are relatively high level (for example: createServerGroup), so Clouddriver understands how to translate these into lower-level cloud provider API calls.

Pain points in the interaction between Orca and Clouddriver and the implementation details of Cloud Operation execution in Clouddriver are what led us to look for new solutions and ultimately migrate to Temporal, so we’ll next look at the anatomy of a Cloud Operation. Cloud Operations in the OSS version of Spinnaker still work as described below, so motivated readers can follow along in source code. However, our migration to Temporal is entirely closed-source, following a fork from OSS in 2020 that allowed Netflix to make larger pivots to the product such as this one.

The Original Cloud Operation Flow

A Cloud Operation’s execution goes something like this:

  1. Orca, in orchestrating a Pipeline execution, decides a particular Cloud Operation needs to be performed. It sends a POST request to Clouddriver’s /ops endpoint with an untyped bag-of-fields.
  2. Clouddriver attempts to resolve the operation Orca sent into a set of AtomicOperations: internal operations that only Clouddriver understands.
  3. If the payload was valid and Clouddriver successfully resolved the operation, it will immediately return a Task ID to Orca.
  4. Orca will immediately begin polling Clouddriver’s GET /task/<id> endpoint to keep track of the status of the Cloud Operation.
  5. Asynchronously, Clouddriver begins executing AtomicOperations using its own internal orchestration engine. Ultimately, the AtomicOperations resolve into cloud provider API calls. As the Cloud Operation progresses, Clouddriver updates an internal state store to surface progress to Orca.
  6. Eventually, if all went well, Clouddriver will mark the Cloud Operation complete, which eventually surfaces to Orca in its polling. Orca considers the Cloud Operation finished, and the deployment can progress.
A sequence diagram of a Cloud Operation execution
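
The submit-then-poll handshake described above can be sketched in a few lines of plain Java. Everything here is an assumption made for illustration: `FakeClouddriver` is an in-memory stand-in rather than Spinnaker code, the status strings are invented, and the fake advances its work on each poll so the example stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for Clouddriver's operation API; not Spinnaker's real code.
final class FakeClouddriver {
    private final Map<String, Integer> remainingSteps = new HashMap<>();

    // Mirrors the submit call: accept the payload and hand back an ID immediately.
    String submit(Map<String, Object> untypedPayload) {
        String id = UUID.randomUUID().toString();
        remainingSteps.put(id, 3); // pretend the operation needs 3 internal steps
        return id;
    }

    // Mirrors the status call; each call also advances the fake work by one step.
    String status(String id) {
        Integer left = remainingSteps.get(id);
        if (left == null) {
            return "UNKNOWN"; // e.g. instance-local state that no longer exists
        }
        if (left == 0) {
            return "SUCCEEDED";
        }
        remainingSteps.put(id, left - 1);
        return "RUNNING";
    }
}

final class PollingOrca {
    // Polls until a non-running status is seen or maxPolls is exhausted.
    static String awaitCompletion(FakeClouddriver clouddriver, String id, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String status = clouddriver.status(id);
            if (!status.equals("RUNNING")) {
                return status;
            }
        }
        return "TIMED_OUT";
    }
}
```

Note how much the caller has to know: which statuses are terminal, how long to keep polling, and what an unknown ID means. That coupling is what the next section picks apart.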

This works well enough on the happy path, but veer off the happy path and dragons begin to emerge:

  1. Clouddriver has its own internal orchestration system independent of Orca to allow Orca to query the progress of Cloud Operations. This is largely undifferentiated lifting relative to Clouddriver’s goal of actuating cloud infrastructure changes, and ultimately adds complexity and surface area for bugs to the application. Additionally, Orca is tightly coupled to Clouddriver’s orchestration system: it must understand how to poll Clouddriver, interpret the status, and handle errors returned by Clouddriver.
  2. Distributed systems are messy: networks and external services are unreliable. While executing a Cloud Operation, Clouddriver could experience transient network issues, or the cloud provider it’s attempting to call into may be having an outage, or any number of issues in between. Despite all of this, Clouddriver must be as reliable as reasonably possible as a core platform service. To deal with this sort of issue, Clouddriver internally evolved complex retry logic, further adding cognitive complexity to the system.
  3. Remember how a Cloud Operation gets decomposed by Clouddriver into AtomicOperations? Sometimes, if there’s a failure in the middle of a Cloud Operation, we need to be able to roll back what was done in AtomicOperations prior to the failure. This led to a homegrown Saga framework being implemented within Clouddriver. While this did result in a big step forward in the reliability of Cloud Operations facing transient failures, because the Saga framework also allowed replaying partially-failed Cloud Operations, it added yet more undifferentiated lifting inside the service.
  4. The task state kept by Clouddriver was instance-local. In other words, if the Clouddriver instance carrying out a Cloud Operation crashed, that Cloud Operation state was lost, and Orca would eventually time out polling for the task status. The Saga implementation mentioned above mitigated this for certain operations, but was not widely adopted across all cloud providers supported by Spinnaker.
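
For readers unfamiliar with the Saga idea mentioned in point 3, the following is a generic sketch of the pattern, not Clouddriver’s framework: each step pairs a forward action with a compensating action, and a failure triggers compensation of the completed steps in reverse order. The class name, step names, and log format are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A generic saga sketch; the step names are hypothetical, not Clouddriver's framework.
final class SagaSketch {
    interface Step {
        String name();
        void apply() throws Exception; // the forward action
        void compensate();             // undoes a previously applied action
    }

    final List<String> log = new ArrayList<>();

    // Runs steps in order; on failure, compensates completed steps in reverse order.
    boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.apply();
                log.add("applied " + step.name());
                completed.push(step);
            } catch (Exception e) {
                log.add("failed " + step.name());
                while (!completed.isEmpty()) {
                    Step done = completed.pop();
                    done.compensate();
                    log.add("compensated " + done.name());
                }
                return false;
            }
        }
        return true;
    }

    // Test helper: a step that optionally fails and whose compensation is a no-op.
    static Step step(final String name, final boolean fails) {
        return new Step() {
            public String name() { return name; }
            public void apply() throws Exception {
                if (fails) throw new Exception(name + " failed");
            }
            public void compensate() { }
        };
    }
}
```

Even this toy version needs bookkeeping for completed steps and ordering of rollbacks, which hints at how much extra machinery a production-grade version adds to a service whose real job is calling cloud APIs.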

We introduced a lot of incidental complexity into Clouddriver in an effort to keep Cloud Operation execution reliable, and despite all this, deployments still failed around 4% of the time due to transient Cloud Operation failures.

Now, I can already hear you asking: “So what? Can’t people re-try their deployments if they fail?” While true, some pipelines take days to complete for complex deployments, and a failed Cloud Operation mid-way through requires re-running the whole thing. This was detrimental to engineering productivity at Netflix in a non-trivial way. Rather than continue trying to build a faster horse, we began to look elsewhere for our reliable orchestration requirements, which is where Temporal comes in.

Temporal: Basic Concepts

Temporal is an open source product that offers a durable execution platform for your applications. Durable execution means that the platform will ensure your programs run to completion despite adverse conditions. With Temporal, you organize your business logic into Workflows, which are a deterministic series of steps. The steps inside of Workflows are called Activities, which is where you encapsulate all your non-deterministic logic that needs to happen in the course of executing your Workflows. As your Workflows execute in processes called Workers, the Temporal server durably stores their execution state so that in the event of failures your Workflows can be retried or even migrated to a different Worker. This makes Workflows incredibly resilient to the sorts of transient failures Clouddriver was susceptible to. Here’s a simple example Workflow in Java that runs an Activity to send an email once every 30 days:

import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

@WorkflowInterface
public interface SleepForDaysWorkflow {
    @WorkflowMethod
    void run();
}

public class SleepForDaysWorkflowImpl implements SleepForDaysWorkflow {

    // Stub through which the Workflow invokes Activities.
    private final SendEmailActivities emailActivities =
        Workflow.newActivityStub(
            SendEmailActivities.class,
            ActivityOptions.newBuilder()
                .setStartToCloseTimeout(Duration.ofSeconds(10))
                .build());

    @Override
    public void run() {
        while (true) {
            // Activities already carry retries/timeouts via options.
            emailActivities.sendEmail();

            // Pause the workflow for 30 days before sending the next email.
            Workflow.sleep(Duration.ofDays(30));
        }
    }
}

@ActivityInterface
public interface SendEmailActivities {
    void sendEmail();
}

There are some interesting things to note about this Workflow:

  1. Workflows and Activities are just code, so you can test them using the same techniques and processes as the rest of your codebase.
  2. Activities are automatically retried by Temporal with configurable exponential backoff.
  3. Temporal manages all the execution state of the Workflow, including timers (like the one used by Workflow.sleep). If the Worker executing this workflow were to have its power cable unplugged, Temporal would ensure another Worker continues to execute it (even during the 30 day sleep).
  4. Workflow sleeps are not compute-intensive, and they don’t tie up the process.
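
To illustrate what “configurable exponential backoff” means in point 2, here is a small sketch that computes a retry schedule from an initial interval, a backoff coefficient, and a maximum interval. It is plain Java and does not call Temporal; the class name and the specific numbers are assumptions chosen for the example. In the Temporal Java SDK, the corresponding knobs are set through `RetryOptions` on the Activity options.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: computes delays the way a capped exponential backoff would.
final class BackoffSketch {
    // Delay before retry number `retry` (1-based), capped at maxInterval.
    static Duration delay(Duration initial, double coefficient, Duration maxInterval, int retry) {
        double millis = initial.toMillis() * Math.pow(coefficient, retry - 1);
        return Duration.ofMillis((long) Math.min(millis, maxInterval.toMillis()));
    }

    // The delays before each of the first `retries` retries.
    static List<Duration> schedule(Duration initial, double coefficient, Duration maxInterval, int retries) {
        List<Duration> delays = new ArrayList<>();
        for (int retry = 1; retry <= retries; retry++) {
            delays.add(delay(initial, coefficient, maxInterval, retry));
        }
        return delays;
    }
}
```

With a 1 second initial interval, a coefficient of 2.0, and a 10 second cap, the first six delays are 1, 2, 4, 8, 10, and 10 seconds: the wait doubles until it hits the cap and then stays there.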

You might already begin to see how Temporal solves some of the problems we had with Clouddriver. Ultimately, we decided to pull the trigger on migrating Cloud Operation execution to Temporal.

Cloud Operations with Temporal

Today, we execute Cloud Operations as Temporal workflows. Here’s what that looks like.

1. Orca, using a Temporal client, sends a request to Temporal to execute an UntypedCloudOperationRunner Workflow. The contract of the Workflow looks something like this:
@WorkflowInterface
interface UntypedCloudOperationRunner {
    /**
     * Runs a cloud operation given an untyped payload.
     *
     * WorkflowResult is a thin wrapper around OutputType providing a standard contract for
     * clients to determine if the CloudOperation was successful and fetching any errors.
     */
    @WorkflowMethod
    fun <OutputType : CloudOperationOutput> run(stageContext: Map<String, Any?>, operationType: String): WorkflowResult<OutputType>
}

2. The Clouddriver Temporal worker is constantly polling Temporal for work. A worker will eventually see a task for an UntypedCloudOperationRunner Workflow and start executing it.

3. Similar to before with resolution into AtomicOperations, Clouddriver does some pre-processing of the bag-of-fields in stageContext and resolves it to a strongly typed implementation of the CloudOperation Workflow interface based on the operationType input and the stageContext:

interface CloudOperation<I : CloudOperationInput, O : CloudOperationOutput> {
    @WorkflowMethod
    fun operate(input: I, credentials: AccountCredentials<out Any>): O
}

4. Clouddriver starts a Child Workflow execution of the CloudOperation implementation it resolved. The child workflow will execute Activities which handle the actual cloud provider API calls to mutate infrastructure.

5. Orca uses its Temporal Client to await completion of the UntypedCloudOperationRunner Workflow. Once it’s complete, Temporal notifies the client and sends the result, and Orca can continue progressing the deployment.

Sequence diagram of a Cloud Operation execution with Temporal

Results and Lessons Learned from the Migration

A shiny new architecture is great, but equally important is the non-glamorous work of refactoring legacy systems to fit the new architecture. How did we integrate Temporal into critical dependencies of all Netflix engineers transparently?

The answer, of course, is a combination of abstraction and dynamic configuration. We built a CloudOperationRunner interface in Orca to encapsulate whether the Cloud Operation was being executed via the legacy path or Temporal. At runtime, Fast Properties (Netflix’s dynamic configuration system) determined which path a stage that needed to execute a Cloud Operation would take. We could set these properties quite granularly: by Stage type, cloud provider account, Spinnaker application, Cloud Operation type (for example, createServerGroup), and cloud provider (either AWS or Titus in our case). The Spinnaker services themselves were the first to be deployed using Temporal, and within two quarters, all applications at Netflix were onboarded.
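
A rough sketch of that abstraction-plus-flags approach is shown below. It is only an illustration under stated assumptions: the dynamic configuration system is modeled as a plain in-memory map, the flag key format and the precedence order are invented, and the two runner implementations just return a tagged string instead of doing real work. Only the `CloudOperationRunner` name comes from the post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; dynamic configuration is modeled as a plain in-memory map.
final class RoutingSketch {
    interface CloudOperationRunner {
        String run(String operationType);
    }

    // Stand-ins for the two execution paths; real runners would do actual work.
    static final CloudOperationRunner LEGACY = operationType -> "legacy:" + operationType;
    static final CloudOperationRunner TEMPORAL = operationType -> "temporal:" + operationType;

    private final Map<String, Boolean> flags = new HashMap<>();

    void setFlag(String key, boolean useTemporal) {
        flags.put(key, useTemporal);
    }

    // Most specific flag wins: application + operation, then operation, then the default.
    CloudOperationRunner select(String application, String operationType) {
        Boolean flag = flags.get(application + "/" + operationType);
        if (flag == null) {
            flag = flags.get(operationType);
        }
        if (flag == null) {
            flag = flags.get("default");
        }
        return Boolean.TRUE.equals(flag) ? TEMPORAL : LEGACY;
    }
}
```

Because callers only ever see the interface, flipping a flag moves one application or one operation type onto the new path without a redeploy, and flipping it back is an equally cheap rollback.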

Impact

What did we have to show for it all? With Temporal as the orchestration engine for Cloud Operations, the percentage of deployments that failed due to transient Cloud Operation failures dropped from 4% to 0.0001%. For those keeping track at home, that’s a four and a half order of magnitude reduction. Virtually eliminating this failure mode for deployments was a huge win for developer productivity, especially for teams with long and complex deployment pipelines.
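
As a quick check of the order-of-magnitude figure, using only the two percentages quoted above:

```latex
\[
\frac{4\%}{0.0001\%} = 4 \times 10^{4},
\qquad
\log_{10}\!\left(4 \times 10^{4}\right) = 4 + \log_{10} 4 \approx 4.6
\]
```

That is a 40,000-fold reduction, or roughly four and a half orders of magnitude.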

Beyond the improvement in deployment success metrics, we saw a number of other benefits:

  1. With Temporal as the intermediary, Orca no longer needs to communicate directly with Clouddriver to start Cloud Operations or poll their status. The services are less coupled, which is a win for maintainability.
  2. Speaking of maintainability, with Temporal doing the heavy lifting of orchestration and retries inside of Clouddriver, we got to remove a lot of the homegrown logic we’d built up over the years for the same purpose.
  3. Since Temporal manages execution state, Clouddriver instances became stateless and Cloud Operation execution can bounce between instances with impunity. We can treat Clouddriver instances more like cattle and enable things like Chaos Monkey for the service, which we were previously prevented from doing.
  4. Migrating Cloud Operation steps into Activities was a forcing function to re-write the logic to be idempotent. Since Temporal retries activities by default, it’s generally recommended they be idempotent. This alone fixed a number of issues that existed previously when operations were retried in Clouddriver.
  5. We set the retry timeout for Activities in Clouddriver to be two hours by default. This gives us a long leash to fix-forward or roll back Clouddriver if we introduce a regression before customer deployments fail; to them, it might just look like a deployment is taking longer than usual.
  6. Cloud Operations are much easier to introspect than before. Temporal ships with a great UI to help visualize Workflow and Activity executions, which is a huge boon for debugging live Workflows executing in production. The Temporal SDKs and server also emit a lot of useful metrics.
Execution of a resizeServerGroup Cloud Operation as seen from the Temporal UI. This operation executes 3 Activities: DescribeAutoScalingGroup, GetHookConfigurations, and ResizeServerGroup
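
The idempotency point in the list above is easiest to see with a toy example. The sketch below uses a hypothetical in-memory “cloud” (the class and method names are made up, not Clouddriver’s) to contrast a step that is unsafe to retry with one that is safe because it declares the desired end state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "cloud"; shows why retried steps should be idempotent.
final class IdempotencySketch {
    private final Map<String, Integer> capacityByGroup = new HashMap<>();

    // Not idempotent: every retry adds capacity again.
    void scaleUpBy(String group, int delta) {
        capacityByGroup.merge(group, delta, Integer::sum);
    }

    // Idempotent: declares the desired end state, so repeating the call is harmless.
    void resizeTo(String group, int desired) {
        capacityByGroup.put(group, desired);
    }

    int capacity(String group) {
        return capacityByGroup.getOrDefault(group, 0);
    }
}
```

If a retry delivers `scaleUpBy("api", 3)` twice, the group ends up with 6 instances; delivering `resizeTo("api", 3)` twice still leaves it at 3.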

Lessons Learned

With the benefit of hindsight, there are also some lessons we can share from this migration:

1. Avoid unnecessary Child Workflows: Structuring Cloud Operations as an UntypedCloudOperationRunner Workflow that starts Child Workflows to actually execute the Cloud Operation’s logic was unnecessary, and the indirection made troubleshooting more difficult. There are situations where Child Workflows are appropriate, but in this case we were using them as a tool for code organization, which is generally unnecessary. We could’ve achieved the same effect with class composition in the top-level parent Workflow.

2. Use single argument objects: At first, we structured Workflow and Activity functions with variable arguments, much as you’d write normal functions. This can be problematic for Temporal because of Temporal’s determinism constraints. Adding or removing an argument from a function signature is not a backward-compatible change, and doing so can break long-running workflows. It’s also not immediately obvious in code review that such a change is problematic. The preferred pattern is to use a single serializable class to host all your arguments for Workflows and Activities; these can be more freely changed without breaking determinism.

3. Separate business failures from workflow failures: We like the pattern of the WorkflowResult type that UntypedCloudOperationRunner returns in the interface above. It allows us to communicate business process failures without failing the Workflow itself and have more overall nuance in error handling. This is a pattern we’ve carried over to other Workflows we’ve implemented since.
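
Lessons 2 and 3 can be sketched together in plain Java. All names and fields below are hypothetical: the post does not show the shape of WorkflowResult, `ResizeInput` is invented for the example, and `fromPayload` is a hand-rolled stand-in for whatever a real serialization layer would do when an older payload lacks a newer field.

```java
import java.util.Map;

// Hypothetical single argument object; field names are illustrative only.
final class ResizeInput {
    final String serverGroup;
    final int desiredCapacity;
    final boolean dryRun; // added later; payloads from older callers omit it

    ResizeInput(String serverGroup, int desiredCapacity, boolean dryRun) {
        this.serverGroup = serverGroup;
        this.desiredCapacity = desiredCapacity;
        this.dryRun = dryRun;
    }

    // Stand-in for a data converter: missing newer fields fall back to defaults.
    static ResizeInput fromPayload(Map<String, Object> payload) {
        return new ResizeInput(
                (String) payload.get("serverGroup"),
                ((Number) payload.get("desiredCapacity")).intValue(),
                Boolean.TRUE.equals(payload.get("dryRun")));
    }
}

// Hypothetical shape; the source does not show WorkflowResult's actual fields.
final class WorkflowResult<T> {
    private final T output;
    private final String error;

    private WorkflowResult(T output, String error) {
        this.output = output;
        this.error = error;
    }

    static <T> WorkflowResult<T> success(T output) { return new WorkflowResult<>(output, null); }
    static <T> WorkflowResult<T> failure(String error) { return new WorkflowResult<>(null, error); }

    boolean isSuccess() { return error == null; }
    T output() { return output; }
    String error() { return error; }
}

final class ResizeSketch {
    static final int QUOTA = 10; // made-up business limit

    // One argument object in, one result object out. A business failure is
    // returned as data, so the method itself still completes normally.
    static WorkflowResult<Integer> resize(ResizeInput input) {
        if (input.desiredCapacity > QUOTA) {
            return WorkflowResult.failure(
                    "desired capacity " + input.desiredCapacity + " exceeds quota " + QUOTA);
        }
        return WorkflowResult.success(input.desiredCapacity);
    }
}
```

Adding the `dryRun` field did not change the signature of `resize`, and a payload written before the field existed still decodes. Exceeding the quota comes back as a failed result that the caller can inspect, rather than as an exception that fails the whole execution.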

Temporal at Netflix Today

Temporal adoption has skyrocketed at Netflix since its initial introduction for Spinnaker. Today, we have hundreds of use cases, and we’ve seen adoption double in the last year with no signs of slowing down.

One major difference between initial adoption and today is that Netflix migrated from an on-prem Temporal deployment to using Temporal Cloud, which is Temporal’s SaaS offering of the Temporal server. This has let us scale Temporal adoption while running a lean team. We’ve also built up a robust internal platform around Temporal Cloud to integrate with Netflix’s internal ecosystem and make onboarding for our developers as easy as possible. Stay tuned for a future post digging into more specifics of our Netflix Temporal platform.

Acknowledgement

We all stand on the shoulders of giants in software. I want to call out that I’m retelling the work of my two stellar colleagues Chris Smalley and Rob Zienert in this post, who were the two engineers who introduced Temporal and carried out the migration.


How Temporal Powers Reliable Cloud Operations at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.


