by Emily Gill
Every year, we bring the Analytics Engineering community together for an Analytics Summit, a multi-day internal conference to share analytical deliverables across Netflix, discuss analytic practice, and build relationships within the community. This post is one of several topics presented at the Summit highlighting the breadth and impact of Analytics work across different areas of the business.
Understanding Risk in Content Launches
Every title you see on Netflix goes through several key phases: Development, Pre-Production, Production/Principal Photography, Post-Production, and finally, Launch Preparation, all leading up to the Title Launch. Once Principal Photography wraps, the focus shifts in Post-Production from content creation to quality assurance and visual effects (if needed).
At the end of Post-Production, Netflix receives the final audio and video files, often delivered as an IMF (Interoperable Master Format). This triggers a flurry of Launch Preparation activities, such as the development of artwork and trailers, creation of subtitles, maturity ratings, and quality control. These activities happen within a tight window and rely on having the finalized media assets in hand.
Some of this work can be kicked off earlier using a non-final version of the media called the Locked Cut. Because it is not the final deliverable, this presents a tradeoff: should the teams who prepare content for service wait for the finalized IMF to begin their work, or start sooner with the non-final Locked Cut? Waiting for the IMF risks a compressed timeline if it arrives late, while starting with the Locked Cut means teams may need to do additional conformance work if there are significant changes between the Locked Cut and the final IMF.
Identifying Gaps in Schedule Accuracy
To help navigate the decision of when to start launch preparation, our teams rely on estimated delivery dates for both the Locked Cut and IMF media assets, which are manually provided by content partners in production schedules. However, these schedules often have gaps in coverage and lack accuracy for both asset types (see Figure 1).

This isn’t unexpected: productions are dynamic, facing frequent changes, scheduling conflicts, and unforeseen obstacles that can shift timelines without warning. As a result, there is a clear opportunity to leverage the wealth of production data we collect to predict the risk of schedule slips. By developing a predictive model, we aim to both fill in ETA gaps (providing asset delivery estimates when none exist) and improve the accuracy of existing ETAs compared to traditional manual schedules.
Correlation between Schedule Accuracy and Launch Misses
Our analysis reveals a strong correlation between schedule inaccuracies and launch misses, which are instances where a title experiences delays. To quantify schedule inaccuracy, we created a metric called Accumulated Error Days (AED), which measures the cumulative deviation between estimated (scheduled or predicted) delivery dates and actual delivery dates over time. AED is calculated retrospectively as the area between the scheduled (gray line) or predicted (blue line) delivery dates and the actual delivery date (green line).
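As a rough illustration of AED, the sketch below sums the absolute gap, in days, between the estimate in effect on each snapshot day and the actual delivery date. The daily discretization, the function name, the input shape, and the example dates are all assumptions made for illustration; the exact way the area is computed is not specified here.

```python
from datetime import date, timedelta

def accumulated_error_days(estimates, actual_delivery):
    """Approximate AED as a daily sum of |estimated - actual| in days.

    estimates: dict mapping snapshot day -> estimated delivery date in
    effect on that day (hypothetical input shape).
    """
    total = 0
    for snapshot_day, estimated in estimates.items():
        if snapshot_day >= actual_delivery:
            continue  # only days before the actual delivery contribute
        total += abs((estimated - actual_delivery).days)
    return total

# Hypothetical title: delivered June 30, with the schedule corrected late.
actual = date(2024, 6, 30)
start = date(2024, 6, 20)
schedule = {}
for offset in range(10):
    day = start + timedelta(days=offset)
    # The schedule says June 23 for five days, then is revised to June 30.
    schedule[day] = date(2024, 6, 23) if offset < 5 else date(2024, 6, 30)

aed = accumulated_error_days(schedule, actual)
print(aed)  # 5 snapshot days x 7 days of error = 35
```

Restricting the `estimates` input to a window such as the final months before delivery would give the "closer to delivery" variant of the metric.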
When we compare titles with at least one launch miss to those without, we find that mean AED is significantly higher in the group with launch misses. Notably, this effect is even more pronounced when we focus on the period closer to delivery. This indicates that high AED (i.e., inaccurate schedules) in the final stretch before launch is especially correlated with launch misses, more so than AED accumulated over a longer timeline. These findings further motivate our efforts to improve schedule accuracy and reduce AED by leveraging rich production data and predictive modeling.
Modeling Time-to-Delivery
Our predictive models are boosted tree regression models that predict the “days until” delivery of either media asset (Locked Cut or IMF) for in-progress productions.
To power these models, we leverage a wide range of upstream data sources, including production-level indicators of progress, title metadata, and seasonal indicators. We predict the days until media asset delivery using daily snapshots, meaning that for each feature we have its value as of each day of a production. Modeling with this snapshotted data lets us generate up-to-date predictions as new information becomes available, build a flexible model that works across all production phases, and seamlessly incorporate dynamic features that evolve over time (Figure 2).
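The snapshot structure can be sketched as one training row per production per day, with the regression target being the days remaining until delivery. This is a minimal sketch under stated assumptions: the feature names (`pct_shoot_days_complete`, `is_series`), the helper names, and the values are hypothetical, since the actual features are not listed here.

```python
from datetime import date, timedelta

def build_snapshot_rows(title_id, first_day, delivery_day, feature_fn):
    """Return one row per day of a production, up to the day before delivery.

    feature_fn(day) returns the feature values as they were known on
    that day, so each row reflects only information available then.
    """
    rows = []
    day = first_day
    while day < delivery_day:
        row = {"title_id": title_id, "snapshot_day": day}
        row.update(feature_fn(day))
        # Regression target: days until the media asset is delivered.
        row["days_until_delivery"] = (delivery_day - day).days
        rows.append(row)
        day += timedelta(days=1)
    return rows

# Hypothetical features: shoot progress grows over time, metadata is static.
def example_features(day):
    elapsed = (day - date(2024, 1, 1)).days
    return {
        "pct_shoot_days_complete": min(1.0, elapsed / 5),
        "is_series": True,
    }

rows = build_snapshot_rows(
    "title_a", date(2024, 1, 1), date(2024, 1, 11), example_features
)
print(len(rows), rows[0]["days_until_delivery"], rows[-1]["days_until_delivery"])
```

Rows in this shape could be passed to any gradient-boosted regression library. At prediction time, the latest snapshot row for an in-progress title yields a predicted number of days, which is added to the snapshot day to obtain a predicted delivery date.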

Evaluating Our Approach
Building a Comprehensive Metrics Suite
When evaluating the performance of the predictive models, we look across a suite of metrics to understand where and when predicted dates outperform scheduled dates. Among these are mean and median absolute error, relative to actual delivery, to understand the accuracy of our estimated dates. We also consider bias metrics, such as mean and median error, to understand whether we are consistently over- or under-predicting the actual delivery. We calculate the standard deviation of our errors to understand whether there are large shifts in the bulk of the error distribution. For the tails of our error distributions, we calculate the percentage of absolute errors that are greater than x days.
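A minimal sketch of such a metrics suite follows, using only the Python standard library. The sign convention (estimated minus actual, so positive means the estimate was later than the true delivery), the function name, the threshold, and the example errors are assumptions for illustration.

```python
import statistics

def error_metrics(errors_days, tail_threshold_days):
    """Summarize errors defined as (estimated - actual) delivery, in days."""
    errors = list(errors_days)
    abs_errors = [abs(e) for e in errors]
    n_tail = sum(1 for a in abs_errors if a > tail_threshold_days)
    return {
        # Accuracy: how far off are we, regardless of direction?
        "mae": statistics.mean(abs_errors),
        "median_ae": statistics.median(abs_errors),
        # Bias: are estimates systematically late (+) or early (-)?
        "mean_error": statistics.mean(errors),
        "median_error": statistics.median(errors),
        # Spread of the bulk of the error distribution.
        "std_error": statistics.pstdev(errors),
        # Tail: share of estimates that miss by more than the threshold.
        "pct_abs_error_gt_threshold": 100 * n_tail / len(abs_errors),
    }

# Hypothetical errors for five titles at a single horizon.
example_errors = [-10, -4, 0, 6, 28]
metrics = error_metrics(example_errors, tail_threshold_days=14)
print(metrics["mae"], metrics["mean_error"], metrics["pct_abs_error_gt_threshold"])
```

Running the same function on scheduled-date errors and predicted-date errors, at each horizon to delivery, gives a side-by-side comparison of the two sources.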
For scheduled dates, we also calculate coverage across various horizons to delivery. Coverage is a value proposition of the model: we built it in such a way that we can always provide a predicted date and recoup any coverage gaps that exist from scheduled dates alone.
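Coverage at a horizon can be sketched as the share of titles that already had a scheduled date that many days before delivery. The sketch below makes a simplifying assumption that once a scheduled date is entered it stays populated; the input shape, names, and values are hypothetical.

```python
def coverage_by_horizon(first_scheduled_days_out, horizons):
    """Share of titles with a scheduled date at each horizon to delivery.

    first_scheduled_days_out: dict of title -> number of days before
    delivery at which a scheduled date first existed, or None if the
    title never had one (hypothetical input shape).
    """
    n_titles = len(first_scheduled_days_out)
    coverage = {}
    for horizon in horizons:
        covered = sum(
            1
            for days_out in first_scheduled_days_out.values()
            if days_out is not None and days_out >= horizon
        )
        coverage[horizon] = covered / n_titles
    return coverage

# Hypothetical titles: "c" never received a scheduled date.
first_seen = {"a": 200, "b": 90, "c": None, "d": 30}
coverage = coverage_by_horizon(first_seen, horizons=[180, 60, 14])
print(coverage)  # {180: 0.25, 60: 0.5, 14: 0.75}
```

Because the model can always emit a prediction, the equivalent coverage for predicted dates would be 1.0 at every horizon.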
Benchmarking Against Manual Scheduling
In a backtest, we observed significant improvements across all of our metrics and across most horizons from delivery. For example, see Figure 3, which plots global mean absolute error (MAE) and shows large reductions in error (greater accuracy) for predicted IMF and Locked Cut dates compared to scheduled dates. We also see large reductions in outliers from scheduled to predicted dates.

Since our teams use these dates over a period of time and not at a single point in time, there is an additional benefit that we describe as an Earlier Accuracy Signal. By leveraging predicted dates, our teams benefit from a level of accuracy that they would otherwise have to wait some amount of time for if using scheduled dates. For example, 6 months out from Locked Cut delivery, the predicted dates are better than scheduled dates on 76% of titles and have a level of accuracy (6.1 weeks MAE) that scheduled dates don’t reach until 11 weeks later.
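One way to quantify the Earlier Accuracy Signal is to take the predicted MAE at a given horizon and walk forward in time until the scheduled MAE becomes at least as good. In the sketch below, only the 6.1-week MAE and the roughly 26-week (6-month) horizon come from the example above; the scheduled MAE curve is invented for illustration, and the function name is hypothetical.

```python
def weeks_until_schedule_catches_up(predicted_mae, scheduled_mae_by_weeks_out, weeks_out):
    """Weeks of waiting before scheduled dates match the predicted MAE.

    scheduled_mae_by_weeks_out: dict of weeks before delivery -> MAE of
    scheduled dates at that horizon (same units as predicted_mae).
    Returns None if scheduled dates never reach that accuracy.
    """
    # Visit horizons from `weeks_out` toward delivery (fewer weeks remaining).
    later_horizons = sorted(
        (w for w in scheduled_mae_by_weeks_out if w <= weeks_out), reverse=True
    )
    for horizon in later_horizons:
        if scheduled_mae_by_weeks_out[horizon] <= predicted_mae:
            return weeks_out - horizon
    return None

# Invented scheduled-date MAE curve (weeks), improving closer to delivery.
scheduled_curve = {26: 9.0, 22: 8.0, 18: 7.0, 15: 6.0, 10: 4.5}
lead = weeks_until_schedule_catches_up(6.1, scheduled_curve, weeks_out=26)
print(lead)  # 11 with this invented curve
```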
Circling back to AED, which we mentioned earlier is correlated with launch misses, we find that in our backtested titles globally, and across most buying orgs and content types (i.e., series versus standalones), predicted IMF and Locked Cut dates reduce AED relative to scheduled dates when calculated over the 6 months leading up to delivery. We see similar patterns when we repeat this for shorter horizons to delivery.
Streamlining Workflows with Improved Scheduling
A key advantage of this predictive model is that estimated delivery dates are already integral to our stakeholders’ workflows, meaning we can introduce predicted dates without overhauling existing processes. However, this creates a new challenge: with both scheduled and predicted dates available, teams need to determine which is more reliable. While predicted dates are more accurate on average, there are situations where scheduled dates perform better. To address this, we have built serving logic that defaults to scheduled dates in buying orgs where the model underperforms. Elsewhere, teams can view both dates side by side in dashboards, allowing them to apply their own judgment. Additionally, our predictive models leverage features that are tied to scheduled dates, which has emphasized the need for, and impact of, ensuring our upstream teams continue to enter and update scheduled dates even in the presence of our predictions. We are piloting these predictive signals in several ways, tailoring the approach to fit the diverse needs and tools of our various launch prep functions.
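The serving rule described above might look something like the following sketch. The function name, the org identifiers, and the fallback to the predicted date when no scheduled date exists are assumptions for illustration, not a description of the production logic.

```python
from datetime import date

def choose_served_dates(buying_org, scheduled, predicted, scheduled_default_orgs):
    """Decide which delivery date to surface by default for a title.

    scheduled_default_orgs: set of buying orgs where backtests showed the
    model underperforming scheduled dates (hypothetical input).
    """
    if scheduled is None:
        # Assumption: the prediction fills coverage gaps in every org.
        default_source, default_date = "predicted", predicted
    elif buying_org in scheduled_default_orgs:
        default_source, default_date = "scheduled", scheduled
    else:
        # Both dates are shown side by side; the team applies judgment.
        default_source, default_date = "both", None
    return {
        "default_source": default_source,
        "default_date": default_date,
        "scheduled_date": scheduled,
        "predicted_date": predicted,
    }

# Hypothetical org names and dates.
underperforming = {"org_a"}
r1 = choose_served_dates("org_a", date(2024, 5, 1), date(2024, 5, 20), underperforming)
r2 = choose_served_dates("org_a", None, date(2024, 5, 20), underperforming)
r3 = choose_served_dates("org_b", date(2024, 5, 1), date(2024, 5, 20), underperforming)
print(r1["default_source"], r2["default_source"], r3["default_source"])
```

Keeping both dates in the returned record, whichever is the default, leaves room for dashboards to display them side by side.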
Predicting Risk in Content Launches: How Data-Driven Insights can Transform Launch Planning was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.