by Matthew Wooden, Ishan Gupta, Kevin Mercurio, Devon Bryant, and Claire Dorman
In his seminal book “Thinking, Fast and Slow,” Daniel Kahneman describes two systems that drive human cognition: System 1, which operates automatically and quickly with little effort, and System 2, which allocates attention to more demanding mental activities requiring deliberate focus. This dual-process theory has profound implications not only for understanding human behavior, but also for designing intelligent systems that must balance fast responsiveness with strategic foresight. Similar “plan vs. act” decompositions show up in other domains too: for example, robotics and autonomous driving often separate a slower planning layer (setting goals and constraints over longer horizons) from faster control and execution loops, and modern LLM agents frequently pair deliberate planning with rapid, step-by-step tool use and response.
At Netflix, our messaging platform faces a similar challenge every day. We send hundreds of millions of personalized notifications (push messages, emails, and in-app alerts) to help members discover content they’ll love. This creates a central tension: optimizing each notification for near-term engagement can conflict with what’s best for the member over the long term. Higher message frequency can increase fatigue and opt-out risk, while lower frequency can reduce awareness of relevant titles and features the member would value.
This blog post introduces our framework for personalized notifications: a hierarchical system where a “slow” policy makes strategic, personalized decisions about a member’s weekly messaging plan (e.g., the intended frequency per channel and the resulting pacing over the week), while a “fast” policy handles the tactical, real-time decisions about which specific message to send when a send opportunity occurs. Together, they balance near-term engagement with longer-term member experience.
The Problem:
Before introducing our new framework, it’s helpful to ground the discussion in a representative baseline for a personalized notification system. In our previous production system, we used a causal model to make send decisions by predicting the causal effect of a single message over a short time horizon. While this approach is effective as a baseline, it suffers from two fundamental limitations:
Short-Term Reward Horizons
The single-message outcome model is trained to optimize short-horizon metrics, such as immediate user actions occurring shortly after a notification is sent. While this is excellent for driving near-term engagement, it misses the cumulative, long-term effects of a messaging strategy. A message that drives an interaction today may also contribute to notification fatigue, reducing responsiveness in the weeks that follow. Because critical signals of member satisfaction (like sustained viewing habits or gradual opt-out risk) only surface over extended timeframes, a short-term model will always miss the bigger picture.
Coupled Ranking and Pacing Decisions
When a single system evaluates daily incrementality to decide both whether to send something and, if so, which item to send, an individual member’s weekly message frequency becomes a byproduct of those daily decisions rather than an explicit control variable. In our previous single-policy system, frequency was controlled implicitly by a relevance threshold on the model score, calibrated to achieve a target aggregate send rate. While effective for managing overall frequency, this mechanism limited the system’s ability to personalize frequency based on individual engagement patterns. Moreover, because send eligibility and message selection were coupled in the same decision rule, adjusting the threshold to control frequency also changed the distribution and quality of selected messages, and vice versa.
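The coupling described above can be made concrete with a small sketch. It assumes a hypothetical single-policy rule in which one global threshold on the model score is calibrated so that a target fraction of scores clears it. The function names and toy scores are illustrative assumptions, not taken from the production system.

```python
import math

def calibrate_threshold(scores, target_send_rate):
    """Pick one global threshold so about target_send_rate of scores clear it."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < target_send_rate <= 1.0:
        raise ValueError("target_send_rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    n_send = max(1, math.floor(target_send_rate * len(ranked)))
    return ranked[n_send - 1]

def single_policy_decision(candidate_scores, threshold):
    """Coupled rule: one score both selects the message and gates the send."""
    if not candidate_scores:
        return None
    best_message = max(candidate_scores, key=candidate_scores.get)
    return best_message if candidate_scores[best_message] >= threshold else None

def weekly_sends(daily_best_scores, threshold):
    """Per-member frequency is only a byproduct of the daily threshold checks."""
    return sum(score >= threshold for score in daily_best_scores)

# Toy data (hypothetical): the best candidate score per day for two members.
daily_best = {"member_a": [0.9, 0.8, 0.7], "member_b": [0.3, 0.2, 0.6]}
all_scores = [s for scores in daily_best.values() for s in scores]
threshold = calibrate_threshold(all_scores, target_send_rate=0.5)
sends = {member: weekly_sends(s, threshold) for member, s in daily_best.items()}
```

In this toy data the aggregate target is met (3 of 6 opportunities result in a send), yet one member receives all three messages and the other receives none. Nothing in the rule sets either member's frequency directly.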
To solve these challenges, we needed a system that could separate longer-term strategy from shorter-term decisions. What if we could determine an optimal, personalized message plan for each member, and then focus on selecting the most relevant content within those bounds? In the following sections, we detail how we realized this vision by decoupling our notification engine into a hierarchical ‘System 1’ and ‘System 2’ framework.
The Proposed Method: A Hierarchical Slow-Fast Architecture

The Slow policy’s primary role is to define a personalized pacing of messages over a defined time horizon. The decisions made by the Slow policy are consumed by the Fast policy, whose role is to maximize immediate relevance and select the optimal message for the member at any given moment.
To illustrate the Slow policy in practice: if optimized at a weekly cadence, the policy evaluates a member’s long-term engagement patterns to select a “Pacing Plan Action.” To keep the action space manageable yet expressive, we discretize the decision space into a set of actions that independently specify push and email frequencies. This gives on the order of 100 distinct combinations of cross-channel pacing strategies.
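A minimal sketch of such a discretized action space follows. The frequency grids (0 to 9 sends per week on each channel) are assumptions chosen only so that the cross product lands at 100 actions; the post does not state the actual levels, and `PacingPlanAction` is a hypothetical name.

```python
from itertools import product
from typing import NamedTuple

class PacingPlanAction(NamedTuple):
    """One discrete action: independent weekly targets per channel."""
    push_per_week: int
    email_per_week: int

# Assumed grids, not the production values.
PUSH_LEVELS = range(0, 10)
EMAIL_LEVELS = range(0, 10)

def build_action_space(push_levels=PUSH_LEVELS, email_levels=EMAIL_LEVELS):
    """Cross product of per-channel frequency levels."""
    return [PacingPlanAction(p, e) for p, e in product(push_levels, email_levels)]

ACTIONS = build_action_space()
```

Specifying the channels independently keeps each grid small while the product of the two still covers a wide range of cross-channel plans.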
The Utility Function
The Slow policy selects actions by maximizing a personalized utility function. This function explicitly trades off positive engagement signals against the long-term “cost” of messaging.
U(member, action) = Σₖ wₖ · Rewardₖ(member, action) − Cost(action)
To capture a holistic view of member health, this utility includes:
- Positive Signals: Capturing the likelihood that a member will find value in and engage with the platform.
- Negative Signals: Capturing the likelihood of member fatigue or a propensity to opt out of a messaging channel.
Ideally, negative signals alone would naturally penalize over-messaging. In practice, however, explicit negative feedback is extremely sparse. Without an additional constraint, the predicted ‘cost’ of an incremental message appears negligible, causing the model to gravitate toward maximum frequency.
To address this, we introduce a universal message cost that is added to the personalized negative-feedback prediction for every send. This additional cost term keeps the reward function concave and well-behaved, preventing degenerate “always send” policies. The message cost parameter is empirically tuned using a combination of online experiments and offline evaluation metrics.
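The effect of the universal message cost can be illustrated with a toy version of the utility. Everything numeric here is an assumption for illustration: the saturating engagement curve, the tiny negative-feedback prediction standing in for sparse opt-out signals, the single `engagement` reward, and the cost value of 0.1. For simplicity, an action is reduced to a total weekly send count.

```python
import math

def utility(sends, reward_preds, weights, neg_feedback_pred, message_cost):
    """U = sum_k w_k * Reward_k - Cost, where Cost is the predicted
    negative feedback plus a universal per-send message cost."""
    reward = sum(weights[k] * reward_preds[k] for k in weights)
    cost = neg_feedback_pred + message_cost * sends
    return reward - cost

def toy_predictions(sends):
    """Stand-in for model outputs (assumed shapes, not real models)."""
    engagement = 1.0 - math.exp(-0.5 * sends)  # diminishing returns
    neg_feedback = 0.001 * sends               # sparse, so nearly flat
    return {"engagement": engagement}, neg_feedback

def best_action(actions, weights, message_cost):
    """Pick the weekly send count with the highest utility."""
    def score(sends):
        rewards, neg = toy_predictions(sends)
        return utility(sends, rewards, weights, neg, message_cost)
    return max(actions, key=score)

ACTIONS = range(0, 8)  # 0..7 sends per week
WEIGHTS = {"engagement": 1.0}

without_cost = best_action(ACTIONS, WEIGHTS, message_cost=0.0)
with_cost = best_action(ACTIONS, WEIGHTS, message_cost=0.1)
```

With these toy numbers, the policy without a message cost picks the maximum of 7 sends, because each extra message still adds more predicted reward than predicted negative feedback. With the per-send cost it settles on 3, the point where the marginal engagement gain drops below the cost.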
Pacing Strategy
The two-level design naturally allows for optimizing both the average frequency and the pacing of messages over time. The simplest pacing strategy is uniform random: we translate the frequency target into a per-opportunity send probability and, at each eligible opportunity, effectively flip a weighted coin to decide whether to send. This produces an organically randomized pattern whose expected send rate matches the target.
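Uniform random pacing can be sketched in a few lines. The sketch assumes the number of eligible opportunities in the horizon is known up front (seven, one per day, in the usage below); the function names are hypothetical.

```python
import random

def send_probability(target_sends, eligible_opportunities):
    """Translate a frequency target into a per-opportunity send probability."""
    if eligible_opportunities <= 0:
        return 0.0
    return max(0.0, min(1.0, target_sends / eligible_opportunities))

def should_send(target_sends, eligible_opportunities, rng):
    """Flip a weighted coin at one eligible opportunity."""
    return rng.random() < send_probability(target_sends, eligible_opportunities)

def simulate_week(target_sends, eligible_opportunities, rng):
    """Count sends over one horizon of independent coin flips."""
    return sum(
        should_send(target_sends, eligible_opportunities, rng)
        for _ in range(eligible_opportunities)
    )

# Usage: a target of 3 sends over 7 daily opportunities gives p = 3/7 per day.
rng = random.Random(0)
sends_this_week = simulate_week(3, 7, rng)
```

Any single week may contain more or fewer than three sends, since each opportunity is an independent draw. Only the expected count equals the target, which is the property the paragraph above describes.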
While uniform pacing provides a clean and robust baseline, the framework readily extends to richer, non-uniform pacing profiles (for example, day-of-week patterns, conditioning on user activity, or launch-aligned bursts) whenever product or user-experience considerations call for more structured temporal distributions.
Policy-to-Policy Communication
The real power of this hierarchy lies in decoupling. By splitting into “Slow” and “Fast” policies, we allow each to focus on what it does best.
To bridge these two worlds asynchronously, decisions are treated as events and state is managed in a low-latency feature store:
- The Planner (Slow): The Slow policy calculates a member’s ideal pacing plan. It writes this strategic intent to a feature store.
- The Executor (Fast): Every day, when a notification opportunity arises, the Fast policy simply pulls that stored “plan” as a feature. It then executes the tactical send decision within those strategic guardrails.
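The planner and executor handoff can be sketched with an in-memory dictionary standing in for the feature store. The class, function names, and the specific way the Fast policy applies the plan (a pacing coin flip followed by picking the top-scored candidate) are assumptions for illustration; the post does not describe the store's API or the exact decision rule.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class PacingPlan:
    push_per_week: int
    email_per_week: int

class InMemoryFeatureStore:
    """Hypothetical stand-in for a low-latency feature store."""
    def __init__(self) -> None:
        self._plans: Dict[str, PacingPlan] = {}

    def write_plan(self, member_id: str, plan: PacingPlan) -> None:
        self._plans[member_id] = plan

    def read_plan(self, member_id: str) -> Optional[PacingPlan]:
        return self._plans.get(member_id)

def slow_policy_run(store, member_id, plan):
    """Planner: runs at a defined cadence and persists the chosen plan."""
    store.write_plan(member_id, plan)

def fast_policy_decide(store, member_id, channel, candidate_scores,
                       opportunities_per_week, rng):
    """Executor: reads the stored plan as a feature, applies pacing,
    then selects the highest-scoring candidate message."""
    plan = store.read_plan(member_id)
    if plan is None or not candidate_scores or opportunities_per_week <= 0:
        return None
    target = plan.push_per_week if channel == "push" else plan.email_per_week
    p_send = min(1.0, target / opportunities_per_week)
    if rng.random() >= p_send:
        return None  # pacing says skip this opportunity
    return max(candidate_scores, key=candidate_scores.get)

store = InMemoryFeatureStore()
slow_policy_run(store, "member_1", PacingPlan(push_per_week=7, email_per_week=0))
choice = fast_policy_decide(
    store, "member_1", "push", {"title_a": 0.2, "title_b": 0.9},
    opportunities_per_week=7, rng=random.Random(0),
)
```

The two functions share only the stored plan, so either side can be replaced without changing the other, provided the plan's shape stays the same.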
This architecture provides two significant advantages:
- “Stickiness”: It ensures a member receives a consistent experience. The Slow policy can be executed once at a defined cadence; the plan is stored and honored.
- Independent Evolution: We can retrain, optimize, or A/B test our weekly pacing strategies (the “Slow” layer) without ever touching the real-time ranking logic (the “Fast” layer).

Key Results & Takeaways
The transition to a hierarchical architecture resulted in one of our largest production metric lifts to date. We observed several key breakthroughs:
- Empowering the “Casual Viewer”: Gains were most significant among members who watch less frequently, a critical cohort where timely, high-relevance awareness of new content is vital.
- The Power of Decoupling: Separating frequency planning from message selection was as transformative as the modeling itself. This new architecture unlocks incredible flexibility, allowing us to iterate on content ranking models and pacing strategies as two independent, clean variables.
- Respecting the Horizon: The impact of messaging is rarely an isolated event; its effects build up cumulatively based on ongoing interactions between our system and the member. By isolating pacing into a dedicated strategic layer, we now have the mechanism to explicitly manage long-term fatigue and opt-out risk.
Acknowledgments
We couldn’t have delivered this project without the help of our outstanding colleagues, and we sincerely thank them for their contributions.
Feature Store Team: Aaron Lewis, Tom Switzer, Abby Whittier, Ray Zhang
Product: Fiona Li
AI for Member Systems (supporting contributor): Sergi Perez
Thinking Fast & Slow for a Personalized Notification System was originally published in Netflix TechBlog on Medium.