Recommending for Long-Term Member Satisfaction at Netflix

August 29, 2024 · 9 min read


Netflix Technology Blog

By Jiangwei Pan, Gary Tang, Henry Wang, and Justin Basilico

Our mission at Netflix is to entertain the world. Our personalization algorithms play an important role in delivering on this mission for all members by recommending the right shows, movies, and games at the right time. This goal extends beyond immediate engagement; we aim to create an experience that brings lasting enjoyment to our members. Traditional recommender systems often optimize for short-term metrics like clicks or engagement, which may not fully capture long-term satisfaction. We strive to recommend content that not only engages members in the moment but also enhances their long-term satisfaction, which increases the value they get from Netflix and makes them more likely to remain members.

One simple way to view recommendations is as a contextual bandit problem. When a member visits, that becomes a context for our system, which selects an action of what recommendations to show; the member then provides various types of feedback. These feedback signals can be immediate (skips, plays, thumbs up/down, or adding items to their list) or delayed (completing a show or renewing their subscription). We can define reward functions that reflect the quality of a recommendation from these feedback signals, and then train a contextual bandit policy on historical data to maximize the expected reward.
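As a rough sketch of this bandit framing (the feedback names, scoring function, and reward weights below are illustrative, not Netflix's actual system), the loop might look like:

```python
def select_action(context, candidates, score):
    """Policy step: pick the candidate the policy scores highest in this context."""
    return max(candidates, key=lambda item: score(context, item))

def reward(feedback):
    """Map immediate and delayed feedback signals to a scalar reward.
    Weights are made up for illustration."""
    r = 0.0
    if feedback.get("play"):
        r += 1.0
    if feedback.get("skip"):
        r -= 0.5
    if feedback.get("thumb_up"):
        r += 2.0
    if feedback.get("completed_show"):  # delayed signal
        r += 3.0
    return r
```

In practice the policy score would come from a trained model and the reward weights from reward engineering, as the rest of the post describes.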

There are many ways a recommendation model can be improved: more informative input features, more data, different architectures, more parameters, and so on. In this post, we focus on a less-discussed aspect, improving the recommender objective by defining a reward function that better reflects long-term member satisfaction.

Member retention might seem like an obvious reward for optimizing long-term satisfaction, because members should stay if they're satisfied. However, it has several drawbacks:

  • Noisy: Retention can be influenced by numerous external factors, such as seasonal trends, marketing campaigns, or personal circumstances unrelated to the service.
  • Low Sensitivity: Retention is only sensitive for members on the verge of canceling their subscription, so it does not capture the full spectrum of member satisfaction.
  • Hard to Attribute: Members might cancel only after a series of bad recommendations.
  • Slow to Measure: We only get one signal per account per month.

Due to these challenges, optimizing for retention alone is impractical.

Instead, we can train our bandit policy to optimize a proxy reward function that is highly aligned with long-term member satisfaction while being sensitive to individual recommendations. The proxy reward r(user, item) is a function of the user's interaction with the recommended item. For example, if we recommend “One Piece” and a member plays, subsequently completes, and gives it a thumbs-up, a simple proxy reward might be defined as r(user, item) = f(play, complete, thumb).
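A toy version of this f(play, complete, thumb) proxy reward might look as follows; the weights are invented for illustration, since the post does not give the actual functional form:

```python
def proxy_reward(play: bool, complete: bool, thumb: int) -> float:
    """Toy r(user, item) = f(play, complete, thumb); weights are illustrative.
    thumb: +1 for thumbs-up, 0 for no rating, -1 for thumbs-down."""
    r = 0.0
    if play:
        r += 1.0
    if complete:
        r += 2.0
    r += 0.5 * thumb
    return r
```

For the “One Piece” example (played, completed, thumbs-up), `proxy_reward(True, True, 1)` gives 3.5 under these made-up weights.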

Click-through rate (CTR)

Click-through rate (CTR), or in our case play-through rate, can be viewed as a simple proxy reward where r(user, item) = 1 if the user clicks a recommendation and 0 otherwise. CTR is a common feedback signal that generally reflects user preference, and it is a simple yet strong baseline for many recommendation applications. In some cases, such as ads personalization where the click is the target action, CTR may even be a reasonable reward for production models. In general, however, over-optimizing CTR can lead to promoting clickbaity items, which can harm long-term satisfaction.
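In this framing, the CTR reward is just an indicator, and the offline metric is its empirical mean over logged impressions. A minimal sketch (the log format is an assumption):

```python
def ctr_reward(clicked: bool) -> float:
    """r(user, item) = 1 if the user clicked/played the recommendation, else 0."""
    return 1.0 if clicked else 0.0

def empirical_ctr(impressions):
    """Mean click-through rate over a log of {'clicked': bool} impressions."""
    if not impressions:
        return 0.0
    return sum(ctr_reward(i["clicked"]) for i in impressions) / len(impressions)
```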

Beyond CTR

To align the proxy reward function more closely with long-term satisfaction, we need to look beyond simple interactions, consider all types of user actions, and understand their true implications for user satisfaction.

We give a few examples in the Netflix context:

  • Fast season completion ✅: Completing a season of a recommended TV show in one day is a strong sign of enjoyment and long-term satisfaction.
  • Thumbs-down after completion ❌: Completing a TV show over several weeks followed by a thumbs-down indicates low satisfaction despite significant time spent.
  • Playing a movie for just 10 minutes ❓: In this case, the user's satisfaction is ambiguous. The brief engagement might indicate that the user decided to abandon the movie, or it could simply mean the user was interrupted and plans to finish the movie later, perhaps the next day.
  • Discovering new genres ✅ ✅: Watching more Korean or game shows after “Squid Game” suggests the user is discovering something new. This discovery is likely even more valuable since it led to a variety of engagements in a new area for the member.
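One way to encode patterns like these in a proxy reward is to score the full feedback trajectory rather than a single click. The thresholds and weights below are invented for illustration:

```python
def trajectory_reward(completed: bool, days_to_complete: int,
                      thumb: int, new_genre: bool) -> float:
    """Score a feedback pattern; all weights/thresholds are illustrative.
    thumb: +1 thumbs-up, 0 none, -1 thumbs-down."""
    r = 0.0
    if completed and days_to_complete <= 1:
        r += 3.0        # fast season completion: strong satisfaction signal
    elif completed:
        r += 1.0        # slow completion: weaker signal
    r += 2.0 * thumb    # an explicit rating can outweigh time spent
    if new_genre:
        r += 2.0        # discovery of a new area is extra valuable
    return r
```

Under these weights, a fast completion with a thumbs-up in a new genre scores highly, while a slow completion followed by a thumbs-down scores negatively despite the watch time.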

Reward engineering

Reward engineering is the iterative process of refining the proxy reward function to align with long-term member satisfaction. It is similar to feature engineering, except that the reward can be derived from data that is not available at serving time. Reward engineering involves four stages: hypothesis formation, defining a new proxy reward, training a new bandit policy, and A/B testing.

Challenge: delayed feedback

User feedback used in the proxy reward function is often delayed or missing. For example, a member may play a recommended show for only a few minutes on the first day and take several weeks to fully complete it. This completion feedback is therefore delayed. Furthermore, some user feedback may never arrive: not all members provide a thumbs-up or thumbs-down after completing a show, leaving us uncertain about their level of enjoyment.

We could allow a longer window to observe feedback, but how long should we wait for delayed feedback before computing the proxy rewards? If we wait too long (e.g., weeks), we miss the opportunity to update the bandit policy with the latest data. In a highly dynamic environment like Netflix, a stale bandit policy can degrade the user experience and be particularly bad at recommending newer items.

Solution: predict missing feedback

We aim to update the bandit policy shortly after making a recommendation while also defining the proxy reward function based on all user feedback, including delayed feedback. Since delayed feedback has not been observed at the time of policy training, we can predict it. This prediction is made for each training example with delayed feedback, using the already observed feedback and other relevant information up to the training time as input features. As a result, the prediction also gets better as time progresses.

The proxy reward is then calculated for each training example using both observed and predicted feedback, and these training examples are used to update the bandit policy.
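Concretely, the reward computation at training time might mix observed signals with model-predicted probabilities for signals not yet observed. This is a sketch; the feedback names, weights, and prediction model are hypothetical:

```python
def expected_proxy_reward(observed, predict_delayed, weights):
    """Proxy reward from observed feedback plus predicted delayed feedback.

    observed: feedback seen by training time, e.g. {"play": 1.0}
    predict_delayed: callable returning P(signal | observed feedback)
                     for each delayed signal (stand-in for a trained model)
    weights: reward weight per feedback signal (illustrative values)
    """
    r = sum(weights[s] * v for s, v in observed.items())
    for signal, p in predict_delayed(observed).items():
        if signal not in observed:      # fill in only the missing signals
            r += weights[signal] * p    # expected contribution
    return r
```

As more feedback is observed for an example, predicted terms are replaced by their observed values, so the computed reward improves over time.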

But aren't we still only relying on observed feedback in the proxy reward function? Yes, because delayed feedback is predicted from observed feedback. However, it is simpler to reason about rewards using all feedback directly. For instance, the delayed thumbs-up prediction model may be a complex neural network that takes into account all observed feedback (e.g., short-term play patterns). It is more straightforward to define the proxy reward as a simple function of the thumbs-up feedback than as a complex function of short-term interaction patterns. This approach can also be used to adjust for potential biases in how feedback is provided.

The reward engineering process is thus extended with an optional delayed feedback prediction step.

Two types of ML models

It is worth noting that this approach employs two types of ML models:

  • Delayed Feedback Prediction Models: These models predict p(final feedback | observed feedback). The predictions are used to define and compute proxy rewards for bandit policy training examples. As a result, these models are used offline during bandit policy training.
  • Bandit Policy Models: These models are used in the bandit policy π(item | user; r) to generate recommendations online in real time.
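The split can be sketched as follows: the prediction model runs offline to label training data, while the policy serves online. Class names and logic here are illustrative stand-ins:

```python
class DelayedFeedbackModel:
    """Offline: predicts final feedback given feedback observed so far.
    A trained neural network in practice; a hard-coded stub here."""
    def predict(self, observed):
        return {"complete": 0.5 if observed.get("play") else 0.05}

class BanditPolicy:
    """Online: scores candidate items for a member in real time."""
    def __init__(self, item_scores):
        self.item_scores = item_scores  # learned from proxy-reward labels

    def recommend(self, member, candidates):
        return max(candidates, key=lambda i: self.item_scores.get(i, 0.0))
```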

Challenge: online-offline metric disparity

Improved input features or neural network architectures often lead to better offline model metrics (e.g., AUC for classification models). However, when these improved models are subjected to A/B testing, we often observe flat or even negative online metrics, which are what quantify long-term member satisfaction.

This online-offline metric disparity usually occurs when the proxy reward used by the recommendation policy is not fully aligned with long-term member satisfaction. In such cases, a model may achieve higher proxy rewards (offline metrics) but result in worse long-term member satisfaction (online metrics).

Nevertheless, the model improvement may be genuine. One way to resolve this is to further refine the proxy reward definition to align better with the improved model. When this tuning results in positive online metrics, the model improvement can be effectively productized. See [1] for more discussion of this challenge.

Summary

In this post, we presented an overview of our reward engineering efforts to align Netflix recommendations with long-term member satisfaction. While retention remains our north star, it is not easy to optimize directly. Our efforts therefore focus on defining a proxy reward that is aligned with long-term satisfaction and sensitive to individual recommendations. Finally, we discussed the unique challenge of delayed user feedback at Netflix and proposed an approach that has proven effective for us. Refer to [2] for an earlier overview of the reward innovation efforts at Netflix.

As we continue to improve our recommendations, several open questions remain:

  • Can we learn a good proxy reward function automatically by correlating behavior with retention?
  • How long should we wait for delayed feedback before using its predicted value in policy training?
  • How can we leverage Reinforcement Learning to further align the policy with long-term satisfaction?

[1] Deep learning for recommender systems: A Netflix case study. AI Magazine, 2021. Harald Steck, Linas Baltrunas, Ehtsham Elahi, Dawen Liang, Yves Raimond, Justin Basilico.

[2] Reward innovation for long-term member satisfaction. RecSys 2023. Gary Tang, Jiangwei Pan, Henry Wang, Justin Basilico.


