Entertainer.news
Detecting Scene Changes in Audiovisual Content | by Netflix Technology Blog | Jun, 2023

By Team Entertainer | June 20, 2023 | Updated: June 20, 2023
Netflix Technology Blog
Netflix TechBlog

Avneesh Saluja, Andy Yao, Hossein Taghavi

When watching a movie or an episode of a TV show, we experience a cohesive narrative that unfolds before us, often without giving much thought to the underlying structure that makes it all possible. However, movies and episodes are not atomic units, but rather composed of smaller elements such as frames, shots, scenes, sequences, and acts. Understanding these elements and how they relate to each other is crucial for tasks such as video summarization and highlights detection, content-based video retrieval, dubbing quality assessment, and video editing. At Netflix, such workflows are performed hundreds of times a day by many teams around the world, so investing in algorithmically-assisted tooling around content understanding can reap outsized rewards.

While segmentation of more granular units like frames and shot boundaries is either trivial or can primarily rely on pixel-based information, higher order segmentation¹ requires a more nuanced understanding of the content, such as the narrative or emotional arcs. Furthermore, some cues can be better inferred from modalities other than the video, e.g. the screenplay or the audio and dialogue track. Scene boundary detection, in particular, is the task of identifying the transitions between scenes, where a scene is defined as a continuous sequence of shots that take place in the same time and location (often with a relatively static set of characters) and share a common action or theme.

In this blog post, we present two complementary approaches to scene boundary detection in audiovisual content. The first method, which can be seen as a form of weak supervision, leverages auxiliary data in the form of a screenplay by aligning screenplay text with timed text (closed captions, audio descriptions) and assigning timestamps to the screenplay’s scene headers (a.k.a. sluglines). In the second approach, we show that a relatively simple, supervised sequential model (bidirectional LSTM or GRU) that uses rich, pretrained shot-level embeddings can outperform the current state-of-the-art baselines on our internal benchmarks.

Figure 1: a scene consists of a sequence of shots.

Screenplays are the blueprints of a movie or show. They’re formatted in a specific way, with each scene beginning with a scene header, indicating attributes such as the location and time of day. This consistent formatting makes it possible to parse screenplays into a structured format. At the same time, changes made a) on the fly (directorial or actor discretion) or b) in post-production and editing are rarely reflected in the screenplay, i.e. it isn’t rewritten to reflect the changes.
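To illustrate how consistent formatting makes a screenplay parseable, here is a minimal sketch that splits plain-text screenplay lines into scenes by detecting scene headers. The header pattern (an `INT.`/`EXT.` prefix, a location, and an optional ` - TIME OF DAY` suffix), the `Scene` type, and `parse_screenplay` are assumptions made for this example and not the parser used in the post; real screenplays vary and usually need a more forgiving grammar.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed slugline convention: "INT." / "EXT." prefix, a location,
# and an optional " - TIME OF DAY" suffix. Real scripts vary.
SLUGLINE = re.compile(
    r"^(?P<setting>INT\./EXT\.|INT\.|EXT\.)\s+"
    r"(?P<location>.+?)"
    r"(?:\s+-\s+(?P<time_of_day>[A-Z' ]+))?$"
)


@dataclass
class Scene:
    setting: str
    location: str
    time_of_day: Optional[str]
    lines: List[str] = field(default_factory=list)  # dialogue and action lines


def parse_screenplay(text: str) -> List[Scene]:
    """Group non-empty lines under the most recent scene header."""
    scenes: List[Scene] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SLUGLINE.match(line)
        if match:
            scenes.append(
                Scene(match["setting"], match["location"], match["time_of_day"])
            )
        elif scenes:  # lines before the first header are ignored
            scenes[-1].lines.append(line)
    return scenes


# Made-up sample text, for illustration only.
sample = """
INT. TAVERN - NIGHT
A crowded room. A stranger enters.
STRANGER
One ale.

EXT. FOREST ROAD - DAY
Two riders approach.
"""

for scene in parse_screenplay(sample):
    print(scene.setting, scene.location, scene.time_of_day, len(scene.lines))
```

The structured output (setting, location, time of day, and the lines belonging to each scene) is what a later alignment step would consume.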

Figure 2: screenplay elements, from The Witcher S1E1.

In order to leverage this noisily aligned data source, we need to align time-stamped text (e.g. closed captions and audio descriptions) with screenplay text (dialogue and action² lines), bearing in mind a) the on-the-fly changes that might result in semantically similar but not identical line pairs and b) the possible post-shoot changes that are more significant (reordering, removing, or inserting entire scenes). To address the first challenge, we use pretrained sentence-level embeddings, e.g. from an embedding model optimized for paraphrase identification, to represent text in both sources. For the second challenge, we use dynamic time warping (DTW), a method for measuring the similarity between two sequences that may vary in time or speed. While DTW assumes a monotonicity condition on the alignments³ which is frequently violated in practice, it is robust enough to recover from local misalignments, and the vast majority of salient events (like scene boundaries) are well-aligned.
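The alignment step can be sketched as classic DTW over a cosine-distance matrix between the two sequences of sentence embeddings. This is a minimal illustration in plain Python, not the production implementation: the vectors below are toy stand-ins for real sentence embeddings, and `cosine_distance` and `dtw_align` are names chosen for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def dtw_align(
    script_vecs: List[Vector], caption_vecs: List[Vector]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, monotonic path of (script_idx, caption_idx) pairs)."""
    n, m = len(script_vecs), len(caption_vecs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(script_vecs[i - 1], caption_vecs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy embeddings: one screenplay line was spoken as two captions.
script = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
captions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
total_cost, path = dtw_align(script, captions)
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Because the path can only move forward in both sequences, a reordered scene cannot be matched correctly; this is the monotonicity assumption mentioned above.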

As a result of DTW, the scene headers have timestamps that can indicate possible scene boundaries in the video. The alignments can also be used to, e.g., augment audiovisual ML models with screenplay information like scene-level embeddings, or transfer labels assigned to audiovisual content to train screenplay prediction models.
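One way the timestamp transfer could work is sketched below, under an assumption not stated in the post: scene headers are not aligned themselves, so each header inherits the start time of the earliest caption aligned to the first dialogue or action line of its scene. The function name, the input layout, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple


def header_timestamps(
    path: List[Tuple[int, int]],
    scene_first_line: Dict[str, int],
    caption_start_times: List[float],
) -> Dict[str, float]:
    """Map each scene header to a candidate scene-boundary time in seconds.

    path: (screenplay_line_idx, caption_idx) pairs from an alignment.
    scene_first_line: header text -> index of the scene's first aligned line.
    caption_start_times: start time of each caption, in seconds.
    """
    # Earliest caption aligned to each screenplay line.
    first_caption: Dict[int, int] = {}
    for line_idx, cap_idx in path:
        first_caption[line_idx] = min(first_caption.get(line_idx, cap_idx), cap_idx)

    timestamps: Dict[str, float] = {}
    for header, line_idx in scene_first_line.items():
        if line_idx in first_caption:  # headers with no aligned line are skipped
            timestamps[header] = caption_start_times[first_caption[line_idx]]
    return timestamps


# Made-up alignment path and caption times, for illustration only.
path = [(0, 0), (0, 1), (1, 2), (2, 3)]
headers = {"INT. TAVERN - NIGHT": 0, "EXT. FOREST ROAD - DAY": 2}
starts = [12.0, 15.5, 19.0, 47.25]
print(header_timestamps(path, headers, starts))
# {'INT. TAVERN - NIGHT': 12.0, 'EXT. FOREST ROAD - DAY': 47.25}
```

Headers whose scene has no aligned line (for example, a scene cut in editing) get no timestamp, which is a reasonable signal that the scene may be missing from the final video.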

Figure 3: alignments between screenplay and video via time-stamped text for The Witcher S1E1.

The alignment method above is a great way to get up and running with the scene change task since it combines easy-to-use pretrained embeddings with a well-known dynamic programming technique. However, it presupposes the availability of high-quality screenplays. A complementary approach (which, in fact, can use the above alignments as a feature) that we present next is to train a sequence model on annotated scene change data. Certain workflows in Netflix capture this information, and that is our primary data source; publicly-released datasets are also available.

From an architectural perspective, the model is relatively simple: a bidirectional GRU (biGRU) that ingests shot representations at each step and predicts if a shot is at the end of a scene.⁴ The richness in the model comes from these pretrained, multimodal shot embeddings, a preferable design choice in our setting given the difficulty in obtaining labeled scene change data and the relatively larger scale at which we can pretrain various embedding models for shots.
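To make the data flow concrete, here is a forward-pass-only sketch of a bidirectional GRU tagger in plain Python: one GRU reads the shot embeddings left to right, another right to left, and a sigmoid over the concatenated states gives a per-shot probability of "end of scene". The weights are random and untrained, the layer sizes are arbitrary, and the class names are invented for this example; a real model would be built and trained in a deep learning framework.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: List[Vector], x: Vector, bias: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU update with update gate z and reset gate r."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        joint = input_size + hidden_size
        self.hidden_size = hidden_size
        self.Wz = rand_matrix(hidden_size, joint, rng)
        self.Wr = rand_matrix(hidden_size, joint, rng)
        self.Wh = rand_matrix(hidden_size, joint, rng)
        self.bias = [0.0] * hidden_size

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in affine(self.Wz, x + h, self.bias)]
        r = [sigmoid(v) for v in affine(self.Wr, x + h, self.bias)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in affine(self.Wh, x + gated, self.bias)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run(cell: GRUCell, shots: List[Vector]) -> List[Vector]:
    h = [0.0] * cell.hidden_size
    states = []
    for x in shots:
        h = cell.step(x, h)
        states.append(h)
    return states


class BiGRUBoundaryTagger:
    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = GRUCell(input_size, hidden_size, rng)
        self.bwd = GRUCell(input_size, hidden_size, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
        self.b_out = 0.0

    def predict_proba(self, shots: List[Vector]) -> List[float]:
        """Probability, per shot, that the shot ends a scene."""
        forward = run(self.fwd, shots)
        backward = run(self.bwd, shots[::-1])[::-1]  # re-reverse to align by shot
        return [
            sigmoid(sum(w * v for w, v in zip(self.w_out, hf + hb)) + self.b_out)
            for hf, hb in zip(forward, backward)
        ]


# Five toy 4-dimensional "shot embeddings".
shots = [[0.1 * i, 0.2, -0.1, 0.05 * i] for i in range(5)]
model = BiGRUBoundaryTagger(input_size=4, hidden_size=3)
print(model.predict_proba(shots))
```

The backward pass is what lets the prediction for a shot depend on the shots that follow it, which matters for a boundary decision that compares "before" against "after".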

For video embeddings, we leverage an in-house model pretrained on aligned video clips paired with text (the aforementioned “timestamped text”). For audio embeddings, we first perform source separation to try to separate foreground (speech) from background (music, sound effects, noise), embed each separated waveform separately using wav2vec2, and then concatenate the results. Both early and late-stage fusion approaches are explored; in the former (Figure 4a), the audio and video embeddings are concatenated and fed into a single biGRU, and in the latter (Figure 4b) each input modality is encoded with its own biGRU, after which the hidden states are concatenated prior to the output layer.

Figure 4a: Early Fusion (concatenate embeddings at the input).
Figure 4b: Late Fusion (concatenate prior to prediction output).
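The difference between the two fusion strategies comes down to where the concatenation happens, which the sketch below tries to show. It treats the sequence encoder as any callable from a list of per-shot vectors to a list of per-shot hidden states; `running_mean_encoder` is a deliberately trivial, hypothetical stand-in for a biGRU so that the example stays self-contained, and the function names are invented for illustration.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[List[Vector]], List[Vector]]


def early_fusion(audio: List[Vector], video: List[Vector], encoder: Encoder) -> List[Vector]:
    """Concatenate per-shot embeddings first, then run ONE sequence encoder."""
    fused_inputs = [a + v for a, v in zip(audio, video)]
    return encoder(fused_inputs)


def late_fusion(
    audio: List[Vector],
    video: List[Vector],
    audio_encoder: Encoder,
    video_encoder: Encoder,
) -> List[Vector]:
    """Run one encoder PER modality, then concatenate their hidden states."""
    audio_states = audio_encoder(audio)
    video_states = video_encoder(video)
    return [a + v for a, v in zip(audio_states, video_states)]


def running_mean_encoder(seq: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a biGRU: state = mean of the inputs so far."""
    states: List[Vector] = []
    total: Vector = []
    for t, x in enumerate(seq, start=1):
        total = list(x) if not total else [s + xi for s, xi in zip(total, x)]
        states.append([s / t for s in total])
    return states


# Two shots with a 1-d audio and a 2-d video embedding each (toy values).
audio = [[1.0], [3.0]]
video = [[0.0, 2.0], [4.0, 6.0]]

early = early_fusion(audio, video, running_mean_encoder)
late = late_fusion(audio, video, running_mean_encoder, running_mean_encoder)
print(len(early[0]), len(late[0]))  # both feed 3-d states to the output layer
```

With this linear stand-in the two outputs coincide; with real recurrent encoders they would not, because in late fusion each modality keeps its own recurrent state and its own temporal dynamics until just before the output layer.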

We find:

  • Our results match and sometimes even outperform the state-of-the-art (benchmarked using the video modality only and on our evaluation data). We evaluate the outputs using F-1 score for the positive label, and also relax this evaluation to consider “off-by-n” F-1, i.e., if the model predicts scene changes within n shots of the ground truth. This is a more realistic measure for our use cases due to the human-in-the-loop setting that these models are deployed in.
  • As with previous work, adding audio features improves results by 10–15%. A primary driver of variation in performance is late vs. early fusion.
  • Late fusion is consistently 3–7% better than early fusion. Intuitively, this result makes sense: the temporal dependencies between shots are likely modality-specific and should be encoded separately.
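The relaxed "off-by-n" F-1 mentioned in the first finding could be computed along the following lines. The post does not spell out the matching rule, so this sketch assumes one: each predicted boundary may claim at most one ground-truth boundary within n shots, chosen greedily by nearest distance. The function name and the sample indices are illustrative.

```python
from typing import Iterable


def off_by_n_f1(predicted: Iterable[int], truth: Iterable[int], n: int = 0) -> float:
    """F-1 for scene-boundary shot indices with a tolerance of n shots.

    Assumption: greedy one-to-one matching, nearest unmatched truth first.
    With n=0 this reduces to the usual exact-match F-1 on the positive label.
    """
    preds = sorted(set(predicted))
    unmatched = sorted(set(truth))
    true_pos = 0
    for p in preds:
        candidates = [t for t in unmatched if abs(t - p) <= n]
        if candidates:
            unmatched.remove(min(candidates, key=lambda t: abs(t - p)))
            true_pos += 1
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(preds)
    recall = true_pos / (true_pos + len(unmatched))
    return 2 * precision * recall / (precision + recall)


# Shot indices at which a scene ends (made-up values).
truth = [4, 10, 17]
predicted = [5, 10, 20]
for n in (0, 1, 3):
    print(n, round(off_by_n_f1(predicted, truth, n), 3))
# 0 0.333
# 1 0.667
# 3 1.0
```

A prediction one shot away from the true boundary is still useful to a human reviewer, which is why the tolerance-based score tracks practical value more closely than exact match in a human-in-the-loop workflow.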

We have presented two complementary approaches to scene boundary detection that leverage a variety of available modalities: screenplay, audio, and video. Logically, the next steps are to a) combine these approaches and use screenplay features in a unified model and b) generalize the outputs across multiple shot-level inference tasks, e.g. shot type classification and memorable moments identification, as we hypothesize that this path could be useful for training general purpose video understanding models of longer-form content. Longer-form content also contains more complex narrative structure, and we envision this work as the first in a series of projects that aim to better integrate narrative understanding in our multimodal machine learning models.

Special thanks to Amir Ziai, Anna Pulido, and Angie Pollema.


