Entertainer.news
Data ingestion pipeline with Operation Management (Marken)

By Team Entertainer | March 7, 2023 | Updated: March 8, 2023 | 8 Mins Read

At Netflix, many Media Algorithm teams work hand in hand with content creators and editors to promote and recommend content to users in the best possible way. Several of these algorithms aim to improve different manual workflows so that we show the user a personalized promotional image, trailer, or show.

These media-focused machine learning algorithms, as well as other teams, generate a lot of data from media files. As described in our previous blog, that data is stored as annotations in Marken. We designed a unique concept called Annotation Operations, which allows teams to create data pipelines and easily write annotations without worrying about how different applications access their data.

[Figure: Annotation Operations]

Let's pick an example use case: identifying objects (like trees, cars, etc.) in a video file. As described in the picture above:

  • During the first run of the algorithm, it identified 500 objects in a particular video file. These 500 objects were stored in Marken as annotations of a specific schema type, let's say Objects.
  • The algorithm team improved their algorithm. When we re-ran it on the same video file, it created 600 annotations of schema type Objects and stored them in our service.

Notice that we cannot update the annotations from previous runs, because we don't know how many annotations a new algorithm run will produce. It is also very expensive for us to keep track of which annotations need to be updated.

The goal is that when a consumer searches for annotations of type Objects for the given video file, the following should happen:

  • Before algo run 1, a search should not find anything.
  • After the completion of algo run 1, the query should find the first set of 500 annotations.
  • While algo run 2 is creating the set of 600 annotations, client searches should still return the older 500 annotations.
  • When all 600 annotations have been successfully created, they should replace the older set of 500.
  • From then on, clients searching for Objects annotations should get 600 annotations.
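
The visibility rules in the list above can be illustrated with a toy in-memory model. This is only a sketch of the desired behavior, not Marken's implementation: the class `ToyAnnotationStore` and its method names (`begin_run`, `write`, `complete_run`, `search`) are hypothetical, and a single video file and schema type are assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One algorithm run and the annotations it has written so far."""
    annotations: list = field(default_factory=list)
    finished: bool = False
    active: bool = True


class ToyAnnotationStore:
    """In-memory stand-in used only to illustrate the visibility rules."""

    def __init__(self):
        self.runs = {}      # run_id -> Run
        self.next_id = 1

    def begin_run(self):
        run_id = self.next_id
        self.next_id += 1
        self.runs[run_id] = Run()
        return run_id

    def write(self, run_id, annotations):
        self.runs[run_id].annotations.extend(annotations)

    def complete_run(self, run_id):
        # Retire every previously finished run, then expose the new one.
        for other_id, run in self.runs.items():
            if other_id != run_id and run.finished:
                run.active = False
        self.runs[run_id].finished = True

    def search(self):
        # Only finished, still-active runs are visible to readers.
        return [a for run in self.runs.values()
                if run.finished and run.active
                for a in run.annotations]


store = ToyAnnotationStore()
visible_before = len(store.search())            # nothing written yet

run1 = store.begin_run()
store.write(run1, [f"obj-{i}" for i in range(500)])
store.complete_run(run1)
visible_after_run1 = len(store.search())

run2 = store.begin_run()
store.write(run2, [f"obj-v2-{i}" for i in range(600)])
visible_during_run2 = len(store.search())       # run 2 is not finished yet
store.complete_run(run2)
visible_after_run2 = len(store.search())

print(visible_before, visible_after_run1, visible_during_run2, visible_after_run2)
# prints: 0 500 500 600
```

Readers never see a partially written run: the switch from 500 to 600 annotations happens only when the second run completes.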

Does this remind you of something? This looks very similar (though not identical) to a distributed transaction.

Typically, an algorithm run can have 2k-5k annotations. There are many naive solutions possible for this problem, for example:

  • Write different runs to different databases. This is obviously very expensive.
  • Write algo runs into files. But we cannot search files or serve low-latency retrievals from them.
  • Etc.

Instead, our challenge was to implement this feature on top of the Cassandra and ElasticSearch databases, because that's what Marken uses. The solution we present in this blog is not limited to annotations and can be used for any other domain that uses ES and Cassandra as well.

Marken's architecture diagram is as follows. We refer the reader to our previous blog article for details. We use Cassandra as the source of truth where we store the annotations, and we index annotations in ElasticSearch to provide rich search functionality.

[Figure: Marken Architecture]

Our goal was to help teams at Netflix create data pipelines without thinking about how that data is made available to the readers or the client teams. Similarly, client teams don't have to worry about when or how the data is written. This is what we call decoupling producer flows from clients of the data.

The lifecycle of a movie goes through a lot of creative stages. We have many temporary files that are delivered before we get to the final file of the movie. Similarly, a movie has many different languages, and each of those languages can have different files delivered. Teams generally want to run algorithms and create annotations using all those media files.

Since algorithms can be run on different permutations of how the media files are created and delivered, we can simplify an algorithm run as follows:

  • Annotation Schema Type: identifies the schema for the annotation generated by the algorithm.
  • Annotation Schema Version: identifies the schema version of the annotation generated by the algorithm.
  • PivotId: a unique string identifier which identifies the file or method used to generate the annotations. This could be the SHA hash of the file or simply the movie identifier number.
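
As a small illustration of the two PivotId options mentioned above, the following sketch derives one identifier from file content and one from a movie identifier. The function names are hypothetical, and the choice of SHA-256 is an assumption, since the text only says "SHA hash".

```python
import hashlib


def pivot_id_from_bytes(content: bytes) -> str:
    """Derive a pivotId from file content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def pivot_id_from_movie(movie_id: int) -> str:
    """Use the movie identifier number itself as the pivotId."""
    return str(movie_id)


# Hypothetical inputs: placeholder bytes and a made-up movie identifier.
file_pivot = pivot_id_from_bytes(b"placeholder video bytes")
movie_pivot = pivot_id_from_movie(12345)
```

A content hash gives a new PivotId whenever the delivered file changes, while a movie identifier keeps every run for that movie under the same pivot.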

Given the above, we can describe the data model for an annotation operation as follows.

{
  "annotationOperationKeys": [
    {
      "annotationType": "string", ❶
      "annotationTypeVersion": "integer",
      "pivotId": "string",
      "operationNumber": "integer" ❷
    }
  ],
  "id": "UUID",
  "operationStatus": "STARTED", ❸
  "isActive": true ❹
}

  1. We already explained AnnotationType, AnnotationTypeVersion and PivotId above.
  2. OperationNumber is an auto-incremented number for each new operation.
  3. OperationStatus: an operation goes through three phases: Started, Finished and Canceled.
  4. IsActive: whether an operation and its associated annotations are active and searchable.
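
The data model above could be expressed in code roughly as follows. This is a sketch that mirrors the JSON field names; the Python class names, the default values, and the sample key values are assumptions for illustration.

```python
import uuid
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class OperationStatus(str, Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"


@dataclass(frozen=True)
class AnnotationOperationKey:
    annotationType: str
    annotationTypeVersion: int
    pivotId: str
    operationNumber: int


@dataclass
class AnnotationOperation:
    annotationOperationKeys: List[AnnotationOperationKey]
    # Assumption: a new operation gets a random UUID and starts out STARTED.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    operationStatus: OperationStatus = OperationStatus.STARTED
    isActive: bool = True


# Hypothetical sample values for illustration only.
operation = AnnotationOperation(
    [AnnotationOperationKey("Objects", 1, "movie-12345", 1)]
)
operation_as_dict = asdict(operation)
```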

As the data model shows, the producer of an annotation has to choose an AnnotationOperationKey, which lets them define how they want to UPSERT annotations in an AnnotationOperation. Within the AnnotationOperationKey, the important field is pivotId and how it is generated.

Our source of truth for all objects in Marken is Cassandra. To store Annotation Operations we have the following main tables:

  • AnnotationOperationById: stores the AnnotationOperations.
  • AnnotationIdByAnnotationOperationId: stores the IDs of all annotations in an operation.

Since Cassandra is NoSQL, we have additional tables that help us create reverse indices and run admin jobs, so that we can scan all annotation operations whenever there is a need.

Each annotation in Marken is also indexed in ElasticSearch to power various searches. To record the relationship between an annotation and its operation, we also index two fields:

  • annotationOperationId: the ID of the operation to which this annotation belongs.
  • isAnnotationOperationActive: whether the operation is in an ACTIVE state.

We provide three APIs to our users. In the following sections we describe the APIs and the state management done within them.

StartAnnotationOperation

When this API is called, we store the operation with its OperationKey (a tuple of annotationType, annotationTypeVersion and pivotId) in our database. The new operation is marked as being in the STARTED state. We store all OperationIDs that are in the STARTED state in a distributed cache (EVCache) for fast access during searches.
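
A minimal sketch of this step, using in-memory stand-ins: a dict plays the role of the Cassandra table and a set plays the role of the distributed cache. The function name `start_annotation_operation` and the per-key counter used for operationNumber are assumptions, not the service's actual code.

```python
import itertools
import uuid
from collections import defaultdict

operations_by_id = {}            # stand-in for the Cassandra operation table
started_operation_ids = set()    # stand-in for the distributed cache
# Assumption: operationNumber is incremented per OperationKey.
_counters = defaultdict(lambda: itertools.count(1))


def start_annotation_operation(annotation_type, version, pivot_id):
    """Record a new operation in STARTED state and cache its ID."""
    key = (annotation_type, version, pivot_id)
    operation = {
        "id": str(uuid.uuid4()),
        "key": key,
        "operationNumber": next(_counters[key]),
        "operationStatus": "STARTED",
        "isActive": True,
    }
    operations_by_id[operation["id"]] = operation
    started_operation_ids.add(operation["id"])
    return operation


first = start_annotation_operation("Objects", 1, "movie-12345")
second = start_annotation_operation("Objects", 1, "movie-12345")
```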

[Figure: StartAnnotationOperation]

UpsertAnnotationsInOperation

Users call this API to upsert the annotations in an Operation. They pass annotations along with the OperationID. We store the annotations and also record the relationship between the annotation IDs and the Operation ID in Cassandra. During this phase, operations are in the isAnnotationOperationActive = ACTIVE and operationStatus = STARTED state.

Note that a single operation run can typically create 2k to 5k annotations. Clients can call this API from many different machines or threads for fast upserts.
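
The following sketch shows how a client might fan out upserts across threads. It is not Marken's client: the in-memory dicts stand in for the Cassandra tables, and the function name, batch size, and worker count are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

annotations_by_id = {}            # stand-in for the annotation table
annotation_ids_by_operation = {}  # stand-in for AnnotationIdByAnnotationOperationId
_lock = threading.Lock()


def upsert_annotations_in_operation(operation_id, annotations):
    """Store a batch and record which operation each annotation belongs to."""
    with _lock:
        ids = annotation_ids_by_operation.setdefault(operation_id, set())
        for annotation in annotations:
            annotations_by_id[annotation["id"]] = annotation  # insert or overwrite
            ids.add(annotation["id"])


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


all_annotations = [{"id": f"a-{i}", "label": "tree"} for i in range(2000)]
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(upsert_annotations_in_operation, "op-1", batch)
               for batch in chunks(all_annotations, 250)]
    for future in futures:
        future.result()   # re-raise any exception from a worker
```

Because each annotation is keyed by its ID, repeating a batch overwrites the same entries instead of creating duplicates.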

[Figure: UpsertAnnotationsInOperation]

FinishAnnotationOperation

Once the annotations have been created in an operation, clients call FinishAnnotationOperation, which makes the following changes:

  • Marks the current operation (let's say with ID2) as operationStatus = FINISHED and isAnnotationOperationActive=ACTIVE.
  • Removes ID2 from EVCache, since it is no longer in the STARTED state.
  • Marks any previous operation (let's say with ID1) which was ACTIVE as isAnnotationOperationActive=FALSE in Cassandra.
  • Finally, calls the updateByQuery API in ElasticSearch. This API finds all Elasticsearch documents with ID1 and marks them isAnnotationOperationActive=FALSE.
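
The state changes listed above can be sketched against in-memory stand-ins. The request body follows the general shape of an Elasticsearch update-by-query call (a `query` plus a Painless `script`); the function names, the dict layout, and the rule that "previous operation" means a finished, active operation with the same key are assumptions.

```python
def build_deactivate_request(previous_operation_id):
    """Update-by-query body that marks one operation's documents inactive."""
    return {
        "query": {"term": {"annotationOperationId": previous_operation_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.isAnnotationOperationActive = false",
        },
    }


def finish_annotation_operation(operations, started_cache, current_id):
    """Apply the finish steps; return the bodies to send to Elasticsearch."""
    current = operations[current_id]
    current["status"] = "FINISHED"
    current["isActive"] = True
    started_cache.discard(current_id)

    requests = []
    for op_id, op in operations.items():
        is_previous = (op_id != current_id and op["key"] == current["key"]
                       and op["status"] == "FINISHED" and op["isActive"])
        if is_previous:
            op["isActive"] = False
            requests.append(build_deactivate_request(op_id))
    return requests


key = ("Objects", 1, "movie-12345")   # hypothetical operation key
operations = {
    "ID1": {"key": key, "status": "FINISHED", "isActive": True},
    "ID2": {"key": key, "status": "STARTED", "isActive": True},
}
started_cache = {"ID2"}
es_requests = finish_annotation_operation(operations, started_cache, "ID2")
```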

[Figure: FinishAnnotationOperation]

Search API

This is the key part for our readers. When a client calls our search API, we must exclude:

  • any annotations that belong to operations with isAnnotationOperationActive=FALSE, and
  • any annotations whose operations are currently in the STARTED state.

We apply these exclusions to all queries in our system. To achieve this:

  1. We add a filter to our ES query that excludes documents where isAnnotationOperationActive is FALSE.
  2. We query EVCache to find all operations that are in the STARTED state, then exclude all annotations whose annotationOperationId is found in the cache. Using the cache keeps our search latencies low (most of our queries take less than 100 ms).
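
One way to express the two exclusions is an Elasticsearch bool query with `must_not` clauses. The sketch below builds such a query body and, for illustration, applies the same rule to in-memory documents. The function names and sample documents are hypothetical; the field names come from the text.

```python
def build_search_query(user_query, started_operation_ids):
    """Wrap a client query so inactive and in-progress operations are hidden."""
    must_not = [{"term": {"isAnnotationOperationActive": False}}]
    if started_operation_ids:
        must_not.append(
            {"terms": {"annotationOperationId": sorted(started_operation_ids)}}
        )
    return {"query": {"bool": {"must": [user_query], "must_not": must_not}}}


def is_visible(doc, started_operation_ids):
    """Same rule as the query above, applied to a single document."""
    return (doc["isAnnotationOperationActive"]
            and doc["annotationOperationId"] not in started_operation_ids)


started = {"ID3"}   # IDs that would be read from the cache
query = build_search_query({"term": {"annotationType": "Objects"}}, started)

docs = [
    {"id": "a", "annotationOperationId": "ID1", "isAnnotationOperationActive": False},
    {"id": "b", "annotationOperationId": "ID2", "isAnnotationOperationActive": True},
    {"id": "c", "annotationOperationId": "ID3", "isAnnotationOperationActive": True},
]
visible_ids = [d["id"] for d in docs if is_visible(d, started)]
```

Only the document from the finished, active operation (ID2) remains visible; the retired run (ID1) and the in-progress run (ID3) are filtered out.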

Cassandra is our source of truth, so if an error happens there we fail the client call. However, once we commit to Cassandra we must handle Elasticsearch errors ourselves. In our experience, all such errors have happened when the Elasticsearch database was having some issue. For this case, we created retry logic for updateByQuery calls to ElasticSearch. If the call fails, we push a message to SQS so we can retry automatically after some interval.
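
The retry-then-queue idea can be sketched as follows. This is an illustration under assumptions: a `deque` stands in for the SQS queue, `SearchIndexError` and `make_flaky` are hypothetical helpers that simulate a failing index call, and the attempt count and exponential backoff are not taken from the text.

```python
import time
from collections import deque

retry_queue = deque()   # stand-in for the SQS queue


class SearchIndexError(Exception):
    """Hypothetical error raised when the index update fails."""


def update_with_retry(update_call, request, attempts=3, base_delay=0.01,
                      sleep=time.sleep):
    """Try the update a few times; on repeated failure, queue it for later."""
    for attempt in range(attempts):
        try:
            return update_call(request)
        except SearchIndexError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)   # assumed backoff policy
    retry_queue.append(request)
    return None


def make_flaky(failures):
    """Build a fake update call that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def call(request):
        if state["left"] > 0:
            state["left"] -= 1
            raise SearchIndexError("index unavailable")
        return {"updated": 1}

    return call


no_sleep = lambda seconds: None   # skip real waiting in this demo
recovered = update_with_retry(make_flaky(2), {"op": "ID1"}, sleep=no_sleep)
deferred = update_with_retry(make_flaky(5), {"op": "ID0"}, sleep=no_sleep)
```

The first call succeeds on its third attempt, while the second exhausts its attempts and leaves its request in the queue for a later retry.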

In the near term, we want to write a high-level abstraction: a single API that our clients can call instead of calling three APIs. For example, they could store the annotations in blob storage like S3 and give us a link to the file as part of that single API call.


