Netflix's tech stack is a complex and robust infrastructure that ensures a seamless and high-quality viewing experience, featuring a blend of tried-and-true technologies and cutting-edge innovations.
Abstract
Netflix's tech stack is a marvel of technological innovation, consisting of a blend of proven technologies and cutting-edge innovations. The streaming giant uses Kotlin and Swift for mobile development, React, JavaScript, and HTML5 for the frontend, and employs various technologies such as Open Connect, AWS CloudFront, AWS S3, and AWS Elastic Transcoder for streaming and transcoding. GraphQL is used for data fetching, while Spring Boot, Zuul, and Netflix Eureka power the backend. Databases used include MySQL, Cassandra, CockroachDB, and EVCache. Messaging and streaming are handled by Kafka and Flink, while big data is managed using AWS S3, Redshift, Apache Iceberg, and Druid. Netflix's DevOps practices are facilitated by tools such as JIRA, Confluence, Jenkins, Spinnaker, Netflix Atlas, Netflix Chaos Monkey, Gradle, and Nebula.
Decoding Netflix: An In-Depth Look at the Tech Stack Powering the Streaming Giant
In the world of streaming media, Netflix stands as a titan, boasting over 232.5 million subscribers worldwide. Behind the scenes of your favorite movies and TV shows lies a complex and robust technology infrastructure that ensures a seamless and high-quality viewing experience. This infrastructure results from years of technological innovation and evolution, making Netflix a leader in entertainment and technology.
Netflix’s tech stack is a testament to its commitment to delivering the best possible service to its users. It is a fascinating blend of tried-and-true technologies and cutting-edge innovations, all working together to support the massive scale of its operations. From handling billions of requests per day and personalizing content for millions of unique users to ensuring the highest quality streaming experience, Netflix’s tech stack is designed to handle it all.
In this blog post, we will delve into the technological marvel of Netflix. We’ll explore the critical components of its tech stack, understand why certain technologies were chosen, and see how they all come together to create the Netflix we know and love. Whether you’re a tech enthusiast curious about the inner workings of Netflix, a software engineer looking for architectural inspiration, or simply a Netflix user interested in the technology behind your daily streaming, this post is for you.
Regarding mobile development, Netflix has made significant strides in adopting modern, robust, and developer-friendly languages: Kotlin for Android and Swift for iOS. These languages enhance developer productivity and play a crucial role in delivering a smooth and efficient user experience.
Kotlin: The Powerhouse of Android Development
Netflix’s Android app is primarily built with Kotlin, a statically typed programming language developed by JetBrains. Kotlin is fully interoperable with Java, which means developers can use all existing Android libraries and frameworks in a Kotlin application and migrate Java code to Kotlin incrementally.
Kotlin’s support for coroutines simplifies asynchronous programming. This is particularly beneficial for a data-intensive application like Netflix, where numerous operations, such as network requests or database transactions, must be performed in the background.
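To make the idea of non-blocking background work concrete, here is a minimal sketch. It uses Python’s asyncio rather than Kotlin coroutines, so it is an analogy for the concept, not Netflix’s code. The functions `fetch_profile`, `fetch_recommendations`, and `load_home_screen`, along with the simulated delays, are hypothetical.

```python
import asyncio

async def fetch_profile(user_id: str) -> dict:
    # Simulated network request (hypothetical); sleep stands in for I/O wait.
    await asyncio.sleep(0.01)
    return {"user_id": user_id, "name": "demo"}

async def fetch_recommendations(user_id: str) -> list:
    # A second simulated request that can run while the first one waits.
    await asyncio.sleep(0.01)
    return ["title-a", "title-b"]

async def load_home_screen(user_id: str) -> dict:
    # Both requests are in flight at the same time; neither blocks the other.
    profile, recs = await asyncio.gather(
        fetch_profile(user_id), fetch_recommendations(user_id)
    )
    return {"profile": profile, "recommendations": recs}
```

Calling `asyncio.run(load_home_screen("u1"))` runs both simulated requests concurrently and returns the combined result once both have finished.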
In an innovative move, Netflix has also embraced Kotlin Multiplatform (KMP) for sharing code across Android and iOS. This approach allows developers to write the business logic once in Kotlin and share it across platforms, leading to increased code consistency and reduced time-to-market.
Swift: Revolutionizing iOS Development
On the iOS front, Netflix uses Swift, a powerful and intuitive language developed by Apple. Swift offers several advantages over its predecessor, Objective-C, including a more modern and safer syntax, better performance, and improved error handling.
Netflix’s transition from UIWebView to UIKit was a significant milestone in its iOS journey. The need for better performance and user experience drove this move. In an interview, Jordanna Kwok from Netflix’s Playback UI team shared that switching to UIKit resulted in smoother animations, faster load times, and a more native feel to the app.
Moreover, Netflix has been an early adopter of SwiftUI, Apple’s declarative framework for building user interfaces. SwiftUI allows developers to build UIs with less code and provides native support for new iOS features, making it a good choice for future-proofing the app.
Frontend Tech Stack: React, JavaScript, and HTML5
Netflix’s front end is where the magic happens for the end user. It’s where users browse through thousands of movies and TV shows, get personalized recommendations, and stream their favorite content. To deliver a seamless and interactive user experience, Netflix uses a tech stack that includes React, JavaScript, and HTML5.
React: Building Interactive UIs
React, a popular JavaScript library developed by Facebook, is vital to Netflix’s front end. React allows developers to build fast, responsive, and reusable UI components, making it an excellent choice for a dynamic and interactive application like Netflix.
One of the main reasons Netflix chose React is its virtual DOM (Document Object Model) feature. The virtual DOM allows React to make minimal updates to the actual DOM, resulting in a smoother and faster user experience. This is particularly beneficial for Netflix, where the UI must update frequently based on user interactions.
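The virtual DOM idea can be sketched in a few lines. The example below is written in Python purely for illustration and is not React’s actual reconciliation algorithm, which also uses keys and other heuristics. The `VNode` class and the patch tuple format are assumptions made for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class VNode:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

def diff(old, new, path="root"):
    """Return a list of patch operations that turn `old` into `new`."""
    if old is None:
        return [("create", path, new.tag)]
    if new is None:
        return [("remove", path)]
    if old.tag != new.tag:
        # Different element type: replace the whole subtree.
        return [("replace", path, new.tag)]
    patches = []
    if old.text != new.text:
        patches.append(("set_text", path, new.text))
    # Compare children position by position.
    for i in range(max(len(old.children), len(new.children))):
        o = old.children[i] if i < len(old.children) else None
        n = new.children[i] if i < len(new.children) else None
        patches.extend(diff(o, n, f"{path}/{i}"))
    return patches
```

If a list of two titles changes so that the second title is renamed and a third is added, `diff` yields only two patches; the unchanged first item never touches the real DOM.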
Streaming and Transcoding: Powering the Netflix Experience
At the heart of Netflix’s service is its ability to simultaneously deliver high-quality streaming content to millions of users. This is no small feat — it requires a robust and scalable infrastructure capable of handling massive amounts of data. To achieve this, Netflix leverages various technologies, including its content delivery network (CDN) called Open Connect, AWS CloudFront, AWS S3, and AWS Elastic Transcoder.
Open Connect: Netflix’s CDN
Open Connect is Netflix’s in-house CDN, designed to handle the delivery of its streaming content worldwide. It consists of Open Connect Appliances (OCAs) — servers that store and deliver Netflix’s content — located in data centers worldwide.
These OCAs are deployed inside or close to the ISP networks that Netflix’s customers use, so content can be delivered quickly and efficiently.
How Netflix Works With ISPs Around the Globe to Deliver a Great Viewing Experience
One of the critical innovations of Open Connect is its ability to pre-position content based on its popularity in a given region. This means that popular movies and TV shows are stored closer to the users who watch them, resulting in faster load times and less buffering.
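As a rough illustration of popularity-based pre-positioning, the sketch below greedily fills one appliance with the most-watched titles that still fit. This is a simplified assumption for the sake of the example; Netflix’s real placement logic is not described in this post. The function name, the view counts, and the sizes are all hypothetical.

```python
def plan_prepositioning(view_counts, title_sizes_gb, capacity_gb):
    """Greedily pick the most-watched titles that fit on one appliance."""
    chosen, used = [], 0.0
    # Most popular first; ties are broken by title name for determinism.
    for title in sorted(view_counts, key=lambda t: (-view_counts[t], t)):
        size = title_sizes_gb[title]
        if used + size <= capacity_gb:
            chosen.append(title)
            used += size
    return chosen
```

With regional view counts of A=900, B=500, C=450, D=10 and sizes of 40, 70, 30, and 20 GB, a 100 GB appliance would hold A, C, and D: B is popular but does not fit once A is stored.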
AWS CloudFront and S3: Scalable Content Delivery
While Open Connect handles the bulk of Netflix’s content delivery, Netflix also uses AWS CloudFront, a global CDN service offered by Amazon Web Services. CloudFront delivers static and dynamic web content, typically with AWS S3, a scalable storage service, as its origin.
Netflix uses S3 to store and retrieve data from anywhere on the web. It’s used for backup and recovery, a tiered archive, user-generated content, and cloud-native application data. CloudFront, on the other hand, complements Open Connect rather than replacing it: the library of movies and TV shows itself is streamed to customers from Open Connect Appliances.
AWS Elastic Transcoder: High-Quality Video Transcoding
To ensure that its content can be streamed on various devices, Netflix uses AWS Elastic Transcoder. This cloud-based service converts (or “transcodes”) media files from their source format into versions that will play back on devices like smartphones, tablets, and smart TVs.
Elastic Transcoder is particularly useful because it provides a cost-effective means to simultaneously transcode media files into multiple output formats. This means that whether a user watches on an iPhone or a large 4K TV, they’ll get a version of the video optimized for their device.
GraphQL: Streamlining Data Fetching
For data fetching, Netflix has adopted GraphQL, a powerful query language for APIs, to streamline getting data from the server to the client. GraphQL provides a more efficient data fetching model than traditional REST APIs, making it an excellent choice for a data-intensive application like Netflix.
Why GraphQL?
Netflix chose GraphQL for several reasons. Firstly, GraphQL allows clients to specify exactly what data they need, which can significantly reduce the amount of data that needs to be transferred over the network. This is particularly beneficial for mobile clients, where network conditions can be unpredictable.
Secondly, GraphQL supports “introspection,” allowing clients to discover the available data types and operations. This can make it easier for developers to understand what data they can request and how to request it.
Finally, GraphQL’s support for real-time data updates via subscriptions can be helpful for features requiring real-time updates, such as a “watching now” feature that shows what other users are watching.
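The first point, asking only for the fields you need, can be sketched without any GraphQL library. The snippet below is a plain-Python stand-in for the idea, not a GraphQL implementation: the “query” is a nested dictionary, and the function name `select_fields` and the sample record are hypothetical.

```python
def select_fields(record, selection):
    """Return only the fields named in `selection` (nested dicts select sub-fields)."""
    result = {}
    for name, sub in selection.items():
        value = record[name]
        if isinstance(sub, dict):
            # Nested selection: apply recursively to an object or a list of objects.
            if isinstance(value, list):
                result[name] = [select_fields(v, sub) for v in value]
            else:
                result[name] = select_fields(value, sub)
        else:
            result[name] = value
    return result
```

A client that only needs a title’s name and the names of its cast would pass `{"name": True, "cast": {"name": True}}` and receive just those fields, leaving long synopses and biographies on the server.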
How Netflix Uses GraphQL
Netflix uses GraphQL in several parts of its application. For example, it’s used in the Netflix Studio, a suite of applications that help Netflix manage the production of its original content. The Studio’s applications must fetch and update a wide variety of data, from scripts and budgets to shooting schedules and visual effects. Using GraphQL, these applications can fetch and update the data they need in a single request.
Evolution of an API Architecture
Netflix has also developed a GraphQL Federation Gateway, which allows it to combine multiple GraphQL services into a single API. This makes it easier for clients to fetch data from various services without making numerous requests.
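Conceptually, such a gateway keeps track of which service owns which field and merges the answers into one response. The following is a deliberately tiny sketch of that idea in plain Python; it is not the federation protocol itself, and the `Gateway` class, field names, and resolver functions are hypothetical.

```python
class Gateway:
    def __init__(self):
        # field name -> resolver function provided by the owning service
        self._owners = {}

    def register(self, field, resolver):
        self._owners[field] = resolver

    def query(self, title_id, fields):
        # One client request; the gateway calls each owning service and merges results.
        return {f: self._owners[f](title_id) for f in fields}
```

A client asking for a production’s budget and schedule makes a single call to the gateway, even though two different services answer behind it.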
Backend Tech Services: Spring Boot, Zuul, and Netflix Eureka
The backend is the powerhouse of any application, and Netflix is no exception. It’s responsible for everything from managing user data and processing requests to handling business logic and interacting with databases. To build a robust and scalable backend, Netflix uses various technologies, including Spring Boot, Zuul, and Netflix Eureka.
Spring Boot: Simplifying Spring Applications
At the heart of Netflix’s backend is Spring Boot, a framework that simplifies the setup and development of Spring applications. Spring Boot provides a range of features that make it easier to create stand-alone, production-grade Spring-based applications, including auto-configuration, an embedded server, and dependency management.
Netflix uses Spring Boot to create microservices: small, independent services that work together to form a complete application. This microservices architecture allows Netflix to scale its application quickly, as each service can be scaled independently based on its needs.
Zuul: Handling Dynamic Routing
To manage the routing of requests between its many microservices, Netflix uses Zuul, a dynamic routing library. Zuul is the front door for all requests coming into Netflix’s backend. It’s responsible for routing each request to the appropriate microservice, handling retries in case of failures, and providing security measures such as authentication and rate limiting.
Zuul’s dynamic routing capabilities benefit Netflix, as they allow it to route requests based on various factors, such as the user’s location, the type of device they’re using, and the current load on its servers.
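The combination of filtering and routing can be sketched as follows. This is written in Python for illustration and does not use Zuul’s real API; the route table, the service names, the `-tv` variant, and the simple token check are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    path: str
    device: str
    token: str = ""

# Hypothetical route table: path prefix -> backend service name.
ROUTES = {"/api/playback": "playback-service", "/api/profile": "profile-service"}

def auth_filter(req):
    # "Pre" filter: reject requests without a token before any routing happens.
    return 401 if not req.token else None

def route(req):
    status = auth_filter(req)
    if status is not None:
        return status, None
    for prefix, service in ROUTES.items():
        if req.path.startswith(prefix):
            # Device-aware routing: TVs go to a dedicated variant (an assumption).
            target = f"{service}-tv" if req.device == "tv" else service
            return 200, target
    return 404, None
```

Here `route(Request("/api/playback/start", "tv", "tok"))` resolves to the TV variant of the playback service, while a request without a token is turned away with a 401 before it reaches any backend.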
Netflix Eureka: Service Discovery Made Easy
In a microservices architecture, services need to be able to discover and communicate with each other. To facilitate this, Netflix uses Eureka, a service discovery tool. Eureka allows services to register themselves and discover other services, making it easier for them to communicate with each other.
Eureka’s service discovery capabilities are crucial for ensuring the smooth operation of Netflix’s microservices architecture. By allowing services to discover each other, Eureka helps to ensure that requests can be efficiently routed to the appropriate service, even as services are added, removed, or moved.
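A service registry of this kind boils down to three operations: register, renew, and look up. The sketch below shows them in plain Python; it is not Eureka’s client API, and the `Registry` class, the addresses, and the lease length are hypothetical values chosen for the example.

```python
class Registry:
    def __init__(self, lease_seconds=90.0):
        self.lease_seconds = lease_seconds
        # (service, address) -> time of the last registration or heartbeat
        self._instances = {}

    def register(self, service, address, now):
        self._instances[(service, address)] = now

    # A heartbeat simply renews the lease.
    heartbeat = register

    def lookup(self, service, now):
        # Only return instances whose lease has not expired.
        return sorted(
            addr for (svc, addr), seen in self._instances.items()
            if svc == service and now - seen <= self.lease_seconds
        )
```

An instance that stops sending heartbeats silently drops out of `lookup` once its lease runs out, which is how callers avoid routing to services that have been removed or moved.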
Databases: MySQL, Cassandra, CockroachDB, and EVCache
Data is at the heart of Netflix’s operations. From storing user profiles and viewing history to cataloging its vast content library, Netflix relies on various databases to store and manage its data. These include MySQL, Cassandra, CockroachDB, and EVCache. Each database serves a specific purpose and is crucial in Netflix’s backend.
MySQL: Reliable Relational Database
MySQL is a popular open-source relational database that Netflix uses for various purposes. One of its key uses is in Netflix’s billing system, where it’s used to store transactional data. Netflix chose MySQL for its robustness, performance, and the strong community support it enjoys.
Netflix has also built a database migration system to move data from Oracle to MySQL with minimal downtime. This system, which uses GoldenGate for data replication, has enabled Netflix to migrate its critical billing data to the cloud.
Cassandra: Scalable NoSQL Database
Cassandra is a highly scalable and distributed NoSQL database that Netflix uses to store much of its customer data. Cassandra’s ability to scale horizontally and its support for flexible schemas make it an excellent choice for a service like Netflix, which has a large amount of customer data that needs to be available 24/7.
Netflix uses Cassandra for many purposes, including storing customer viewing history, bookmarks, and recommendations. It also uses Cassandra for its “fallback” system, which ensures that Netflix remains available even if other systems fail.
CockroachDB: Distributed SQL Database
Netflix uses CockroachDB, a distributed SQL database, in its Device Management Platform. This platform manages the devices that Netflix’s customers use to stream content. CockroachDB’s horizontal scalability and strong consistency make it an ideal choice for this use case.
CockroachDB allows Netflix to ensure that its device management system remains reliable and responsive, even as the number of devices it needs to manage grows. It also provides a path for future growth, as it can scale to handle even larger amounts of data.
EVCache: Distributed In-Memory Datastore
EVCache is a distributed in-memory data store that Netflix uses for caching data. It’s based on Memcached and is designed to handle high volumes of read and write traffic with low latency.
Netflix uses EVCache for various purposes, including caching customer data, session data, and API responses. By caching this data, Netflix can reduce the load on its databases and provide faster customer response times.
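The caching pattern described here is often called cache-aside: check the cache first and fall back to the database only on a miss. Below is a minimal in-process sketch in Python. It is not EVCache’s client API; the `TTLCache` class and its `get_or_load` method are hypothetical, and a real deployment would use a distributed cache rather than a local dictionary.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl, self.clock = ttl_seconds, clock
        self._data = {}  # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        entry = self._data.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]            # cache hit: the database is not touched
        value = loader(key)            # cache miss or expired: go to the source of truth
        self._data[key] = (value, self.clock() + self.ttl)
        return value
```

Repeated reads of the same key within the time-to-live call `loader` only once, which is where the reduced database load comes from.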
Messaging and Streaming: Kafka and Flink
In a distributed system like Netflix, components need to communicate with each other efficiently and reliably. This is where messaging and streaming technologies come into play. Netflix uses Apache Kafka for messaging and Apache Flink for stream processing, which are crucial in its backend infrastructure.
Apache Kafka is a distributed streaming platform that Netflix uses for messaging and real-time data processing. Kafka provides a robust and durable messaging system that allows Netflix’s microservices to communicate with each other efficiently.
Kafka’s publish-subscribe model enables services to publish messages to topics that other services can then subscribe to. This decouples the services from each other, allowing them to operate independently while still being able to communicate effectively.
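To show how this decoupling works, here is a small in-memory sketch of a topic log with independent consumer positions. It is written in Python for illustration and does not use the Kafka client API; the `Broker` class, topic name, and group names are hypothetical.

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self._logs = defaultdict(list)     # topic -> append-only list of messages
        self._offsets = defaultdict(int)   # (topic, consumer group) -> next offset

    def publish(self, topic, message):
        self._logs[topic].append(message)

    def poll(self, topic, group):
        # Each consumer group tracks its own position in the log.
        start = self._offsets[(topic, group)]
        messages = self._logs[topic][start:]
        self._offsets[(topic, group)] = start + len(messages)
        return messages
```

The publisher never needs to know who is listening: a metrics consumer and a logging consumer can each read the same topic at their own pace.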
Netflix uses Kafka for various purposes, including event tracking, metric collection, and log aggregation. Kafka’s ability to handle high volumes of real-time data makes it an excellent choice for these use cases.
Apache Flink: Powerful Stream Processing
Apache Flink is a framework for stateful computations over unbounded and bounded data streams. Netflix uses Flink for real-time stream processing, which involves analyzing and processing data as soon as it arrives.
Flink’s ability to handle large volumes of data in real time makes it an excellent choice for Netflix, which needs to process vast amounts of data from its users and services as it arrives. For example, Netflix uses Flink to process user events such as play, pause, and stop in real time. This allows Netflix to provide real-time analytics and insights, which can be used to improve its service and provide better recommendations to its users.
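A common building block of this kind of stream processing is the tumbling window: events are grouped into fixed, non-overlapping time slices, and a running count is kept per slice. The sketch below shows the idea in plain Python; it does not use Flink’s API, and the function name, event tuples, and window size are hypothetical.

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per type in fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)  # window start -> running counts (the "state")
    for timestamp, event_type in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][event_type] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}
```

Feeding in play, pause, and stop events with their timestamps gives a per-window breakdown that a dashboard or recommendation job could consume.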
Big Data at Netflix
In the world of streaming media, data is king. With its vast user base and diverse content library, Netflix generates and processes enormous amounts of data daily. This data is used for everything from personalizing content recommendations to optimizing streaming quality. To handle this “Big Data,” Netflix uses various technologies, which we’ll explore in this section.
Data Storage: AWS S3, Redshift, Apache Iceberg, and Druid
Storing and managing big data is a significant challenge, and Netflix uses several technologies to do this effectively. These include AWS S3 for object storage, Redshift for data warehousing, Apache Iceberg for large-scale data storage, and Druid for real-time analytics.
AWS S3: Scalable Object Storage
Amazon S3 (Simple Storage Service) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Netflix uses S3 as a data lake, storing raw data that can be processed and analyzed. This includes everything from user activity data to movie metadata.
S3’s scalability and durability make it an excellent choice for storing large volumes of data. Its integration with other AWS services makes it a central part of Netflix’s data infrastructure.
Redshift: Powerful Data Warehousing
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service. Netflix uses Redshift for storing and analyzing large datasets. It’s particularly useful for running complex analytical queries across large datasets, which is crucial for Netflix’s data-driven decision-making process.
Netflix has built a data pipeline around Redshift, using it to ingest, store, and analyze data. This pipeline allows Netflix to gain insights from its data quickly and efficiently.
Apache Iceberg: Large-Scale Data Storage
Apache Iceberg is an open table format for very large, slow-moving tabular data. It’s designed to address the limitations of older table layouts such as Hive tables, while the underlying data files are still stored in formats like Parquet, Avro, or ORC. Netflix, where Iceberg originated, uses it to manage its large-scale data storage more efficiently.
One of the critical benefits of Iceberg is its support for fine-grained partitioning, which can significantly improve data access efficiency. It also provides atomic commits, which ensure data integrity even in the event of failures.
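The atomic-commit idea can be illustrated with optimistic concurrency: a writer’s commit succeeds only if the table has not changed since the writer read it. The sketch below is an in-memory Python stand-in for that idea, not Iceberg’s implementation or API; the `Table` class and file names are hypothetical.

```python
import threading

class Table:
    def __init__(self):
        self._lock = threading.Lock()
        self.snapshot_id = 0
        self.files = ()  # immutable tuple of data files in the current snapshot

    def commit(self, base_snapshot_id, new_files):
        # Optimistic concurrency: succeed only if nobody committed since we read.
        with self._lock:
            if base_snapshot_id != self.snapshot_id:
                return False  # the caller must re-read the table and retry
            self.files = self.files + tuple(new_files)
            self.snapshot_id += 1
            return True
```

Readers always see either the old snapshot or the new one, never a half-written mix, and a writer that loses the race gets a clear signal to retry.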
Apache Druid: Real-Time Analytics
Apache Druid is a high-performance, column-oriented, distributed data store that Netflix uses for real-time analytics. Druid’s ability to ingest, store, query, and analyze large amounts of event data in real time makes it an excellent choice for Netflix’s analytical needs.
Netflix uses Druid to power its real-time monitoring and alerting system, which helps it to maintain a high-quality streaming experience for its users. It also uses Druid to provide real-time insights into its business metrics, allowing it to make data-driven decisions quickly.
“Apache Druid is a high-performance real-time analytics database. It’s designed for workflows where fast queries and ingest matter. Druid excels at instant data visibility, ad-hoc queries, operational analytics, and handling high concurrency.” — druid.io
Data Processing
Processing big data is a complex task that involves cleaning, transforming, and analyzing data to extract valuable insights. Netflix uses several technologies for data processing, including Tableau for data visualization and Apache Flink and Apache Spark for data processing.
Tableau: Interactive Data Visualization
Tableau is a powerful data visualization tool that allows users to interact with their data visually and intuitively. Netflix uses Tableau to create dashboards and reports that provide insights into operations and user behavior.
Netflix built its analytics in the cloud with Tableau and AWS.
Tableau’s ability to connect to various data sources, powerful data visualization capabilities, and user-friendly interface make it an excellent tool for data analysis. Netflix’s data scientists and business analysts use Tableau to explore data, answer questions, and share insights with the rest of the company.
Apache Flink: Stream Processing
As mentioned earlier, Apache Flink is a framework for stateful computations over unbounded and bounded data streams. In the context of data processing, Netflix uses Flink to analyze and process data as soon as it arrives.
This lets Netflix work through the vast stream of events from its users and services in real time and turn it into analytics and insights that improve the service and its recommendations.
Apache Spark: Large-Scale Data Processing
Apache Spark is a unified analytics engine for large-scale data processing. Netflix uses Spark for various data processing tasks, including ETL (Extract, Transform, Load) operations, data analysis, and machine learning.
Spark’s ability to process large datasets in parallel, support for a wide range of data sources, and powerful analytics capabilities make it a crucial part of Netflix’s data infrastructure. Netflix has also contributed to the Spark community by developing and open-sourcing Polynote. This notebook environment supports mixed-language programming in Scala and Python, commonly used for data processing and analysis in Spark.
DevOps at Netflix: Streamlining Development and Operations
DevOps, a combination of “development” and “operations,” is a set of practices that shorten the system development life cycle and provide continuous delivery with high software quality. At Netflix, DevOps is integral to the company’s culture and operations. The company uses various tools to facilitate DevOps practices, including JIRA, Confluence, Jenkins, Spinnaker, Netflix Atlas, Netflix Chaos Monkey, Gradle, and Nebula.
JIRA and Confluence: Project Management and Collaboration
JIRA is a project management tool that Netflix’s development teams use to plan, track, and manage their work. It allows teams to create user stories, plan sprints, and distribute tasks across their team.
Confluence, in turn, is a collaboration tool where teams can create, share, and collaborate on documents. Netflix uses Confluence to document everything from meeting notes and project plans to product requirements and technical documentation.
Jenkins, Spinnaker: Continuous Integration and Delivery
Jenkins is an open-source automation server that enables developers to build, test, and deploy their software. Netflix uses Jenkins for continuous integration, a DevOps practice where developers regularly merge their code changes into a central repository, after which automated builds and tests are run.
Spinnaker, a continuous delivery platform developed by Netflix, manages deployments. With Spinnaker, Netflix can automate releasing software changes to production, ensuring that new features and fixes reach users as quickly as possible.
Netflix Atlas, Chaos Monkey: Monitoring and Resilience
Netflix Atlas is a scalable and robust monitoring platform that Netflix uses to keep an eye on its operations. Atlas collects metrics from all areas of Netflix’s infrastructure and provides real-time operational insight into its systems’ performance.
Chaos Monkey is a resilience tool developed by Netflix. It randomly terminates instances in production to ensure that engineers implement their services to be resilient to instance failures.
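The principle is simple enough to sketch: pick a running instance at random, terminate it, and check that the service still answers. The snippet below is a toy Python model of that loop, not the actual Chaos Monkey tool; the function names and instance IDs are hypothetical.

```python
import random

def terminate_random_instance(instances, rng):
    """Remove one randomly chosen instance from the set and return its name."""
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim

def handle_request(instances):
    # The service stays available as long as at least one instance is left.
    return "ok" if instances else "unavailable"
```

A fleet of three instances keeps serving after one is terminated; a service that ran on a single instance would not, which is exactly the weakness this kind of testing is meant to expose.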
Gradle and Nebula: Build Automation
Gradle is a powerful build automation tool that Netflix uses to automate the process of building, testing, and deploying software. Build logic is scripted in a Groovy or Kotlin DSL, and custom tasks and plugins can be written in JVM languages such as Java, making it a flexible and extensible tool for build automation.
Nebula is a collection of Gradle plugins built by Netflix to support its software delivery process. These plugins provide solutions for dependency management, release management, and other aspects of build automation.
Acknowledgments
Before we conclude, I would like to thank those who made this exploration of Netflix’s tech stack possible.
Firstly, a big thank you to Alex Xu, the founder of ByteByteGo. Alex first consolidated information about different tech stacks into one place, providing a valuable resource for anyone interested in the technologies powering today’s leading companies. His research was instrumental in creating this blog post, and I am grateful for his willingness to share his findings with me.
Secondly, I would like to express my appreciation to Netflix. The company’s commitment to sharing its tech stack across various platforms has provided invaluable insights into the complexity behind the “tudum” sound we all know and love. By openly discussing the technologies they use and the reasons behind their choices, Netflix allows us to learn and contributes to the broader tech community.
In conclusion, understanding the tech stack of Netflix is a journey through a landscape of cutting-edge technologies and innovative solutions. It’s a testament to the company’s commitment to delivering a seamless and enjoyable streaming experience to millions of users worldwide. As Netflix continues to grow and innovate, I look forward to seeing how its tech stack will evolve.