avatarNetflix Technology Blog

Summary

Netflix has introduced Ribbon, an open-source client-side load balancer, to enhance the resilience and performance of its mid-tier services within a multi-region, multi-zone cloud deployment architecture.

Abstract

Netflix has announced the release of Ribbon, a new tool designed to improve the reliability and efficiency of inter-service communication within their cloud infrastructure. Ribbon provides client-side software load balancing algorithms that have been proven in Netflix's operational environment. It is part of the Netflix Internal Web Service Framework (NIWS) and works in conjunction with Eureka for service discovery. Ribbon supports various data serialization formats and offers real-time load balancing decisions, including a Zone Aware Load Balancer that can dynamically adjust to zone outages. This tool is a critical component in Netflix's multi-region, multi-zone deployment strategy, which aims to ensure high availability and resilience. Ribbon is now available on GitHub, alongside other Netflix open-source projects, and the company plans to release additional components of NIWS in the future.

Opinions

  • Netflix emphasizes the importance of Ribbon in their operational learnings and experience, particularly in the Amazon Cloud environment.
  • The authors, Allen Wang and Sudhir Tonse, highlight Ribbon's battle-tested features, such as its Zone Aware Load Balancer, which includes circuit tripping logic and can be configured for zone affinity.
  • Netflix continues to use Amazon's ELB for public-facing Edge Services while employing Ribbon for internal mid-tier requests, indicating a strategic choice based on service type and location within their architecture.
  • The article conveys enthusiasm for community contributions to their open-source libraries and invites interested parties to engage with their projects and consider career opportunities at Netflix.

Announcing Ribbon: Tying the Netflix Mid-Tier Services Together

by Allen Wang and Sudhir Tonse

Netflix embraces a fine-grained Service Oriented Architecture as the underpinning of its Cloud based deployment model. Currently, we run hundreds of such fine grained services that are collectively responsible in handling the customer facing requests via a few “Edge Services” such as the Netflix API Service. A lightweight REST based protocol is the choice for inter process communication amongst these services.

The Netflix Internal Web Service Framework (aka NIWS) forms the bedrock of this architecture in the realm of communication. Together with the previously announced Eureka, which aids in service discovery, NIWS provides all the components pertinent for making REST calls.

NIWS is comprised of a REST client and server framework, based on JSR-311 which is a RESTful API specification for Java. Our services use various payload data serialization formats such as Avro, XML, JSON, Thrift and Google Protocol Buffers. NIWS provides the mechanism for serializing and deserializing the data.

Today, we are happy to announce Ribbon as the latest offering of our hugely popular and growing Open Source libraries hosted on GitHub.

Ribbon, as a first cut, mainly offers the client side software load balancing algorithms which have been battle tested at Netflix along with a few of the other components that form our Inter Process Communication stack (aka NIWS). We plan to continue open sourcing the rest the of the NIWS stack in the coming months. Please note that the loadbalancers mentioned are the internal client-side loadbalancers used alongside Eureka that are primarily used for load balancing requests to our mid-tier services. For our public facing Edge Services we continue to use Amazon’s ELB Service.

Deployment Topology

Typical (representative) deployment architecture at Netflix.

A typical deployment architecture at Netflix is a multi-region, multi-zone deployment which aids in better availability and resiliency.

The Amazon ELB provides load balancing for customer/device facing requests while internal mid-tier requests are handled via the Ribbon framework.

Eureka provides the service registry for all Netflix services. Ribbon clients are typically created and configured for each of the target services. Ribbon’s Client component offers a good set of configuration options such as connection timeouts, retries, retry algorithm (exponential, bounded backoff) etc.

Ribbon comes built in with a pluggable and customizable LoadBalancing component. Some of the load balancing strategies offered are listed below, with more to follow.

  • Simple Round Robin LB
  • Weighted Response Time LB
  • Zone Aware Round Robin LB
  • Random LB

Battle Tested Features

The main benefit of Ribbon is that it offers a simple inter process communication mechanism with features built based on our operational learnings and experience in the Amazon Cloud. Ribbon’s Zone Aware Load Balancer for example, is built with circuit tripping logic and can be configured to favor the target service instance based on Zone Affinity (i.e it favors the zone in which the calling service itself is hosted, thus benefiting from reduced latency and savings in cost). It monitors the operational behavior of the instances running in each zone and has the capability to quickly (at real time) drop an entire Zone out of rotation. This helps us be resilient in the face of Zone outages as described in a prior blog post.

Zone Aware Load Balancer

The picture above shows the Zone Aware LoadBalancer in action. The LoadBalancer will do the following when picking a server:

  1. The LoadBalancer will calculate and examine zone stats of all available zones. If the active requests per server for any zone has reached a configured threshold, this zone will be dropped from the active server list. In case more than one zone has reached the threshold, the zone with the most active requests per server will be dropped.
  2. Once the the worst zone is dropped, a zone will be chosen among the rest with the probability proportional to its number of instances.
  3. A server will be returned from the chosen zone with a given Rule (A Rule is a loadbalacing strategy, for example a simple Round Robin Rule)

For each request, the steps above will be repeated. That is to say, each zone related load balancing decisions are made at real time with the up-to-date statistics aiding the choice.

Some of the features of Ribbon are listed below with more to follow.

  • Easy integration with a Service Discovery component such as Netflix’s Eureka
  • Runtime configuration using Archaius
  • Operational Metrics exposed via JMX and published via Servo
  • Multiple and pluggable serialization choices (via JSR-311, Jersey)
  • Asynchronous and Batch operations (coming up)
  • Automatic SLA framework (coming up)
  • Administration/Metrics console (coming up)

Please visit the wiki pages at GitHub for detailed documentation on Ribbon.

At Netflix, we typically wrap our REST calls made via Ribbon using Hystrix which provides Latency and Fault Tolerance in Distributed Systems.

If you would like to contribute to our highly scalable libraries and frameworks for ephemeral distributed environments, please take a look at http://netflix.github.com. You can follow us on twitter at @NetflixOSS.

We will be hosting a NetflixOSS Open House on the 6th of February, 2013 (Limited seats. RSVP needed).

We are constantly looking for great talent to join us and we welcome you to take a look at our Jobs page or contact @stonse for positions in the Cloud Platform Infrastructure team.

Resources

  1. Netflix Open Source Dashboard
  2. Ribbon
  3. Eureka (Service Discovery and Metadata)
  4. Archaius (Dynamic configurations)
  5. Hystrix (Latency and Fault Tolerance)

See Also:

Originally published at techblog.netflix.com on January 28, 2013.

Cloud Architecture
Distributed
Load Balancing
Remote Procedure Calls
Rest
Recommended from ReadMedium