Netflix Technology Blog

Summary

Hystrix is a library designed to control the interactions between distributed services, providing greater tolerance of latency and failure in a distributed environment.

Abstract

Hystrix is a library developed by Netflix to improve the resilience and fault tolerance of distributed systems. It was created in response to the inevitable failure of services in a distributed environment and has been adopted across many teams within Netflix. Hystrix isolates points of access between services, stops cascading failures, and provides fallback options. It has led to a dramatic improvement in uptime and resiliency at Netflix. The library is available on GitHub and has extensive documentation, including examples of how it is used in a distributed system.

Bullet points

  • Hystrix is a library designed to control the interactions between distributed services.
  • It provides greater tolerance of latency and failure in a distributed environment.
  • Hystrix isolates points of access between services, stops cascading failures, and provides fallback options.
  • The library evolved out of resilience engineering work done by the Netflix API team in 2011.
  • Hystrix has been adopted across many teams within Netflix and has led to a dramatic improvement in uptime and resiliency.
  • The library is available on GitHub and has extensive documentation.
  • The documentation includes examples of how Hystrix is used in a distributed system.
  • Netflix is planning to release a real-time dashboard for monitoring Hystrix.
  • Netflix is hiring engineers interested in working on great open source software.

Introducing Hystrix for Resilience Engineering

by Ben Christensen

In a distributed environment, failure of any given service is inevitable. Hystrix is a library designed to control the interactions between these distributed services, providing greater tolerance of latency and failure. Hystrix does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve the system’s overall resiliency.
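To make the idea of isolation plus fallback concrete, here is a minimal sketch in plain Java. It is not the Hystrix API: the class name IsolatedCall, the execute method, the pool size, and the timeout are all hypothetical and chosen only for illustration. It assumes each dependency gets its own small, bounded thread pool, so a slow or failing dependency can use up only its own threads, and the caller receives a fallback value instead of blocking or propagating the failure. In the library itself, calls are wrapped in a HystrixCommand with a run() method and an optional getFallback() method; see the wiki for the real usage.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative only: a hypothetical helper, NOT part of Hystrix.
class IsolatedCall {
    // One bounded pool per dependency. With a SynchronousQueue there is no
    // backlog: when every thread is busy, new work is rejected immediately.
    private final ExecutorService pool;
    private final long timeoutMillis;

    IsolatedCall(int poolSize, long timeoutMillis) {
        this.pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        this.timeoutMillis = timeoutMillis;
    }

    // Runs remoteCall on the isolated pool. If the pool is saturated, the call
    // times out, or the call throws, the fallback value is returned instead.
    <T> T execute(Callable<T> remoteCall, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(remoteCall);
        } catch (RejectedExecutionException e) {
            return fallback.get();      // pool saturated: shed load
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);        // interrupt the slow call to free its thread
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();      // the dependency itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback.get();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        IsolatedCall call = new IsolatedCall(2, 500);
        // A healthy dependency returns its own result.
        String ok = call.execute(() -> "live data", () -> "cached data");
        // A slow dependency is cut off at the timeout.
        String slow = call.execute(() -> { Thread.sleep(2000); return "live data"; },
                                   () -> "cached data");
        // A failing dependency is contained.
        String failed = call.execute(() -> { throw new IllegalStateException("down"); },
                                     () -> "cached data");
        System.out.println(ok + " / " + slow + " / " + failed);
        call.shutdown();
    }
}
```

In this sketch the caller never waits longer than the configured timeout and never sees the dependency's exception, which is the property that keeps one failing service from cascading into its callers.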

Hystrix evolved out of resilience engineering work that the Netflix API team began in 2011. Over the course of 2012, Hystrix continued to evolve and mature, eventually leading to adoption across many teams within Netflix. Today, tens of billions of thread-isolated and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix, and its use has brought a dramatic improvement in uptime and resilience. The following links provide more context around Hystrix and the challenges that it attempts to address:

Getting Started

Hystrix is available on GitHub at http://github.com/Netflix/Hystrix

Full documentation is available at http://github.com/Netflix/Hystrix/wiki, including Getting Started, How To Use, How It Works, and Operations pages with examples of how it is used in a distributed system.

You can get and build the code as follows:

$ git clone https://github.com/Netflix/Hystrix.git
$ cd Hystrix/
$ ./gradlew build

Coming Soon

In the near future we will also be releasing the real-time dashboard that we use at Netflix to monitor Hystrix.

We hope you find Hystrix to be a useful library. We’d appreciate any and all feedback on it and look forward to forks, pull requests, and other forms of contribution as we work on its roadmap.

Are you interested in working on great open source software? Netflix is hiring! http://jobs.netflix.com

Originally published at techblog.netflix.com on November 26, 2012.
