KBryan

Summary

The article discusses design considerations for scaling WebSocket servers horizontally using a publish-subscribe pattern to address challenges such as message loss and duplicate processing in a microservice architecture.

Abstract

In a follow-up to his previous work, the author delves into the complexities of scaling WebSocket servers horizontally within a microservice framework. The article emphasizes the need for horizontal scaling due to increased server load with a growing user base, and the role of a load balancer in distributing traffic across multiple microservice instances. It outlines two primary issues: message loss due to load balancer redirection and duplicate message processing by multiple backend subscribers in a pub/sub system. The author proposes solutions such as broadcasting messages via a publish-subscribe channel to prevent message loss and implementing consumer groups to ensure single processing of messages, potentially using technologies like Redis Streams, Google Pub/Sub, RabbitMQ, or Apache Kafka. The article concludes with a summary of the design considerations and the importance of the publish-subscribe pattern in achieving efficient real-time communication without message loss or duplication.

Opinions

  • The author, Bryan Kok, advocates for the use of a publish-subscribe messaging pattern as a solution to the challenges faced when scaling WebSocket servers horizontally.
  • Bryan Kok acknowledges the inspiration from Amr Saleh's work on building scalable notification systems using server-sent events and Redis.
  • The article suggests that the choice of technology for implementing the publish-subscribe pattern with consumer groups (such as Redis Streams, Google Pub/Sub, RabbitMQ, or Apache Kafka) should be tailored to the specific needs of the implementation, as the article does not endorse a particular solution.
  • Bryan Kok encourages readers to stay informed by following his work for more insights on scaling WebSocket servers.

Design Considerations for Scaling WebSocket Server Horizontally With a Publish-Subscribe Pattern

Understanding the challenges in scaling WebSocket servers

Photo by Kelly Sikkema on Unsplash

In my previous article, I wrote about designing and building a WebSocket server in a microservice architecture. Although the implementation works fine for a single instance of a WebSocket server, we will start facing issues when we try to scale up the number of WebSocket server instances (aka horizontal scaling). This article looks into the design considerations for scaling the WebSocket server using a publish-subscribe messaging pattern.


What is Horizontal Scaling?

First, let’s try to understand why we need horizontal scaling. As our user base grows, the load on the server grows, and at some point a single server can no longer provide high performance for all users. Hence, our design should make it possible to increase or decrease the number of servers whenever necessary, both to meet user demand and to save resources.

Horizontal scaling refers to adding more machines to your infrastructure to cope with the high demand on the server. In our microservice context, scaling horizontally is the same as deploying more instances of the microservice. A load balancer will then be required to distribute the traffic among the multiple microservice instances, as shown below:

Example of horizontal scaling with load balancer
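To make the load balancer's role concrete, here is a minimal in-memory sketch in Python. It assumes a simple round-robin strategy, which is only one of several strategies a real load balancer may use. The class name `RoundRobinLoadBalancer` and the instance names are hypothetical and only serve the illustration.

```python
from itertools import cycle


class RoundRobinLoadBalancer:
    """Hands each incoming request to the next instance in turn (assumed strategy)."""

    def __init__(self, instances):
        # cycle() repeats the instance list endlessly, in order.
        self._instances = cycle(instances)

    def route(self, request):
        # Pick the next instance and return it together with the request.
        return next(self._instances), request


balancer = RoundRobinLoadBalancer(["instance-a", "instance-b", "instance-c"])
targets = [balancer.route(f"request-{i}")[0] for i in range(6)]
print(targets)
# ['instance-a', 'instance-b', 'instance-c', 'instance-a', 'instance-b', 'instance-c']
```

The key point for the rest of this article is that the caller has no say in which instance handles a given request.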

With this, I hope you better understand why we need horizontal scaling in our infrastructure. So let’s move on to learn the design considerations for scaling WebSocket servers in a microservice architecture.

Quick Recap

High-level diagram of a WebSocket server in a microservice architecture

Previously, we implemented the WebSocket server using Spring Boot, STOMP, and Redis Pub/Sub. The web application (frontend) communicates with the WebSocket server via WebSocket, while the microservices (backend) communicate with the WebSocket server via an API and the publish-subscribe messaging pattern. For more information, refer to the previous article.

What Are the Issues and Solutions?

The previous design works perfectly fine in a setup where we only have a single instance of each microservice. However, having a single instance is not practical in a production environment, where we typically deploy our microservices with multiple replicas (or instances) for high availability. Therefore, when we horizontally scale the number of WebSocket servers (microservice) or backend microservices, we will notice the following problems.

Issue #1: Message loss due to the load balancer

In our previous article, we added APIs for backend microservices to send messages to the WebSocket server for unidirectional real-time communication. As shown below, a load balancer helps to handle traffic redirection when scaling the number of WebSocket servers.

Issue when sending messages from microservices (backend) to the web application (frontend) via API

In the above setup, an instance of the web application (frontend) establishes a WebSocket connection to the WebSocket server (instance B). When the backend server tries to send messages to the web application, the load balancer redirects the API request to the WebSocket server (instance A). Since WebSocket server (instance A) does not have a WebSocket connection to that particular instance of the web application, the message will be lost.
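The scenario above can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions, not the Spring Boot implementation from the previous article: the `WebSocketServer` class, the `send_to_client` method, and the client id `browser-1` are all hypothetical names, and the load balancer's choice of instance A is hard-coded.

```python
class WebSocketServer:
    """Stand-in for one WebSocket server instance (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        # client id -> messages pushed to that client over its connection
        self.connections = {}

    def connect(self, client_id):
        self.connections[client_id] = []

    def send_to_client(self, client_id, message):
        """Return True if delivered, False if this instance holds no such connection."""
        inbox = self.connections.get(client_id)
        if inbox is None:
            return False  # nothing to deliver to: the message is lost
        inbox.append(message)
        return True


instance_a = WebSocketServer("A")
instance_b = WebSocketServer("B")

# The web application connects to instance B.
instance_b.connect("browser-1")

# The load balancer happens to route the backend's API request to instance A.
delivered = instance_a.send_to_client("browser-1", "order shipped")
print(delivered)  # False
```

Instance B, which actually holds the connection, never sees the request at all.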

Solution for Issue #1: Broadcast messages using Pub/Sub

Solution for message loss due to the load balancer

Note: This solution is greatly inspired by Amr Saleh, who wrote about Building Scalable Facebook-like Notification using Server-Sent Events and Redis. Do check that out!

To resolve the first issue, we can introduce a broadcast channel using the publish-subscribe messaging pattern, where every message received from the backend microservices is broadcast to all WebSocket server instances, as shown in the diagram above. This ensures that every web application instance (frontend) the message is intended for receives it via WebSocket, regardless of which WebSocket server instance holds its connection.
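A minimal in-memory sketch of this broadcast approach follows. It is an illustration only: `BroadcastChannel` stands in for a real pub/sub channel such as Redis Pub/Sub, and addressing a message by a `client_id` field is an assumption made here to keep the example small.

```python
class BroadcastChannel:
    """Stand-in for a pub/sub channel: every subscriber gets every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class WebSocketServer:
    """Hypothetical WebSocket server instance that listens on the broadcast channel."""

    def __init__(self, name, channel):
        self.name = name
        self.connections = {}  # client id -> messages pushed to that client
        channel.subscribe(self.on_broadcast)

    def connect(self, client_id):
        self.connections[client_id] = []

    def on_broadcast(self, message):
        # Every instance receives the message, but only the one holding
        # the target connection can forward it; the others ignore it.
        inbox = self.connections.get(message["client_id"])
        if inbox is not None:
            inbox.append(message["body"])


channel = BroadcastChannel()
instance_a = WebSocketServer("A", channel)
instance_b = WebSocketServer("B", channel)
instance_b.connect("browser-1")

# The backend publishes once; it does not need to know which instance holds the connection.
channel.publish({"client_id": "browser-1", "body": "order shipped"})
print(instance_b.connections["browser-1"])  # ['order shipped']
```

The trade-off is that every instance does a little work for every message, in exchange for the backend not needing to know where each connection lives.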

Issue #2: Duplicate message processing due to multiple backend subscribers to a single topic

In our previous article, we used Redis Pub/Sub to handle bidirectional real-time communication between the WebSocket server (microservice) and backend microservices. When we scale up the number of WebSocket servers and backend microservices, we will notice that every subscriber to a Redis Pub/Sub channel receives every message, as shown below.

Bidirectional real-time communication between web application (frontend) and microservices (backend)

Let’s look at the message flow in each direction in bidirectional real-time communication.

  • Message Flow: Microservices to web application (no duplicated processing) → Every WebSocket server instance needs to receive the message, because each web browser establishes a WebSocket connection with only a single WebSocket server instance and the publisher does not know which one. When a message flows from the backend microservices to the web application (backend → WebSocket server → frontend), only the WebSocket server instance holding the relevant connection can forward it, so only one instance of the web application receives the message, which is the correct behavior.
  • Message Flow: Web application to microservices (duplicated processing) → When messages are flowing from the web application to the backend microservices (frontend → WebSocket server → backend), we would expect only one instance of the backend microservices to process the message. However, all backend microservices (as subscribers) will receive the message, resulting in the message being processed multiple times, which is incorrect behavior.
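The duplicated processing in the second flow can be seen in a short in-memory sketch. `PubSubTopic` is a hypothetical stand-in for a plain pub/sub channel with fan-out delivery, and the backend names and message are made up for the illustration.

```python
class PubSubTopic:
    """Stand-in for a plain pub/sub topic: fan-out to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


processed = []  # records (backend instance, message) for every processing run


def make_backend(name):
    def handle(message):
        processed.append((name, message))
    return handle


topic = PubSubTopic()
for name in ("backend-1", "backend-2", "backend-3"):
    topic.subscribe(make_backend(name))

# A single message from the web application, published by the WebSocket server.
topic.publish("create-order-42")
print(len(processed))  # 3: the same message was processed three times
```

If the handler creates an order or charges a payment, three replicas mean three orders or three charges.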

Solution for Issue #2: Pub/Sub with consumer groups

Solution for duplicate message processing due to multiple backend subscribers to a single topic

To resolve the second issue, we will make use of the concept of consumer groups (introduced by Kafka), where only one subscriber within a group receives each message for processing. This ensures that there is no duplicated message processing, as only one backend microservice instance receives each message.

As Redis Pub/Sub, which I used in the implementation from my previous article, does not support consumer groups, we can instead use Redis Streams, Google Pub/Sub, RabbitMQ, or Apache Kafka to implement the publish-subscribe messaging pattern with consumer groups. I will not go into detail on which is better for your implementation, as that is not the intent of this article.
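The consumer-group behavior can be sketched in memory as follows. This is a simplified model, not how any of the brokers above work internally: `GroupedTopic` is a hypothetical class, and picking the group member by round-robin is an assumption (real brokers use their own assignment mechanisms).

```python
from collections import defaultdict


class GroupedTopic:
    """Toy topic with consumer groups: each group gets each message exactly once."""

    def __init__(self):
        self._groups = defaultdict(list)  # group name -> member handlers
        self._next = defaultdict(int)     # group name -> deliveries so far

    def subscribe(self, group, handler):
        self._groups[group].append(handler)

    def publish(self, message):
        for group, members in self._groups.items():
            # Assumed strategy: rotate through the group's members.
            handler = members[self._next[group] % len(members)]
            self._next[group] += 1
            handler(message)


processed = []  # records (backend instance, message) for every processing run


def make_backend(name):
    def handle(message):
        processed.append((name, message))
    return handle


topic = GroupedTopic()
for name in ("backend-1", "backend-2", "backend-3"):
    # All replicas of the same microservice join the same group.
    topic.subscribe("order-service", make_backend(name))

for message in ("msg-1", "msg-2", "msg-3", "msg-4"):
    topic.publish(message)

print(processed)
# [('backend-1', 'msg-1'), ('backend-2', 'msg-2'), ('backend-3', 'msg-3'), ('backend-1', 'msg-4')]
```

Replicas of a different microservice would subscribe under a different group name and would still receive their own copy of each message.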

Summary

Full Design for scaling WebSocket servers in a microservice architecture using publish-subscribe pattern

To wrap things up, we have run through the design considerations for horizontally scaling the WebSocket server in a microservice architecture. Essentially, we use the publish-subscribe messaging pattern, with a broadcast channel and consumer groups, to ensure that there is no message loss or duplicated message processing in real-time communication between the web application (frontend) and microservices (backend).

That’s it! I hope you learned something new from this article. This article only covers the design considerations for scaling the WebSocket server. Stay tuned for the next one, where I will elaborate more on how you can implement this design using Redis Pub/Sub, Redis Streams, and Spring Boot.

If you like this article, please follow me for more :).

Thank you for reading until the end. Happy learning!
