avatarAnkur Bansal

Summary

The website content outlines the design considerations for a news feed system, drawing inspiration from Facebook's EdgeRank algorithm.

Abstract

The provided content discusses the essential steps in designing a news feed system, emphasizing the importance of understanding the requirements through initial questions to the interviewer. It highlights the need for simplicity in the design process, suggesting that a complex system like Facebook's news feed cannot be fully replicated in a short time frame. The text explains the EdgeRank algorithm, which ranks content based on user interactions, affinity, weight, and time decay. It also covers the storage of activities using a schema that includes user ID, data, edge rank, activity type, source ID, and created date. The document describes two models for publishing feeds: the push model and the pull model, each with its constraints and failure scenarios. The choice between denormalized and normalized data storage is discussed, considering the trade-offs in memory usage and data retrieval overhead. The use of Redis and Cassandra is recommended based on their read and write optimization capabilities, and a hybrid approach to feed distribution is suggested.

Opinions

  • The interviewer does not expect a candidate to design a complex feed system like Facebook's in a short time, indicating a preference for a simplified approach.
  • The EdgeRank algorithm is presented as a key method for ranking news feed content, with the opinion that it effectively prioritizes content based on user engagement and recency.
  • The document suggests that the push model may not scale well for users with a large audience, proposing priority-based fan-out tasks as a solution.
  • The pull model is criticized for its potential catastrophic failure scenario, where a user's feed might fail to generate, but it is appreciated for its simplicity and immediacy.
  • The choice between denormalized and normalized data is seen as dependent on the backend technology and the specific needs of the system being designed.
  • Redis is recommended for read-heavy operations due to its in-memory storage, while Cassandra is suggested for write-heavy scenarios due to its large storage capacity, despite being harder to use with normalized data.
  • A hybrid approach to feed distribution is seen as a viable solution to balance the strengths and weaknesses of both push and pull models.
  • The document encourages feedback and alternative approaches, indicating an openness to diverse strategies and continuous improvement in news feed system design.

Design a news feed system

http://i.stack.imgur.com/KPvit.gif

First thing, the candidate should do is ask questions to the interviewer:

  1. Do you mean like facebook, quora news feed?
  2. So, it will fetch all feeds or lets say last 100 feeds?
  3. Will the feed be sorted by created date or some important feeds will be given preferences?
  4. And many more questions..

The interviewer will not expect you to design a facebook like feed which must taken months in just 45 minutes. So the idea is ‘Keep it simple’

So lets say, we want to design a news feed system which give some weight-age to each feed.

So here, I will discuss Facebook Edge Rank Algo :

An Edge is basically everything that “happens” in Facebook. Examples of Edges would be status updates, comments, likes, and shares. There are many more Edges than the examples above—any action that happens within Facebook is an Edge.

EdgeRank ranks Edges in the News Feed. EdgeRank looks at all of the Edges that are connected to the User, then ranks each Edge based on importance to the User. Objects with the highest EdgeRank will typically go to the top of the News Feed.

http://www.whatisedgerank.com/images/er_algorithm_graphic.png

Affinity is a one-way relationship between a User and an Edge. It could be understood as how close of a “relationship” a Brand and a Fan may have. Affinity is built by repeat interactions with a Brand’s Edges. Actions such as Commenting, Liking, Sharing, Clicking, and even Messaging can influence a User’s Affinity. Affinity can be based on number of the interactions with your friend, weight of each interaction and time since your last interaction.

Weight is a value system created by Facebook to increase/decrease the value of certain actions within Facebook. Commenting is more involved and therefore deemed more valuable than a Like. In the weighting system, Comments would have a higher value than a Like. In this system, all Edges are assigned a value chosen by Facebook. As a general rule, it’s best to assume Edges that take the most time to accomplish tend to weigh more.

Time Decay refers to how long the Edge has been alive; the older it is the less valuable it is. Time Decay is the easiest of the variables to understand. Mathematically it is understood as 1/(Time Since Action). As an Edge ages, it loses value. This helps keep the News Feed fresh with interesting new content, as opposed to lingering old content.

So now we know the importance of each feed.

Next is storing this activity.

Ask your interviewer about the use cases of the application.

Our schema would be like this:

id
user_id // user who created the activity
data //may be json object with metadata
edge_rank //calculated using above algo
activity_type // whether its a photo, status, video
source_id //the record that the activity is related to.
created_date //timestamp

We can index on (user_id, created_date) and then perform basic queries.

Publishing the feed

  1. “Push” Model/Fan-out-on-write

As the activity is created publish it to all the recipients. One way is to maintain a pub-sub model. In which the recipients will be subscribed to the messaging queue. Once the activity is generated, its pushed into the queue and the subscribers get a notification.

2. “Pull” Model/or Fan-out-on-load

This method involves keeping all recent activity data in memory and pulling in (or fanning out) that data at the time a user loads their home page.

Constraints

  1. The push model will start breaking if we have to push the messages to a large audience. Let's say Sachin Tendulkar will have a lot of fans. If this fails there will be a backlog of feeds. An alternative strategy is using different priorities for the fan-out tasks. You simply mark fan-outs to active users as high priority and fan-outs to inactive users as a low priority.
  2. The downside of the pull model is that the failure scenario is more catastrophic — instead of just delaying updates, you may potentially fail to generate a user’s feed. One workaround is to have a fallback method, e.g get the updates of 10 friends only.

You must be thinking why we chose denormalized activity rather than normalized. So it will also depend on the backend that you are using. The feed with the activities by people you follow either contains the ids of the activities (normalized) or the full activity (denormalized).

Storing id alone will reduce memory usage but add an additional overhead of getting the data based on the id. So if you are building a system in which feed will go to many users, the data will be copied many times and hence memory can be insufficient.

With Redis you need to be careful about memory usage. Cassandra, on the other hand, has plenty of storage space but is quite hard to use if you normalize your data.

The interviewer may ask the usage of Redis and Cassandra. Redis is read-optimized and stores all data in memory. Cassandra is a write-optimized data store.

So with redis the following approach can be used:

1.Create your MySQL activity record
2. For each friend of the user who created the activity, push the ID onto their activity list in Redis.

A hybrid approach of pull and push model can also be used.

Please feel free to give comments, feedback, or alternative approaches on this answer.

If you want me to write a new architecture system, please leave a comment with details.

Recommended reads:
http://www.whatisedgerank.com/
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
http://stackoverflow.com/questions/1443960/how-to-implement-the-activity-stream-in-a-social-network
http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
Interview
Coding
Technology
Facebook
Recommended from ReadMedium