Summary

The provided content outlines a comprehensive guide to designing a scalable photo-sharing application, akin to Instagram, with considerations for database schema, system architecture, and performance optimization.

Abstract

The article delves into the system design of a photo-sharing app, similar to Instagram, detailing the necessary requirements, database design, and high-level architecture to support millions of users and daily photo uploads. It emphasizes the importance of high availability, eventual consistency, and efficient storage solutions, while also discussing extended features such as user interactions and personalized content recommendations. The design process includes the use of relational and NoSQL databases, microservices for scalability, and caching mechanisms to enhance read performance. The article also provides resources for further learning and preparation for system design interviews, including courses and YouTube videos.

Opinions

  • The article suggests that a photo-sharing app should prioritize high availability over strict consistency, adopting an eventually consistent model to ensure users can access the service with minimal latency.
  • It advocates for the use of separate servers for handling read and write operations to optimize performance and resource utilization, especially given the read-heavy nature of photo-sharing applications.
  • The article recommends employing a load balancer to efficiently distribute requests across multiple servers, ensuring scalability and reliability.
  • It highlights the significance of caching frequently accessed data to speed up read operations and improve the user experience.
  • The design proposes pre-generating user news feeds to reduce latency, with periodic updates to maintain content relevance.
  • The author encourages the use of distributed file storage systems like GFS or HDFS for storing large media files, separating them from the database to improve performance.
  • The article promotes Educative's courses, such as "Grokking the System Design Interview," as valuable resources for interview preparation in the tech industry.
  • It suggests that mastering multi-threading and dynamic programming patterns can be beneficial for coding interviews at big tech companies.

System Design Interview: Instagram or a Similar App (Snapchat, Flickr, Picasa) Design

Photo-sharing apps

Don’t forget to get your copy of Designing Data Intensive Applications, the single most important book to read for system design interview prep! Udacity | Coursera | Pluralsight.

Check out ByteByteGo’s popular System Design Interview Course

Consider signing up for a paid Medium account to access our curated system design resources.

Grokking Modern System Design for Software Engineers and Managers

If you are interviewing, consider buying our #1 course for Java Multithreading Interviews.

Instagram is one of the most popular photo-sharing apps today, with around 1 billion monthly active users according to Statista. Similar services, such as Flickr, Picasa, Pinterest and Snapchat, also have large followings. The basic purpose of such apps is to upload and share pictures with other users, but they also offer plenty of other features to keep users engaged.

Get a leg up on your competition with the Grokking the Advanced System Design Interview course and land that dream job! Don’t waste your life on LeetCode. Learn patterns with the course Grokking the Coding Interview: Patterns for Coding Questions. Or, if you prefer video-based courses, check out Udacity.

If you were asked to design Instagram or a similar photo-sharing app in a system design interview, how would you go about it? Let’s design a basic photo-sharing app that can support millions of users and handle an equally large number of photo uploads every day.

What Does A Basic Photo-Sharing App Do?

Instagram allows people to upload pictures and videos and share them with other users. As on Twitter, users can follow other users and like, share and comment on their posts. Like other social networking apps, Instagram also has a News Feed showing top posts from the people a user follows.

Beyond the basics, Instagram continually adds new features. New filters regularly appear that let users try out different poses, backgrounds and looks. One notable feature is Instagram Stories, which lets you post the day’s pictures and videos for your friends to see for 24 hours.

List Down The Requirements

For a basic photo-sharing app, we’ll include the following requirements in our design:

  • Users should be able to upload and view photos.
  • Users can follow any number of other users.
  • Users can view News Feeds with posts from the users they follow.

Extended Requirements

If you wish to extend the design and incorporate more advanced features, here are some suggestions:

  • Users can comment and like photos.
  • Users can send and receive messages from other users.
  • Customized recommendations to connect with other users based on the user’s interests.
  • Users can add tags to pictures. Users can also be tagged on photos.

Check out the course Coderust: Hacking the Coding Interview for Google and Facebook coding interviews.

Breeze through your coding interviews with Hacking the Coding Interview.

Characteristics Of A Photo-Sharing App

Considering non-functional requirements is important in building a scalable system that can efficiently serve millions of users. Here are some of the characteristics that we will want in a scalable photo-sharing app.

Land a higher salary with Grokking Comp Negotiation in Tech.

High Availability, Eventual Consistency

Our photo-sharing app should be highly available, with minimal latency when generating News Feeds and viewing photos. Consistency is of secondary importance compared to availability: it’s acceptable if recently uploaded photos or videos aren’t immediately visible to all followers. So we’re aiming for an eventually consistent system with high availability.

Reliability

Even though uploaded photos may not immediately be available to other users on the network, the service should guarantee that once a photo is uploaded, it will not be lost.

Brush-up on Big-O notation for tech interviews with: Big-O Notation For Coding Interviews and Beyond

Read-Heavy

Applications such as Instagram and Snapchat are read-heavy: read requests to fetch News Feeds and display photos far outnumber write requests to upload photos. We want a system that can handle a large number of reads per second.

Efficient Storage

Since Instagram deals with photos and videos, and there is no limit on the number of files users can upload, our system will need an efficient mechanism to store them.

Assuming there are 1 billion active users on Instagram and each user uploads 3 photos in a day, there will be 3 billion pictures uploaded each day.

If each picture takes 150 KB of storage, we’ll need about

$$3 \times 10^{9} \times 150\ \text{KB} = 4.5 \times 10^{11}\ \text{KB} = 450\ \text{TB}$$

of storage for pictures each day.

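Let’s assume Instagram keeps photos for 5 years. The total storage we’ll need is:

$$450\ \text{TB/day} \times 365\ \text{days/year} \times 5\ \text{years} = 821{,}250\ \text{TB} \approx 821\ \text{PB}$$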
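The estimate can be reproduced with a short script under the same assumptions (1 billion users, 3 photos per user per day, 150 KB per photo, 5-year retention). This is a minimal sketch: it uses decimal units and ignores metadata, videos, compression and replication, all of which would change the real figure.

```python
# Back-of-the-envelope photo storage estimate using the article's assumptions.
# Decimal units (1 KB = 1,000 bytes); metadata, video and replicas are ignored.
KB = 1_000
TB = 1_000 ** 4
PB = 1_000 ** 5


def daily_photo_storage_bytes(active_users: int, photos_per_user: int, photo_size_kb: int) -> int:
    """Bytes of new photo data written per day."""
    return active_users * photos_per_user * photo_size_kb * KB


def total_photo_storage_bytes(daily_bytes: int, years: int, days_per_year: int = 365) -> int:
    """Bytes needed to keep every photo for `years` (single copy, no compression)."""
    return daily_bytes * days_per_year * years


if __name__ == "__main__":
    daily = daily_photo_storage_bytes(1_000_000_000, 3, 150)
    total = total_photo_storage_bytes(daily, 5)
    print(f"per day: {daily / TB:.0f} TB")   # per day: 450 TB
    print(f"5 years: {total / PB:.2f} PB")   # 5 years: 821.25 PB
```

Multiplying the result by the replication factor (for example, three copies) gives a rough figure for raw disk capacity.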

Pictures aren’t the only items that are stored. Each picture also carries metadata, and user comments and the list of people each user follows must be stored too. Even a dedicated database server could not hold this much data on its own. To store it all and retrieve it quickly, we will need a way to scale the database.

Different scaling techniques may be used:

  • Vertical partitioning: split the database by type of data, such as a user database, an image database, a comment database, etc.
  • Horizontal partitioning, or sharding: split the rows of the same data set across different machines, for example by UserID.
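As a minimal sketch of horizontal sharding, the snippet below routes each user’s data to one of a fixed set of database shards by hashing the UserID. The shard host names are hypothetical placeholders, and hash-modulo placement is only one of several possible schemes.

```python
import hashlib

# Hypothetical shard map: index -> database host (placeholder names).
SHARDS = ["photos-db-0", "photos-db-1", "photos-db-2", "photos-db-3"]


def shard_for_user(user_id: int, num_shards: int = len(SHARDS)) -> int:
    """Map a UserID to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for str/bytes, so an
    MD5 digest is used to keep the mapping identical on every server.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def db_host_for_user(user_id: int) -> str:
    """All of a user's rows (profile, photos, follows) live on this host."""
    return SHARDS[shard_for_user(user_id)]
```

With modulo placement, changing the number of shards remaps most users, so production systems often use consistent hashing or a lookup directory instead.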

If you are preparing for tech interviews at FANGs, you may want to check out the course Grokking the System Design Interview by Educative.

Database Design

Since several different types of data need to be stored, including user data, images, image metadata and follower information, we need a scalable database schema that allows fast scanning and retrieval. This data is inherently relational (for example, each image belongs to the user who uploaded it, identified by a UserID), so we can use a relational database to store it.

The most straightforward approach is to store data in the form of tables in a relational database such as MySQL.

User Table

In the user table, the primary key is the UserID, which can increment serially as new profiles are created. Each UserID maps to a row holding that user’s complete information, including name, email, location, time zone, etc.

Master multi-threading in Python with: Python Concurrency for Senior Engineering Interviews.

Photo Table

The photo table uses the PhotoID as its primary key, which can increment serially with each new photo uploaded to the system. A foreign key holds the UserID of the user who uploaded the picture; it also represents the relationship between the photo table and the user table. Users have a one-to-many relationship with photos, because each user can upload multiple photos but every photo belongs to a single user.

Besides the UserID, the photo table may also store additional information such as the caption, the location from which the photo was uploaded, the date and time of upload and more. Beyond this metadata, we also need to store the actual image somewhere.

Because images are large, storing them in the database itself is not recommended. Instead, the photo table stores the path to each image, which points to its actual storage location. Any distributed file storage, such as GFS or HDFS, may be used to store the images.

Follower Table

The follower table contains the UserIDs of the follower (say, A) and the followee (say, B). Both are foreign keys that reference the UserID in the user table. Each row stores a single direction of following: from the information in the diagram above, we can infer that A follows B, while B may or may not follow A. (This differs from ‘Friends’ on Facebook, which is a bidirectional relationship.)

Unlike the photo table, the follower table models a many-to-many relationship among users: a user can follow many people, and a followee can have many followers. The follower table serves as the join table that links users to one another.
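The three relational tables can be sketched as SQL DDL. The snippet below is a minimal illustration using Python’s built-in sqlite3 module instead of MySQL; the plural table names and exact column names are assumptions, while the columns themselves follow the fields listed in the text.

```python
import sqlite3
from typing import List

# Sketch of the user, photo and follower tables (names are illustrative).
SCHEMA = """
CREATE TABLE Users (
    UserID    INTEGER PRIMARY KEY AUTOINCREMENT,   -- increments with each new profile
    Name      TEXT NOT NULL,
    Email     TEXT NOT NULL UNIQUE,
    Location  TEXT,
    TimeZone  TEXT
);
CREATE TABLE Photos (
    PhotoID    INTEGER PRIMARY KEY AUTOINCREMENT,
    UserID     INTEGER NOT NULL REFERENCES Users(UserID),  -- one user, many photos
    PhotoPath  TEXT NOT NULL,     -- location in distributed file storage, not image bytes
    Caption    TEXT,
    Location   TEXT,
    CreatedAt  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Followers (
    FollowerID INTEGER NOT NULL REFERENCES Users(UserID),  -- A
    FolloweeID INTEGER NOT NULL REFERENCES Users(UserID),  -- B
    PRIMARY KEY (FollowerID, FolloweeID)                  -- one row means "A follows B"
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def followees(conn: sqlite3.Connection, user_id: int) -> List[int]:
    """UserIDs that `user_id` follows (a single direction of the relationship)."""
    rows = conn.execute(
        "SELECT FolloweeID FROM Followers WHERE FollowerID = ? ORDER BY FolloweeID",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The composite primary key prevents duplicate follow rows; an additional index on FolloweeID would be needed to list a user’s followers efficiently.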

Extending The Database Design To NoSQL

The same data can also be stored in a scalable, high-performance NoSQL database such as Cassandra to achieve high availability and minimize latency. Cassandra is a wide-column store, but here it can be used much like a key-value store: the photo table uses the PhotoID as the ‘key’ and an object containing all of the photo’s metadata as the ‘value’. The user table is maintained the same way.

In addition to the photo and user tables, we’ll need another table that maps each user to the photos they own (refer to the tables above). In this UserPhoto table, the UserID is the ‘key’ and the list of PhotoIDs belonging to that user is the ‘value’. A similar UserFollow table is also required, mapping each UserID to the list of people that user follows.
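To make the key-value layout concrete, here is a minimal in-memory sketch in which each dictionary plays the role of one table (Photo, User, UserPhoto, UserFollow). The class and method names are hypothetical; a real deployment would use a distributed store such as Cassandra, not Python dicts.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PhotoMeta:
    user_id: int
    path: str          # pointer into distributed file storage
    caption: str = ""


@dataclass
class KeyValueTables:
    """Hypothetical stand-in for the key-value tables; each dict is one table."""
    photos: Dict[int, PhotoMeta] = field(default_factory=dict)   # PhotoID -> metadata
    users: Dict[int, dict] = field(default_factory=dict)         # UserID -> profile
    user_photo: Dict[int, List[int]] = field(default_factory=lambda: defaultdict(list))  # UserPhoto
    user_follow: Dict[int, Set[int]] = field(default_factory=lambda: defaultdict(set))   # UserFollow

    def add_photo(self, photo_id: int, meta: PhotoMeta) -> None:
        self.photos[photo_id] = meta
        self.user_photo[meta.user_id].append(photo_id)  # keep UserPhoto in sync

    def follow(self, follower: int, followee: int) -> None:
        self.user_follow[follower].add(followee)  # set: following twice is a no-op

    def photos_of(self, user_id: int) -> List[PhotoMeta]:
        # .get avoids creating empty entries for unknown users
        return [self.photos[pid] for pid in self.user_photo.get(user_id, [])]
```

Note that every write touches two tables (Photo and UserPhoto); without multi-table transactions, the application is responsible for keeping them in sync.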

Ace the machine learning engineer interview with Grokking the Machine Learning Interview.

Instagram Architecture — High Level Design

There are multiple features that our app will handle, including uploading images, viewing images and News Feeds and following other users. When designing an app to serve millions of users and handle several features, the best option is to split them into microservices, operating independently of one another to optimize server usage and app performance.

Also keep in mind that there will be far more requests to view News Feeds than to upload images; as mentioned earlier, Instagram is read-heavy. To handle this high volume of reads, we will want to replicate our database across multiple servers so that data can be retrieved efficiently and shown to the user as quickly as possible.

Replicating the database (and cache) also creates redundancy that guards against data loss. When multiple copies of each piece of data are stored on different storage servers, the data can be retrieved from another server if one storage server is unavailable, so the service does not have to return an error to the client.
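The failover behavior described above can be sketched as a read that tries each server holding a copy until one succeeds. The replica client functions and the `ReplicaUnavailable` exception are hypothetical stand-ins for real storage clients.

```python
from typing import Callable, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Raised by a (hypothetical) replica client when its server cannot be reached."""


def read_with_failover(key: str, replicas: Sequence[Callable[[str], bytes]]) -> bytes:
    """Return the first successful read among the servers holding a copy of `key`.

    Each entry in `replicas` is a client function for one storage server.
    The caller only sees an error if every copy is unreachable.
    """
    last_error: Optional[Exception] = None
    for fetch in replicas:
        try:
            return fetch(key)
        except ReplicaUnavailable as err:
            last_error = err  # this server is down; try the next copy
    raise ReplicaUnavailable(f"all {len(replicas)} replicas failed for {key!r}") from last_error
```

Trying replicas in a fixed order sends all failover traffic to the same backup; shuffling the list or preferring the nearest replica spreads that load.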

Separate Reads And Writes

Uploading photos is a slow process. If the same server were to handle both read and write requests, uploads could consume most of its resources, and users who want to view photos would often find the system busy.

Also, since read and write requests may have different requirements, it makes sense to have different servers handle them independently. All read requests will be directed to the Read Server, while all write requests will go to the Write Server, as you can see in the diagram.

The Write Server is not only responsible for storing the metadata for the fresh image into the database and cache, but also for uploading the image to the external storage, such as HDFS. The image metadata will carry the path from where to retrieve the image when there’s a request for it.
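The Write Server’s upload path can be sketched in a few lines: store the image bytes in file storage, then save metadata (including the path) to the database and the cache. The in-memory blob store, the dict-based database and cache, and the path layout are all hypothetical stand-ins for HDFS, the metadata database and Redis.

```python
import itertools
from typing import Dict


class InMemoryBlobStore:
    """Hypothetical stand-in for distributed file storage such as HDFS."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.files[path] = data


class WriteServer:
    """Upload path: image bytes to file storage, metadata (with path) to DB and cache."""

    def __init__(self, blobs: InMemoryBlobStore, db: dict, cache: dict) -> None:
        self.blobs, self.db, self.cache = blobs, db, cache
        self._ids = itertools.count(1)  # serial PhotoIDs; valid for a single process only

    def upload_photo(self, user_id: int, image: bytes, caption: str = "") -> int:
        photo_id = next(self._ids)
        path = f"/photos/{user_id}/{photo_id}.jpg"  # hypothetical path layout
        # 1. Write the image first so metadata never points at a missing file.
        self.blobs.put(path, image)
        meta = {"photo_id": photo_id, "user_id": user_id, "path": path, "caption": caption}
        # 2. Persist metadata in the database, then 3. in the cache for fast reads.
        self.db[photo_id] = meta
        self.cache[photo_id] = meta
        return photo_id
```

With several Write Servers, a per-process counter would produce duplicate IDs, so a central ID service or globally unique IDs would be needed in practice.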

Learn patterns to dynamic programming interview questions to land your next Big Tech job!

Load Balancer

Even with reads and writes handled separately, each type of operation still needs multiple servers. At the scale of millions of users, for example, a single server cannot handle all read requests. Several servers will handle read and write requests, scaling the system horizontally and minimizing latency.

A load balancer is therefore essential: it routes each client request to a server that is available and has the resources to handle that type of request. A client’s request first hits the load balancer, which forwards it to a suitable server.
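A simplified load balancer can be sketched as round-robin routing over two pools, one of read servers and one of write servers, that skips servers marked as down. The server names and the rule that GET/HEAD requests count as reads are assumptions for illustration; this sketch does not measure actual server load.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    method: str   # "GET" for viewing photos/feeds, "POST" for uploads
    path: str


class LoadBalancer:
    """Round-robin over separate read and write pools, skipping unhealthy servers."""

    READ_METHODS = {"GET", "HEAD"}

    def __init__(self, read_servers: List[str], write_servers: List[str]) -> None:
        self._healthy = set(read_servers) | set(write_servers)
        self._pools = {"read": itertools.cycle(read_servers), "write": itertools.cycle(write_servers)}
        self._sizes = {"read": len(read_servers), "write": len(write_servers)}

    def mark_down(self, server: str) -> None:
        """Called when a health check fails."""
        self._healthy.discard(server)

    def route(self, req: Request) -> str:
        kind = "read" if req.method in self.READ_METHODS else "write"
        pool = self._pools[kind]
        # Try at most one full rotation of the pool before giving up.
        for _ in range(self._sizes[kind]):
            server = next(pool)
            if server in self._healthy:
                return server
        raise RuntimeError(f"no healthy {kind} servers")
```

Production load balancers such as NGINX or HAProxy also offer policies like least-connections, which track how busy each server is rather than rotating blindly.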

Cache For Faster Reads

To speed up reads, frequently accessed data can be cached in an in-memory data store such as Redis, which the server can query instead of scanning the entire database. Every new write therefore needs to be stored in both the database and the cache, so that it is quickly available to any relevant read request and the cache stays consistent with the database.

For each read request, the Read Server queries Redis and returns the cached metadata for the requested image to the client. This metadata includes the path/link to the image, so the client can download it directly from that link and view it on their device.
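The Read Server’s lookup can be sketched as a cache-first read. Beyond what the text describes, this sketch falls back to the database on a cache miss (for example, after an entry has been evicted) and repopulates the cache, a cache-aside pattern. Plain dicts stand in for Redis and the metadata database.

```python
from typing import Optional


class ReadServer:
    """Serve photo metadata from the cache, falling back to the database on a miss."""

    def __init__(self, cache: dict, db: dict) -> None:
        self.cache, self.db = cache, db   # stand-ins for Redis and the metadata DB
        self.hits = 0
        self.misses = 0

    def get_photo_meta(self, photo_id: int) -> Optional[dict]:
        meta = self.cache.get(photo_id)
        if meta is not None:
            self.hits += 1
            return meta  # includes the image path; the client downloads from it
        self.misses += 1
        meta = self.db.get(photo_id)  # e.g. the entry was evicted from the cache
        if meta is not None:
            self.cache[photo_id] = meta  # repopulate so the next read is a hit
        return meta
```

With a real Redis deployment, cached entries would typically carry a TTL and an eviction policy, so the database fallback keeps reads correct when an entry is missing.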

Crack your next Java interview with: The Java Interview Handbook: 300+ Interview Questions

News Feed Generation

The News Feed is an important component of Instagram’s design. Since it’s an independent microservice, it can have a dedicated server (or servers) for managing the News Feed service. Let’s call it the News Feed Server.

A user’s News Feed contains the latest and highest-ranking photos from the users they follow. To build it, the service first reads the user’s entry in the UserFollow table from the cache to get the list of people the user follows. Using the followees’ UserIDs, the News Feed Server then pulls the metadata of the latest photos from each followee. These photos are ranked by attributes such as freshness, likes and comments, and a selected number of top photos is sent to the client.
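The on-demand feed build can be sketched as: look up the user’s followees, gather their photos, score each one, and keep the top N. The scoring formula below (likes plus double-weighted comments, halved every 24 hours of age) is a hypothetical stand-in for Instagram’s ranking, which the article does not specify.

```python
import heapq
import math
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Photo:
    photo_id: int
    user_id: int
    created_at: float   # Unix timestamp of the upload
    likes: int = 0
    comments: int = 0


def score(photo: Photo, now: float, half_life_hours: float = 24.0) -> float:
    """Hypothetical ranking: engagement, halved for every `half_life_hours` of age."""
    age_hours = max(0.0, (now - photo.created_at) / 3600)
    engagement = 1 + photo.likes + 2 * photo.comments   # weights are assumptions
    return engagement * math.pow(0.5, age_hours / half_life_hours)


def build_feed(
    user_id: int,
    user_follow: Dict[int, Iterable[int]],   # UserFollow: UserID -> followees
    user_photos: Dict[int, List[int]],       # UserPhoto: UserID -> PhotoIDs
    photos: Dict[int, Photo],                # Photo: PhotoID -> metadata
    limit: int = 20,
    now: Optional[float] = None,
) -> List[int]:
    """Pull-based feed: gather followees' photos, rank them, keep the top `limit`."""
    now = time.time() if now is None else now
    candidates = [
        photos[pid]
        for followee in user_follow.get(user_id, ())
        for pid in user_photos.get(followee, ())
    ]
    top = heapq.nlargest(limit, candidates, key=lambda p: score(p, now))
    return [p.photo_id for p in top]
```

`heapq.nlargest` avoids fully sorting the candidates when only the top few are needed, but the function still touches every followee’s photos on each request, which is where the latency described next comes from.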

Since the News Feed Service goes through multiple steps before it can display a feed, we can expect high latency with this approach. Yet whenever you log into your Instagram account, your News Feed appears within milliseconds.

The News Feed Service avoids this latency by generating users’ News Feeds in advance and storing them either in the same cache that holds metadata or in a dedicated News Feed cache. The pre-generated feeds can be refreshed every hour or every few minutes, depending on requirements. Whenever a user requests their News Feed, the News Feed Server simply queries the News Feed cache for the pre-generated feed and displays it on the user’s homepage.
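Pre-generation can be sketched as a small cache with two entry points: a background job that rebuilds feeds ahead of time, and a request path that simply returns the stored feed. The `generate` callback stands for any function that ranks a user’s feed. Rebuilding a feed on read when it is missing or older than the TTL is an assumption added here; the periodic job alone matches the approach described above.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class FeedCache:
    """Pre-generated News Feeds, refreshed once they are older than `ttl_seconds`."""

    def __init__(
        self,
        generate: Callable[[int], List[int]],       # UserID -> ranked PhotoIDs (hypothetical)
        ttl_seconds: float = 3600.0,                 # e.g. refresh hourly
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate
        self._ttl = ttl_seconds
        self._clock = clock
        self._feeds: Dict[int, Tuple[float, List[int]]] = {}  # UserID -> (built_at, feed)

    def precompute(self, user_ids: Iterable[int]) -> None:
        """Background job: rebuild feeds ahead of time, e.g. every few minutes."""
        for uid in user_ids:
            self._feeds[uid] = (self._clock(), self._generate(uid))

    def get_feed(self, user_id: int) -> List[int]:
        """Request path: return the stored feed, rebuilding it only if missing or stale."""
        entry = self._feeds.get(user_id)
        if entry is None or self._clock() - entry[0] > self._ttl:
            self.precompute([user_id])
            entry = self._feeds[user_id]
        return entry[1]
```

Injecting `clock` makes the refresh logic testable without waiting; a shared cache such as Redis would replace the in-process dict when several News Feed Servers are involved.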

Start your journey in machine learning with Grokking Machine Learning Design.

Conclusion

This is how to design a simple version of Instagram. We have covered the basics of any photo-sharing app, including uploading, viewing/downloading photos and creating News Feeds (as is available in Instagram). If you want to extend the design, there are several additional features to discuss.

Building on the same design, you can add more microservices, such as one that handles comments and likes on photos and another that handles private messaging between users. You can also build a microservice for Instagram Stories.

Teach kids to code and have fun at the same time — CodeMonkey!

If you enjoyed the article, kindly support us and consider following. Thank you :)

Your Comprehensive Interview Kit for Big Tech Jobs

0. Grokking the Machine Learning Interview This course helps you build the skills machine learning interviews test, and goes over some of the most commonly asked interview problems at big tech companies.

1. Grokking the System Design Interview Learn how to prepare for system design interviews and practice common system design interview questions.

2. Grokking Dynamic Programming Patterns for Coding Interviews Faster preparation for coding interviews.

3. Grokking the Advanced System Design Interview Learn system design through architectural review of real systems.

4. Grokking the Coding Interview: Patterns for Coding Questions Faster preparation for coding interviews.

5. Grokking the Object Oriented Design Interview Learn how to prepare for object oriented design interviews and practice common object oriented design interview questions.

6. Machine Learning System Design

7. System Design Course Bundle

8. Coding Interviews Bundle

9. Tech Design Bundle

10. All Courses Bundle
