Summary

The web content outlines the design of a scalable tagging service system, emphasizing data storage, write and read paths, and encourages subscriptions to newsletters for system design insights.

Abstract

The provided web content delves into the system design of a tagging service, detailing the architecture necessary to handle tagging items, viewing tagged items in near real-time, and ensuring scalability. It highlights the use of a SQL database for storing relationships between tags and items, a NoSQL data store for item metadata, and a cache server like Redis for popular tags and items. The design includes a write path that incorporates a message queue for asynchronous processing and a read path that utilizes a CDN and cache servers for efficient data retrieval. The content also promotes a system design newsletter, offering an ultimate guide to system design interviews upon subscription, and mentions the use of object storage such as AWS S3 for media files. The author, NK, encourages readers to subscribe to the newsletter and Medium using their referral link, ensuring readers that this support helps in producing valuable content without affecting their subscription cost.

Opinions

The author believes in the value of their newsletter, suggesting it as a resource for excelling in system design interviews.
The author values the community's support, expressing gratitude for subscriptions and follows, which in turn motivates them to continue creating content.
There is an emphasis on the importance of scalability and fault tolerance in system design, as evidenced by the use of rate limiting, message queues, and replication across data centers.
The author endorses using their referral link for Medium membership, indicating that it aids in the production of their content without any additional cost to the subscriber.
The use of a trie data structure for typeahead autosuggestion indicates the author's preference for efficient search query suggestions.
The inclusion of references to external resources suggests the author's commitment to providing well-researched and credible information.

Tagging Service System Design

Hashtag Service Design

You can subscribe to the system design newsletter to excel in system design interviews and software architecture. You will also receive the ultimate guide to approaching system design interviews on newsletter sign-up.

The original article was published on systemdesign.one website by the author NK. Some of the popular tagging services are the following:

JIRA tags
Confluence tags
Stackoverflow tags
Twitter hashtags

Disclaimer: Some of the linked resources are affiliates.

Requirements

Tag an item
View the items with a specific tag in near real-time
Scalable

Subscribe to my newsletter and never miss a new blog post again, as you’ll get notified via email every time I publish something. You will also receive the ultimate guide to approaching system design interviews on newsletter sign-up.

If you’re planning to subscribe to Medium using my referral link, I wanted to let you know that I will receive a portion of the membership fees as a reward for referring you. This helps me continue to produce valuable content. However, I want to assure you that this doesn’t affect your subscription cost in any way. You’ll still get the same great benefits and features as any other Medium member. Thank you for considering my referral link and supporting my work!

Data storage

Database schema

The primary entities of the database are the Tags table, the Items table, and the Tags_Items table
The Tags_Items is a join table to represent the relationship between the Items and the Tags
The relationship between the Tags and the Items tables is many-to-many

Type of data store

The media files (images, videos) and text files are stored in a managed object storage such as AWS S3
A SQL database such as Postgres stores the metadata on the relationship between tags and items
A NoSQL data store such as MongoDB stores the metadata of the item
A cache server such as Redis stores the popular tags and items

High-level design

When a new item is tagged, the metadata is stored on the SQL database
The popular tags and items are cached on dedicated cache servers to improve latency
The non-popular tags and items are fetched by querying the read replicas of SQL and NoSQL data stores

Write path

The client makes an HTTP connection to the load balancer
The load balancer delegates the client request to a web server with free capacity
The write requests to create an item or tag an item are rate limited
The write requests are stored on the message queue for asynchronous processing and improved fault tolerance
The fanout service distributes the write request to multiple services to tag an item
The object store persists the text files or media files embedded in an item
The NoSQL data store persists the metadata of an item (comments, upvotes, published date)
The SQL database persists metadata on the relationship between tags and items
The tags info service is queried to identify the popular tags
If the item was tagged with a popular tag, the item is stored on the items cache server
The tags cache server stores the IDs of items that were tagged with popular tags
LRU cache is used to evict the cache servers
The data objects (items and tags) are replicated across data centers at the web server level to save bandwidth

Read path

The client executes a DNS query to resolve the domain name
The client queries the CDN to check if the tag data is cached on the CDN
The client creates an HTTP connection to the load balancer
The load balancer delegates the client request to a web server with free capacity
The read requests to fetch the tags or items are rate limited
The web server queries the tags service to fetch the tags
The tags service queries the tags info service to identify if the requested tag is popular
The lists of tagged items for a popular tag are fetched from the tags cache server
The tags service executes an MGET Redis request to fetch the relevant tagged items from the items cache server
The list of items tagged with non-popular tags is fetched from the read replicas of the SQL database
The items tagged with non-popular tags are fetched from the read replicas of the NoSQL data store
The media files embedded in an item are fetched from the object store
The trie data structure is used for typeahead autosuggestion for search queries on tags

References

Raffi Krikorian, Timelines at Scale, infoq.com
How feed works, facebook.com

Level Up Coding

Thanks for being a part of our community! Before you go:

👏 Clap for the story and follow the author 👉
📰 View more content in the Level Up Coding publication
💰 Free coding interview course ⇒ View Course
🔔 Follow us: Twitter | LinkedIn | Newsletter

🚀👉 Join the Level Up talent collective and find an amazing job