
Summary

The web content outlines the design of a scalable tagging service, covering data storage and the write and read paths, and encourages readers to subscribe to a newsletter for system design insights.

Abstract

The provided web content delves into the system design of a tagging service, detailing the architecture necessary to handle tagging items, viewing tagged items in near real-time, and ensuring scalability. It highlights the use of a SQL database for storing relationships between tags and items, a NoSQL data store for item metadata, and a cache server like Redis for popular tags and items. The design includes a write path that incorporates a message queue for asynchronous processing and a read path that utilizes a CDN and cache servers for efficient data retrieval. The content also mentions the use of object storage such as AWS S3 for media files and promotes a system design newsletter, offering an ultimate guide to system design interviews upon subscription. The author, NK, encourages readers to subscribe to the newsletter and to Medium using their referral link, assuring readers that this support helps in producing valuable content without affecting their subscription cost.

Opinions

  • The author believes in the value of their newsletter, suggesting it as a resource for excelling in system design interviews.
  • The author values the community's support, expressing gratitude for subscriptions and follows, which in turn motivates them to continue creating content.
  • There is an emphasis on the importance of scalability and fault tolerance in system design, as evidenced by the use of rate limiting, message queues, and replication across data centers.
  • The author endorses using their referral link for Medium membership, indicating that it aids in the production of their content without any additional cost to the subscriber.
  • The use of a trie data structure for typeahead autosuggestion indicates the author's preference for efficient search query suggestions.
  • The inclusion of references to external resources suggests the author's commitment to providing well-researched and credible information.

Tagging Service System Design

Hashtag Service Design

source: unsplash.com, janbaborak

You can subscribe to the system design newsletter to excel in system design interviews and software architecture. You will also receive the ultimate guide to approaching system design interviews on newsletter sign-up.

The original article was published on the systemdesign.one website by the author NK. Some popular tagging services are the following:

  • Jira tags
  • Confluence tags
  • Stack Overflow tags
  • Twitter hashtags

Disclaimer: Some of the linked resources are affiliate links.

Requirements

  • Tag an item
  • View the items with a specific tag in near real-time
  • Scalable

Newsletter

Subscribe to my newsletter and never miss a new blog post again, as you’ll get notified via email every time I publish something. You will also receive the ultimate guide to approaching system design interviews on newsletter sign-up.

If you’re planning to subscribe to Medium using my referral link, I wanted to let you know that I will receive a portion of the membership fees as a reward for referring you. This helps me continue to produce valuable content. However, I want to assure you that this doesn’t affect your subscription cost in any way. You’ll still get the same great benefits and features as any other Medium member. Thank you for considering my referral link and supporting my work!

Data storage

Database schema

Tagging service; Database schema
  • The primary entities of the database are the Tags table, the Items table, and the Tags_Items table
  • The Tags_Items is a join table to represent the relationship between the Items and the Tags
  • The relationship between the Tags and the Items tables is many-to-many
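
The schema above can be sketched in a few lines. This is a minimal illustration, not the author's actual schema: an in-memory SQLite database stands in for Postgres, and every table name, column name, and helper function (`tag_item`, `items_with_tag`) is an assumption made for the example.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; all names below are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE tags (
    tag_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
-- Join table: one row per (tag, item) pair models the many-to-many relationship.
CREATE TABLE tags_items (
    tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
    item_id INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (tag_id, item_id)
);
""")


def tag_item(conn, item_id, tag_name):
    """Attach a tag to an item, creating the tag on first use."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    (tag_id,) = conn.execute(
        "SELECT tag_id FROM tags WHERE name = ?", (tag_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO tags_items (tag_id, item_id) VALUES (?, ?)",
        (tag_id, item_id),
    )


def items_with_tag(conn, tag_name):
    """Return the titles of all items carrying the tag, ordered by item ID."""
    rows = conn.execute(
        """SELECT i.title FROM items i
           JOIN tags_items ti ON ti.item_id = i.item_id
           JOIN tags t ON t.tag_id = ti.tag_id
           WHERE t.name = ?
           ORDER BY i.item_id""",
        (tag_name,),
    )
    return [title for (title,) in rows]


conn.executemany(
    "INSERT INTO items (item_id, title) VALUES (?, ?)",
    [(1, "Design a URL shortener"), (2, "Design a chat app")],
)
tag_item(conn, 1, "system-design")
tag_item(conn, 2, "system-design")
tag_item(conn, 2, "websocket")
```

The composite primary key on the join table prevents the same tag from being attached to an item twice, which keeps repeated tagging requests idempotent in this sketch.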

Type of data store

  • The media files (images, videos) and text files are stored in a managed object store such as AWS S3
  • A SQL database such as Postgres stores the relationships between tags and items
  • A NoSQL data store such as MongoDB stores the metadata of the item
  • A cache server such as Redis stores the popular tags and items

High-level design

  1. When a new item is tagged, the metadata is stored in the SQL database
  2. The popular tags and items are cached on dedicated cache servers to reduce latency
  3. The non-popular tags and items are fetched by querying the read replicas of SQL and NoSQL data stores
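
The routing in the three steps above might look roughly like the following. This is a sketch under stated assumptions: plain dictionaries stand in for the cache servers and the read replicas, and the names `POPULAR_TAGS`, `TAGS_CACHE`, `ITEMS_CACHE`, `SQL_REPLICA`, `NOSQL_REPLICA`, and `fetch_items` are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the stores named in the design.
POPULAR_TAGS = {"python"}                          # tags considered popular
TAGS_CACHE = {"python": [1, 2]}                    # tag -> item IDs (cache server)
ITEMS_CACHE = {1: {"title": "Intro"}, 2: {"title": "Asyncio"}}  # item ID -> item
SQL_REPLICA = {"python": [1, 2], "cobol": [3]}     # tag -> item IDs (read replica)
NOSQL_REPLICA = {                                  # item ID -> item (read replica)
    1: {"title": "Intro"},
    2: {"title": "Asyncio"},
    3: {"title": "Legacy"},
}


def fetch_items(tag):
    """Return (source, items) for a tag, preferring the caches for popular tags."""
    if tag in POPULAR_TAGS and tag in TAGS_CACHE:
        item_ids = TAGS_CACHE[tag]
        items = [ITEMS_CACHE[i] for i in item_ids if i in ITEMS_CACHE]
        return "cache", items
    # Non-popular (or uncached) tags fall back to the read replicas.
    item_ids = SQL_REPLICA.get(tag, [])
    items = [NOSQL_REPLICA[i] for i in item_ids if i in NOSQL_REPLICA]
    return "replica", items
```

For example, `fetch_items("python")` is served from the caches, while `fetch_items("cobol")` goes to the replicas.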

Write path

Tagging service; Write path
  1. The client makes an HTTP connection to the load balancer
  2. The load balancer delegates the client request to a web server with free capacity
  3. The write requests to create an item or tag an item are rate limited
  4. The write requests are placed on the message queue for asynchronous processing and improved fault tolerance
  5. The fanout service distributes the write request to multiple services to tag an item
  6. The object store persists the text files or media files embedded in an item
  7. The NoSQL data store persists the metadata of an item (comments, upvotes, published date)
  8. The SQL database persists metadata on the relationship between tags and items
  9. The tags info service is queried to identify the popular tags
  10. If the item was tagged with a popular tag, the item is stored on the items cache server
  11. The tags cache server stores the IDs of items that were tagged with popular tags
  12. The cache servers use a least recently used (LRU) policy to evict entries
  13. The data objects (items and tags) are replicated across data centers at the web server level to save bandwidth
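
The queue, fanout, and LRU eviction steps above can be illustrated with a small single-process sketch. It is only an approximation of the idea: `queue.Queue` stands in for the message queue, dictionaries and a set stand in for the object store, NoSQL store, and SQL database, and `LRUCache`, `fan_out`, and `drain` are hypothetical names. Rate limiting and cross-data-center replication are not shown.

```python
import queue
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache standing in for a cache server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry


# Hypothetical in-memory stand-ins for the stores in the write path.
object_store, item_metadata, tag_relations = {}, {}, set()
popular_tags = {"python"}                   # what a tags info service might report
items_cache = LRUCache(capacity=2)
tags_cache = LRUCache(capacity=2)
write_queue = queue.Queue()


def fan_out(request):
    """Distribute one write request to every store that needs it."""
    item_id = request["item_id"]
    object_store[item_id] = request["body"]
    item_metadata[item_id] = request["metadata"]
    for tag in request["tags"]:
        tag_relations.add((tag, item_id))
        if tag in popular_tags:
            items_cache.put(item_id, request["metadata"])
            ids = tags_cache.get(tag) or []
            if item_id not in ids:
                tags_cache.put(tag, ids + [item_id])


def drain(q):
    """Process queued write requests until the queue is empty."""
    while not q.empty():
        fan_out(q.get())
        q.task_done()


write_queue.put({
    "item_id": 1,
    "body": "hello",
    "metadata": {"upvotes": 0},
    "tags": ["python", "cobol"],
})
drain(write_queue)
```

After draining, item 1 is persisted in all three stores, but only the popular tag `python` appears in the tags cache; the relationship for `cobol` lives only in the stand-in SQL set.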

Read path

Tagging service; Read path
  1. The client executes a DNS query to resolve the domain name
  2. The client queries the CDN to check whether the tag data is cached there
  3. The client creates an HTTP connection to the load balancer
  4. The load balancer delegates the client request to a web server with free capacity
  5. The read requests to fetch the tags or items are rate limited
  6. The web server queries the tags service to fetch the tags
  7. The tags service queries the tags info service to identify if the requested tag is popular
  8. The list of items tagged with a popular tag is fetched from the tags cache server
  9. The tags service executes an MGET Redis request to fetch the relevant tagged items from the items cache server
  10. The list of items tagged with non-popular tags is fetched from the read replicas of the SQL database
  11. The items tagged with non-popular tags are fetched from the read replicas of the NoSQL data store
  12. The media files embedded in an item are fetched from the object store
  13. The trie data structure is used for typeahead autosuggestion for search queries on tags
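
The typeahead step can be sketched with a basic trie. This is a minimal illustration rather than a production design: the `TrieNode` and `TagTrie` classes, the `suggest` method, and the `limit` parameter are hypothetical, and suggestions are returned in alphabetical order rather than ranked by popularity.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_tag = False   # True if a tag ends at this node


class TagTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tag):
        """Add a tag to the trie one character at a time."""
        node = self.root
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
        node.is_tag = True

    def suggest(self, prefix, limit=5):
        """Return up to `limit` tags starting with `prefix`, alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, word = stack.pop()
            if current.is_tag:
                results.append(word)
            # Push children in reverse order so the smallest character pops first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], word + ch))
        return results


trie = TagTrie()
for tag in ["python", "pytorch", "pytest", "java"]:
    trie.insert(tag)
```

With these tags inserted, `trie.suggest("py")` returns `["pytest", "python", "pytorch"]`. A lookup costs time proportional to the prefix length plus the size of the subtree visited, independent of the total number of tags stored.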
