avatarAman Ranjan Verma

Summary

The website content discusses three primary schemas used in data warehousing: Star Schema, Snowflake Schema, and Fact Constellation Schema, detailing their characteristics, use cases, and relationships with fact and dimension tables.

Abstract

The article provides an overview of the dimensional modeling techniques used in data warehousing, focusing on the Star Schema, Snowflake Schema, and Fact Constellation Schema. The Star Schema is characterized by its simplicity, with one dimension table linked directly to a fact table, facilitating straightforward queries and faster response times. The Snowflake Schema, on the other hand, normalizes dimension tables to reduce data redundancy, which is beneficial when dimension tables grow significantly. Lastly, the Fact Constellation Schema, also known as the Galaxy Schema, involves multiple fact tables sharing dimension tables, offering flexibility for complex analytical queries. The article concludes by emphasizing the importance of selecting a schema based on specific use cases and future requirements, suggesting that organizations often progress from the Star Schema to the Snowflake Schema, and then to the Fact Constellation Schema as their needs evolve.

Opinions

  • The Star Schema is considered the most widely used schema in the industry due to its simplicity and fast query performance.
  • The Snowflake Schema is preferred when a dimension table's growth rate necessitates normalization to avoid redundancy.
  • The Galaxy Schema, or Fact Constellation Schema, is an advanced model that supports scenarios with multiple facts, providing a scalable solution for complex data warehouse needs.
  • The choice of schema is influenced by the specific needs of the business and the potential impact of each schema's limitations on those needs.
  • Organizations typically start with simpler schemas like the Star Schema and move towards more complex ones like the Fact Constellation Schema as their data warehousing requirements become more sophisticated.

Data Warehousing Schemas

Example: Star Schema, Snowflake Schema, Fact Constellation Schema

Much like an OLTP system(database), an OLAP system(data warehouse) is also required to maintain a schema. A database uses an entity relational model, while a data warehouse uses a dimensional model. In this blog, we will see the three major schemas that are used to model a data warehouse.

About Star Schema

  • Each dimension is represented by only one dimension table
  • The dimension tables are directly linked with the fact table
  • Queries are not that complex because of straightforward joins.
  • Simple queries result in the faster query response time
  • Most widely used in the industry
Example: Star Schema

As in this example, we have one fact(Sales) and four dimensions (Product, Time, Customer, Employee). The fact has measures like quantity, amount, discount, and payment.

About Snowflake Schema

  • The dimension table is split into sub-dimension tables
  • Some dimensions are normalized to avoid redundancy of data
  • Used when a dimension table starts growing at a higher rate
Example: Snowflake Schema

As in this example, the product and employee dimensions have been further divided to hold more data. Businesses are not interested in the employee contact details, so we can segregate them to a different table. Similarly, the product category dimension might be of interest to the inventory management team, so we can keep this information aside. They are no fixed rules, it depends on what is the use case.

About Galaxy Schema

  • It is also called Fact Constellation Schema
  • It has more than one fact
  • It can be an extension of either star schema or snowflake schema
  • The same dimension table can be shared between more than one fact table
Example: Fact Constellation Schema

As in this example, we have two fact tables(Sales and Shipping). Shipping Fact has 3 dimensions(Product, Employee, and Shipping_address). 2/3 dimensions are shared between the two facts.

Conclusion

We saw three different data warehouse modeling techniques, the star schema, the snowflake schema, and the fact constellation schema. Each one has its pros and cons. Your use case will let you know how much the cons of one schema impact your need. You will have to keep future requirements in mind before choosing. People start with Star and then they slowly move to the snowflake schema and then to the fact constellation schema.

If you are looking to prepare for a Data Engineering interview do check out my interview blog series:

Data Engineering
Data Warehouse
Database
Sql
Data Science
Recommended from ReadMedium