Data Warehousing Schemas
Example: Star Schema, Snowflake Schema, Fact Constellation Schema
Much like an OLTP system(database), an OLAP system(data warehouse) is also required to maintain a schema. A database uses an entity relational model, while a data warehouse uses a dimensional model. In this blog, we will see the three major schemas that are used to model a data warehouse.

About Star Schema
- Each dimension is represented by only one dimension table
- The dimension tables are directly linked with the fact table
- Queries are not that complex because of straightforward joins.
- Simple queries result in the faster query response time
- Most widely used in the industry

As in this example, we have one fact(Sales) and four dimensions (Product, Time, Customer, Employee). The fact has measures like quantity, amount, discount, and payment.
About Snowflake Schema
- The dimension table is split into sub-dimension tables
- Some dimensions are normalized to avoid redundancy of data
- Used when a dimension table starts growing at a higher rate

As in this example, the product and employee dimensions have been further divided to hold more data. Businesses are not interested in the employee contact details, so we can segregate them to a different table. Similarly, the product category dimension might be of interest to the inventory management team, so we can keep this information aside. They are no fixed rules, it depends on what is the use case.
About Galaxy Schema
- It is also called Fact Constellation Schema
- It has more than one fact
- It can be an extension of either star schema or snowflake schema
- The same dimension table can be shared between more than one fact table

As in this example, we have two fact tables(Sales and Shipping). Shipping Fact has 3 dimensions(Product, Employee, and Shipping_address). 2/3 dimensions are shared between the two facts.
Conclusion
We saw three different data warehouse modeling techniques, the star schema, the snowflake schema, and the fact constellation schema. Each one has its pros and cons. Your use case will let you know how much the cons of one schema impact your need. You will have to keep future requirements in mind before choosing. People start with Star and then they slowly move to the snowflake schema and then to the fact constellation schema.

If you are looking to prepare for a Data Engineering interview do check out my interview blog series:






