skip to content
Ben Lau statistics . machine learning . programming . optimization . research

Database Modeling

2 min read Updated:

Star schema

  • It is a data model for data warehouse
  • Star schema makes things modular, you can update smaller tables when things change and the changes get propagated.
  • Fact vs. Dimension Tables Explained - Fact tables capture the quantitative essence of business events – sales, clicks, shipments. Dimension tables provide the crucial context – who, what, where, when. Together, they transform data from a source of frustration into a navigable landscape of business intelligence.
    • You can use different types of dimension tables within your data warehouse: Conformed Dimensions, Role-Playing Dimensions, and Slowly Changing Dimensions (SCD)
      • Type 1 SCD overwrite old data, while type 2 SCD append data, with additional colummns which are start_date, end_date, and is_current
      • A role-playing dimension is a dimension that can filter related facts differently.
  • A fact table contains the quantitative data or facts of a business process, such as sales or revenue. The fact table typically consists of numerical values and foreign keys that link to dimension tables.
  • Dimension tables provide context or descriptive information about the data in the fact table. They contain attributes or characteristics that help to categorize or classify the data, such as product, customer, location, and time. Dimension tables are usually smaller than fact tables and contain unique values for each attribute.
  • Resources

Star schema vs normalized data

  • Star schema: Faster queries, simpler to understand, great for analysis and basic reporting, but can be redundant and inflexible.
  • Normalized data: Less storage, better data integrity, but complex queries and less analysis-friendly.

Medallion Architecture

  • Three-Tier data
    • Bronze: raw data as-is from source systems, table like structure with some relevant metadata. It could be a data lake, or it could be right after data lake, which data lake contains raw json, csv and excel files instead of tables
    • Silver: cleansed and conformed data, dimension and fact tables (dim, fct) from star schema could be applied
    • Gold: curated business-level tables ready for business consumption , report table (rpt) from star schema could be applied
    • Resources