Introduction to Star Schema Design — datamodelassistant.com

Star schema is the foundation of Kimball dimensional modeling. It organizes data into a central fact table surrounded by dimension tables — like a star with points radiating outward.

Why Star Schema?

Analytical queries need fast, intuitive access to business metrics. Normalized OLTP schemas optimize for write performance, but force complex multi-join queries for reporting. Star schema inverts this trade-off: denormalized dimensions make reads fast and queries readable.

Anatomy of a Star Schema

Every star schema has two building blocks:

Fact table — stores measurable events (sales, clicks, shipments). Each row is one event. Columns are foreign keys to dimensions plus numeric measures.
Dimension tables — store descriptive context (who, what, where, when). Each row is one entity. Columns are attributes used for filtering, grouping, and labeling.

A Practical Example: Sales

Consider a retail sales database. The source system has normalized tables for orders, line items, products, customers, stores, and dates.

A Kimball star schema for this domain looks like:

-- Fact table: one row per line item sold
CREATE TABLE fact_sales (
  sale_key        BIGINT PRIMARY KEY,
  date_key        INT REFERENCES dim_date(date_key),
  product_key     INT REFERENCES dim_product(product_key),
  customer_key    INT REFERENCES dim_customer(customer_key),
  store_key       INT REFERENCES dim_store(store_key),
  quantity_sold   INT NOT NULL,
  unit_price      DECIMAL(10,2) NOT NULL,
  total_amount    DECIMAL(12,2) NOT NULL
);

-- Dimension: date
CREATE TABLE dim_date (
  date_key        INT PRIMARY KEY,
  full_date       DATE NOT NULL,
  year            INT,
  quarter         INT,
  month           INT,
  day_of_week     VARCHAR(10)
);

-- Dimension: product
CREATE TABLE dim_product (
  product_key     INT PRIMARY KEY,
  product_name    VARCHAR(200),
  category        VARCHAR(100),
  subcategory     VARCHAR(100),
  brand           VARCHAR(100)
);

The fact table references each dimension via surrogate keys. Queries join the fact to whichever dimensions they need — no multi-hop joins through normalized lookup tables.

Querying the Star

With the star schema in place, analytical queries are simple:

SELECT
  d.year,
  p.category,
  SUM(f.total_amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, p.category
ORDER BY d.year, revenue DESC;

Two joins, one GROUP BY, and you have revenue by year and product category. Try writing that against a normalized OLTP schema — you would need five or six joins.

Design Guidelines

One fact table per business process — sales, inventory, returns are separate fact tables
Conformed dimensions — dim_date and dim_customer are shared across fact tables
Surrogate keys on every dimension — never use natural keys as the primary key
Avoid snowflaking — keep dimensions flat; do not normalize dimension attributes into sub-tables
Grain first — define the grain (one row = one line item) before adding any columns

Next Steps

Star schema is the starting point. Once you understand facts and dimensions, you can explore slowly changing dimensions (SCD Type 2), bridge tables for many-to-many relationships, and aggregated fact tables for performance optimization.