Star schema is the foundation of Kimball dimensional modeling. It organizes data into a central fact table surrounded by dimension tables — like a star with points radiating outward.
Why Star Schema?
Analytical queries need fast, intuitive access to business metrics. Normalized OLTP schemas optimize for write performance, but force complex multi-join queries for reporting. Star schema inverts this trade-off: denormalized dimensions make reads fast and queries readable.
Anatomy of a Star Schema
Every star schema has two building blocks:
- Fact table — stores measurable events (sales, clicks, shipments). Each row is one event. Columns are foreign keys to dimensions plus numeric measures.
- Dimension tables — store descriptive context (who, what, where, when). Each row is one entity. Columns are attributes used for filtering, grouping, and labeling.
A Practical Example: Sales
Consider a retail sales database. The source system has normalized tables for orders, line items, products, customers, stores, and dates.
A Kimball star schema for this domain looks like:
-- Fact table: one row per line item sold
CREATE TABLE fact_sales (
sale_key BIGINT PRIMARY KEY,
date_key INT REFERENCES dim_date(date_key),
product_key INT REFERENCES dim_product(product_key),
customer_key INT REFERENCES dim_customer(customer_key),
store_key INT REFERENCES dim_store(store_key),
quantity_sold INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
total_amount DECIMAL(12,2) NOT NULL
);
-- Dimension: date
CREATE TABLE dim_date (
date_key INT PRIMARY KEY,
full_date DATE NOT NULL,
year INT,
quarter INT,
month INT,
day_of_week VARCHAR(10)
);
-- Dimension: product
CREATE TABLE dim_product (
product_key INT PRIMARY KEY,
product_name VARCHAR(200),
category VARCHAR(100),
subcategory VARCHAR(100),
brand VARCHAR(100)
);The fact table references each dimension via surrogate keys. Queries join the fact to whichever dimensions they need — no multi-hop joins through normalized lookup tables.
Querying the Star
With the star schema in place, analytical queries are simple:
SELECT
d.year,
p.category,
SUM(f.total_amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, p.category
ORDER BY d.year, revenue DESC;Two joins, one GROUP BY, and you have revenue by year and product category. Try writing that against a normalized OLTP schema — you would need five or six joins.
Design Guidelines
- One fact table per business process — sales, inventory, returns are separate fact tables
- Conformed dimensions — dim_date and dim_customer are shared across fact tables
- Surrogate keys on every dimension — never use natural keys as the primary key
- Avoid snowflaking — keep dimensions flat; do not normalize dimension attributes into sub-tables
- Grain first — define the grain (one row = one line item) before adding any columns
Next Steps
Star schema is the starting point. Once you understand facts and dimensions, you can explore slowly changing dimensions (SCD Type 2), bridge tables for many-to-many relationships, and aggregated fact tables for performance optimization.