Modeling — Slowly Changing Dimension

Sai Teja
3 min readDec 17, 2023

--

src:https://www.expedient.nl/wp-content/uploads/2020/05/Data-Modeling.png

What is SCD? Why consider SCD when modeling?

Modeling is simple when data does not change over time, but what if your items change over time? That’s where Slowly Changing Dimensions (SCD) step in. Consider a retail scenario: your customers’ details, like home address, phone number, or email, might change with time. SCD helps us decide how to handle these changes.

In one approach, you can overwrite the data (Type-1 SCD) — whenever something changes, just update the existing information. But what if you want a record of how things were for the analytical purposes? That’s where Type-2 SCD comes in. It’s like keeping different versions of your customer’s details, showing when each change started and ended with date columns(start date and end date), like flipping through an address book with historical entries.

Why SCD Type-2 is most preferred in Industry?

Type-2 is best used in industry because, in most cases maintaining a detailed history of changes is crucial. Imagine you’re in a scenario, like retail or finance, where understanding the evolution of customer data over time is essential. Type-2 SCD shines here by creating a historical log of these changes. Each modification results in a new record with a start and end date, providing a comprehensive timeline. This proves invaluable for analytics, allowing businesses to track how customer information evolves and make informed decisions based on a rich historical context.

Types of SCD:

Type-1: Overwrite

In Type-1, when a change occurs, the existing record is simply updated with the new value. This method doesn’t maintain historical data, so it’s suitable for situations where historical changes are not important or can be discarded.

Example:

UPDATE dimension_table
SET attribute_column = 'new_value'
WHERE key_column = 'some_key';

Type-2:

In Type-2, a new record is added to the table whenever there is a change. This way, historical data is preserved, and each record has a start and end date to indicate its validity period. A surrogate key is used to uniquely identify each record.

INSERT INTO dimension_table (key_column, attribute_column, start_date, end_date)
VALUES ('new_key', 'new_value', 'start_date', 'end_date');

Type-3:

In Type 3, we add new columns to the existing dimension table to store both the current and previous value of an attribute.
This method is more space-efficient than Type 2 but doesn’t maintain a complete historical record.

UPDATE dimension_table
SET previous_value_column = current_value_column,
current_value_column = 'new_value'
WHERE key_column = 'some_key';

Type-4:

In Type 4,we maintain records in two distinct tables: a current record table and a historical record table. The current table holds active records at any given moment, while the historical table stores previous versions of those records. For instance, while tracking information, the current table would contain active records, and the historical table would retain past records, enabling a detailed history of changes over time. This approach proves valuable for scenarios where frequent changes to records, such as evolving customer orders, require both real-time tracking and a comprehensive historical perspective.

How to chose which type of SCD is better?

Choosing the appropriate type of SCD involves considering various factors. The decision mainly depends on the specific business requirements and characteristics of the data being managed.

If historical changes (past history) are not crucial and only the latest information is needed, Type 1 SCD is a one and only choice. Overwriting records with new values, requires less storage.

On the other hand, if maintaining a history of data change is needed for analytical purposes, Type 2 SCD is more appropriate. Adding new records for each change with date periods, preserving historical data, while complex to implement and potentially requiring more storage, is valuable when historical analysis is a priority.

Considerations for choosing between these methods also include storage capacity, query and reporting needs, and the overall requirements of your business. Often, a mix of SCD types is used in the same data warehouse. Different dimensions may employ different strategies based on their unique characteristics and the specific needs of the business.

The aim is to find the right balance between simplicity, efficient storage, and the level of historical detail required for effective analysis and decision-making.

Determining whether to opt for type 1, 2, 3, or 4 is really a matter of preference and what makes sense to your business.

--

--

Sai Teja
Sai Teja

Written by Sai Teja

Data || Machine Learning || Fitness

No responses yet