Published on

Dimensional Modeling - Part 2: Basic Dimension Table Techniques

Authors
Article Cover
Table of Contents

1. Degenerate Dimension

A degenerate dimension is a concept within data warehousing and database design. It refers to a dimension that consists of a single attribute, typically a key or identifier, found in a fact table. This type of dimension is commonly encountered in scenarios involving transactional or accumulating snapshot fact tables.

For instance, consider a fact table that records sales transactions. In this context, the TransactionId can be regarded as a degenerate dimension. Unlike conventional dimensions, degenerate dimensions lack descriptive information or hierarchical structure.

Furthermore, they don't find a place in any separate dimension table as they are directly associated with the fact records themselves.

This streamlined approach can help reduce join complexity and storage overhead in data warehousing environments.

Fig 1. Illustration for Degenerated Dimension

Fig 1. Illustration for Degenerated Dimension

2. Conformed Dimension

A conformed dimension is descriptive master reference data that’s referenced in multiple dimensional models. Conformed dimensions are a fundamental element of the Kimball approach. Conformed dimensions allow DW/BI users to consistently slice-and-dice performance metrics from multiple business process data sources. Conformed dimensions allow data from different sources to be integrated based on common, unified dimension attributes. Finally, conformed dimensions allow a dimension table to be built and maintained once rather than recreating slightly different versions during each development cycle.

Reference: Design tip 135 conformed dimensions as the foundation for agile data warehousing

In a more specific scenario, ProductId and DateID are found in the Fact Sales and Fact Inventory tables and these identifiers establish essential links to the Dim Product and Dim Date dimensions, respectively.

This linkage designates the Dim Product and Dim Date dimensions as conformed dimensions within this particular context. This underscores the vital role of conformed dimensions, which play a pivotal role in harmonizing data across analyses, ensuring data uniformity and precise insights from various perspectives.

Moreover, calendar-date dimension extends beyond its role as a prime example of a conformed dimension and multiple hierarchies in dimensions - many dimensions inherently encompass more than one natural hierarchy. For instance, the calendar-date dimensions can encompass both a day-to-week-to-fiscal period hierarchy and a day-to-month-to-year hierarchy.

Fig 2. Illustration for Conformed Dimension

Fig 2. Illustration for Conformed Dimension

3. Role-Playing Dimension

Fig 3. Illustration for Role-Playing Dimensions

Fig 3. Illustration for Role-Playing Dimensions

A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension

Reference: Role-playing Dimension

For example, a fact table can have several dates, each of which is represented by a foreign key to the date dimension.  It is essential that each foreign key refers to a separate view of the date dimension so that the references are independent. These separate dimension views (with unique attribute column names) are called roles.

Role-playing dimensions are most commonly associated with accumulating snapshot fact tables due to their time-dependent nature, they can also be applied to transaction and periodic fact tables. The key is to determine whether analyzing data from different viewpoints within the same table provides meaningful insights and enhances your analytical capabilities.

For example, in an order fulfillment process, using a Date dimension with roles like placed date, paid date, ship date, and received date can provide insights into the different stages of order processing over time.

4. Junk Dimension

Transactional business processes typically produce a number of miscellaneous, low-cardinality* flags and indicators. Rather than making separate dimensions for each flag and attribute, you can create a single junk dimension combining them together. This dimension, frequently labeled as a transaction profile dimension in a schema, does not need to be the Cartesian product of all the attributes’ possible values, but should only contain the combination of values that actually occur in the source data.

Reference: Junk Dimension

*Low-cardinality implies that the attribute has a small number of possible values or options. This is in contrast to high-cardinality, where an attribute has a large number of distinct values.

Fig 4. Illustration for Junk Dimensions

Fig 4. Illustration for Junk Dimensions

When utilizing the cross-join technique, the SQL query generates what is known as a junk dimension key. This key is created by generating all possible combinations of rows, resulting in an n x m matrix, where n represents the number of distinct values in one dimension (e.g., dim_product), and m represents the number of distinct values in another dimension (e.g., dim_promo). The resulting key is a unique identifier for each combination of attributes.

This key is crafted by merging specific attributes extracted from both the dim_product and dim_promo tables. The purpose of this key is to act as a reference point within the fact table, allowing data analysts to easily associate and analyze related attributes. By consolidating these combinations into a single junk dimension, the data model becomes more streamlined, efficient, and conducive to insightful analysis.

5. Outrigger Dimension

Outrigger dimensions are similar to snowflake dimensions, but with a narrower focus. It become useful when we need to dig deep into specific columns for detailed data analysis. These dimensions involve tables or entities within a hierarchy that are connected to the fact table but not directly.

Outrigger dimensions come into play when a dimension table or entity has a relationship with another dimension. Their main role is to offer extra, unrelated information that enhances the overall data. The connection is made by linking the primary key of an outrigger with the foreign key of a dimension table or entity, creating a bridge between the two.

Fig 5. Illustration for Outrigger Dimension

Fig 5. Illustration for Outrigger Dimension

For example, consider a scenario where a DimPromo needs to reference a separate dimension that represents the PromoDateID - start date of the promotion. These secondary dimension references are called outrigger dimensions.

6. Slowly Changing Dimension

6.0. Type 0: Retain original

Retains the original data without tracking historical changes. This method is straightforward but lacks historical insight.

6.1. Type 1: Overwrite

In a Type 1 Slowly Changing Dimension (SCD), changes to attributes result in the direct replacement of old data with new information. Historical data is not preserved, and only the most current data is maintained. This approach is suitable for scenarios where historical tracking isn't critical, and the focus is on real-time updates. While it ensures simplicity, it might not be suitable for situations requiring historical analysis.

Fig 6. SCD Type 1

Fig 6. SCD Type 1

6.2. Type 2: Add new row

Type 2 SCD involves adding new rows to the dimension table whenever a change occurs. Each new row represents a different version of the data, preserving the historical record.

A surrogate key is typically assigned to each row for identification. While this approach captures historical data comprehensively, it can increase storage requirements and may lead to more complex querying due to multiple versions of the same entity.

Fig 7. SCD Type 2

Fig 7. SCD Type 2

6.3. Type 3: Add new attribute

Type 3 SCD maintains the original structure of Type 1 and Type 2 dimensions but introduces additional columns to record changes. This approach allows you to track specific changes, such as the most recent historical value.

For example, a product dimension might have an extra column to store the previous price. This method is simpler to manage than Type 2 but provides a limited scope of historical data.

Fig 8. SCD Type 1

Fig 8. SCD Type 3

6.4. Type 4: Add mini-dimension

In a Type 4 SCD, a separate table, often referred to as a mini-dimension or history table is created to store historical data. The main dimension table retains only the current data, while the mini-dimension maintains the historical information. Fact tables reference both keys for queries, allowing for efficient analysis of historical changes. This method optimizes storage and provides a good balance between historical tracking and query performance.

Fig 9. SCD Type 4

Fig 9. SCD Type 4

6.5. Type 5: Add mini-dimension and Type 1 outrigger

Consider a Product dimension with Type 4 historical tracking. With Type 5, a "current profile" key is added directly within the Product dimension.

This key represents the latest version of the product profile. Changes to the product's attributes update both the historical records in the Type 4 dimension and the "current profile" key in the base dimension. This approach combines elements of both Type 4 and Type 1 methodologies.

6.6. Type 6: Add Type 1 attributes to Type 2 dimension

Type 6 introduces an interesting twist by embedding a Type 1 attribute within a Type 2 dimension. In this scenario, an attribute that would typically follow the Type 1 strategy is added to a Type 2 dimension table. This attribute behaves as an alternate value of a normal Type 2 attribute.

Fig 10. SCD Type 6

Fig 10. SCD Type 6

For example, let's say you have a Type 2 dimension for product price. In a Type 6 scenario, an alternate attribute such as current price and previous price is added. Whenever the current price attribute changes, it overwrites the existing value in a Type 1 manner. This hybrid approach allows certain attributes to be updated quickly while maintaining historical tracking for the main Type 2 attributes.

6.7. Type 7: Dual Type 1 and Type 2 dimensions

Another approach involves placing both the surrogate and the natural key directly into the fact table. This setup provides users with flexibility in selecting the right dimension records based on various factors:

  1. Primary Effective Date: The date when the fact record becomes effective.
  2. Most Recent Information: Access to the latest or current data.
  3. Other Associated Dates: Selection based on any other relevant dates linked to the fact record.
Fig 11. SCD Type 7

Fig 11. SCD Type 7

For instance, consider a situation where you're monitoring Product Price . In this case, a Type 2 product dimension would handle historical changes over time. To enhance this approach, a Type 7 fact table could link to the product dimension using two keys: one key for tracking historical changes and another known as a durable key which remains constant even when attributes change. This durable key ensures consistent reference to the product, unaffected by attribute modifications.