Published on

Data Modeling

Authors
Article Cover
Table of Contents

Data Modeling

Data modeling is the process of creating a conceptual representation of data and its relationships, which helps us understand how data is stored, accessed, updated, and queried within the database. The output of that process is a formal description of the data that has been modeled, which is called a data model. A data model can be compared to a roadmap, an architect's blueprint, or any formal diagram that facilitates a deeper understanding of what is being designed IBM Data Modeling

  • A conceptual data model consists of highly abstract levels of data elements called entities and relationships between entities. It presents a high-level overview of the database system through a visual representation of the entities it contains and their relationships to one another. The conceptual model provides the basis for the logical data model.
  • A logical data model builds on the conceptual model by providing a more detailed overview of the entities in their respective relationships. It identifies the attributes of each entity, defines the primary keys, and specifies the foreign keys. The logical data model serves as an intermediary between the high-level conceptual model and the physical implementation. It describes how the data will be organized in the database and provides a blueprint for building the physical database.
  • A physical data model is used to create the schema of the database, which is then implemented in the database management system. The physical data model defines the structure of the database, including the data types, constraints, and attributes. It specifies how the data will be stored in the specific database management system being used.

In conclusion, a conceptual data model is a high-level view of the database that is used to communicate with stakeholders about the requirements of the database. The conceptual data model can then be used to create a logical data model, which is a more detailed view of the database that includes the relationships between entities, and the constraints and rules that apply to the data. Finally, the logical data model can be used to create a physical data model, which is the actual database schema that is implemented in a database management system.

Data Warehouse Model (OLTP vs OLAP)

Understanding the difference between OLTP and OLAP models is crucial for effective data warehouse design.

CRITERIAOLTP DATABASEDATA WAREHOUSE
PurposeExecute business processEvaluate business process
Transaction TypeInsert, update, delete, selectSelect
Transaction stylePredefinedUnpredictable
Optimized forEfficiency and consistencyQuery performance/ease of use
Update FrequencyReal Time – business eventPeriodic – scheduled – near rt
Update concurrencyHighLow
Historical data accessRecentYears of history
Selection criteriaPrecise, narrowFuzzy, broad
ComparisonInfrequentFrequent
Query complexityLowHigh
JoinsFewMany
Transaction per dayMillionsThousands
Data volumesTeraPeta
DataRaw – detailed dataDetailed and aggregated data
Design techniqueER – 3rd Normal Form 3NFDimensional/other

From Logical Data Model to Physical Data Model

The transition from a Logical Data Model (LDM) to a Physical Data Model (PDM) is a critical step in database implementation.

Physical Table Structure Should Include:

  • Technical metadata (job_id, insert/update data, etc)
  • Technical columns for managing the data quality
  • Primary and/or secondary indexes of the table to improve performance
  • Table partitioning functions
  • Adding surrogate key to simplify join, improve performance, optimize space

References

  • Lecture notes from Business Analytics course, Prof. Paolo Menna, University of Verona 2024-2025
  • IBM. (n.d.). What is data modeling?