— series
Database & Data Engineering
11 articles
- 01
Relational Database Management System
This article provides an overview of Relational Database Management Systems (RDBMS), covering key features such as data representation through tables, transaction principles (Atomicity, Consistency, Isolation, Durability), and concepts like primary and foreign keys. It highlights the importance of data integrity, security, normalization, and referential integrity in maintaining valid relationships between tables for effective data management.
Read article → - 02
ER Modeling and Normalization
This article covers Entity-Relationship (E-R) Modeling, which visually depicts the relationships between identifiable entities in a database. It discusses key concepts such as entities and their attributes, relationships and cardinality, and the importance of normalization (1NF, 2NF, 3NF) to optimize data structure and reduce redundancy for improved data integrity and efficiency.
Read article → - 03
Data Modeling
This article explores the importance of Data Modeling as a foundational blueprint for organizing information within a business, aiding in the development of a data warehouse. It emphasizes the role of a Logical Data Model (LDM) in establishing frameworks for business intelligence and analytics, ensuring data consistency, quality, and effective communication. The article also contrasts Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) models and summarizes the transition from a Logical to a Physical Data Model (PDM) for enhanced database performance.
Read article → - 04
Dimensional Modeling - Part 0: 4-Step Design Process
This article outlines a structured approach to Data Design, detailing essential steps for creating effective data models that align with business needs. It covers gathering business requirements, conducting collaborative workshops, and following the Four-Step Dimensional Design Process. The article also features a use case for modeling data in a Fast Food outlet and discusses managing changes in dimension data using Slowly Changing Dimensions (SCDs).
Read article → - 05
Dimensional Modeling - Part 1: Basic Fact Table Techniques
In this article, I will introduce the concept of the Basic Fact table in Dimensional data modeling. To understand this technique, we will explore the different types of data modeling and recap some fundamental knowledge, including the star and snowflake schemas, and the concepts of normalization.
Read article → - 06
Dimensional Modeling - Part 2: Basic Dimension Table Techniques
The topics covered include Degenerate Dimension, Conformed Dimension, Role-Playing Dimension, Junk Dimension, Outrigger Dimension, and Slowly Changing Dimensions (SCD). The SCD category further delves into different types, such as Type 0 to Type 7, each with its unique approach to handling historical and changing data.
Read article → - 07
Dimensional Modeling - Part 3: Dimensions Hierarchy
This article examines various hierarchy types in data modeling, including Fixed Depth Positional Hierarchies, Slightly Ragged Hierarchies, and Ragged Hierarchies. Fixed Depth Hierarchies feature clear many-to-one relationships, such as product to brand, allowing for easy navigation and quick queries. The article discusses strategies for managing ragged hierarchies through the use of bridge tables and pathstring attributes to simplify analysis and improve performance.
Read article → - 08
Data Integration - Part 1: ETL, Pushdown and Data Orchestrator
This article discusses data integration, focusing on combining data from multiple sources into a unified view for better analysis. It contrasts ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) methodologies, explaining their approaches to data handling. The article also highlights three types of ETL pushdown techniques and the concept of data orchestrators.
Read article → - 09
Data Integration - Part 2: Loading Strategies, Change Data Capture and Data Layers
This article explores data loading methodologies, including batch and streaming approaches, and various loading strategies such as full and incremental loads. It also examines Change Data Capture (CDC) techniques and the layered architecture of modern data warehouses, from raw data ingestion to presentation marts.
Read article → - 10
Big Data: Concepts, Architecture, and Technologies
This article explores the world of Big Data, covering core concepts like the 4Vs (Volume, Velocity, Variety, Veracity), key technologies including Hadoop, Kafka, and Spark, and modern architectures such as Persistent Staging Areas and real-time processing systems. It provides a comprehensive overview of the technologies that emerged to address the challenges of modern data growth beyond traditional database capabilities.
Read article → - 11
Monolithic Data Lake vs Data Mesh
This article compares monolithic data lake architecture with the decentralized data mesh approach. While data lakes centralize data for easier access, they face scalability challenges. Data mesh treats data as a product owned by domain teams, enhancing agility through four key principles: domain-oriented ownership, data as a product, self-serve infrastructure, and federated governance.
Read article →