m ai.

blog

Jun 30, 2025·11 min read

ER Modeling and Normalization

This article covers Entity-Relationship (E-R) Modeling, which visually depicts the relationships between identifiable entities in a database. It discusses key concepts such as entities and their attributes, relationships and cardinality, and the importance of normalization (1NF, 2NF, 3NF) to optimize data structure and reduce redundancy for improved data integrity and efficiency.

Article Cover

1. E-R Modelling

1.1. RDBMS Logical Data Modeling / E-R Logical Data Model

Entity - Relationship Model is a fundamental concept in database design that:

  • Is a type of flowchart that illustrates how "entities" such as people, objects or concepts relate to each other within a system
  • Defines a conceptual data schema
  • Defines the conceptual view of a database
  • Illustrates the logical structure of database

The ER model serves as a blueprint for database design, helping developers understand the structure and relationships within the data before implementing the physical database.

1.2. Entity

An entity represents a distinguishable object or concept in the real world:

  • An entity can be a real-world object, either animate or inanimate, that can be easily identifiable
  • All these entities have some attributes or properties that give them their identity
  • An entity set is a collection of similar types of entities
  • An entity set may contain entities with attributes sharing similar values
  • Entities can be thought of as Nouns
  • Typically shown as a rectangle in ER diagrams

Examples of entities:

  • Customer, Employee, Product, Order, Department

1.3. Relationships

Relationships define how entities are connected to each other:

  • Relationships represent logical associations between two or more entities
  • The association among entities is called a relationship
  • For example:
    • An employee works at a department
    • A student enrolls in a course
    • A customer places an order
  • Relationships are represented by diamond-shaped boxes in ER diagrams

1.4. Cardinality

Cardinality defines the numerical relationship between entities:

  • Cardinality defines the number of entities in one entity set that can be associated with the number of entities of another set via relationship
  • Cardinality is a couple of natural numbers that specifies the minimum and maximum number of instances of a relationship in which an entity instance can participate
  • It's possible to use the constant N to refer to a generic number greater than 1
  • In the ER diagram, the cardinality of the relationship is labeled on the arc that links the entity to the relationship
CONSULTANTPROJECT_PLANNINGOFFICEPROJECTALLOCATEDASSIGNEDPLANNED(1,N)(1,1)(1,1)(1,N)(1,1)(1,N)

Example Analysis:

In the relation between CONSULTANT and PROJECT PLANNING:

  • 1 CONSULTANT is allocated to 1 or more Project Planning (e.g., MRO01 is planned to work on 1 or more projects) – one-to-many relation
  • 1 PROJECT PLANNING can have one and only 1 consultant planned on it – one-to-one relation

In the relation between PROJECT and PROJECT PLANNING:

  • 1 PROJECT can be managed with 1 or more Project Planning (e.g., project Fashion#1 is managed with two project planning) – one-to-many relation
  • 1 PROJECT PLANNING is referred to one and only 1 project – one-to-one relation

2. Normalization

In database design, normalization is a systematic approach to organizing data efficiently:

Mathematical Analogy: In mathematics, a set B of elements of a vector space V is called a basis if every element of V can be written in a unique way as a finite linear combination of elements of B. Similarly, Normal Form is a systematic way to express every single tuple without losing information.

Key Definitions:

  • Normal Form: A state at which relations are organized in accordance with specific requirements
  • Normalization: The process of structuring a database in accordance with specific Normal Forms

If a relation has several concepts that are independent of each other, it is decomposed into smaller relations, one for each concept, until the desired Normal Form is reached.

2.1. First Normal Form (1NF)

For each relation contained in the database, a relation is in 1NF if and only if:

  • Each attribute is defined over a domain with atomic (indivisible) values
  • Each attribute contains a single value from that domain

Example: Global Tech Company — Employee / Project

The table below violates 1NF because PROJECT, PROJECT MANAGER, and PROJECT ALLOCATION contain multiple values per row:

EMPLOYEE CODEEMPLOYEELEVELOFFICEADDRESSCITYCOUNTRYZIPPROJECTPROJECT MANAGERPROJECT ALLOCATION
SJO01Sarah JohnsonMANAGERTechCo BerlinFriedrichstrasse, 8BerlinGermany10117CloudMigration#1, Analytics, CoreInfra#1David Kim, Maria Rossi, James Carter2w, 1w, 1w /month
KSA01Kenji SatoJUNIORTechCo Tokyo1 MarunouchiTokyoJapan100-0005CloudMigration#1, CloudMigration#2David Kim, David Kim2w, 2w /month
FAL01Fatima Al-SayedSENIORTechCo DubaiSheikh Zayed Rd, 33DubaiUAE00000CloudMigration#3, CoreInfra#1David Kim, James Carter3w, 1w /month
CME01Carlos MendesBATechCo Tokyo1 MarunouchiTokyoJapan100-0005CloudMigration#2, Analytics#1David Kim, Maria Rossi3w, 1w /month

To reach 1NF, flatten repeating groups so each cell holds one atomic value:

EMPLOYEE CODEEMPLOYEELEVELOFFICEADDRESSCITYCOUNTRYZIPPROJECTPROJECT MANAGERPROJECT ALLOCATION
SJO01Sarah JohnsonMANAGERTechCo BerlinFriedrichstrasse, 8BerlinGermany10117CloudMigration#1David Kim2 week/month
SJO01Sarah JohnsonMANAGERTechCo BerlinFriedrichstrasse, 8BerlinGermany10117AnalyticsMaria Rossi1 week/month
SJO01Sarah JohnsonMANAGERTechCo BerlinFriedrichstrasse, 8BerlinGermany10117CoreInfra#1James Carter1 week/month
KSA01Kenji SatoJUNIORTechCo Tokyo1 MarunouchiTokyoJapan100-0005CloudMigration#1David Kim2 week/month
KSA01Kenji SatoJUNIORTechCo Tokyo1 MarunouchiTokyoJapan100-0005CloudMigration#2David Kim2 week/month
FAL01Fatima Al-SayedSENIORTechCo DubaiSheikh Zayed Rd, 33DubaiUAE00000CloudMigration#3David Kim3 week/month
FAL01Fatima Al-SayedSENIORTechCo DubaiSheikh Zayed Rd, 33DubaiUAE00000CoreInfra#1James Carter1 week/month
CME01Carlos MendesBATechCo Tokyo1 MarunouchiTokyoJapan100-0005CloudMigration#2David Kim3 week/month
CME01Carlos MendesBATechCo Tokyo1 MarunouchiTokyoJapan100-0005Analytics#1Maria Rossi1 week/month

Because a single EMPLOYEE CODE now appears on multiple rows (one per project), the primary key must be the composite [EMPLOYEE CODE, PROJECT].

2.2. Second Normal Form (2NF)

A database is in 2NF (2nd normal form) when:

  1. It is in 1st normal form, and
  2. All non-key attributes functionally depend on the entire key (no partial dependencies)

Example:

With primary key [EMPLOYEE CODE, PROJECT], the following partial dependencies violate 2NF:

NON-PK ATTRIBUTEDEPENDS ONENTIRE KEY?
EMPLOYEEEMPLOYEE CODE onlyNo
LEVELEMPLOYEE CODE onlyNo
OFFICEEMPLOYEE CODE onlyNo
ADDRESSEMPLOYEE CODE onlyNo
CITYEMPLOYEE CODE onlyNo
COUNTRYEMPLOYEE CODE onlyNo
ZIPEMPLOYEE CODE onlyNo
PROJECT MANAGERPROJECT onlyNo
PROJECT ALLOCATIONEMPLOYEE CODE, PROJECTYes

To reach 2NF, group each partial dependency into its own table:

EMPLOYEE — attributes that depend on EMPLOYEE CODE

EMPLOYEE CODEEMPLOYEELEVELOFFICEADDRESSCITYCOUNTRYZIP
SJO01Sarah JohnsonMANAGERTechCo BerlinFriedrichstrasse, 8BerlinGermany10117
KSA01Kenji SatoJUNIORTechCo Tokyo1 MarunouchiTokyoJapan100-0005
FAL01Fatima Al-SayedSENIORTechCo DubaiSheikh Zayed Rd, 33DubaiUAE00000
CME01Carlos MendesBATechCo Tokyo1 MarunouchiTokyoJapan100-0005

PROJECT — attributes that depend on PROJECT

PROJECTPROJECT MANAGER
CloudMigration#1David Kim
CloudMigration#2David Kim
CloudMigration#3David Kim
AnalyticsMaria Rossi
Analytics#1Maria Rossi
CoreInfra#1James Carter

EMPLOYEE_PROJECT — preserves the many-to-many relationship; PROJECT ALLOCATION depends on the entire key [EMPLOYEE CODE, PROJECT]

EMPLOYEE CODEPROJECTPROJECT ALLOCATION
SJO01CloudMigration#12 week/month
SJO01Analytics1 week/month
SJO01CoreInfra#11 week/month
KSA01CloudMigration#12 week/month
KSA01CloudMigration#22 week/month
FAL01CloudMigration#33 week/month
FAL01CoreInfra#11 week/month
CME01CloudMigration#23 week/month
CME01Analytics#11 week/month

2.3. Third Normal Form (3NF)

A database is in 3NF (3rd normal form) if it is:

  1. In 2nd normal form, and
  2. All non-key attributes depend on the key only (no transitive dependencies)

There should be no non-key attribute that depends on another non-key attribute.

Example:

In the EMPLOYEE table, ADDRESS, CITY, COUNTRY, and ZIP do not depend directly on EMPLOYEE CODE — they depend on OFFICE, which itself depends on EMPLOYEE CODE. That is a transitive dependency: EMPLOYEE CODE → OFFICE → {ADDRESS, CITY, COUNTRY, ZIP}. Notice "TechCo Tokyo / 1 Marunouchi / Tokyo / Japan / 100-0005" repeats for both Kenji Sato and Carlos Mendes — redundancy.

To reach 3NF, move the office location attributes into a dedicated OFFICE table:

EMPLOYEE

EMPLOYEE CODEEMPLOYEELEVELOFFICE
SJO01Sarah JohnsonMANAGERTechCo Berlin
KSA01Kenji SatoJUNIORTechCo Tokyo
FAL01Fatima Al-SayedSENIORTechCo Dubai
CME01Carlos MendesBATechCo Tokyo

OFFICE

OFFICEADDRESSCITYCOUNTRYZIP
TechCo BerlinFriedrichstrasse, 8BerlinGermany10117
TechCo Tokyo1 MarunouchiTokyoJapan100-0005
TechCo DubaiSheikh Zayed Rd, 33DubaiUAE00000

PROJECT and EMPLOYEE_PROJECT remain unchanged from 2NF. The database is now in 3NF.