Jun 30, 2025·11 min read
ER Modeling and Normalization
This article covers Entity-Relationship (E-R) Modeling, which visually depicts the relationships between identifiable entities in a database. It discusses key concepts such as entities and their attributes, relationships and cardinality, and the importance of normalization (1NF, 2NF, 3NF) to optimize data structure and reduce redundancy for improved data integrity and efficiency.
1. E-R Modelling
1.1. RDBMS Logical Data Modeling / E-R Logical Data Model
Entity - Relationship Model is a fundamental concept in database design that:
- Is a type of flowchart that illustrates how "entities" such as people, objects or concepts relate to each other within a system
- Defines a conceptual data schema
- Defines the conceptual view of a database
- Illustrates the logical structure of database
The ER model serves as a blueprint for database design, helping developers understand the structure and relationships within the data before implementing the physical database.
1.2. Entity
An entity represents a distinguishable object or concept in the real world:
- An
entitycan be a real-world object, either animate or inanimate, that can be easily identifiable - All these entities have some attributes or properties that give them their identity
- An
entity setis a collection of similar types ofentities - An
entity setmay containentitieswith attributes sharing similar values Entitiescan be thought of as Nouns- Typically shown as a rectangle in ER diagrams
Examples of entities:
- Customer, Employee, Product, Order, Department
1.3. Relationships
Relationships define how entities are connected to each other:
Relationshipsrepresent logical associations between two or moreentities- The association among
entitiesis called a relationship - For example:
- An employee works at a department
- A student enrolls in a course
- A customer places an order
Relationshipsare represented by diamond-shaped boxes in ER diagrams
1.4. Cardinality
Cardinality defines the numerical relationship between entities:
- Cardinality defines the number of
entitiesin oneentity setthat can be associated with the number of entities of another set via relationship - Cardinality is a couple of natural numbers that specifies the minimum and maximum number of instances of a relationship in which an entity instance can participate
- It's possible to use the constant N to refer to a generic number greater than 1
- In the ER diagram, the cardinality of the relationship is labeled on the arc that links the entity to the relationship
Example Analysis:
In the relation between CONSULTANT and PROJECT PLANNING:
- 1 CONSULTANT is allocated to 1 or more Project Planning (e.g., MRO01 is planned to work on 1 or more projects) – one-to-many relation
- 1 PROJECT PLANNING can have one and only 1 consultant planned on it – one-to-one relation
In the relation between PROJECT and PROJECT PLANNING:
- 1 PROJECT can be managed with 1 or more Project Planning (e.g., project Fashion#1 is managed with two project planning) – one-to-many relation
- 1 PROJECT PLANNING is referred to one and only 1 project – one-to-one relation
2. Normalization
In database design, normalization is a systematic approach to organizing data efficiently:
Mathematical Analogy: In mathematics, a set B of elements of a vector space V is called a basis if every element of V can be written in a unique way as a finite linear combination of elements of B. Similarly, Normal Form is a systematic way to express every single tuple without losing information.
Key Definitions:
- Normal Form: A state at which relations are organized in accordance with specific requirements
- Normalization: The process of structuring a database in accordance with specific Normal Forms
If a relation has several concepts that are independent of each other, it is decomposed into smaller relations, one for each concept, until the desired Normal Form is reached.
2.1. First Normal Form (1NF)
For each relation contained in the database, a relation is in 1NF if and only if:
- Each attribute is defined over a domain with atomic (indivisible) values
- Each attribute contains a single value from that domain
Example: Global Tech Company — Employee / Project
The table below violates 1NF because PROJECT, PROJECT MANAGER, and PROJECT ALLOCATION contain multiple values per row:
| EMPLOYEE CODE | EMPLOYEE | LEVEL | OFFICE | ADDRESS | CITY | COUNTRY | ZIP | PROJECT | PROJECT MANAGER | PROJECT ALLOCATION |
|---|---|---|---|---|---|---|---|---|---|---|
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 | CloudMigration#1, Analytics, CoreInfra#1 | David Kim, Maria Rossi, James Carter | 2w, 1w, 1w /month |
| KSA01 | Kenji Sato | JUNIOR | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | CloudMigration#1, CloudMigration#2 | David Kim, David Kim | 2w, 2w /month |
| FAL01 | Fatima Al-Sayed | SENIOR | TechCo Dubai | Sheikh Zayed Rd, 33 | Dubai | UAE | 00000 | CloudMigration#3, CoreInfra#1 | David Kim, James Carter | 3w, 1w /month |
| CME01 | Carlos Mendes | BA | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | CloudMigration#2, Analytics#1 | David Kim, Maria Rossi | 3w, 1w /month |
To reach 1NF, flatten repeating groups so each cell holds one atomic value:
| EMPLOYEE CODE | EMPLOYEE | LEVEL | OFFICE | ADDRESS | CITY | COUNTRY | ZIP | PROJECT | PROJECT MANAGER | PROJECT ALLOCATION |
|---|---|---|---|---|---|---|---|---|---|---|
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 | CloudMigration#1 | David Kim | 2 week/month |
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 | Analytics | Maria Rossi | 1 week/month |
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 | CoreInfra#1 | James Carter | 1 week/month |
| KSA01 | Kenji Sato | JUNIOR | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | CloudMigration#1 | David Kim | 2 week/month |
| KSA01 | Kenji Sato | JUNIOR | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | CloudMigration#2 | David Kim | 2 week/month |
| FAL01 | Fatima Al-Sayed | SENIOR | TechCo Dubai | Sheikh Zayed Rd, 33 | Dubai | UAE | 00000 | CloudMigration#3 | David Kim | 3 week/month |
| FAL01 | Fatima Al-Sayed | SENIOR | TechCo Dubai | Sheikh Zayed Rd, 33 | Dubai | UAE | 00000 | CoreInfra#1 | James Carter | 1 week/month |
| CME01 | Carlos Mendes | BA | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | CloudMigration#2 | David Kim | 3 week/month |
| CME01 | Carlos Mendes | BA | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 | Analytics#1 | Maria Rossi | 1 week/month |
Because a single EMPLOYEE CODE now appears on multiple rows (one per project), the primary key must be the composite [EMPLOYEE CODE, PROJECT].
2.2. Second Normal Form (2NF)
A database is in 2NF (2nd normal form) when:
- It is in 1st normal form, and
- All non-key attributes functionally depend on the entire key (no partial dependencies)
Example:
With primary key [EMPLOYEE CODE, PROJECT], the following partial dependencies violate 2NF:
| NON-PK ATTRIBUTE | DEPENDS ON | ENTIRE KEY? |
|---|---|---|
| EMPLOYEE | EMPLOYEE CODE only | No |
| LEVEL | EMPLOYEE CODE only | No |
| OFFICE | EMPLOYEE CODE only | No |
| ADDRESS | EMPLOYEE CODE only | No |
| CITY | EMPLOYEE CODE only | No |
| COUNTRY | EMPLOYEE CODE only | No |
| ZIP | EMPLOYEE CODE only | No |
| PROJECT MANAGER | PROJECT only | No |
| PROJECT ALLOCATION | EMPLOYEE CODE, PROJECT | Yes |
To reach 2NF, group each partial dependency into its own table:
EMPLOYEE — attributes that depend on EMPLOYEE CODE
| EMPLOYEE CODE | EMPLOYEE | LEVEL | OFFICE | ADDRESS | CITY | COUNTRY | ZIP |
|---|---|---|---|---|---|---|---|
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 |
| KSA01 | Kenji Sato | JUNIOR | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 |
| FAL01 | Fatima Al-Sayed | SENIOR | TechCo Dubai | Sheikh Zayed Rd, 33 | Dubai | UAE | 00000 |
| CME01 | Carlos Mendes | BA | TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 |
PROJECT — attributes that depend on PROJECT
| PROJECT | PROJECT MANAGER |
|---|---|
| CloudMigration#1 | David Kim |
| CloudMigration#2 | David Kim |
| CloudMigration#3 | David Kim |
| Analytics | Maria Rossi |
| Analytics#1 | Maria Rossi |
| CoreInfra#1 | James Carter |
EMPLOYEE_PROJECT — preserves the many-to-many relationship; PROJECT ALLOCATION depends on the entire key [EMPLOYEE CODE, PROJECT]
| EMPLOYEE CODE | PROJECT | PROJECT ALLOCATION |
|---|---|---|
| SJO01 | CloudMigration#1 | 2 week/month |
| SJO01 | Analytics | 1 week/month |
| SJO01 | CoreInfra#1 | 1 week/month |
| KSA01 | CloudMigration#1 | 2 week/month |
| KSA01 | CloudMigration#2 | 2 week/month |
| FAL01 | CloudMigration#3 | 3 week/month |
| FAL01 | CoreInfra#1 | 1 week/month |
| CME01 | CloudMigration#2 | 3 week/month |
| CME01 | Analytics#1 | 1 week/month |
2.3. Third Normal Form (3NF)
A database is in 3NF (3rd normal form) if it is:
- In 2nd normal form, and
- All non-key attributes depend on the key only (no transitive dependencies)
There should be no non-key attribute that depends on another non-key attribute.
Example:
In the EMPLOYEE table, ADDRESS, CITY, COUNTRY, and ZIP do not depend directly on EMPLOYEE CODE — they depend on OFFICE, which itself depends on EMPLOYEE CODE. That is a transitive dependency: EMPLOYEE CODE → OFFICE → {ADDRESS, CITY, COUNTRY, ZIP}. Notice "TechCo Tokyo / 1 Marunouchi / Tokyo / Japan / 100-0005" repeats for both Kenji Sato and Carlos Mendes — redundancy.
To reach 3NF, move the office location attributes into a dedicated OFFICE table:
EMPLOYEE
| EMPLOYEE CODE | EMPLOYEE | LEVEL | OFFICE |
|---|---|---|---|
| SJO01 | Sarah Johnson | MANAGER | TechCo Berlin |
| KSA01 | Kenji Sato | JUNIOR | TechCo Tokyo |
| FAL01 | Fatima Al-Sayed | SENIOR | TechCo Dubai |
| CME01 | Carlos Mendes | BA | TechCo Tokyo |
OFFICE
| OFFICE | ADDRESS | CITY | COUNTRY | ZIP |
|---|---|---|---|---|
| TechCo Berlin | Friedrichstrasse, 8 | Berlin | Germany | 10117 |
| TechCo Tokyo | 1 Marunouchi | Tokyo | Japan | 100-0005 |
| TechCo Dubai | Sheikh Zayed Rd, 33 | Dubai | UAE | 00000 |
PROJECT and EMPLOYEE_PROJECT remain unchanged from 2NF. The database is now in 3NF.