Dbs101_unit5
Unit V: Relational Database Design - The Secret Science Behind Smart Databases
๐ Introduction
This unit isnโt just about writing queries โ itโs about designing beautiful, reliable, and efficient databases. Welcome to the world of Relational Design, Normalization, and Functional Dependencies.
If SQL were a city, this unit is where we design the blueprint for how everything fits together. Think of it as teaching your database how to stay tidy, avoid gossip (redundancy), and deal with life changes smoothly (anomalies). ๐
๐งฑ Lesson 12: Building Better Schemas โ Designing with Integrity
โจ Features of a Good Relational Design
| Feature | Why It Matters |
|---|---|
| โ One Relation per Entity | Keeps data organized |
| โ Fewer NULLs | Prevents confusion |
| โ No Spurious Tuples | Accurate joins |
| โ No Redundancy | Saves space + avoids errors |
| โ No Modification Anomalies | Smooth updates |
๐จ 1. One Relation per Entity
Each table (relation) should describe a single thing (e.g., Student, Course). Donโt mix attributes from different entities in one table โ use foreign keys instead!
๐ Example:
Instead of mixing instructor and department in one table (in_dep), split them:
- Instructor(ID, name, salary, dept_name)
- Department(dept_name, building, budget)
๐ 2. Minimize NULLs
NULLs = unknowns. Too many of them make your database messy and harder to manage.
๐ Tip: Move frequently NULL columns to a separate table and use the primary key as a link.
๐ 3. No Spurious Tuples (Lossless Joins)
Bad design = weird results from joins. To prevent this, designs should guarantee a lossless join.
๐ก Definition: A join is lossless if no extra (spurious) tuples appear when we decompose and rejoin tables.
Formula:
ฯR1(r) โ ฯR2(r) = r
๐ซ 4. Avoid Redundancy
Storing the same data in multiple places = recipe for disaster.
Problems caused by redundancy:
- Extra storage
- Repeated data entry
- Complex deletes/updates
- Anomalies!
โ ๏ธ 5. No Modification Anomalies
| Type | What Happens |
|---|---|
| Update | Change in one place, not others โ inconsistency |
| Delete | Lose related data unintentionally |
| Insert | Canโt add data due to missing other info |
๐ญ Example: Canโt add a student unless they enroll in a course? Youโve got an insertion anomaly.
๐งช Normalization โ Cleaning Up the Clutter
Normalization = turning chaos into order.
Goal:
Split complex, messy tables into smaller, focused ones โ without losing data!
๐ Functional Dependencies (FDs)
FDs are rules that define how attributes relate.
Example:
A โ B means: If you know A, you must know B.
๐ง Armstrongโs Axioms
These are logic rules to find more dependencies:
- Reflexivity: If A โ B โ A โ B
- Augmentation: If A โ B โ AC โ BC
- Transitivity: If A โ B and B โ C โ A โ C
๐ Closure of FDs
Want to know everything a dependency set can tell you? Compute the closure!
Example:
F = { A โ B, B โ C }
Aโบ = { A, B, C }
๐ฅ Canonical Cover
Strip your dependencies down to the bare minimum (no extras!) while keeping meaning the same.
๐ Tip: Remove redundant attributes from both sides of FDs.
๐งฉ Normal Forms โ The Art of Designing Clean Databases
๐ Normal Forms
| Normal Form | What It Solves |
|---|---|
| 1NF | Atomic (no repeating groups) |
| 2NF | Removes partial dependencies |
| 3NF | Removes transitive dependencies |
| BCNF | Stronger version of 3NF |
| 4NF/5NF | Advanced multi-valued logic |
๐ Lossless Decomposition
Split large relations into smaller ones โ but donโt lose any data!
Goal:
R โ R1, R2 such that R1 โ R2 = R
โ Dependency Preservation
After decomposition, you should still be able to enforce all original FDs without joining everything back.
๐จ BCNF might break this! Thatโs why we sometimes prefer 3NF.
๐งฎ Testing Normal Forms
BCNF Check:
- Every non-trivial FD must have a superkey on the left side.
3NF Check:
- Same as BCNF or every dependent attribute is part of a candidate key.
๐งช Tip: Always start by finding the closure of attributes to check keys and violations.
๐ค Algorithms for Decomposition
๐ BCNF Algorithm:
- Find violating FD (where LHS isnโt superkey)
- Decompose relation
- Repeat for resulting relations
- Always gives lossless, BCNF-compliant schemas
โ ๏ธ Might lose dependency preservation.
โ๏ธ 3NF Algorithm (Synthesis):
- Compute canonical cover Fc
- Create new schema for each FD: ฮฑ โ ฮฒ
- Ensure at least one schema has a candidate key
- Remove redundant schemas
โ Always lossless
โ Always dependency preserving
โก Faster than BCNF decomposition!
๐ Lesson 13: Beyond Basics - MVDs, Atomicity, and Temporal Data
๐งฌ Multivalued Dependencies (MVD)
- BCNF doesnโt remove all redundancies.
- Example:
inst(ID, dept_name, name, street, city) - Instructor may have multiple addresses and departments.
Solution:
- Break into smaller tables:
- r1(ID, name)
- r2(ID, street, city)
Fourth Normal Form (4NF): Handles multivalued dependencies. X โโ Y, where X is a superkey.
๐ More Normal Forms
| Form | Purpose |
|---|---|
| 4NF | Removes multivalued dependency redundancy |
| 5NF / PJNF | Removes join dependency redundancy |
| DKNF | Perfect design with all constraints |
โก Reality Check: DKNF and PJNF are complex and rarely used.
๐งฑ Atomic Domains and First Normal Form (1NF)
- Atomic = Indivisible values.
- 1NF Rule: All attributes must be atomic.
Non-Atomic Problems:
- Redundancy
- Update complexity
- Difficult queries
Solution: Break complex attributes (e.g., phone_numbers โ phone1, phone2, etc.).
๐ Database Design Process
โ๏ธ Schema creation methods:
- Entity-Relationship (E-R) diagrams
- All-in-one large schema โ Normalize
- Ad-hoc design โ Normalize
Good Practices: Unique naming, listing keys first, clean E-R designs.
๐งน Normalize first for better scalability โ but sometimes denormalize for performance (carefully!).
โณ Modeling Temporal Data
๐งญ Why temporal data?
- Some data changes over time (addresses, course titles).
Handling Time:
- Add
start_dateandend_datecolumns. - Prevent overlapping valid periods with primary keys.
Challenges:
- Functional dependencies become time-sensitive.
- Primary/foreign keys must respect time.
๐ SQL:2011 added partial support for temporal data โ but not full temporal queries.
๐ก What Iโve Learned
- ๐ฏ How to design clean, efficient, and safe relational schemas
- ๐ Why NULLs, anomalies, and redundancy are dangerous
- ๐ง How to use Functional Dependencies and Armstrongโs rules
- ๐งฐ Algorithms to normalize data into 3NF or BCNF
๐ฑ Reflection
This unit transformed me from a query writer to a data architect.
I now see that behind every good database is a well-thought design โ not just good code.
๐ Conclusion
Databases arenโt just about queries. Theyโre about structure, logic, and efficiency. With FDs, Normal Forms, and Design Theory โ I can now build systems that scale, survive, and make data make sense.
๐ Did You Know? Edgar Codd, the father of relational databases, introduced normalization to reduce anomalies and improve integrity โ and weโre still using his rules today!