Design Theory for Relational Databases 2015, Fall Pusan National University Ki-Joune Li


Design Theory for Relational Databases

2015, Fall, Pusan National University

Ki-Joune Li

2

Properties of Table

• When we design a relational DB
  o It is a set of relations.
  o Relations can be derived from a UML diagram.

• But NOT all relations are correct.
  o We should carefully observe the properties of tables
  o Functional Dependency
  o Key
  o Decomposition of Table

3

Definition of Functional Dependency

• FD (Functional Dependency) on a relation R
  o A1 A2 A3 … An → B, where A1, A2, A3, …, An, B are attributes of R
  o Meaning: if two tuples of R agree on all of A1, A2, A3, …, An, they must also agree on B
  o The set of attributes A1 A2 A3 … An functionally determines B
  o More than one B:
      A1 A2 A3 … An → B1
      A1 A2 A3 … An → B2
      …
      A1 A2 A3 … An → Bk
    can be written together as A1 A2 A3 … An → B1 B2 … Bk

4

Functional Dependency: Example

• A Relation
  o Movies (title, year, length, filmType, studioName, starName)
      (title year) → length
      (title year) → filmType
      (title year) → studioName
      (title year) → length filmType studioName
      ? (title year) → starName : does not hold, since there is more than one star in a film

• It is important to discover the FDs in a relation
  o It helps to decide the correctness of the relation design.
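The FDs listed above can be tested against sample rows. The following is a minimal sketch, assuming a relation instance is represented as a list of Python dictionaries; the helper name `fd_holds` is hypothetical, and the rows are taken from the Movies table used later in these slides. Note that an instance can only refute an FD; an FD that happens to hold on sample data is not thereby proven for the real world.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by the given rows."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "starName": "Emilio Estevez"},
]

print(fd_holds(movies, ["title", "year"], ["length"]))    # True
print(fd_holds(movies, ["title", "year"], ["starName"]))  # False
```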

5

Key

• Given a relation R
  o A set of one or more attributes {A1, A2, A3, …, An} is a KEY iff
      the set functionally determines all other attributes of R, and
      no proper subset of {A1, A2, A3, …, An} functionally determines all other attributes (Minimal)
  o Primary Key:
      If a relation has more than one key, one of them is designated as the primary key
  o Super Key
      a set of attributes containing a key
      No minimality condition

• Example
  o Movies (title, year, length, filmType, studioName, starName)
  o What are the keys?

6

How to discover keys

• From E-R Diagram: Underlined Attributes
  o It means that keys are defined based on the understanding of the real world
  o Example: Movies (title, year, length, filmType, studioName, starName)
      (year, starName) is not a key if a star can make more than one film per year
      (year, starName) is a key if a star is allowed to make only one film per year

• Relation (A1, A2, B) for a relationship between R1 and R2
  o One-One
  o One-Many
  o Many-One
  o Many-Many

7

Rules about Functional Dependencies

• Functional Dependency
  o An important property of a Relation (or Table)
  o Some interesting properties or rules of FDs

• Transitive Rule
  o If A → B and B → C, then A → C

• Splitting/Combining Rule
  o A1 A2 A3 … An → B1, A1 A2 A3 … An → B2, …, A1 A2 A3 … An → Bk
    iff A1 A2 A3 … An → B1 B2 … Bk

• Trivial FD Rule: Given an FD A1 A2 A3 … An → B
  o The FD is trivial if B is one of {A1, A2, A3, …, An}: really trivial
  o The FD is completely non-trivial if B is not in {A1, A2, A3, …, An}

8

Rules about Functional Dependencies

• Trivial Dependency Rule
  o A1 A2 … An → B1 B2 … Bm is equivalent to A1 A2 … An → C1 C2 … Ck
    if {C1, C2, …, Ck} ⊆ {B1, B2, …, Bm} and C1, C2, …, Ck are exactly those B's that are not in {A1, A2, …, An}
  o Example:
      (year, title) → (studioName, year) is equivalent to (year, title) → studioName
      year on the right side is unnecessary, because it already appears on the left side

9

Armstrong's Axioms

• Reflexivity (Trivial FD):
  If {C1, C2, …, Ck} ⊆ {B1, B2, …, Bm}, then B1 B2 … Bm → C1 C2 … Ck

• Augmentation:
  If A1 A2 … An → B1 B2 … Bm, then A1 A2 … An C1 C2 … Ck → B1 B2 … Bm C1 C2 … Ck

• Transitivity:
  If A1 A2 … An → B1 B2 … Bm and B1 B2 … Bm → C1 C2 … Ck, then A1 A2 … An → C1 C2 … Ck

10

Closure of Attributes

• Closure: {A1, A2, …, An}+
  o {A1, A2, …, An} is a set of attributes and S is a set of FDs
  o Closure of {A1, A2, …, An} under the FDs in S: the set of all attributes B such that A1 A2 … An → B follows from S
  o That is, if B1, B2, …, Bk are all the attributes for which we can derive
      A1 A2 … An → B1
      A1 A2 … An → B2
      …
      A1 A2 … An → Bk
    then {A1, A2, …, An}+ = {B1, B2, …, Bk}

11

Algorithm to Find Closure

• Input: Set of attributes {A1, A2, …, An}, and set S of FDs

• Output: {A1, A2, …, An}+

• Process
  1. Split the FDs so that each FD has a single attribute on the right.
     e.g. A1 A2 → B C is split into A1 A2 → B and A1 A2 → C
  2. Initialize X = {A1, A2, …, An}
  3. Search for some FD B1 B2 … Bm → C such that B1, B2, …, Bm are in X but C is not in X, and add C to X
  4. Repeat 3 until there is no more attribute to add to X

• Example
  o Given attributes A, B, C, D, E, and F
  o S: A B → C, B C → A D, D → E, and C F → B
  o What is {A, B}+ ?
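The four steps of the closure algorithm can be sketched in Python as follows. This is an illustrative sketch, not code from the lecture: the function name `closure` and the representation of an FD as a `(lhs, rhs)` pair are assumptions, and single-letter strings are used so that `"AB"` stands for the attribute set {A, B}.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) pairs)."""
    # Step 1: split every FD so that it has a single attribute on the right.
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    # Step 2: initialize X with the given attributes.
    x = set(attrs)
    # Steps 3 and 4: keep adding right-hand sides until nothing changes.
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


# S: A B -> C, B C -> A D, D -> E, C F -> B
S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", S)))  # ['A', 'B', 'C', 'D', 'E']
```

F is never added, because the only FD that mentions it (C F → B) has F on the left side.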

12

Closure and Key

• If {A1, A2, …, An}+ is the set of all attributes of relation R, then {A1, A2, …, An} is a super key
  o Example: R (A, B, C, D, E) and S: A B → C, B C → A D, D → E
      then {A, B}+ = {A, B, C, D, E}: all attributes of R, so {A, B} is a super key of R.

• If no attribute can be removed without losing the coverage of all attributes, then it is a key.
  o Example:
      if we remove B from {A, B}, then {A}+ = {A}; if we remove A, then {B}+ = {B}.
      Neither is {A, B, C, D, E}, therefore {A, B} is a key.
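The super key and key tests above reduce to closure computations. The sketch below is one possible way to write them; the names `is_superkey`, `is_key`, and `candidate_keys` are hypothetical, and the brute-force enumeration of candidate keys is only reasonable for small relations such as the example R(A, B, C, D, E).

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def is_key(attrs, all_attrs, fds):
    """A key is a super key from which no single attribute can be dropped."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(is_superkey(set(attrs) - {a}, all_attrs, fds) for a in attrs)


def candidate_keys(all_attrs, fds):
    """Enumerate all keys by increasing size (exponential; small schemas only)."""
    keys = []
    attrs = sorted(all_attrs)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(combo, all_attrs, fds):
                keys.append(combo)
    return keys


S = [("AB", "C"), ("BC", "AD"), ("D", "E")]
print(is_key("AB", "ABCDE", S))       # True
print(is_superkey("ABC", "ABCDE", S)) # True
print(is_key("ABC", "ABCDE", S))      # False
print(candidate_keys("ABCDE", S))     # [('A', 'B'), ('B', 'C')]
```

Dropping one attribute at a time is enough for the minimality test, because any superset of a super key is again a super key.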

13

Closing Set of Functional Dependencies

• Closing Set of FD set S:
  o Basis T of S: if we can derive all the FDs of S from T, then T is a basis of S.
  o Remove all duplicated FDs
  o A Minimal Basis B satisfies three conditions
      All the FDs in B have one attribute on the right side
      If any FD is removed from B, then the result is no longer a basis
      If, for any FD in B, we remove one or more attributes from the left side, then the result is no longer a basis

• Example
  o For S = {A → B, A → C, B → A, B → C, C → A, C → B}, what is the minimal basis of S?
    {A B → C, A C → B, B C → A}?
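Whether a set T is a basis of S can be checked with closures: every FD of S must follow from T, and every FD of T must follow from S. The sketch below makes two assumptions that are not in the slides: the helper names `follows`, `is_basis`, and `has_redundant_fd`, and the reading of the garbled candidate set as {A B → C, A C → B, B C → A}. It does not test the third condition (left-side reduction), which is trivially satisfied when every left side is a single attribute.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def follows(fd, fds):
    """True if the FD lhs -> rhs can be derived from fds."""
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, fds)


def is_basis(t, s):
    """T is a basis of S if both sets derive each other."""
    return all(follows(fd, t) for fd in s) and all(follows(fd, s) for fd in t)


def has_redundant_fd(t):
    """True if some FD of T already follows from the remaining FDs."""
    return any(follows(fd, t[:i] + t[i + 1:]) for i, fd in enumerate(t))


S = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
T1 = [("A", "B"), ("B", "C"), ("C", "A")]
T2 = [("AB", "C"), ("AC", "B"), ("BC", "A")]

print(is_basis(T1, S), has_redundant_fd(T1))  # True False
print(is_basis(T2, S))                        # False
print(has_redundant_fd(S))                    # True
```

Under this reading, T2 is not a basis because A → B cannot be derived from it ({A}+ under T2 is just {A}), while T1 is a basis with no removable FD.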

14

Bad Design: Anomalies

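• Bad Design: Example (see the Movies table below)

• Redundancy: length, film type, and studio name are repeated for every star of the same movie

• Update Anomaly: changing the length of Star Wars requires updating three rows consistently

• Deletion Anomaly: deleting the only star of Mighty Ducks also deletes all information about the movie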

Title          Year  Length  FilmType  StudioName  StarName
Star Wars      1977  124     Color     Fox         Carrie Fisher
Star Wars      1977  124     Color     Fox         Mark Hamill
Star Wars      1977  124     Color     Fox         Harrison Ford
Mighty Ducks   1991  104     Color     Disney      Emilio Estevez
Wayne’s World  1992  95      Color     Paramount   Dana Carvey
Wayne’s World  1992  95      Color     Paramount   Mike Meyers

15

Decomposing Relations

• Decomposition of Bad Relation
  o A good way to remove the problems of bad relations

• Decomposition: Lossless Decomposition
  o {A1, A2, …, An} ⇒ {B1, B2, …, Bm}, {C1, C2, …, Ck} such that
      {B1, B2, …, Bm} ∪ {C1, C2, …, Ck} = {A1, A2, …, An} and
      {B1, B2, …, Bm} ∩ {C1, C2, …, Ck} ≠ {}

16

Decomposing Relations: Example

• R = {title, year, length, filmType, studioName, starName}
  ⇒ {title, year, length, filmType, studioName} (= R1), {title, year, starName} (= R2)

• Redundancy

• Update Anomaly

• Deletion Anomaly

R1:
Title          Year  Length  FilmType  StudioName
Star Wars      1977  124     Color     Fox
Mighty Ducks   1991  104     Color     Disney
Wayne’s World  1992  95      Color     Paramount

R2:
Title          Year  StarName
Star Wars      1977  Carrie Fisher
Star Wars      1977  Mark Hamill
Star Wars      1977  Harrison Ford
Mighty Ducks   1991  Emilio Estevez
Wayne’s World  1992  Dana Carvey
Wayne’s World  1992  Mike Meyers
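That this decomposition is lossless can be checked on the sample data by projecting the original table onto the two attribute sets and joining the projections back. The sketch below is illustrative only: `project`, `natural_join`, and `as_set` are hypothetical helper names, rows are represented as dictionaries, and a check on one instance does not prove losslessness for every instance.

```python
COLUMNS = ["title", "year", "length", "filmType", "studioName", "starName"]
DATA = [
    ("Star Wars", 1977, 124, "Color", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "Color", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "Color", "Fox", "Harrison Ford"),
    ("Mighty Ducks", 1991, 104, "Color", "Disney", "Emilio Estevez"),
    ("Wayne's World", 1992, 95, "Color", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "Color", "Paramount", "Mike Meyers"),
]
movies = [dict(zip(COLUMNS, row)) for row in DATA]


def project(rows, attrs):
    """Projection with duplicate elimination."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r1, r2):
    """Join rows that agree on all attributes the two relations share."""
    out = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


r1 = project(movies, ["title", "year", "length", "filmType", "studioName"])
r2 = project(movies, ["title", "year", "starName"])
rejoined = natural_join(r1, r2)
print(len(r1), len(r2), as_set(rejoined) == as_set(movies))  # 3 6 True
```

The join recovers exactly the original six rows, while R1 stores the length, film type, and studio of each movie only once.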

17

Normal Form: Conditions for Good Relation

• 1st Normal Form (1NF)
• 2nd Normal Form (2NF)
• 3rd Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)

18

1st Normal Form

• 1NF: Every component of a relation should be ATOMIC
  o No table in a component
  o No set
  o No list, etc.

19

2nd Normal Form

• 2NF
  o 1NF and
  o None of the non-prime attributes of the relation is functionally dependent on a part (a proper subset) of a candidate key
      That is, no partial dependency of a non-prime attribute on a candidate key

• Example
  o Player (Team, Number, TeamAddress, Name, Position)
  o 1NF but not 2NF

20

Example

• Player (Team, Number, TeamAddress, Name, Position)
  o FD1: Team, Name → Number, Position
  o FD2: Team → TeamAddress
  o Key: {Team, Name}, since {Team, Name}+ = {Team, Number, TeamAddress, Name, Position}
  o In FD2, TeamAddress (non-prime attribute) is dependent on {Team}, which is a proper subset of the key
  o 2NF violation

• Should be decomposed
  o R1(Team, Number, Name, Position) and R2(Team, TeamAddress)
  o R1 ⋈ R2 = R
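The 2NF violation in the Player example can be found mechanically by taking the closure of every proper subset of each candidate key and looking for non-prime attributes in it. The sketch below assumes FD1 is read as Team, Name → Number, Position (the right side of FD1 is garbled in the source, so this reading is an assumption), and the function name `partial_dependencies` is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def partial_dependencies(keys, fds):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    prime = set().union(*keys)  # attributes that belong to some candidate key
    found = []
    for key in keys:
        members = sorted(key)
        for size in range(1, len(members)):  # proper, non-empty subsets only
            for part in combinations(members, size):
                for a in sorted(closure(part, fds) - set(part)):
                    if a not in prime:
                        found.append((part, a))
    return found


# Assumed reading of the slide's FDs (FD1's right side is an assumption).
player_fds = [
    (("Team", "Name"), ("Number", "Position")),
    (("Team",), ("TeamAddress",)),
]
print(partial_dependencies([{"Team", "Name"}], player_fds))
# [(('Team',), 'TeamAddress')]
```

After the decomposition, R2(Team, TeamAddress) has the single-attribute key {Team}, so no proper non-empty subset of a key remains to cause a partial dependency.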

21

Example

Employee  Skill           Current Work Location
Jones     Typing          114 Main Street
Jones     Shorthand       114 Main Street
Jones     Whittling       114 Main Street
Roberts   Light Cleaning  73 Industrial Way
Ellis     Alchemy         73 Industrial Way
Ellis     Juggling        73 Industrial Way
Harrison  Light Cleaning  73 Industrial Way

Candidate Key: {Employee, Skill}
Not 2NF
  Partial FD: Employee → Current Work Location
Should be decomposed
  (Employee, Skill), (Employee, Current Work Location)

22

3rd Normal Form

• 3NF: 2NF, and every non-prime attribute of the relation must be non-transitively dependent on every candidate key

• Example
  o Team (TeamName, Address, ManagerID, ManagerHireDate)
  o FD:
      TeamName → Address, TeamName → ManagerID
      ManagerID → ManagerHireDate
      Transitive dependency: TeamName → ManagerID → ManagerHireDate
      Key: {TeamName}
      2NF but not 3NF
  o To be decomposed: (TeamName, Address, ManagerID), (ManagerID, ManagerHireDate)
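The transitive dependency in the Team example can be detected with an equivalent formulation of 3NF: for every non-trivial FD X → A, either X is a super key or A is a prime attribute. The sketch below uses that formulation rather than the wording of the slide; the function name `third_nf_violations` is hypothetical, and the candidate keys are passed in by the caller.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def third_nf_violations(all_attrs, keys, fds):
    """List (lhs, attribute) pairs where lhs -> attribute breaks 3NF."""
    prime = set().union(*keys)
    found = []
    for lhs, rhs in fds:
        for a in rhs:
            if a in lhs:
                continue  # trivial FD
            if closure(lhs, fds) >= set(all_attrs):
                continue  # left side is a super key
            if a in prime:
                continue  # right side is a prime attribute
            found.append((tuple(lhs), a))
    return found


team_attrs = {"TeamName", "Address", "ManagerID", "ManagerHireDate"}
team_fds = [
    (("TeamName",), ("Address", "ManagerID")),
    (("ManagerID",), ("ManagerHireDate",)),
]
print(third_nf_violations(team_attrs, [{"TeamName"}], team_fds))
# [(('ManagerID',), 'ManagerHireDate')]
```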

23

Example: 2NF but NOT 3NF

Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977

Candidate Key: {Tournament, Year}
2NF: No partial dependency
Not 3NF
  Transitive functional dependency: {Tournament, Year} → Winner → Winner Date of Birth
Should be decomposed
  (Tournament, Year, Winner), (Player, Birth date)

24

Boyce-Codd Normal Form (BCNF)

• BCNF: For every one of its non-trivial functional dependencies X → Y, X is a super key
  o Remember: non-trivial means Y is not a subset of X.
  o Remember: a superkey is any superset of a key (not necessarily a proper superset)

• BCNF is slightly stronger than 3NF

25

Relationship between 1NF, 2NF, 3NF and BCNF

[Figure: nested sets, with BCNF inside 3NF, 3NF inside 2NF, and 2NF inside 1NF]

26

Example: 3NF but NOT BCNF

Prof. ID  Prof. SS ID  Student ID
1078      088-51-0074  31850
1078      088-51-0074  37921
1293      096-77-4146  46224
1480      072-21-2223  31850

A table to show the assignment of students

Candidate Keys
  {Prof. ID, Student ID}
  {Prof. SS ID, Student ID}

1NF
2NF: no partial FD of non-prime attributes on a candidate key
3NF: no transitive FD
NOT BCNF:
  Prof. ID → Prof. SS ID is a functional dependency, but Prof. ID is not a candidate key (nor a super key)
Should be decomposed
  (Prof. ID, Student ID), (Prof. ID, Prof. SS ID)
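The BCNF test for this table can be sketched as a loop over the FDs that checks whether each left side is a super key. The attribute names without spaces and the function name `bcnf_violations` are choices made for this sketch; the reverse FD Prof. SS ID → Prof. ID is an assumption implied by the second candidate key, not something the slide states explicitly.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    found = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, never a violation
        if closure(lhs, fds) != set(all_attrs):
            found.append((tuple(lhs), tuple(rhs)))
    return found


attrs = {"ProfID", "ProfSSID", "StudentID"}
fds = [
    (("ProfID",), ("ProfSSID",)),
    (("ProfSSID",), ("ProfID",)),  # assumed, implied by the second candidate key
]
print(bcnf_violations(attrs, fds))
# [(('ProfID',), ('ProfSSID',)), (('ProfSSID',), ('ProfID',))]

# After decomposition, (ProfID, ProfSSID) has no violation left.
print(bcnf_violations({"ProfID", "ProfSSID"}, fds))  # []
```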

27

Decomposition

• Three Conditions
  o Elimination of Anomalies
      Update
      Redundancy
      Deletion
  o Lossless Decomposition
      The original relation can be recovered by natural join
  o Preservation of Dependencies

• Relation with two attributes: Always in BCNF (why?)

28

BCNF Decomposition Algorithm

• Algorithm
  o Input: Relation R0 and set S0 of FDs
  o Output: R1, R2, …, Rn, each in BCNF, such that R0 = R1 ⋈ R2 ⋈ … ⋈ Rn
  o Process
      1. If R0 is in BCNF, then return R0
      2. If there is any BCNF violation X → Y, then compute X+. Then R1 = X+, and R2 has X and the rest of the attributes (those not in X+)
      3. Decompose FD set S0 into S1 (the FDs that hold in R1) and S2 (the FDs that hold in R2)
      4. Repeat 1-3 on R1 and R2 until there is no more BCNF violation

• Example
  o Team (TeamName, Address, ManagerID, ManagerHireDate)
  o FD:
      TeamName → Address, TeamName → ManagerID
      ManagerID → ManagerHireDate
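The decomposition algorithm can be sketched recursively for the Team example. This is a sketch under stated assumptions, not the lecture's implementation: the names `find_violation` and `bcnf_decompose` are hypothetical, and instead of explicitly projecting S0 onto S1 and S2 (step 3), the code recomputes closures under S0 and intersects them with the attributes of the sub-relation, which has the same effect for small schemas but is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    split = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, b in split:
            if lhs <= x and b not in x:
                x.add(b)
                changed = True
    return x


def find_violation(rel, fds):
    """Return (X, X+ within rel) for some BCNF violation in rel, or None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # X determines something new, but not everything: X is not a super key.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]                 # step 1: already in BCNF
    x, x_plus = violation
    r1 = x_plus                      # step 2: R1 = X+
    r2 = (rel - x_plus) | x          #         R2 = X and the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)  # step 4


team = {"TeamName", "Address", "ManagerID", "ManagerHireDate"}
team_fds = [
    (("TeamName",), ("Address", "ManagerID")),
    (("ManagerID",), ("ManagerHireDate",)),
]
print([sorted(r) for r in bcnf_decompose(team, team_fds)])
# [['ManagerHireDate', 'ManagerID'], ['Address', 'ManagerID', 'TeamName']]
```

The violation found is ManagerID → ManagerHireDate, so the result is (ManagerID, ManagerHireDate) and (TeamName, Address, ManagerID), the same decomposition used in the 3NF example for this relation.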