
Design Theory for Relational Databases


Page 1: Design Theory for Relational Databases

Design Theory for Relational Databases

Functional Dependencies
Decompositions
Normal Forms

Page 2: Design Theory for Relational Databases

Redundancy

• Dependencies between attributes cause redundancy.
• Ex.: All addresses in the same town have the same zip code.

SSN   Name  Town         Zip
1234  Joe   Stony Brook  11733
4321  Mary  Stony Brook  11733
5454  Tom   Stony Brook  11733
...

(The repeated Zip values are redundant.)

Page 3: Design Theory for Relational Databases

Redundancy and Other Problems

• Set-valued attributes result in multiple rows in the corresponding table.
• Example: Person (SSN, Name, Address, Hobbies)
    • A person entity with multiple hobbies yields multiple rows in table Person.
    • Hence, Name and Address are stored redundantly.
• SSN should be the key, but instead (SSN, Hobby) is the key of the corresponding relation.
    • Person can’t describe people without hobbies!

SSN   Name  Address   Hobby
1111  Joe   123 Main  biking
1111  Joe   123 Main  hiking
...

Page 4: Design Theory for Relational Databases

Anomalies

Redundancy leads to anomalies:
• Update anomaly: A change in Address must be made in several places.
• Deletion anomaly: Suppose a person gives up all hobbies. Do we:
    • Set the Hobby attribute to null? (No, since Hobby is part of the key.)
    • Delete the entire row? (No, since we lose the other information in the row.)
• Insertion anomaly: A Hobby value must be supplied for any inserted row (since Hobby is part of the key).

Page 5: Design Theory for Relational Databases

Decomposition

• Solution: use two relations to store Person information:
    • Person1 (SSN, Name, Address)
    • Hobbies (SSN, Hobby)
• The decomposition is more general: people without hobbies can now be described.
• No update anomalies:
    • Name and address are stored once.
    • A hobby can be separately supplied or deleted.

Page 6: Design Theory for Relational Databases

Normalization Theory

Anomalies:
• An update anomaly occurs if changing the value of an attribute leads to an inconsistent database state.
• An insertion anomaly occurs if we cannot insert a tuple due to some design flaw.
• A deletion anomaly occurs if deleting a tuple results in unexpected loss of information.

Normalization is the systematic process for removing all such anomalies in database design, based on functional dependencies.

Page 7: Design Theory for Relational Databases

Functional Dependencies

• X -> Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, they must also agree on all attributes in set Y.
    • Say “X -> Y holds in R.”
• Convention: …, X, Y, Z represent sets of attributes; A, B, C, … represent single attributes.
• Convention: no set braces around sets of attributes; write ABC rather than {A,B,C}.

Page 8: Design Theory for Relational Databases

Splitting Right Sides of FD’s

• X -> A1A2…An holds for R exactly when each of X -> A1, X -> A2, …, X -> An holds for R.
• Example: A -> BC is equivalent to A -> B and A -> C.
• There is no splitting rule for left sides.
• We’ll generally express FD’s with singleton right sides.

Page 9: Design Theory for Relational Databases

Functional Dependencies

Dependencies for this relation:
• A -> B
• A -> D
• B,C -> E,F

Do they all hold in this instance of the relation R?

R:
A   B   C   D   E   F
a1  b1  c1  d1  e1  f1
a1  b1  c2  d1  e2  f3
a2  b1  c2  d3  e2  f3
a3  b2  c3  d4  e3  f2
a2  b1  c3  d3  e4  f4
a4  b1  c1  d5  e1  f1

• Functional dependencies are specified by the database programmer based on the intended meaning of the attributes.
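The question on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the helper name `fd_holds` and the dictionary-per-row representation are assumptions made for illustration. It tests the definition directly: any two rows that agree on the left side must agree on the right side.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs values, different rhs values
    return True

COLUMNS = "ABCDEF"
DATA = [
    "a1 b1 c1 d1 e1 f1",
    "a1 b1 c2 d1 e2 f3",
    "a2 b1 c2 d3 e2 f3",
    "a3 b2 c3 d4 e3 f2",
    "a2 b1 c3 d3 e4 f4",
    "a4 b1 c1 d5 e1 f1",
]
R = [dict(zip(COLUMNS, line.split())) for line in DATA]

print(fd_holds(R, "A", "B"))    # True
print(fd_holds(R, "A", "D"))    # True
print(fd_holds(R, "BC", "EF"))  # True
print(fd_holds(R, "B", "C"))    # False: b1 appears with c1, c2 and c3
```

A check like this only shows that an FD is not contradicted by one instance; whether the FD should be asserted is still a design decision based on the meaning of the attributes.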

Page 10: Design Theory for Relational Databases

Example: FD’s

Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:
1. name -> addr favBeer
    • Note this FD is the same as name -> addr and name -> favBeer.
2. beersLiked -> manf

Page 11: Design Theory for Relational Databases

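Example: Possible Data

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud
Spock    Enterprise  Bud-Lite    A.B.    Bud

• Rows with the same name repeat the same addr, because name -> addr.
• Rows with the same name repeat the same favBeer, because name -> favBeer.
• Rows with the same beersLiked repeat the same manf, because beersLiked -> manf.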

Page 12: Design Theory for Relational Databases

Keys of Relations

• K is a superkey for relation R if K functionally determines all attributes of R.
• K is a key for R if K is a superkey, but no proper subset of K is a superkey (that is, K is a minimal superkey).

Page 13: Design Theory for Relational Databases

Example: Superkey

Drinkers(name, addr, beersLiked, manf, favBeer)

• {name, beersLiked} is a superkey because together these attributes determine all the other attributes:
    • name -> addr favBeer
    • beersLiked -> manf

Page 14: Design Theory for Relational Databases

Example: Key

• {name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey.
    • name doesn’t -> manf; beersLiked doesn’t -> addr.
• There are no other keys, but lots of superkeys:
    • Any superset of {name, beersLiked}.

Page 15: Design Theory for Relational Databases

Where Do Keys Come From?

1. Just assert a key K.
    • The only FD’s are K -> A for all attributes A.
2. Assert FD’s and deduce the keys by systematic exploration.
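One way to read "systematic exploration" is a brute-force search over attribute subsets, smallest first. The Python sketch below is an illustration under that assumption, not the slides' own procedure; the names `closure` and `candidate_keys` are hypothetical. It treats a set as a key when it determines every attribute and contains no smaller key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (pairs of lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate subsets by increasing size; keep the minimal superkeys."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a known key: a superkey, but not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

DRINKERS = ["name", "addr", "beersLiked", "manf", "favBeer"]
FDS = [(["name"], ["addr", "favBeer"]), (["beersLiked"], ["manf"])]

for key in candidate_keys(DRINKERS, FDS):
    print(sorted(key))  # ['beersLiked', 'name']
```

The search is exponential in the number of attributes, which is acceptable for classroom-sized schemas like Drinkers.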

Page 16: Design Theory for Relational Databases


More FD’s From Derivation

Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course.

Page 17: Design Theory for Relational Databases

Inferring FD’s

• We are given FD’s F = {X1 -> A1, X2 -> A2, …, Xn -> An}, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.
• What if we want all possible FD’s that follow from F?
• Example: If A -> B and B -> C hold, surely A -> C holds, even if we don’t say so.
• Important for design of good relation schemas.

Page 18: Design Theory for Relational Databases

Closure of FDs

• If F is a set of FDs on schema R and f is another FD on R, then F entails f if every instance r of R that satisfies F also satisfies f.
    • Ex: F = {A -> B, B -> C} and f is A -> C.
    • If Streetaddr -> Town and Town -> Zip, then Streetaddr -> Zip.
• The closure of F, denoted F+, is the set of all FDs entailed by F.
• F and G are equivalent if F entails G and G entails F.

Page 19: Design Theory for Relational Databases

Computing the Closure F+: Armstrong’s Axioms of FDs

• Reflexivity: If Y ⊆ X then X -> Y (trivial FD)
    • Name, Address -> Name
• Augmentation: If X -> Y then XZ -> YZ
    • If Town -> Zip then Town, Name -> Zip, Name
• Transitivity: If X -> Y and Y -> Z then X -> Z

Page 20: Design Theory for Relational Databases

Other derived rules

• Union: If X -> Y and X -> Z, then X -> YZ
    Derivation using only Armstrong’s Axioms:
    • X -> YX (augment X -> Y with X), YX -> YZ (augment X -> Z with Y)
    • thus X -> YZ (transitive)
• Decomposition: If X -> YZ, then X -> Y and X -> Z
    Derivation using only Armstrong’s Axioms:
    • YZ -> Y (reflexive), thus X -> Y (transitive)
    • YZ -> Z (reflexive), thus X -> Z (transitive)

Page 21: Design Theory for Relational Databases

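Generating F+

F = {AB -> C, A -> D, D -> E}

• A -> D gives AB -> BD (augmentation)
• AB -> C and AB -> BD give AB -> BCD (union)
• D -> E gives BCD -> BCDE (augmentation)
• AB -> BCD and BCD -> BCDE give AB -> BCDE (transitivity)
• AB -> BCDE gives AB -> CDE (decomposition)

Thus, AB -> BD, AB -> BCD, AB -> BCDE, and AB -> CDE are all elements of F+.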

Page 22: Design Theory for Relational Databases

Computing the Closure F+ via Attribute Closure (Y+)

• Calculating attribute closure is a more efficient way of checking entailment.
• The attribute closure of a set of attributes Y with respect to a set of functional dependencies F (denoted Y_F+, or just Y+) is the set of all attributes A such that F entails Y -> A.
• Checking entailment: Given a set of FDs F, F entails Y -> A if and only if A ∈ Y+.

Page 23: Design Theory for Relational Databases

Closure Test

• F = {X1 -> A1, X2 -> A2, …, Xn -> An}
• Test to compute the closure of Y, denoted Y+.
• Basis: Y+ = Y.
• Induction: Look for an FD’s left side Xi that is a subset of the current Y+. If the FD is Xi -> Ai, add Ai to Y+.
• Repeat until no more attributes can be added.

Page 24: Design Theory for Relational Databases

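Computation of Attribute Closure

Problem: Compute the attribute closure of Y = AB with respect to the set of FDs:

AB -> C  (a)
A -> D   (b)
D -> E   (c)
AC -> B  (d)

Solution:
• Initially Y+ = Y = {A, B}.
• By (a), add C: Y+ = {A, B, C}.
• By (b), add D: Y+ = {A, B, C, D}.
• By (c), add E: Y+ = {A, B, C, D, E}.
• (d) adds nothing new, so Y+ = {A, B, C, D, E}.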
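The closure test translates almost line for line into code. The following Python sketch is an illustration only; the function name `attribute_closure` and the encoding of each FD as a pair of strings of single-letter attributes are assumptions, not notation from the slides. It starts from the basis Y+ = Y and keeps applying FDs whose left side is already inside Y+ until nothing changes.

```python
def attribute_closure(attrs, fds):
    """Compute Y+ for attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    closure = set(attrs)  # basis: Y+ = Y
    changed = True
    while changed:        # induction: repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# FDs (a)-(d) from the problem statement, one character per attribute.
F = [("AB", "C"), ("A", "D"), ("D", "E"), ("AC", "B")]

print(sorted(attribute_closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(attribute_closure("D", F)))   # ['D', 'E']

# Entailment check: F entails Y -> A exactly when A is in Y+.
print("E" in attribute_closure("A", F))    # True, so A -> E follows from F
```

The loop rescans all FDs after every change, which is quadratic in the worst case but keeps the sketch close to the basis and induction steps of the test.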

Page 25: Design Theory for Relational Databases

Algorithm for Computing F+ via Attribute Closure

1. For each subset of attributes X, compute X+.
2. Add non-trivial FDs: X -> A for all A in X+ - X.
3. Drop non-minimal FDs: XY -> A can be dropped whenever we discover X -> A.
4. Some tricks:
    1. No need to compute the closure of the empty set or of the set of all attributes.
    2. If we find X+ = all attributes, so is the closure of any superset of X.

Page 26: Design Theory for Relational Databases

Example: Computing F+ via Attribute Closure

F = {AB -> C, A -> D, D -> E, AC -> B}

Y    Y+             New FDs
A    {A,D,E}        A -> E
B    {B}            (trivial)
C    {C}            (trivial)
D    {D,E}          (original FD)
E    {E}            (trivial)
AB   {A,B,C,D,E}    AB -> CDE (AB is a key)
AC   {A,B,C,D,E}    AC -> BDE (AC is a key)
AD   {A,D,E}        (not minimal)
AE   {A,D,E}        (not minimal)
BC   {B,C}          (trivial)
BD   {B,D,E}        BD -> E
BE   {B,E}          (trivial)
CD   {C,D,E}        CD -> E
CE   {C,E}          (trivial)
DE   {D,E}          (trivial)
…    (and so on)
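The subset-by-subset procedure can be sketched in Python as follows. This is an illustrative reading of the algorithm, with hypothetical names (`closure`, `minimal_nontrivial_fds`) and single-letter attributes assumed. It applies the "drop non-minimal FDs" step strictly, so it reports only FDs with a singleton right side whose left side cannot be shrunk; as a result it lists fewer entries than the table (for example, it omits BD -> E because D -> E already holds, and reports AB -> C rather than AB -> CDE).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_nontrivial_fds(attributes, fds):
    """List (lhs, attr) pairs for non-trivial FDs lhs -> attr with a minimal lhs."""
    names = sorted(attributes)
    found = []
    # Skip the empty set and the full attribute set (trick 1 above).
    for size in range(1, len(names)):
        for combo in combinations(names, size):
            lhs = frozenset(combo)
            for attr in sorted(closure(lhs, fds) - lhs):
                # Drop XY -> A when some X -> A was already discovered.
                if not any(small < lhs and a == attr for small, a in found):
                    found.append((lhs, attr))
    return found

F = [("AB", "C"), ("A", "D"), ("D", "E"), ("AC", "B")]
for lhs, attr in minimal_nontrivial_fds("ABCDE", F):
    print("".join(sorted(lhs)), "->", attr)
```

Because subsets are visited smallest first, every smaller left side has already been examined by the time a larger one is checked, which is what makes the simple drop test sufficient.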

Page 27: Design Theory for Relational Databases

Finding All Implied FD’s in a Projected Schema

• Motivation: “normalization,” the process where we break a relation schema into two or more schemas.
• Example: ABCD with FD’s AB -> C, C -> D, and D -> A.
    • Decompose into ABC, AD. What FD’s hold in ABC?
    • Not only AB -> C, but also C -> A!
• Same as the previous algorithm, but restrict to those FD’s that involve only attributes of the projected schema.

Page 28: Design Theory for Relational Databases

Example: Projecting FD’s

• ABC with FD’s A -> B and B -> C. Project onto AC.
    • A+ = ABC; yields A -> B, A -> C.
        • We do not need to compute AB+ or AC+.
    • B+ = BC; yields B -> C.
    • C+ = C; yields nothing.
    • BC+ = BC; yields nothing.
• Resulting FD’s: A -> B, A -> C, and B -> C.
• Projection onto AC: A -> C.
    • Only FD that involves a subset of {A,C}.
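Projection can be sketched by taking closures only of subsets of the projected schema and keeping only the attributes that remain in it. The Python below is a hedged illustration; `project_fds` is a hypothetical helper and attributes are assumed to be single letters. It reproduces both this example and the ABCD example from the previous slide.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(sub_schema, fds):
    """Minimal non-trivial FDs (lhs, attr) that mention only sub_schema attributes."""
    sub = set(sub_schema)
    names = sorted(sub)
    projected = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            lhs = set(combo)
            # Close under the FDs of the whole relation, then restrict to sub.
            for attr in sorted((closure(lhs, fds) & sub) - lhs):
                if not any(set(l) < lhs and a == attr for l, a in projected):
                    projected.append(("".join(combo), attr))
    return projected

# ABC with A -> B and B -> C, projected onto AC.
print(project_fds("AC", [("A", "B"), ("B", "C")]))
# [('A', 'C')]

# ABCD with AB -> C, C -> D, D -> A, projected onto ABC.
print(project_fds("ABC", [("AB", "C"), ("C", "D"), ("D", "A")]))
# [('C', 'A'), ('AB', 'C')]
```

Note that the closure is computed with the original FDs, including attributes outside the projected schema; that is how C -> A is found via C -> D and D -> A even though D is not in ABC.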

Page 29: Design Theory for Relational Databases


Normal Forms & Relation Decomposition

Page 30: Design Theory for Relational Databases

Relational Schema Design

• Goal of relational schema design is to avoid anomalies and redundancy.
    • Update anomaly: one occurrence of a fact is changed, but not all occurrences.
    • Deletion anomaly: a valid fact is lost when a tuple is deleted.
    • Insertion anomaly: cannot insert a valid fact when some information is unknown (i.e., extraneous attributes in the primary key).

Page 31: Design Theory for Relational Databases

Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  ???         WickedAle   Pete’s  ???
Spock    Enterprise  Bud         ???     Bud

Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

Page 32: Design Theory for Relational Databases

This Bad Design Also Exhibits Anomalies

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
• Insertion anomaly: Cannot insert Picard if we do not know what beers he likes.

Page 33: Design Theory for Relational Databases

Decompositions

• Do we need to decompose a relation?
    • There are several normal forms for relations. If a schema is in one of these normal forms, certain problems don’t arise.
    • The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF).
• What problems does decomposition cause?
    • Lossless-join property: get the original relation by joining the resulting relations.
    • Dependency-preservation property: enforce constraints on the original relation by enforcing some constraints on the resulting relations.
    • Queries may require a join of decomposed relations!

Page 34: Design Theory for Relational Databases

Boyce-Codd Normal Form

• We say a relation R is in BCNF if whenever X -> Y is a nontrivial FD that holds in R, X is a superkey.
    • Remember: nontrivial means Y is not contained in X.
    • Remember, a superkey is any superset of a key (not necessarily a proper superset).

Page 35: Design Theory for Relational Databases

Example

Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: name -> addr favBeer, beersLiked -> manf

• Only key is {name, beersLiked}.
• In each FD, the left side is not a superkey.
• Any one of these FD’s shows Drinkers is not in BCNF.

Page 36: Design Theory for Relational Databases

Another Example

Beers(name, manf, manfAddr)
FD’s: name -> manf, manf -> manfAddr

• Only key is {name}.
• name -> manf does not violate BCNF, but manf -> manfAddr does.
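The BCNF condition can be tested with attribute closure: a non-trivial FD X -> Y is a violation when X+ is not the full attribute set. The Python sketch below is illustrative, with hypothetical names (`closure`, `bcnf_violations`), and it inspects only the FDs that are explicitly listed for the relation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the listed FDs that are non-trivial and whose left side is not a superkey."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and closure(lhs, fds) != all_attrs:
            violations.append((lhs, rhs))
    return violations

BEERS = ["name", "manf", "manfAddr"]
BEERS_FDS = [(["name"], ["manf"]), (["manf"], ["manfAddr"])]
print(bcnf_violations(BEERS, BEERS_FDS))
# [(['manf'], ['manfAddr'])]

DRINKERS = ["name", "addr", "beersLiked", "manf", "favBeer"]
DRINKERS_FDS = [(["name"], ["addr", "favBeer"]), (["beersLiked"], ["manf"])]
print(len(bcnf_violations(DRINKERS, DRINKERS_FDS)))  # 2
```

For Beers, {name}+ covers all three attributes, so only manf -> manfAddr is reported; for Drinkers, both FDs are reported, matching the two examples.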

Page 37: Design Theory for Relational Databases

Decomposition into BCNF

• Given: relation R with FD’s F.
• Look among the given FD’s for a BCNF violation X -> Y.
    • If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.
• Compute X+.
    • Not all attributes, or else X is a superkey.

Page 38: Design Theory for Relational Databases

Decompose R Using X -> Y

• Replace R by relations with schemas:
    1. R1 = X+.
    2. R2 = X ∪ (R - X+).
• Project given FD’s F onto the two new relations.

Page 39: Design Theory for Relational Databases

Example: BCNF Decomposition

Drinkers(name, addr, beersLiked, manf, favBeer)

F = {name -> addr, name -> favBeer, beersLiked -> manf}

• Pick BCNF violation name -> addr.
• Close the left side: {name}+ = {name, addr, favBeer}.
• Decomposed relations:
    1. Drinkers1(name, addr, favBeer)
    2. Drinkers2(name, beersLiked, manf)

Page 40: Design Theory for Relational Databases

Example -- Continued

• We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.
• Projecting FD’s is easy here.
• For Drinkers1(name, addr, favBeer), relevant FD’s are name -> addr and name -> favBeer.
    • Thus, {name} is the only key and Drinkers1 is in BCNF.

Page 41: Design Theory for Relational Databases

Example -- Continued

• For Drinkers2(name, beersLiked, manf), the only FD is beersLiked -> manf, and the only key is {name, beersLiked}.
    • Violation of BCNF.
• beersLiked+ = {beersLiked, manf}, so we decompose Drinkers2 into:
    1. Drinkers3(beersLiked, manf)
    2. Drinkers4(name, beersLiked)

Page 42: Design Theory for Relational Databases

Example -- Concluded

• The resulting decomposition of Drinkers:
    1. Drinkers1(name, addr, favBeer)
    2. Drinkers3(beersLiked, manf)
    3. Drinkers4(name, beersLiked)
• Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
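The whole walkthrough (find a violation X -> Y, split into X+ and X ∪ (R - X+), project the FD's, repeat) can be sketched recursively. The Python below is one possible reading under stated assumptions: the function names are hypothetical, FDs are encoded as pairs of frozensets, and the projection step enumerates all subsets, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(schema, fds):
    """FDs X -> (X+ restricted to schema) - X for each non-empty proper subset X."""
    projected = []
    names = sorted(schema)
    for size in range(1, len(names)):
        for combo in combinations(names, size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & schema) - lhs
            if rhs:
                projected.append((lhs, frozenset(rhs)))
    return projected

def bcnf_decompose(schema, fds):
    """Split schema on the first violating FD, then recurse on both parts."""
    schema = frozenset(schema)
    for lhs, rhs in fds:
        lhs_closure = closure(lhs, fds) & schema
        if not rhs <= lhs and lhs_closure != schema:  # non-trivial, lhs not a superkey
            r1 = frozenset(lhs_closure)               # R1 = X+
            r2 = frozenset(lhs | (schema - lhs_closure))  # R2 = X ∪ (R - X+)
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [schema]  # no violation found: already in BCNF

DRINKERS = {"name", "addr", "beersLiked", "manf", "favBeer"}
F = [
    (frozenset({"name"}), frozenset({"addr", "favBeer"})),
    (frozenset({"beersLiked"}), frozenset({"manf"})),
]

for relation in bcnf_decompose(DRINKERS, F):
    print(sorted(relation))
# ['addr', 'favBeer', 'name']
# ['beersLiked', 'manf']
# ['beersLiked', 'name']
```

The three schemas printed correspond to Drinkers1, Drinkers3, and Drinkers4. Picking a different violating FD first can produce a different, equally valid BCNF decomposition for other inputs.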