Design Theory for Relational Databases

1

Design Theory for Relational Databases

Functional DependenciesDecompositionsNormal Forms

2

Redundancy

Dependencies between attributes cause redundancy Ex. All addresses in the same town

have the same zip code

SSN Name Town Zip …….1234 Joe Stony Brook 11733 ……..4321 Mary Stony Brook 117335454 Tom Stony Brook 11733…………………. redundant

3

Redundancy and Other Problems

Set valued attributes result in multiple rows in corresponding table

Example: Person (SSN, Name, Address, Hobbies) A person entity with multiple hobbies yields multiple rows

in table Person• Hence, Name, Address stored redundantly

SSN should be the key, but instead (SSN, Hobby) is key of corresponding relation

• Person can’t describe people without hobbies!

SSN Name Address Hobby1111 Joe 123 Main biking1111 Joe 123 Main hiking …………….

4

Anomalies

Redundancy leads to anomalies: Update anomaly: A change in

Address must be made in several places

Deletion anomaly: Suppose a person gives up all hobbies. Do we:• Set Hobby attribute to null (no, since

Hobby is part of key)• Delete the entire row (no, since we lose

other information in the row) Insertion anomaly: Hobby value must

be supplied for any inserted row (since Hobby is part of key)

5

Decomposition

Solution: use two relations to store Person information Person1 (SSN, Name, Address) Hobbies (SSN, Hobby)

The decomposition is more general: people with hobbies can now be described

No update anomalies: Name and address stored once A hobby can be separately supplied

or deleted

6

Normalization Theory

Anomalies: update anomaly occurs if changing the value

of an attribute leads to an inconsistent database state.

insertion anomaly occurs if we cannot insert a tuple due to some design flaw.

deletion anomaly occurs if deleting a tuple results in unexpected loss of information.

Normalization is the systematic process for removing all such anomalies in database design, based on functional dependencies

7

Functional Dependencies

X ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on all attributes in set Y. Say “X ->Y holds in R.” Convention: …, X, Y, Z represent sets of

attributes; A, B, C,… represent single attributes.

Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.

8

Splitting Right Sides of FD’s

X->A1A2…An holds for R exactly when each of X->A1

X->A2,…, X->An hold for R.

Example: A->BC is equivalent to A->B and A->C.

There is no splitting rule for left sides. We’ll generally express FD’s with

singleton right sides.

9

Functional Dependencies Dependencies

for this relation: A B A D B,C E,F

Do they all hold in this instance of the relation R?

R A B C D E F a1 b1 c1 d1 e1 f1 a1 b1 c2 d1 e2 f3 a2 b1 c2 d3 e2 f3 a3 b2 c3 d4 e3 f2 a2 b1 c3 d3 e4 f4 a4 b1 c1 d5 e1 f1

• Functional dependencies are specified by the database programmer based on the intended meaning of the attributes.

10

Example: FD’s

Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:1. name -> addr favBeer

Note this FD is the same as name -> addr and name -> favBeer.

2. beersLiked -> manf

11

Example: Possible Data

name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway Voyager WickedAle Pete’s WickedAleSpock Enterprise Bud A.B. BudSpock Enterprise Bud-Lite A.B. Bud

Because name -> addr Because name -> favBeer

Because beersLiked -> manf

12

Keys of Relations

K is a superkey for relation R if K functionally determines all of R.

K is a key for R if K is a superkey, but no proper subset of K is a superkey (or K is minimal superkey).

13

Example: Superkey


{name, beersLiked} is a superkey because together these attributes determine all the other attributes. name -> addr favBeer beersLiked -> manf

14

Example: Key

{name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey. name doesn’t -> manf; beersLiked

doesn’t -> addr. There are no other keys, but lots of

superkeys. Any superset of {name, beersLiked}.

15

Where Do Keys Come From?

1. Just assert a key K. The only FD’s are K -> A for all

attributes A.

2. Assert FD’s and deduce the keys by systematic exploration.

16

More FD’s From Derivation

Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course.

17

Inferring FD’s We are given FD’s

F={X1 -> A1, X2 -> A2,…, Xn -> An }, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.

What if we what all possible FD’s that follow from F? Example: If A -> B and B -> C hold,

surely A -> C holds, even if we don’t say so.

Important for design of good relation schemas.

18

Closure of FDs If F is a set of FDs on schema R and f

is another FD on R, then F entails f if every instance r of R that satisfies F also satisfies f Ex: F = {A B, B C} and f is A C

If Streetaddr Town and Town Zip then Streetaddr Zip

The closure of F, denoted F+, is the set of all FDs entailed by F

F and G are equivalent if F entails G and G entails F

19

Computing the Closure F+: Armstrong’s Axioms of FDs Reflexivity: If Y X then X Y

(trivial FD) Name, Address Name

Augmentation: If X Y then X Z YZ If Town Zip then

Town, Name Zip, Name Transitivity: If X Y and Y Z then

X Z

20

Other derived rules

Union: If X Y and X Z, then X YZ Derivation using only Armstrong’s Axioms:

• X YX (augment), YX YZ (augment)• thus X YZ (transitive)

Decomposition: If X YZ, then X Y and X Z Derivation using only Armstrong’s Axioms:

• YZ Y (reflexive), thus X Y (transitive)• YZ Z (reflexive), thus X Z (transitive)

F

AB C AB BCD A D AB BD AB BCDE AB CDE

D E BCD BCDE

21

Generating F+

Thus, AB BD, AB BCD, AB BCDE, and AB CDE are all elements of F+

unionaug trans

aug

decomp

22

Computing the Closure F+ via

Attribute Closure(Y+) Calculating attribute closure is a more efficient way of checking entailment

The attribute closure of a set of attributes Y with respect to a set of functional dependencies, F, (denoted YF

+ or just Y+) is the set of all attributes,

A, such that F entails Y A Checking entailment: Given a set of

FDs, F, then Y A if and only if A Y+

23

Closure Test

F={X1 -> A1, X2 -> A2,…, Xn -> An } Test to compute the closure of Y,

denoted Y +. Basis: Y + = Y. Induction: Look for an FD’s left side

Xi that is a subset of the current Y +. If the FD is Xi -> Ai, add Ai to Y +.

24

Computation of Attribute Closure

AB C (a) A D (b) D E (c) AC B (d)

AB C (a) A D (b) D E (c) AC B (d)

Problem: Compute the attribute closure of Y=AB With respect to the set of FDs:

Initially Y+ = Y = {AB}Solution:

25

Algorithm for Computing F+ via Attribute Closure

1. For each subset of attributes X, compute X +.2. Add non-trivial FDs: X ->A for all A in X + - X.3. Drop non-minimal FDs: XY ->A can be

dropped whenever we discover X ->A4. Some Tricks:

1.No need to compute the closure of the empty set or of the set of all attributes.

2.If we find X + = all attributes, so is the closure of any superset of X.

26

Example – Computing F+ via Attribute Closure

F = {AB C, A D, D E, AC B}

F = {AB C, A D, D E, AC B}

Y Y+ New FDs

A {A,D,E} A->EB {B} (trivial)C {C} (trivial)D {D,E} (original FD)E {E} (trivial) AB {A,B,C,D,E} (AB is a key) AB->CDEAC {A,B,C,D,E} (AC is a key) AC->BDEAD {A,D,E} (not minimal)AE {A,D,E} (not minimal)BC {B,C} (trivial)BD {B,D,E} BD->EBE {B,E} (trivial)CD {C,D,E} CD->ECE {C,E} (trivial)DE {D,E} (trivial)… (and so on)

Y Y+ New FDs

A {A,D,E} A->EB {B} (trivial)C {C} (trivial)D {D,E} (original FD)E {E} (trivial) AB {A,B,C,D,E} (AB is a key) AB->CDEAC {A,B,C,D,E} (AC is a key) AC->BDEAD {A,D,E} (not minimal)AE {A,D,E} (not minimal)BC {B,C} (trivial)BD {B,D,E} BD->EBE {B,E} (trivial)CD {C,D,E} CD->ECE {C,E} (trivial)DE {D,E} (trivial)… (and so on)

27

Finding All Implied FD’s in a Projected Schema

Motivation: “normalization,” the process where we break a relation schema into two or more schemas.

Example: ABCD with FD’s AB ->C, C ->D, and D ->A. Decompose into ABC, AD What FD’s hold in ABC ?

• Not only AB ->C, but also C ->A !

Same as previous algorithm, but restrict to those FD’s that involve only attributes of the projected schema

28

Example: Projecting FD’s

ABC with FD’s A ->B and B ->C. Project onto AC. A +=ABC ; yields A ->B, A ->C.

• We do not need to compute AB + or AC +. B +=BC ; yields B ->C. C +=C ; yields nothing. BC +=BC ; yields nothing.

Resulting FD’s: A ->B, A ->C, andB ->C.

Projection onto AC : A ->C. Only FD that involves a subset of {A,C }.

29

Normal Forms & Relation Decomposition

30

Relational Schema Design

Goal of relational schema design is to avoid anomalies and redundancy. Update anomaly : one occurrence of a fact

is changed, but not all occurrences. Deletion anomaly : valid fact is lost when a

tuple is deleted Insertion anomaly : cannot insert a valid

fact when some information is unknown (i.e., extraneous attributes in primary key)

31

Example of Bad Design


name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway ??? WickedAle Pete’s ???Spock Enterprise Bud ??? Bud

Data is redundant, because each of the ???’s can be figuredout by using the FD’s name -> addr favBeer andbeersLiked -> manf.

32

This Bad Design AlsoExhibits Anomalies

name addr beersLiked manf favBeerJaneway Voyager Bud A.B. WickedAleJaneway Voyager WickedAle Pete’s WickedAleSpock Enterprise Bud A.B. Bud

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.• Insertion anomaly: Cannot insert Picard if we do not know what beers he likes

33

Decompositions Do we need to decompose a relation?

Several normal forms for relations. If schema in these normal forms certain problems don’t arise

The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)

What problems does decomposition cause? Lossless-join property: get original relation by

joining the resulting relations Dependency-preservation property: enforce

constraints on original relation by enforcing some constraints on resulting relations

Queries may require a join of decomposed relations!

34

Boyce-Codd Normal Form

We say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey. Remember: nontrivial means Y is not

contained in X. Remember, a superkey is any

superset of a key (not necessarily a proper superset).

35

Example

Drinkers(name, addr, beersLiked, manf, favBeer)FD’s: name->addr favBeer, beersLiked->manf

Only key is {name, beersLiked}. In each FD, the left side is not a

superkey. Any one of these FD’s shows

Drinkers is not in BCNF

36

Another Example

Beers(name, manf, manfAddr)FD’s: name->manf, manf->manfAddr Only key is {name} . name->manf does not violate BCNF,

but manf->manfAddr does.

37

Decomposition into BCNF

Given: relation R with FD’s F. Look among the given FD’s for a BCNF

violation X ->Y. If any FD following from F violates BCNF,

then there will surely be an FD in F itself that violates BCNF.

Compute X +. Not all attributes, or else X is a superkey.

38

Decompose R Using X -> Y

Replace R by relations with schemas:

1. R1 = X +.

2. R2 = X + (R – X +)

1. Project given FD’s F onto the two new relations.

39

Example: BCNF Decomposition


F = name->addr, name -> favBeer,beersLiked->manf

Pick BCNF violation name->addr. Close the left side: {name}+ = {name,

addr, favBeer}. Decomposed relations:

Drinkers1(name, addr, favBeer) Drinkers2(name, beersLiked, manf)

40

Example -- Continued

We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.

Projecting FD’s is easy here. For Drinkers1(name, addr, favBeer),

relevant FD’s are name->addr and name->favBeer. Thus, {name} is the only key and

Drinkers1 is in BCNF.

41

Example -- Continued

For Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}.

Violation of BCNF. beersLiked+ = {beersLiked, manf},

so we decompose Drinkers2 into: Drinkers3(beersLiked, manf) Drinkers4(name, beersLiked)

42

Example -- Concluded

The resulting decomposition of Drinkers :1. Drinkers1(name, addr, favBeer)2. Drinkers3(beersLiked, manf)3. Drinkers4(name, beersLiked)

Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.

Documents

Design Theory for Relational Databases