MIS 335 - Database Systems
Schema Refinement and Normal Forms
Ahmet Onur Durahim
http://www.mis.boun.edu.tr/durahim/
Learning Objectives
• Anomalies
• Functional Dependencies
• Normal Forms
– 1stNF, 2ndNF, 3rdNF, Boyce-Codd (BCNF)
• Decompositions
– Lossless-Join Decompositions
– Dependency Preserving Decompositions
credit: Yücel Saygın
Schema Refinement
• Redundant (useless) storage of information is the root cause of problems
– Storing the same information redundantly => in more than one place within a database
• Refinement approach based on decompositions
– A relation with redundancy is refined by decomposition => replacing it with smaller relations that contain the same information without redundancy
Anomalies
• Modification (Update) anomalies
– If one copy of the repeated data is updated, an inconsistency is created unless all copies are updated similarly
• Insertion anomalies
– It may not be possible to store certain information unless some other, unrelated, information is stored
• Deletion anomalies
– It may not be possible to delete certain information without losing some other, unrelated, information as well
Anomalies
• Insertion anomalies
– Cannot record filmType without starName
• Deletion anomalies
– If we delete the last starName, we also lose the movie info.
title year length filmType studioName starName
Star Wars 1977 124 Action Fox Carrie Fisher
Star Wars 1977 124 Action Fox Mark Hamill
Star Wars 1977 124 Action Fox Harrison Ford
Mighty Ducks 1991 104 Animation Disney Emilio Estevez
Wayne’s World 1992 95 Comedy Paramount Dana Carvey
Wayne’s World 1992 95 Comedy Paramount Mike Meyers
Decomposition
title year length filmType studioName starName
Star Wars 1977 124 Action Fox Carrie Fisher
Star Wars 1977 124 Action Fox Mark Hamill
Star Wars 1977 124 Action Fox Harrison Ford
Mighty Ducks 1991 104 Animation Disney Emilio Estevez
Wayne’s World 1992 95 Comedy Paramount Dana Carvey
Wayne’s World 1992 95 Comedy Paramount Mike Meyers
title year starName
Star Wars 1977 Carrie Fisher
Star Wars 1977 Mark Hamill
Star Wars 1977 Harrison Ford
Mighty Ducks 1991 Emilio Estevez
Wayne’s World 1992 Dana Carvey
Wayne’s World 1992 Mike Meyers
title year length filmType studioName
Star Wars 1977 124 Action Fox
Mighty Ducks 1991 104 Animation Disney
Wayne’s World 1992 95 Comedy Paramount
Anomalies
• Modification (Update) anomalies
– To update the address of a student who occurs twice or more in a table, we have to update the S_Address column in all rows. Otherwise, the data will become inconsistent
• Insertion anomalies
– Suppose that for a new admission we have the S_id, name, and address of a student. If the student has not opted for any subjects yet, we have to insert NULL there
• Deletion anomalies
– If S_id 401 has only one subject and temporarily drops it, deleting that row deletes the entire student record along with it
Schema Refinement
• Functional dependencies can be used to identify schemas with problems and to suggest refinements
– A functional dependency is a kind of IC (Integrity Constraint) that generalizes the concept of a key
– An instance r of a relational schema R satisfies the FD X → Y, where X, Y are non-empty sets of attributes in R, if:
• If t1.X = t2.X, then t1.Y = t2.Y
– t1 and t2 are any two tuples of r
– t1.X: projection of tuple t1 onto the attributes in X
• Decomposition is used for schema refinement
Functional Dependency Example
• Database of beverage drinkers
NAME | ADDR | BEVERAGE LIKED | MANUF | FAV BEVERAGE
John Doe | NY, Soho | CocaCola Light | CocaCola | CocaCola
John Doe | NY, Soho | CocaCola | CocaCola | CocaCola
Elisa Day | DC, Dupont | Pepsi Light | Pepsi | Pepsi Max
Elisa Day | DC, Dupont | Fanta | CocaCola | Pepsi Max
Functional Dependency Example
• title, year → length
• title, year → filmType
• title, year → studioName
• title, year → length, filmType, studioName
TITLE YEAR LENGTH FILMTYPE studioName starName
Star Wars 1977 124 Action Fox Carrie Fisher
Star Wars 1977 124 Action Fox Mark Hamill
Star Wars 1977 124 Action Fox Harrison Ford
Mighty Ducks 1991 104 Animation Disney Emilio Estevez
Wayne’s World 1992 95 Comedy Paramount Dana Carvey
Wayne’s World 1992 95 Comedy Paramount Mike Meyers
Functional Dependencies (FDs)
• A functional dependency X → Y holds over relation R, if for every allowable instance r of R:
– t1 ∈ r, t2 ∈ r, 𝜋X(t1) = 𝜋X(t2) implies 𝜋Y(t1) = 𝜋Y(t2)
– i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes)
X Y Z
1 a p
2 b q
1 a r
2 b p
Functional Dependencies (FDs)
• Does the following relation instance satisfy X → Y ?
X Y Z
1 a p
2 b q
1 a r
2 c p
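The question above can also be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `satisfies_fd` and `make_rows` are made up for illustration. It tests a single instance against an FD by remembering which Y value each X value was first seen with. Such a check can only show that an instance violates an FD; it cannot show that the FD holds over the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the instance (a list of dict rows) satisfies lhs -> rhs."""
    seen = {}  # maps each observed lhs value to the rhs value it came with
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


def make_rows(text):
    """Build rows over attributes X, Y, Z from whitespace-separated values."""
    return [dict(zip("XYZ", line.split())) for line in text.strip().splitlines()]


# Instance from the earlier slide (satisfies X -> Y).
first = make_rows("""
1 a p
2 b q
1 a r
2 b p
""")

# Instance from this slide: X = 2 appears with both Y = b and Y = c.
second = make_rows("""
1 a p
2 b q
1 a r
2 c p
""")

print(satisfies_fd(first, ["X"], ["Y"]))   # True
print(satisfies_fd(second, ["X"], ["Y"]))  # False
```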
Functional Dependencies (FDs)
• A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
– t1 ∈ r, t2 ∈ r, 𝜋X(t1) = 𝜋X(t2) implies 𝜋Y(t1) = 𝜋Y(t2)
– i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes)
• An FD is a statement about all allowable relations
– Must be identified based on semantics of application
– Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R
• K is a candidate key for R means that K → R
– However, K → R does not require K to be minimal
– K being a primary key is a special case of an FD
Functional Dependencies (FDs)
• Does the following relation instance satisfy X → Y ?
X Y Z
1 a p
2 b q
1 a r
3 b p
Functional Dependencies (FDs)
• If X is a candidate key, then X → YZ ?
X Y Z
1 a p
2 b q
1 a r
3 b p
Functional Dependencies (FDs)
• If YZ → X, can we say that YZ is a candidate key ?
X Y Z
1 a p
2 b q
1 a r
3 b p
Constraints on Entity Sets
• Consider relation obtained from Hourly_Emps:
– Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
• Notation: We will denote this relation schema by listing the attributes: SNLRWH
– This is really the set of attributes {S,N,L,R,W,H}.
– Sometimes, we will refer to all attributes of a relation by using the relation name
• e.g., Hourly_Emps for SNLRWH
Hourly_Emps
S N L R W H
Constraints on Entity Sets
• Some FDs on Hourly_Emps:
– ssn is the key: S → SNLRWH
– rating determines hrly_wages: R → W
• Did you notice anything wrong with the following instance of Hourly_Emps (only the R and W columns are shown)?
R W
1 100
2 200
3 250
2 300
Constraints on Entity Sets
• Some FDs on Hourly_Emps:
– ssn is the key: S → SNLRWH
– rating determines hrly_wages: R → W
Hourly_Emps (only the R and W columns are shown)
R W
1 100
2 200
3 250
2 200
• Salary should be the same for a given rating!
Example
• Problems due to R → W:
– Redundant Storage: The rating value 8 corresponds to the hourly wage 10 (This association is repeated 3 times)
– Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
– Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?
– Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5
S N L R W H
123-22-3666 Attishoo 48 8 10 40
231-31-5368 Smiley 22 8 10 30
131-24-3650 Smethurst 35 5 7 30
434-26-3751 Guldu 35 5 7 32
612-67-4134 Madayan 35 8 10 40
Hourly_Emps
S N L R W H
123-22-3666 Attishoo 48 8 10 40
231-31-5368 Smiley 22 8 10 30
131-24-3650 Smethurst 35 5 7 30
434-26-3751 Guldu 35 5 7 32
612-67-4134 Madayan 35 8 10 40
Hourly_Emps2
S N L R H
123-22-3666 Attishoo 48 8 40
231-31-5368 Smiley 22 8 30
131-24-3650 Smethurst 35 5 30
434-26-3751 Guldu 35 5 32
612-67-4134 Madayan 35 8 40
Wages
R W
8 10
5 7
Reasoning About FDs
• Given some FDs, we can usually infer additional FDs:
– ssn → did, did → lot implies ssn → lot
• An FD f is implied by a given set F of FDs if f holds whenever all FDs in F hold
– F+ = closure of F is the set of all FDs that are implied by F
• Armstrong’s Axioms (X, Y, Z are sets of attributes):
– Reflexivity: If X ⊆ Y, then Y → X (a trivial FD)
– Augmentation: If X → Y, then XZ → YZ for any Z
– Transitivity: If X → Y and Y → Z, then X → Z
• These are sound and complete inference rules for FDs
– Sound: they generate only FDs in F+. Complete: repeated application generates all FDs in F+
Reasoning About FDs
• In the following schema
– SN → S is a trivial FD (by reflexivity)
– since {S,N} is a superset of {S}
S N L R W H
Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
Reasoning About FDs
• In the following schema
– If SN → RW, then SNL → RWL (by augmentation)
S N L R W H
Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
Reasoning About FDs
• Couple of additional rules (that follow from AA):
– Union: If X → Y and X → Z, then X → YZ
• Proof:
– From X → Y, we have XX → XY (by augmentation)
– Note that XX is X, therefore X → XY
– From X → Z, we have XY → YZ (by augmentation)
– From X → XY and XY → YZ, we have X → YZ (by transitivity)
Reasoning About FDs
• Couple of additional rules (that follow from AA):
– Decomposition: If X → YZ, then X → Y and X → Z
• Try to prove it at home/dorm/ICs
Reasoning About FDs
• Example: Contracts(Cid,Sid,prJid,Did,Pid,Qty,Value), where we denote schema as CSJDPQV and:
– C is the key: C → CSJDPQV
– Project purchases each part using single contract: JP → C
– Department purchases at most one part from a supplier: SD → P
• JP → C and C → CSJDPQV imply JP → CSJDPQV
• SD → P implies SDJ → JP
• SDJ → JP and JP → CSJDPQV imply SDJ → CSJDPQV
• We cannot conclude that SD → CSDPQV by cancelling J from both sides of SDJ → CSJDPQV !!!
Example
• Suppose that we are given;
– a relation scheme R = (A,B,C,G,H,I)
– the set of functional dependencies F:
• A → B, A → C, CG → H
• CG → I, B → H
• Is A → H logically implied by F?
• Is AG → I logically implied by F?
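Both questions can be answered with Armstrong's axioms. One possible derivation (others exist) is sketched below, using only the FDs A → B, B → H, A → C and CG → I from F:

```latex
\begin{align*}
A \to B,\ B \to H &\;\Rightarrow\; A \to H && \text{(transitivity)}\\
A \to C &\;\Rightarrow\; AG \to CG && \text{(augmentation with } G\text{)}\\
AG \to CG,\ CG \to I &\;\Rightarrow\; AG \to I && \text{(transitivity)}
\end{align*}
```

So both A → H and AG → I are logically implied by F.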
Reasoning About FDs
• Computing the closure of a set of FDs (F+) can be expensive
– Size of closure is exponential in # attrs
• Example:
– A database with 4 attributes (A,B,C,D)
– F = {A → B, B → C}
– Find the closure of F denoted by F+ (listed here with single-attribute right-hand sides)
– A → A, A → B, A → C, B → B, B → C, C → C, D → D,
– AB → A, AB → B, AB → C, AC → A, AC → B, AC → C, AD → A, AD → B, AD → C, AD → D, BC → B, BC → C, BD → B, BD → C, BD → D, CD → C, CD → D,
– ABC → A, ABC → B, ABC → C, ABD → A, ABD → B, ABD → C, ABD → D, ACD → A, ACD → B, ACD → C, ACD → D, BCD → B, BCD → C, BCD → D,
– ABCD → A, ABCD → B, ABCD → C, ABCD → D
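The listing above can be reproduced by brute force. The sketch below is an illustration only (the function names are made up here). It enumerates every non-empty left-hand side X over the four attributes and emits X → A for each attribute A in the attribute closure of X. Attribute names are assumed to be single characters, and the enumeration is exponential in the number of attributes, which is exactly why F+ is rarely computed in full.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def single_attribute_fds(schema, fds):
    """List every FD X -> A in F+ with non-empty X and a single attribute A."""
    found = []
    for size in range(1, len(schema) + 1):
        for lhs in combinations(sorted(schema), size):
            for attr in sorted(closure(lhs, fds)):
                found.append(("".join(lhs), attr))
    return found


F = [("A", "B"), ("B", "C")]
fds = single_attribute_fds("ABCD", F)
print(len(fds))  # 42
print(", ".join(f"{x} -> {a}" for x, a in fds))
```

The count includes trivial FDs such as A → A. FDs with larger right-hand sides (for example A → BC) are also in F+ and follow from these by the union rule.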
Reasoning About FDs
• Computing the closure of a set of FDs (F+) can be expensive
– Size of closure is exponential in # attrs
• Typically, we just want to check if a given FD, X → Y, is in the closure of a set of FDs F
• An efficient check:
– Compute attribute closure of X (denoted X+) wrt F:
• Set of all attributes A such that X → A is in F+
• There is a linear time algorithm to compute this;
– Start with X+ = X. For each FD Y → Z in F, if X+ is a superset of Y then add Z to X+. Repeat until X+ no longer changes
– X → Y is in F+ exactly when Y is a subset of X+
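The attribute-closure check can be sketched as follows. This is the simple fixed-point version that rescans F until nothing changes, not the linear-time variant mentioned above. The names `attribute_closure` and `implies` are illustrative, and attributes are assumed to be single characters.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for X = attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))  # ['A', 'B', 'C']
print(implies(F, "A", "E"))               # False
print(implies(F, "AD", "E"))              # True
```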
Reasoning About FDs
• Does F = {A → B, B → C, CD → E} imply A → E?
– i.e., Is A → E in the closure F+?
– Equivalently, is E in A+?
• Let’s compute A+
– Initialize A+ to {A} : A+ = {A}
– From A → B, we can add B to A+ : A+ = {A, B}
– From B → C, we can add C to A+: A+ = {A, B, C}
– We cannot add any more attributes, and A+ does not contain E
• Therefore A → E does not hold
DB Design Guidelines
• Design a relation schema with clearly defined semantics
• Design the relation schemas so that there are no insertion, deletion, or modification anomalies
– If there may be anomalies, state them clearly
• Avoid attributes which may frequently have null values as much as possible
• Make sure that relations can be combined by key-foreign key links
Normal Forms
• Normal forms are standards for a good DB schema (introduced by Codd in 1972)
• If a relation is in a certain normal form (such as BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized.
• Normal forms help us decide if decomposing a relation helps
Normal Forms: 1NF
• First Normal Form: A relation is in 1NF if every field contains only atomic values
– No set valued attributes (no lists or sets)
First Normal Form
Student – Not in 1NF
sid name phones
1 ali {5332344568, 2165533561}
2 veli …
3 ayse …
4 fatma …
Student – in 1NF (table not recoverable from the slide)
Normal Forms
• Role of FDs in detecting redundancy;
• Consider a relation R with 3 attributes, ABC
– No FDs hold: There is no redundancy here
– Given an FD, A → B: Several tuples could have the same A value, and if so, they’ll all have the same B value
• This potential redundancy can be predicted using this FD information
Normal Forms: 2NF
• Second Normal Form: Every non-prime (non-key) attribute should be fully functionally dependent on every key, i.e., on every candidate key (no partial dependency)
• In other words: “No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key”
– Prime attribute: any attribute that is part of a key
– Non-prime attributes: rest of the attributes
• Ex: If AB is a key, and C is a non-prime attribute, then if A → C holds, A partially determines C
– there is a partial functional dependency on a key
2nd Normal Form
Student – Not in 2NF (candidate key is {Student, Subject})
Student Age Subject
Adam 15 Biology
Adam 15 Math
Alex 14 Math
Stuart 17 Math
Student – in 2NF
Student Age
Adam 15
Alex 14
Stuart 17
Student Subject – in 2NF
Student Subject
Adam Biology
Adam Math
Alex Math
Stuart Math
2nd Normal Form
• Composite primary key is [Customer ID, Store ID]
• The non-key attribute is [Purchase Location]
• Not in 2nd normal form
– [Purchase Location] only depends on [Store ID]
– [Store ID] is only part of the primary key
PURCHASE_DETAIL
PURCHASE STORE
2nd Normal Form
• Composite primary key is [Employee, Skill]
• The non-key attribute is [Current Work Location]
• Not in 2nd normal form
– [Current Work Location] only depends on [Employee]
– [Employee] is only part of the primary key
Normal Forms: 3NF
• Relation R with FDs F is in Third Normal Form if, for all X → A in F+ (Zaniolo’s def.):
– A ∈ X (called a trivial FD), (=> X contains A) or
– X contains a key for R, (=> X is a superkey) or
– A is part of some key for R (=> Every element of A-X is a prime attribute (contained in some candidate key))
• R is in 2NF & there is no transitive functional dependency (Codd’s def.)
– B is functionally dependent on A, and C is functionally dependent on B. Therefore, C is transitively dependent on A via B
• If R is in 3NF, some redundancy is possible
What Does 3NF Achieve?
• If 3NF is violated by X → A, one of the following holds:
– X is a proper subset of some key K (partial dependency)
• We store (X, A) pairs redundantly
– X is not a proper subset of any key (transitive dependency)
• There is a chain of FDs K → X → A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value
• But: even if a relation is in 3NF, these problems could arise
– e.g., Reserves SBDC (C: Credit Card) with S → C and C → S is in 3NF (SBD & CBD are keys)
• S → C: Sailor uses a unique credit card to pay for reservations. If this is the only FD, the only key is SBD (S is not a key and C is not part of a key)
– Hence not in 3NF (redundantly stored SC pairs)
• If also C → S: Credit cards uniquely identify the owner (which means CBD is also a key)
– Hence in 3NF
– but for each reservation of sailor S, same (S, C) pair is stored
• There is a stricter normal form (BCNF)
Partial/Transitive Dependencies
• Partial Dependency
[Figure: KEY, Attributes X, Attribute A. Case 1: A not in KEY (violates 3NF)]
• Transitive Dependencies
[Figure: KEY, Attributes X, Attribute A. Case 1: A not in KEY (violates 3NF). Case 2: A is in KEY (does not violate 3NF)]
3rd Normal Form
• [Book ID] (key) determines [Genre ID]
• [Genre ID] determines [Genre Type]
• Not in 3rd normal form
– [Book ID] determines [Genre Type] via [Genre ID]
– There is a transitive functional dependency
BOOK DETAIL
GENREBOOK
=> non-key attribute
3rd Normal Form
• [Tournament, Year] is a minimal set of attributes guaranteed to uniquely identify a row
– candidate key for the table
• Not in 3rd normal form
– [Tournament, Year] determines [Date of Birth] via [Winner]
– Non-prime attribute [Date of Birth] is transitively dependent on the candidate key
Tournament Winners
Winner Dates of BirthTournament Winners
=> non-key attribute
Boyce-Codd Normal Form (BCNF)
• Relation R with FDs F is in BCNF if, for all X → A in F+
– A ∈ X (called a trivial FD), or
– X contains a key for R. (i.e., X is a superkey)
• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints
[Figure: FDs in a BCNF Relation. KEY, Nonkey Attr1, Nonkey Attr2, …, Nonkey AttrK]
X Y A
x y1 a
x y2 ?
Boyce-Codd Normal Form (BCNF)
• BCNF ensures that no redundancy in R can be predicted using FDs alone
– if a relation is in BCNF, every field of every tuple records a piece of information that cannot be inferred (using only FDs) from the values in all other fields in the relation instance
• If we are shown two tuples that agree upon the X value, we cannot infer the A value in one tuple from the A value in the other
• If the example relation is in BCNF (where X → A), the 2 tuples must be identical (X is a key since R is in BCNF)
– this situation cannot arise in relational DBs
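The BCNF condition can be tested with attribute closures. The sketch below is an illustration with made-up function names and single-character attribute names. It checks only the FDs given in F, which is sufficient for testing the original relation R itself; for a relation produced by a decomposition, the projected FDs would have to be considered instead.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the FDs in fds that are non-trivial and whose LHS is not a superkey."""
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD
        if closure(lhs, fds) >= schema:
            continue  # LHS is a superkey
        bad.append((lhs, rhs))
    return bad


# Hourly_Emps from the earlier slides: S is the key, rating determines wage.
print(bcnf_violations("SNLRWH", [("S", "SNLRWH"), ("R", "W")]))  # [('R', 'W')]
```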
BCNF
• FDs:
– GPA → rank
– cid → cname, cInstructor
– sid → sname, address, GPA
• Keys:
– {sid, cid}
• Not in BCNF
– Not every LHS of the FDs contains a key
– In fact, none of the LHSs contains a key
Student Course
sid sname address GPA cid cname cInstructor rank
111 Onur 123 st. 3.8 335 DB sys Durahim 1
222 Ahmet 999 st. 2.9 335 DB sys Durahim 12
111 Onur 123 st. 3.8 413 Info sys Jackman 1
• Not even in 2nd NF
– 2nd and 3rd FDs lead to partial dependencies
• Step 1: cid → cname, cInstructor is moved into its own table
sid sname address GPA cid rank
111 Onur 123 st. 3.8 335 1
222 Ahmet 999 st. 2.9 335 12
111 Onur 123 st. 3.8 413 1
cid cname cInstructor
335 DB sys Durahim
413 Info sys Jackman
• Step 2: GPA → rank is moved into its own table
sid sname address GPA cid
111 Onur 123 st. 3.8 335
222 Ahmet 999 st. 2.9 335
111 Onur 123 st. 3.8 413
cid cname cInstructor
335 DB sys Durahim
413 Info sys Jackman
GPA rank
3.8 1
2.9 12
• Step 3: sid → sname, address, GPA is moved into its own table, leaving (sid, cid)
sid sname address GPA
111 Onur 123 st. 3.8
222 Ahmet 999 st. 2.9
cid cname cInstructor
335 DB sys Durahim
413 Info sys Jackman
GPA rank
3.8 1
2.9 12
sid cid
111 335
222 335
111 413
• All of these four tables are now in BCNF
BCNF
sid instrid CourseCode OffHourAppnt
111 123 MIS335 12.10.2014
111 999 MIS413 13.10.2014
222 123 MIS335 14.10.2014
222 999 MIS413 15.10.2014
• FDs:
– sid, instrid → CourseCode, OffHourAppnt
– CourseCode → instrid
• In 3NF BUT NOT in BCNF
– No partial key or transitive key dependencies
– CourseCode is not a superkey
sid CourseCode OffHourAppnt
111 MIS335 12.10.2014
111 MIS413 13.10.2014
222 MIS335 14.10.2014
222 MIS413 15.10.2014
instrid CourseCode
123 MIS335
999 MIS413
Normal Form Shortcuts
• All attributes are prime
– At least in 3NF
• Singleton keys
– At least in 2NF
Decomposition of a Relation Scheme
• Suppose that relation R contains attributes A1, ..., An
• A decomposition of R consists of replacing R by two or more relations such that:
– Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
– Every attribute of R appears as an attribute of one of the new relations
• Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R
– e.g., Can decompose SNLRWH into SNLRH and RW
Decomposition of a Relation Scheme
• We can decompose SNLRWH into SNL and RWH
S N L R W H
S N L R W H
Example Decomposition
• SNLRWH has FDs {S → SNLRWH, R → W}
– Is this in 3NF?
– R → W violates 3NF
• W values repeatedly associated with R values
• In order to fix the problem, we need to create a relation RW to store the R → W associations, and to remove W from the main schema:
– i.e., we decompose SNLRWH into SNLRH and RW
S N L R H R W
Problems with Decompositions
• There are three potential problems to consider:
– Some queries become more expensive (performance loss due to required joins)
• e.g., How much did employee Joe earn? (salary = W*H)
– Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation
• Fortunately, not in the SNLRWH example.
– Checking some dependencies may require joining the instances of the decomposed relations
• Fortunately, not in the SNLRWH example.
• Tradeoff: Must consider these issues vs. redundancy
R WS N L R H
Problems with Decompositions
What problems does a given decomposition cause, if any?
• Lossless-join property
– Enables us to recover any instance of the decomposed relation from corresponding instances of the smaller relations
• Without it, given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
• Dependency-preservation property
– Enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations
– Without it, checking some dependencies may require joining the instances of the decomposed relations
Lossless Join Decompositions
• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
– 𝜋X(r) ⋈ 𝜋Y(r) = r
• It is always true that r ⊆ 𝜋X(r) ⋈ 𝜋Y(r)
– In general, the other direction does not hold!
– If it does, the decomposition is lossless-join.
• Definition extended to decomposition into 3 or more relations in a straightforward way
• It is essential that all decompositions used to deal with redundancy be lossless
Lossless Join
• The decomposition of R into X and Y is lossless-join wrt FDs F if and only if the closure of F (F+) contains:
– X ∩ Y → X, or
– X ∩ Y → Y
• The attributes common to X and Y must contain a key for either X or Y
• If an FD U → V holds over R and U ∩ V is empty, the decomposition of R into R - V and UV is lossless
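Instance r of ABC:
A B C
1 2 3
4 5 6
7 2 8
𝜋AB(r):
A B
1 2
4 5
7 2
𝜋BC(r):
B C
2 3
5 6
2 8
𝜋AB(r) ⋈ 𝜋BC(r) (two extra tuples, so this decomposition is lossy):
A B C
1 2 3
4 5 6
7 2 8
1 2 8
7 2 3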
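The example can be replayed in a few lines. The sketch below is illustrative only, with made-up helper names and single-character attribute names. It joins the two projections of the ABC instance to expose the spurious tuples, and also applies the FD-based test (X ∩ Y → X or X ∩ Y → Y) using an attribute closure.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]


def natural_join(left, right):
    """Combine every pair of rows that agree on their common attributes."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def as_set(rows, attrs):
    return {tuple(r[a] for a in attrs) for r in rows}


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(x, y, fds):
    """FD-based test: the common attributes must determine all of X or all of Y."""
    common = closure(set(x) & set(y), fds)
    return set(x) <= common or set(y) <= common


r = [dict(zip("ABC", t)) for t in [(1, 2, 3), (4, 5, 6), (7, 2, 8)]]
joined = natural_join(project(r, "AB"), project(r, "BC"))
extra = as_set(joined, "ABC") - as_set(r, "ABC")
print(sorted(extra))                  # [(1, 2, 8), (7, 2, 3)]
print(is_lossless("AB", "BC", []))    # False: B determines neither AB nor BC
print(is_lossless("SNLRH", "RW", [("S", "SNLRWH"), ("R", "W")]))  # True
```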
• Person(SSN, Name, Address, Hobby)
• F = {SSN, Hobby → Name, Address; SSN → Name, Address}
Person
SSN Name Address Hobby
111111 Celalettin Sabanci D. Stamps
111111 Celalettin Sabanci D. Coins
555555 Elif Mutlukent Skating
555555 Elif Mutlukent Surfing
666666 Sercan Esentepe Math
Hobby
SSN Hobby
111111 Stamps
111111 Coins
555555 Skating
555555 Surfing
666666 Math
Person1
SSN Name Address
111111 Celalettin Sabanci D.
555555 Elif Mutlukent
666666 Sercan Esentepe
Problems with Decompositions (Contd.)
• Checking some dependencies may require joining the instances of the decomposed relations
Dependency Preserving Decomposition
• Consider CSJDPQV, C is key, JP → C and SD → P
– SD is not a key, thus SD → P causes a violation of BCNF
– BCNF decomposition: CSJDQV and SDP
– Problem: Checking JP → C for each insertion requires a join (expensive!) => decomposition is not dependency-preserving
• Dependency preserving decomposition:
– A dependency X → Y that appears in F should either appear in one of the sub relations or be implied by the dependencies that can be checked within the individual sub relations
• Projection of set of FDs F: If R is decomposed into X, ..., the projection of F onto X (denoted FX) is the set of FDs U → V in F+ (closure of F) such that U, V are in X
– Ex: R = ABC, F = {A → B, B → C, C → A}
• F+ includes FDs, {A → B, B → C, C → A, B → A, A → C, C → B}
• FAB = {A → B, B → A}, FAC = {C → A, A → C}
Dependency Preserving Decomposition
• Decomposition of R into schemas with attribute sets X and Y is dependency preserving if (FX ∪ FY)+ = F+
– i.e., if we consider only dependencies in the closure F+ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F+
• Important to consider F+, not F, in this definition:
– ABC, {A → B, B → C, C → A}, decomposed into AB and BC
– Is this dependency preserving? Is C → A preserved???
• F+ includes FDs, {A → B, B → C, C → A, B → A, A → C, C → B}
• FAB = {A → B, B → A}, FBC = {B → C, C → B},
• FAB ∪ FBC = {A → B, B → A, B → C, C → B}
• Does the closure of FAB ∪ FBC imply C → A?
– Yes: C → B and B → A give C → A by transitivity
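The same check can be sketched in code. The example below is illustrative (names are made up, attributes are single characters) and takes the projected FD sets from the slide as given rather than computing them. It also shows why F+ matters: projecting only the FDs literally written in F loses C → A.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implied(fd, fds):
    """True if the FD lhs -> rhs follows from fds."""
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("B", "C"), ("C", "A")]

# Projections of F+ onto AB and BC, as listed on the slide.
F_AB = [("A", "B"), ("B", "A")]
F_BC = [("B", "C"), ("C", "B")]
preserved = all(implied(fd, F_AB + F_BC) for fd in F)

# Using only the FDs written in F that fit inside AB or BC misses C -> A.
naive = all(implied(fd, [("A", "B"), ("B", "C")]) for fd in F)

print(preserved, naive)  # True False
```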
Dependency Preserving Decomposition
• Dependency preserving does not imply lossless join:
– Ex: ABC, A → B, decomposed into AB and BC, is a lossy decomposition
• And vice-versa!
– Ex: CSJDPQV, {C is key, JP → C and SD → P}, decomposed into CSJDQV and SDP, is lossless but not dependency preserving
Normalization
• Converting relations to BCNF
– Possible to obtain a lossless-join decomposition into a collection of BCNF relation schemas
– But, there may be no dependency-preserving decomposition into a collection of BCNF relation schemas
• Converting relations to 3NF
– There is always a dependency-preserving, lossless-join decomposition into a collection of 3NF relation schemas
Decomposition into BCNF
• Consider relation R with FDs F
• If X → Y violates BCNF, decompose R into R - Y and XY
– Y is a single attribute and not in X
• Repeated application of this idea is guaranteed to terminate and gives us a collection of relations that are in BCNF and form a lossless-join decomposition
• In general, several dependencies may cause violation of BCNF
– The order in which we “deal with” them could lead to very different sets of relations!
BCNF decomposition
• Given a relation R and FDs F for R
• Compute keys for R (using FDs)
• Repeat until all relations are in BCNF
– Pick any R’ with A → B that violates BCNF
– Decompose R’ into R1(A,B) and R2(A, rest)
– Compute FDs for R1 and R2
– Compute keys for R1 and R2
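The loop above can be sketched as a short recursive function. This is one possible variant, not the slides' exact procedure: instead of projecting FDs explicitly, it looks for a violating left-hand side X by computing X+ restricted to the current relation, and it splits off all of X+ − X at once rather than a single attribute. The search over subsets is exponential, so it is only meant for small textbook schemas. All names are illustrative and attributes are single characters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) such that X -> Y violates BCNF on rel, or None."""
    for size in range(1, len(rel)):
        for lhs in combinations(sorted(rel), size):
            x = set(lhs)
            determined = closure(x, fds) & rel
            # Violation: X determines something new but is not a superkey of rel.
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(rel, fds):
    """Split rel into XY and rel - Y until no relation has a violation."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    return bcnf_decompose(x | y, fds) + bcnf_decompose(rel - y, fds)


F = [("S", "SNLRWH"), ("R", "W")]
parts = bcnf_decompose("SNLRWH", F)
print(sorted("".join(sorted(p)) for p in parts))  # ['HLNRS', 'RW']
```

For Hourly_Emps this reproduces the SNLRH and RW decomposition from the earlier slides (attribute order inside each name is alphabetical).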
Example: Decomposition into BCNF
• R = ABCDEFGH with FDs
– ABH → C, A → DE, BGH → F
– F → ADH, BH → GE
• Is R in BCNF?
• Which FD violates the BCNF ?
– ABH → C ?
• No, since ABH is a superkey
– A → DE violates BCNF
• Since attribute closure of A is ADE and therefore A is not a superkey
• Decompose R = ABCDEFGH into R1 = ADE and R2 = ABCFGH
Example: Decomposition into BCNF
• R = ABCDEFGH with FDs
– ABH → C, A → DE, BGH → F
– F → ADH, BH → GE
• R1 = ADE, F1 = {A → DE}
• R2 = ABCFGH, F2 = {ABH → C, BGH → F, F → AH, BH → G}
– New FDs are obtained by projecting the original FDs on the attributes in the new relations
– For example: BH → GE is decomposed into {BH → G, BH → E}; BH → E is not included in F1 or F2, BH → G is included in F2
– Is the decomposition of R into R1 and R2 dependency preserving?
• R1 is in BCNF, but we need to apply the algorithm on R2 since it is not in BCNF
BCNF and Dependency Preservation
• In general, there may not be a dependency preserving decomposition into BCNF
– e.g., CSZ, CS → Z, Z → C
– Can’t decompose while preserving 1st FD; not in BCNF
• Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP → C, SD → P and J → S)
– However, it is a lossless join decomposition
– In this case, adding JPC to the collection of relations gives us a dependency preserving decomposition
• JPC tuples stored only for checking FD! (Redundancy!)
Decomposition into 3NF
• The algorithm for lossless-join decomposition into BCNF can be used to obtain a lossless join decomposition into 3NF (typically, can stop earlier)
• To ensure dependency preservation, one idea:
– If X → Y is not preserved, add relation XY
– Problem is that XY may violate 3NF!
• e.g., consider the addition of CJP to “preserve” JP → C. What if we also have J → C ?
• Refinement: Instead of the given set of FDs F, use a minimal cover for F
Minimal Cover for a Set of FDs
• A minimal cover for a set of FDs F is a set of FDs G s.t.:
– The closure of F (F+) = The closure of G (G+)
– Right hand side of each FD in G is a single attribute
– If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes
• Intuitively, every FD in G is “needed”, and “as small as possible” in order to get the same closure as F
– each attribute on the left side is necessary
– the right side is a single attribute
– Every dependency in it is required for the closure to be equal to F+
• e.g., A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover:
– A → B, ACD → E, EF → G and EF → H
Minimal Cover for a Set of FDs
• A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover by:
– ACDF → EG => ACDF → E and ACDF → G
– EF → GH => EF → G and EF → H
– ACDF → G is implied by A → B, ABCD → E, EF → GH
• A → B => A → AB => AC → ABC => ACD → ABCD => ACD → E => ACDF → EF => ACDF → GH => ACDF → G (can be deleted)
• A → B => A → AB => AC → ABC => ACD → ABCD => ACD → E => ACDF → EF => ACDF → E (can be deleted)
• ABCD → E can be replaced by ACD → E since A → B holds
– A → B, ACD → E, EF → G and EF → H
Obtaining the Minimal Cover
• Algorithm Steps:
– Put the FDs in standard form
• single attribute on the right hand side
– Minimize the left hand side of each FD
• Check if an attribute can be deleted while preserving equivalence to closure of F
– Delete the redundant FDs
• It is necessary to minimize the left sides of FDs before checking for redundant FDs
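The three steps can be sketched as follows. This is an illustrative implementation under simple assumptions (single-character attribute names, FDs given as pairs of strings, function names made up here). A set of FDs can have several minimal covers, and which one is returned depends on the order in which attributes and FDs are tried.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, a single attribute on each right-hand side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # Step 2: minimize each left-hand side, one attribute at a time.
    for i, (lhs, a) in enumerate(cover):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, a)

    # Step 3: delete FDs that are implied by the remaining ones.
    cover = list(dict.fromkeys(cover))  # drop exact duplicates, keep order
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    return {("".join(sorted(lhs)), a) for lhs, a in cover}


F = [("ABCD", "E"), ("E", "D"), ("A", "B"), ("AC", "D")]
print(sorted(show(minimal_cover(F))))
# [('A', 'B'), ('AC', 'E'), ('E', 'D')]
```

Running it on A → B, ABCD → E, EF → GH, ACDF → EG gives A → B, ACD → E, EF → G, EF → H, matching the cover shown earlier.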
Obtaining the Minimal Cover
• Example: F = {ABCD → E, E → D, A → B, AC → D}
– Notice that the right hand sides have a single attribute
• if not, we would have to decompose the right hand sides first
• Can we remove B from the left hand side of ABCD → E?
– Check if ACD → E is implied by F
• In order to do this, find the attribute closure of ACD wrt F
– If B (and hence E) is in the attribute closure, then ACD → E is implied by F, and therefore we can replace ABCD → E with ACD → E
• note that given ACD → E, we have ABCD → E
– A → B => A → AB => ACD → ABCD => ACD → E
• Can we remove D from ACD → E?
– Check if AC → E is implied by F’
• F’ is obtained by replacing ABCD → E in F with ACD → E
• F’’ = {AC → E, E → D, A → B, AC → D}
– Can we drop any FDs in F’’?
– Could we drop any FDs in F before minimizing the left hand sides?
Dependency Preserving Decomposition into 3rdNF
• Let R be the relation to be decomposed into 3NF and F be a set of FDs that is a minimal cover
• Algorithm Steps
– Perform lossless-join decomposition of R into R1, R2, …, Rn
– Project the FDs in F into F1, F2, …, Fn
• that correspond to R1, R2, …, Rn
– Identify the set of FDs that are not preserved
• i.e., that are not in the closure of the union of F1, F2, …, Fn
– For each FD X → A that is not preserved, create a relation schema XA and add it to the decomposition
Example
• Consider the relation R, Contracts(CSJDPQV) with FDs (C is a key):
– JP → C , SD → P , J → S
• Decomposed into R1(SDP), R2(CSJDQV)
– R1 in BCNF, R2 not in 3NF
• Decompose R2(CSJDQV) into R3(JS), R4(CJDQV)
– Both R3 and R4 in 3NF (in BCNF also)
• Decomposition of R into R1, R3, R4 is lossless-join
• But not dependency-preserving (JP → C is not preserved)
• Add R5(CJP) to the decomposition
• Resulting decomposition is CSJDPQV into SDP, JS, CJDQV, CJP
Synthesis Approach
• Consider the relation R, Contracts(CSJDPQV) with FDs (C is a key):
– JP → C , SD → P , J → S
• Find minimal cover
– C → CSJDPQV into C → S, C → J, C → D, C → P, C → Q, C → V
• C → S is implied by C → J and J → S
• C → P is implied by C → S, C → D and SD → P
– Final set is (C → J, C → D, C → Q, C → V, JP → C, SD → P, J → S)
• So add corresponding schemas for all the FDs in minimal cover
– CJ, CD, CQ, CV, CJP, SDP, JS
• Improve this set by combining relations for which C is the key into CDJPQV
– CDJPQV, SDP, JS
Minimal Cover
• F is a minimal set of FDs if each X → Y satisfies
– |Y| = 1
– Left-reduced: X can’t be replaced by a subset
– Non-redundant: X → Y can’t be removed
• R(ABCDEGIJ) and F = {A → BE, AB → DE, AC → G} are given. Find a minimal cover of F:
– Single-attribute right hand sides: A → B, A → E, AB → D, AB → E, AC → G
– Already left-reduced: A → B, A → E
– A+ = {A,B,E,D}, so A → D is implied by F, and also from A → D we get AB → D
• instead of using AB → D, we can use A → D since each can be derived from the other
– Likewise, AB → E reduces to A → E, which is already present
– AC → G is already left-reduced, since neither A → G nor C → G can be deduced from F
• The minimal cover: F’ = {A → B, A → E, A → D, AC → G}
Dependency Preserving Lossless-join 3NF Decomposition Algorithm
• Algorithm steps:
– Find minimal cover
– Put FDs agreeing on the LHS in the same schema
– Have extra schema for a key, if none of the above schemas contain a key
• Example: R(A,B,C,D,E,G,I,J)
– MinCover: F’ = {A → B, A → E, A → D, AC → G}
– R1(ABDE), R2(ACG)
– R3(ACIJ)
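The last two steps of the algorithm can be sketched as below, starting from a minimal cover that is assumed to be already computed. The function names are made up, attributes are single characters, and the sketch does not remove a schema that happens to be contained in another one, which a fuller implementation might do.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(schema, fds):
    """Shrink the full attribute set to one candidate key."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) >= set(schema):
            key.discard(attr)  # attr can be derived from the rest
    return key


def synthesize_3nf(schema, cover):
    """Group FDs by left-hand side; add a key schema if no schema holds a key."""
    schemas = {}
    for lhs, rhs in cover:
        schemas.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    parts = list(schemas.values())
    if not any(closure(p, cover) >= set(schema) for p in parts):
        parts.append(find_key(schema, cover))
    return parts


cover = [("A", "B"), ("A", "E"), ("A", "D"), ("AC", "G")]
parts = synthesize_3nf("ABCDEGIJ", cover)
print(["".join(sorted(p)) for p in parts])  # ['ABDE', 'ACG', 'ACIJ']
```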
BCNF vs 3NF
• BCNF: For every functional dependency X → Y in a set F of functional dependencies over relation R, either:
– Y is a subset of X, or
– X is a superkey of R
• 3NF: For every functional dependency X → Y in a set F of functional dependencies over relation R, either:
– Y is a subset of X, or
– X is a superkey of R, or
– Y is a subset of K for some key K of R
• no proper subset of a key is a key
Overlapping Candidate Keys
• BCNF acts differently from 3NF only when there are multiple overlapping candidate keys.
– The functional dependency X → Y is trivially true if Y is a subset of X
– Any table that has only one candidate key and is in 3NF is already in BCNF
• because there is no column (either key or non-key) that is functionally dependent on anything besides that key
• Assume that each pizza must have exactly one of each topping type
– (Pizza, Topping Type) is a candidate key
• Also assume that a given topping cannot belong to different types simultaneously
– (Pizza, Topping) must be unique and therefore is also a candidate key
• We have two overlapping candidate keys
Determine Keys of a Relation from FDs
• Relation R(ABCD)
• F = {AB → C, C → B, C → D}
– Left = {A}, Middle = {B,C}, Right = {D}
• Find attribute closure sets
– A+ = {A}
– (AB)+ = {ABCD} => Key
– (AC)+ = {ABCD} => Key
• Prime attributes = {A, B, C}
• Non-prime attributes = {D}
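The key search can be sketched by brute force: try attribute sets in order of increasing size and keep those whose closure covers the whole schema and that contain no smaller key. The Left/Middle/Right classification above is a shortcut that prunes this search; the sketch below does not use it. Names are illustrative and attributes are single characters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


F = [("AB", "C"), ("C", "B"), ("C", "D")]
keys = candidate_keys("ABCD", F)
print(["".join(sorted(k)) for k in keys])  # ['AB', 'AC']
print(sorted(set().union(*keys)))          # prime attributes: ['A', 'B', 'C']
```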