67
7-1 Unit 4 Relational Database Design 英英英Chap 7 “Relational Database Design” 英英英 6 英 “英英英英英英英英”

7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

Embed Size (px)

Citation preview

Page 1: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-1

Unit 4

Relational Database Design

英文版: Chap 7 “Relational Database Design”中文版:第 6 章 “關聯式資料庫設計”

Page 2: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-2

Contents

4.1 Introduction

4.2 Functional Dependency

4.3 First, Second, and Third Normal Forms (1NF, 2NF, 3NF)

4.4 Boyce/Codd Normal Form (BCNF)

4.5 Fourth Normal Form (4NF)

4.6 Fifth Normal Form (5NF)

4.7 The Entity-Relationship Model

Page 3: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-3

4.1 Introduction Logical Database Design vs. Physical Database Design

Problem of Normalization

Normal Forms

Page 4: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-4

請查出本公司所採購,價格最貴的零件之相關資料,任取前二名。

(SP, PN, city, price, qty, weight, color)

select S.SN, P.PN, city, price, qty, weight, color from S, P, SP where S.SN=SP.SN and P.PN=SP.PN order by price desc limit 20;

Page 5: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-5

Logical Database Design

Logical database design vs. Physical database design

Logical Database Design• Normalization

• Semantic Modeling, eg. E-R model

Problem of Normalization• Given some body of data to be represented in a database, h

ow to decide the suitable logical structure they should have?

• what relations should exist?

• what attributes should they have?

正確性等特質 v.s. 運作效率

分割表格,考量內外要素

Page 6: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-6

Problem of Normalization

<e.g.> S1, Smith, 20, London, P1, Nut, Red, 12, 300S1, Smith, 20, London, P2, Bolt, Green, 17, 200 . .

S4, Clark, 20, London, P5, Cam, Blue, 12, 400

NormalizationSS# SNAME STATUS CITY

s1 . . London

. . . .

P SPP# ... ... ...

. . . .

. . . .

S# P# QTY

. . .

. . .

S'S# SNAME STATUS

S1 Smith . S2 . . . . .

P SP'P# ... ... ...

. . . .

. . . .

S# CITY P# QTY

S1 London P1 300

S1 London P2 200

. . . .

( 異常 )Redundancy Update Anomalies!

orE.g. 2: (‘A-Rod’,’R’,'R'…)資料隨著球員表,不必記在每場比賽

Dimension 最大的集合,保險起見,資料可能不會全部用到,但總有用到之時。

Page 7: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-7

Normal Forms A relation is said to be in a particular normal form if it satisfies

a certain set of constraints.

<e.g.> 1NF: A relation is in First Normal Form (1NF) iff it contains only atomic values.

universe of relations (normalized and un-normalized)

1NF relations (normalized relations)

2NF relations

3NF relations

4NF relationsBCNF relations

5NF relations

Fig. 4.1: Normal Forms

Page 8: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-8

4.2 Functional Dependency

Functional Dependency (FD)

Fully Functional Dependency (FFD)

Page 9: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-9

Functional Dependency

Functional Dependency

• Def: Given a relation R, R.Y is functionally dependent on R.X iff each X-value has associated with it precisely one Y-value (at any time).

• Note: X, Y may be the composite attributes.

Notation:R.X R.Y

read as "R.X functionally determines R.Y"

.

.

.

.

.

R X Y

E.g. : (ID,Name,…)身份證字號決定姓名也就決定;但也許 (ID, time) 才是這個表格的 Primary Key i.e. 同樣的 ID 會重複出現,而計入 (ID, time) 就不會重複出現

回顧: Primary Key, Candidate Key 有 F.D. 性質但就今天的課程而言,這是結果論。 (Key FD)

Page 10: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-10

Functional Dependency (cont.)

<e.g.1> S S.S# S.SNAME S.S# S.STATUS S.S# S.CITY S.STATUS S.CITY

FD Diagram:

S#

SNAME CITY

STATUS

S# SNAME STATUS CITYS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens

S

Note: Assume STATUS is some factor of Supplier and no any relationship with CITY.

Page 11: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-11

Functional Dependency (cont.)

<e.g.2> P

<e.g.3> SP

P#

PNAME

CITY_P

COLOR

WEIGHT

S#

P#

QTY

If X is a candidate key of R, then all attributes Y

of R are functionally dependent on X. (i.e. X Y)

P# PNAME COLOR WEIGHT CITY_PP1 Nut Red 12 LondonP2 Bolt Green 17 ParisP3 Screw Blue 17 RomeP4 Screw Red 14 LondonP5 Cam Blue 12 ParisP6 Cog Red 19 London

P

S# P# QTY S1 P1 300

S1 P2 200 S1 P3 400 S1 P4 200S1 P5 100S1 P6 100S2 P1 300S2 P2 400S3 P2 200S4 P2 200S4 P4 300S4 P5 400

SP

Page 12: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-12

Fully Functional Dependency (FFD)

Def: Y is fully functionally dependent on X iff

• (1) Y is FD on X

• (2) Y is not FD on any proper subset of X.

<e.g.> SP' (S#, CITY, P#, QTY)

S#

P#QTY

FFD

S# CITY P# QTY

S1 London P1 300S1 London P2 200

… …. … ...

S#

P#CITY

FD

not FFDS# CITY

FD S# CITY… …..

S# city

S1 London

SP'

Naturally FFD,Since S# is not aComposite attribute.

Page 13: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-13

Fully Functional Dependency (cont.)

<Note> 1. Normally, we take FD to mean FFD.

2. FD is a semantic notion.

<e.g.> S# CITY

Means: each supplier is located in precisely one city.

3. FD is a special kind of integrity constraint.

CREATE INTEGRITY RULE SCFD

CHECK FORALL SX FORALL SY

(IF SX.S# = SY.S# THEN SX.CITY = SY.CITY);

4. FDs considered here applied within a single relation.

<e.g.> SP.S# S.S# is not considered!FD (FFD) 只適用於討論單一表格內部屬性 (attribute) 間的性質

但是 2NF 之後,這樣的指定變成多餘 ( 復習下頁 , SX=S’, SY=SP’)

Page 14: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-14

Problem of Normalization

<e.g.> S1, Smith, 20, London, P1, Nut, Red, 12, 300S1, Smith, 20, London, P2, Bolt, Green, 17, 200 . .

S4, Clark, 20, London, P5, Cam, Blue, 12, 400

NormalizationSS# SNAME STATUS CITY

s1 . . London

. . . .

P SPP# ... ... ...

. . . .

. . . .

S# P# QTY

. . .

. . .

S'S# SNAME STATUS

S1 Smith . S2 . . . . .

P SP'P# ... ... ...

. . . .

. . . .

S# CITY P# QTY

S1 London P1 300

S1 London P2 200

. . . .

( 異常 )Redundancy Update Anomalies!

orE.g. 2: (‘A-Rod’,’R’,'R'…)資料隨著球員表,不必記在每場比賽

Dimension 最大的集合,保險起見,資料可能不會全部用到,但總有用到之時。

Recall:

Page 15: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-15

4.3 First, Second, and Third Normal Forms (1NF, 2NF, 3NF)

Page 16: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-16

Normal Forms: 1NF Def: A relation is in 1NF iff all underlying simple domains contain atomic

values only.

S# STATUS CITY (P#, QTY)

S1 20 London {(P1, 300), (P2, 200), ..., (P6, 100)} S2 10 Paris {(P1, 300), (P2, 400)} S3 10 Paris {(P2, 200)} S4 20 London {(P2, 200), (P4, 300), (P5, 400)}

S# STATUS CITY P# QTY

S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400

Key:(S#,P#), Normalized 1NF

FIRST

Suppose 1. CITY is the main office of the supplier.

2. STATUS is some factor of CITY

fact

Page 17: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-17

1NF Problem: Update Anomalies!<1> Update If suppler S1 moves from London to Paris, then 6 tuples must be updated!

<2> Insertion Cannot insert a supplier information if it doesn't supply any part, because that will cause a null key value.

<3> Deletion

Delete the information that "S3 supplies P2", then the fact "S3 is located in Paris" is also deleted.

S# STATUS CITY P# QTY

. . . . . S3 20 Paris P2 300 . . . . . . . . . . S5 30 Athens NULL NULL

FIRST

S# STATUS CITY P# QTY

S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400

Key:(S#,P#), Normalized 1NF

FIRST

1NF 還必須改進之處Insert(S1,20,”Seattle”,P7,60) : inconsistency

不一起改會有 inconsistency 的問題,如人 - 住址

Page 18: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-18

1NF Problem: Update Anomalies! (cont.)

<e.g.> Suppose 1. CITY is the main office of the supplier.

2. STATUS is some factor of CITY

Primary key of FIRST: (S#, P#)

FD diagram of FIRST:

S#

P# CITY

STATUS

QTYFFD

x FFD

x FFD

FFD

FFD

FD:1. S# STATUS2. S# CITY3. CITY STATUS

4. (S#, P#) QTY

FFD

FFD

primary key (S#, P#) STATUS

primary key (S#, P#) CITY

S# STATUS CITY P# QTY

S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400

Key:(S#,P#), Normalized 1NF

FIRST

此處 “ CITYSTATUS” 必須到講解 2NF/3NF 才會用到,但因第一次出現 FD diagram 故全畫出來

Page 19: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-19

Normal Form: 2NF

Def: A relation R is in 2NF iff

(1) R is in 1NF (i.e. atomic ) (2) Non-key attributes are FFD on primary key. (e.g. QTY, STATUS, CITY in FIRST)

<e.g.> FIRST is in 1NF, but not in 2NF

(S#, P#) STATUS, and

(S#, P#) CITYFFD

FFD

Decompose FIRST into:

<1> SECOND (S#, STATUS, CITY):

primary key: S#

S#CITY

STATUS FD:

1. S# STATUS

2. S# CITY

3. CITY STATUS

S#

P# CITY

STATUS

QTYFFD

x FFD

x FFD

FFD

FFD

<2> SP (S#, P#, QTY): Primary key: (S#, p#)

S#

P#QTY

FD: 4. (S#, P#) QTY

其他屬性必須 FFD 於 Primary Key ,不可有只是 FD 而非 FFD, 以免暗藏關聯 (S#CITY)

Primary Key 只存在一次,必與其他屬性 FD 其他課本可能只寫 FD但他們定義的 FD 就是 FFD

Page 20: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-20

Normal Form: 2NF (cont.)

S# STATUS CITY

S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens

SECOND (in 2NF)

S# P# QTY) S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S2 P1 300 S2 P2 400 S3 P2 200 S4 P4 300 S4 P5 400

SP (in 2NF)

S# STATUS CITY P# QTY

S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400

FIRST

<1> Update: S1 moves from London to Paris

<2> Insertion: (S5 30 Athens)

<3> Deletion Delete "S3 supplies P2 200", then the fact

"S3 is located in Paris" is NOT deleted.

2NF 的效果

Page 21: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-21

Normal Form: 2NF (cont.)

A relation in 1NF can always be reduced to an equivalent collection of 2NF relations.

The reduction process from 1NF to 2NF is non-loss decomposition.

FIRST(S#, STATUS, SCITY, P#, QTY)

projections natural joins

SECOND(S#, STATUS, CITY), SP(S#,P#,QTY)

The collection of 2NF relations may contain “more” information than the equivalent

1NF relation.

<e.g.> (S5, 30, Athens)

2nd SP

1st

Page 22: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-22

Problem: Update Anomalies in SECOND!

• Update Anomalies in SECOND<1> UPDATE: if the status of London is changed from 20

to 60, then two tuples must be updated

<2> DELETE: delete supplier S5, then the fact "the

status of Athens is 30" is also deleted!

<3>INSERT: cannot insert the fact "the status of Rome

is 50"!

• Why: S.S# S.STATUS S.S# S. CITY S.CITY S.STATUS cause a transitive dependency

S# STATUS CITY

S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens

SECOND (in 2NF)

S#CITY

STATUS

FD:

1. S# STATUS

2. S# CITY

3. CITY STATUS

2NF 還是有必須改進之處

Page 23: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-23

Normal Forms: 3NF

Def : A relation R is in 3NF iff

(1) R is in 2NF(2) Every non-key attribute is non-transitively dependent on the primary key.

e.g. STATUS is transitively on S#

(i.e., non-key attributes are mutually independent)

<e.g.> SP is in 3NF, but SECOND is not!

S#

P#

QTY

SP FD diagramS# STATUS CITY

S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens

SECOND (not 3NF)

S#CITY

STATUS

SECOND FD diagram

其他屬性不可以輾轉關聯於 Primary Key

但因每個屬性都已關聯於 Primary Key ,故而意即其他屬性之間不可私訂關聯

Page 24: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-24

Normal Forms: 3NF (cont.)

Decompose SECOND into:<1> SC(S#, CITY)

primary key : S# FD diagram:

<2> CS(CITY, STATUS): primary key: CITY FD diagram:

S# CITY

CITY STATUS

S# CITY S1 London S2 Paris S3 Paris S4 London S5 Athens

SC (in 3NF) CITY STATUS Athens 30 London 20 Paris 10 Rome 50

CS (in 3NF)

S# STATUS CITY

S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens

SECOND

S#

STATUS

CITY

Page 25: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-25

Normal Forms: 3NF (cont.)

Note:

(1) Any 2NF diagram can always be reduced to a collection of 3NF relations.

(2) The reduction process from 2NF to 3NF is non-loss decomposition.

(3) The collection of 3NF relations may contain "more information" than the equivalent 2NF relation.

Page 26: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-26

Good and Bad Decomposition

ConsiderS#

CITY

STATUStransitive FD

S# CITY

CITY STATUS

Good !(Rome, 50) can be inserted.

①Decomposition A:

SC:

CS:

S# CITY

S# STATUS

Bad ! (Rome, 50) can not be inserted unless there is a supplier located at Rome.

② Decomposition B:

SC:

CS:

③ Decomposition C:

S# -> status

city -> status

‘Good’ or ‘Bad’ is dependent on item’s semantic meaning!!

S#

STATUS

CITYS#

STATUS

CITY

Suppose 1. CITY is the main office of the supplier.

2. STATUS is some factor of CITY

Decomposition B:S#CITY, S#STATUS 那就仍合在一起就好了當作 CITYSTATUS 不存在

Page 27: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-27

Good and Bad Decomposition (cont.)

Independent Projection: (by Rissanen '77)

Def: Projections R1 and R2 of R are independent iff(1) Any FD in R can be reduced from those in R1 and R2.(2) The common attribute of R1 and R2 forms a candidate key for at least one of R1 and R2.

<e.g.>• Decomposition A: SECOND(S#, STATUS, CITY) (R)

(1) FD in SECOND (R): S# CITY CITY STATUS S# STATUS FD in SC and CS R1: S# CITY R2: CITY STATUS (2) Common attribute of SC and CS is CITY, which is the primary key of CS.SC and CS are independent Decomposition A is good!

SC (S#, CITY)CS (CITY, STATUS)

ProjectionR1

RR2

(R1)(R2){

S# STATUS

(1) 原有 FD 關係延續到分割後的表格(2) 分割後的表格之共同屬性,至少為 其中一個表格的 candidate key

Page 28: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-28

Good and Bad Decomposition (cont.)

Decomposition B: SECOND(S#, STATUS, CITY)

(1) FD in SC and SS: S# CITY S# STATUS

SC and SS are dependent decomposition B is bad! though common attr. S# is primary key of both SC, SS.

Decomposition C: SECOND(S#, STATUS, CITY) (1) FD in SS and CS

S# STATUS CITY STATUS

(2) Common attribute STATUS is not only a candidate key of either SS or CS.

Decomposition C is not only a bad decomposition, but also an invalid decomposition, since it is not non-loss.

SC (S#, CITY)SS (S#, STATUS){

CITY STATUS

SS (S#, STATUS)CS (CITY, STATUS){

S# CITY

Loss 非 P-key 關連的

Loss 由 P-key 關連的

Page 29: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-29

Atomic Relation Def: Atomic Relation -- A relation that cannot be decomposed

into independent components.<e.g.> SP (S#, P#, QTY) is an atomic relation, while

S (S#, SNAME, STATUS, CITY) is not.

Page 30: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-30

Break

Page 31: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-31

4.4 Boyce/Codd Normal Form (BCNF)

Problems of 3NF:Do not deal with the cases:<1> A relation has multiple candidate keys,<2> Those candidate keys were composite,<3> The candidate keys are overlapped.

Page 32: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-32

Example– S: student

– J: subject

– T: teacher

– Meaning of a tuple: student S is taught subject J by teacher T.– Suppose

– For each subject, each student of that subject is taught by only one teacher.

i.e. (S, J) T

– Each teacher teaches only one subject.

i.e. T J

– Candidate keys

(S, J) and (S, T)

– FD diagram

SJT(S, J, T)

S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown

S

J

T

Page 33: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-33

BCNF

Def: A relation R is in BCNF iff every determinant is a candidate key.

• A B: A determines B, and A is a determinant.

• <e.g.1> [only one candidate key]

SP (S#, P#, QTY): in 3NF & BCNF

SC (S#, CITY): in 3NF & BCNF

CS (CITY, STATUS): in 3NF & BCNF

S# CITY

CITY STATUS

S#

P#QTY

SECOND(S#, STATUS, CITY): not in 3NF &

not in BCNF

S#

CITY

STATUS

2NF1NF

3NFBCNF

in 3NF, not in BCNF e.g.3, e.g.4 (P.7-33)

Page 34: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-34

BCNF (cont.)

<e.g.2> [two disjoint (nonoverlapping) candidate keys]

S(S#, SNAME, STATUS, CITY)

Assume :

(1) CITY, STATUS are independent

(2) SNAME is a candidate key

S#, SNAME (determinants) are candidate keys.

S is in BCNF (also in 3NF).

S#

SNAME CITY

STATUS

3NF

BCNF

• 3NF but not BCNF

‧e.g.2

Page 35: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-35

BCNF (cont.)

<e.g.3> [overlapping candidate keys -1]

SSP (S#, SNAME, P#, QTY)

key in SSP: (S#, P#), (SNAME, P#)

FD in SSP

1. S# SNAME

2. SNAME S#

3. {S#, P#} QTY

4. {SNAME, P#} QTY

P#

S# SNAME

QTY

SSP

S# SName P# QTY

- in 3NF nonkey attribute is FFD on primary key and

mutually independent. e.g. QTY only

- not in BCNF S# is a determinant but not a

candidate key. S# SNAME

Decompose:

SS (S#, SNAME): in BCNF SP (S#, P#, QTY): in BCNF

S# SNAMES#

P#QTY

Page 36: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-36

BCNF (cont.)

<e.g.4> [overlapping candidate keys-2]

SJT(S, J, T)– S: student

– J: subject

– T: teacher

– meaning of a tuple: student S is taught subject J by teacher T.

– Suppose

– For each subject, each student of that subject is taught by only one teacher.

i.e. (S, J) T

– Each teacher teaches only one subject.

i.e. T J

S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown

– Candidate keys

(S, J) and (S, T)

– FD diagram

S

J

T

Page 37: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-37

BCNF (cont.)

– In 3NF, no nonkey attribute.

– not in BCNF, T J but T is not a candidate key

– update anomalies occur!

e.g. (delete "Jones is studying Physics" the fact

"Brown teaches Physics" is also deleted!)

Decompose 1: Decompose 2:

ST (S, T) TJ(T, J)

S T Smith Prof. White Smith Prof. Green Jones Prof. White Jones Prof. Brown

T J

Prof. White Math. Prof. Green Physics Prof. Brown physics

S T

in BCNF

T J

in BCNF

S J T J

– Is this decomposition Good or Bad?

– In Rissanen's sense, ST(S, T) and TJ(T, J) are not

independent!

the FD: (S,T) T cannot be deduced from FD: T J

The two objectives:

<1> decomposing a relation into BCNF, and

<2> decomposing it into independent components may

be in conflict!

S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown

Page 38: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-38

BCNF (cont.)

<e.g.5> [overlapping candidate keys-3]

EXAM(S, J, P); S: student, J: subject, P: position.– meaning of a tuple: student S was examined in subject J and

achieved position P in the class.

– suppose no two students obtained the same position in the same subject.

i.e. (S, J) P and (J, P) S

– FD diagram:

– candidate keys: (S,J) and (J, P), overlap key: J.

– in BCNF !

J

P

S

S J P

A DBMS 5

B DBMS 8

A Network 1

EXAM

Page 39: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-39

Why Normal Form?

Avoid update anomalies

Consider the SSP(S#, SNAME, P#, QTY)

Common sense will tell us SS(S#, SNAME) & SP(S#, P#, QTY) is a better design.

The concepts of FD, 1NF, 2NF, 3NF and BCNF to formalize common sense.

Mechanization is possible!• i.e., we can write a program to do the work of normalization for us!

Page 40: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-40

4.5 Fourth Normal Form (4NF)

Page 41: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-41

Un-Normalized Relation

COURSE TEACHER TEXT

Physics {Prof. Green, {Basic Mechanics,

Prof. Brown} Principle of Optics}

Math. {Prof. Green} {Basic Mechanics,

Vector Analysis,

Trigonometry}

CTX

Math

Text

1

2

3

meaning of a record: the specified course can be taught by any of the specified teachers and uses all of the specified texts as references.

Assume:

- For a given course, there exists any number of teachers and any number of texts.

- Teachers and texts are independent.

- A given teacher or a given text can be associated with any number of courses.

Page 42: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-42

Un-normalized Relation (cont.)

Note: No FD exists in this relation!

Normalized

COURSE TEACHER TEXT

Physics(c) Prof. Green(t1) Basic Mechanics (x1)

physics(c) Prof. Green(t1) Principle of Optics (x2)

physics(c) Prof. Brown(t2) Basic Mechanics (x1)

physics(c) prof. Brown(t2) Principles of Optics(x2)

Math prof. Green Basic Mechanics

Math prof. Green Vector Analysis

Math prof. Green Trigonometry

C T X

Function

.

.

.

.

.

Page 43: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-43

Un-normalized Relation (cont.)

Meaning of a tuple: course C can be taught by teacher T and uses text X as a reference.

• primary key: (COURSE, TEACHER, TEXT)

Check:

• in 1NF (simple domain contains atomic value only)

• in 2NF (Nonkey attributes are FFD on primary key,

no key attributes)

• in 3NF (Nonkey attributes are mutually independent.)

• in BCNF (Every determinant is a candidate key)

COURSE TEACHER TEXT

Physics(c) Prof. Green(t1) Basic Mechanics (x1)

physics(c) Prof. Green(t1) Principle of Optics (x2)

physics(c) Prof. Brown(t2) Basic Mechanics (x1)

physics(c) prof. Brown(t2) Principles of Optics(x2)

Math prof. Green Basic Mechanics

Math prof. Green Vector Analysis

Math prof. Green Trigonometry

Page 44: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-44

Un-normalized Relation (cont.)

Problem: a good deal of redundancy!• property:

if (c, t1, x1), (c, t2, x2) both appearthen (c, t1, x2) , (c, t2, x1) both appear also!

• reason: No FD, but has MVD!

• the decomposition cannot be made on the basis of FD.

COURSE TEACHERPhysics Prof. GreenPhysics Prof. BrownMath Prof. Green

CT:

COURSE TEXTPhysics Basic MechanicsPhysics Principles of OpticsMath Basic MechanicsMath Vector AnalysisMath Trigonometry

CX:

Physics

Math

Not FD!

COURSE TEACHER TEXT

Physics(c) Prof. Green (t1) Basic Mechanics (x1)

physics(c) Prof. Green (t1) Principle of Optics (x2)

physics(c) Prof. Brown (t2) Basic Mechanics (x1)

physics(c) prof. Brown (t2) Principles of Optics (x2)

Math prof. Green Basic Mechanics

Math prof. Green Vector Analysis

Math prof. Green Trigonometryintuitively decomposed

Page 45: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-45

MVD ( Multi-Valued Dependencies) Def: Given R(A, B, C), the multivalued dependence (MVD)

R.A R.B holds in R iff the set of B-values matching a given (A-value, C-value)

pair is R, depend only on A-value, and is independent of C-value.

<e.g> COURSE TEACHER, COURSE TEXT

Thm: Given R(A, B, C), the MVDR.A R.B holds iff the MVD R.A R.C also holds.

• Notation: R.A R.B | R.C <e.g.> COURSE TEACHER | TEXT

<Note> 1. FD is a special case of MVD all FD's are also MVD's 2. MVDs (which are not also FD's) can exist only if the relation R has at least 3 attributes.

physics Basic MechanicsBrown

Green, ,

A B C

{ { { {

MVD

..

.

.

FD ‧ ‧

Page 46: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-46

Norma Forms: 4NF Problem of CTX: involves MVD's that are not also FD's.

Def: A relation R is in 4NF iff whenever there exists an MVD in R, say A R, then all attributes of R are also FD on A. i.e. R is in 4NF iff (i) R is in BCNF, (ii) all MVD's in R are in fact FD's. i.e. R is in 4NF iff (i) R is in BCNF, (ii) no MVD's in R.

<e.g.1> CTX (COURSE, TEACHER, TEXT)

COURSE TEACHER

COURSE TEXT

<e.g.2> S (S#, SNAME, STATUS, CITY)

S# STATUS

SNAME CITY

not in 4NF

no MVD which is not FDin 4NF

Page 47: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-47

Norma Forms: 4NF (cont.)

Thm: Relation R(A, B, C) can be no loss decomposed into R1(A, B) and R2(A, C) iff A B | C holds in R.

<e.g.> CTX (COURSE, TEACHER, TEXT)

COURSE TEACHER | TEXT

CT (COURSE, TEACHER)

CX (COURSE, TEXT)

no MVD in 4NFno MVD in 4NF

Page 48: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-48

4.6 Fifth Normal Form (5NF)

Page 49: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-49

A Surprise There exist relations that cannot be nonloss-decomposed into two projections,

but can be decomposed into three or more.

Def: n-decomposable (for some n > 2)

the relation can be nonloss-decomposed into n projections, but not into m projection for any m < n.

<e.g.> SPJ (S#, P#, J#); S: supplier, P: part, J: project.• Suppose in real world if (a) Smith supplies monkey wrenches, and (b) Monkey wrenches are used in Manhattan project, and (c) Smith supplies Manhattan project. then (d) Smith supplies Monkey wenches to Manhatan project.i.e. If (s1, p1, j2), (s2, p1, j1), (s1, p2, j1) appear in SPJ Then (s1, p1, j1) appears in SPJ also.– no MVD in 4NF

Page 50: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-50

A Surprise (cont.)

update problem of SPJ

• If (S2, P1, J1) is to be inserted then (S1, P1, J1) must also be inserted

• If (S1, P1, J1) is to be deleted, then one of the following must also be deleted (i) (S1, P1, J2): means S1 no longer supplies P1. (ii) (S1, P2, J1): means S1 no longer supplies J1. (iii) (S2, P1, J1): means J1 no longer needs P1.

SPJ: S# P# J#S1 P1 J2S1 P2 J1

SPJ: S# P# J#S1 P1 J2S1 P2 J1S2 P1 J1S1 P1 J1

Page 51: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-51

A Surprise (cont.)

SPJ is not 2-decomposable, but is 3-decomposable!

SPJ S# P# J#S1 P1 J2S1 P2 J1S2 P1 J1S1 P1 J1

S# P# S1 P1S1 P2S2 P1

P# J# P1 J2P2 J1P1 J1

J# S# J2 S1J1 S1J1 S2

SP PJ JS

S# P# J#S1 P1 J2S1 P1 J1S1 P2 J1S2 P1 J2S2 P1 J1

joinover P#

spurious join over (J#, S#)

ORIGINAL SPJ

Page 52: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-52

Join Dependency (JD) Def: A Relation R satisfies the join dependency (JD)

* (X, Y, ..., Z)

iff R is equal to the join of its projections on X, Y, ..., Z, where X, Y, ..., Z are subsets of the set of attributes of R.

<e.g.> SPJ satisfies the JD *(SP, PJ, JS) i.e. SPJ is 3-decomposable.

MVD is a special case of JD.Thm: R (A, B, C) can be nonloss-decomposed into R1(A, B) and R2(A, C) iff A B|C holds.Thm: R (A, B, C) satisfies the JD *(AB, AC) iff A B|C holds.

<Note>JD's are the most general form of dependency possible, so long as we concentrate on the dependencies that dealwith a relation being decomposed via projection and recomposed via join.

Page 53: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-53

Norma Forms: 5NF Def: A relation R is in 5NF (or PJ/NF) iff every JD in R is

a consequence of the candidate keys of R.

<e.g.1> Suppose S# and SNAME are candidate keys of

S (S#, SNAME, STATUS, CITY).

<i> * ((S#, SNAME, STATUS), (S#, CITY))

is a consequence of S# (a candidate key of S)

<ii> * ((S#, SNAME), (S#, STATUS), (SNAME, CITY))

is a consequence of the candidate keys S# and SNAME.

Page 54: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-54

Norma Forms: 5NF (cont.)

<e.g.2> Consider SPJ (S#, P#, J#), the candidate key of SPJ is

(S#, P#, J#).

However, there exists a JD

*((S#, P#), (P#, J#), (J#, S#))

which is not a consequence of (S#, P#, J#)

SPJ not in 5NF!

decomposed:

SP (S#, P#), PJ (P#, J#), JS (J#, S#):

( no JD in them all in 5NF!)

Note: 1. Discovering all the JD's is a nontrivial operation.2. Intuitive meaning of JD may not be obvious.3. A relation in 4NF but not in 5NF is a pathological case, and likely to be rare in

practice.

Page 55: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-55

Concluding Remarks

The technique of non-loss decomposition is an aid to logical database design .

The overall processes of Normalization:• step1: eliminate non-full dependencies.

• step2: eliminate any transitive FDs.

• step3: eliminate those FDs in which the determinant is not a candidate key.

• step4: eliminate any MVDs that are not FDs.

• step5: eliminate any JDs that are not a consequence of candidate keys.

General objective: • reduce redundancy, and then

• avoid certain update anomalies.

Normalization Guidelines are only guidelines. • Sometime there are good reasons for not normalizing all the way.

Page 56: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-56

Concluding Remarks (cont.)

<e.g.> NADDR (NAME, STREET, CITY, STATE, ZIP)

NAME

STREET

CITY

STATE

ZIP

not in 5NF(in which NF?)

decompose

NSZ (NAME, STREET, ZIP) ZCS (ZIP, CITY, STATE)

NAME STREET ZIP

ZIP CITY

STATE

Page 57: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-57

Concluding Remarks (cont.)

However, (1) STREET, CITY, STATE are almost required together.

(2) so ZIP do not change very often, such a decomposition seems unlikely to be worthwhile.

Not all redundancies can be eliminate by projection.

Research topic: decompose relations by other operator.

e.g. restriction.

Projection1 2 3 4 5 1 2 3 1 4 5

Page 58: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

7-58

4.7 The Entity/Relationship Model

Proposed by Peter Pin-Shan Chen. “The Entity- Relationship Model,” in ACM Trans. on Database System Vol. 1, No. 1, pp.9-36, 1976. E-R Model contains:

• Semantic Concepts and • E/R Diagram.

Page 59: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-59

Entity/Relationship Diagram

Entity :

Property:

Relationship:

RegularEntity

WeakEntity

Base Derived Multi

CompositeKey

Regular Weak

total partial

<e.g>

Department

Employee

Dependent

SupplierPart

Project

Department_Emp

SP

SPJ

Part_Struct

Proj_Work Proj_Mngr

Emp_Dependent

Emp# Salary

Ename

First LastMI

S# Sname

Status City

Qty

Qty

1

1

1

M

M

M

M

M

MM M

M M

M

Exp Imp

... ...

...P#

Qty

...

Page 60: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-60

Semantic Concepts

Entity: a thing which can be distinctly identified. e.g. S, P

• Weak Entities: an entity that is existence-dependent on some other entity. e.g. Dependent

• Regular Entities: an entity that is not weak.

Property: • Simple or Composite Property• key: unique value• Single or

Multi-valued: i.e. repeating data • Missing: unknown or not applicable.• Base or

Derived: e.g. "total quantity"

Page 61: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-61

Semantic Concepts (cont.)

Relationship: association among entities.• participant: the entity participates in a relationship.

e.g. Employee, Dependent• degree of relationship:

number of participants.• total participant:

every instance is used. e.g. Dependent

• partial participant: e.g. Employee

Note: An E/R relationship can be one-to-one, one-to- many, many-to-one, or many-to-many.

.

.

.

.

.

.

.

.

Employee Dependent

E1

E2

E3

E4

Dependent M

1 Employee

Page 62: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-62

Semantic Concepts (cont.)

Subtype: Any given entity can be a subtype of other entity.

e.g. An example type hierarchy:

• Programmer Employee• Application Programmer Programmer

Employee

Programmer

Application Programmer

System Programmer

Page 63: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-63

Transfer E-R Diagram to SQL Definition

ER-Diagram

SQL Definition

(1) Regular Entities Base Relations e.g. Department DEPT

Employee EMP Supplier S Part P Project J

attributes: properties.

primary key: key property.<e.g> CREATE TABLE EMP (EMP# ...... . . . primary key (EMP#)); . . .

(2) Properties attributes multivalued property: normalized first.

Page 64: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-64

Transfer E-R Diagram to SQL Definition (cont.)

(3) Many-to-Many Relationships Base Relations

<e.g.> Consider SUPP-PART SP

– foreign keys: key attributes of the participant entity.

– primary key:(1) combination of foreign keys, or

(2) introduce a new attribute.

– attributes:

primary key foreign keys properties

ID S# P# QTY

1

2

3

...

S

S#

SPP

QTY

P#

M M

<e.g.> CREATE TABLE SP

( S#, ..., P#, ..., Qty, ..., FOREIGN KEY (S#) REFERENCES S

NULL NOT ALLOWED DELETE OF S RESTRICTED UPDATE OF S.S# CASCADES FOREIGN KEY (P#) REFERENCES P ...... PRIMARY KEY (S#, P#) )

Page 65: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-65

Transfer E-R Diagram to SQL Definition (cont.)

(4) Many-to-One or One-to-Many or One-to-One Relationships → 不用給 Base-Relation, 但… .

e.g. Consider

DEPT_EMP(Department_Employee)

1° no any new relation is necessary.

2° Introduce a foreign key in "many" side relation (eg. EMP) that reference the "one" side relation (eg. DEPT).

CREATE TABLE EMP ( EMP#, ... DEPT#, ... ...... FOREIGN KEY (DEPT#) REFERENCES DEPT ...... );

Department (DEPT)

DEPT_EMP

EMP#

DEPT#

Employee (EMP)

M

1

DEPT#

Page 66: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-66

Transfer E-R Diagram to SQL Definition (cont.)

(5) Weak Entities Base Relation + Foreign Key

• foreign key: key property of its dependent entity

• primary key:

(1) combination of foreign key and its own primary key

(2) introduce a new attribute

e.g. CREATE TABLE DEPENDENT

( EMP#, …

name,

......

FOREIGN KEY (EMP#) REFERENCES EMP

NULL NOT ALLOWED

DELETE OF EMP CASCADES

UPDATE OF EMP.EMP# CASCADES

PRIMARY KEY ( EMP#, DEPENDENT_NAME ) );

Dependent

EMPEMP#

name

M

1

EMP#

e.g. EMP# e.g. name

Page 67: 7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-67

Transfer E-R Diagram to SQL Definition (cont.)

(6) Supertypes and SubtypesBase Relation + Foreign Key

<e.g.> CREATE TABLE EMP ( EMP#, ... DEPT#, ... SALARY, ... ...... PRIMARY KEY (EMP#), ...);

CREATE TABLE PGMR ( EMP#, ... LANG#, ... SALARY, ... ...... PRIMARY KEY (EMP#) FOREIGN KEY (EMP#) REFERENCES EMP ......);

EMP

PGMR