
Database Design

• Building a database structure that satisfies the requirements of a given application

Steps in database design

• Requirements formulation & analysis → output: requirements specification
• Conceptual design → output: conceptual schema (information structure: ERD or 3NF relations)
• Logical (or implementation) design → output: logical database structure (DBMS processible)
• Physical design → output: access methods, storage structures

Conceptual and logical (implementation) design make up logical database design; physical design makes up physical database design.

Requirements Formulation & Analysis

• Purpose: identify and describe the data that are required by the organization

• Inputs: user information requirements, data items (attributes), data associations, processing requirements (report frequencies, response-time requirements, etc.)

• Outputs: a set of requirements specifications for conceptual design

Conceptual Design

• Purpose: synthesize the various user views and information requirements into a global database design.

• The result is a conceptual schema expressed as an ERD or as normalized relations

Implementation Design

• Purpose: map the conceptual data model into logical schema that can be processed by a particular DBMS.

• The conceptual model is mapped into a hierarchical, network, or relational data model.

• Schemas and subschemas are developed using the DBMS's DDL (data definition language).

Physical Design

• Designing storage formats, selecting access methods (indexes), and providing for security, integrity, and backup & recovery.

• Output: internal schema

Steps in Requirements Analysis

• Identifying and documenting what data users require

• Studying the data flows and decision-making processes, in particular answering the following questions:
1. What are the user views?
2. Which data elements (attributes) are required in the views?
3. What are the primary keys?
4. What relationships exist among the data elements?
5. What are the operational requirements, such as security, integrity, and response time?

Data-oriented Approach

• Tools: view analysis, definition, normalization

• Steps:
1. Define database scope
2. Establish metadata collection standards
3. Identify user views
4. Build data dictionary
5. Identify data volumes and usage
6. Identify operational requirements

Steps 3 and 4 serve logical database design; steps 5 and 6 serve physical database design.

Define database scope

• Define the scope of the application (the mini-world)

• Example: within the university, the student registration system is the mini-world

Establish metadata collection standards

• Use agreed-upon (consensus) metadata collection forms

(Figure: typical metadata collection forms)

Identify user views

• Identify the user views in the application

• User views are used for logical database design

User views

• A user view is the subset of data required by a particular user to make a decision or carry out some action

• User views can be derived from reports, displays, and files

• Determine the semantic rules

• Use a standard form for recording information about user views

A user view example

Lakewood College Grade Report

Fall Semester 198x

Student#: 38214    Major: Information Systems

Student-name: Jane Bright

Course# | Course-title | Instructor-name | Instructor-location | Grade
IS 350 | Database | Codd | B104 | A
IS 465 | Systems Analysis | Kemp | B213 | C

Data associations:

Student# → Student-name, Major
Course# → Course-title, Instructor-name, Instructor-location
Instructor-name → Instructor-location
(Student#, Course#) → Grade
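Each data association above is a functional dependency. As a minimal sketch (the names FDS, closure, and ALL_ATTRS are illustrative and not part of the original notes), the Python below computes the attribute closure of a set of attributes under these dependencies. It shows that (Student#, Course#) determines every data item in the Grade Report view, while Student# alone does not.

```python
# Data associations from the Grade Report view, written as functional dependencies.
# Each pair is (determinant, dependent attributes).
FDS = [
    ({"STUDENT#"}, {"STUDENT-NAME", "MAJOR"}),
    ({"COURSE#"}, {"COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"}),
    ({"INSTRUCTOR-NAME"}, {"INSTRUCTOR-LOCATION"}),
    ({"STUDENT#", "COURSE#"}, {"GRADE"}),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is known, its dependents are known too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Every attribute mentioned in the view.
ALL_ATTRS = set().union(*(lhs | rhs for lhs, rhs in FDS))

print(closure({"STUDENT#", "COURSE#"}, FDS) == ALL_ATTRS)  # True
print(closure({"STUDENT#"}, FDS) == ALL_ATTRS)             # False
```

The loop simply keeps applying dependencies until nothing new is added; that is enough for identifying keys in small examples like this one.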

Build a data dictionary

• Each data item type that appears in a user view must be defined and described in detail.

• A standard form can be used.

Identify data volume and usage

• Data volumes and usage patterns are required for physical database design

• This step is done after the conceptual model has been completed

Identify operational requirements

• Security
• Integrity
• Response time
• Backup & recovery
• Archiving: how long must data be retained, and in what form?

View analysis using normalization

• Normalization

• After the user views have been obtained (through requirements analysis, RA), each view contains several data items that belong to different entities or relationships

Normalization

• Example: Grade-Report contains STUDENT#, STUDENT-NAME, MAJOR, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE

• These belong to the STUDENT, COURSE, and INSTRUCTOR entity types

Normalization

• Normalization provides a foundation for logical database design

• Normalization can be used to reduce complex user views to a set of small, stable relations that make up the conceptual schema

Steps of normalization

• Define the user views → represent them as unnormalized relations → remove repeating groups (1NF) → remove partial dependencies (2NF) → remove transitive dependencies (3NF) → view integration (conceptual schema)

(Figure: user views → unnormalized relations → [remove repeating groups] → normalized relations (1NF) → [remove partial dependency] → second normal form (2NF) relations → [remove transitive dependency] → third normal form (3NF) relations)

Unnormalized relation

• The relation contains repeating groups.

• Example: Grade-report

• Notation (braces enclose the repeating group): GRADE-REPORT (STUDENT#, STUDENT-NAME, MAJOR, {COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE})

Example of unnormalized relation

Student# | Student-name | Major | Course# | Course-title | Instructor-name | Instructor-location | Grade
38214 | Bright | IS | IS 350 | Database | CODD | B104 | A
 | | | IS 465 | Sys anal | KEMP | B213 | C
69173 | Smith | PM | IS 465 | Sys anal | KEMP | B213 | A
 | | | PM 300 | Prod mgt | LEWIS | D317 | B
 | | | QM 440 | Op res | KEMP | B213 | C
…

Normalized relation (1NF)

• Each intersection of a row and a column holds exactly one value; that is, every attribute has a single value.

• Attributes are atomic. Such a table is also called a 1NF table.

Normalization

• Through normalization analysis, a conceptual data model (schema) can be derived from a collection of views; this conceptual schema supports all the user views completely and simply.

Normalization

• The general purpose of normalization is to avoid the inconvenience and anomalies that arise when inserting, updating, or deleting rows of a table

Normalizing an unnormalized relation

• Split the repeating group off into a new relation. Example:
STUDENT (STUDENT#, STUDENT-NAME, MAJOR)
STUDENT-COURSE (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE)

• The new relation must carry the foreign key (STUDENT#) so that the decomposition is lossless.

• Identify the primary key: in STUDENT-COURSE, COURSE# alone cannot be the primary key; (STUDENT#, COURSE#) together form the primary key.
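The split described above can be sketched in Python, assuming the unnormalized Grade Report is held as records whose COURSES field is a nested list standing in for the repeating group (GRADE_REPORT, GROUP_COLS, and remove_repeating_group are hypothetical names). The sketch copies STUDENT# into every course tuple and then confirms that COURSE# repeats while (STUDENT#, COURSE#) is unique.

```python
# Unnormalized GRADE-REPORT: each student record carries a repeating group of courses
# (the braces in the notation above become a nested list here).
GRADE_REPORT = [
    {"STUDENT#": 38214, "STUDENT-NAME": "Bright", "MAJOR": "IS", "COURSES": [
        ("IS 350", "Database", "CODD", "B104", "A"),
        ("IS 465", "Sys anal", "KEMP", "B213", "C"),
    ]},
    {"STUDENT#": 69173, "STUDENT-NAME": "Smith", "MAJOR": "PM", "COURSES": [
        ("IS 465", "Sys anal", "KEMP", "B213", "A"),
        ("PM 300", "Prod mgt", "LEWIS", "D317", "B"),
        ("QM 440", "Op res", "KEMP", "B213", "C"),
    ]},
]
GROUP_COLS = ("COURSE#", "COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION", "GRADE")


def remove_repeating_group(report):
    """Split the repeating group into its own relation, copying STUDENT# into it."""
    student, student_course = [], []
    for rec in report:
        student.append({k: rec[k] for k in ("STUDENT#", "STUDENT-NAME", "MAJOR")})
        for values in rec["COURSES"]:
            # STUDENT# travels with every course tuple so the split stays lossless.
            student_course.append({"STUDENT#": rec["STUDENT#"], **dict(zip(GROUP_COLS, values))})
    return student, student_course


STUDENT, STUDENT_COURSE = remove_repeating_group(GRADE_REPORT)

# COURSE# alone repeats (IS 465 appears twice), but (STUDENT#, COURSE#) is unique.
course_nos = [r["COURSE#"] for r in STUDENT_COURSE]
pairs = [(r["STUDENT#"], r["COURSE#"]) for r in STUDENT_COURSE]
print(len(set(course_nos)) < len(course_nos))  # True
print(len(set(pairs)) == len(pairs))           # True
```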

Remove repeating group

GRADE-REPORT is split into:

STUDENT (3NF)
STUDENT# | STUDENT-NAME | MAJOR
38214 | Bright | IS
69173 | Smith | PM

STUDENT-COURSE (1NF)
STUDENT# | COURSE# | COURSE-TITLE | INSTRUCTOR-NAME | INSTRUCTOR-LOCATION | GRADE
38214 | IS 350 | Database | CODD | B104 | A
38214 | IS 465 | Sys anal | KEMP | B213 | C
69173 | IS 465 | Sys anal | KEMP | B213 | A
69173 | PM 300 | Prod mgt | LEWIS | D317 | B
69173 | QM 440 | Op res | KEMP | B213 | C

Problems in STUDENT-COURSE

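• Data redundancy: the title, instructor, and location of IS 465 are stored once for every student who takes it.

• Insertion anomaly: a new course, e.g., BA200 (INTR DP), cannot be inserted until at least one student registers in BA200, because STUDENT# is part of the primary key.

• Deletion anomaly: if only one student takes a course and that student drops it, deleting the tuple loses information. For example, deleting the tuple for student 69173 taking Prod mgt loses the fact that Lewis teaches Prod mgt.

• Update anomaly: changing the title of IS 465 to "Sys anal & Des" requires updating every related tuple to avoid inconsistency.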
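To make the update and deletion anomalies concrete, here is a small Python sketch over the STUDENT-COURSE rows above. The Row type and the max_updates parameter (which simulates an update that stops after touching only some tuples) are illustrative assumptions, not part of the original notes.

```python
from collections import namedtuple

Row = namedtuple("Row", "student course title instructor location grade")

# STUDENT-COURSE (1NF) from the table above.
STUDENT_COURSE = [
    Row(38214, "IS 350", "Database", "CODD", "B104", "A"),
    Row(38214, "IS 465", "Sys anal", "KEMP", "B213", "C"),
    Row(69173, "IS 465", "Sys anal", "KEMP", "B213", "A"),
    Row(69173, "PM 300", "Prod mgt", "LEWIS", "D317", "B"),
    Row(69173, "QM 440", "Op res", "KEMP", "B213", "C"),
]


def rename_course(rows, course, new_title, max_updates=None):
    """Change a course title; `max_updates` simulates an update that stops early."""
    out, done = [], 0
    for r in rows:
        if r.course == course and (max_updates is None or done < max_updates):
            r = r._replace(title=new_title)
            done += 1
        out.append(r)
    return out


def titles(rows, course):
    """All titles currently recorded for one course."""
    return {r.title for r in rows if r.course == course}


# Update anomaly: missing even one tuple leaves IS 465 with two different titles.
partial = rename_course(STUDENT_COURSE, "IS 465", "Sys anal & Des", max_updates=1)
print(len(titles(partial, "IS 465")))  # 2

# Deletion anomaly: dropping student 69173 from PM 300 also drops the fact that LEWIS teaches it.
after = [r for r in STUDENT_COURSE if (r.student, r.course) != (69173, "PM 300")]
print(any(r.instructor == "LEWIS" for r in after))  # False
```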

Reasons for anomaly

• Some attributes do not fully depend on the primary key

The primary key of the STUDENT-COURSE table is (STUDENT#, COURSE#).

COURSE# → COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION

• We say that COURSE-TITLE, INSTRUCTOR-NAME, and INSTRUCTOR-LOCATION are partially dependent on the primary key, because they depend on COURSE#, which is only part of the key.

Functional dependency(FD)

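• Normalization analyzes functional dependencies, so functional dependency is introduced first.

• Definition: Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R (at any time).

• A functional dependency is a semantic rule; it cannot be established from the contents of a table at any single point in time.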
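Because an FD is a semantic rule, the contents of a table can only refute it, never prove it. A minimal Python sketch of that check (violates_fd is a hypothetical helper; the rows are a subset of the STUDENT-COURSE columns):

```python
def violates_fd(rows, x, y):
    """Return True if some X-value is associated with more than one Y-value in `rows`.

    A single table state can only refute an FD: False means "consistent with X -> Y
    in this state", not "X -> Y holds", because an FD is a semantic rule.
    """
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault records the first Y-value seen for this X-value.
        if seen.setdefault(x_val, y_val) != y_val:
            return True
    return False


COLS = ("STUDENT#", "COURSE#", "COURSE-TITLE", "GRADE")
ROWS = [dict(zip(COLS, r)) for r in [
    (38214, "IS 350", "Database", "A"),
    (38214, "IS 465", "Sys anal", "C"),
    (69173, "IS 465", "Sys anal", "A"),
    (69173, "PM 300", "Prod mgt", "B"),
    (69173, "QM 440", "Op res", "C"),
]]

print(violates_fd(ROWS, ["COURSE#"], ["COURSE-TITLE"]))       # False
print(violates_fd(ROWS, ["COURSE#"], ["GRADE"]))              # True: IS 465 has C and A
print(violates_fd(ROWS, ["STUDENT#", "COURSE#"], ["GRADE"]))  # False
```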

Functional dependency

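(Figure: relation R with columns X and Y; the X-values x1 and x2 are both associated with the Y-value y1.)

X → Y: X determines Y; Y is functionally dependent on X.

The relationship from X to Y is many-to-one.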

Second normal form (2NF)

• Definition: A relation is in 2NF if it is already in 1NF and every non-key attribute is fully dependent on the primary key.

• Normalization method: move the partially dependent attributes into a separate table.
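A partial dependency can be found mechanically: take each proper subset of the composite key and see which non-key attributes it already determines. This is a sketch under the FDs of STUDENT-COURSE stated in these notes; closure and partial_dependencies are illustrative helpers.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each proper subset of a composite key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(set(part), fds) & non_key
            if dependents:
                found[part] = dependents
    return found


# Dependencies of STUDENT-COURSE.
FDS = [
    ({"COURSE#"}, {"COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"}),
    ({"INSTRUCTOR-NAME"}, {"INSTRUCTOR-LOCATION"}),
    ({"STUDENT#", "COURSE#"}, {"GRADE"}),
]
KEY = {"STUDENT#", "COURSE#"}
NON_KEY = {"COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION", "GRADE"}

partial = partial_dependencies(KEY, NON_KEY, FDS)
print(partial)
# Any partial dependency means the relation is not in 2NF.
print("in 2NF" if not partial else "not in 2NF")  # not in 2NF
```

The attributes reported for ('COURSE#',) are exactly the ones the 2NF step moves into a separate table, with COURSE# as its key.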

2NF example

Student-Course relation: STUDENT-COURSE (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE), with the same data as the 1NF STUDENT-COURSE table above.

Dependency diagram:
(STUDENT#, COURSE#) → GRADE (full dependency)
COURSE# → COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION (partial functional dependency)

Converting the table to 2NF

STUDENT-COURSE (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE) is split into:

REGISTRATION (3NF)
STUDENT# | COURSE# | GRADE
38214 | IS 350 | A
38214 | IS 465 | C
69173 | IS 465 | A
69173 | PM 300 | B
69173 | QM 440 | C

COURSE-INSTRUCTOR (2NF)
COURSE# | COURSE-TITLE | INSTRUCTOR-NAME | INSTRUCTOR-LOCATION
IS 350 | Database | CODD | B104
IS 465 | Sys anal | KEMP | B213
PM 300 | Prod mgt | LEWIS | D317
QM 440 | Op res | KEMP | B213
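One way to check that this decomposition loses nothing is to project STUDENT-COURSE onto the two new relations and join them back on COURSE#. A minimal Python sketch, assuming relations are modeled as sets of frozensets of (attribute, value) pairs (project and natural_join are hypothetical helpers, not a library API):

```python
COLS = ("STUDENT#", "COURSE#", "COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION", "GRADE")
STUDENT_COURSE = [dict(zip(COLS, r)) for r in [
    (38214, "IS 350", "Database", "CODD", "B104", "A"),
    (38214, "IS 465", "Sys anal", "KEMP", "B213", "C"),
    (69173, "IS 465", "Sys anal", "KEMP", "B213", "A"),
    (69173, "PM 300", "Prod mgt", "LEWIS", "D317", "B"),
    (69173, "QM 440", "Op res", "KEMP", "B213", "C"),
]]


def project(rows, attrs):
    """Relational projection: keep only `attrs`; the set removes duplicate tuples."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Join two relations (sets of frozenset tuples) on their common attributes."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


REGISTRATION = project(STUDENT_COURSE, ["STUDENT#", "COURSE#", "GRADE"])
COURSE_INSTRUCTOR = project(
    STUDENT_COURSE, ["COURSE#", "COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"])

print(len(REGISTRATION), len(COURSE_INSTRUCTOR))  # 5 4 (IS 465 is now stored once)
# Lossless: joining the two projections on COURSE# rebuilds STUDENT-COURSE exactly.
original = project(STUDENT_COURSE, COLS)
print(natural_join(REGISTRATION, COURSE_INSTRUCTOR) == original)  # True
```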

Anomaly in 2NF relation

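• Insertion: data about a new instructor can be inserted only when that instructor teaches some course

• Deletion: deleting a course may lose an instructor's data; e.g., deleting IS 350 loses the data about CODD

• Update: because instructor data is repeated, changing an instructor's location is harder (every course taught by that instructor must be updated)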

FD in COURSE-INSTRUCTOR

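COURSE# → COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION
INSTRUCTOR-NAME → INSTRUCTOR-LOCATION

COURSE# determines INSTRUCTOR-LOCATION through INSTRUCTOR-NAME: a transitive dependency.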

Transitive dependency

• A transitive dependency occurs when one non-key attribute is dependent on one or more non-key attributes

Primary key → A
A → B
therefore Primary key → B (transitively, through A)

3NF

• A relation is in 3NF, if it is already in 2NF and no transitive dependency exists.

FD pattern of a 3NF relation

Primary key → Attribute 1, Attribute 2, …, Attribute n

Each non-key attribute is fully dependent on the primary key, and there is no transitive dependency.

Normalizing to 3NF

• Move the attributes that cause the transitive dependency into a separate relation.
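A sketch of this step in Python, assuming the FDs of COURSE-INSTRUCTOR listed above: it looks for a non-key attribute A that determines another non-key attribute B (so the key determines B only through A), then moves B into a new relation keyed by A. transitive_dependencies and split_off are hypothetical helpers written for this example.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(non_key, fds):
    """Return (A, B) pairs of non-key attributes with A -> B; key -> A -> B is then transitive."""
    pairs = []
    for a in sorted(non_key):
        for b in sorted(closure({a}, fds) & (non_key - {a})):
            pairs.append((a, b))
    return pairs


def split_off(attrs, determinant, dependents):
    """Move `dependents` into a new relation keyed by `determinant`.

    The determinant stays behind in the original relation as a foreign key.
    """
    return attrs - set(dependents), {determinant} | set(dependents)


FDS = [
    ({"COURSE#"}, {"COURSE-TITLE", "INSTRUCTOR-NAME"}),
    ({"INSTRUCTOR-NAME"}, {"INSTRUCTOR-LOCATION"}),
]
COURSE_INSTRUCTOR = {"COURSE#", "COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"}
NON_KEY = COURSE_INSTRUCTOR - {"COURSE#"}

found = transitive_dependencies(NON_KEY, FDS)
print(found)  # [('INSTRUCTOR-NAME', 'INSTRUCTOR-LOCATION')]

course, instructor = split_off(COURSE_INSTRUCTOR, "INSTRUCTOR-NAME", ["INSTRUCTOR-LOCATION"])
print(sorted(course))      # ['COURSE#', 'COURSE-TITLE', 'INSTRUCTOR-NAME']
print(sorted(instructor))  # ['INSTRUCTOR-LOCATION', 'INSTRUCTOR-NAME']
```

Keeping INSTRUCTOR-NAME in the first relation is what lets COURSE keep referring to INSTRUCTOR after the split.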

• Conversion of a relation to third normal form (3NF) by removing the transitive functional dependency (FD) in COURSE-INSTRUCTOR (COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION):

COURSE (3NF)
COURSE# | COURSE-TITLE | INSTRUCTOR-NAME
IS 350 | Database | CODD
IS 465 | Sys anal | KEMP
PM 300 | Prod mgt | LEWIS
QM 440 | Op res | KEMP

INSTRUCTOR (3NF)
INSTRUCTOR-NAME | INSTRUCTOR-LOCATION
CODD | B104
KEMP | B213
LEWIS | D317

• INSTRUCTOR-NAME must be kept in the COURSE relation to preserve the COURSE-INSTRUCTOR relationship; this is how the COURSE relation refers to INSTRUCTOR.

• INSTRUCTOR-NAME is a foreign key in COURSE.

• Normalization can usually stop at 3NF, because 3NF removes most anomalies: each entity is represented by its own relation, so inserting, deleting, or updating one entity does not involve any other entity, since each relation represents only one entity or relationship. Normalization can continue to BCNF, 4NF, 5NF, DKNF, …, but that produces many small relations and is usually unnecessary.

Result of the table analysis

Summary of 3NF relations for GRADE-REPORT

STUDENT (STUDENT#, STUDENT-NAME, MAJOR)
STUDENT# | STUDENT-NAME | MAJOR
38214 | Bright | IS
69173 | Smith | PM

COURSE (COURSE#, COURSE-TITLE, INSTRUCTOR-NAME)
COURSE# | COURSE-TITLE | INSTRUCTOR-NAME
IS 350 | Database | CODD
IS 465 | Sys anal | KEMP
PM 300 | Prod mgt | LEWIS
QM 440 | Op res | KEMP

INSTRUCTOR (INSTRUCTOR-NAME, INSTRUCTOR-LOCATION)
INSTRUCTOR-NAME | INSTRUCTOR-LOCATION
CODD | B104
KEMP | B213
LEWIS | D317

REGISTRATION (STUDENT#, COURSE#, GRADE)
STUDENT# | COURSE# | GRADE
38214 | IS 350 | A
38214 | IS 465 | C
69173 | IS 465 | A
69173 | PM 300 | B
69173 | QM 440 | C
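As a sketch of how these four 3NF relations could be written in a DDL during implementation design, the Python script below creates them in an in-memory SQLite database through the standard sqlite3 module. The snake_case column names (student_no for STUDENT#, and so on) are an assumption, since # and - are not valid in unquoted SQL identifiers. The script shows that a course can now be inserted before anyone registers, and that the Grade Report is recovered with joins.

```python
import sqlite3

# Column names such as student_no stand in for STUDENT# (an assumption of this sketch).
DDL = """
CREATE TABLE student (
    student_no   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL,
    major        TEXT
);
CREATE TABLE instructor (
    instructor_name     TEXT PRIMARY KEY,
    instructor_location TEXT
);
CREATE TABLE course (
    course_no       TEXT PRIMARY KEY,
    course_title    TEXT NOT NULL,
    instructor_name TEXT REFERENCES instructor(instructor_name)  -- foreign key
);
CREATE TABLE registration (
    student_no INTEGER REFERENCES student(student_no),
    course_no  TEXT REFERENCES course(course_no),
    grade      TEXT,
    PRIMARY KEY (student_no, course_no)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(DDL)
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(38214, "Bright", "IS"), (69173, "Smith", "PM")])
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("CODD", "B104"), ("KEMP", "B213"), ("LEWIS", "D317")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("IS 350", "Database", "CODD"), ("IS 465", "Sys anal", "KEMP"),
                  ("PM 300", "Prod mgt", "LEWIS"), ("QM 440", "Op res", "KEMP")])
conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                 [(38214, "IS 350", "A"), (38214, "IS 465", "C"), (69173, "IS 465", "A"),
                  (69173, "PM 300", "B"), (69173, "QM 440", "C")])

# The insertion anomaly is gone: a course can exist before anyone registers.
# Its instructor is left NULL because the example does not name one.
conn.execute("INSERT INTO course VALUES ('BA200', 'INTR DP', NULL)")

# The original Grade Report view is recovered with joins over the 3NF tables.
GRADE_REPORT_SQL = """
    SELECT r.course_no, c.course_title, c.instructor_name, i.instructor_location, r.grade
    FROM registration AS r
    JOIN course AS c ON c.course_no = r.course_no
    JOIN instructor AS i ON i.instructor_name = c.instructor_name
    WHERE r.student_no = ?
    ORDER BY r.course_no
"""
for row in conn.execute(GRADE_REPORT_SQL, (38214,)):
    print(row)
```

With foreign keys enabled, inserting a course whose INSTRUCTOR-NAME does not exist in INSTRUCTOR is rejected with sqlite3.IntegrityError, which is the DBMS enforcing the COURSE-INSTRUCTOR reference described above.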

ER Diagram

(Figure: STUDENT and COURSE are linked by an M:N relationship carrying the attribute GRADE; COURSE is also linked to INSTRUCTOR.)

Data volume analysis

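After normalization, view integration can be performed. If there is only one view, it can be used directly as the conceptual model. For example, the conceptual model can show the mappings (cardinalities) and the data volumes. Data volume: the number of tuples in a relation when it holds the most tuples.

Example:
1. 3000 students
2. Each student takes 3 courses on average → 9000 registrations
3. 100 instructors
4. Each instructor teaches 3 courses (sections) on average → 300 courses (sections)
5. Each section has 30 students on average (9000/300 = 30)

Data volumes are shown in the conceptual model, and each mapping also shows the number of corresponding elements.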
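The volume figures in this example follow from two multiplications and one division. A tiny Python sketch (data_volumes is a hypothetical helper; the inputs are the averages stated above):

```python
def data_volumes(students, courses_per_student, instructors, courses_per_instructor):
    """Derive relation sizes from the averages stated in the example."""
    registrations = students * courses_per_student   # one REGISTRATION tuple per enrollment
    sections = instructors * courses_per_instructor  # one COURSE tuple per section
    return {
        "STUDENT": students,
        "REGISTRATION": registrations,
        "INSTRUCTOR": instructors,
        "COURSE": sections,
        "students per section": registrations / sections,
    }


volumes = data_volumes(3000, 3, 100, 3)
print(volumes)
```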

Data structure diagram

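STUDENT (STUDENT#, STUDENT-NAME, MAJOR): 3000 tuples
REGISTRATION (STUDENT#, COURSE#, GRADE): 9000 tuples
COURSE (COURSE#, COURSE-TITLE, INSTRUCTOR-NAME): 300 tuples
INSTRUCTOR (INSTRUCTOR-NAME, INSTRUCTOR-LOCATION): 100 tuples

Average mappings: each student has 3 registrations; each course (section) has 30 registrations; each instructor teaches 3 courses.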

Boyce-Codd Normal Form

• Problem in 3NF: some non-key attribute may determine part of the key attributes

• Determinant: an attribute on which some other attribute is fully functionally dependent

• A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

• BCNF is defined in terms of functional dependencies
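The BCNF test can be sketched in Python using the ST-MAJ-ADV example that follows, assuming its dependencies are (STUDENT#, MAJOR) → ADVISOR and ADVISOR → MAJOR. Here "every determinant is a candidate key" is checked in the usual formal way, as "every non-trivial determinant is a superkey". closure, candidate_keys, and bcnf_violations are illustrative helpers.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return the minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose determinant is not a (super)key of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


ST_MAJ_ADV = {"STUDENT#", "MAJOR", "ADVISOR"}
FDS = [
    ({"STUDENT#", "MAJOR"}, {"ADVISOR"}),
    ({"ADVISOR"}, {"MAJOR"}),  # each advisor advises in exactly one major
]

print(candidate_keys(ST_MAJ_ADV, FDS))   # keys {STUDENT#, MAJOR} and {STUDENT#, ADVISOR}
print(bcnf_violations(ST_MAJ_ADV, FDS))  # ADVISOR -> MAJOR: ADVISOR is not a candidate key
# After the split, ADV-MAJ (ADVISOR, MAJOR) has ADVISOR as its key, so it is in BCNF.
print(bcnf_violations({"ADVISOR", "MAJOR"}, [({"ADVISOR"}, {"MAJOR"})]))  # []
```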

Relation not in BCNF

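ST-MAJ-ADV
STUDENT# | MAJOR | ADVISOR
123 | PHYSICS | EINSTEIN
123 | MUSIC | MOZART
456 | BIOL | DARWIN
789 | PHYSICS | BOHR
999 | PHYSICS | EINSTEIN

Functional dependencies: (STUDENT#, MAJOR) → ADVISOR and ADVISOR → MAJOR. ADVISOR is a determinant but not a candidate key, so ST-MAJ-ADV is not in BCNF.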

Anomaly

• If student 456 changes major from BIOL to MATH, the fact that DARWIN advises in BIOL is lost.

• The fact that WATSON advises in COMPSCI cannot be inserted until at least one student majors in COMPSCI and is assigned WATSON as advisor.

Normalization

ST-ADV
STUDENT# | ADVISOR
123 | EINSTEIN
123 | MOZART
456 | DARWIN
789 | BOHR
999 | EINSTEIN

ADV-MAJ
ADVISOR | MAJOR
EINSTEIN | PHYSICS
MOZART | MUSIC
DARWIN | BIOL
BOHR | PHYSICS

4NF

• 4NF is related to a notion called multivalued dependency.

• A multivalued dependency exists when there are three attributes A, B, and C such that:
(1) for a value of A, there is a well-defined set of values of B;
(2) for a value of A, there is a well-defined set of values of C;
(3) the set of B values associated with a given A value is independent of the C values.
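Condition (3) says that, for each A-value, the (B, C) combinations present must be every pairing of its B-values with its C-values. A minimal Python sketch of that check on the OFFERING data that follows (mvd_consistent is a hypothetical helper; as with FDs, one table state can refute a multivalued dependency but cannot prove it):

```python
from collections import defaultdict


def mvd_consistent(rows, a, b, c):
    """Check A ->-> B | C on one state of R(A, B, C).

    For every A-value, the (B, C) pairs present must be exactly the cross product
    of that A-value's B-set and C-set, i.e. B-values are independent of C-values.
    """
    pairs = defaultdict(set)
    for row in rows:
        pairs[row[a]].add((row[b], row[c]))
    for combos in pairs.values():
        b_set = {bv for bv, _ in combos}
        c_set = {cv for _, cv in combos}
        if combos != {(bv, cv) for bv in b_set for cv in c_set}:
            return False
    return True


OFFERING = [dict(COURSE=co, INSTRUCTOR=i, TEXTBOOK=t) for co, i, t in [
    ("Management", "White", "Drucker"), ("Management", "Green", "Drucker"),
    ("Management", "Black", "Drucker"), ("Management", "White", "Peters"),
    ("Management", "Green", "Peters"), ("Management", "Black", "Peters"),
    ("Finance", "Gray", "Weston"), ("Finance", "Gray", "Gilford"),
]]

print(mvd_consistent(OFFERING, "COURSE", "INSTRUCTOR", "TEXTBOOK"))  # True
# Dropping one combination breaks the independence of instructors and textbooks.
missing = dict(COURSE="Management", INSTRUCTOR="Black", TEXTBOOK="Peters")
broken = [r for r in OFFERING if r != missing]
print(mvd_consistent(broken, "COURSE", "INSTRUCTOR", "TEXTBOOK"))    # False
```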

A relation not in 4NF

(a) Unnormalized relation OFFERING
COURSE | INSTRUCTOR | TEXTBOOK
Management | White, Green, Black | Drucker, Peters
Finance | Gray | Weston, Gilford

(b) Normalized relation
COURSE | INSTRUCTOR | TEXTBOOK
Management | White | Drucker
Management | Green | Drucker
Management | Black | Drucker
Management | White | Peters
Management | Green | Peters
Management | Black | Peters
Finance | Gray | Weston
Finance | Gray | Gilford

Problem: adding the textbook Middleston to the Management course requires inserting 3 tuples, one per instructor (insertion anomaly).

Normalizing to 4NF

• Remove multi-value dependencies

Relations in fourth normal form

TEACHER
COURSE | INSTRUCTOR
Management | White
Management | Green
Management | Black
Finance | Gray

TEXT
COURSE | TEXTBOOK
Management | Drucker
Management | Peters
Finance | Weston
Finance | Gilford
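The 4NF decomposition can be checked in the same way as the earlier ones: project OFFERING onto TEACHER and TEXT, join them back on COURSE, and compare. The Python sketch below also counts the tuples needed to add the Middleston textbook. Modeling relations as sets of plain tuples and the helper join_on_course are assumptions made for this example.

```python
# OFFERING (COURSE, INSTRUCTOR, TEXTBOOK): every instructor of a course is paired
# with every textbook of that course, as in table (b) above.
OFFERING = {
    (course, instructor, textbook)
    for course, instructors, textbooks in [
        ("Management", ["White", "Green", "Black"], ["Drucker", "Peters"]),
        ("Finance", ["Gray"], ["Weston", "Gilford"]),
    ]
    for instructor in instructors
    for textbook in textbooks
}

# 4NF decomposition: one relation per multivalued dependency.
TEACHER = {(course, instructor) for course, instructor, _ in OFFERING}
TEXT = {(course, textbook) for course, _, textbook in OFFERING}


def join_on_course(teacher, text):
    """Natural join of TEACHER and TEXT on COURSE."""
    return {(c1, i, t) for c1, i in teacher for c2, t in text if c1 == c2}


print(len(OFFERING), len(TEACHER), len(TEXT))     # 8 4 4
print(join_on_course(TEACHER, TEXT) == OFFERING)  # True: the decomposition is lossless

# Adding the Middleston textbook now takes one tuple in TEXT ...
TEXT_2 = TEXT | {("Management", "Middleston")}
# ... while the equivalent OFFERING gains one tuple per Management instructor.
print(len(join_on_course(TEACHER, TEXT_2)) - len(OFFERING))  # 3
```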

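Relationships among the normal forms

unnormalized → 1NF → 2NF → 3NF → BCNF → 4NF → 5NF (PJNF) → DKNF (each normal form is stricter than the one before it)

• The higher the normal form, the more relations; 3NF is usually sufficient.

• BCNF is related to functional dependencies.

• 4NF is related to multivalued dependencies.

• 5NF is related to project-join (join dependencies).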