
1

Formally Verifying the Third Normalization of Relational Databases

Yoichi Hirai, AIST

2013-11-22, Nagano (TPP 2013)

2

ACID properties of database systems

• Atomicity: Changes are applied in an “all or nothing” manner. Partial changes must be rolled back.

• Consistency: Changes applied to valid states result in valid states.

• Isolation: Even concurrent changes behave as if they were executed one after another.

• Durability: Once changes are applied, they persist unless overwritten.
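Atomicity can be observed with any transactional engine. The following minimal sketch uses Python’s built-in `sqlite3` module as a stand-in; the `account` table, the `transfer` function and the balances are hypothetical illustrations, not part of the talk. The `CHECK` constraint makes the debit of an overdrawing transfer fail, and the connection’s context manager then rolls back the credit that already ran.

```python
import sqlite3

# Hypothetical schema: balances may never become negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; either both updates persist or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False
    return True


def balances(conn):
    """Current balances as a dict name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

After `transfer(conn, "alice", "bob", 30)` succeeds, a call `transfer(conn, "alice", "bob", 500)` returns `False`, and bob keeps 30 rather than 530: the partial change was rolled back.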

3

Anomalies: failures of consistency

• Update anomaly

Before:

titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | Istanbul | Orhan Pamuk | East

After a partial update:

titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | My name is red | Orhan Pamuk | East

We tried to change the title, but not every occurrence was changed: the two rows for titleID 3 now disagree on the title, so consistency is violated.
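One way to make “consistency is violated” precise is to test a functional dependency (the notion introduced on a later slide) against the table instance. This is a minimal sketch; the helper `satisfies_fd` and the dict-per-row representation are assumptions for illustration only.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on a table instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    The dependency holds iff rows that agree on lhs also agree on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns the stored one
        if seen.setdefault(key, value) != value:
            return False
    return True


before = [
    {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "Central"},
    {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "East"},
]
after = [
    {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "Central"},
    {"titleID": 3, "Title": "My name is red", "Author": "Orhan Pamuk", "Library": "East"},
]
```

`satisfies_fd(before, ["titleID"], ["Title"])` holds, while the same check on `after` fails: the partial update broke titleID → Title.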

4

Anomalies: failures of consistency

• Deletion anomaly

Before:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …

After deleting the course:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
34 | … | … | … | … | …

We only wanted to remove a course, but the faculty member’s data was removed as a result.
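The anomaly comes from storing faculty facts only alongside courses. The sketch below contrasts the combined table with a hypothetical decomposition into separate faculty and course tables; the split and the helper names are illustrative assumptions, and the elided row for FacultyID 34 is left out.

```python
# One combined table: faculty facts are stored only alongside a course.
combined = [
    {"FacultyID": 33, "Faculty name": "R. Wavey", "Faculty hire date": "1951-09-01",
     "Course name": "Physics 2A", "Course day": "Wed", "Course time": "15:00-"},
]

# Hypothetical decomposition: faculty facts and course facts in separate
# tables, linked by FacultyID.
faculty = [{"FacultyID": 33, "Faculty name": "R. Wavey", "Faculty hire date": "1951-09-01"}]
courses = [{"FacultyID": 33, "Course name": "Physics 2A",
            "Course day": "Wed", "Course time": "15:00-"}]


def delete_course(rows, course):
    """Delete every row describing the given course."""
    return [r for r in rows if r["Course name"] != course]


def faculty_ids(rows):
    """Faculty members that a table still knows about."""
    return {r["FacultyID"] for r in rows}
```

Deleting Physics 2A from `combined` leaves no row mentioning FacultyID 33, whereas deleting it from `courses` leaves `faculty` untouched.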

5

Codd’s first normal form

titleID | Title | Author | Library | Library
3 | Istanbul | Orhan Pamuk | East | Central

5

The first normal form excludes repetition of the same attribute.
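A table like the one above can be brought into first normal form by giving each library its own row. In this minimal sketch the repeated Library column is modelled as a list inside one row; the function name `to_1nf` is hypothetical.

```python
def to_1nf(row, attr):
    """Split a row whose attribute `attr` holds several values into one row per value."""
    return [{**row, attr: value} for value in row[attr]]


# The repeated Library column, modelled as a list of values.
repeated = {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk",
            "Library": ["East", "Central"]}
```

`to_1nf(repeated, "Library")` yields the two rows that appear in the update-anomaly example.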

6

Functional dependencies

titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | Istanbul | Orhan Pamuk | East

{titleID} → {Title, Author}
{titleID, Library} → {titleID, Title, Author, Library}
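The second dependency follows from the first: the attribute closure of {titleID, Library} is the whole attribute set. The sketch below uses the standard textbook closure algorithm; it is not necessarily how the talk’s Coq development defines closure, and the names `closure`, `FDS` and `ALL` are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


FDS = [(frozenset({"titleID"}), frozenset({"Title", "Author"}))]
ALL = frozenset({"titleID", "Title", "Author", "Library"})
```

`closure({"titleID", "Library"}, FDS) == ALL`, while `closure({"titleID"}, FDS)` lacks Library, so {titleID} alone does not determine a row.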

7

Functional dependencies

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …

{FacultyID} → {Faculty name, Faculty hire date}
{FacultyID, Course name} → {Course day, Course time}
{FacultyID, Course name} → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time}

8

Armstrong’s laws

• Mizar has a formalization of these laws, including their soundness and completeness.

1. Reflexivity: $Y \subseteq X$ implies $X \to Y$.
2. Augmentation: $Z \subseteq W$ and $X \to Y$ imply $X \cup W \to Y \cup Z$.
3. Transitivity: $X \to Y$ and $Y \to Z$ imply $X \to Z$.

The laws are sound and complete with respect to the relational semantics.
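Soundness and completeness give a decision procedure: $X \to Y$ is derivable by Armstrong’s laws exactly when $Y$ is contained in the attribute closure of $X$. A minimal sketch, with hypothetical attributes A, B, C and D:

```python
def closure(attrs, fds):
    """All attributes reachable from attrs by repeatedly applying FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds, lhs, rhs):
    """Decide whether lhs -> rhs is derivable from fds by Armstrong's laws.

    By soundness and completeness, this holds exactly when rhs is a subset
    of the closure of lhs.
    """
    return frozenset(rhs) <= closure(frozenset(lhs), fds)


# Hypothetical dependencies A -> B and B -> C.
F = [(frozenset({"A"}), frozenset({"B"})), (frozenset({"B"}), frozenset({"C"}))]
```

Each law has a witness here: {A, B} → {A} (reflexivity), {A, D} → {B, D} (augmentation of A → B by D) and {A} → {C} (transitivity) are all implied, while {C} → {A} is not.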

9

Codd’s second normal form

• Excludes tables like this:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …

This table is excluded because of these conditions:
1. {FacultyID, Course name} is a minimal set X with the functional dependency X → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time} ({FacultyID, Course name} is a candidate key).
2. Faculty hire date is not contained in any candidate key (Faculty hire date is a non-prime attribute).
3. Faculty hire date depends on {FacultyID}, which is a proper subset of the candidate key {FacultyID, Course name}.
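The three conditions can be checked mechanically. This sketch finds candidate keys by brute force over attribute subsets (exponential, so only suitable for tiny schemas) and then reports non-prime attributes that depend on a proper subset of a key. The function names are hypothetical; this is not the talk’s formalization.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (brute force by size)."""
    universe = frozenset(attrs)
    keys = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == universe:
                keys.append(candidate)
    return keys


def partial_dependencies(attrs, fds):
    """2NF violations: (nonempty proper subset of a key, non-prime attribute it determines)."""
    keys = candidate_keys(attrs, fds)
    prime = frozenset().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                for attr in sorted(closure(sub, fds) - prime):
                    found.append((frozenset(sub), attr))
    return found


ATTRS = {"FacultyID", "Faculty name", "Faculty hire date",
         "Course name", "Course day", "Course time"}
FDS = [
    (frozenset({"FacultyID"}), frozenset({"Faculty name", "Faculty hire date"})),
    (frozenset({"FacultyID", "Course name"}), frozenset({"Course day", "Course time"})),
]
```

On this schema the only candidate key is {FacultyID, Course name}, and both Faculty hire date and Faculty name are reported as depending on {FacultyID} alone.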

10

The third normal form

• Excludes tables like this (example from Wikipedia):

Tournament | Year | Winner | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

The reason is that the non-prime attribute “Winner Date of Birth” is transitively dependent on a candidate key. Concretely:
1. “Winner Date of Birth” is a non-prime attribute.
2. {Tournament, Year} is a candidate key.
3. {Tournament, Year} → {Winner} holds.
4. {Winner} → {Tournament, Year} does not hold.
5. {Winner} → {Winner Date of Birth} holds.
6. “Winner Date of Birth” is not in {Tournament, Year}.
7. “Winner Date of Birth” is not in {Winner}.
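A common textbook test for 3NF checks each given dependency lhs → A: it is allowed if lhs is a superkey or A is a prime attribute. The sketch below applies that test, assuming the dependency list is the complete set for this one relation. Candidate keys are found by brute force, and all names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (brute force by size)."""
    universe = frozenset(attrs)
    keys = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # not minimal
            if closure(candidate, fds) == universe:
                keys.append(candidate)
    return keys


def third_nf_violations(attrs, fds):
    """Given FDs lhs -> A where lhs is not a superkey and A is non-prime and not in lhs."""
    universe = frozenset(attrs)
    prime = frozenset().union(*candidate_keys(attrs, fds))
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == universe:
            continue  # lhs is a superkey: allowed in 3NF
        for attr in sorted(rhs - lhs - prime):
            found.append((lhs, attr))
    return found


ATTRS = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
FDS = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]
```

The only violation reported is {Winner} → Winner Date of Birth, which matches items 4, 5 and 7 above.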

11

Obtaining the third normal form: the input and output

• Input: a finite set of functional dependencies

• Output: a finite set of relations and their keys (in 3NF)

[Figure: input dependencies over Tournament, Year, Winner and Winner Date of Birth; output relations (Tournament, Year, Winner) and (Winner, Winner Date of Birth)]

12

Bernstein’s algorithm 1 [Bernstein, 1976]

Obtained after two earlier erroneous attempts!

13

Bernstein’s algorithm 1, step 1: Eliminating extraneous attributes

[Figure: functional dependencies over Tournament, Year, Winner, Winner Date of Birth and Place, before and after extraneous attributes are eliminated]

Smaller, but equivalent (after taking the closure under Armstrong’s laws)
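Eliminating an extraneous attribute means dropping it from a left-hand side whenever the smaller left-hand side still implies the same right-hand side. This minimal sketch uses a hypothetical input, since the slide’s figure is not recoverable: {Tournament, Year, Winner} → {Place}, where Winner is extraneous. Refusing to shrink a left-hand side to the empty set is a simplifying choice of this sketch.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def remove_extraneous(fds):
    """Drop attribute a from lhs of lhs -> rhs whenever (lhs - {a}) -> rhs still follows.

    Replacing an FD by one with a smaller, still-implied lhs keeps the set
    equivalent, so checking against the partially updated list is safe.
    """
    result = list(fds)
    for i, (lhs, rhs) in enumerate(fds):
        for a in sorted(lhs):
            smaller = lhs - {a}
            # simplifying choice: never shrink a lhs to the empty set
            if smaller and rhs <= closure(smaller, result):
                lhs = smaller
                result[i] = (lhs, rhs)
    return result


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
PLACE = frozenset({"Place"})
# Hypothetical input: Winner is extraneous in the third dependency.
FDS = [(TY, W), (W, DOB), (TY | W, PLACE)]
```

`remove_extraneous(FDS)` turns the third dependency into {Tournament, Year} → {Place} and leaves the other two unchanged.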

14

Bernstein’s algorithm 1, step 2: Finding a nonredundant covering

• A set of functional dependencies is nonredundant when no element can be inferred from the others using Armstrong’s laws.

• Step 2 removes functional dependencies until the whole set becomes nonredundant.
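Step 2 can be sketched as a single pass that drops every dependency implied by the remaining ones. The input below is hypothetical: it adds the redundant {Tournament, Year} → {Winner Date of Birth}, which follows by transitivity. The resulting cover depends on the order of the input list.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def nonredundant(fds):
    """Remove each FD that the remaining FDs already imply.

    One left-to-right pass suffices: dropping an implied FD never makes a
    previously kept FD implied. The result depends on the order of fds.
    """
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        rest = result[:i] + result[i + 1:]
        if rhs <= closure(lhs, rest):
            result = rest  # redundant; the next FD has moved into slot i
        else:
            i += 1
    return result


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
FDS = [(TY, W), (W, DOB), (TY, DOB)]  # the last one is redundant
```

`nonredundant(FDS)` keeps only {Tournament, Year} → {Winner} and {Winner} → {Winner Date of Birth}.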

15

Bernstein’s algorithm 1, step 3: Partition

[Figure: functional dependencies over Tournament, Year, Winner, Winner Date of Birth and Place]

These two functional dependencies share the same left-hand side.
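The partition step, as described on the slide, groups dependencies that share a left-hand side. This sketch groups only identical left-hand sides and merges their right-hand sides. The input dependencies are a hypothetical reading of the lost figure.

```python
from collections import defaultdict


def partition_by_lhs(fds):
    """Group FDs by identical left-hand side, merging their right-hand sides."""
    groups = defaultdict(set)
    for lhs, rhs in fds:
        groups[lhs] |= rhs
    return {lhs: frozenset(rhs) for lhs, rhs in groups.items()}


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
PLACE = frozenset({"Place"})
FDS = [(TY, W), (W, DOB), (TY, PLACE)]  # hypothetical input
```

The two dependencies from {Tournament, Year} end up in one group with right-hand side {Winner, Place}.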

16

Bernstein’s algorithm 1, step 4: Construct relations

[Figure: functional dependencies over Tournament, Year, Winner, Winner Date of Birth and Place]

Relation 1: {Winner, Winner Date of Birth}

Relation 2: {Tournament, Year, Place, Winner}

Underlined attributes are keys.

These relations are in the third normal form. Why?
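Under the assumption that each group from step 3 becomes one relation keyed by its shared left-hand side, the construction can be sketched as follows. The groups are a hypothetical reading of the slide, and `construct_relations` is an illustrative name.

```python
def construct_relations(groups):
    """One relation per group: its attributes are lhs | rhs and its key is lhs."""
    return [(lhs | rhs, lhs) for lhs, rhs in groups.items()]


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
GROUPS = {
    TY: frozenset({"Winner", "Place"}),
    W: frozenset({"Winner Date of Birth"}),
}
```

The result contains {Tournament, Year, Winner, Place} keyed by {Tournament, Year} and {Winner, Winner Date of Birth} keyed by {Winner}, matching the two relations above.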

17

Formalization Strategies

• Never mention the relational semantics.

• Attributes are just elements of a type (equipped with equality).

• A functional dependency is a pair of sequences of attributes.

• Derivations based on Armstrong’s laws are defined inductively.
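To illustrate what “derivations defined inductively” means, the sketch below represents a derivation as a tree with one node type per Armstrong law and checks it recursively. This is a Python analogue with runtime checks, not the Coq definition. It uses frozensets where the formalization uses sequences, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    """Use a functional dependency from the given set."""
    fd: tuple


@dataclass(frozen=True)
class Refl:
    """Reflexivity: if rhs is a subset of lhs, then lhs -> rhs."""
    lhs: frozenset
    rhs: frozenset


@dataclass(frozen=True)
class Aug:
    """Augmentation: from X -> Y and z a subset of w, derive X | w -> Y | z."""
    premise: object
    w: frozenset
    z: frozenset


@dataclass(frozen=True)
class Trans:
    """Transitivity: from X -> Y and Y -> Z, derive X -> Z."""
    first: object
    second: object


def conclusion(d, fds):
    """Return the (lhs, rhs) pair proved by derivation d; raise ValueError on a misused rule."""
    if isinstance(d, Hyp):
        if d.fd not in fds:
            raise ValueError("not one of the given dependencies")
        return d.fd
    if isinstance(d, Refl):
        if not d.rhs <= d.lhs:
            raise ValueError("reflexivity needs rhs to be a subset of lhs")
        return (d.lhs, d.rhs)
    if isinstance(d, Aug):
        if not d.z <= d.w:
            raise ValueError("augmentation needs z to be a subset of w")
        x, y = conclusion(d.premise, fds)
        return (x | d.w, y | d.z)
    if isinstance(d, Trans):
        x, y = conclusion(d.first, fds)
        y2, z = conclusion(d.second, fds)
        if y != y2:
            raise ValueError("transitivity needs matching middle sets")
        return (x, z)
    raise TypeError(f"unknown rule: {d!r}")
```

For example, chaining `Hyp` nodes for {Tournament, Year} → {Winner} and {Winner} → {Winner Date of Birth} under `Trans` proves {Tournament, Year} → {Winner Date of Birth}; swapping the two premises is rejected.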

18

Termination of algorithms (Coq computes only total, terminating functions)

• Termination of the closure (under Armstrong’s laws)
– Sizes converge because they are increasing and bounded.
– When the sizes converge, the closure converges.

• Termination of Bernstein’s algorithm 1
– This is easier, because every step is a simplification in some respect.
– Repeat simplifying something until it cannot be simplified further.
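The closure argument can be mirrored by an explicit iteration bound: each round either adds an attribute or reaches a fixed point, and the set can never exceed the universe of attributes. This Python sketch makes that bound visible, in the spirit of a total function. It is an analogue of the argument, not the Coq proof, and `step` and `closure_bounded` are hypothetical names.

```python
def step(attrs, fds):
    """One round: add the rhs of every FD whose lhs is already contained in attrs."""
    out = set(attrs)
    for lhs, rhs in fds:
        if lhs <= attrs:
            out |= rhs
    return frozenset(out)


def closure_bounded(attrs, fds, universe):
    """Iterate `step` at most len(universe) + 1 times.

    Assumes attrs and every FD mention only attributes in `universe`. The sets
    only grow and never exceed `universe`, so each round either adds an
    attribute or reaches the fixed point; the bound is never exhausted.
    """
    current = frozenset(attrs)
    for _ in range(len(universe) + 1):
        nxt = step(current, fds)
        if nxt == current:
            return current
        current = nxt
    raise AssertionError("unreachable under the stated assumptions")


UNIVERSE = frozenset({"Tournament", "Year", "Winner", "Winner Date of Birth", "Place"})
FDS = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Place"})),
]
```

Starting from {Tournament, Year}, two rounds add Winner and Place, then Winner Date of Birth, and the third round detects the fixed point.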

19

Proving Preservation Properties

• Each step preserves the closure of the functional dependencies!

• This property holds without exception, so it is very easy to formalize and prove (straightforward divide and conquer).
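On concrete inputs, “preserves the closure” can be tested by checking that the sets before and after a step imply each other’s dependencies. This is a test on examples, not a proof like the Coq development, and the example sets are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implied(fds, fd):
    """Does fds imply the single dependency fd?"""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)


def equivalent(f, g):
    """F and G have the same closure iff each implies every dependency of the other."""
    return all(implied(f, fd) for fd in g) and all(implied(g, fd) for fd in f)


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
BEFORE = [(TY, W), (W, DOB), (TY, DOB)]   # with a redundant dependency
AFTER = [(TY, W), (W, DOB)]               # after removing it (step 2)
```

`equivalent(BEFORE, AFTER)` holds, whereas dropping {Winner} → {Winner Date of Birth} as well would change the closure.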

20

Proving 3NF

• Mostly followed the text (at first I omitted step 1, and the proof attempt failed).

• Changed it a little to make formalization easier.

• Some proof steps are not yet entirely understood; refactoring should bring enlightenment.

21

Some changes to Bernstein’s original proof

Removed this graphical reasoning

“If there exists a (graphical) derivation using a functional dependency g,”

The root cause of such graphical objects

“If all (graphical) derivations use a functional dependency g,”

A reformulation

22

Amount of code

Parts | Lines of code | Comments
Properties of Armstrong’s laws and the closure operation | ~600 | Took ~100 lines to prove that a monotonic, bounded sequence of natural numbers converges.
Definition of the steps; the steps keep closures; when the steps terminate, certain things are removed entirely | ~700 | Somewhat boilerplate.
The whole algorithm produces 3NF | ~200 | A very involved, monolithic proof.

23

Still to be seen: Bernstein’s algorithm 2

• The number of relations produced by Bernstein’s algorithm 1 is not optimal.

• Bernstein’s algorithm 2 gives an optimal (= smallest) number of relations, answering Codd’s challenge.

• We just formalized algorithm 2.

• And multivalued dependencies, and the fourth and fifth normal forms.
