24
Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University [email protected] October 28, 2011 06/16/22 1

Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University [email protected]

Embed Size (px)

Citation preview

Page 1: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Discovering Functional Dependencies in Relational Databases Using Data Mining

Techniques

Abrar Fawaz AlAbed-AlHaq

Kent State University

[email protected]

October 28, 2011

04/19/23 1

Page 2: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering Functional Dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 2

Page 3: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

IntroductionNowadays there is a fast growing amount of data that are collected

and stored in large databases. As a result, the databases may contain redundant or inconsistent data.

An important concept in relational schema design is that of a functional dependency. Functional dependencies (FD) plays key role in the design of relational databases, FD is a property of the semantic or meaning of the attributes. It helps in simplifying the structure of databases. Discovering functional dependencies (FDs) from an existing relational instance is an important technique in data mining and database design.

04/19/23 3

Page 4: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Introduction (cont.)

Functional dependencies are relationships between attributes of a database relation; a functional dependency states that the value of an attribute is uniquely determined by the value of some other attributes.

04/19/23 4

Page 5: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering Functional Dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 5

Page 6: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Problem Statement

The problem is to find all functional dependencies among attribute in a relation database. The early methods for discovering functional dependencies is based on repeatedly sorting and comparing tuples to determine whether or not these tuples meet FD definition, so this approach does not utilize the discovered FDs as knowledge to obtain new knowledge. As a result this approach is highly sensitive to the number of tuples and attributes, it is not practical for large database. Using FD_Mine and TANE algorithms, by using these algorithms, there is no need to sort on any attribute or compare any value.

04/19/23 6

Page 7: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering Functional Dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 7

Page 8: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Data Mining Techniques

Data mining is the process of producing useful knowledge and information, it uses an analysis tools to discover patterns and a relationships in large data that could be use. We can benefit from data mining algorithm for discovering functional dependencies in large databases to get useful information. Data mining have several techniques that we can use it for discovering functional dependencies such as Apriori algorithm.

We have two algorithms here:FD_MineTANE

04/19/23 8

Page 9: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

The FD_Mine Algorithm

The mechanism of the FD_Mine is as follow. It uses a level-wise search, where results from level k are used to explore level k +1. First, at level 1, all FDs X →Y where X and Y are single attributes are found and stored in FD_SET F1. The set of candidates that are considered at this level is denoted L1. F1 and L1 are used to generate the candidates Xi Xj of L2. At level 2, all FDs of the form Xi Xj → Y are found and stored in FD_SET F2, F1, F2, L1 and L2 are used to generate the candidates of L3, and so on, until the candidates at level Ln-1 have been checked or no candidates remain.

04/19/23 9

Page 10: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

The TANE Algorithm

TANE searches the set containment lattice in a levelwise manner. A level LI is the collection of attributes sets of size I. TANE starts with L1 = {{A} | A R}, and computes L2 from L1, L3 from L2 and so on according to the information obtained during the algorithm. To computing partitions; in the beginning, partitions with respect to the singleton attribute sets are computed straight from relation r. A partition ∏{A}, where ∏ denoted partition, is computed from the column r[A] as follows. First, the values of the column are replaced with integers 1, 2, 3… so the same values are replaced by same integers and different values with different integers. Then the value t[A] is the identifier of the equivalence class [t]{A} of ∏{A}, and ∏ {A} is then easy to construct.

04/19/23 10

Page 11: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering Functional Dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 11

Page 12: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Discovering functional dependencies: Example

04/19/23 12

Suppose that FD_Mine is applied to database D, as shown in Table, with R = {A, B, C, D, E}.

A B C D E

t1 0 0 0 2 0

t2 0 1 0 2 0

t3 0 2 0 2 2

t4 0 3 1 2 0

t5 4 1 1 1 4

t6 4 3 1 1 2

t7 0 0 1 2 0

Page 13: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Discovering functional dependencies: Example

04/19/23 13

Next table Summarizes the actions of FD_Mine. In iteration 1, since |ПA| = |ПAD| = 2, Closure’(A) is set to D, and A→D is deduced. In same way, D→A is discovered, so the equivalence A↔D is obtained. As a result, we only need to combine A, B, C, and E to generate the next level candidates {AB, AC, AE, BC, BE, CE}. At the same time, the nontrivial closure of each generated candidate is computed. For example, the closure’(AB)=closure’(A) U closure’(B) = {D} U φ = {D}. In iteration 2, for candidate AB, only AB→C and AB→E need to be checked, because R–{A, B}–Closure’(AB)={A, B, C, D, E}–{A, B}–{D}={C, E}. Since |ПAB| = |ПABE| = 6, then AB→E is obtained. In the same way, at this level, BE→A and CE→A are also discovered, so the equivalence AB↔BE is obtained. As a result, we only need to combine AB, AC, AE, BC, and CE to form the level 3 candidates, which are {ABC, ACE}. Since CE→A, ACE is pruned by pruning rule 4. Since AB→E, then ABC→E. Since A↔D, then ABC→D, so ABC is a key, and ABC is also pruned by pruning rule 2. No other candidate remains, so the algorithm halts.

Page 14: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Discovering functional dependencies: Example

Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada. 14

Page 15: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Discovering functional dependencies: Example

04/19/23 15

The result after removal is shown in table (b).

A B C D E

t1 0 0 0 2 0

t2 0 1 0 2 0

t3 0 2 0 2 2

t4 0 3 1 2 0

t5 4 1 1 1 4

t6 4 3 1 1 2

t7 0 0 1 2 0

A B C E

t1 0 0 0 0

t2 0 1 0 0

t3 0 2 0 2

t4 0 3 1 0

t5 4 1 1 4

t6 4 3 1 2

t7 0 0 1 0

(a) Before (b) After

Page 16: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering functional dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 16

Page 17: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Experimental ResultsFD_Mine was applied to fifteen datasets, obtained from the

UCI Machine Learning Repository and the results were compared to TANE. TANE was selected for comparison because it establishes the theoretical framework for the problem. For the dataset given in Table, Figure (a) shows the semi-lattice for FD_Mine, and Figure (b) shows that for TANE. Each node represents a combination of attributes. If an edge is shown between nodes X and XY, then X → Y needs to be checked. Hence, the number of edges is the number of FDs that need to be checked. Both semi-lattices shown in Figure(a) have fewer edges than the lattice. In addition, the semi-lattice for FD_Mine has fewer edges than that for TANE.

04/19/23 17

Page 18: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Experimental Results

18

Figure ( a ) FD_Mine Figure ( b ) TANE

Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.

Page 19: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Experimental Results

Table 5.1 compares the number of FDs that are checked on data by FD_Mine and TANE for 15 UCI datasets. Figure 5.2 shows more detailed results for the Imports-85 dataset. At levels 1 through 5, both algorithms check approximately the same number of FDs, but at levels 6 through 11, FD_Mine checks fewer FDs than TANE, because it prunes more unnecessary candidates than TANE by using the equivalences and FDs discovered at previous levels. For more results.

04/19/23 19

Page 20: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Experimental Results

20Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.

Page 21: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering functional dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 21

Page 22: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Conclusion

04/19/23 22

Discovering Functional Dependency (FD) is employed to extract the Functional Dependency from a datasets. This addresses two issues; one related to the discovering process of the FDs from the datasets and the second issues is to measure the discovering Functional Dependency (FDs) algorithm. 

The empirical comparison on 15 UCI datasets between FD_Mine and TANE shows FD_Mine examined fewer FDs than TANE, because it prunes the unnecessary candidates than TANE by using the equivalences and FDs discovered at previous levels. The result based on the experiments on 15 UCI datasets show that the FD_Mine algorithm can prune more candidates than TANE algorithm and it also show that FD_Mine can discovering FDs more than TANE algorithm.

Page 23: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

Outline

Introduction.

Problem Statement.

Data Mining Techniques.

Discovering functional dependencies: Example.

Experimental Results.

Conclusion.

References.

04/19/23 23

Page 24: Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu

References[1] Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional

Dependencies in a Database Using Equivalences, Canada.[2] Wyss, C., Giannella, C., and Robertson, E. (2001), FastFDs: A Heuristic-Driven,

Depth-First Algorithm for Mining Functional Dependencies from Relation Instances, U.S.A.

[3] Huhtala, Y., Karkkainen, J., Porkka, P., and Toivonen, H., (1999), TANE: An Efficient Algorithm for discovering Functional and Approximate Dependencies, Computing Journal, V.42, No.20, pp.100-107.

[4] Elmasri, R. and Navathe, S. (2004), Fundamentals of Database Systems, Addison Wesley, Fourth edition.

[5] Mannila, H. (2000), Theoretical Frameworks for Data Mining, SIGKDD Explorations, V.1, No.2, pp.30-32.

[6] Huhtala, Y., Karkkainen, J., Porkka, P., and Toivonen, H. (2000), Efficient Discovery of Functional and Approximate Dependencies Using Partitions, University of Helsinki, Finland.

Thank You

04/19/23 24