Are We Really Discovering “Interesting” Knowledge from Data?
Alex A. Freitas, University of Kent, UK

Page 1: Are We Really Discovering “Interesting” Knowledge from Data?

Are We Really Discovering “Interesting” Knowledge from Data?

Alex A. Freitas, University of Kent, UK

Page 2: Are We Really Discovering “Interesting” Knowledge from Data?

Outline of the Talk

Introduction to Knowledge Discovery
– Criteria to evaluate discovered patterns

Selecting “interesting” rules/patterns
– User-driven (subjective) approach
– Data-driven (objective) approach

Summary

Research Directions

Page 3: Are We Really Discovering “Interesting” Knowledge from Data?

The Knowledge Discovery Process – the “standard” definition

“Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”

(Fayyad et al. 1996)

Focus on the quality of discovered patterns
– independent of the data mining algorithm

This definition is often quoted, but rarely taken seriously

Page 4: Are We Really Discovering “Interesting” Knowledge from Data?

Criteria to Evaluate the “Interestingness” of Discovered Patterns

– useful
– novel, surprising
– comprehensible
– valid (accurate)

[Diagram: the amount of existing research increases from “useful” down to “valid (accurate)”, while the difficulty of measurement increases in the opposite direction.]

Page 5: Are We Really Discovering “Interesting” Knowledge from Data?

Main Machine Learning / Data Mining Advances in the Last Decade ??

Ensembles (boosting, bagging, etc.)
Support Vector Machines
Etc…

These methods aim at maximizing accuracy, but…
– they usually discover patterns that are not comprehensible

Page 6: Are We Really Discovering “Interesting” Knowledge from Data?

Motivation for comprehensibility

User’s interpretation and validation of discovered patterns

Gives the user confidence in the discovered patterns
– Improves actionability and usefulness

The importance of comprehensibility depends on the application
– e.g. pattern recognition vs. medical/scientific applications

Page 7: Are We Really Discovering “Interesting” Knowledge from Data?

Discovering comprehensible knowledge

Most comprehensible kinds of knowledge representation??
– Decision trees
– IF-THEN rules
– Bayesian nets
– Etc…

No major comparison of the comprehensibility of these representations (Pazzani, 2000)

Typical comprehensibility measure: size of the model
– a syntactic measure, which ignores semantics
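Model size is straightforward to compute for symbolic models. A minimal sketch, assuming scikit-learn is available (the dataset and tree settings are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Syntactic comprehensibility proxy: smaller trees are assumed easier
# to read. This counts structure only; it says nothing about the
# semantics of the splits, which is exactly the limitation noted above.
print("total nodes:", tree.tree_.node_count)
print("leaves (~ number of rules):", tree.get_n_leaves())
```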

Page 8: Are We Really Discovering “Interesting” Knowledge from Data?

Accuracy and comprehensibility do not imply novelty or surprisingness

(Brin et al. 1997) found 6,732 rules with the maximum ‘conviction’ value in Census data, e.g.:
– “five-year-olds don’t work”
– “unemployed residents don’t earn income from work”
– “men don’t give birth”

(Tsumoto 2000) found 29,050 rules, out of which only 220 (less than 1%) were considered interesting or surprising by the user

(Wong & Leung 2000) found rules with 40–60% confidence which were considered novel and more accurate than some of the doctors’ domain knowledge

Page 9: Are We Really Discovering “Interesting” Knowledge from Data?

The intersection between useful (actionable) and surprising patterns

(Silberschatz & Tuzhilin 1996) argue that there is a large intersection between these two notions

[Venn diagram: “actionable” and “surprising” patterns, with a large overlap]

The remainder of the talk focuses on the notions of novelty and surprisingness (unexpectedness)

Page 10: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting the Most Interesting Rules in a Post-Processing Step

Goal: select a subset of interesting rules out of all discovered rules, in order to show the user just the most interesting ones

User-driven, “subjective” approach
– Uses the domain knowledge, beliefs or preferences of the user

Data-driven, “objective” approach
– Based on statistical properties of the discovered patterns
– More generic: independent of the application domain and user

Page 11: Are We Really Discovering “Interesting” Knowledge from Data?

Data-driven vs user-driven rule interestingness measures

[Diagram: both approaches start from all discovered rules and output a set of selected rules. The data-driven measure filters the rules directly; the user-driven measure additionally takes the user’s beliefs and domain knowledge as input. Note: only the user-driven measure reflects real user interest.]

Page 12: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Interesting Association Rules based on User-Defined Templates

User-driven approach: the user specifies templates for interesting and non-interesting rules

Inclusive templates specify interesting rules
Restrictive templates specify non-interesting rules

A rule is considered interesting if it matches at least one inclusive template and matches no restrictive template

(Klemettinen et al. 1994)

Page 13: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Interesting Association Rules – an Example

Hierarchy of items and their categories:
– food: rice, fish, …, hot dog
– drink: coke, wine, …, water
– clothes: trousers, …, socks

Inclusive template: IF (food) THEN (drink)
Restrictive template: IF (hot dog) THEN (coke)

The rule IF (fish) THEN (wine) is interesting: it matches the inclusive template (fish is a food, wine is a drink) and matches no restrictive template
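A minimal sketch of this template-matching logic (the set-based encoding of rules and templates below is an illustrative simplification, not Klemettinen et al.’s exact formalism):

```python
# Item -> category mapping from the example hierarchy above.
CATEGORY = {
    "rice": "food", "fish": "food", "hot dog": "food",
    "coke": "drink", "wine": "drink", "water": "drink",
    "trousers": "clothes", "socks": "clothes",
}

def matches(rule, template):
    """A rule matches a template when every item on each side of the
    rule is covered, directly or via its category, by the same side
    of the template."""
    (r_ante, r_cons), (t_ante, t_cons) = rule, template
    def covers(items, allowed):
        return all(i in allowed or CATEGORY.get(i) in allowed
                   for i in items)
    return covers(r_ante, t_ante) and covers(r_cons, t_cons)

def interesting(rule, inclusive, restrictive):
    # At least one inclusive match and no restrictive match.
    return (any(matches(rule, t) for t in inclusive) and
            not any(matches(rule, t) for t in restrictive))

inclusive   = [({"food"}, {"drink"})]       # IF (food) THEN (drink)
restrictive = [({"hot dog"}, {"coke"})]     # IF (hot dog) THEN (coke)

print(interesting(({"fish"}, {"wine"}), inclusive, restrictive))     # True
print(interesting(({"hot dog"}, {"coke"}), inclusive, restrictive))  # False
```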

Page 14: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Surprising Rules based on General Impressions

User-driven approach: the user specifies her/his general impressions about the application domain

(Liu et al. 1997, 2000); (Romao et al. 2004)

Example: IF (salary = high) THEN (credit = good)
– different from “reasonably precise knowledge”, e.g. IF (salary > £35,500) …

Discovered rules are matched against the user’s general impressions in a post-processing step, in order to determine the most surprising (unexpected) rules

Page 15: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Surprising Rules based on General Impressions – cont.

Case I – rule with unexpected consequent
– A rule and a general impression have similar antecedents but different consequents

Case II – rule with unexpected antecedent
– A rule and a general impression have similar consequents, but the rule antecedent is different from the antecedents of all general impressions

In both cases the rule is considered unexpected

Page 16: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Surprising Rules based on General Impressions - example

GENERAL IMPRESSION:
IF (salary = high) AND (education_level = high) THEN (credit = good)

UNEXPECTED RULES:
IF (salary > £30,000) AND (education ≥ BSc) AND (mortgage = yes) THEN (credit = bad)
/* Case I – unexpected consequent */

IF (mortgage = no) AND (C/A_balance > £10,000) THEN (credit = good)
/* Case II – unexpected antecedent */
/* assuming no other general impression has a similar antecedent */
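A minimal sketch of the Case I / Case II test (the dict encoding and the 0.5 similarity threshold are illustrative assumptions; Liu et al. use a more refined matching scheme):

```python
def similarity(cond_a, cond_b):
    """Jaccard overlap of the attributes used in two antecedents."""
    keys_a, keys_b = set(cond_a), set(cond_b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)

def unexpectedness(rule, impressions, threshold=0.5):
    for gi in impressions:
        if similarity(rule["if"], gi["if"]) >= threshold:
            if rule["then"] != gi["then"]:
                return "Case I: unexpected consequent"
            return "conforming rule"
    # Antecedent similar to no impression: check consequent agreement.
    if any(rule["then"] == gi["then"] for gi in impressions):
        return "Case II: unexpected antecedent"
    return "unrelated rule"

gi = [{"if": {"salary": "high", "education_level": "high"},
       "then": ("credit", "good")}]

r1 = {"if": {"salary": "> £30,000", "education_level": ">= BSc",
             "mortgage": "yes"},
      "then": ("credit", "bad")}
r2 = {"if": {"mortgage": "no", "CA_balance": "> £10,000"},
      "then": ("credit", "good")}

print(unexpectedness(r1, gi))  # Case I: unexpected consequent
print(unexpectedness(r2, gi))  # Case II: unexpected antecedent
```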

Page 17: Are We Really Discovering “Interesting” Knowledge from Data?

Data-driven approaches for discovering interesting patterns

Two broad groups of approaches:

Approaches based on the statistical strength of patterns
– entirely data-driven, no user-driven aspect

Approaches based on the assumption that a certain kind of pattern is surprising to “any user” in general
– Still independent of the application domain and of the beliefs / domain knowledge of the target user
– So, mainly data-driven

Page 18: Are We Really Discovering “Interesting” Knowledge from Data?

Rule interestingness measures based on the statistical strength of the patterns

More than 50 measures have been called rule “interestingness” measures in the literature

For instance, using the rule notation IF (A) THEN (C):

Interest = |A ∧ C| − (|A| × |C|) / N   (Piatetsky-Shapiro, 1991)

where |X| is the number of examples satisfying X and N is the total number of examples
– Measures the deviation from statistical independence between A and C
– Measures the co-occurrence of A and C, not the implication A → C
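A minimal sketch of this measure over boolean records (the toy data are illustrative):

```python
def ps_interest(data, antecedent, consequent):
    """|A ∧ C| - (|A| * |C|) / N, where |X| counts records satisfying X."""
    n = len(data)
    a = sum(1 for rec in data if antecedent(rec))
    c = sum(1 for rec in data if consequent(rec))
    ac = sum(1 for rec in data if antecedent(rec) and consequent(rec))
    return ac - (a * c) / n

# Toy records: (bought_fish, bought_wine)
data = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]
print(ps_interest(data,
                  antecedent=lambda r: r[0] == 1,
                  consequent=lambda r: r[1] == 1))
# 2 - (3 * 3) / 6 = 0.5 -> co-occurrence beyond statistical independence
```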

The vast majority of these measures are based only on statistical strength or mathematical properties (Hilderman & Hamilton 2001), (Tan et al. 2002)

Are they correlated with real human interest ??

Page 19: Are We Really Discovering “Interesting” Knowledge from Data?

Evaluating the correlation between data-driven measures and real human interest

3-step methodology:
– Rank the discovered rules according to each of a number of data-driven rule interestingness measures
– Show (a subset of) the discovered rules to the user, who evaluates how interesting each rule is
– Measure the linear correlation between the ranking of each data-driven measure and real human interest

The rules are evaluated by an expert user, who also provided the dataset for the data miner
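A minimal sketch of the correlation step, assuming SciPy is available (the score vectors are illustrative):

```python
from scipy.stats import pearsonr

measure_scores = [0.91, 0.40, 0.75, 0.22, 0.58]  # one measure, 5 rules
user_interest  = [0.80, 0.30, 0.90, 0.10, 0.50]  # expert's ratings

# Linear correlation between the data-driven measure and human interest.
r, p_value = pearsonr(measure_scores, user_interest)
print(f"linear correlation: {r:.2f} (p = {p_value:.3f})")
# Repeat for each measure; a measure is promising if r is consistently high.
```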

Page 20: Are We Really Discovering “Interesting” Knowledge from Data?

First Study – (Ohsaki et al. 2004)

A single dataset (hepatitis), and a single user
39 data-driven rule interestingness measures

First experiment: 30 discovered rules
– 1 measure had correlation ≥ 0.4 (its correlation was 0.48)

Second experiment: 21 discovered rules
– 4 measures had correlation ≥ 0.4; highest correlation: 0.48

In total, out of 78 correlations, 5 (6.4%) were ≥ 0.4
NB: the paper also reports other performance measures

Page 21: Are We Really Discovering “Interesting” Knowledge from Data?

Second Study – (Carvalho et al. 2005)

8 datasets, one user per dataset
11 data-driven rule interestingness measures

Each data-driven measure was evaluated over 72 ⟨user, rule⟩ pairs (8 users × 9 rules/dataset)

88 correlation values (8 users × 11 measures)
– 31 correlation values (35%) were ≥ 0.6
– The correlations associated with each measure varied considerably across the 8 datasets/users

Page 22: Are We Really Discovering “Interesting” Knowledge from Data?

Data-driven approaches based on finding specific kinds of surprising patterns

Principle: a few special kinds of patterns tend to be surprising to most users, regardless of the “meaning” of the attributes in the pattern and of the application domain

Does not require domain knowledge or beliefs to be specified by the user

We discuss three approaches:
– Discovery of exception rules contradicting general rules
– Measuring the novelty of text-mined rules with WordNet
– Detection of Simpson’s paradox

Page 23: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Surprising Rules Which Are Exceptions of a General Rule

Data-driven approach: consider the following 2 rules
R1: IF Cond1 THEN Class1 (general rule)
R2: IF Cond1 AND Cond2 THEN Class2 (exception rule)
where Cond1 and Cond2 are conjunctions of conditions

Rule R2 is a specialization of rule R1
Rules R1 and R2 predict different classes

An exception rule is interesting if both the rule and its general rule have a high predictive accuracy
(Suzuki & Kodratoff 1998), (Suzuki 2004)

Page 24: Are We Really Discovering “Interesting” Knowledge from Data?

Selecting Surprising Exception Rules – an Example

Mining data about car accidents – (Suzuki 2004)

IF (used_seat_belt = yes) THEN (injury = no) (general rule)

IF (used_seat_belt = yes) AND (passenger = child) THEN (injury = yes) (exception rule)

If both rules are accurate, then the exception rule would be considered interesting (it is both accurate and surprising)
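A minimal sketch of the rule-pair test (the toy accident records and the 0.8 accuracy threshold are illustrative):

```python
# Flag an exception rule as interesting only when both it and its
# general rule are accurate on the data (Suzuki-style rule pair).
records = ([{"seat_belt": "yes", "passenger": "adult", "injury": "no"}] * 8
         + [{"seat_belt": "yes", "passenger": "child", "injury": "yes"}] * 2)

def accuracy(data, cond, predicted_injury):
    """Fraction of records covered by cond whose class matches."""
    covered = [r for r in data if cond(r)]
    if not covered:
        return 0.0
    return sum(r["injury"] == predicted_injury
               for r in covered) / len(covered)

general = accuracy(records, lambda r: r["seat_belt"] == "yes", "no")
exception = accuracy(records,
                     lambda r: r["seat_belt"] == "yes"
                               and r["passenger"] == "child", "yes")

THRESHOLD = 0.8  # illustrative accuracy requirement
if general >= THRESHOLD and exception >= THRESHOLD:
    print(f"interesting exception: general={general:.2f}, "
          f"exception={exception:.2f}")
```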

Page 25: Are We Really Discovering “Interesting” Knowledge from Data?

Measuring Novelty of Text-Mined Rules with WordNet (Basu et al. 2001)

Motivation: avoid the discovery of trivial text-mined rules such as “SQL database”

WordNet is an online lexical knowledge-base of about 130,000 English words and some semantic relationships between them

Rule format: IF A (a set of words) THEN C (a set of words)

Novelty measure based on the semantic distance between each pair of words in A and C
– Distance based on the number of edges in the shortest path between the two words in the WordNet network
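A minimal sketch using NLTK’s WordNet interface (NLTK is an assumption for illustration; it is not what Basu et al. used):

```python
from itertools import product
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def word_distance(w1, w2):
    """Shortest path length (in edges) over all sense pairs of w1, w2."""
    dists = [s1.shortest_path_distance(s2)
             for s1, s2 in product(wn.synsets(w1), wn.synsets(w2))]
    dists = [d for d in dists if d is not None]
    return min(dists) if dists else None

def rule_novelty(antecedent_words, consequent_words):
    """Average distance between each (A-word, C-word) pair; a larger
    average distance suggests a less obvious, more novel rule."""
    dists = [word_distance(a, c)
             for a, c in product(antecedent_words, consequent_words)]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None

# A trivial pairing should score lower than a semantically distant one:
print(rule_novelty(["database"], ["software"]))
print(rule_novelty(["database"], ["wine"]))
```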

Page 26: Are We Really Discovering “Interesting” Knowledge from Data?

Simpson’s Paradox – an example

Income and taxes in the USA (in 10⁹ US$):

Year    Income     Tax       Rate (tax/income)
1974    880.18     123.69    0.141
1978    1242.40    188.58    0.152

From 1974 to 1978, the overall tax rate (tax / income) increased

Page 27: Are We Really Discovering “Interesting” Knowledge from Data?

Simpson’s Paradox – an example (cont.)

Income category (in 10³ US$):

Year            <5      5..10   10..15   15..100   >100    Total
1974   income   41.7    146.4   192.7    470.0     29.4    880.18
       tax      2.2     13.7    21.5     75.0      11.3    123.69
       rate     0.054   0.093   0.111    0.160     0.384   0.141
1978   income   19.9    122.9   171.9    865.0     62.81   1242.40
       tax      0.7     8.8     17.2     137.9     24.05   188.58
       rate     0.035   0.072   0.100    0.159     0.383   0.152

Overall the rate increased, but it decreased in each category!

Page 28: Are We Really Discovering “Interesting” Knowledge from Data?

Discovering Instances of Simpson’s Paradox as Surprising Patterns

Simpson’s paradox occurs when one value of attribute A (year) increases the probability of an event X (paying tax) in a given population but, at the same time, decreases the probability of X in every subpopulation defined by attribute B (income category)

Although Simpson’s paradox is well known to statisticians, occurrences of the paradox are surprising to users

Algorithms that find all instances of the paradox in the data and rank them in decreasing order of surprisingness:
– (Fabris & Freitas 1999, 2005) – algorithms for “flat”, single-relation data and for hierarchical multidimensional (OLAP) data

Page 29: Are We Really Discovering “Interesting” Knowledge from Data?

Summary

Data mining algorithms can easily discover far more rules/patterns than the user can inspect – there is a clear motivation for selecting the most interesting rules/patterns

The main challenge is to discover novel, useful patterns, going beyond accuracy and comprehensibility

User-driven and data-driven approaches have complementary advantages and disadvantages
– Using a hybrid approach seems sensible

Discovering patterns that are truly interesting to the user, without using a lot of user-specified domain knowledge, is an open problem

Page 30: Are We Really Discovering “Interesting” Knowledge from Data?

Research Direction – improving the effectiveness of data-driven measures

Estimate the user’s interest in a rule with an ensemble of data-driven rule interestingness measures int₁ … intₘ:

               Training set                     Test set
         int₁   …   intₘ   user’s int     int₁   …   intₘ   user’s int
Rule 1   0.9        0.4    high           0.2        0.5    ?
  ⋮       ⋮         ⋮       ⋮              ⋮         ⋮       ⋮
Rule n   0.3        0.8    low            0.9        0.2    ?

The mapping from (int₁ … intₘ) to the predicted user’s interest can be a predefined function (e.g. voting) or learned

In (Abe et al. 2005) a classification algorithm learns this mapping
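A minimal sketch of the learned mapping, assuming scikit-learn is available (the classifier choice and the toy values are illustrative; Abe et al.’s setup differs):

```python
from sklearn.ensemble import RandomForestClassifier

# Rows = rules, columns = data-driven measures int_1 ... int_m.
X_train = [[0.9, 0.4], [0.3, 0.8], [0.7, 0.6], [0.1, 0.2]]
y_train = ["high", "low", "high", "low"]  # user's evaluated interest

X_test = [[0.2, 0.5], [0.9, 0.2]]  # new rules, user's interest unknown

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict(X_test))  # predicted user interest for the new rules
```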

Page 31: Are We Really Discovering “Interesting” Knowledge from Data?

Research Direction – increasing the autonomy of the user-driven approach

The user-driven approach is very user-dependent: it requires a “prior model” (e.g. general impressions) from the user

Automatic learning of general impressions with text mining:

[Diagram: data → data mining → patterns, with Web → text mining → general impressions feeding into the data mining step]

Page 32: Are We Really Discovering “Interesting” Knowledge from Data?

Research Direction – a broader role for interestingness measures

Think about interestingness (surprisingness, usefulness) from the start of the KDD process:

[Diagram: pre-processing → data mining → post-processing, with interestingness measures feeding into all three stages]

E.g. attribute selection and construction trying to maximize not only accuracy and comprehensibility, but also surprisingness and/or usefulness

Page 33: Are We Really Discovering “Interesting” Knowledge from Data?

Any Questions ??

Thanks for listening!

Page 34: Are We Really Discovering “Interesting” Knowledge from Data?

References

(Abe et al. 2005) H. Abe, S. Tsumoto, M. Ohsaki, T. Yamaguchi. A rule evaluation support method with learning models based on objective rule evaluation indices. To appear in 2005 IEEE Int. Conf. on Data Mining.

(Basu et al. 2001) S. Basu, R.J. Mooney, K.V. Pasupuleti, J. Ghosh. Evaluating the novelty of text-mined rules using lexical knowledge. Proc. KDD-2001, 233-238.

(Brin et al. 1997) S. Brin, R. Motwani, J.D. Ullman, S. Tsur. Dynamic itemset counting and implication rules for market basket data. Proc. KDD-97.

(Carvalho et al. 2005) D.R. Carvalho, A.A. Freitas, N.F. Ebecken. Evaluating the correlation between objective rule interestingness measures and real human interest. To appear in Proc. PKDD-2005.

Page 35: Are We Really Discovering “Interesting” Knowledge from Data?

References (cont.)

(Fabris & Freitas1999) C.C. Fabris and A.A. Freitas. Discovering surprising patterns by detecting occurrences of Simpson's paradox. In: Research and Development in Intelligent Systems XVI, 148-160. Springer

(Fabris & Freitas 2005) C.C. Fabris and A.A. Freitas. Discovering surprising instances of Simpson’s paradox in hierarchical multidimensional data. To appear in Int. J. of Data Warehousing and Mining.

(Fayyad et al. 1996) U. Fayyad, G. Piatetsky-Shapiro, P. Smyth. From data mining to knowledge discovery: an overview. In: Advances in Knowledge Discovery and Data Mining, 1-34. AAAI Press.

(Hilderman & Hamilton 2001) R.J. Hilderman and H.J. Hamilton. Knowledge Discovery and Measures of Interest, Kluwer.

Page 36: Are We Really Discovering “Interesting” Knowledge from Data?

References (cont.)

(Klemettinen et al. 1994) M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. Proc. 3rd Int. Conf. on Information and Knowledge Management, 401-407.

(Liu et al. 1997) B. Liu, W. Hsu, S. Chen. Using general impressions to analyze discovered classification rules. Proc. KDD-97, 31-36

(Liu et al. 1999) B. Liu, W. Hsu, L-F. Mun, H-Y. Lee. Finding interesting patterns using user expectations. IEEE Trans. on Knowledge and Data Engineering 11(6), 817-832.

(Ohsaki et al. 2004) M. Ohsaki, S. Kitaguchi, K. Okamoto, H. Yokoi, T. Yamaguchi. Evaluation of rule interestingness measures with a clinical dataset on hepatitis. Proc. PKDD-2004, 362-373.

Page 37: Are We Really Discovering “Interesting” Knowledge from Data?

References (cont.)

(Pazzani 2000) M.J. Pazzani. Knowledge discovery from data? IEEE Intelligent Systems, Mar/Apr. 2000, pp. 10-13.

(Piatetsky-Shapiro 1991) G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. In: Knowledge Discovery in Databases, 229-248. AAAI/MIT Press.

(Romao et al. 2004) W. Romao, A.A. Freitas, I.M.S. Gimenes. Discovering Interesting Knowledge from a Science & Technology Database with a Genetic Algorithm. Applied Soft Computing 4, 121-137.

(Silberchatz & Tuzhilin 1996) S. Silberchatz and A. Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Trans. Knowledge and Data Engineering, 8(6).

(Suzuki & Kodratoff 1998) E. Suzuki and Y. Kodratoff. Discovery of surprising exception rules based on intensity of implication. Proc. PKDD-98.

Page 38: Are We Really Discovering “Interesting” Knowledge from Data?

References (cont.)

(Suzuki 2004) E. Suzuki. Discovering interesting exception rules with rule pair. Proc. Workshop on Advances in Inductive Rule Learning at PKDD-2004, 163-178.

(Tan et al. 2002) P-N. Tan, V. Kumar, J. Srivastava. Selecting the right interestingness measure for association patterns. Proc. KDD-2002.

(Tsumoto 2000) S. Tsumoto. Clinical knowledge discovery in hospital information systems: two case studies. Proc. PKDD-2000, 652-656.

(Wong & Leung 2000) M.L. Wong and K.S. Leung. Data mining using grammar based genetic programming and applications. Kluwer.