Are We Really Discovering “Interesting” Knowledge from Data?


  • Are We Really Discovering “Interesting” Knowledge from Data?
    Alex A. Freitas, University of Kent, UK

  • Outline of the Talk
    - Introduction to Knowledge Discovery
    - Criteria to evaluate discovered patterns
    - Selecting “interesting” rules/patterns
      - User-driven (subjective) approach
      - Data-driven (objective) approach
    - Summary
    - Research Directions

  • The Knowledge Discovery Process – the standard definition
    “Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad et al. 1996)
    - Focus on the quality of discovered patterns, independent of the data mining algorithm
    - This definition is often quoted, but rarely taken seriously in practice

  • Criteria to Evaluate the Interestingness of Discovered Patterns
    (listed from least to most researched, and from hardest to easiest to measure)
    - useful
    - novel, surprising
    - comprehensible
    - valid (accurate)

  • Main Machine Learning / Data Mining Advances in the Last Decade?
    - Ensembles (boosting, bagging, etc.)
    - Support Vector Machines
    - Etc.

    These methods aim at maximizing accuracy, but they usually discover patterns that are not comprehensible

  • Motivation for comprehensibility
    - User's interpretation and validation of discovered patterns
    - Gives the user confidence in the discovered patterns
    - Improves actionability, usefulness
    - Importance of comprehensibility depends on the application
      (pattern recognition vs. medical/scientific applications)

  • Discovering comprehensible knowledge
    - Most comprehensible kinds of knowledge representation?
      - Decision trees
      - IF-THEN rules
      - Bayesian nets
      - Etc.
    - No major comparison of the comprehensibility of these representations (Pazzani, 2000)
    - Typical comprehensibility measure: size of the model
      - a syntactical measure, which ignores semantics

  • Accuracy and comprehensibility do not imply novelty or surprisingness
    - (Brin et al. 1997) found 6,732 rules with a maximum conviction value in Census data, e.g.:
      - five-year-olds don't work
      - unemployed residents don't earn income from work
      - men don't give birth
    - (Tsumoto 2000) found 29,050 rules, out of which only 220 (less than 1%) were considered interesting or surprising by the user
    - (Wong & Leung 2000) found rules with 40–60% confidence which were considered novel and more accurate than some doctors' domain knowledge

  • The intersection between useful (actionable) and surprising patterns
    (Silberschatz & Tuzhilin 1996) argue that there is a large intersection between these two notions

    [Venn diagram: overlapping sets of “actionable” and “surprising” patterns]

    The remainder of the talk focuses on the notions of novelty or surprisingness (unexpectedness)

  • Selecting the Most Interesting Rules in a Post-Processing Step
    - Goal: select a subset of interesting rules out of all discovered rules, in order to show the user just the most interesting ones
    - User-driven, subjective approach
      - uses the domain knowledge, beliefs or preferences of the user
    - Data-driven, objective approach
      - based on statistical properties of discovered patterns
      - more generic: independent of the application domain and user

  • Data-driven vs user-driven rule interestingness measures

    [Diagram: two filtering pipelines. Left: “all rules” → data-driven measure → “selected rules” → user. Right: “all rules” → user-driven measure (informed by the user's beliefs and domain knowledge) → “selected rules” → user. Note: user-driven measures better reflect real user interest.]

  • Selecting Interesting Association Rules based on User-Defined Templates
    - User-driven approach: the user specifies templates for interesting and non-interesting rules
      - Inclusive templates specify interesting rules
      - Restrictive templates specify non-interesting rules
    - A rule is considered interesting if it matches at least one inclusive template and matches no restrictive template (Klemettinen et al. 1994)

  • Selecting Interesting Association Rules – an Example
    Hierarchies of items and their categories:

      food:    rice, fish, hot dog
      drink:   coke, wine, water
      clothes: trousers, socks

    Inclusive template: IF (food) THEN (drink)
    Restrictive template: IF (hot dog) THEN (coke)

    Rule “IF (fish) THEN (wine)” is interesting
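The template check above can be sketched in a few lines of code. This is an illustrative sketch, not the original implementation from (Klemettinen et al. 1994); the item hierarchy, function names, and one-item rule encoding are all simplifying assumptions.

```python
# Hypothetical sketch of template-based rule filtering.
# The hierarchy maps each item to its category (assumed, from the example slide).
HIERARCHY = {
    "rice": "food", "fish": "food", "hot dog": "food",
    "coke": "drink", "wine": "drink", "water": "drink",
    "trousers": "clothes", "socks": "clothes",
}

def matches(item, template_term):
    """An item matches a template term if it equals it or belongs to that category."""
    return item == template_term or HIERARCHY.get(item) == template_term

def is_interesting(rule, inclusive, restrictive):
    """rule = (antecedent_item, consequent_item); templates are (ante, cons) pairs.
    A rule is interesting if it matches some inclusive template and no restrictive one."""
    antecedent, consequent = rule
    hits_inclusive = any(matches(antecedent, a) and matches(consequent, c)
                         for a, c in inclusive)
    hits_restrictive = any(matches(antecedent, a) and matches(consequent, c)
                           for a, c in restrictive)
    return hits_inclusive and not hits_restrictive

inclusive = [("food", "drink")]      # IF (food) THEN (drink)
restrictive = [("hot dog", "coke")]  # IF (hot dog) THEN (coke)

print(is_interesting(("fish", "wine"), inclusive, restrictive))     # True
print(is_interesting(("hot dog", "coke"), inclusive, restrictive))  # False
```

A real implementation would handle rules with multi-item antecedents and multi-level hierarchies, but the matching logic is the same.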

  • Selecting Surprising Rules based on General Impressions
    - User-driven approach: the user specifies her/his general impressions about the application domain (Liu et al. 1997, 2000); (Romao et al. 2004)
      - Example: IF (salary = high) THEN (credit = good)
      - (different from reasonably precise knowledge, e.g. IF salary > 35,500)
    - Discovered rules are matched against the user's general impressions in a post-processing step, in order to determine the most surprising (unexpected) rules

  • Selecting Surprising Rules based on General Impressions (cont.)
    - Case I – rule with unexpected consequent: a rule and a general impression have similar antecedents but different consequents
    - Case II – rule with unexpected antecedent: a rule and a general impression have similar consequents, but the rule antecedent is different from the antecedent of all general impressions
    - In both cases the rule is considered unexpected

  • Selecting Surprising Rules based on General Impressions – example
    GENERAL IMPRESSION:
      IF (salary = high) AND (education_level = high) THEN (credit = good)

    UNEXPECTED RULES:
      IF (salary > 30,000) AND (education ≥ BSc) AND (mortgage = yes) THEN (credit = bad)
        /* Case I – unexpected consequent */
      IF (mortgage = no) AND (C/A_balance > 10,000) THEN (credit = good)
        /* Case II – unexpected antecedent */
        /* assuming no other general impression has a similar antecedent */
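The two cases can be made concrete with a toy matcher. This is an assumed sketch, not the algorithm of (Liu et al. 1997): rules and impressions are encoded as (attribute-set, consequent) pairs, and "similar antecedents" is crudely approximated as sharing at least one attribute.

```python
# Illustrative sketch of matching discovered rules against general impressions.
def similar(conds_a, conds_b):
    """Crude similarity test (an assumption): the two condition sets
    mention at least one common attribute."""
    return bool(set(conds_a) & set(conds_b))

def classify_rule(rule, impressions):
    """Return 'unexpected consequent' (Case I), 'unexpected antecedent' (Case II),
    or 'conforming'."""
    antecedent, consequent = rule
    for imp_ante, imp_cons in impressions:
        if similar(antecedent, imp_ante):
            if consequent != imp_cons:
                return "unexpected consequent"   # Case I: similar antecedent, different consequent
            return "conforming"
    # Antecedent matches no impression; Case II if some impression shares the consequent
    if any(consequent == imp_cons for _, imp_cons in impressions):
        return "unexpected antecedent"
    return "conforming"

# Encoding of the example slide (attribute names only, values dropped for brevity)
impressions = [({"salary", "education_level"}, "credit=good")]
rule1 = ({"salary", "education", "mortgage"}, "credit=bad")
rule2 = ({"mortgage", "C/A_balance"}, "credit=good")
print(classify_rule(rule1, impressions))  # unexpected consequent
print(classify_rule(rule2, impressions))  # unexpected antecedent
```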

  • Data-driven approaches for discovering interesting patterns
    Two broad groups of approaches:
    - Approaches based on the statistical strength of patterns
      - entirely data-driven, no user-driven aspect
    - Approaches based on finding specific kinds of surprising patterns
      - based on the assumption that a certain kind of pattern is surprising to any user in general
      - but still independent of the application domain and of the beliefs / domain knowledge of the target user
      - so, mainly data-driven

  • Rule interestingness measures based on the statistical strength of the patterns
    - More than 50 measures have been called “rule interestingness measures” in the literature
    - For instance, using rule notation IF (A) THEN (C):
      Interest = |A ∧ C| − (|A| × |C|) / N   (Piatetsky-Shapiro, 1991)
      - measures deviation from statistical independence between A and C
      - measures co-occurrence of A and C, not the implication A → C
    - The vast majority of these measures are based only on statistical strength or mathematical properties (Hilderman & Hamilton 2001), (Tan et al. 2002)
    - Are they correlated with real human interest?
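Piatetsky-Shapiro's measure is simple to compute from frequency counts; a minimal sketch follows, with made-up transaction counts:

```python
# Piatetsky-Shapiro's rule interestingness measure on toy counts.
def ps_interest(n_ac, n_a, n_c, n):
    """Interest = |A and C| - |A|*|C|/N.
    Zero under statistical independence of A and C; positive when A and C
    co-occur more often than independence would predict."""
    return n_ac - (n_a * n_c) / n

# Under independence we would expect |A|*|C|/N = 40*50/100 = 20 co-occurrences.
print(ps_interest(n_ac=35, n_a=40, n_c=50, n=100))  # 15.0 -> positive association
print(ps_interest(n_ac=20, n_a=40, n_c=50, n=100))  # 0.0  -> independence
```

Note that the measure is symmetric in A and C, which is exactly why it captures co-occurrence rather than the direction of the implication A → C.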

  • Evaluating the correlation between data-driven measures and real human interest
    3-step methodology:
    1. Rank discovered rules according to each of a number of data-driven rule interestingness measures
    2. Show (a subset of) the discovered rules to the user, who evaluates how interesting each rule is
    3. Measure the linear correlation between the ranking of each data-driven measure and real human interest
    Rules are evaluated by an expert user, who also provided the dataset for the data miner
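Step 3 is an ordinary Pearson (linear) correlation. The sketch below computes it for one measure against one user's ratings; the scores and ratings are invented for illustration, not taken from either study.

```python
# Pearson correlation between a data-driven score and user interest ratings.
from math import sqrt

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

measure_scores = [0.9, 0.7, 0.6, 0.4, 0.2]  # a data-driven measure's scores for 5 rules (invented)
user_interest  = [5, 3, 4, 2, 1]            # the expert's interest ratings, 1-5 scale (invented)
print(round(pearson(measure_scores, user_interest), 2))  # 0.94
```

In the studies below this computation is repeated once per (measure, user) pair, which is what yields e.g. the 88 correlation values of the second study.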

  • First Study (Ohsaki et al. 2004)
    - A single dataset (hepatitis), and a single user
    - 39 data-driven rule interestingness measures
    - First experiment: 30 discovered rules
      - 1 measure had correlation ≥ 0.4; highest correlation: 0.48
    - Second experiment: 21 discovered rules
      - 4 measures had correlation ≥ 0.4; highest correlation: 0.48
    - In total, out of 78 correlations, 5 (6.4%) were ≥ 0.4
    - NB: the paper also reports other performance measures

  • Second Study (Carvalho et al. 2005)
    - 8 datasets, one user for each dataset
    - 11 data-driven rule interestingness measures
    - Each data-driven measure was evaluated over 72 pairs (8 users × 9 rules/dataset)
    - 88 correlation values (8 users × 11 measures)
      - 31 correlation values (35%) were ≥ 0.6
    - The correlations associated with each measure varied considerably across the 8 datasets/users

  • Data-driven approaches based on finding specific kinds of surprising patterns
    - Principle: a few special kinds of patterns tend to be surprising to most users, regardless of the meaning of the attributes in the pattern and the application domain
    - Does not require domain knowledge or beliefs specified by the user
    - We discuss three approaches:
      - discovery of exception rules contradicting general rules
      - measuring the novelty of text-mined rules with WordNet
      - detection of Simpson's paradox

  • Selecting Surprising Rules Which Are Exceptions of a General Rule
    - Data-driven approach: consider the following 2 rules:
      R1: IF Cond1 THEN Class1 (general rule)
      R2: IF Cond1 AND Cond2 THEN Class2 (exception)
      where Cond1, Cond2 are conjunctions of conditions
    - Rule R2 is a specialization of rule R1
    - Rules R1 and R2 predict different classes
    - An exception rule is interesting if both the rule and its general rule have a high predictive accuracy (Suzuki & Kodratoff 1998), (Suzuki 2004)
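The criterion can be sketched as a simple predicate over rule triples. This is an assumed encoding, not the actual algorithm of (Suzuki & Kodratoff 1998); the 0.8 accuracy threshold and the rule representation are illustrative choices.

```python
# Toy check of the exception-rule criterion: a specialization that predicts a
# different class is flagged as interesting when both rules are accurate.
def is_interesting_exception(general, exception, min_acc=0.8):
    """Rules are (conditions, predicted_class, accuracy) triples.
    min_acc is an assumed 'high accuracy' threshold."""
    g_conds, g_class, g_acc = general
    e_conds, e_class, e_acc = exception
    specializes = set(g_conds) < set(e_conds)  # strict superset of conditions: Cond1 AND Cond2
    different_class = e_class != g_class
    both_accurate = g_acc >= min_acc and e_acc >= min_acc
    return specializes and different_class and both_accurate

# Car-accident example from the next slide; the accuracy values are invented.
general = ({"used_seat_belt=yes"}, "injury=no", 0.95)
exception = ({"used_seat_belt=yes", "passenger=child"}, "injury=yes", 0.90)
print(is_interesting_exception(general, exception))  # True
```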

  • Selecting Surprising Exception Rules – an Example
    Mining data about car accidents (Suzuki 2004):
      IF (used_seat_belt = yes) THEN (injury = no)   (general rule)
      IF (used_seat_belt = yes) AND (passenger = child) THEN (injury = yes)   (exception rule)
    If both rules are accurate, then the exception rule would be