Using a Background Neural Model in a Digital Library

Jean-Charles LAMIREL, Jieh HSIANG, Liu WJ
LORIA, Nancy, France
Jean-Charles Lamirel [email protected], INRIA-NSC 3rd SFW


Page 1: Using a Background Neural Model in a Digital Library

Jean-Charles LAMIREL, Jieh HSIANG, Liu WJ

LORIA, Nancy, France

Page 2: The CORTEX team

Research areas: Biological-like models for intelligent information management

Applications:
• Autonomous robotics and on-board intelligence
• Numerical (vs. symbolic) classification
• Information retrieval and discovery

Page 3: The CORTEX information retrieval and discovery activity

Main themes of research:
• Interface for personalized access to information
• Intelligent multimedia data mining
• Web / documentary database interaction

Collaborations: ORPAILLEUR INRIA team, INIST, La Villette, NSC Taiwan, industry...
European projects: SCHOLNET, EISCTES

Page 4: Some examples of application

• Adaptive environment for assistance to investigation on the Web
• Multi-topographic navigation (MultiSOM):
  - for multimedia data mining
  - for data mining on full text (patents)
• Numerical-symbolic collaboration

Page 5: Presentation summary

• Introduction:
  - Basic set of functionalities for information discovery
  - Limitations of the classical methods for information discovery
• The MultiSOM model + Butterfly application:
  - Basic behaviour
  - Extensions
• Management of textual information

Page 6: Basic set of functionalities for information discovery

Synthetic view of the studied domain = distribution of the thematic indicators of the domain:
• Highlighting of regularities / weak signals
• Management of several types of synthesis

Interactivity = dynamic data mixture / type of need:
• Choice of the meta-orientation of the investigation
• Setting of the granularity level of the analysis

Multimedia

Page 7: Managing different kinds of queries for discovery

• Exploratory (no goal): « What is the content of the database? »
• Thematic (general orientation): « Images of space conquest »
• Connotative (hidden goal, indirect search): « Impressive images of human technology »
• Precise: « Images of Armstrong's moonwalk, July 69 »

Page 8: Limitations of the classical methods for information discovery

• Overall view of the studied domain = noise, complex interpretation (hidden information)
• Local views necessarily independent
• Weak signals difficult to highlight
• No interactivity = passive classification, predefined ways to access information

Page 9: Neural methods for information cartography

Topographic learning (SOM) = classification + projection
• Multi-viewpoint modelling capabilities (MultiSOM)
• Intuitive self-organization of information
• Active maps (IR + navigation)
• Low human intervention during construction
• Multimedia capabilities

Page 10: Butterfly museum application

• Different kinds of query: query by keywords, query by example
• Different kinds of criteria: colour (automatic), shape (manual), texture (manual)
• Problems: hand-made classifications; combination of results coming from different criteria

[Figure annotation: Yellow = very strong, Red = not, Edge = strong, Spot = middle, …]

Page 11: Butterfly application automation

• Viewpoint classifications
• Global and/or cross-viewpoint classifications
• Butterfly application
• Query by keywords
• Query by example
• Adding new individuals
• User interface
• Combination of results
• Validation of insertion or classification recalculation

Page 12: Basic topographic map building

Data description:
• Document (image) = index vector, e.g. a vector of characteristics
• Weighting of the characteristic modalities (very strong = 1, …)
• Optional IDF weighting (weak signal detection)

[Figure: weighted description of an image (texture), with IDF weighting]
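
The data description above can be sketched as follows. This is a minimal sketch, not the authors' code: the modality-to-weight table is a hypothetical assumption (the slide only fixes very strong = 1), and the IDF variant shown is the classic log(N / df).

```python
import math

# Hypothetical modality weights: the slide only fixes "very strong = 1";
# the other values are illustrative assumptions.
MODALITY_WEIGHTS = {"very strong": 1.0, "strong": 0.75, "middle": 0.5,
                    "weak": 0.25, "not": 0.0}


def weighted_vectors(descriptions, features):
    """Turn {feature: modality} descriptions into numeric index vectors."""
    return [[MODALITY_WEIGHTS[d.get(f, "not")] for f in features]
            for d in descriptions]


def idf_weights(vectors):
    """Classic IDF, log(N / df), where df counts documents with a non-zero value."""
    n_docs = len(vectors)
    idf = []
    for j in range(len(vectors[0])):
        df = sum(1 for v in vectors if v[j] > 0)
        idf.append(math.log(n_docs / df) if df else 0.0)
    return idf


def apply_idf(vectors, idf):
    """Multiply each component by its IDF factor."""
    return [[x * w for x, w in zip(v, idf)] for v in vectors]


features = ["yellow", "red", "edge", "spot"]
butterflies = [
    {"yellow": "very strong", "edge": "strong", "spot": "middle"},
    {"yellow": "strong", "red": "middle"},
    {"red": "very strong", "edge": "weak"},
]
vectors = weighted_vectors(butterflies, features)
idf = idf_weights(vectors)
weighted = apply_idf(vectors, idf)
```

Here "spot" occurs in only one of the three images, so it receives the largest IDF factor (log 3 versus log 1.5). Rare features are boosted, which is how optional IDF weighting can help surface weak signals.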

Page 13: Basic topographic map building

Map predefined parameter settings:
• Number of neurons
• Structure: e.g. 2D grid with square neighbourhood

Competitive learning:

Page 14: Competitive learning

• Selection of the winning neuron
• Influence on the neighbourhood

[Figure: current data (image) at time T]
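
Competitive learning can be illustrated with a minimal Kohonen map in plain Python. The sketch assumes the standard textbook SOM update (a Gaussian neighbourhood function with linearly decreasing learning rate and radius), which the slides do not spell out. The square neighbourhood is modelled with the Chebyshev grid distance; function names and parameter values are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def winner(protos, x):
    """Selection of the winning neuron: the prototype closest to x."""
    return min(protos, key=lambda n: sq_dist(protos[n], x))


def som_step(protos, x, lr, radius):
    """One competitive-learning step for the current datum x (radius > 0).

    Neurons whose grid (Chebyshev, i.e. square) distance to the winner is
    within `radius` move towards x, weighted by a Gaussian of that distance.
    """
    win = winner(protos, x)
    for node, w in protos.items():
        d = max(abs(node[0] - win[0]), abs(node[1] - win[1]))
        if d <= radius:
            h = math.exp(-(d * d) / (2 * radius * radius))
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return win


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Minimal SOM on a rows x cols grid; lr and radius shrink linearly."""
    rng = random.Random(seed)
    protos = {(r, c): [rng.random() for _ in data[0]]
              for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / total
            som_step(protos, x, lr0 * (1 - frac), max(radius0 * (1 - frac), 0.5))
            t += 1
    return protos
```

At the end of training, each image is classified by `winner(protos, x)`, i.e. by the neuron whose prototype (profile) is closest to its vector.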

Page 15: Map labelling and zoning

Map labelling:
• Based on the best components of the profiles
• Class-oriented or member-oriented
• One single method is not sufficient
=> Gives an overview of the detected themes

Map zoning:
• Based on the SOM topographic properties
• Based on the best components of the class profiles
=> Gives an overview of the weights of the themes
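
Labelling from the best components of the profiles might look like the sketch below, with a class-oriented variant (the neuron's own profile) and a member-oriented variant (summing the vectors of the class members). The function names and the `top_k` cut-off are illustrative assumptions, not the authors' method.

```python
def label_neurons(protos, feature_names, top_k=2, min_weight=0.0):
    """Class-oriented: label each neuron with the top_k components of its profile."""
    labels = {}
    for node, w in protos.items():
        ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
        labels[node] = [feature_names[j] for j in ranked[:top_k] if w[j] > min_weight]
    return labels


def label_from_members(members, feature_names, top_k=2):
    """Member-oriented: sum the members' vectors, then keep the strongest components."""
    totals = [sum(col) for col in zip(*members)]
    ranked = sorted(range(len(totals)), key=lambda j: totals[j], reverse=True)
    return [feature_names[j] for j in ranked[:top_k] if totals[j] > 0]


names = ["yellow", "red", "spot"]
labels = label_neurons({(0, 0): [0.9, 0.1, 0.6], (0, 1): [0.0, 0.8, 0.0]}, names)
# labels[(0, 0)] == ["yellow", "spot"], labels[(0, 1)] == ["red"]
```

The two variants can disagree on the same class, which illustrates why a single labelling method is not sufficient.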

Page 16: Multimedia thematic cartography of « Butterfly »

[Figure: color viewpoint map with the themes « YELLOW » and « GREEN »; panels: CENTRAL SUB., list of theme members, image description]

Page 17: The MultiSOM model

[Figure: maps for viewpoint 1 and viewpoint 2, each showing the basic map (core classification) and its on-line generalizations]

Page 18: Map on-line generalization

Goal: Synthesize the map contents by decreasing the number of neurons (classes)

Constraints:
• Preserve the map topographic properties
• No classification re-computation

Method: Exploitation of the neighbourhood relations on the map
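
The slides do not detail how neighbourhood relations are exploited, so the sketch below is only one hypothetical way to meet the two stated constraints. Neighbouring neurons are folded block-wise into a coarser grid, their profiles are averaged (weighted by class size) and their member lists are merged. No datum is reclassified, and adjacent blocks stay adjacent.

```python
def generalize(protos, members, factor=2):
    """Hypothetical on-line generalization: fold factor x factor blocks of
    neighbouring neurons into single neurons of a smaller grid.

    protos:  {(row, col): profile vector}
    members: {(row, col): [documents]} (missing keys mean empty classes)
    """
    groups = {}
    for node in protos:
        groups.setdefault((node[0] // factor, node[1] // factor), []).append(node)
    new_protos, new_members = {}, {}
    for block, nodes in groups.items():
        counts = [len(members.get(n, [])) for n in nodes]
        total = sum(counts)
        # Fall back to an unweighted mean when the whole block is empty.
        weights = [c / total for c in counts] if total else [1 / len(nodes)] * len(nodes)
        dim = len(protos[nodes[0]])
        new_protos[block] = [sum(wt * protos[n][k] for wt, n in zip(weights, nodes))
                             for k in range(dim)]
        # The union of member lists: documents keep their class, only coarser.
        new_members[block] = [m for n in nodes for m in members.get(n, [])]
    return new_protos, new_members
```

Because the coarse map is derived only from the existing prototypes and memberships, it can be recomputed instantly at any granularity, which fits the "on-line" requirement.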

Page 19: Map on-line generalization

Page 20

Page 21: Semantic viewpoints

• Subspace of the description space
• Can be a field, a subset of keywords, ...
• Possibly overlapping sets
• Concurrent or complementary viewpoints

=> Examples: indexer keywords, title keywords, authors, …, visual characteristics, sounds

=> Butterflies: color, shape, texture, …
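
Viewpoints as subspaces can be sketched by projecting each description onto the feature subset of each viewpoint; each projected dataset would then feed its own map. The viewpoint definitions below are hypothetical, loosely based on the butterfly features mentioned in the slides.

```python
# Hypothetical viewpoint definitions for the butterfly example: each viewpoint
# is a (possibly overlapping) subset of the description features.
VIEWPOINTS = {
    "color": ["yellow", "red", "green"],
    "texture": ["spot", "edge", "vein"],
}


def project(description, features):
    """Restrict a {feature: weight} description to one viewpoint subspace."""
    return [description.get(f, 0.0) for f in features]


def split_by_viewpoint(descriptions, viewpoints):
    """Build one dataset per viewpoint; each one would train its own map."""
    return {name: [project(d, feats) for d in descriptions]
            for name, feats in viewpoints.items()}


data = [{"yellow": 1.0, "spot": 0.5}, {"red": 0.75, "vein": 1.0}]
per_viewpoint = split_by_viewpoint(data, VIEWPOINTS)
```

The same document appears once in every viewpoint dataset, which is what later allows maps to communicate through shared data.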

Page 22: Inter-map communication

Goal:
• Cope with the limitations of a global map
• Allow communication between viewpoints

Constraints: Interpretable behaviour

Method: Re-projected data = transmitter neurons. Two steps:

1) Activation of a source map (directly or through a query)

2) Transmission to target maps

Page 23: Inter-map communication

Page 24: Inter-map communication

A function:

Two modes:
• Possibilistic (weak thematic relations across viewpoints)
• Probabilistic (measure of the themes' similarities)

=> g = class belonging degree
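
Since the transmission function itself is not legible here, the sketch below assumes a simple reading of the two modes, consistent with the annex (conditional possibility vs. weighted sum). Each shared document carries the activity of its source class, and a target class takes either the maximum (possibilistic) or the mean (probabilistic) of its members' activities. Names and data are hypothetical.

```python
def transmit(source_activity, source_of, target_members, mode="probabilistic"):
    """Propagate activity from a source map to a target map through shared data.

    source_activity: {source_class: activity in [0, 1]}
    source_of:       {document: its class on the source map}
    target_members:  {target_class: [documents]} on the target map
    """
    result = {}
    for cls, docs in target_members.items():
        acts = [source_activity.get(source_of[d], 0.0) for d in docs]
        if not acts:
            result[cls] = 0.0
        elif mode == "possibilistic":
            # One strongly activated member is enough: keeps weak thematic links.
            result[cls] = max(acts)
        else:
            # Share of the class that is activated: a similarity-style degree.
            result[cls] = sum(acts) / len(acts)
    return result


activity = {"yellow": 1.0, "green": 0.0}
source_of = {"b1": "yellow", "b2": "yellow", "b3": "green", "b4": "green"}
texture = {"spots": ["b1", "b2"], "edges": ["b1", "b3"], "veins": ["b3", "b4"]}
prob = transmit(activity, source_of, texture)
poss = transmit(activity, source_of, texture, mode="possibilistic")
```

Activating the « yellow » class of the color map lights up « spots » fully in both modes, while « edges » is fully active in possibilistic mode but only half active in probabilistic mode.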

Page 25: Activity coherency

[Figure: weak focalization vs. strong focalization]

Page 26: Inter-map communication

Question: Regularities in the textures of yellow butterflies?

Response: YES, spots and edges

[Figure: butterflies color map and texture map]

Page 27: Compliance with IR operations

Question: Are there butterflies with spots AND veins?

[Figure: two example answers, « Response = NO » and « Response = YES »]
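
An AND query such as "spots AND veins" can be sketched as an intersection of class memberships on the texture map or, on activity levels, as a fuzzy AND using `min`. Both functions and the toy memberships are hypothetical illustrations.

```python
def and_query(members, *classes):
    """Documents belonging to every requested class (boolean AND)."""
    sets = [set(members.get(c, [])) for c in classes]
    return set.intersection(*sets) if sets else set()


def fuzzy_and(activities, *classes):
    """Fuzzy AND over class activities (requires at least one class)."""
    return min(activities.get(c, 0.0) for c in classes)


members = {"spots": ["b1", "b2"], "veins": ["b3"], "edges": ["b1", "b3"]}
```

With these toy memberships, "spots AND veins" is empty (answer NO), while "spots AND edges" returns b1 (answer YES).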

Page 28: Remaining problems (to be solved)

• Validation of the automatic classification results by the experts
• Testing of different result merging methods
• Test the use of prototype features in classification*
• Realization of a Web interface for the maps
• Compare the map built-in result combination mechanism with an external combination mechanism
• Test the map capabilities as an aid for adding new individuals
• Introduce textual data and combine it with images

Page 29: Use of color prototypes

[Figure: color viewpoint map, theme « YELLOW »]

Page 30: Experimentation on patents (texts)

Goal: Intelligent technological survey =
• Full-text analysis of the patents
• Domain of oil engineering
• Provide answers to questions like:
  1. “What are the relationships between patentees?”
  2. “On which specific technology does a patentee work? What are the advantages of this specific technology? For which use?”

Page 31: Basic experimental protocol

[Figure: protocol diagram with the stages: patents database; DILIB reformatting; patents in XML format; viewpoints definition; structured by viewpoints; nominal groups extraction; multi-indexes; MicroNOMAD / MultiSOM; interactive maps for analysis; validated]

Page 32: Nominal groups extraction

1) Lexicographic analysis (compound terms)

2) Normalization:
   Ex: “oil fabrication” and “oil engineering” => “oil engineering”

Results:

                                              Patentees   Title   Use   Advantages
Number of indexed documents                        1000    1000   745          624
Number of raw indexes generated                      73     605   252          231
Final number of indexes (after filtering)            32     589   234          207
Number of non-empty classes per map *                28      55    57           61
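
The normalization step can be sketched as variant mapping followed by counting and filtering. The variant table holds only the slide's example, and the `min_count` filter is a hypothetical stand-in for the unspecified filtering behind the table.

```python
import re

# Hypothetical variant table: the slide gives only the
# "oil fabrication" / "oil engineering" example.
CANONICAL = {"oil fabrication": "oil engineering"}


def normalize(group):
    """Lower-case, collapse whitespace, then map known variants to one form."""
    key = re.sub(r"\s+", " ", group.strip().lower())
    return CANONICAL.get(key, key)


def build_index(groups, min_count=1):
    """Count normalized nominal groups; min_count is a hypothetical filter."""
    counts = {}
    for g in groups:
        term = normalize(g)
        counts[term] = counts.get(term, 0) + 1
    return {t: c for t, c in counts.items() if c >= min_count}


index = build_index(["oil fabrication", "Oil engineering", "friction  coefficient"])
```

Merging variants before counting is what shrinks the raw index lists into the smaller final lists reported per viewpoint.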

Page 33: Patents reindexing

Selected viewpoints: title, use, advantages and patentees

Page 34: Example of dynamic analysis

[Figure: Use, Advantages, Title (Components) and Patentees maps]

DYNAMIC DEDUCTION: Patentee « TONEN CORP. » specializes in the lubrication of the « automatic transmission ». It mainly produces oils based on an « organo-molybdenum compound », which have the specific property of a « friction coefficient stable on a wide range of temperature ».

Page 35: Classical methods (AK-means)

[Figure: classes map and class description, with a « Hidden link! » annotation]

Page 36: Conclusion

• Different viewpoints yield complementary results. Ex: indexer keywords = closed themes, title keywords = open themes, ...
• Detection of indexing inconsistencies
• Projection of the thematic pertinence of a query
• Bilateral synergy: images <=> textual information
• Very rich and flexible inter-map communication mechanism: cross analysis between viewpoints, dynamics
• No limitation regarding viewpoint type and number

Page 37: Perspectives

• Sophisticated 2D mapping, 3D mapping
• Pure image mosaic navigation
• Automation of communication between viewpoints
• Interaction with Galois lattice: map zoning and generalization, rule mapping, lattice entry point selection
• Applications:
  1) La Villette: interactive browsing through museum collections, setting up of exhibitions
  2) INIST: Cartography of the Web (EISCTES EEC project)

Page 38

3) Combining Symbolic and Numeric Techniques for DL Contents Classification and Analysis

Jean-Charles LAMIREL, Yannick TOUSSAINT (Orpailleur)

Page 39: Introduction

Combining numerical and symbolic methods:
• MicroNOMAD Self-Organizing Maps (SOM)
  - Basic SOM topographic properties
  - MicroNOMAD multi-map communication process
• Lattice
  - Formal properties and symbolic deduction
  - Hierarchical structure and inheritance of properties
• Study of the projection of SOM over the lattice
  - Making formal properties explicit on the map
  - Intelligent map zoning and labelling

Page 40: Galois lattice

Symbolic hierarchical method: formal concepts such as ({i1, i2}, {p1, p2, p3})

Partial order defined by the subsumption relation over the set of formal concepts:

$(I_1, P_1) \le (I_2, P_2) \iff I_1 \subseteq I_2$

$(I_1, P_1) \le (I_2, P_2) \iff P_1 \supseteq P_2$

For any two concepts $(I_1, P_1)$ and $(I_2, P_2)$ there is a unique meet and a unique join.

• Inheritance of properties
• Extraction of association rules, e.g. Search Engine ⇒ {Web, IR}

Page 41

Context: I = {i1, i2, i3, i4}, P = {AI, Robots, Search Engine, Web, IR}
i1 = {Web, IR}, i2 = {Web, IR}, i3 = {Web, IR, Search Engine}, i4 = {AI, Robots}

Concepts of the lattice:
• ({i1, i2, i3, i4}, ∅)
• ({i1, i2, i3}, {Web, IR})
• ({i3}, {Search Engine, Web, IR})
• ({i4}, {AI, Robots})
• (∅, {AI, Robots, Search Engine, Web, IR})

R1 = Search Engine ⇒ {Web, IR}
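
The lattice of this toy context can be checked with a naive closure-based enumeration: a pair (I, P) is a formal concept when P is exactly the set of properties shared by I and I is exactly the set of objects having P. This brute-force sketch is for illustration only (it is exponential in the number of objects) and is not presented as the method used by the authors.

```python
from itertools import combinations

# Formal context of the slide: objects (documents) and their properties.
CONTEXT = {
    "i1": {"Web", "IR"},
    "i2": {"Web", "IR"},
    "i3": {"Web", "IR", "Search Engine"},
    "i4": {"AI", "Robots"},
}
PROPERTIES = {"AI", "Robots", "Search Engine", "Web", "IR"}


def intent(objects, context=CONTEXT, properties=PROPERTIES):
    """Properties shared by all given objects (every property for the empty set)."""
    if not objects:
        return set(properties)
    return set.intersection(*(context[o] for o in objects))


def extent(props, context=CONTEXT):
    """Objects having all the given properties."""
    return {o for o, ps in context.items() if props <= ps}


def formal_concepts(context=CONTEXT, properties=PROPERTIES):
    """Naive enumeration: close every subset of objects (toy sizes only)."""
    concepts = set()
    objects = sorted(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            p = intent(set(subset), context, properties)
            concepts.add((frozenset(extent(p, context)), frozenset(p)))
    return concepts


def rule_holds(premise, conclusion, context=CONTEXT):
    """Exact association rule: every object with `premise` also has `conclusion`."""
    return extent(premise, context) <= extent(conclusion, context)


concepts = formal_concepts()
```

The enumeration yields five concepts, and R1 holds because the only document indexed by « Search Engine » (i3) is also indexed by « Web » and « IR ». The converse rule Web ⇒ Search Engine does not hold.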

Page 42: Complementarity of approaches

Kohonen SOM:
• Complex weighting scheme
• Difficulty of precise interpretation
• Good illustrative power (topographic structure)
• Good synthesis capabilities
• Non-linearity

Lattice:
• High number of classes
• Memory and time consuming
• Hierarchical structure
• Rule extraction
• Incrementality

Page 43: 3-step methodology

[Figure: Projection, Agglomeration, Grouping]

Page 44: Conclusion

• The cosine method seems to be the best of those tested:
  - Good accuracy
  - Well-balanced agglomeration
  - Agglomeration preserves closed areas on the SOM
• Other projection and agglomeration methods have to be tested
• Preservation of partial order and inheritance
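
The slides do not define the cosine projection, so the sketch below assumes one plausible reading. Each lattice concept is represented by the binary vector of its intent and placed on the SOM neuron whose profile has the highest cosine similarity. Function names and toy profiles are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def project_concepts(concept_intents, protos, properties):
    """Assign each lattice concept to the SOM neuron with the most similar profile.

    concept_intents: {concept_name: set of properties}
    protos:          {neuron: profile vector over `properties`}
    """
    placement = {}
    for name, props in concept_intents.items():
        vec = [1.0 if p in props else 0.0 for p in properties]
        placement[name] = max(protos, key=lambda n: cosine(vec, protos[n]))
    return placement


properties = ["AI", "Robots", "Search Engine", "Web", "IR"]
protos = {(0, 0): [0.9, 0.8, 0.0, 0.1, 0.0],
          (0, 1): [0.0, 0.0, 0.2, 0.9, 0.8],
          (1, 1): [0.0, 0.0, 0.9, 0.6, 0.6]}
concepts = {"C_ai": {"AI", "Robots"},
            "C_webir": {"Web", "IR"},
            "C_se": {"Search Engine", "Web", "IR"}}
placement = project_concepts(concepts, protos, properties)
```

Cosine ignores vector length, so a small intent such as {AI, Robots} can still match a neuron whose profile has large weights on the same properties.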

Page 45: Perspectives

• Evaluation on a large corpus + expert
• Rule management
• Class quality evaluation
• Class labelling
• Deduction validation on communicating maps (lattice extensions)
• Implementation of an operational prototype

Page 46: Other approaches

• Multi-classifier cooperation (PhD): SVM, stigmergy, genetic, neural maps
• On-line learning of user behaviour, intelligent relevance feedback

Page 47: Annexes

• Topographic inconsistencies
• Area computation
• Inter-map communication
• Activity coherency

Page 48: Topographic inconsistencies

[Figure: examples of no inconsistencies, weak inconsistencies and strong inconsistencies]

Page 49: Topographic inconsistencies

[Figure: neuron neighbourhood, strong vs. global]

Page 50: Area computation

[Figure: area computation algorithm (WHILE … DO … END DO loop); details not recoverable]

Page 51: Inter-map communication

• Hyper-awakeness: conditional possibility
• Weighted sum: conditional probability

Page 52: Viewpoint-oriented patents analysis

Selected viewpoints: title, use, advantages and patentees

Page 53: Map of viewpoint: Advantages

The themes « extending oil life » and « black sludge control » are strongly linked because they are neighbours on the map.

The appearance of « black sludge » has a negative incidence on the « friction coefficient » of oil.