Upload
dale
View
54
Download
0
Embed Size (px)
DESCRIPTION
Constraint Reasoning and Kernel Clustering for Pattern Decomposition With Scaling. Ronan LeBras Computer Science Theodoros Damoulas Computer Science Ashish Sabharwal Computer Science Carla P. Gomes Computer Science John M. Gregoire Materials Science / Physics - PowerPoint PPT Presentation
Citation preview
Constraint Reasoning and Kernel Clusteringfor Pattern Decomposition With Scaling
Ronan LeBras Computer ScienceTheodoros Damoulas Computer ScienceAshish Sabharwal Computer ScienceCarla P. Gomes Computer ScienceJohn M. Gregoire Materials Science / PhysicsBruce van Dover Materials Science / Physics
Sept 15, 2011 CP’11
Motivation
Ronan Le Bras - CP’11, Sept 15, 2011 2
Cornell Fuel Cell InstituteMission: develop new materials for fuel cells.
An Electrocatalyst must:
1) Be electronically conducting
2) Facilitate both reactions
Platinum is the best known metal to fulfill that role, but:
1) The reaction rate is still considered slow (causing energy loss)
2) Platinum is fairly costly, intolerant to fuel contaminants, and has a short lifetime.
Goal: Find an intermetallic compound that is a better catalyst than Pt.
Motivation
3
Recipe for finding alternatives to Platinum1) In a vacuum chamber, place a silicon wafer.2) Add three metals.3) Mix until smooth, using three sputter guns.4) Bake for 2 hours at 650ºC
Ta
Rh
Pd
(38% Ta, 45% Rh, 17% Pd)
• Deliberately inhomogeneous composition on Si wafer
• Atoms are intimately mixed
[Source: Pyrotope, Sebastien Merkel]
Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
4
Identifying crystal structure using X-Ray Diffraction at CHESS• XRD pattern characterizes the underlying crystal fairly well• Expensive experimentations: Bruce van Dover’s research
team has access to the facility one week every year.
Ta
Rh
Pd
(38% Ta, 45% Rh, 17% Pd)
[Source: Pyrotope, Sebastien Merkel]
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
5
Ta
Rh
Pd
α
β
δ
γ
Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
6
Ta
Rh
Pdα
β
α+βδ
γ
γ+δ
Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
7
Ta
Rh
Pd
α
β
δ
γ
Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
8
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
Ta
Rh
Pd
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
α
β
δ
γ
α+β
Ronan Le Bras - CP’11, Sept 15, 2011
Pi
Pj
Motivation
9
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
15 20 25 30 35 40 45 50 55 60 650
10
20
30
40
50
60
70
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
Ta
Rh
Pdα
β
α+β
δ
γ
Ronan Le Bras - CP’11, Sept 15, 2011
Pi
Pk
Pj
Motivation
Fe
Al
Si
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
INPUT: pure phaseregion
Fe
Al
Si
m phase regions k pure regions m-k mixed regions
XRD patterncharacterizingpure phases
Mixedphaseregion
OUTPUT:
Additional Physical characteristics: Peaks shift by 15% within a region Phase Connectivity Mixtures of 3 phases Small peaks might be discriminative Peak locations matter, more than peak
intensities
10Ronan Le Bras - CP’11, Sept 15, 2011
Motivation
11
Ta
Rh
Pd
Figure 1: Phase regions of Ta-Rh-Pd Figure 2: Fluorescence activity of Ta-Rh-Pd
Ronan Le Bras - CP’11, Sept 15, 2011
Outline
12
• Motivation• Problem Definition
Abstraction Hardness
• CP Model• Kernel-based Clustering• Bridging CP and Machine learning
• Conclusion and Future work
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
13
Input
Output
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
14
Input
Output
v1
v2
v3
v4
v5
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
15
Input
Output
v1
v2
v3
v4
v5
P1
P2
P3
P4
P5
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
16
Input
Output
v1
v2
v3
v4
v5
P1
P2
P3
P4
P5
M= K = 2, δ = 1.5
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
17
Input
Output
v1
v2
v3
v4
v5
P1
P2
P3
P4
P5
M= K = 2, δ = 1.5B2
B1
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Abstraction: Pattern Decomposition with Scaling
18
Input
Output
v1
v2
v3
v4
v5
P1
P2
P3
P4
P5
M= K = 2, δ = 1.5B2
B1
s11=1.00s12=0.95s13=0.88s14=0.84
s15=0
s21=0.68s22=0.78s23=0.85s24=0.96s25=1.00
Ronan Le Bras - CP’11, Sept 15, 2011
Problem Hardness
19
Assumptions: Each Bk appears by itself in some vi / No experimental noise
The problem can be solved* in polynomial time.
Assumption: No experimental noise The problem becomes NP-hard (reduction from the “Set Basis” problem)
Assumption used in this work: Experimental noise in the form of missing elements in Pi
v1
v2
v3
v4
v5
P1
P2
P3
P4
P5
M= K = 2, δ = 1.5B2
B1
s11=1.00s12=0.95s13=0.88s14=0.84
s15=0
s21=0.68s22=0.78s23=0.85s24=0.96s25=1.00
Ronan Le Bras - CP’11, Sept 15, 2011
P0=
Outline
20
• Motivation• Problem Definition• CP Model
ModelExperimental Results
• Kernel-based Clustering• Bridging CP and Machine learning
• Conclusion and Future work
Ronan Le Bras - CP’11, Sept 15, 2011
CP Model
21Ronan Le Bras - CP’11, Sept 15, 2011
CP Model (continued)
22
Advantage: Captures physical properties and relies on peak location rather than height.Drawback: Does not scale to realistic instances; poor propagation if experimental noise.
Ronan Le Bras - CP’11, Sept 15, 2011
CP Model – Experimental Results
23
0 1 2 3 4 5 60
200
400
600
800
1000
1200
1400
Number of unknown phases vs. running times (AlLiFe with |P| = 24)
N=10N=15N=28N=218
Number of unknown phases (K’)
Run
ning
tim
e (in
s)
For realistic instances, K’ = 6 and N ≈ 218...
Ronan Le Bras - CP’11, Sept 15, 2011
Outline
24
• Motivation• Problem Definition• CP Model• Kernel-based Clustering• Bridging CP and Machine learning
• Conclusion and Future work
Ronan Le Bras - CP’11, Sept 15, 2011
Kernel-based Clustering
25
Set of features: X =
Similarity matrices:[X.XT]
Method: K-means on Dynamic Time-Warping kernelGoal: Select groups of samples that belong to the same phase region to feed the CP
model, in order to extract the underlying phases of these sub-problems.
Better approach: takeshifts into account
Red: similarBlue: dissimilar
+
+
+
Ronan Le Bras - CP’11, Sept 15, 2011
Bridging CP and ML
26
Goal: a robust, physically meaningful, scalable, automated solution method that combines:
Similarity “Kernels”& Clustering
Machine Learningfor a “global
data-driven view”
UnderlyingPhysics
A language forConstraints
enforcing“local details”
Constraint Programming model
Ronan Le Bras - CP’11, Sept 15, 2011
Bridging Constraint Reasoning and Machine Learning: Overview of the Methodology
27
Fe
Al
Si
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
15 20 25 30 35 40 45 50 55 600
20
40
60
80
100
120
INPUT:
Peakdetection
Machine Learning:Kernel methods,
Dynamic Time Wrapping
Machine Learning:Partial “Clustering”
Fe
Al
SiCP Model & Solveron sub-problems
, +,
only only
Full CP Modelguided by
partial solutions
OUTPUT
Fix errorsin data
Ronan Le Bras - CP’11, Sept 15, 2011
Experimental Validation
28
Al-Li-Fe instancewith 6 phases:
Ground truth(known)
Our CP + ML hybridapproach is much
more robust
Previous work (NMF)violates many
physical requirements
Conclusion
29
Hybrid approach to clustering under constraints More robust than data-driven “global” ML approaches More scalable than a pure CP model “locally” enforcing constraints
An exciting application in close collaboration with physicists Best inference out of expensive experiments Towards the design of better fuel cell technology
Ronan Le Bras - CP’11, Sept 15, 2011
Future work
Ongoing work:Spatial Clustering, to further enhance cluster qualityBayesian Approach, to better exploit prior knowledge about local smoothness and available inorganic libraries
Active learning:Where to sample next, assuming we can interfere with the sampling process?When to stop sampling if sufficient information has been obtained?
Correlating catalytic properties across many thin-films:In order to understand the underlying physical mechanism of catalysis and to find promising intermettalic compounds
30Ronan Le Bras - CP’11, Sept 15, 2011
The end
Thank you!
31Ronan Le Bras - CP’11, Sept 15, 2011
Extra slides
32Ronan Le Bras - CP’11, Sept 15, 2011
Experimental Sample
33
Example on Al-Li-Fe diagram:
Ronan Le Bras - CP’11, Sept 15, 2011
Applications with similar structure
34
Flight Calls / Bird conservation
Identifying bird population from sound recordings at night.
Analogy: basis pattern = species samples = recordings physical constraints = spatial constraints,
species and season specificities…
Fire Detection
Detecting/Locating fires.
Analogy: basis pattern = warmth sources samples = temperature recordings physical constraints = gradient of
temperatures, material properties…
Ronan Le Bras - CP’11, Sept 15, 2011
Previous Work 1: Cluster Analysis (Long et al., 2007)
35
xi =
(Feature vector) (Pearson correlation coefficients) (Distance matrix)
(PCA – 3 dimensional approx) (Hierarchical Agglomerative Clustering)
Drawback: Requires sampling of pure phases, detects phase regions (not phases), overlooks peak shifts, may violate physical constraints (phase continuity, etc.).
Ronan Le Bras - CP’11, Sept 15, 2011
Previous Work 2: NMF (Long et al., 2009)
36
xi =
(Feature vector) (Linear positive combination (A) of basis patterns (S))
(Minimizing squared Frobenius norm
X = A.S + E Min ║E║
Drawback: Overlooks peak shifts (linear combination only), may violate physical constraints (phase continuity, etc.).
Ronan Le Bras - CP’11, Sept 15, 2011