Visual Analytics for Explainable
Machine Learning
Shixia Liu (刘世霞)
Tsinghua University (清华大学)
Recent ML Systems Achieve Superhuman Performance
• AlphaGo beats the human Go champion
• Deep nets outperform humans in image classification
• Deep nets beat humans at recognizing traffic signs
• DeepStack beats professional poker players
• Computers out-play humans in Doom
• Autonomous search-and-rescue drones outperform humans
• IBM's Watson destroys humans in Jeopardy!
Machine Learning as Black Magic or a Black Box
Most machine learning processes are NP-hard processes.
Explainable AI – What Are We Trying To Do?

Today: Training Data → Learning Process → Learned Function → Output ("This is a cat", p = .93)
The user with a task asks:
• Why did you do that?
• Why not something else?
• When do you succeed?
• When do you fail?
• When can I trust you?
• How do I correct an error?

Tomorrow: Training Data → New Learning Process → Explainable Model → Explanation Interface ("It has fur, whiskers, and claws.")
The user with a task can say:
• I understand why
• I know when you'll succeed
• I know when you'll fail
• I know when to trust you
• I know why you erred
DARPA – Explainable Artificial Intelligence (XAI)
DARPA's XAI concept: Training Data → New Machine Learning Process → Explainable Model → Explanation Interface → User with a Task
• I understand why
• I understand why not
• I know when you succeed
• I know when you fail
• I know when to trust you
• I know why you erred

Three XAI research thrusts:
• Explainable Models: develop a range of new or modified machine learning techniques to produce more explainable models
• Explanation Interface: integrate state-of-the-art HCI with new principles, strategies, and techniques to generate effective explanations
• Psychology of Explanation: summarize, extend, and apply current psychological theories of explanation to develop a computational theory
Machine Learning Pipeline
Solution = Data + ML Expertise + Computation
Tasks:
- Preprocess and clean the data
- Select and construct appropriate features
- Select an appropriate model family
- Optimize model hyperparameters
- Postprocess machine learning models
- Critically analyze the results obtained
Data is the King of Machine Learning!
LabelInspect (IEEE VIS 2018, TVCG) and AEVis (IEEE VIS 2018, TVCG)
Adversarial Examples
• Intentionally designed to mislead a deep neural network (DNN) into making an incorrect prediction
[Figure: a giant panda image, after an imperceptible perturbation, is classified by the DNN as a guenon monkey; is the network robust?]
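For context, one standard way to construct an adversarial example (not the method of this talk) is the fast gradient sign method (FGSM). A minimal PyTorch sketch, assuming a trained classifier model, an input image batch x with values in [0, 1], and labels y:

import torch

def fgsm_attack(model, x, y, eps=0.03):
    """Fast gradient sign method: nudge x in the direction that increases
    the classification loss, within an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid image range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()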
Technical Challenges
• Extract the datapath for an adversarial example and its corresponding normal example
• Visualize the datapath: a datapath spans hundreds of layers, millions of neurons, and millions of connections
Datapath Extraction – Motivation
• Current method: select the most activated neurons
• Problem: misleading results when a highly recognizable secondary object is present
• Reasons: neurons have complex interactions, and there is a gap between activation and prediction
[Figure: learned features and activations of two neurons; the most activated neuron is not necessarily the one that drives the prediction]
Datapath Extraction – Formulation
• The critical neurons for a prediction: the neurons that contribute most to the final prediction
• Subset selection: keep the original prediction while selecting a minimal subset of neurons
• Extend from a single image to a set of images X
• Notation: N: all neurons; Ns: neuron subset; p(·): prediction
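A plausible rendering of this subset-selection objective in LaTeX, reconstructed from the legend above (the trade-off weight λ is an assumption, not notation from the talk):

\min_{N_s \subseteq N} \; \| p(x; N_s) - p(x; N) \|_2^2 + \lambda |N_s|,
\qquad \text{and for an image set } X: \quad
\min_{N_s \subseteq N} \; \sum_{x \in X} \| p(x; N_s) - p(x; N) \|_2^2 + \lambda |N_s|.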
Datapath Extraction – Solution
• Solving the subset selection directly is time-consuming
  – NP-complete
  – Large search space due to the large number of neurons in a CNN
• Two-step strategy: divide-and-conquer-based search space reduction, followed by a quadratic approximation, giving an accurate approximation in a smaller search space
Datapath Extraction – Search Space Reduction
• Network: ResNet-101; the original problem has 57.78 million dimensions
• Split into layers and process layer by layer: 2K to 1.44 million dimensions per layer
• Aggregate the neurons in a layer into feature maps and split into feature maps: 64 to 2K dimensions
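A minimal sketch of the layer-by-layer idea: within each layer, select feature maps greedily until the prediction is preserved. The oracle prediction_drop is a hypothetical stand-in for re-running the network with only the chosen feature maps kept; the talk's actual method replaces this search with a quadratic approximation (next slide):

def select_critical_feature_maps(layers, prediction_drop, tol=1e-3):
    """Layer-by-layer greedy subset selection over feature maps.

    layers: one list of feature-map ids per layer.
    prediction_drop(i, subset): hypothetical oracle giving the change in the
        prediction when only `subset` is kept in layer i.
    """
    datapath = []
    for i, feature_maps in enumerate(layers):
        chosen = set()
        # Greedily add the feature map that best restores the prediction.
        while prediction_drop(i, chosen) > tol and len(chosen) < len(feature_maps):
            best = min((m for m in feature_maps if m not in chosen),
                       key=lambda m: prediction_drop(i, chosen | {m}))
            chosen.add(best)
        datapath.append(chosen)
    return datapath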
Datapath Extraction – Quadratic Approximation
• Reformulate: an indicator variable encodes whether the j-th feature map in layer i is critical; relax it from discrete to continuous
• Direct optimization requires recomputing ∂p/∂a by backpropagation in each iteration
• Taylor decomposition, with a_j^(i) the activation vector of the j-th feature map in layer i:

Q[j, k] = (∂p/∂a_j^(i) · a_j^(i)) (∂p/∂a_k^(i) · a_k^(i))

1. Bridges the gap between activation and prediction
2. Each element of Q approximately models the interaction between feature map j and feature map k
• The resulting quadratic optimization is still NP-hard
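The first-order ingredient of this decomposition, the contribution (∂p/∂a_j)·a_j of each feature map, is straightforward to compute with automatic differentiation. A hedged PyTorch sketch; predict_from, which runs the remainder of the network from one layer's activations to the scalar predicted probability p, is an assumption:

import torch

def feature_map_contributions(activations, predict_from):
    """First-order Taylor contribution of each feature map in one layer.

    activations: tensor of shape (num_feature_maps, H, W), requires_grad=True.
    predict_from: hypothetical function mapping these activations to the
        scalar probability p of the predicted class.
    """
    p = predict_from(activations)
    (grad,) = torch.autograd.grad(p, activations)
    # (dp/da_j) . a_j, accumulated over each feature map's spatial positions.
    return (grad * activations).sum(dim=(1, 2))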
Datapath Visualization
• Explain why an adversarial example is misclassified
• Improve the robustness of the learning model
[Figure: Euler-diagram-based layout presenting the feature maps in a layer; feature-map clusters are colored by A(normal) − A(adversarial), distinguishing shared feature maps from unique feature maps]
Visualization and Machine Learning Pipeline
• Want huge model capacity for large datasets
• Large model, but sparsely activated
• Single model to solve many tasks

Outline
• Understanding
• Diagnosis
• Refinement
Analyzing the Training Processes of
Deep Generative Models
Mengchen Liu, Jiaxin Shi, Kelei Cao, Jun Zhu, Shixia Liu
Tsinghua University
IEEE VAST 2017 (IEEE TVCG)
Deep Generative Models (DGMs)
• Built on random variables: a latent variable z and observed data x
• Inference with knowledge k: p(z|x, k) ∝ p(x|z, k) p(z|k)
• A deep neural network implements the mapping, e.g., z = f(x)
• Trained unsupervised or semi-supervised; the latent variables can capture semantic factors such as a smiling face or pose
• VAE and GAN are both instances of DGMs
[Figure: graphical model with latent variable z generating observation x]
Training a DGM is Hard
• A DGM often involves both deterministic functions and random variables
  – A CNN involves only deterministic functions (e.g., convolution)
• A DGM involves both a top-down generative process and a bottom-up Bayesian inference process
  – A CNN has a single bottom-up process: input at the bottom layer → high-level features → outputs
[Figure: graphical comparison of a DGM (with random variables z and x) and a CNN]
Challenges
• Handling a large amount of time-series data
  – Millions of activations/gradients/weights in a DGM
• Identifying the root cause of a failed training process
  – Errors may arise from multiple sources: abnormal training samples, inappropriate network structures, and a lack of numerical stability in the library
  – Even when the error is known to be caused by the network structure, it is often difficult to locate the specific neurons responsible
Our Solution
• A blue-noise polyline sampling algorithm (sketched below)
  – Selects polyline samples with blue-noise properties
  – Preserves outliers and reduces visual clutter
• A credit assignment algorithm
  – Discloses how other neurons contribute to the output of a neuron of interest
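The core blue-noise property (samples that are evenly spread, with no two too close) can be sketched with simple dart throwing; the paper's algorithm additionally preserves outliers, which this sketch omits. The polyline distance function and the radius are assumptions:

import random

def blue_noise_sample(polylines, distance, radius, max_trials=10000):
    """Dart throwing: accept a candidate polyline only if it lies at least
    `radius` away from every sample accepted so far."""
    samples = []
    for _ in range(max_trials):
        candidate = random.choice(polylines)
        if all(distance(candidate, s) >= radius for s in samples):
            samples.append(candidate)
    return samples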
DGMTracker
• Better understand and diagnose the training process of a DGM
Case Study: Debugging a Failed Training Process of a Variational Autoencoder (VAE)
• Autoencoder
  – Reconstructs its input with minimum information loss
• Variational autoencoder
  – A probabilistic version of the autoencoder
  – z_v: a vector of random variables
  – z_a: a vector of real numbers
[Figure: Input x → Encoder → Code z_a → Decoder → Reconstructed input x′]
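A minimal PyTorch sketch of a VAE, showing the Gaussian sampling (reparameterization) layer that the case study below inspects; the layer sizes are illustrative, not those of the talk's model:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=3 * 32 * 32, z_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Gaussian sampling layer (reparameterization trick):
        # z = mu + sigma * eps, with sigma = exp(logvar / 2).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar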
• Dataset: CIFAR-10
• Symptom: the loss becomes NaN somewhere between 10K and 30K iterations; the example case fails at iteration 24,397
[Figure: loss curve with the failure at iteration 24,397; an abnormal snapshot reveals how data flows through the DGM]
[Figure: maximum, average, and minimum activation over time for the focus snapshot; source: the logarithmic variance of the 2nd Gaussian sampling layer, aggregated over height and width]
• Most of the activations of this layer remained stable, but some showed unusual behavior: a sudden increase
[Figure: the abnormal training image compared with a normal image]
Solution
• Root cause: large variance → large sampled values → a large increase in the loss
• Trial 1: replacing the abnormal image with a normal one, but the training failed again; the same analysis revealed another "bad" image
• Trial 2: the resulting loss curve is much smoother
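The failure chain (large variance → large samples → large increase in loss) can be reproduced in isolation: once the predicted log-variance grows enough, exp() overflows float32 and the loss becomes inf/NaN. A small illustration; the clamp is a generic numerical guard, not the fix adopted in the talk:

import torch

logvar = torch.tensor([2.0, 40.0, 90.0])     # one abnormal input drives logvar up
kl = 0.5 * (torch.exp(logvar) - 1 - logvar)  # KL(N(0, sigma^2) || N(0, 1))
print(kl)   # the last entry overflows to inf, poisoning the total loss

safe = logvar.clamp(max=10.0)                # bound logvar before sampling
print(0.5 * (torch.exp(safe) - 1 - safe))    # finite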
Research Opportunities
• Human-in-the-loop visual analytics for practitioners
– Existing deep learning models are data-driven
– Combine human expert knowledge and deep learning techniques through
interactive visualization
• Progressive visual analytics of deep learning models
– The training of many deep learning models is time-consuming
– Progressive visual analytics techniques are needed
Jaegul Choo, Shixia Liu. Visual Analytics for Explainable Deep Learning. IEEE Computer Graphics and Applications, 2018.
S. Liu, X. Wang, M. Liu, J. Zhu. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics, 1(1):48–56, 2017.
Research Opportunities (Cont’d)
• Improving the robustness of deep learning models for
secure artificial intelligence
– Deep learning models are generally vulnerable to adversarial perturbations
– Incorporate human knowledge to improve the robustness of deep learning
models
• Reducing the size of the required training set
– One-shot learning or zero-shot learning
• Visual analytics for advanced deep learning architectures
– e.g., ResNet and DenseNet
References
• M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, S. Liu. Towards better analysis of deep convolutional neural networks. IEEE TVCG, 23(1):91–100, 2017.
• M. Liu, J. Shi, K. Cao, J. Zhu, S. Liu. Analyzing the training processes of deep generative models. IEEE TVCG, 24(1):77–87, 2018.
• S. Liu, J. Xiao, J. Liu, X. Wang, J. Wu, J. Zhu. Visual diagnosis of tree boosting methods. IEEE TVCG, 24(1):163–173, 2018.
• J. Choo, S. Liu. Visual analytics for explainable deep learning. IEEE Computer Graphics and Applications, 2018.
• S. Liu, X. Wang, M. Liu, J. Zhu. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics, 1(1):48–56, 2017.
Acknowledgements
• Professors
– Prof. Jun Zhu (Tsinghua), Prof. Hang Su (Tsinghua)
– Prof. Jaegul Choo (Korea University)
• Students
– Mengchen Liu, Kelei Cao, Changjian Chen, Fangxin Ouyang, Jiaxin Shi,
Zhen Li, Chongxuan Li
Thanks a lot for your attention!