Upload
lorna
View
148
Download
0
Embed Size (px)
DESCRIPTION
新的挑战 - 基于内容的信息处理. 张 钹 清华信息科学与技术国家实验室 智能技术与系统国家重点实验室 清华大学信息学院、计算机系. 一、经典信息论 Classical Information Theory. Semantics vs. Information Processing Frequently the messages have meaning: that is they refer to or are correlated according to some - PowerPoint PPT Presentation
Citation preview
新的挑战 - 基于内容的信息处理
张 钹清华信息科学与技术国家实验室智能技术与系统国家重点实验室
清华大学信息学院、计算机系
一、经典信息论Classical Information
Theory
Semantics vs. Information Processing
• Frequently the messages have meaning: that is
they refer to or are correlated according to some
system with certain physical or conceptual entities.
• These semantic aspects of communication
irrelevant to the engineering problem.
C. E. Shannon, A mathematical theory of communication, 1948
经典通讯模型
• Communication Model:
Sender
Receiver
(Markov) Stochastic Process
-C. E. Shannon
( ) ( ) ( )P x P y x P x y
图灵计算理论Turing Computation Theory
Deterministic Turing Machine (1937)
Computability
Algorithms
Computational Complexity
P Read-write Head
tape -4 -3 -2 -1 0 +1
QFinite State Controller
经典信息论下的信息处理Information Processing
The conversion of latent information intomanifest information _C. E. Shannon
X -p(x) Y -
Dissipation Equivocation
Transformation=equivocation-dissipation
Uncertainty (error, noisy, deformation,…)
( )p x y
经典信息论在文本、图像处理中的应用
The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel
• Communication: coding, data
compression, noisy-channel coding,..• Text: editor, compression, spell and syntax correction,…• Image: editor, lossless compression, noise suppression, enhancement, …
二、基于内容(语义)的信息处理
Information (Web) Age
• Complex (variety of ) information:
text, image, speech, video,…
• A huge amount of data
• Man-Machine Interaction
Content (Semantics) based Information Processing
Information Retrieval, Classification, Summary,
Recognition, Understanding,..
Computer
Input
(Encoding)
Output
(Decoding)
Users
Codes Form
Content
从处理形式( Form )到与内容( Semantics)相关的处理
Natural Man-Machine Interaction
Computer Network
三、句法分析( Syntact Analysis ) Holistic Coding (Gestalt)
The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches.
数字视频编码技术发展至今已有半个世纪的历史,已取得很大的进展。从五十年代的差分预测编码,到七十年代的变换编码、基于块的运动预测编码,直到如今兴起的分布式编码、立体视编码、多视编码、视觉编码等等
11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111
Semantics S Code (Form) X
?
Rule-based, Knowledge-based, Top-down, Parsing (text)
Advantages: Deliberative behaviors (AI expert systems: decision making, diagnosis, design,…) Specific domain, Small-size
Disadvantages: Perception, Common sense, Nature language,..
Uncertainty (exception, ambiguity, vague,..)
Detector (检测子) Semantically meaningful primitives
Text: word-sentence-chapter-
Image: subpart-part-object-
There is no clear boundary among parts
Segmentation Problem
Descriptor (描述子) Structural uncertaintyK. S. Fu, Syntactic pattern recognition, New York: Prentice-Hall, 1974
D. Marr, Vision, New York: Freeman, 1982
图像分割(分词)
Where is the object ?
What is the object ?
Chicken orEgg ?
结构分析Structural analysis, Rule-based,
Syntax
11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
Image: Part, Object, … Text: Words, Sentences, …
• Uncertainty problem
• Scalability
Syntactic Analysis Faced Difficulty !
The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches.
四、概率统计模型
Image (text) Classification:
Categories
Low level and local features (words)
Computer easily detectable but less
semantically meaningful
Colors, Textures, Bag of words
不确定性处理 Uncertainty Management
• How to deal with uncertainty ?
• A probabilistic information processing model
Regularity:
Probabilistic distribution
Examples:
Caltech 101, 25 objects
R. O. Duda & P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973
* * * * * * * * * * * * ** * * ** * · · · · ·
··· · ·· ·· ·· · ······· ·· · · · · ·
词袋法 Bag of (Visual) Words
• Defined in image patches (2005-06)
• Descriptors extracted around interest points
(2002-2004)
• Edge contours (2005-06)
• Regions (2005-06)
检测子与描述子 Detector & Descriptor
Kadir salience
region (points)
Histograms of Oriented Gradients (HOG)
-72 dimensionZuo Yuanyuan (2010-)
1
1 arg min( ( , ))1( )
0
n iv V
i
if w D v rCB w
n otherwise
CB-Codebook (Histograms)
n-the number of regions in an image
ri-an image region i
D(w,ri)-the distance between a codeword w
and region ri
V-vocabulary
数据空间的稀疏结构 - Sparsity
高维空间中的低维结构 -low-dimensional structure of high-dimensional space
Objects Precision
Airplane 0.947
Car-side 0.987
Dalmatian 0.733
Faces 0.760
Leopard 0.827
Pagoda 0.760
Stop-sign 0.800
Windsor-chair 0.787 Data Structure
优化yAxtosubjectxx
11 minargˆ
211 minargˆ yAxtosubjectxx
Sparse representation in sample space
L1 norm
图像库
实验结果
数据的空间结构(非稀疏性)
数据驱动法( Data-driven )
-learning from web data
Germany
Speech Recognition
Speech Synthesis
Text1Translation
Service
Text2
English
Learning from Data
(probabilistic models)
Annotation
Microsoft Research Asia
概率方法的基本缺陷• The semantic gap between low-level local features and high-level global concepts
Less semantically meaningful features: colors or their distribution (histogram), gray-values or their distribution, visual words (descriptors from interest points), image patches, image regions, edge, …
• Lack of structural knowledge
Generalization capacity
Information processing without understanding
五、新的研究方向
Sender
Reader
X X
S Uncertainty F (W,D)
句法分析的复苏Holistic (Gestalt), Probability, Inference
Information Structure Analysis
• Syntactic + Probabilistic
• Data-driven + Knowledge-driven
• Bottom-up + Top-down
• Part-based (Shape-based)
信息结构 Information Structure
The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches.
数字视频编码技术发展至今已有半个世纪的历史,已取得很大的进展。从五十年代的差分预测编码,到七十年代的变换编码、基于块的运动预测编码,直到如今兴起的分布式编码、立体视编码、多视编码、视觉编码等等
11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111
Semantics Code (Holistic)
?
信息结构分析
History (Structural Analysis)(1) LinguisticsInformation Structure-Syntactic representationInformation packaging, Building the semanticsFocus, Topic, background, comment,…
• M. A. K. Halliday (1967) Notes on transitivity and theme in English, Part II, Journal of Linguistics 3:199-244• W. L. Chafe, Language and consciousness, Language 50:111-133
(2) Psychology Structural Information Theory• Coding: descriptive complexities_ frequencies of occurrence (Shannon)• Information Structure: Formalization of visual regularity: iteration, symmetry, and alternation
E. L. J. Leeuwenberg (1968) Structural information of visual patterns: an efficient coding system in perception, The Hague: MoutonE.L. J. Leeuwenberg (1969), Quantitative specification of information in sequential patterns, Psychological Review, 76, 216-220
(3) Information Science Algorithmic Information Theory Kolmogorov Complexity 16 bits 1111111111111111 0101101001011010 1001110111010111 1100010001001011
R. J. Solomonoff (1964), A formal theory of inductive inference,
Information & Control, V.7, No.1, pp1-22, No.2, pp. 224-254
A. N. Kolmogorov (1965), Three approaches to the definition of
the quantity of information, Problems of Information Transmission, No. 1, pp. 3-11
信息结构的挖掘与利用• Kolmogorov complexity
• Related data structures
low-dimensional structure in high-dimensional
data space
• Under the probabilistic (structure) framework
• Learning from human being
(1) Kolmogorov Complexity
min : ( ) ,( )
p f p x if x ran fK x
otherwise
The absolute measure of information content
in an individual finite object
Algorithmic information theory
The minimum description length
信息距离 Information Distance
max ( ), ( )( , )
max ( ), ( )
K x y K y xd x y
K x K y
K(.) is non-computable
C. Bennett , et al, Information distance, IEEE Trans. on Information Theory, vol.44, no.4, 1998, pp.1407-1423
min ( ), ( )( , )
min ( ), ( )
K x y K y xd x y
K x K y
A statistical approach to Kolgomorov complexity
(information distance)
f(x)-frequency
N-total number
( , ) ( , )
log ( , ) max log ( ),log ( )
log min log ( ),log( )
d x y NSD x y
f x y f x f y
N f x y
近似计算方法
语义距离与形式距离 Semantic & Formal Distance
max ( ), ( )( , )
max ( ), ( )
C x y C y xCDM x y
C x C y
CDM-Compressed Distance Measure (ASCII)
NSD-Normalized Statistical Distance
Xian Zhang, Yu Hao, et al, New information distance measure and its application in question answering system, J. Comput. Sci. & Tech. 23(4), 557-572, 2008
两篇文本之间的信息距离
1 1 11 12 1
2 2 21 22 2
, ,...,
, ,...,
n
m
T W w w w
T W w w w
Text representation: a bag of words
max 1 2 1 2 2 1
1 2 1 2
2 1 2 1
( , ) max ( ), ( )
( ) ( \ )
( ) ( \ )
( ) ( ), ( ) log ( )
i ji j
i ji j
w W
D T T K T T K T T
K T T K w w
K T T K w w
K W K w k w f w
文本(图像)检索
1
2
3
...
n
t
t
t
t
Image-Text
information distance
Web. . . .
. . .
.
. . .
.
. . .
.
. . .
.
. . .
.
. . . .
. . .
.
. . .
.
. . .
.
. . .
.
. . .
.
1
2
.
n
i
i
i
(2) 数据结构 -low-dimensional structure of high-dimensional space
Sparse Representation1 1
arg minx x subject to
Ax y or
BAx z
A: mn matrixx: sample space, n-dimensiony: pixel, m-dimensionz: feature space, d-dimension Yi Ma, UIUC, USA
(3) 扁平的结构模型 -Image Region Annotation: horse, sky, mountain, grass, tree
Yuan Jinhui (2008-)
区域自适应的网格划分 Region-adaptive Grid Partition
,,
* arg max ( ) arg max ( ) ( )
1( , ) exp ( ( ), ) ( , )
( , ) ( ) ( )
z z
i i i i j i ji i j
z p z I p I z p z
p I z f I x z g z zZ
p I z p I z p z
I-dataxi-image position
z-state (feature)g(zi,zj)-the probabilistic region-basedconstraints (co-occurrence)
不同的扁平结构模型
模型学习
n images, each image has mi=HV grids
( , ) ( , ), 1,2...,
( , ) ( , ), 1,2,...,
i i
j ji i i i i
x y x y i n
x y x y j m
(a) i.i.d generative model
(b) i.i.d. discriminative model
(c) 2-dimensional hidden Markov (2D HMM)
(d) Markov Random Field (MRF)
(e) Conditional Random Field (CRF)
标注配置
( , ), 1,2,...,i ix y i N
Given a training data,
MAP (maximal a posterior) : label configuration
1: 1:* arg max ( )m my P y x
For 2D HMM, MRF, CRF
using path limited Viterbi algorithm
Probabilistic distribution P Cs: labeling clique, C0: labeling and feature cliquey* the optimal label configuration
0
1: 1:
( , ) ( , )
1( , ) ( , ) ( , )
i j s k k
m m i j k k
y y C y x C
P x y y y y xZ
1:0( , ) ( , )
* arg max ( , ) ( , )m
i j s k k
i j k k
yy y C y x C
y y y y x
马尔科夫随机场模型
- MRF model
结构预测学习算法Learning Rules
Classification
Structural Prediction
Maximal Joint Likelihood Estimation
Naïve Bayesian Network
Hidden Markov Model (1966)1
Maximal Conditional Likelihood Estimation
Logistic Regression
Conditional Random Field (2001)2
Maximal Margin Learning
SVM Maximal Margin Markov Net (2003)3
Maximal Entropy Discrimination Learning
Maximal Entropy Discrimination Model
Maximal Entropy Discrimination Markov Net (2008)4
相关文献[1] L. E. Baum and T. Petrie. Statistical Inference for Probabilistic Functions of Finte State Markov Chains. The Annals of Mathematical Statistics, Vol. 37, No. 6, pp.1554-1563, 1966
[2] J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of International Conference on Machine Learning (ICML), 2001
[3] B. Taskar et al., Max-Margin Markov Networks. Advances in Neural Information Processing Systems (NIPS), 2003
[4] J. Zhu et al., Laplace Maximum Margin Markov Networks, In Proc. of International Conference on Machine Learning (ICML), 2008
实验设置• 4002 Corel images (384256 or 256384)
• 11 basic (region) concepts
• Features: color moment + wavelet
• 5 models: 2 without structural knowledge
(GMM, SVM)
3 with structural knowledge
(HMM*, RMF*, CRF*)
图像区域标注类别
不同模型的比较
GMM: Gaussian Mixture Model (30 components)
SVM: Support Vector Machine
Gaussian kennel, one-against-one
HMM: Hidden Markov Model
RMF: Random Markov Field
CRF: Conditional Random Field
Limited Path Viterbi Algorithm
实验结果
(4) Latent Hierarchical Model
Web Page Extraction Name, Image, Price, Description, etc.
Web Page
Data Record
Image
Name Desc Price Note
Desc
Note
Data Record
Image
Desc Desc Name Desc Price Note Note
Given Data Record
Hierarchy Computational
efficiency Long-range
dependency Joint extraction
{image}
{name, price}
{name}
{price}
{name}
{price}
{image}
{name, price}
{desc}
{Head}
{Tail}{Info Block}
{Repeat block}
{Note}
{Note}
Zhu Jun (2008- )
Experiment Web page extraction Name, Photo, Price ,
Description Models Multi-lager CRFs, Multi-layer
M^3N, PoMEN, Partially observed HCRFs
Data set: 37 TemplateTraining: 185 (5/per template)
pages, or 1585 data records
Testing: 370 (10/per template) pages, or 3391 data records
Web Page
Data Record
Image
Name Desc Price Note
Desc
Note
Data Record
Image
Desc Desc Name Desc Price Note Note
Results
23/4/22 博士论文 预答辩 55
Performances: Avg F1:
o avg F1 over all attributes
Block instance accuracy:
o % of records whose Name, Image, and Price are correct
每个属性的性能
(5) 人脑视觉机制
Top-down feedback
Top-down feedback
Local connection
Data-driven V1 V2 … IT
High-level
Prior-knowledge
自底向上与自顶向下
• 114 basic functions
Images 1212 pixels
• HOG
视觉基元 Visual Primitive Elements
稀疏编码( Spare Coding )
( , ) ( , )i ii
I x y a x y2
( , ) ( , ) ii i
xy i i
aE I x y a x y S
( , )I x y
( , )i x y
ia
Images
Basic functions
Coefficients
A non-linear function2log(1 )S x or x
Object Recognitionwith sparse, localizedfeatures
MIT-CSAIL-TR-2006-028 T. Serre
HMAX-sum + max
多学科交叉研究Center for Cognitive and Neural Computation
Tsinghua University, Beijing
• Computational Neuroscience
• System Neuroscience
• Intelligent Technology and Systems
• Neural Information and Brain-computer interface
• Learning and Memory
• Cognitive Psychology
谢谢!