新的挑战 - 基于内容的信息处理

新的挑战 - 基于内容的信息处理

张钹清华信息科学与技术国家实验室智能技术与系统国家重点实验室

清华大学信息学院、计算机系

一、经典信息论Classical Information

Theory

Semantics vs. Information Processing

• Frequently the messages have meaning: that is

they refer to or are correlated according to some

system with certain physical or conceptual entities.

• These semantic aspects of communication

irrelevant to the engineering problem.

C. E. Shannon, A mathematical theory of communication, 1948

经典通讯模型

• Communication Model:

Sender

Receiver

(Markov) Stochastic Process

-C. E. Shannon

( ) ( ) ( )P x P y x P x y

图灵计算理论Turing Computation Theory

Deterministic Turing Machine (1937)

Computability

Algorithms

Computational Complexity

P Read-write Head

tape -4 -3 -2 -1 0 +1

QFinite State Controller

经典信息论下的信息处理Information Processing

The conversion of latent information intomanifest information _C. E. Shannon

X -p(x) Y -

Dissipation Equivocation

Transformation=equivocation-dissipation

Uncertainty (error, noisy, deformation,…)

( )p x y

经典信息论在文本、图像处理中的应用

The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel

• Communication: coding, data

compression, noisy-channel coding,..• Text: editor, compression, spell and syntax correction,…• Image: editor, lossless compression, noise suppression, enhancement, …

二、基于内容（语义）的信息处理

Information (Web) Age

• Complex (variety of ) information:

text, image, speech, video,…

• A huge amount of data

• Man-Machine Interaction

Content (Semantics) based Information Processing

Information Retrieval, Classification, Summary,

Recognition, Understanding,..

Computer

Input

(Encoding)

Output

(Decoding)

Users

Codes Form

Content

从处理形式（ Form ）到与内容（ Semantics）相关的处理

Natural Man-Machine Interaction

Computer Network

三、句法分析（ Syntact Analysis ） Holistic Coding (Gestalt)

The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches.

数字视频编码技术发展至今已有半个世纪的历史，已取得很大的进展。从五十年代的差分预测编码，到七十年代的变换编码、基于块的运动预测编码，直到如今兴起的分布式编码、立体视编码、多视编码、视觉编码等等

11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100

00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111

Semantics S Code (Form) X

?

Rule-based, Knowledge-based, Top-down, Parsing (text)

Advantages: Deliberative behaviors (AI expert systems: decision making, diagnosis, design,…) Specific domain, Small-size

Disadvantages: Perception, Common sense, Nature language,..

Uncertainty (exception, ambiguity, vague,..)

Detector （检测子） Semantically meaningful primitives

Text: word-sentence-chapter-

Image: subpart-part-object-

There is no clear boundary among parts

Segmentation Problem

Descriptor （描述子） Structural uncertaintyK. S. Fu, Syntactic pattern recognition, New York: Prentice-Hall, 1974

D. Marr, Vision, New York: Freeman, 1982

图像分割（分词）

Where is the object ?

What is the object ?

Chicken orEgg ?

结构分析Structural analysis, Rule-based,

Syntax

11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100

Image: Part, Object, … Text: Words, Sentences, …

• Uncertainty problem

• Scalability

Syntactic Analysis Faced Difficulty !


四、概率统计模型

Image (text) Classification:

Categories

Low level and local features (words)

Computer easily detectable but less

semantically meaningful

Colors, Textures, Bag of words

不确定性处理 Uncertainty Management

• How to deal with uncertainty ?

• A probabilistic information processing model

Regularity:

Probabilistic distribution

Examples:

Caltech 101, 25 objects

R. O. Duda & P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973

* * * * * * * * * * * * ** * * ** * · · · · ·

··· · ·· ·· ·· · ······· ·· · · · · ·

词袋法 Bag of (Visual) Words

• Defined in image patches (2005-06)

• Descriptors extracted around interest points

(2002-2004)

• Edge contours (2005-06)

• Regions (2005-06)

检测子与描述子 Detector & Descriptor

Kadir salience

region (points)

Histograms of Oriented Gradients (HOG)

-72 dimensionZuo Yuanyuan (2010-)

1

1 arg min( ( , ))1( )

0

n iv V

i

if w D v rCB w

n otherwise

CB-Codebook (Histograms)

n-the number of regions in an image

ri-an image region i

D(w,ri)-the distance between a codeword w

and region ri

V-vocabulary

数据空间的稀疏结构 - Sparsity

高维空间中的低维结构 -low-dimensional structure of high-dimensional space

Objects Precision

Airplane 0.947

Car-side 0.987

Dalmatian 0.733

Faces 0.760

Leopard 0.827

Pagoda 0.760

Stop-sign 0.800

Windsor-chair 0.787 Data Structure

优化yAxtosubjectxx

11 minargˆ

211 minargˆ yAxtosubjectxx

Sparse representation in sample space

L1 norm

图像库

实验结果

数据的空间结构（非稀疏性）

数据驱动法（ Data-driven ）

-learning from web data

Germany

Speech Recognition

Speech Synthesis

Text1Translation

Service

Text2

English

Learning from Data

(probabilistic models)

Annotation

Microsoft Research Asia

概率方法的基本缺陷• The semantic gap between low-level local features and high-level global concepts

Less semantically meaningful features: colors or their distribution (histogram), gray-values or their distribution, visual words (descriptors from interest points), image patches, image regions, edge, …

• Lack of structural knowledge

Generalization capacity

Information processing without understanding

五、新的研究方向

Sender

Reader

X X

S Uncertainty F (W,D)

句法分析的复苏Holistic (Gestalt), Probability, Inference

Information Structure Analysis

• Syntactic + Probabilistic

• Data-driven + Knowledge-driven

• Bottom-up + Top-down

• Part-based (Shape-based)

信息结构 Information Structure


数字视频编码技术发展至今已有半个世纪的历史，已取得很大的进展。从五十年代的差分预测编码，到七十年代的变换编码、基于块的运动预测编码，直到如今兴起的分布式编码、立体视编码、多视编码、视觉编码等等

11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100

00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111

Semantics Code (Holistic)

?

信息结构分析

History (Structural Analysis)(1) LinguisticsInformation Structure-Syntactic representationInformation packaging, Building the semanticsFocus, Topic, background, comment,…

• M. A. K. Halliday (1967) Notes on transitivity and theme in English, Part II, Journal of Linguistics 3:199-244• W. L. Chafe, Language and consciousness, Language 50:111-133

(2) Psychology Structural Information Theory• Coding: descriptive complexities_ frequencies of occurrence (Shannon)• Information Structure: Formalization of visual regularity: iteration, symmetry, and alternation

E. L. J. Leeuwenberg (1968) Structural information of visual patterns: an efficient coding system in perception, The Hague: MoutonE.L. J. Leeuwenberg (1969), Quantitative specification of information in sequential patterns, Psychological Review, 76, 216-220

(3) Information Science Algorithmic Information Theory Kolmogorov Complexity 16 bits 1111111111111111 0101101001011010 1001110111010111 1100010001001011

R. J. Solomonoff (1964), A formal theory of inductive inference,

Information & Control, V.7, No.1, pp1-22, No.2, pp. 224-254

A. N. Kolmogorov (1965), Three approaches to the definition of

the quantity of information, Problems of Information Transmission, No. 1, pp. 3-11

信息结构的挖掘与利用• Kolmogorov complexity

• Related data structures

low-dimensional structure in high-dimensional

data space

• Under the probabilistic (structure) framework

• Learning from human being

(1) Kolmogorov Complexity

min : ( ) ,( )

p f p x if x ran fK x

otherwise

The absolute measure of information content

in an individual finite object

Algorithmic information theory

The minimum description length

信息距离 Information Distance

max ( ), ( )( , )

max ( ), ( )

K x y K y xd x y

K x K y

K(.) is non-computable

C. Bennett , et al, Information distance, IEEE Trans. on Information Theory, vol.44, no.4, 1998, pp.1407-1423

min ( ), ( )( , )

min ( ), ( )

K x y K y xd x y

K x K y

A statistical approach to Kolgomorov complexity

(information distance)

f(x)-frequency

N-total number

( , ) ( , )

log ( , ) max log ( ),log ( )

log min log ( ),log( )

d x y NSD x y

f x y f x f y

N f x y

近似计算方法

语义距离与形式距离 Semantic & Formal Distance

max ( ), ( )( , )

max ( ), ( )

C x y C y xCDM x y

C x C y

CDM-Compressed Distance Measure (ASCII)

NSD-Normalized Statistical Distance

Xian Zhang, Yu Hao, et al, New information distance measure and its application in question answering system, J. Comput. Sci. & Tech. 23(4), 557-572, 2008

两篇文本之间的信息距离

1 1 11 12 1

2 2 21 22 2

, ,...,

, ,...,

n

m

T W w w w

T W w w w

Text representation: a bag of words

max 1 2 1 2 2 1

1 2 1 2

2 1 2 1

( , ) max ( ), ( )

( ) ( \ )

( ) ( \ )

( ) ( ), ( ) log ( )

i ji j

i ji j

w W

D T T K T T K T T

K T T K w w

K T T K w w

K W K w k w f w

文本（图像）检索

1

2

3

...

n

t

t

t

t

Image-Text

information distance

Web. . . .

. . .

.

. . .

.

. . .

.

. . .

.

. . .

.

. . . .

. . .

.

. . .

.

. . .

.

. . .

.

. . .

.

1

2

.

n

i

i

i

(2) 数据结构 -low-dimensional structure of high-dimensional space

Sparse Representation1 1

arg minx x subject to

Ax y or

BAx z

A: mn matrixx: sample space, n-dimensiony: pixel, m-dimensionz: feature space, d-dimension Yi Ma, UIUC, USA

(3) 扁平的结构模型 -Image Region Annotation: horse, sky, mountain, grass, tree

Yuan Jinhui (2008-)

区域自适应的网格划分 Region-adaptive Grid Partition

,,

* arg max ( ) arg max ( ) ( )

1( , ) exp ( ( ), ) ( , )

( , ) ( ) ( )

z z

i i i i j i ji i j

z p z I p I z p z

p I z f I x z g z zZ

p I z p I z p z

I-dataxi-image position

z-state (feature)g(zi,zj)-the probabilistic region-basedconstraints (co-occurrence)

不同的扁平结构模型

模型学习

n images, each image has mi=HV grids

( , ) ( , ), 1,2...,

( , ) ( , ), 1,2,...,

i i

j ji i i i i

x y x y i n

x y x y j m

(a) i.i.d generative model

(b) i.i.d. discriminative model

(c) 2-dimensional hidden Markov (2D HMM)

(d) Markov Random Field (MRF)

(e) Conditional Random Field (CRF)

标注配置

( , ), 1,2,...,i ix y i N

Given a training data,

MAP (maximal a posterior) : label configuration

1: 1:* arg max ( )m my P y x

For 2D HMM, MRF, CRF

using path limited Viterbi algorithm

Probabilistic distribution P Cs: labeling clique, C0: labeling and feature cliquey* the optimal label configuration

0

1: 1:

( , ) ( , )

1( , ) ( , ) ( , )

i j s k k

m m i j k k

y y C y x C

P x y y y y xZ

1:0( , ) ( , )

* arg max ( , ) ( , )m

i j s k k

i j k k

yy y C y x C

y y y y x

马尔科夫随机场模型

－ MRF model

结构预测学习算法Learning Rules

Classification

Structural Prediction

Maximal Joint Likelihood Estimation

Naïve Bayesian Network

Hidden Markov Model (1966)1

Maximal Conditional Likelihood Estimation

Logistic Regression

Conditional Random Field (2001)2

Maximal Margin Learning

SVM Maximal Margin Markov Net (2003)3

Maximal Entropy Discrimination Learning

Maximal Entropy Discrimination Model

Maximal Entropy Discrimination Markov Net (2008)4

相关文献[1] L. E. Baum and T. Petrie. Statistical Inference for Probabilistic Functions of Finte State Markov Chains. The Annals of Mathematical Statistics, Vol. 37, No. 6, pp.1554-1563, 1966

[2] J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of International Conference on Machine Learning (ICML), 2001

[3] B. Taskar et al., Max-Margin Markov Networks. Advances in Neural Information Processing Systems (NIPS), 2003

[4] J. Zhu et al., Laplace Maximum Margin Markov Networks, In Proc. of International Conference on Machine Learning (ICML), 2008

实验设置• 4002 Corel images (384256 or 256384)

• 11 basic (region) concepts

• Features: color moment + wavelet

• 5 models: 2 without structural knowledge

(GMM, SVM)

3 with structural knowledge

(HMM*, RMF*, CRF*)

图像区域标注类别

不同模型的比较

GMM: Gaussian Mixture Model (30 components)

SVM: Support Vector Machine

Gaussian kennel, one-against-one

HMM: Hidden Markov Model

RMF: Random Markov Field

CRF: Conditional Random Field

Limited Path Viterbi Algorithm

实验结果

(4) Latent Hierarchical Model

Web Page Extraction Name, Image, Price, Description, etc.

Web Page

Data Record

Image

Name Desc Price Note

Desc

Note

Data Record

Image

Desc Desc Name Desc Price Note Note

Given Data Record

Hierarchy Computational

efficiency Long-range

dependency Joint extraction

{image}

{name, price}

{name}

{price}

{name}

{price}

{image}

{name, price}

{desc}

{Head}

{Tail}{Info Block}

{Repeat block}

{Note}

{Note}

Zhu Jun (2008- )

Experiment Web page extraction Name, Photo, Price ,

Description Models Multi-lager CRFs, Multi-layer

M^3N, PoMEN, Partially observed HCRFs

Data set: 37 TemplateTraining: 185 (5/per template)

pages, or 1585 data records

Testing: 370 (10/per template) pages, or 3391 data records

Web Page

Data Record

Image

Name Desc Price Note

Desc

Note

Data Record

Image

Desc Desc Name Desc Price Note Note

Results

23/4/22 博士论文预答辩 55

Performances: Avg F1:

o avg F1 over all attributes

Block instance accuracy:

o % of records whose Name, Image, and Price are correct

每个属性的性能

(5) 人脑视觉机制

Top-down feedback

Top-down feedback

Local connection

Data-driven V1 V2 … IT

High-level

Prior-knowledge

自底向上与自顶向下

• 114 basic functions

Images 1212 pixels

• HOG

视觉基元 Visual Primitive Elements

稀疏编码（ Spare Coding ）

( , ) ( , )i ii

I x y a x y2

( , ) ( , ) ii i

xy i i

aE I x y a x y S

( , )I x y

( , )i x y

ia

Images

Basic functions

Coefficients

A non-linear function2log(1 )S x or x

Object Recognitionwith sparse, localizedfeatures

MIT-CSAIL-TR-2006-028 T. Serre

HMAX-sum + max

多学科交叉研究Center for Cognitive and Neural Computation

Tsinghua University, Beijing

• Computational Neuroscience

• System Neuroscience

• Intelligent Technology and Systems

• Neural Information and Brain-computer interface

• Learning and Memory

• Cognitive Psychology

谢谢！

Documents

新的挑战 - 基于内容的信息处理