ADAPTIVE INFORMATION RETRIEVAL
February 10, 2007 · Artificial Intelligence Lab · 문홍구
Text: Finding Out About, pages 252–291
Background
In the 1960s, the "deep understanding" of text promised by AI/NLP methods made IR's statistical character look "shallow."
Now: both machine learning and corpus-based linguistics share very similar statistical methods with IR.
Background
Training against Manual Indices
- The manual classification of documents into categories can be used as training data in the context of supervised learning.
- Manually constructed representations provide a kind of upper bound on what we can hope our automatic learning techniques can build.
Background
Source of Feedback
※ Supervised Learning
- The learning system observes a labeled training set consisting of (feature, label) pairs, denoted by {(x1, y1), …, (xn, yn)}.
- The goal is to predict the label y for any new input with feature x.
※ Reinforcement Learning
- The learning system repeatedly observes the environment "x," performs an action "a," and receives a reward "r."
- The goal is to choose the actions that maximize the future rewards.
Background
<Browsing across Queries in Same Session>
Building Hypotheses about Documents
Feature Selection
- Obvious features = keywords
- Documents are characterized by large (sparse) vectors => "irrelevant attributes abound"
- Distribution-based selection => mutual information
Building Hypotheses about Documents
Hypothesis Spaces
Learning Which Documents to Route
Document Modifications due to RelFbk
Learning Which Documents to Route
Widrow-Hoff
- The Widrow-Hoff algorithm is the best-understood and most principled approach to training a linear system to minimize this squared-error loss [Widrow and Hoff, 1960].
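The update rule can be sketched in a few lines; the learning rate, epoch count, and toy routing data below are illustrative assumptions, not details from the text:

```python
# Widrow-Hoff (LMS) update: w <- w + eta * (y - w.x) * x.
# Stochastic gradient descent on the squared error (w.x - y)^2.

def widrow_hoff(examples, eta=0.1, epochs=50):
    """examples: list of (feature_vector, target) pairs."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = y - pred
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical routing data: documents as tiny term-frequency vectors,
# target 1.0 = route to the user, 0.0 = do not route.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
w = widrow_hoff(data)
```

After training, the weight on the relevant term approaches 1 and the weight on the irrelevant term stays near 0.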
Classification
Training a Classifier
Classification
Modeling Documents
-Multivariate Bernoulli
Training a Classifier
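A minimal sketch of the multivariate Bernoulli model, where each document is a binary presence/absence vector over the vocabulary; the vocabulary, documents, labels, and Laplace smoothing choice are illustrative assumptions:

```python
import math

# Multivariate Bernoulli classifier: each term either occurs in a document
# or it does not; class-conditional presence probabilities are estimated
# with Laplace (add-one) smoothing.

def train_bernoulli(docs, labels, vocab):
    classes = set(labels)
    prior, cond = {}, {}
    for c in classes:
        idx = [i for i, l in enumerate(labels) if l == c]
        prior[c] = len(idx) / len(docs)
        cond[c] = {}
        for t in vocab:
            present = sum(1 for i in idx if t in docs[i])
            cond[c][t] = (present + 1) / (len(idx) + 2)  # Laplace smoothing
    return prior, cond

def classify(doc, prior, cond, vocab):
    best, best_lp = None, -math.inf
    for c in prior:
        lp = math.log(prior[c])
        for t in vocab:  # absent terms contribute log(1 - p) too
            p = cond[c][t]
            lp += math.log(p if t in doc else 1.0 - p)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

vocab = {"ball", "vote"}
docs = [{"ball"}, {"ball"}, {"vote"}]
labels = ["sports", "sports", "politics"]
prior, cond = train_bernoulli(docs, labels, vocab)
pred = classify({"ball"}, prior, cond, vocab)
```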
Other Approaches to Classification
Nearest-Neighbor Matching
- One of the most straightforward ways to classify documents: keep a rote memory of the training set T and retrieve the documents from T that are most similar to the new document to be classified.
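The rote-memory scheme can be sketched as follows, using cosine similarity on term vectors; the toy vectors and labels are invented for illustration:

```python
import math

# Rote-memory nearest-neighbor classification: store the training set
# verbatim and label a new document by its most similar stored document,
# measured with cosine similarity on term vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbor(train, query):
    """train: list of (vector, label); returns the label of the nearest vector."""
    return max(train, key=lambda vl: cosine(vl[0], query))[1]

train = [([3, 0, 1], "sports"), ([0, 4, 1], "politics")]
label = nearest_neighbor(train, [2, 0, 0])
```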
Boolean Predicates
- The RIPPER system
Covering Algorithm
Other Approaches to Classification
Combining Classifiers
Combining Experts
- Two experts were considered:
• a set of words as features
• phrase extraction
Other Approaches to Classification
Hierarchic Classification
The Fields of Machine Learning
The field of machine learning has traditionally been divided into three sub-fields:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
The Fields of Machine Learning
Supervised Learning
- The learning system observes a labeled training set consisting of (feature, label) pairs, denoted by {(x1,y1),…,(xn, yn)}.
- The goal is to predict the label y for any new input with feature x.
The Fields of Machine Learning
Unsupervised Learning
- The learning system observes an unlabeled set of items, represented by their features {x1, …, xn}.
- The goal is to organize the items. Typical unsupervised learning tasks include clustering, which groups items into clusters, and outlier detection, which determines whether a new item x is significantly different from the items seen so far.
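As one concrete instance of the outlier-detection task, a distance-from-the-mean rule can be sketched as follows; the three-standard-deviation threshold is a common rule of thumb, not something the text specifies:

```python
import math

# Flag a new item as an outlier when it lies more than k standard
# deviations from the mean of the items seen so far (1-D for simplicity).

def is_outlier(seen, x, k=3.0):
    mean = sum(seen) / len(seen)
    var = sum((v - mean) ** 2 for v in seen) / len(seen)
    std = math.sqrt(var)
    return abs(x - mean) > k * std if std > 0 else x != mean

seen = [10.0, 10.5, 9.5, 10.2, 9.8]
normal = is_outlier(seen, 10.1)   # close to everything seen so far
weird = is_outlier(seen, 25.0)    # far from everything seen so far
```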
The Fields of Machine Learning
Reinforcement Learning
- The learning system repeatedly observes the environment “x,” performs an action “a,” and receives a reward “r.”
- The goal is to choose the actions that maximize the future rewards.
The Fields of Machine Learning
Reinforcement Learning - Example: Cart-Pole
- State: the cart's velocity, its current position, and the pole's angle
- Reward: -1 when the pole falls, 0 while it stays up
- Goal: push the -1 as far into the future as possible
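A tabular Q-learning sketch on a toy two-state problem (much simpler than cart-pole, which needs a physics simulator); the environment, parameters, and step count are all illustrative assumptions:

```python
import random

# Tabular Q-learning on a toy problem: two states, two actions; choosing
# action == state pays reward +1, then the next state is random.
# Q[s][a] estimates the discounted future reward of taking a in s.

random.seed(0)

def step(state, action):
    reward = 1.0 if action == state else 0.0
    return reward, random.randint(0, 1)

Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.randint(0, 1)
    else:
        action = max((0, 1), key=lambda a: Q[state][a])
    reward, nxt = step(state, action)
    # temporal-difference update toward reward + discounted future value
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt

policy = [max((0, 1), key=lambda a: Q[s][a]) for s in (0, 1)]
```

The greedy policy learns to match the action to the state, which maximizes future reward.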
The Fields of Machine Learning
http://video.google.com/videoplay?docid=8226600171334714429&q=cart+pole
With more trials, the pole stays balanced longer before falling.
The Fields of Machine Learning
Semi-supervised Learning
- In many practical learning domains there is a large supply of unlabeled data but only limited labeled data, which can be expensive to generate.
- Semi-supervised learning learns from a combination of both labeled and unlabeled data.
The Fields of Machine Learning
Semi-supervised Learning
※ Supervised learning algorithms require enough labeled training data to learn reasonably accurate classifiers.
※ Unsupervised learning algorithms are employed to discover structure in unlabeled data.
※ A semi-supervised learning algorithm allows taking advantage of the strengths of both.
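One common semi-supervised scheme, self-training (the slides do not name a specific algorithm), can be sketched with a hypothetical one-dimensional nearest-centroid classifier; the data points are invented for illustration:

```python
# Self-training: fit on the labeled data, pseudo-label the unlabeled item
# the current model is most confident about, add it to the labeled set,
# and repeat until the unlabeled pool is exhausted.

def centroid_label(labeled, x):
    """1-D nearest-centroid classifier; returns (label, confidence margin)."""
    cents = {}
    for v, y in labeled:
        cents.setdefault(y, []).append(v)
    cents = {y: sum(vs) / len(vs) for y, vs in cents.items()}
    y = min(cents, key=lambda c: abs(x - cents[c]))
    conf = min(abs(x - cents[c]) for c in cents if c != y) - abs(x - cents[y])
    return y, conf

labeled = [(0.0, "a"), (10.0, "b")]    # scarce labeled data
unlabeled = [1.0, 2.0, 8.0, 9.0]       # plentiful unlabeled data

for _ in range(len(unlabeled)):
    best = max(unlabeled, key=lambda x: centroid_label(labeled, x)[1])
    y, _ = centroid_label(labeled, best)
    labeled.append((best, y))          # adopt the confident pseudo-label
    unlabeled.remove(best)

labels = sorted((v, y) for v, y in labeled)
```

The unlabeled points near each labeled seed inherit its label, so the two labeled examples end up covering the whole pool.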
Entropy
Entropy (or self-information)
- The average uncertainty of a single random variable.
- The amount of information carried by a random variable; the average of its uncertainty.
- Property: H(X) = 0 when the variable carries no information (its outcome is certain).

H(p) = H(X) = -\sum_x p(x) \log_2 p(x) = E\left[ \log_2 \frac{1}{p(X)} \right], \qquad H(X) \ge 0
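The definition can be checked numerically; for a fair coin the entropy is exactly 1 bit, and a certain outcome carries no information:

```python
import math

# Entropy H(X) = -sum_x p(x) log2 p(x): average uncertainty in bits.
def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

fair = entropy([0.5, 0.5])     # fair coin: maximal uncertainty
certain = entropy([1.0])       # certain outcome: no information
biased = entropy([0.9, 0.1])   # biased coin: somewhere in between
```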
Joint Entropy & Conditional Entropy
Joint Entropy
- The average amount of information needed to specify both values:

H(X, Y) = -\sum_x \sum_y p(x, y) \log p(x, y)

Conditional Entropy

H(Y \mid X) = \sum_x p(x) \, H(Y \mid X = x) = -\sum_x \sum_y p(x, y) \log p(y \mid x)

Chain Rule

H(X, Y) = H(X) + H(Y \mid X)
H(X_1, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, \ldots, X_{n-1})
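The chain rule H(X, Y) = H(X) + H(Y | X) can be verified numerically on a small joint distribution (the probabilities below are illustrative):

```python
import math

# Verify H(X, Y) = H(X) + H(Y | X) on a toy joint distribution p(x, y).
p = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}

def H_joint(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def H_marginal_x(p):
    px = {}
    for (x, _), v in p.items():
        px[x] = px.get(x, 0.0) + v
    return -sum(v * math.log2(v) for v in px.values() if v > 0)

def H_y_given_x(p):
    px = {}
    for (x, _), v in p.items():
        px[x] = px.get(x, 0.0) + v
    # -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y) / p(x)
    return -sum(v * math.log2(v / px[x]) for (x, _), v in p.items() if v > 0)

lhs = H_joint(p)
rhs = H_marginal_x(p) + H_y_given_x(p)
```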
Mutual Information
- The amount of information one random variable contains about another; a measure of dependence.
- I(X; Y) = 0 when the two variables are independent.
- I(X; Y) grows with the degree of dependence.

I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)}

(Figure: Venn diagram relating H(X), H(Y), H(X, Y), H(X | Y), H(Y | X), and I(X; Y).)
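The definition can be checked numerically: mutual information vanishes for an independent pair and reaches 1 bit when one fair binary variable fully determines the other (the distributions below are illustrative):

```python
import math

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ):
# 0 when X and Y are independent, larger the more dependent they are.

def mutual_information(p):
    px, py = {}, {}
    for (x, y), v in p.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * math.log2(v / (px[x] * py[y]))
               for (x, y), v in p.items() if v > 0)

independent = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.25, ("b", 1): 0.25}
dependent = {("a", 0): 0.5, ("b", 1): 0.5}   # Y fully determined by X

i_ind = mutual_information(independent)
i_dep = mutual_information(dependent)
```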
Mutual Information
Conditional Mutual Information

I(X; Y \mid Z) = I((X; Y) \mid Z) = H(X \mid Z) - H(X \mid Y, Z)

Chain Rule

I(X_{1n}; Y) = I(X_1; Y) + \cdots + I(X_n; Y \mid X_1, \ldots, X_{n-1}) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1})
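The identity I(X; Y | Z) = H(X | Z) - H(X | Y, Z) can be verified numerically on a small three-variable joint distribution (the probabilities are illustrative):

```python
import math

# Verify I(X;Y|Z) = H(X|Z) - H(X|Y,Z) on a toy joint p(x, y, z).
p = {(0, 0, 0): 0.2, (0, 1, 0): 0.1, (1, 0, 0): 0.1, (1, 1, 0): 0.1,
     (0, 0, 1): 0.05, (0, 1, 1): 0.15, (1, 0, 1): 0.25, (1, 1, 1): 0.05}

def H(dist):
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

def marginal(p, keep):
    """Marginalize the joint onto the coordinate positions in `keep`."""
    out = {}
    for xyz, v in p.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

# Conditional entropies via H(A|B) = H(A, B) - H(B).
h_x_given_z = H(marginal(p, (0, 2))) - H(marginal(p, (2,)))
h_x_given_yz = H(p) - H(marginal(p, (1, 2)))

i_xy_given_z = h_x_given_z - h_x_given_yz
```

Computing the symmetric form H(Y | Z) - H(Y | X, Z) gives the same value, as the chain rule requires.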
SVM (Support Vector Machine)
Developed by Vapnik in 1995 for binary classification.
- The binary classification problem is the process of estimating, from the collected training data, a target function that separates the two classes.
Objective: in the n-dimensional vector space given by the training data, find the single hyperplane that maximizes the distance between the separating boundary and the points of the two classes. This linear separating boundary is called the OSH (optimal separating hyperplane), and the points closest to the OSH are called support vectors.
The n-dimensional OSH is expressed as the set of points satisfying wx + b = 0, where w is an n-dimensional direction (normal) vector and b is a bias term.
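Once w and b are known, classifying a point is just taking the sign of wx + b; the weights below are hand-picked for illustration, not the result of actual SVM training (which requires solving a quadratic program):

```python
# Classification with a separating hyperplane w.x + b = 0:
# points with w.x + b > 0 fall on one side, < 0 on the other.

def classify(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = [1.0, -1.0], 0.0            # hypothetical separating hyperplane
pos = classify(w, b, [2.0, 0.5])   # w.x + b = 1.5 > 0
neg = classify(w, b, [0.5, 2.0])   # w.x + b = -1.5 < 0
```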
SVM (Support Vector Machine)
Hyperplanes in a linearly separable space
- H: the OSH (optimal separating hyperplane)
- H1, H2: the hyperplanes marking the boundaries of the two vector groups
- The three vectors touching H1 and H2 are the support vectors
SVM (Support Vector Machine)
Example: classifying spam mail with an SVM