Nearest Neighbor Search and Product Quantization

Nearest Neighbor Search (NNS)

NGUYEN ANH TUAN

Bachelor 4th year, Faculty of Engineering, the University of Tokyo

May 14, 2014

NGUYEN ANH TUAN Group study memo 1 / 45

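Self-introduction

2008: Came to Japan

2009-2012: Akashi National College of Technology, Hyogo Prefecture. Graduation thesis: "A digital watermarking method using edge-direction histograms" (target: document images)

2012: Transferred into the second year at the University of Tokyo

Favorite programming languages: C/C++ and Java; recently I have been trying Ruby.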

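Today's agenda

1 Overview: problem definition; explanation of the definition (space and distance, dimension); challenge: the curse of dimensionality

Related problems: k-nearest neighbors, Voronoi diagrams, approximate nearest neighbor search

2 Algorithms: linear search, kd-tree, Locality Sensitive Hashing, others

3 Recent research: Vector Quantization, Product Quantization, relation to nearest neighbor search

4 Summary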

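Problem definition

Nearest Neighbor Search (NNS problem)

In the D-dimensional Euclidean space $\mathbb{R}^D$, write a vector as $x = (x_1, x_2, \ldots, x_D)$ and the distance as

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_D - y_D)^2}$$

Then:

Input: a subset $Y \subset \mathbb{R}^D$ and a vector $x \in \mathbb{R}^D$

Output: the element NN(x) of Y that is "closest" to x.

In short, given the input (Y, x), the problem is to find

$$\mathrm{NN}(x) = \arg\min_{y \in Y} d(x, y) \tag{1}$$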

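Explanation

Space and distance

High and low dimensions

Distances in Euclidean space

Euclidean distance → intuitive, the ordinary distance

→ Minkowski distance

$$d_p(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p}$$

→ When p = 2, this is the Euclidean distance

When p = 1, this is the Manhattan distance

$$d_M(x, y) = \sum_{i=1}^{D} |x_i - y_i| \tag{2}$$

Question: Are Minkowski distances with p > 2 ever used?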
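As a small illustration of the Minkowski distance defined above, the following Python sketch evaluates it for a few values of p. The function name `minkowski_distance` and the sample vectors are assumptions made for this example only; they do not come from the slides.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p), for p >= 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Hypothetical sample vectors in R^2.
x = (0.0, 0.0)
y = (3.0, 4.0)

print(minkowski_distance(x, y, 1))  # p = 1: Manhattan distance
print(minkowski_distance(x, y, 2))  # p = 2: Euclidean distance
print(minkowski_distance(x, y, 4))  # an example with p > 2
```

For a fixed pair of vectors the value does not increase as p grows, and it approaches the largest coordinate difference.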

Hamming Space

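Hamming space: with H = {0, 1}, the space of all bit strings of length D is called the D-dimensional Hamming space $H^D$.

Hamming distance → the dissimilarity between two bit strings

$$d_H(x, y) = \left| \{\, i \mid x_i \neq y_i \,\} \right| \tag{3}$$

Example: the dissimilarity between 00011 and 00101 is 2.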
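Equation (3) can be checked directly on the example above. This is a minimal Python sketch; representing bit strings as Python `str` values and the name `hamming_distance` are assumptions for illustration.

```python
def hamming_distance(x, y):
    """Count the positions i at which x_i and y_i differ, as in equation (3)."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length D")
    return sum(1 for a, b in zip(x, y) if a != b)


print(hamming_distance("00011", "00101"))  # prints 2
```

When bit strings are packed into machine integers, the same quantity is usually computed as the population count of the XOR of the two words.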

Explanation

Space and distance

Reference 1

Theorem 1 (quoted)

Let F be an arbitrary distribution of n points (from a database of N uniformly distributed points), and let the distance function be a k-norm Minkowski distance function inside a Euclidean space $\mathbb{R}^D$. Then

$$C_k \le \lim_{D \to +\infty} \left[ \frac{d_{\max} - d_{\min}}{D^{1/k - 1/2}} \right] \le (N - 1)\, C_k \tag{4}$$

where $d_{\max}$ and $d_{\min}$ are the farthest and nearest distances from a point in F to the query point, respectively, and $C_k$ is a constant that depends on k.

1 A. Hinneburg et al., “What Is the Nearest Neighbor in High Dimensional Spaces?”, Proceedings of the 26th International Conference on Very Large Data Bases, pp. 506-515, 2000

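When a Minkowski distance $d_k(x, y)$ with k > 2 is used (k plays the role of p in the earlier definition),

and the dimension D of the space is increased,

$d_{\max} - d_{\min}$ converges to 0. The reason is that for k > 2 the exponent 1/k − 1/2 in (4) is negative, so $D^{1/k - 1/2} \to 0$ while the ratio in (4) stays bounded.

→ In high dimensions, with a Minkowski distance with k > 2, the shortest and longest distances in the dataset become almost identical.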
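The shrinking gap can also be observed numerically. The sketch below is a small simulation, not a proof of the theorem: it assumes points drawn uniformly from $[0, 1]^D$, a single random query, 100 data points, k = 4, and a fixed random seed, none of which come from the slides. The function names are likewise hypothetical.

```python
import random


def minkowski(x, y, k):
    """k-norm Minkowski distance between two equal-length sequences."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def distance_gap(dim, k, n_points=100, seed=0):
    """Return d_max - d_min from one random query to n_points random points.

    Assumption: all coordinates are drawn uniformly from [0, 1].
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(minkowski(query, point, k))
    return max(dists) - min(dists)


# Print the gap for a low, a medium, and a high dimension with k = 4.
for dim in (2, 64, 4096):
    print(dim, distance_gap(dim, k=4))
```

Under these assumptions the printed gap should be noticeably smaller at D = 4096 than at D = 2, in line with the behavior that (4) predicts for k > 2.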

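The curse of dimensionality

At first glance, this looks like an ordinary problem.

The simplest method: linear search → time complexity O(nD), where n is the number of elements of Y

Once the dimension exceeds a certain value, any algorithm is effectively equivalent to linear search (exhaustive search).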
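The linear search mentioned above can be sketched as follows. It evaluates equation (1) by brute force with the Euclidean distance, which costs O(D) per element and O(nD) in total. The name `nearest_neighbor` and the sample data are assumptions for this example.

```python
import math


def nearest_neighbor(points, query):
    """Return (NN(query), distance) by scanning every element of points."""
    if not points:
        raise ValueError("points must not be empty")
    best_point = None
    best_dist = math.inf
    for y in points:
        # Euclidean distance between the query and the candidate y: O(D).
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, y)))
        if dist < best_dist:
            best_point, best_dist = y, dist
    return best_point, best_dist


# Hypothetical data set Y and query x in R^2.
Y = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
x = (0.9, 1.2)
print(nearest_neighbor(Y, x))
```

In this sample the nearest element is (1.0, 1.0). Ties are resolved in favor of the element that appears first in the list, which is a design choice of this sketch.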
