16
Product Quantization for Nearest Neighbor Search Author:xzx Date:2012.11.13

Product quantization for nearest neighbor search-report

Embed Size (px)

DESCRIPTION

用于近似最近邻查找的积量化算法

Citation preview

Page 1: Product quantization for nearest neighbor search-report

Product Quantization for Nearest Neighbor Search

Author:xzx

Date:2012.11.13

Page 2: Product quantization for nearest neighbor search-report

• 主要想法:– 将原始数据向量分解成低维的子向量段

– 对子向量段分别量化

– 原始的向量可用子向量段的索引的笛卡尔积进行表示

– 向量间的距离,可以用表示的值进行估计

本文提出了一个积量化(product quantization)的方法来进行近似最

近邻的查找。

Page 3: Product quantization for nearest neighbor search-report

• 对原始向量直接处理,randomized KD-tree和层次k均值

(hierarchical k-means)(这两个方法都被FLANN(fast library for approximate nearest neighbor search)采用)。

• binary coding,如SH(spectral hashing),RBM(restricted Boltzmann machine),boosting和LSH等方法,它们对原始的

特征向量进行二值压缩。

现在的方法如KD-tree对于高维数据并不

是很有效,甚至已经退化为穷尽搜索

了。

解决的方法:近似最近邻搜索

Page 4: Product quantization for nearest neighbor search-report

• 主要有两点优点:

– 1.提出搜索结果的距离选择多样选择,而Binary coding 的方法结果选择的距离可选择性很少。

– 2.结果能对向量间的L2度量进行估计,这在eps近邻搜索中很有用。

本文所提出方法属于第一类。

Page 5: Product quantization for nearest neighbor search-report

12

34

5

12

34

5

12

34

5

codebook原始向量

量化之后

积量化

(product quantization)

Page 6: Product quantization for nearest neighbor search-report

Symmetric distance computation(SDC):

Asymmetric distance computation(ADC):

距离估计

(distance approximation)

Page 7: Product quantization for nearest neighbor search-report

距离估计

(distance approximation)

Symmetric distance computation(SDC):

Page 8: Product quantization for nearest neighbor search-report

Asymmetric distance computation(ADC):

Page 9: Product quantization for nearest neighbor search-report

距离估计偏差

(distance approximation error)

Page 10: Product quantization for nearest neighbor search-report

解决穷尽搜索

的问题

倒排文件(inverted file system),索引相同的点组成一

个队列。首先学习一个字典(codebook),进行粗分

类,通过k-means方法,进行初步量化,建

立倒排文件。

计算残差向量

Product Quantization

Video-google

Page 11: Product quantization for nearest neighbor search-report

实验系统结构

Page 12: Product quantization for nearest neighbor search-report

索引算法

Page 13: Product quantization for nearest neighbor search-report
Page 14: Product quantization for nearest neighbor search-report

实验

• 数据集• SIFT: Flickr-INRIA Holidays images.• GIST:100k images from tiny image set.

• 实验结果

Page 15: Product quantization for nearest neighbor search-report

• 本文提出一个近似近邻查找算法

• 能够计算出两个向量的L2距离的估计

• 结合倒排表,解决穷尽搜索的问题

• 在搜索性能与内存使用量上,较state of the art方法有更好的效果。

• 扩展性能很强。

结论

Page 16: Product quantization for nearest neighbor search-report

谢谢!