Empowering visual categorization with the GPU

Presented by 陳群元

Outline

Introduction
Overview of visual categorization: image feature extraction, category model learning, test image classification
GPU accelerated categorization
Experimental setup
Results

Introduction

Use the GPU to accelerate the quantization and classification components of a visual categorization architecture.

The algorithms and their implementations should push the state of the art in categorization accuracy.

Visual categorization must be decomposable into components so that bottlenecks can be located.

Given the same input, implementations of a component on different hardware architectures must give the same output.

Overview

Visual categorization system

Image feature extraction: point sampling strategy -> descriptor computation -> bag-of-words
Category model learning
Test image classification


Point sampling strategy

Dense sampling: typically, around 10,000 points are sampled per image.

Salient point methods:
Harris-Laplace salient point detector [29]
Difference-of-Gaussians detector [28]
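
Dense sampling can be illustrated with a minimal sketch that places sample points on a regular pixel grid. The helper name `dense_grid` and the 6-pixel step are illustrative assumptions; the slides give only the approximate point count, not the grid spacing or the number of scales.

```python
def dense_grid(width, height, step):
    """Return (x, y) sample positions on a regular grid with `step`-pixel spacing.

    Hypothetical helper: the step is a free parameter, not a value from the slides.
    """
    offset = step // 2  # start half a step from the border so samples are centred
    return [(x, y)
            for y in range(offset, height, step)
            for x in range(offset, width, step)]


points = dense_grid(640, 480, 6)
print(len(points))  # 8560 (107 columns x 80 rows)
```

A 6-pixel step on a 640x480 frame yields 8,560 points at a single scale; a slightly finer step (5 pixels gives 12,288) or several sampling scales moves the count to or beyond the roughly 10,000 points per image mentioned above.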


Descriptors

SIFT descriptor: 128 dimensions; 10 frames per second for 640x480 images (GPU)

SURF descriptor: 100 frames per second for 640x480 images (GPU)

ColorSIFT descriptor: 384 dimensions, i.e. three SIFT descriptors (3 × 128)


Bag-of-words

Vector quantization is computationally the most expensive part of the bag-of-words model.

Bag -> an image (a set of features); words -> features.

Bag-of-words

For n descriptors of length d in an image and a codebook with m elements, quantization costs O(ndm) per image.

A tree-based codebook reduces this to O(nd log m) and runs in real time on the GPU [25].
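
The O(ndm) cost comes from comparing every descriptor with every codebook element. A minimal NumPy sketch of this brute-force quantization follows; the function name `bow_histogram` is hypothetical, and both the squared Euclidean distance and the hard (nearest-codeword) assignment are assumptions, since the slides do not state them at this point.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Brute-force vector quantization into a bag-of-words histogram (hard assignment).

    descriptors: (n, d) array; codebook: (m, d) array. Every descriptor is compared
    with every codebook element, so the cost is O(n * d * m).
    """
    hist = np.zeros(codebook.shape[0], dtype=np.int64)
    for x in descriptors:
        diff = codebook - x                        # (m, d) differences to all codewords
        dist = np.einsum("ij,ij->i", diff, diff)   # (m,) squared Euclidean distances
        hist[np.argmin(dist)] += 1                 # vote for the nearest codeword
    return hist


# Toy example: two codewords in 2-D and three descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
print(bow_histogram(descriptors, codebook))  # [2 1]
```

The histogram (often normalized afterwards) is the fixed-length image representation that the category models are trained on, regardless of how many descriptors an image has.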

Category model learning

Precompute the kernel function values, then use them in a kernel-based SVM algorithm.
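
To see why precomputing kernel values is worthwhile, recall the textbook form of the kernel SVM decision function (standard background, not reproduced from the slides). Here N is the number of training images, y_i in {-1, +1} their labels, alpha_i the learned dual weights, b the bias and K the kernel:

```latex
f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
\text{training uses } K(\mathbf{x}_i, \mathbf{x}_j) \text{ for all } i, j \in \{1, \dots, N\}.
```

Training works on the full N × N matrix of kernel values, and classifying a test image needs its kernel values against the training images with alpha_i ≠ 0 (the support vectors). These values can be computed once and reused, for example across the SVMs trained for different categories on the same features.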

Support Vector Machines

Kernel Support Vector Machines


Test image classification

Outline

Introduction
Overview of visual categorization: image feature extraction, category model learning, test image classification
GPU accelerated categorization: parallel programming on the GPU and CPU, GPU-accelerated vector quantization, GPU-accelerated kernel value precomputation
Experimental setup
Results

Parallel Programming on the GPU and CPU

SIMD instructions perform the same operation on multiple data elements at the same time.

GPU-Accelerated Vector Quantization

The most expensive computational step in vector quantization is the calculation of the n × m distance matrix between the image descriptors and the codebook elements.

A: n × d matrix with all image descriptors as rows

B: m × d matrix with all codebook elements as rows

GPU-Accelerated Vector Quantization (cont.)

GPU-Accelerated Vector Quantization (cont.)

Compute the dot products between all rows of A and B (line 7 of the algorithm).

Matrix multiplication is a building block of many algorithms, and highly optimized BLAS linear algebra libraries containing this operation exist for both the CPU and the GPU.
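
The dot-product formulation follows from expanding the squared Euclidean distance: ||a - b||² = ||a||² + ||b||² - 2 a·b. A NumPy sketch of this idea is below; it assumes squared Euclidean distance, and the function names and toy sizes are hypothetical. NumPy's `@` operator delegates to the BLAS library NumPy was built against, playing the role of the CPU BLAS; a GPU version would replace that product with a GPU BLAS matrix multiply (such as cuBLAS).

```python
import numpy as np


def distance_matrix(A, B):
    """Squared Euclidean distances between all rows of A (n x d) and B (m x d).

    The dominant cost is the single matrix product A @ B.T, handled by BLAS.
    """
    sq_a = np.einsum("ij,ij->i", A, A)[:, None]   # (n, 1) squared row norms of A
    sq_b = np.einsum("ij,ij->i", B, B)[None, :]   # (1, m) squared row norms of B
    dots = A @ B.T                                # (n, m) all pairwise dot products
    d2 = sq_a + sq_b - 2.0 * dots
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(d2, 0.0)


def quantize(A, B):
    """Index of the nearest codebook element (row of B) for each descriptor in A."""
    return np.argmin(distance_matrix(A, B), axis=1)


rng = np.random.default_rng(0)
A = rng.standard_normal((500, 128))   # toy descriptors (n = 500, d = 128)
B = rng.standard_normal((40, 128))    # toy codebook (m = 40)
D = distance_matrix(A, B)
print(D.shape)  # (500, 40)
```

Compared with the per-descriptor loop, the arithmetic is the same O(ndm), but it is concentrated in one large matrix product, which is exactly the operation BLAS libraries optimize on both the CPU and the GPU.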

GPU-Accelerated Kernel Value Precomputation

To compute kernel function values, we use a kernel function based on a distance between feature vectors.

First, the distance between feature vectors F and F’ is computed.

Then, the kernel function is evaluated on this distance.
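
Because the slide's two formulas did not survive extraction, the sketch below assumes one common choice for bag-of-words histograms: the χ² distance dist(F, F') = 1/2 Σ_i (F_i - F'_i)² / (F_i + F'_i) and the kernel K(F, F') = exp(-dist(F, F') / D), where D is a normalization parameter (for instance the average distance over the training set). Treat both formulas and the names `chi2_distance` and `chi2_kernel` as assumptions, not as the exact definitions from the slides.

```python
import numpy as np


def chi2_distance(F, G, eps=1e-12):
    """Chi-square distance between two non-negative histograms (assumed form).

    dist(F, G) = 1/2 * sum_i (F_i - G_i)^2 / (F_i + G_i). The small `eps` avoids
    0/0 for bins that are empty in both histograms (those bins contribute 0).
    """
    F = np.asarray(F, dtype=np.float64)
    G = np.asarray(G, dtype=np.float64)
    return 0.5 * np.sum((F - G) ** 2 / (F + G + eps))


def chi2_kernel(F, G, D):
    """Kernel value exp(-dist(F, G) / D); D is a positive normalization constant."""
    return np.exp(-chi2_distance(F, G) / D)


print(chi2_kernel([1, 0, 1], [0, 1, 1], D=1.0))  # approximately exp(-1) = 0.368
```

Identical histograms give distance 0 and kernel value 1; the kernel decays toward 0 as histograms differ, with D controlling how quickly.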

GPU-Accelerated Kernel Value Precomputation (cont.)

Multiple input features

For kernel value precomputation, memory usage is an important problem: for a dataset with 50,000 images, the input data is 12 GB and the output data is 19 GB. To avoid holding all data in memory simultaneously, we divide the processing into evenly sized chunks (1024 × 1024).
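
A minimal sketch of the chunked strategy follows. It assumes a χ² distance and an exponential kernel exp(-dist / D) (both assumptions, since the slides' formulas are not recoverable), and the names `chi2_distance_block` and `kernel_tiles` are hypothetical. The generator hands back one tile at a time so the caller can write each tile out instead of keeping the full kernel matrix; the default `chunk=1024` mirrors the chunk size above.

```python
import numpy as np


def chi2_distance_block(X, Y, eps=1e-12):
    """Chi-square distances between every row of X (p x k) and every row of Y (q x k).

    Works one row of X at a time so temporary memory stays at q x k values.
    """
    out = np.empty((X.shape[0], Y.shape[0]))
    for r, x in enumerate(X):
        diff = Y - x
        out[r] = 0.5 * np.sum(diff * diff / (Y + x + eps), axis=1)
    return out


def kernel_tiles(X, Y, D, chunk=1024):
    """Yield (row_start, col_start, tile) with kernel values exp(-dist / D) for one tile.

    Tiles are produced one at a time, so the caller can store each tile (e.g. on
    disk) instead of holding the whole kernel matrix in memory.
    """
    for i in range(0, X.shape[0], chunk):
        for j in range(0, Y.shape[0], chunk):
            dist = chi2_distance_block(X[i:i + chunk], Y[j:j + chunk])
            yield i, j, np.exp(-dist / D)


# Toy usage: 10 random 8-bin histograms, assembled here only because they are tiny.
rng = np.random.default_rng(1)
X = rng.random((10, 8))
K = np.empty((10, 10))
for i, j, tile in kernel_tiles(X, X, D=1.0, chunk=4):
    K[i:i + tile.shape[0], j:j + tile.shape[1]] = tile
```

With chunk=1024, each tile holds 1024 × 1024 = 1,048,576 kernel values: 4 MiB in single precision, or 8 MiB in double precision as in this sketch. Edge tiles are simply smaller when the number of images is not a multiple of the chunk size.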

GPU-Accelerated Kernel Value Precomputation (cont.)

EXPERIMENTAL SETUP

Experiment 1: Vector Quantization Speed
The CPU implementation is SIMD-optimized. We use a codebook of size m = 4,000, 20,000 descriptors per image, and descriptor lengths of d = 128 (SIFT) and d = 384 (ColorSIFT).

Experiment 2: Kernel Value Precomputation Speed
We have chosen the large Mediamill Challenge training set of 30,993 frames.

Experiment 3: Visual Categorization Throughput

The comparison is made between the quad-core Core i7 920 CPU (2.66 GHz) and the GeForce GTX 260 GPU (27 cores).
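
For a sense of scale, the parameters of Experiments 1 and 2 give the following operation counts. This is a back-of-the-envelope estimate that counts one multiply-add per term of the n × m dot products in the distance matrix and one value per entry of a square training kernel matrix; the paper's own accounting may differ.

```latex
\begin{aligned}
\text{SIFT: } & n\,d\,m = 20{,}000 \times 128 \times 4{,}000 = 1.024 \times 10^{10} \text{ multiply-adds per image} \\
\text{ColorSIFT: } & n\,d\,m = 20{,}000 \times 384 \times 4{,}000 = 3.072 \times 10^{10} \text{ multiply-adds per image} \\
\text{Kernel matrix: } & 30{,}993^2 = 960{,}566{,}049 \approx 9.6 \times 10^{8} \text{ entries}
\end{aligned}
```

ColorSIFT quantization is therefore exactly three times the work of SIFT quantization, and since the training kernel matrix is symmetric, roughly half of its entries determine the rest.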

Results

Experiment 1: Vector Quantization Speed
Experiment 2: Kernel Value Precomputation Speed
Experiment 3: Visual Categorization Throughput

Results

Vector Quantization Speed (SIFT)

Vector Quantization Speed (ColorSIFT)

Results

Kernel Value Precomputation Speed

Results

Visual Categorization Throughput

Other applications

Application 1: k-means Clustering
Application 2: Bag-of-Words Model for Text Retrieval
Application 3: Multi-Frame Processing for Video Retrieval
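
k-means clustering reuses the vector quantization computation: in Lloyd's algorithm, assigning every point to its nearest center is quantization against a codebook of k centers, so the same distance-matrix-by-dot-products approach applies. The sketch below is a plain NumPy illustration, not the paper's implementation; the names `kmeans` and `farthest_point_init` are hypothetical, and the deterministic farthest-point seeding is a simple stand-in chosen for reproducibility.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly pick the farthest point."""
    idx = [0]
    d2 = np.sum((X - X[0]) ** 2, axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d2))
        idx.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return X[idx].astype(np.float64)


def kmeans(X, k, iters=20):
    """Lloyd's k-means; the assignment step is vector quantization against the centers."""
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        # Assignment: squared distances via ||x||^2 + ||c||^2 - 2 x.c (one matrix product).
        d2 = (np.einsum("ij,ij->i", X, X)[:, None]
              + np.einsum("ij,ij->i", centers, centers)[None, :]
              - 2.0 * (X @ centers.T))
        labels = np.argmin(d2, axis=1)
        # Update: move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers, labels


# Toy usage: two well-separated 2-D blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Each iteration is dominated by the assignment step, which is why accelerating vector quantization on the GPU also speeds up codebook construction with k-means.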

Conclusions

This paper provides an efficiency analysis of a state-of-the-art visual categorization pipeline based on the bag-of-words model.

Two large bottlenecks were identified: the vector quantization step in image feature extraction and the kernel value computation in category classification.

Compared to a multi-threaded CPU implementation on a quad-core CPU, the GPU is 4.8 times faster.

The end

Thank you!
