Image and Vision Computing 25 (2007) 1802–1813 Deformation tolerant generalized Hough transform for sketch-based image retrieval in complex scenes M. Anelli, L. Cinque, Enver Sangineto 1

Outline 1. Introduction 2. Methods 3. Results 4. Conclusion 2

Introduction 3

Introduction(1/4) In the last 12–15 years the availability of digital visual information has grown very quickly. Content Based Image Retrieval (CBIR) is a research area whose aim is the development of tools for the retrieval of visual information using its perceptual content. 4

Introduction(2/4) In Image Retrieval by Sketch the query is a stylized sketch drawn by the user in order to specify the shape features she is interested in finding in the images within the system's database. Inexact matching between the sketch and the images, and segmentation, are the two main problems which a sketch-based image retrieval system has to deal with. 5

Introduction(3/4) Most of the methods and techniques for shape-based image retrieval can be classified in three main categories: statistical techniques, deformable template matching, and multiscale representations. 6

Introduction(4/4) We modified the GHT. First of all, we spread the voting result in order to deal with small local deformations without increasing the overall asymptotic space and time complexity. Moreover, once the most likely position of the sketch in the image has been localized using the votes in the accumulator, shape segmentation is further verified. 7

Methods 8

Canny edge detection. The first filter aims at deleting edge pixels surrounded by a disordered and thick texture. The second filter deals with ordered textures (e.g., a sheaf of parallel lines). 9

The first filter C(p): a square mask of $n_1 \times n_1$ pixels centered at pixel p. N: the number of edge pixels in C(p). $\phi(p)$: the gradient direction of a generic edge pixel p. We cancel the edge pixel p from the edge map if $N > \tau_1 \wedge \sigma^2 > \tau_2$, where $\tau_1$ and $\tau_2$ are two prefixed thresholds ($n_1 = 40$, $\tau_1 = 260$, $\tau_2 = 0.165$). 10

The second filter Let N be the number of edge pixels p' belonging to the $n_2 \times n_2$ mask D(p) and such that $\phi(p') = \phi(p)$. We cancel p if $N > \tau_3$ ($n_2 = 20$, $\tau_3 = 120$). 11
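The second filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: directions are taken to be already quantized into integer labels so that equality is meaningful, the count includes the pixel p itself, and the even-sized mask is approximated by a centered window of half-width n2 // 2. The function name and argument layout are hypothetical; only the mask size 20 and the threshold 120 come from the slide.

```python
def ordered_texture_filter(edges, direction, n2=20, tau3=120):
    """Return a copy of `edges` with pixels lying in ordered textures removed.

    edges:     2D list of 0/1 edge flags.
    direction: 2D list of quantized gradient-direction labels (same shape).
    Assumption: the count N includes the pixel p itself.
    """
    h, w = len(edges), len(edges[0])
    half = n2 // 2                      # centered window approximating the n2 x n2 mask
    out = [row[:] for row in edges]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            count = 0
            for yy in range(max(0, y - half), min(h, y + half + 1)):
                for xx in range(max(0, x - half), min(w, x + half + 1)):
                    # Count edge pixels sharing the direction of p.
                    if edges[yy][xx] and direction[yy][xx] == direction[y][x]:
                        count += 1
            if count > tau3:            # too many equally oriented neighbours
                out[y][x] = 0
    return out
```

A dense block of equally oriented edge pixels is erased by this sketch, while an isolated straight line survives because it contributes too few pixels to the window.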

From now on we will denote by I the edge map of the currently analyzed image of the system's database after the salience filter application. 12

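Generalized Hough Transform (GHT) The GHT locates a template shape in an image. I. Building the R-Table. Step 1: choose a reference point (Xc, Yc) for the template. Step 2: create an R-Table whose rows are indexed by the quantized gradient directions $\phi_i$, i = 1, 2, ..., K, in steps of $\pi/K$ covering 0 to 180 degrees. Step 3: for each boundary point (X, Y) of the template, compute its displacement $(r, \alpha)$ with respect to the reference point. 13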

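Generalized Hough Transform (GHT) Step 4: compute the gradient direction $\phi$ at the boundary point and store $(r, \alpha)$ in the row indexed by the corresponding $\phi_i$. Step 5: repeat the previous steps for every boundary point to complete the R-Table. 14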

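Generalized Hough Transform (GHT) II. Detection. Step 1: initialize the 2D Hough table H(xc, yc) to 0. Step 2: for each edge point (x, y) of the image, compute its gradient direction. Step 3: look up the R-Table row $\phi_i$ matching that direction and retrieve its $(r, \alpha)$ entries. Step 4: for each entry, compute the candidate reference point and increment H(xc, yc) by 1, repeating Steps 2 and 3 for all edge points. Step 5: the maximum of H(xc, yc) gives the detected reference point (xc, yc). 15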
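The two phases of the classical GHT can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of K = 8 direction bins, and the use of the centroid as reference point are assumptions made for illustration, and displacements are stored in Cartesian form (dx, dy) rather than as (r, α) pairs, which is equivalent for a fixed scale and orientation.

```python
import math
from collections import defaultdict

def build_r_table(template_points, template_dirs, K=8):
    """Phase I: index displacements to the reference point by quantized direction.

    template_points: list of (x, y); template_dirs: directions in [0, pi).
    Assumption: the reference point is the centroid of the template points.
    """
    xc = sum(x for x, _ in template_points) / len(template_points)
    yc = sum(y for _, y in template_points) / len(template_points)
    table = defaultdict(list)
    for (x, y), phi in zip(template_points, template_dirs):
        i = int(phi / (math.pi / K)) % K       # quantized direction index
        table[i].append((xc - x, yc - y))      # displacement to the reference point
    return table

def ght_vote(edge_points, edge_dirs, table, width, height, K=8):
    """Phase II: accumulate votes in H and return H with its peak (xc, yc)."""
    H = [[0] * width for _ in range(height)]
    for (x, y), phi in zip(edge_points, edge_dirs):
        i = int(phi / (math.pi / K)) % K
        for dx, dy in table.get(i, ()):
            cx, cy = round(x + dx), round(y + dy)
            if 0 <= cx < width and 0 <= cy < height:
                H[cy][cx] += 1
    best = max((H[y][x], x, y) for y in range(height) for x in range(width))
    return H, (best[1], best[2])
```

Because the table is indexed by direction, an edge point only votes with the template points that share its direction bin, which is exactly the behaviour the DTGHT relaxes.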

Deformation tolerant GHT (DTGHT) I: the edge map of the image. S: the user-drawn sketch. Seg: the set of line segments of I. T: the R-Table. m: the cardinality of T (m = #T = #S). R-Table construction: if $p_k$ is a point of S, then $T[k] = p_r - p_k$, $p_r$ being the centroid of S. 16
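The R-Table construction rule T[k] = p_r - p_k translates directly into code. The following Python sketch assumes the sketch S is given as a list of (x, y) points and that the centroid is the arithmetic mean of those points; the function name is hypothetical.

```python
def build_dtght_table(sketch_points):
    """Return T, where T[k] is the offset from sketch point p_k to the centroid p_r."""
    m = len(sketch_points)
    rx = sum(x for x, _ in sketch_points) / m   # centroid p_r, x coordinate
    ry = sum(y for _, y in sketch_points) / m   # centroid p_r, y coordinate
    return [(rx - x, ry - y) for x, y in sketch_points]
```

The table has exactly one entry per sketch point, matching m = #T = #S.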

Deformation tolerant GHT (DTGHT) $\phi_I(p)$ and $\phi_S[k]$ denote, respectively, the direction of the point p in I and of $p_k$ in S. In order to improve the accuracy, $\phi_I(p)$ and $\phi_S[k]$ are computed using adjacent points in the same segment. The formula uses a constant set to 10, and $p_j$ is the jth point in a given segment s (and analogously for $\phi_S$). 17

Deformation tolerant GHT (DTGHT) Nevertheless, we do not use $\phi_S[k]$ to index T as in the original GHT. In fact we aim at looking for a shape S' contained in I which is similar but not necessarily identical to S. Hence, we usually expect that a point p in S and a corresponding point p' in S' are quite differently oriented. 18

Voting Procedure We perform a vote operation analogous to the original GHT voting phase, with the angular parameter set to $\pi/8$. The voting result is stored in the accumulator space A. 19
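One plausible reading of this voting step is sketched below in Python. It rests on assumptions that the slides do not spell out: an image point votes for the offset T[k] only when its direction differs from the direction of the kth sketch point by at most π/8, directions are undirected angles in [0, π), and T is a list of (dx, dy) offsets to the sketch centroid. All names are hypothetical.

```python
import math

def angular_diff(a, b):
    """Smallest difference between two undirected orientations in [0, pi)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def dtght_vote(image_points, image_dirs, T, sketch_dirs, width, height,
               tol=math.pi / 8):
    """Accumulate votes in A; every image point is tested against every T[k]."""
    A = [[0] * width for _ in range(height)]
    for (x, y), phi_i in zip(image_points, image_dirs):
        for (dx, dy), phi_s in zip(T, sketch_dirs):
            # Assumed tolerance test replacing exact direction indexing.
            if angular_diff(phi_i, phi_s) <= tol:
                cx, cy = round(x + dx), round(y + dy)
                if 0 <= cx < width and 0 <= cy < height:
                    A[cy][cx] += 1
    return A
```

With n image points and m sketch points the loop performs n × m tests, which matches the nm term in the worst-case cost quoted for the original GHT.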

Cluster the Votes in A: fixed vote dispersion window W. Let W be a square mask of $(2l+1) \times (2l+1)$ cells (l is defined below). W(p) is the set of all the nonzero cells of A contained in the mask W when its center is positioned at p. The mass M(p) of W(p) is defined as the sum of the values of the elements of W(p). The maximum of M(p) corresponds to the mass of the region with the highest concentration of votes. 20

Compute M(p) M(p) is incrementally built using a technique similar to the integral image. $W_i(p)$ represents the nonzero elements of the ith column of the mask W(p). 21

Compute M(p) Let now C(x, y) be the cumulative row sum computed with respect to the yth column of A 22
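The idea of computing every window mass in constant time per cell can be illustrated with a standard summed-area table. Note that this Python sketch swaps in the plain integral image rather than the column-wise incremental scheme of the slides; it produces the same M(p) values, with windows clipped at the borders of A. Function names are hypothetical.

```python
def window_mass(A, l):
    """Return M, where M[y][x] is the sum of A over the (2l+1) x (2l+1) window at (x, y)."""
    h, w = len(A), len(A[0])
    # Summed-area table with a zero border: S[y][x] = sum of A[0:y][0:x].
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += A[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    M = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - l), min(h, y + l + 1)
        for x in range(w):
            x0, x1 = max(0, x - l), min(w, x + l + 1)
            # Four lookups give the window sum regardless of l.
            M[y][x] = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return M

def best_center(M):
    """Return ((x, y), mass) for the cell with the largest window mass."""
    value, x, y = max((M[y][x], x, y)
                      for y in range(len(M)) for x in range(len(M[0])))
    return (x, y), value
```

The cost is linear in the size of A and independent of l, which is why a large dispersion window does not change the asymptotic complexity.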

Compute M(p) If $P = \arg\max_{p \in I} M(p)$, then with high probability P is the point in I corresponding to the centroid of the shape most similar to S. The deformation tolerance area delimits the region within which the points may deviate from $S_P$, so the parameter l determines the size of the shape details which will be ignored by the system in the matching process. We set l proportional to d, the diagonal of I, with a proportionality constant of 0.03 (in our trials l = 12, which leads to a window side of 25 pixels). 23

Example of the system's output 24

Line segment matching $S_P$ is the projection of S on I with P as its center of mass. Thick textured regions and cluttered backgrounds can randomly concentrate their votes in a single point not actually corresponding to a shape S' similar to S. 25

Line segment matching Extraneous vs. Valid Segments. A point p belonging to I is a valid point if i is a valid hypothesis for p. We call a segment $s_i$ a valid segment if $\#V_i \geq k_1 \times \#s_i$, where $k_1 = 0.7$ and $V_i$ is the set of all the valid points of the segment $s_i$. 26

Line segment matching A point p belonging to I is a nearby point if ... We call a segment $s_i$ an extraneous segment if $s_i$ is not a valid segment and $\#N_i \geq k_2 \times \#s_i$, where $k_2 = 0.2$ and $N_i$ is the set of all the nearby points of the segment $s_i$. Let V be the subset of Seg composed of all the valid segments. Let E be the subset of Seg composed of all the extraneous segments. 27
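The segment classification rule can be written as a small Python function. This sketch assumes the comparison in both tests is "greater than or equal to" and that the numbers of valid and nearby points of a segment have already been counted; the function name and the label "other" for segments that fall in neither set are hypothetical.

```python
def classify_segment(n_points, n_valid, n_nearby, k1=0.7, k2=0.2):
    """Classify a segment from its point counts.

    n_points: #s_i, n_valid: #V_i, n_nearby: #N_i.
    Assumption: both thresholds are inclusive.
    """
    if n_valid >= k1 * n_points:
        return "valid"          # the segment goes into V
    if n_nearby >= k2 * n_points:
        return "extraneous"     # not valid, but close to the sketch: goes into E
    return "other"              # belongs to neither V nor E
```

For a segment of 10 points, 8 valid points make it valid, while 3 valid and 4 nearby points make it extraneous.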

Matching Test > valid valid , (m true ) 28

Similarity 29

Similarity rank The DTGHT, like the original GHT, is neither rotation nor scale invariant. In the off-line preprocessing of each database image we produce a pyramidal representation of I composed of 5 different resolution levels. 30

Similarity rank A final scale-invariant similarity estimation (SISim) between I and S is then computed. We can suppose that the user usually draws a sketch with its expected orientation (e.g., a horizontal car or horse, a vertical tree), and thus rotation invariance can often be ignored in order to speed up the system's performance. 31

Similarity rank 32

Results 33

Computational complexity n is the number of edge pixels of I; $N_1 = w \times h$; m = #S; N = #Seg; k is the number of scale iterations ($N \leq n, N_1$). The phases considered are: R-table construction, the voting phase, finding the maximum of M, the construction of the sets V and E, and the Extraneous vs. Valid Segments and Matching Test steps. 34

Computational complexity The worst-case computational cost of the original GHT is $O(h(nm + N_1))$, with h iterations for different discrete values of scale. From this comparison we can state that the DTGHT and the GHT have the same asymptotic worst-case behavior. Moreover, the DTGHT needs fewer iterations than the GHT in order to deal with the same range of scale changes (i.e., k < h). 35

Experimental results We have implemented our method with non-optimized Java code and tested it on a Pentium IV, 1.7 GHz. Processing an image takes less than 2 s, one second on average, for images from 200 x 200 up to 380 x 350 pixels. These times include the 5 iterations per image for the 5 corresponding image scale values, but not the preprocessing. 36

Experimental results The system's database is composed of 283 images randomly taken from the Web. No manual segmentation has been performed on the images in order to separate the interesting objects from their background or from other adjacent or occluding objects. Lighting conditions and noise degree are also not fixed. 37

Experimental results 38

Experimental results Comparison to other approaches 24 DTGHT, 15 do not apply scale iterations, using the object's minimum enclosing rectangle to set the scale parameters. Kimia dataset. 39

Experimental results Comparison to other approaches We have obtained the second best result. Our system is the only one among those mentioned in Table 2 which can be reliably applied to images containing occlusions and non-uniform backgrounds. 40

Experimental results Comparison to other approaches Caltech 101 dataset, composed of real images with significant texture and clutter. The processing time for 160 images for a given query was about 140 seconds, including 5 different scale iterations per image. 41

Conclusion 42

Conclusion DTGHT is an effective technique to deal with the two main problems in sketch-based image retrieval: image segmentation and inexact matching. We have shown that inexact matching can be realized using a large vote dispersion window and that a dynamic programming approach makes this process efficient. 43

Conclusion Segmentation is further obtained by comparing the sketch with the candidate image lines. We have also shown how, unlike most of the existing sketch-based image retrieval approaches, the DTGHT is able to efficiently deal with images with cluttered backgrounds. 44

Thank You! 45