Image and Vision Computing 25 (2007) 1802–1813: Deformation tolerant generalized Hough transform for sketch-based image retrieval in complex scenes
- Slide 1
- Image and Vision Computing 25 (2007) 1802–1813: Deformation tolerant generalized Hough transform for sketch-based image retrieval in complex scenes. M. Anelli, L. Cinque, Enver Sangineto
- Slide 2
- Outline: 1. Introduction 2. Methods 3. Results 4. Conclusion
- Slide 3
- Introduction
- Slide 4
- Introduction (1/4): In the last 12–15 years the availability of digital visual information has grown very quickly. Content-Based Image Retrieval (CBIR) is a research area whose aim is the development of tools for retrieving visual information using its perceptual content.
- Slide 5
- Introduction (2/4): In image retrieval by sketch, the query is a stylized sketch drawn by the user to specify the shape features she is interested in finding in the images of the system's database. The two main problems a sketch-based image retrieval system has to deal with are inexact matching between the sketch and the images, and segmentation.
- Slide 6
- Introduction (3/4): Most of the methods and techniques for shape-based image retrieval can be classified into three main categories: statistical techniques, deformable template matching, and multiscale representations.
- Slide 7
- Introduction (4/4): We modified the GHT in two ways. First, we spread the voting result in order to deal with small local deformations, without increasing the overall asymptotic space and time complexity. Moreover, once the most likely position of the sketch in the image has been localized using the votes in the accumulator, shape segmentation is further verified.
- Slide 8
- Methods
- Slide 9
- Canny edge detection and salience filters: The first filter aims at deleting edge pixels surrounded by a disordered, thick texture. The second filter deals with ordered textures (e.g., a sheaf of parallel lines).
- Slide 10
- The first filter: Let C(p) be a square mask of n₁ × n₁ pixels centered at pixel p, N the number of edge pixels in C(p), and θ(q) the gradient direction of a generic edge pixel q. We cancel the edge pixel p from the edge map if N > τ₁ and σ² > τ₂, where σ² is the variance of the gradient directions of the edge pixels in C(p) and τ₁, τ₂ are two pre-fixed thresholds (n₁ = 40, τ₁ = 260, τ₂ = 0.165).
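A minimal sketch of how this first salience filter could work. The function name, the set/dict data layout, and the use of the variance of normalized gradient directions for σ² are our assumptions based on the slide, not the authors' code:

```python
import math

def first_filter(edge_map, directions, n1=40, tau1=260, tau2=0.165):
    """Cancel edge pixels lying in disordered, thick texture.

    edge_map:   set of (x, y) edge-pixel coordinates
    directions: dict mapping (x, y) -> gradient direction in [0, pi)

    p is cancelled when the n1 x n1 mask C(p) centered at p holds more
    than tau1 edge pixels AND the variance of their gradient directions
    (normalized to [0, 1)) exceeds tau2.
    """
    half = n1 // 2
    kept = set()
    for (x, y) in edge_map:
        inside = [q for q in edge_map
                  if abs(q[0] - x) <= half and abs(q[1] - y) <= half]
        n = len(inside)
        dirs = [directions[q] / math.pi for q in inside]  # normalize
        mean = sum(dirs) / n
        var = sum((d - mean) ** 2 for d in dirs) / n
        if not (n > tau1 and var > tau2):
            kept.add((x, y))
    return kept
```

A sparse edge map passes through unchanged; only dense masks with widely spread directions are erased.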
- Slide 11
- The second filter: Let N′ be the number of edge pixels p′ belonging to the mask D(p) of n₂ × n₂ pixels and such that θ(p′) = θ(p). We cancel p if N′ > τ₃ (n₂ = 20, τ₃ = 120).
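The second filter follows the same pattern; a minimal sketch under the same assumed data layout (the `tol` parameter for direction comparison is our addition; the slide compares directions exactly):

```python
def second_filter(edge_map, directions, n2=20, tau3=120, tol=0.0):
    """Cancel edge pixels inside ordered textures (e.g., parallel lines).

    p is cancelled when more than tau3 edge pixels in the n2 x n2 mask
    D(p) share its gradient direction (equal up to 'tol').
    """
    half = n2 // 2
    kept = set()
    for (x, y) in edge_map:
        same = sum(1 for q in edge_map
                   if abs(q[0] - x) <= half and abs(q[1] - y) <= half
                   and abs(directions[q] - directions[(x, y)]) <= tol)
        if same <= tau3:
            kept.add((x, y))
    return kept
```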
- Slide 12
- From now on we will denote by I the edge map of the currently analyzed image of the system's database, after application of the salience filters.
- Slide 13
- Generalized Hough Transform (GHT) — I. R-table construction from the template. Step 1: choose a reference point (Xc, Yc). Step 2: partition the gradient directions into R-table rows θi, i = 1, 2, …, K (bins of width 180°/K covering 0°–180°). Step 3: for each template boundary point (X, Y), compute the displacement vector (r, β) from (X, Y) to (Xc, Yc).
- Slide 14
- Generalized Hough Transform (GHT). Step 4: store (r, β) in the R-table row θi indexed by the point's gradient direction. Step 5: repeat Steps 3–4 for all boundary points to complete the R-table.
- Slide 15
- Generalized Hough Transform (GHT) — II. Detection. Step 1: initialize a 2D Hough table H(xc, yc) to 0. Step 2: take each edge point (x, y) of the image. Step 3: use its gradient direction to select the R-table row θi and retrieve the stored entries (r, β). Step 4: for each entry, compute the candidate reference point and increment H(xc, yc). Step 5: the maxima of H(xc, yc) give the candidate positions (xc, yc) of the shape.
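The two phases above can be sketched as a minimal translation-only GHT. Function names, the bin count, and the dictionary-based accumulator are our choices, not the paper's implementation:

```python
import math
from collections import defaultdict

def build_r_table(template_pts, template_dirs, k_bins=8):
    """R-table: for each template boundary point, store the vector to
    the reference point (here the centroid), indexed by the quantized
    gradient direction in [0, pi)."""
    xc = sum(x for x, _ in template_pts) / len(template_pts)
    yc = sum(y for _, y in template_pts) / len(template_pts)
    table = defaultdict(list)
    for (x, y), phi in zip(template_pts, template_dirs):
        bin_i = int(phi / math.pi * k_bins) % k_bins
        table[bin_i].append((xc - x, yc - y))
    return table

def ght_vote(edge_pts, edge_dirs, table, k_bins=8):
    """Vote in the 2D accumulator H(xc, yc); maxima are candidate
    positions of the template's reference point in the image."""
    acc = defaultdict(int)
    for (x, y), phi in zip(edge_pts, edge_dirs):
        bin_i = int(phi / math.pi * k_bins) % k_bins
        for dx, dy in table[bin_i]:
            acc[(round(x + dx), round(y + dy))] += 1
    return acc
```

For an image that contains a translated copy of the template, the accumulator maximum recovers the translated reference point.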
- Slide 16
- Deformation tolerant GHT (DTGHT): Let S be the user-drawn sketch, Seg the set of line segments of the edge map I, and T the R-table built from S, with m the cardinality of T (m = #T = #S). The R-table is built so that if p_k is a point of S, then T[k] = p_r − p_k, p_r being the centroid of S.
- Slide 17
- Deformation tolerant GHT (DTGHT): θ_I(p) and θ_S[k] denote, respectively, the direction of the point p in I and of the point p_k in S. In order to improve the accuracy, θ_I(p) and θ_S[k] are computed using adjacent points in the same segment, where Δ (= 10) is a constant and p_j is the jth point in a given segment s (and analogously for θ_S).
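The direction-estimation formula itself is not legible in this preview; a plausible reading is the direction of the chord joining p_{j−Δ} and p_{j+Δ}, clamped at the segment ends and folded into [0, π). A sketch under that assumption:

```python
import math

def segment_directions(points, delta=10):
    """Estimate the tangent direction at each point of an ordered
    segment from the points delta positions behind and ahead
    (indices clamped at the segment ends)."""
    dirs = []
    n = len(points)
    for j in range(n):
        a = points[max(0, j - delta)]
        b = points[min(n - 1, j + delta)]
        theta = math.atan2(b[1] - a[1], b[0] - a[0]) % math.pi
        dirs.append(theta)
    return dirs
```

Averaging over 2Δ+1 points smooths out the pixel-level jaggedness of a rasterized curve, which is the accuracy improvement the slide refers to.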
- Slide 18
- Deformation tolerant GHT (DTGHT): Nevertheless, we do not use θ_S[k] to index T as in the original GHT. In fact, we aim at looking for a shape S′ contained in I which is similar, but not necessarily identical, to S. Hence we usually expect that a point p in S and the corresponding point p′ in S′ are quite differently oriented.
- Slide 19
- Voting procedure: We perform a vote operation analogous to the original GHT voting phase, with an orientation tolerance Δθ = π/8. We now have a voting result in the accumulator space A.
- Slide 20
- Cluster the votes in A with a fixed vote-dispersion window W: Let W be a square mask of (2l+1) × (2l+1) cells (l is defined below). W(p) is the set of all the nonzero cells of A contained in the mask W when its center is positioned at p. The mass M(p) of W(p) is the sum of the values of the elements of W(p). The maximum of M(p) corresponds to the mass of the region with the highest concentration of votes.
- Slide 21
- Compute M(p): M(p) is built incrementally using a technique similar to the integral image. W_i(p) represents the nonzero elements of the ith column of the mask W(p).
- Slide 22
- Compute M(p): Let C(x, y) be the cumulative row sum computed with respect to the yth column of A.
- Slide 23
- Compute M(p): If P = arg max_{p∈I} M(p), then, with high probability, P is the point in I corresponding to the centroid of the shape most similar to S. Since the deformation tolerance area delimits the region over which the points of the shape may vary with respect to the projection S_P, the parameter l decides the size of the shape details which will be ignored by the system in the matching process. We set l = γd, where d is the diagonal of I and γ = 0.03 (in our trials l = 12, which leads to a window side of 25 pixels).
- Slide 24
- Example of the system's output
- Slide 25
- Line segment matching: S_P is the projection of S on I with P as its center of mass. Thick textured regions and cluttered backgrounds can randomly concentrate their votes in a single point which does not actually correspond to a shape S′ similar to S.
- Slide 26
- Line segment matching — extraneous vs. valid segments: A point p of a segment s_i of I is a valid point if S_P is a valid hypothesis for p. We call a segment s_i a valid segment if #V_i ≥ k_1 × #s_i, where k_1 = 0.7 and V_i is the set of all the valid points of the segment s_i.
- Slide 27
- Line segment matching: A point p of a segment s_i of I is a nearby point if it lies sufficiently close to S_P. We call a segment s_i an extraneous segment if s_i is not a valid segment and #N_i ≥ k_2 × #s_i, where k_2 = 0.2 and N_i is the set of all the nearby points of the segment s_i. Let V be the subset of Seg composed of all the valid segments, and E the subset of Seg composed of all the extraneous segments.
- Slide 28
- Matching test: the hypothesis P is accepted as a true match (m_true) only when the valid segments sufficiently outweigh the extraneous ones.
- Slide 29
- Similarity
- Slide 30
- Similarity rank: The DTGHT, like the original GHT, is neither rotation nor scale invariant. In the off-line preprocessing of each database image we produce a pyramidal representation of I composed of 5 different resolution levels.
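A minimal sketch of the 5-level pyramid construction; the scale factor between levels and the nearest-neighbour resampling are our assumptions (the slide only fixes the number of levels):

```python
def build_pyramid(img, levels=5, factor=0.8):
    """Return 'levels' copies of a 2D list at decreasing resolution,
    each level 'factor' times the size of the previous one,
    resampled with nearest-neighbour indexing."""
    out = [img]
    for _ in range(levels - 1):
        prev = out[-1]
        h = max(1, int(len(prev) * factor))
        w = max(1, int(len(prev[0]) * factor))
        out.append([[prev[int(i / factor)][int(j / factor)]
                     for j in range(w)] for i in range(h)])
    return out
```

Running the DTGHT at each level approximates scale invariance without rebuilding the R-table.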
- Slide 31
- Similarity rank: The final scale-invariant similarity estimation (SISim) between I and S combines the similarity values computed at the different resolution levels. We can suppose that the user usually draws a sketch with its expected orientation (e.g., a horizontal car or horse, a vertical tree), and thus rotation invariance can often be ignored in order to speed up the system's performance.
- Slide 32
- Similarity rank
- Slide 33
- Results
- Slide 34
- Computational complexity: n is the number of edge pixels of I; N₁ = w × h; m = #S; N = #Seg; k is the number of scale iterations (N ≤ n ≤ N₁). The analyzed phases are: R-table construction, the voting phase, finding the maximum of M, the construction of the sets V and E (extraneous vs. valid segments), and the matching test.
- Slide 35
- Computational complexity: The computational worst-case cost of the original GHT is O(h(nm + N₁)), with h iterations for different discrete values of scale. From this comparison we can state that the DTGHT and the GHT have the same asymptotic worst-case behavior. Moreover, the DTGHT needs fewer iterations than the GHT to deal with the same range of scale changes (i.e., k < h).
- Slide 36
- Experimental results: We have implemented our method in non-optimized Java code and tested it on a Pentium IV at 1.7 GHz. Retrieval takes less than 2 s per image (one second on average), for images from 200 × 200 up to 380 × 350 pixels. These times include 5 different iterations per image for the 5 corresponding scale values, and do not include the preprocessing.
- Slide 37
- Experimental results: The system's database is composed of 283 images randomly taken from the Web. No manual segmentation has been performed on the images to separate the interesting objects from their background or from other adjacent or occluding objects. Lighting conditions and noise degree are not fixed either.
- Slide 38
- Experimental results
- Slide 39
- Experimental results — comparison to other approaches: On the Kimia dataset, the DTGHT does not apply scale iterations, using the object's minimum enclosing rectangle to set the scale parameters.
- Slide 40
- Experimental results — comparison to other approaches: We have obtained the second best result. Our system is the only one among those mentioned in Table 2 which can be reliably applied to images containing occlusions and non-uniform backgrounds.
- Slide 41
- Experimental results — comparison to other approaches: On the Caltech 101 dataset, composed of real images with significant texture and clutter, retrieval over 160 images for a given query took about 140 seconds, including 5 different scale iterations per image.
- Slide 42
- Conclusion
- Slide 43
- Conclusion: The DTGHT is an effective technique to deal with the two main problems in sketch-based image retrieval: image segmentation and inexact matching. Inexact matching can be realized using a large vote-dispersion window, and a dynamic programming approach makes this process efficient.
- Slide 44
- Conclusion: Segmentation is further obtained by comparing the sketch with the candidate image lines. We have also shown how, differently from most of the existing sketch-based image retrieval approaches, the DTGHT is able to efficiently deal with images with cluttered backgrounds.
- Slide 45
- Thank You!