Pattern Recognition Letters 32 (2011) 197–201
A note on the computation of high-dimensional integral images
Ernesto Tapia, Freie Universität Berlin, Institut für Informatik, Arnimallee 7, 14195 Berlin, Germany
Article history: Received 15 May 2009; Available online 27 October 2010; Communicated by Q. Ji
Keywords: Integral image; Haar-based features; High-dimensional image; Möbius inversion formula; Neuroimaging
doi:10.1016/j.patrec.2010.10.007
E-mail address: [email protected]
The integral image approach allows optimal computation of Haar-based features for real-time recognition of objects in image sequences. This paper describes a generalization of the approach to high-dimensional images and offers a formula for optimal computation of sums on high-dimensional rectangles.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Viola and Jones (2004) introduced the integral image approach for real-time detection of objects in image sequences. They constructed a boosted cascade of simple classifiers based on Haar-like features that measure vertical, horizontal, central, and diagonal variations of pixel intensities. These features are differences between the sums of image values on two, three, and four rectangles (see Fig. 1). However, the sum of image values i(x', y') on a given rectangle (x_0, y_0] × (x_1, y_1],

A = \sum_{x_0 < x' \le x_1} \sum_{y_0 < y' \le y_1} i(x', y'),   (1)

is computationally expensive, because its complexity is proportional to the number of pixels contained in the rectangle.
One of the key contributions of Viola and Jones is the use of the integral image (Crow, 1984; Viola and Jones, 2004) as an intermediate array representation to compute the sum A optimally. The integral image value at the pixel (x, y) is defined as the sum
I(x, y) = \sum_{0 \le x' \le x} \sum_{0 \le y' \le y} i(x', y'),   (2)

of the original image values on the rectangle [0, 0] × [x, y]. They computed the integral image in one pass over the image using the recurrence
c(x, y) = c(x, y - 1) + i(x, y),   (3)
I(x, y) = I(x - 1, y) + c(x, y),   (4)

with

c(x, -1) = I(-1, y) = 0,   (5)

where c(x, y) is called the cumulative row sum (Fig. 2). Thus, they computed A in constant time using only four references to the integral image via the formula

A = I(x_1, y_1) - I(x_1, y_0) - I(x_0, y_1) + I(x_0, y_0).   (6)
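As a concrete sketch (our own illustration, not the authors' code), the one-pass recurrence (3)-(5) and the four-reference formula (6) can be written as:

```python
def integral_image(img):
    """Compute I(x, y) = sum of img[0..x][0..y] in one pass, Eqs. (2)-(5)."""
    h, w = len(img), len(img[0])
    I = [[0] * w for _ in range(h)]
    for x in range(h):
        c = 0  # cumulative row sum c(x, y), with c(x, -1) = 0
        for y in range(w):
            c += img[x][y]                                   # c(x,y) = c(x,y-1) + i(x,y)
            I[x][y] = (I[x - 1][y] if x > 0 else 0) + c      # I(x,y) = I(x-1,y) + c(x,y)
    return I

def rect_sum(I, x0, y0, x1, y1):
    """Sum over the semi-open rectangle (x0, x1] x (y0, y1], Eq. (6)."""
    def ref(x, y):  # I(-1, y) = I(x, -1) = 0
        return I[x][y] if x >= 0 and y >= 0 else 0
    return ref(x1, y1) - ref(x1, y0) - ref(x0, y1) + ref(x0, y0)
```

For a 4 × 4 image of ones, `rect_sum(I, 0, 0, 3, 3)` sums the 3 × 3 pixels with indices in (0, 3] on each axis and returns 9.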
Ke et al. (2005) extended the integral image approach to detect the motion and activity of persons in videos. They considered the image sequences as three-dimensional images and defined the integral video to compute volumetric features from the video's optical flow (see Fig. 3). The features are the sums of image values on parallelepipeds, and the sums are optimally computed using eight references to the integral video.
Many other high-dimensional image structures could benefit from this approach. Examples of these structures are flow through porous media in experimental fluid dynamics (Preusser and Rumpf, 2003), and functional magnetic resonance images (fMRI) in medical applications (Huettel et al., 2004). These three-dimensional images are formed with volumetric picture elements (voxels) p, which locate the values i(p) in space. These structures can also be extended to dynamic four-dimensional images i(p, t), where p and t are the discrete indices in space and time, respectively.
Neuroimaging is the most appealing application of the integral image approach, owing to advances in magnetic resonance technology that have made real-time fMRI possible (deCharms, 2007; Weiskopf et al., 2004). Real-time analysis of dynamic fMRI data could allow the development of methods for mind-event recognition, which can lead to new and practical applications, such as brain-computer interfaces, lie detection, and therapeutic applications (deCharms, 2008). Thus, a natural question is how we can generalize the integral image approach for real-time analysis of high-dimensional images.

Fig. 1. Haar-based rectangular features used for face recognition. The features are the sum of the values on the gray region minus the sum of the values on the white region.

Fig. 2. Left: Integral image representation. Right: The four references used to compute the image values on the gray area.
We realized that generalization of the approach basically consists of adapting two main steps: one that computes an integral array in one pass, and another that computes the sum of pixels in a high-dimensional rectangle in constant time, using only a few references to the integral array.
The next section states these generalization steps and begins with some useful notations and definitions.
2. Integral representation in high dimensions
We denote vectors with the usual notation
x = (x_1, \ldots, x_d).   (7)
Bold-faced scalars denote vectors whose entries are equal to thescalar.
Fig. 3. Above: Volumetric features computed by the integral video. Below: The black parallelepiped V.
Superindices in vectors represent a labeling, which can be a scalar m, as in

x^m = (x^m_1, \ldots, x^m_d),   (8)

or a vector n, as in

x^n = (x^{n_1}_1, \ldots, x^{n_d}_d).   (9)
The vector e^m is a member of the canonical basis, where e^m_m = 1 and e^m_n = 0 for n \ne m.
A relation that plays a relevant role in this work is defined asfollows:
Definition 1. The partial order \preceq on the vectors is defined as

x \preceq y \iff x_i \le y_i, \quad i = 1, \ldots, d.   (10)
Remark 1. The partial order lets us define intervals in analogy to one-dimensional intervals. For example, consider the semi-closed interval

(x, y] = \{z : x \prec z \preceq y\}.   (11)
Similarly, we define the intervals (x, y), [x, y], and [x, y). Note that these intervals geometrically define high-dimensional rectangles; we will use "interval" and "rectangle" interchangeably to denote such sets.
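In code (a tiny sketch of our own), the partial order of Definition 1 and membership in the semi-closed interval (11) read:

```python
def leq(a, b):
    """The componentwise partial order a <= b of Definition 1, Eq. (10)."""
    return all(ai <= bi for ai, bi in zip(a, b))

def in_interval(z, x, y):
    """z in (x, y]: x_i < z_i <= y_i for every coordinate i, Eq. (11)."""
    return all(xi < zi <= yi for xi, zi, yi in zip(x, z, y))
```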
Definition 2. A d-dimensional image is a real-valued function

i : [0, u] \to \mathbb{R}.   (12)

The integral image I : [0, u] \to \mathbb{R} of the image i is defined as

I(x) = \sum_{z \in [0, x]} i(z).   (13)
2.1. Optimal computation of integral images
The first step in this approach is the computation of the integral image in one pass. This step is relatively easy to generalize: if the array has dimension d, then we have to maintain only d - 1 extra
arrays and define a recursion similar to (3)-(5). We can formally state this idea as follows:
Proposition 1. The integral image I is computed in one pass over the image i using the arrays c^m, m = 1, \ldots, d - 1, and the recurrence

I(x) = I(x - e^1) + c^1(x),   (14)
c^1(x) = c^1(x - e^2) + c^2(x),   (15)
\vdots   (16)
c^{d-1}(x) = c^{d-1}(x - e^d) + i(x),   (17)

with

c^m(x) = I(x) = 0,   (18)

when

x_n < 0, for m = 1, \ldots, d - 1 and n = 1, \ldots, d.   (19)
Proof. By reordering the sum in the integral image, we have

I(x) = \sum_{0 \preceq z \preceq x} i(z)   (20)
     = \sum_{0 \le z_1 \le x_1} \cdots \sum_{0 \le z_d \le x_d} i(z_1, \ldots, z_d)   (21)
     = \sum_{0 \le z_1 \le x_1 - 1} \cdots \sum_{0 \le z_d \le x_d} i(z_1, \ldots, z_d)
       + \sum_{0 \le z_2 \le x_2} \cdots \sum_{0 \le z_d \le x_d} i(x_1, z_2, \ldots, z_d).   (22)
If we define

c^1(x) = \sum_{0 \le z_2 \le x_2} \cdots \sum_{0 \le z_d \le x_d} i(x_1, z_2, \ldots, z_d),   (23)

then we have

I(x) = I(x - e^1) + c^1(x).   (24)
Similarly, we define for n = 1, \ldots, d - 1,

c^n(x) = c^n(x - e^{n+1}) + c^{n+1}(x),   (25)

with

c^{n+1}(x) = \sum_{0 \le z_{n+1} \le x_{n+1}} \cdots \sum_{0 \le z_d \le x_d} i(x_1, \ldots, x_n, z_{n+1}, \ldots, z_d),   (26)

where c^d(x) = i(x). Note that recursions (24) and (25) are actually undefined if the entries of x - e^1 or x - e^{n+1} are negative. For this reason, we define (18) and (19). □
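Proposition 1 can be sketched as follows (our own illustration, not the paper's code; for simplicity it stores c^d \equiv i as one more array, and it visits pixels in lexicographic order so that every reference x - e^m has already been computed):

```python
import numpy as np
from itertools import product

def integral_array(img):
    """One pass over a d-dimensional array using the recurrence (14)-(19).

    c[m-1] holds the auxiliary array c^m; c[d-1] is just a copy of i.
    Out-of-range references (any negative index) count as 0, per (18)-(19).
    """
    d = img.ndim
    I = np.zeros(img.shape)
    c = [np.zeros(img.shape) for _ in range(d)]
    for x in product(*(range(s) for s in img.shape)):
        c[d - 1][x] = img[x]                      # c^d(x) = i(x)
        for n in range(d - 1, 0, -1):             # c^n(x) = c^n(x - e^{n+1}) + c^{n+1}(x)
            prev = list(x); prev[n] -= 1
            c[n - 1][x] = (c[n - 1][tuple(prev)] if prev[n] >= 0 else 0) + c[n][x]
        prev = list(x); prev[0] -= 1              # I(x) = I(x - e^1) + c^1(x)
        I[x] = (I[tuple(prev)] if prev[0] >= 0 else 0) + c[0][x]
    return I
```

The result agrees with successive cumulative sums along each axis, which is an equivalent way of stating the recurrence.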
The second step in the approach is the optimal computation of

A = \sum_{z \in (x^0, x^1]} i(z),   (27)

given the image i, its integral image I, and the rectangle of interest (x^0, x^1]. For such purpose, we define the following concepts:

Fig. 4. (a) Integral image values. (b) Corners of the rectangle of interest.
Definition 3. The corners of the rectangle (x^0, x^1] are the vectors

x^q = (x^{q_1}_1, \ldots, x^{q_d}_d),   (28)

where q \in \{0, 1\}^d.

Geometrically, the corners are the points that limit the rectangle along the axes. Fig. 4 shows an example in the two-dimensional space. In this case, there are four corners x^{(0,0)}, x^{(1,0)}, x^{(0,1)}, and x^{(1,1)}.
The binary labeling of the limits of the rectangle (x^0, x^1] naturally induces a bijective mapping of the corners to the binary vectors q \in \{0, 1\}^d. We use this bijection to define an important concept:
Definition 4. The binary representation of the rectangle (x^0, x^1] consists of the sums defined on its corners,

S(q) = \sum_{z \in [0, x^q]} i(z)   (29)

and

A(q) = \sum_{z \in (x^{q-1}, x^q]} i(z),   (30)

where q \in \{0, 1\}^d and x^{-1}_n = -1 for n = 1, \ldots, d.
The binary representation offers the key to expressing sums on the rectangles in terms of the partial ordering of binary vectors:

S(1, 1) = \sum_{q \preceq (1,1)} A(q) = A(1, 1) + A(1, 0) + A(0, 1) + A(0, 0).   (31)

From Fig. 5, it can be observed that similar relations hold for all values of S in the binary representation:

S(1, 0) = \sum_{q \preceq (1,0)} A(q) = A(1, 0) + A(0, 0),   (32)
S(0, 1) = \sum_{q \preceq (0,1)} A(q) = A(0, 1) + A(0, 0),   (33)
S(0, 0) = \sum_{q \preceq (0,0)} A(q) = A(0, 0).   (34)
The values A correspond to the sums on each of the four rectangles defined by the origin, the axes, and the corners, and the values S correspond to the integral values on the rectangle's corners (see Fig. 5).
We can also find an expression similar to (31) for the sum A(1, 1) defined on the rectangle (x^0, x^1]:

A(1, 1) = \sum_{q \preceq (1,1)} \mu(q) S(q) = S(1, 1) - S(0, 1) - S(1, 0) + S(0, 0),   (35)
Fig. 5. Example of binary representation in two-dimensional space.
where \mu(q) is a coefficient that also depends on the corner (1, 1). The other values of A can also be written in terms of S and the partial ordering. Note that (35) is the optimal formula (6) of Viola and Jones, expressed in terms of the binary representation. We can easily verify this visually by inspecting and comparing Figs. 4 and 5.
However, using only visual inspection to obtain optimal expressions for sums on a rectangle in dimension higher than two is very difficult, if not impossible. We actually need a general expression to compute sums on the rectangles in terms of the integral array. We will show that the generalization of (6) is algebraically possible by demonstrating that equations similar to (31) and (35) also hold in general, using the binary representation and the following result of combinatorial theory:
Proposition 2 (Möbius Inversion Formula). Let f(q) be a real-valued function, defined for q ranging over a locally finite partially ordered set Q. Let an element m exist with the property that f(q) = 0 unless q \ge m. Suppose that

g(q) = \sum_{p \le q} f(p).   (36)

Then

f(q) = \sum_{p \le q} \mu(p, q) g(p),   (37)

where the function \mu is called the Möbius function of the partially ordered set Q. The value \mu(p, q) is computed recursively for p \le q as

\mu(p, q) = \begin{cases} 1, & p = q, \\ -\sum_{p \le r < q} \mu(p, r), & p \ne q. \end{cases}   (38)
Interested readers can refer to Rota (1964) for the proof of the Möbius Inversion Formula.
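As an illustration (our own, for the poset (\{0,1\}^d, \preceq) used in this paper), the recursive definition (38) can be evaluated directly and checked against the closed form (-1)^{\ell(q) - \ell(p)} derived below in Proposition 3:

```python
from itertools import product
from functools import lru_cache

def leq(p, q):
    """The componentwise partial order of Definition 1."""
    return all(pi <= qi for pi, qi in zip(p, q))

@lru_cache(maxsize=None)
def mobius(p, q):
    """mu(p, q) per Eq. (38): 1 if p = q, else minus the sum over p <= r < q."""
    if p == q:
        return 1
    d = len(p)
    return -sum(mobius(p, r)
                for r in product((0, 1), repeat=d)
                if leq(p, r) and leq(r, q) and r != q)

# Check the closed form on all comparable pairs for d = 3.
for p in product((0, 1), repeat=3):
    for q in product((0, 1), repeat=3):
        if leq(p, q):
            assert mobius(p, q) == (-1) ** (sum(q) - sum(p))
```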
Now, we can state an important result of this section.
Proposition 3. We can express the binary representation of the rectangle (x^0, x^1] as

S(q) = \sum_{p \preceq q} A(p)   (39)

and

A(q) = \sum_{p \preceq q} (-1)^{\ell(q) - \ell(p)} S(p),   (40)

where

\ell(q) = \sum_{i=1}^{d} q_i.   (41)
Proof. Eq. (39) is easily proved using

[0, x^q] = \bigcup_{p \preceq q} (x^{p-1}, x^p].   (42)
Now, let us prove (40). Observe that the element m mentioned in Proposition 2 guarantees that the sum (36) is well defined. In our case, we do not have to prove the existence of this element, because the sum (39) runs over a finite number of indices and is thus already well defined. Thus, the partial order \preceq and (39) satisfy the hypotheses of the Möbius Inversion Formula, and we can conclude that

A(q) = \sum_{p \preceq q} \mu(p, q) S(p).   (43)
Finally, we have to prove that

\mu(p, q) = (-1)^{\ell(q) - \ell(p)}.   (44)

For this, we use definition (38) of the Möbius function. Consider first p = q. Then \ell(p) = \ell(q), and thus

\mu(p, q) = 1 = (-1)^0 = (-1)^{\ell(q) - \ell(p)}.   (45)
Suppose that p \ne q and that (44) is valid for \mu(p, r), with p \preceq r \prec q. Using definition (38), we have

\mu(p, q) = -\sum_{p \preceq r \prec q} \mu(p, r)   (46)
= -\sum_{i=0}^{\ell(q)-\ell(p)-1} \sum_{p \preceq r \prec q,\ \ell(r)=\ell(p)+i} \mu(p, r)   (47)
= -\sum_{i=0}^{\ell(q)-\ell(p)-1} \sum_{p \preceq r \prec q,\ \ell(r)=\ell(p)+i} (-1)^{\ell(r)-\ell(p)}   (48)
= -\sum_{i=0}^{\ell(q)-\ell(p)-1} \sum_{p \preceq r \prec q,\ \ell(r)=\ell(p)+i} (-1)^{i}   (49)
= -\sum_{i=0}^{\ell(q)-\ell(p)-1} (-1)^{i} \, |\{p \preceq r \prec q : \ell(r) = \ell(p)+i\}|   (50)
= -\sum_{i=0}^{\ell(q)-\ell(p)-1} (-1)^{i} \binom{\ell(q)-\ell(p)}{i}   (51)
= -\sum_{i=0}^{\ell(q)-\ell(p)} (-1)^{i} \binom{\ell(q)-\ell(p)}{i} + (-1)^{\ell(q)-\ell(p)}   (52)
= (-1)^{\ell(q)-\ell(p)}. □   (53)
The above-mentioned result lets us conclude the second step inthe generalization of the approach:
Proposition 4. Given an image i, its integral image I, and the rectangle (x^0, x^1], we can compute the sum A of the image values on the rectangle using 2^d references to the integral image via the formula

A = \sum_{p \in \{0,1\}^d} (-1)^{d - \ell(p)} I(x^p).   (54)
Proof. Eq. (54) is an immediate consequence of Proposition 3 and

S(q) = I(x^q),   (55)
A(1) = A,   (56)

which are derived from the definition of the binary representation. □
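A sketch of Proposition 4 (our own code, not the paper's; the integral array is built here with cumulative sums along each axis, which yields the same array as the one-pass recurrence of Proposition 1):

```python
import numpy as np
from itertools import product

def rect_sum_nd(I, x0, x1):
    """A = sum over p in {0,1}^d of (-1)^{d - l(p)} I(x^p), Eq. (54).

    The corner x^p takes x1 or x0 coordinate-wise; references with any
    negative coordinate count as 0 (they lie outside the image).
    """
    d = len(x0)
    total = 0.0
    for p in product((0, 1), repeat=d):
        corner = tuple(x1[k] if p[k] else x0[k] for k in range(d))
        if all(c >= 0 for c in corner):
            total += (-1) ** (d - sum(p)) * I[corner]
    return total

img = np.ones((4, 4, 4))
I = img.cumsum(0).cumsum(1).cumsum(2)       # integral array
print(rect_sum_nd(I, (0, 0, 0), (3, 3, 3)))  # prints 27.0: the 3x3x3 voxels in (0,3]^3
```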
3. Concluding remarks
This paper gives a direction for generalizing the integral image approach to d-dimensional images. The generalization consists of the computation of an integral array in one pass and the optimal computation of sums on rectangles using 2^d references to the integral array.
However, the generalization has some drawbacks for high d. The computation of the integral array uses d - 1 extra arrays, signifying a memory increase that many personal computers could not support for large images. Another problem is the curse of dimensionality. The boosting method used by Viola and Jones selects the best feature from all possible ones generated by scaling, rotating, and translating a base feature through the image. If we consider, for example, the first (and simplest) volumetric feature in Fig. 3, then the number of features is proportional to n^{2d} for an image of dimension n × n × n. Despite these drawbacks, the results presented in this study seem to be an attractive starting point for boosting-based classification in high-dimensional imaging.
There is another generalization not directly related to object recognition. If we use integration on the rectangle of interest instead of addition in Proposition 4, then we can informally state that (54) offers a generalization of the Fundamental Theorem of Calculus (Apostol, 1967). Recall that this theorem states that if f is a continuous function on the interval [x_0, x_1] and F is an antiderivative of f, then

\int_{x_0}^{x_1} f(x) \, dx = F(x_1) - F(x_0).   (57)

Note that the integral image can be regarded as an "antiderivative" of the original image, because it is an integral of the image with a variable upper limit, similar to the antiderivative of a function of one variable. Using this analogy, the generalization to several variables computes the integral of a function f on the interval using its antiderivative F evaluated at the rectangle's corners:

\int_{[x^0, x^1]} f(x) \, dx = \sum_{p \in \{0,1\}^d} (-1)^{d - \ell(p)} F(x^p).   (58)
This is a "direct" generalization of the Fundamental Theorem of Calculus, compared with the generalization given by Stokes' theorem, which involves specialized concepts such as manifolds and differential forms (Katz, 1979). A formal statement of generalization (58) needs, among other concepts, definitions of integrability and of the antiderivative for functions of several variables. However, we believe the proof of our generalization could follow the procedure developed in this study to demonstrate Proposition 4.
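As a numerical illustration of (58) (our own example, not from the paper), take d = 2 and f(x, y) = e^{x+y}, whose antiderivative with variable upper limits over [0, x] × [0, y] is F(x, y) = (e^x - 1)(e^y - 1); the alternating corner sum then reproduces the exact integral over the rectangle:

```python
import math
from itertools import product

def F(x, y):
    """Antiderivative of f(x, y) = exp(x + y) with variable upper limits."""
    return (math.exp(x) - 1.0) * (math.exp(y) - 1.0)

x0, x1 = (0.5, 0.2), (1.5, 1.0)

# Right-hand side of Eq. (58): alternating sum of F over the 2^d corners.
corner_sum = sum((-1) ** (2 - sum(p))
                 * F(*(x1[k] if p[k] else x0[k] for k in range(2)))
                 for p in product((0, 1), repeat=2))

# Exact integral of exp(x + y) over [x0, x1] x [y0, y1], by separability.
exact = (math.exp(x1[0]) - math.exp(x0[0])) * (math.exp(x1[1]) - math.exp(x0[1]))
assert abs(corner_sum - exact) < 1e-12
```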
Acknowledgments
The author is very grateful to Marte Ramírez for his comments about the preliminary results of this study. Special thanks to Dr. Waldemar Barrera, Dr. Fernando Galaz-Fontes, and Dr. Luis Verde, as well as the anonymous reviewers, for their comments and corrections, which helped to improve this study.
References
Apostol, T.M., 1967. Calculus: One-Variable Calculus with an Introduction to Linear Algebra, vol. 1. John Wiley & Sons Inc.
Crow, F.C., 1984. Summed-area tables for texture mapping. In: SIGGRAPH '84: Proc. 11th Annual Conference on Computer Graphics and Interactive Techniques. ACM, New York, NY, USA, pp. 207–212.
deCharms, R.C., 2007. Reading and controlling human brain activation using real-time functional magnetic resonance imaging. Trends Cogn. Sci. 11 (11), 473–481.
deCharms, R.C., 2008. Applications of real-time fMRI. Nat. Rev. Neurosci. 9 (9), 720–729.
Huettel, S.A., Song, A.W., McCarthy, G., 2004. Functional Magnetic Resonance Imaging. Sinauer Associates, Sunderland, MA.
Katz, V.J., 1979. The history of Stokes' theorem. Math. Mag., 146–156.
Ke, Y., Sukthankar, R., Hebert, M., 2005. Efficient visual event detection using volumetric features. In: IEEE Internat. Conf. on Computer Vision, vol. 1, Washington, DC, USA, pp. 166–173.
Preusser, T., Rumpf, M., 2003. Extracting motion velocities from 3D image sequences and coupled spatio-temporal smoothing. In: SPIE Conf. on Visualization and Data Analysis, vol. 5009, pp. 181–192.
Rota, G.C., 1964. On the foundations of combinatorial theory I – Theory of Möbius functions. Prob. Theory Relat. Fields 2 (4), 340–368.
Viola, P., Jones, M., 2004. Robust real-time object detection. Internat. J. Comput. Vision 57 (2), 137–154.
Weiskopf, N., Mathiak, K., Bock, S.W., Scharnowski, F., Veit, R., Grodd, W., Goebel, R., Birbaumer, N., 2004. Principles of a brain-computer interface (BCI) based on real-time functional magnetic resonance imaging (fMRI). IEEE Trans. Biomed. Eng. 51 (6), 966–970.