Master Erasmus Mundus in Color in Informatics and Media Technology (CIMET)
Object Tracking: State of The Art and CAMSHIFT Improvement Using Multi‐dominant Colors Tracking
Master Thesis Report
Presented by Priyanto Hidayatullah
and defended at the
University of Jean Monnet Saint‐Etienne, France
22nd June 2010
Jury Committee:
Supervisor: Prof. Alain Tremeau
Hubert Konik, Ph.D
Prof. Jon Yngve Hardeberg
Faouzi Alaya Cheikh, Ph.D
Javier Hernández‐Andrés, Ph.D
Damien Muselet, Ph.D
Eric Dinet, Ph.D
Object Tracking: State of The Art and CAMSHIFT Improvement Using Multi-dominant Color Tracking
Abstract
Object tracking is a broad field with many available methods and a wide variety of applications. One such application is tracking an object in a clickable hypervideo to enrich the interactivity of video applications. In this thesis, several state-of-the-art object tracking methods are reviewed and closely examined. We then select one of them to improve. We chose CAMSHIFT, which is widely accepted as one of the most prominent object tracking methods, runs in real time, and is well suited to clickable hypervideo. CAMSHIFT performs very well when tracking single-hued objects whose color differs from the background colors.
In this thesis, we aim to improve the robustness of CAMSHIFT for multihued objects and for situations where the object's colors are similar to the background colors. To handle similar object and background colors, we localize the object by selecting each of its dominant-color parts using a combination of Mean-Shift segmentation and region growing. Hue-distance, saturation, and value color histograms are used to describe the object. To improve the tracking of multihued objects, we also track the dominant-color parts separately and then combine them. Our experiments showed that these methods improved CAMSHIFT significantly. We hope this improvement will be useful for object tracking in clickable hypervideo.
Keywords: Object tracking, CAMSHIFT, Segmentation, Mean-Shift, Hypervideo.
Table of Contents
Abstract
Table of Contents
Table of Figures
1 Introduction
  1.1 The General Aim of The Master Thesis
2 Previous Work
  2.1 Test Videos
  2.2 Object Tracking Categorization
  2.3 Corner Detector Combined with Optical Flow
    2.3.1 Corner detection
    2.3.2 Optical flow
  2.4 Speeded Up Robust Features (SURF)
  2.5 Mean Shift Tracking
  2.6 CAMSHIFT Tracking
    2.6.1 Color probability distribution and histogram back projection
    2.6.2 Mass center calculation
    2.6.3 CAMSHIFT advantages and disadvantages
  2.7 Local Binary Pattern
  2.8 Beyond Semi-Supervised Online Boosting Tracking
  2.9 Method that We Choose
  2.10 CAMSHIFT/Mean-Shift Improvement in Literatures
    2.10.1 Mean-Shift tracking combined with texture histogram
    2.10.2 CAMSHIFT and Mean-Shift combined with interest points
    2.10.3 CAMSHIFT improvement using new HSV model
    2.10.4 CAMSHIFT improvement using hue-distance and saturation features
    2.10.5 CAMSHIFT with improvement of object localization
    2.10.6 CAMSHIFT improvement using adaptive background (ABCShift)
    2.10.7 CAMSHIFT improvement by background subtraction
    2.10.8 The CAMSHIFT improvement method that we choose
    2.10.9 The more specific aim of the master thesis
3 Proposed Method
  3.1 Object Localization
    3.1.1 Preprocessing
    3.1.2 Image color transformation
    3.1.3 Object Selection
    3.1.4 Minimum and maximum values storing
  3.2 Object Modeling
  3.3 Making Color Mask
  3.4 Segmentation
  3.5 Histogram Back Projection
  3.6 Tracking
4 Implementations and Experiments
  4.1 Implementations
  4.2 Experiments Setting
5 Results and Discussions
  5.1 Results
    5.1.1 First Experiment Results
    5.1.2 Second Experiment Results
    5.1.3 Third Experiment Results
    5.1.4 Fourth Experiment Results
  5.2 Discussion
    5.2.1 Some Advantages
    5.2.2 Some Limitations
6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
7 Bibliography
Table of Figures
Figure 2.1 Test Videos.
Figure 2.2 Illustration of optical flow [28].
Figure 2.3 Shi-Tomasi corner detector also detects the background corners inside the object's rectangle.
Figure 2.4 SURF Tracker Result.
Figure 2.5 Intuitive description of Mean-Shift [14].
Figure 2.6 Summary of CAMSHIFT algorithm.
Figure 2.7 LBP and CS-LBP features for a neighborhood of 8 pixels [16].
Figure 2.8 Example of LBP calculation [16].
Figure 2.9 LBP Tracker Result in second video.
Figure 2.10 The core classifier system: detector, recognizer and tracker [20].
Figure 2.11 Comparing LBP Image and its back projection image.
Figure 2.12 SURF and CAMSHIFT 1.
Figure 2.13 SURF and CAMSHIFT 2.
Figure 2.14 CAMSHIFT with new HSV model.
Figure 2.15 CAMSHIFT improvement with hue-distance saturation features.
Figure 2.16 Foreground extraction.
Figure 2.17 Sample of elongated object.
Figure 2.18 Background subtraction in static background.
Figure 2.19 Background subtraction in dynamic background.
Figure 3.1 A sample of complex shape object.
Figure 3.2 Object Localization using only region growing.
Figure 3.3 More precise object localization with only a single click.
Figure 3.4 Text file configuration to tune the parameters.
Figure 3.5 Color mask illustration.
Figure 3.6 Segmentation for smoothing and noise removal of third test video.
Figure 3.7 Histogram Back Projection of first test video.
Figure 3.8 Maximum rectangle illustration.
Figure 3.9 The proposed method's schema.
Figure 4.1 Hue histogram of airplane body.
Figure 5.1 First video result with the proposed method.
Figure 5.2 First video result with classic CAMSHIFT at frame 33.
Figure 5.3 Object localization comparison.
Figure 5.4 Second video result with our proposed method.
Figure 5.5 Second video result with classic CAMSHIFT.
Figure 5.6 Third video result with the proposed method.
Figure 5.7 Third video best result with classic CAMSHIFT at frame 300.
Figure 5.8 Object (marked with red rectangle) tracked by the proposed method.
Figure 5.9 Fourth video best result with classic CAMSHIFT at frame 57.
Figure 5.10 Drifting tracker.
Figure 5.11 Multiple object tracking using our proposed method.
1 Introduction
Object tracking is one of the most active emerging areas in computer vision. It has many applications, one of which is tracking an object in a clickable hypervideo. Hypervideo is a displayed video stream that contains embedded user-clickable anchors [19]. In this application, users can interact with the video much as they interact with a website, which enriches the interactivity of the video.
This capability brings many advantages. For example, users can monetize their videos by embedding companies' links in them and, conversely, companies can promote their products in videos.
Another interesting capability is object tracking in hypervideo: the user can select any object in a video and track it throughout the video sequence. For example, a user who wants to follow the movements of a favorite player during a football match could do so with this capability. The same applies to following a favorite driver in F1 videos or a favorite star in a movie.
In this thesis, we aim to improve an object tracking method that can be used in hypervideo. Several state-of-the-art object tracking methods are reviewed, tested experimentally, and closely examined. We then select one of them and improve it.
1.1 The General Aim of The Master Thesis
The general objective of this master thesis can be summarized in the following points:
1) Study several state-of-the-art object tracking methods.
2) Choose one of them to improve, based on a set of criteria.
3) Improve the chosen method, under some constraints if needed.
2 Previous Work
Object tracking is a very broad area of computer vision. There are many kinds of methods, some of which are suitable only for specific conditions. This chapter reviews several state-of-the-art object tracking methods that are currently available.
2.1 Test Videos
Before going deeper into the state-of-the-art methods, in this section we present the test videos that we used to examine these methods and to help us choose one of them. The same test videos are also used to compare our proposed method with the chosen method without our improvement.
The first video shows a yellow trunk (Figure 2.1(a)). This is the simplest case: a single-hued object undergoing scaling, rotation, and slight deformation in front of a dynamic background whose color is quite different from the object's. The object is yellow, while the background is mostly blue. In the middle of the video, partial dynamic occlusion occurs. The video is 1280 x 720 pixels with 24-bit color. The purpose of this video is to test the robustness of the state-of-the-art object tracking methods, as well as our proposed method, against partial occlusion and against scaling, rotation, and deformation of the object.
Figure 2.1 Test Videos. (a) First video: yellow trunk (b) Second video: airplane (c) Third video: small scaling toy (d) Fourth video: football match
The second test video shows an airplane flying above the sea, with some small islands below and mild distraction from clouds (Figure 2.1(b)). The object is multihued and passes through a dynamic background. Parts of the background are distracting because their colors are similar to those of some object parts. The video is 1280 x 720 pixels with 24-bit color. The purpose of this video is to test the robustness of object tracking methods on a multihued object with some distractions.
The third video, available in [33], shows a small toy with several dominant colors moving across a complex background (Figure 2.1(c)). This is a multihued object in front of a complex background whose colors are very similar to the object's. Additional challenges in this video are scaling and skewing of the object: the object moves away from the camera until it becomes very small, and it skews several times. The object also moves very fast, which makes it harder to track.
The background is actually static, and for this kind of video background subtraction is usually very powerful. However, because the object stays in place for quite a long time in the early frames, even background subtraction would have problems: it needs a lot of training data to build a good background model. Moreover, in the middle of the video, some movement of the background can ruin the background model. Finally, not only the target object is moving; the hand and the paper below the object also move, which poses further challenges for background subtraction.
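To illustrate why a long stationary period early in the video hurts background subtraction, the following Python sketch uses a simple running-average background model. The model, the learning rate alpha, the threshold, and the toy 1-D frames are all assumptions chosen for illustration; the thesis does not commit to a specific background model.

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background model: bg <- (1 - alpha) * bg + alpha * frame."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(bg, frame)]


def foreground_mask(bg, frame, threshold=30):
    """A pixel is foreground when it differs from the background model by more than threshold."""
    return [abs(f - b) > threshold for b, f in zip(bg, frame)]


# Toy 1-D frames (hypothetical): background intensity 50, a bright object (200) at pixel 2.
still = [50, 50, 200, 50, 50, 50]
bg = list(still)                    # model initialised from a frame that already contains the object
for _ in range(100):                # the object stays in place during the early frames
    bg = update_background(bg, still)

print(foreground_mask(bg, still))   # all False: the object has become "background"
moved = [50, 50, 50, 50, 200, 50]   # the object finally moves to pixel 4
print(foreground_mask(bg, moved))   # True at pixel 4 (object) and at pixel 2 (a "ghost")
```

While the object is still, it is invisible to the detector; once it moves, the uncovered background is reported as a false "ghost" object. Avoiding this requires training frames without the object, which this video does not provide.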
The video is 640 x 480 pixels with 24-bit color. The purpose of this video is to test the robustness of object tracking methods on a multihued object in front of a similarly colored background. The challenging scaling and skewing of the object are also important for testing tracking performance.
The fourth video is a football match video (Figure 2.1(d)), available in [34]. In this video, the object is at times almost fully occluded, and there is distraction from moving objects of similar color. The object is also very small, which is a great challenge for some object tracking methods. The video is 544 x 436 pixels with 24-bit color. The purpose of this video is to test the robustness of object tracking methods on a very small object with near-full occlusion and distraction from other objects of similar color.
2.2 Object Tracking Categorization
In [17], Yilmaz et al. presented a survey of object tracking methods. They proposed a categorization of object tracking, together with methods that represent each category. The categories are divided into object detection categories and object tracking categories. Because the categorization was published in 2006, we update its examples with some more recent methods in each category so that it is more relevant to this master thesis. The categorization is summarized in Table 2.1 and Table 2.2.
In the following sections, we choose some representative methods to study based on these criteria:
1) Acceptability. Methods that are widely used by researchers are preferred.
2) Recentness. More recent methods are preferred over older ones.
2.3 Corner Detector Combined with Optical Flow
In [5], Bradski states that one of the basic ways to track an object is to select representative point features and track those features using optical flow. This is one of the most intuitive approaches to object tracking, and the KLT tracker, its representative, is a well-known object tracking method. That is why we chose to review it. According to the categorization of Yilmaz et al. [17], this method represents the point detector and kernel tracking categories.
| Categories | Representative Work |
|---|---|
| Point detectors | Harris detector [Harris and Stephens 1988], KLT detector [Shi and Tomasi 1994], Scale Invariant Feature Transform [Lowe 2004], Speeded Up Robust Features [Bay 2006] |
| Segmentation | Active contours [Caselles et al. 1995], Mean-shift [Comaniciu and Meer 1999] |
| Texture Descriptor | Gray-level co-occurrence matrices [C. C. Gotlieb et al., 1990], Gabor filtering [G. Wouwer et al., 1999], Local Binary Pattern [Ojala, Pietikainen, 2001] |
| Background Modeling | Mixture of Gaussians [Stauffer and Grimson 2000], Eigenbackground [Oliver et al. 2000], Dynamic texture background [Monnet et al. 2003] |
| Supervised Classifiers | Support Vector Machines [Papageorgiou et al. 1998], Neural Networks [Rowley et al. 1998], Adaptive Boosting [Viola et al. 2003], Beyond Semi Supervised Online Boosting [Stalder et al. 2009] |

Table 2.1 Object Detection Categories [17]
| Categories | Subcategory | Representative Work |
|---|---|---|
| Point Tracking | Deterministic methods | MGE tracker [Salari and Sethi 1990], GOA tracker [Veenman et al. 2001] |
| | Statistical methods | Kalman filter [Broida and Chellappa 1986], JPDAF [Bar-Shalom and Foreman 1988], PMHT [Streit and Luginbuhl 1994] |
| Kernel Tracking | Template and density based appearance models | KLT [Shi and Tomasi 1994], CAMSHIFT [Bradski, 1998], Layering [Tao et al. 2002] |
| | Multi-view appearance models | Eigentracking [Black and Jepson 1998], SVM tracker [Avidan 2001] |
| Silhouette Tracking | Contour evolution | State space models [Isard and Blake 1998], Variational methods [Bertalmio et al. 2000], Heuristic methods [Ronfard 1994] |
| | Matching shapes | Hausdorff [Huttenlocher et al. 1993], Hough transform [Sato and Aggarwal 2004] |

Table 2.2 Tracking Categories [17]
2.3.1 Corner detection
Representative features should be distinctive enough to be found again in the next frame. Ideally, we select unique (or almost unique) points so that they can be tracked more easily. One option is to take points with a strong derivative, but such points may lie anywhere along an edge and are therefore not unique. If instead we require strong derivatives in two orthogonal directions, we can hope that the points are unique. Such points are called corners.
One method for detecting corners is the KLT (Shi-Tomasi) corner detector [26]. An implementation is available in OpenCV 2.0 [24] as the function cvGoodFeaturesToTrack(). This function computes the image derivatives (using Sobel operators), accumulates their products over a neighborhood of each pixel into a 2 x 2 matrix, and computes the eigenvalues of that matrix. It then returns a list of points whose smaller eigenvalue is large enough for them to qualify as good features to track.
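The Shi-Tomasi criterion can be sketched in a few lines of Python. This is not OpenCV's implementation: central differences stand in for the Sobel operator, the window radius and the quality threshold are hypothetical choices, and the minimum-distance and non-maximum suppression steps of cvGoodFeaturesToTrack() are omitted.

```python
import math


def gradients(img):
    """Central-difference derivatives Ix, Iy (a simple stand-in for the Sobel operator)."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def min_eigenvalue(ix, iy, x, y, r=1):
    """Smaller eigenvalue of [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] around (x, y)."""
    a = b = c = 0.0
    for v in range(y - r, y + r + 1):
        for u in range(x - r, x + r + 1):
            a += ix[v][u] ** 2
            b += ix[v][u] * iy[v][u]
            c += iy[v][u] ** 2
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)


def good_features(img, quality=0.5, r=1):
    """Pixels whose response is at least quality * (best response), strongest first."""
    ix, iy = gradients(img)
    h, w = len(img), len(img[0])
    scores = {(x, y): min_eigenvalue(ix, iy, x, y, r)
              for y in range(r, h - r) for x in range(r, w - r)}
    best = max(scores.values())
    return sorted((p for p, s in scores.items() if s >= quality * best),
                  key=lambda p: -scores[p])


# A bright 10 x 10 square on a dark 20 x 20 image.
img = [[1.0 if 5 <= x <= 14 and 5 <= y <= 14 else 0.0 for x in range(20)] for y in range(20)]
ix, iy = gradients(img)
print(min_eigenvalue(ix, iy, 5, 5))    # corner of the square: 0.75
print(min_eigenvalue(ix, iy, 10, 5))   # middle of an edge: 0.0
print(min_eigenvalue(ix, iy, 1, 1))    # flat region: 0.0
```

Along an edge, only one derivative direction is strong, so the smaller eigenvalue is zero; only at the corner are both eigenvalues large, which is exactly the "two orthogonal derivatives" argument above.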
2.3.2 Optical flow
Another approach to tracking a region defined by a primitive shape is to compute its translation using an optical flow method. Optical flow methods are used to generate dense flow fields by computing the flow vector of each pixel [17]. One of the best-known optical flow algorithms is the Lucas-Kanade algorithm, whose basic equation, as stated in [27], is
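$$\varepsilon(\mathbf{d}) = \varepsilon(d_x, d_y) = \sum_{x=u_x-\omega_x}^{u_x+\omega_x} \; \sum_{y=u_y-\omega_y}^{u_y+\omega_y} \big( I(x,y) - J(x+d_x,\, y+d_y) \big)^2 \qquad (1)$$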
The goal of feature tracking is, for a given point u in image I, to find its corresponding location v = u + d in the next image J such that I(u) and J(v) are “similar"; that is, d is the displacement that minimizes the residual ε(d) of Equation (1). The displacement vector d is the image velocity at u, also known as the optical flow at u [27]. The similarity is measured on an image neighborhood of size (2ωx + 1) x (2ωy + 1), called the integration window, where ωx and ωy are two integers with typical values of 2, 3, 4, 5, 6, or 7 pixels.
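The residual in Equation (1) can be evaluated directly. The following Python sketch is our illustration, not code from [27]: the function names, the list-of-rows image layout, and the brute-force search over integer displacements are assumptions. It computes ε(d) over the integration window and picks the integer d with the smallest residual.

```python
def residual(I, J, ux, uy, dx, dy, wx=2, wy=2):
    """epsilon(d) from Equation (1) for an integer displacement d = (dx, dy).

    I and J are images stored as lists of rows (I[y][x]); u = (ux, uy).
    """
    return sum((I[y][x] - J[y + dy][x + dx]) ** 2
               for x in range(ux - wx, ux + wx + 1)
               for y in range(uy - wy, uy + wy + 1))


def best_integer_displacement(I, J, ux, uy, search=3, wx=2, wy=2):
    """Brute-force search for the integer d in [-search, search]^2 minimising epsilon(d)."""
    candidates = [(dx, dy)
                  for dx in range(-search, search + 1)
                  for dy in range(-search, search + 1)]
    return min(candidates, key=lambda d: residual(I, J, ux, uy, d[0], d[1], wx, wy))


def pattern(x, y):
    """Hypothetical test pattern whose values differ in both directions."""
    return 100 * x + y


# J is I shifted by d = (2, 1), i.e. J(u + d) = I(u).
I = [[pattern(x, y) for x in range(30)] for y in range(30)]
J = [[pattern(x - 2, y - 1) for x in range(30)] for y in range(30)]
print(best_integer_displacement(I, J, 12, 12))  # (2, 1)
```

Exhaustive search like this is exactly what Lucas-Kanade avoids: it solves for d directly, and also for sub-pixel values.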
The basic idea of the Lucas-Kanade algorithm rests on three assumptions [5]:
1) Brightness constancy. A pixel of an object does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.
2) Small movements. The image motion of an object changes slowly in time, so the object does not move much from one frame to the next.
3) Spatial coherence. Neighboring points in a scene that belong to the same surface have similar motion.
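Under these assumptions, ε(d) can be linearized and minimized in closed form. With spatial derivatives Ix, Iy of I and the temporal difference δI = I - J over the integration window, d solves the 2 x 2 system G d = b, where G = Σ [[Ix², IxIy], [IxIy, Iy²]] and b = Σ [Ix δI, Iy δI]. The Python sketch below performs a single such step without pyramids or iterations; the central-difference derivatives, integer window center, and synthetic frames are our assumptions.

```python
def lucas_kanade_step(I, J, ux, uy, w=3):
    """One Lucas-Kanade step over a (2w+1) x (2w+1) window centred on u = (ux, uy).

    Solves G d = b with G = sum [[Ix*Ix, Ix*Iy], [Ix*Iy, Iy*Iy]] and
    b = sum [Ix*dI, Iy*dI], where dI = I - J, so that J(u + d) ~ I(u).
    """
    gxx = gxy = gyy = bx = by = 0.0
    for y in range(uy - w, uy + w + 1):
        for x in range(ux - w, ux + w + 1):
            ix = (I[y][x + 1] - I[y][x - 1]) / 2.0  # spatial derivatives (central differences)
            iy = (I[y + 1][x] - I[y - 1][x]) / 2.0
            di = I[y][x] - J[y][x]                  # temporal difference between the frames
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx += ix * di
            by += iy * di
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        # No texture in two directions (e.g. a flat patch or a straight edge).
        raise ValueError("G is singular: the window cannot be tracked")
    return (gyy * bx - gxy * by) / det, (gxx * by - gxy * bx) / det


# Smooth synthetic frames: J is I shifted by d = (0.6, -0.4).
N = 21
I = [[(x - 10) ** 2 + (y - 10) ** 2 for x in range(N)] for y in range(N)]
J = [[(x - 10.6) ** 2 + (y - 9.6) ** 2 for x in range(N)] for y in range(N)]
dx, dy = lucas_kanade_step(I, J, 10, 10)
print(round(dx, 6), round(dy, 6))  # 0.6 -0.4
```

The singular-G case corresponds to the corner-detection argument of Section 2.3.1: a window needs gradients in two directions to be trackable.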
Figure 2.2 Illustration of optical flow [28].
The disadvantage of using a small local window in Lucas-Kanade is that large motions can move points outside the local window, which makes them impossible to track [5]. This led to the development of the pyramidal LK algorithm, which starts tracking at the highest level of an image pyramid (lowest detail) and works down to the lower levels (finer detail). Tracking with image pyramids makes it possible to track large motions with local windows. In 1994, Shi and Tomasi proposed the KLT tracker, which iteratively computes the translation (du, dv) of a region (e.g., a 25 × 25 patch) centered on an interest point [17].
Figure 2.3 Shi-Tomasi corner detector also detects the background corners inside the object's rectangle.
We tried combining both methods using their implementations in OpenCV 2.0. We draw a rectangle bounding the object, detect corners inside it with the Shi-Tomasi method, and track the movement of those corners with the pyramidal Lucas-Kanade method. To update the position of the object's rectangle, we average the displacements of all the corners, based on the spatial coherence assumption. Nevertheless, the results are not satisfying, because Shi-Tomasi corner detection returns not only the object's corners but also background corners inside the object's rectangle (Figure 2.3). For example, the object may move upward while the background moves downward. Averaging the displacements of all corners inside the rectangle, which mixes object and background corners, therefore gives a wrong estimate of the rectangle's motion. It is difficult to tune the Shi-Tomasi parameters so that only the object's corners are returned inside the object's rectangle.
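The bias introduced by averaging can be reproduced with a few hypothetical points. In this Python sketch (coordinates invented for illustration; image y grows downward), six object corners move up by 5 pixels while four background corners inside the same rectangle move down by 5 pixels.

```python
def mean_displacement(prev_pts, next_pts):
    """Average displacement of all tracked corners (the spatial-coherence assumption)."""
    n = len(prev_pts)
    dx = sum(q[0] - p[0] for p, q in zip(prev_pts, next_pts)) / n
    dy = sum(q[1] - p[1] for p, q in zip(prev_pts, next_pts)) / n
    return dx, dy


def move_rect(rect, d):
    """Shift an (x, y, width, height) rectangle by displacement d."""
    x, y, w, h = rect
    return (x + d[0], y + d[1], w, h)


object_prev = [(10, 20), (14, 22), (18, 25), (12, 30), (16, 33), (20, 35)]
background_prev = [(8, 40), (22, 18), (9, 15), (21, 38)]
prev_pts = object_prev + background_prev
next_pts = [(x, y - 5) for x, y in object_prev] + [(x, y + 5) for x, y in background_prev]

d = mean_displacement(prev_pts, next_pts)
print(d)                              # (0.0, -1.0): only 1 px up instead of 5
print(move_rect((5, 10, 20, 30), d))  # (5.0, 9.0, 20, 30)
```

The rectangle lags the object by 4 pixels in a single frame, and the error accumulates over the sequence, which is consistent with the unsatisfying results described above.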
2.4 Speeded Up Robust Features (SURF)
SURF is an image interest point detector and descriptor. It is very well accepted by researchers as one of the most prominent interest point detectors and descriptors, which is why we chose to review it. According to the categorization of Yilmaz et al. [17], it represents the point detector category.
In [15], Bay et al. describe how interest point detection relies on a very basic approximation of the Hessian matrix. This approximation lends itself to the use of integral images, which reduces the computation time drastically. Interest points need to be found at different scales, not least because the search for correspondences often requires comparing them in images where they are seen at different scales. Scale spaces are usually implemented as an image pyramid: the images are repeatedly smoothed with a Gaussian and then sub-sampled to obtain the next, higher level of the pyramid. Thanks to box filters and integral images, SURF instead up-scales the filter size and applies the filters directly to the original image, rather than iteratively reducing the image size. To localize interest points in the image and over scales, non-maximum suppression in a 3 x 3 x 3 neighborhood is applied.
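Two building blocks mentioned above can be sketched in Python: an integral image, which makes the sum over any axis-aligned box cost four lookups regardless of its size (the property that makes box filters of growing size cheap), and the 3 x 3 x 3 non-maximum suppression test over (scale, y, x). These are illustrative sketches with hypothetical names, not the authors' code; the box-filter Hessian approximation itself is not reproduced.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of img[y][x] for x0 <= x < x1 and y0 <= y < y1: four lookups, whatever the box size."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def is_local_max_3x3x3(stack, s, y, x):
    """True if stack[s][y][x] exceeds all 26 neighbours in (scale, y, x)."""
    v = stack[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) != (0, 0, 0) and stack[s + ds][y + dy][x + dx] >= v:
                    return False
    return True


img = [[(3 * x + 5 * y) % 7 for x in range(6)] for y in range(5)]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 4, 3) == sum(img[y][x] for y in range(1, 3) for x in range(1, 4)))  # True

stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]  # three 3x3 layers of detector responses
stack[1][1][1] = 1.0
print(is_local_max_3x3x3(stack, 1, 1, 1))  # True
```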
For interest point description and matching, they build on the distribution of first-order Haar wavelet responses in the x and y directions rather than on the gradient, exploit integral images for speed, and use only 64 dimensions. This reduces the time for feature computation and matching and has proven to increase robustness at the same time. Furthermore, they introduced a new indexing step based on the sign of the Laplacian, which increases both the robustness of the descriptor and the matching speed. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. This feature is available at no extra computational cost, as it has already been computed during the detection phase.
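The structure of the 64-dimensional descriptor can be sketched as follows. This simplified Python version assumes that the Haar wavelet responses dx and dy have already been sampled on a regular 20 x 20 grid around the interest point; it splits the grid into 4 x 4 sub-regions of 5 x 5 samples, and each sub-region contributes (Σdx, Σdy, Σ|dx|, Σ|dy|). Gaussian weighting and rotation to the dominant orientation are omitted, and the input responses are hypothetical.

```python
import math


def surf_like_descriptor(dx, dy):
    """Simplified SURF-style 64-D descriptor from 20 x 20 grids of Haar responses dx, dy."""
    desc = []
    for by in range(4):
        for bx in range(4):
            cell = [(dx[y][x], dy[y][x])
                    for y in range(5 * by, 5 * by + 5)
                    for x in range(5 * bx, 5 * bx + 5)]
            desc += [sum(a for a, _ in cell), sum(b for _, b in cell),
                     sum(abs(a) for a, _ in cell), sum(abs(b) for _, b in cell)]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]  # unit length: invariant to contrast scaling


# Hypothetical responses; scaling them by 3 (higher contrast) leaves the descriptor unchanged.
dx = [[math.sin(0.3 * x + 0.1 * y) for x in range(20)] for y in range(20)]
dy = [[math.cos(0.2 * x - 0.4 * y) for x in range(20)] for y in range(20)]
d1 = surf_like_descriptor(dx, dy)
d2 = surf_like_descriptor([[3 * v for v in row] for row in dx], [[3 * v for v in row] for row in dy])
print(len(d1), max(abs(p - q) for p, q in zip(d1, d2)) < 1e-12)  # 64 True
```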
In conclusion, they claimed that SURF works very well for classification tasks, performing better than previous methods (SIFT, GLOH) while being faster to compute, and that it should be well suited to object detection, object recognition, and image retrieval.
This method is now widely used, which motivated us to try it for object tracking. We used the code provided by the authors [25] and implemented a simple tracker: we draw a bounding rectangle around the object, find the interest points inside it, and store them as the object model. In the next frame, we find the points that match the model, calculate their displacements, and move the object rectangle accordingly.
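The tracking steps above can be sketched in Python. This is not the authors' code [25]: the (x, y, laplacian_sign, descriptor) tuple format, plain nearest-neighbour matching, and mean displacement are assumptions made for illustration. Following SURF's indexing idea, candidates with a different Laplacian sign are skipped before any descriptor distance is computed.

```python
import math


def match_points(model, frame_pts):
    """Nearest-neighbour matching of (x, y, laplacian_sign, descriptor) tuples.

    Returns a list of ((x_model, y_model), (x_frame, y_frame)) pairs.
    """
    pairs = []
    for mx, my, msign, mdesc in model:
        best, best_dist = None, math.inf
        for fx, fy, fsign, fdesc in frame_pts:
            if fsign != msign:
                continue  # bright blob vs dark blob: cannot match, skip the distance
            dist = math.dist(mdesc, fdesc)
            if dist < best_dist:
                best, best_dist = (fx, fy), dist
        if best is not None:
            pairs.append(((mx, my), best))
    return pairs


def move_rectangle(rect, pairs):
    """Shift an (x, y, w, h) rectangle by the mean displacement of the matched points."""
    if not pairs:
        return rect  # no match: keep the previous rectangle
    dx = sum(q[0] - p[0] for p, q in pairs) / len(pairs)
    dy = sum(q[1] - p[1] for p, q in pairs) / len(pairs)
    x, y, w, h = rect
    return (x + dx, y + dy, w, h)


model = [(10, 10, +1, [1.0, 0.0]), (20, 15, -1, [0.0, 1.0])]  # stored object model
frame = [(13, 14, +1, [0.9, 0.1]), (23, 19, -1, [0.1, 0.9]),
         (40, 40, -1, [1.0, 0.0])]                               # interest points in the next frame
pairs = match_points(model, frame)
print(pairs)                                  # [((10, 10), (13, 14)), ((20, 15), (23, 19))]
print(move_rectangle((5, 5, 30, 20), pairs))  # (8.0, 9.0, 30, 20)
```

The point at (40, 40) has the same descriptor as the first model point but the opposite Laplacian sign, so it is never considered. Wrong matches that survive this test still pull the rectangle off target, as observed in our experiments.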
Following the steps above, we tested SURF on our test videos (Figure 2.4). In the first video (yellow trunk), SURF failed to detect interest points. In the second video (airplane), SURF detected the object and moved the rectangle quite nicely. In the third video, SURF detected the object in the first several frames, but it failed when the object moved too far from the camera and became too small. Moreover, the SURF implementation sometimes produced wrongly matched interest points.
Figure 2.4 SURF Tracker Result. (a) SURF Tracker in first video at frame 1 (b) SURF Tracker in first video at frame 65 (c) SURF Tracker in second video at frame 1 (d) SURF Tracker in second video at frame 95 (e) SURF Tracker in third video at frame 1 (f) SURF Tracker in third video at frame 148. An object rectangle in the left corner means that SURF failed to detect the object's interest points.
2.5 Mean Shift Tracking
Mean-Shift is a robust method for finding the mode of the density distribution of a data set [5]. The method is versatile, since the density can describe not only color but also texture, motion, etc. [2,5]. For continuous distributions the process is easy: it is essentially hill climbing applied to a density histogram of the data [5]. The method is efficient compared with standard template matching because it eliminates brute-force search [17]. These characteristics have made Mean-Shift very well accepted by researchers, which is why we chose to review it. According to the categorization of Yilmaz et al. [17], it represents the segmentation category of object detection.
The method can be summarized intuitively as follows [14]:
1) Start with an arbitrary position and size of a window (region of interest).
2) Find the mean-shift vector, i.e., the vector from the window center to the mean of the data points inside the window.
3) Move the window according to the vector, so that the center of the window is now at the end point of the vector (the mean).
4) Recalculate the vector at the current window position.
5) Return to step 3) until convergence. Convergence here means that the window movement is below a threshold or that the mean-shift procedure has been carried out for a given number of iterations.
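The intuitive procedure above can be sketched in Python for a set of 2-D points, assuming a flat (uniform) circular kernel; the radius, tolerance, and data points are hypothetical.

```python
import math


def mean_shift_mode(points, start, radius, eps=1e-3, max_iter=100):
    """Flat-kernel mean-shift: move a circular window to the mean of the points inside it
    until the move is shorter than eps or max_iter iterations have been done."""
    cx, cy = start
    for _ in range(max_iter):
        inside = [(x, y) for x, y in points if math.hypot(x - cx, y - cy) <= radius]
        if not inside:
            break  # empty window: there is no mean to move to
        mx = sum(x for x, _ in inside) / len(inside)
        my = sum(y for _, y in inside) / len(inside)
        shift = math.hypot(mx - cx, my - cy)  # length of the mean-shift vector
        cx, cy = mx, my                       # move the window centre to the mean
        if shift < eps:
            break
    return cx, cy


# A dense cluster around (10, 10) plus three scattered outliers.
cluster = [(10 + i, 10 + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
points = cluster + [(3, 3), (4, 15), (16, 4)]
print(mean_shift_mode(points, start=(7, 8), radius=4))  # (10.0, 10.0)
```

Starting off-center, the window first sees only part of the cluster, moves toward its mean, and then locks onto the densest region; the outliers never enter the window.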
Figure 2.5 Intuitive description of Mean-Shift [14].
Mean-shift was not originally intended as a tracking algorithm [1]. In [2], Comaniciu et al. applied mean-shift to discontinuity-preserving filtering and image segmentation. Later, Comaniciu et al. introduced mean-shift for tracking non-rigid objects [29].
In [5], Bradski states that the mean-shift calculation can be simplified by using a rectangular kernel. A rectangular kernel has no falloff with distance from the center until a single sharp transition to zero. This contrasts with the exponential falloff of a Gaussian kernel and with the falloff with the square of the distance from the center in the commonly used Epanechnikov kernel. The simplification reduces the mean-shift vector equation to calculating the center of mass of the image pixel distribution using image moments:
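1) Zeroth moment calculation
$$M_{00} = \sum_{x}\sum_{y} I(x,y) \qquad (2)$$
2) First moment calculation
$$M_{10} = \sum_{x}\sum_{y} x\,I(x,y), \qquad M_{01} = \sum_{x}\sum_{y} y\,I(x,y) \qquad (3)$$
3) Mean search window calculation
$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}} \qquad (4)$$
where I(x, y) is the (probability) value of the pixel at (x, y) and the sums run over the search window.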
In practice, the mean-shift tracking algorithm runs as follows [5]:
1) Choose a search window with its characteristics:
i. Initial location
ii. Type (uniform, polynomial, exponential, or Gaussian)
iii. Shape (symmetric, rounded, rectangular)
iv. Size
2) Compute the center of mass of the window using moments.
3) Center the window at the center of mass.
4) Return to step 2 until convergence.
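The practical algorithm above, with the rectangular kernel suggested in [5], can be sketched as follows. This is a minimal illustration under our own assumptions: the probability image is a plain 2D list indexed `prob[y][x]`, the window is given as `(x, y, width, height)`, and the helper names `window_centroid` and `mean_shift_window` are hypothetical. The centroid follows equations (2)-(4).

```python
def window_centroid(prob, x0, y0, w, h):
    """Zeroth and first moments inside a window, clipped to the image.
    Returns the center of mass (x_c, y_c), or None if the window is empty."""
    rows, cols = len(prob), len(prob[0])
    m00 = m10 = m01 = 0.0
    for y in range(max(0, y0), min(rows, y0 + h)):
        for x in range(max(0, x0), min(cols, x0 + w)):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
    if m00 == 0:
        return None
    return m10 / m00, m01 / m00


def mean_shift_window(prob, window, max_iter=20):
    """Rectangular-kernel mean-shift: re-center the window on its center of
    mass until it stops moving or max_iter iterations have been done."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        c = window_centroid(prob, x0, y0, w, h)
        if c is None:
            break  # no probability mass under the window
        xc, yc = c
        # New top-left corner that puts the window center on (x_c, y_c).
        nx = int(round(xc - (w - 1) / 2))
        ny = int(round(yc - (h - 1) / 2))
        moved = abs(nx - x0) + abs(ny - y0)
        x0, y0 = nx, ny
        if moved == 0:  # converged
            break
    return x0, y0, w, h
```

Note that the window size never changes here, which is exactly the limitation of plain Mean-Shift discussed next.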
This method works well for a single-hue object on a background whose color differs from that of the object. Its disadvantage is that mean-shift gives only the mean position; it does not give the object's orientation. In [5], Bradski implies that Mean-Shift does not give the object size, and in [17], Yilmaz et al. note that Mean-Shift is not rotation invariant.
2.6 CAMSHIFT Tracking
CAMSHIFT (Continuously Adaptive Mean Shift) improves the well-known Mean-Shift method by making the color distribution adaptive to the changes in each frame. The heart of CAMSHIFT is Mean-Shift: Mean-Shift gives the center position of the rectangle, whereas CAMSHIFT gives not only the position of the object but also its size and orientation [1,5].
This ability of CAMSHIFT to improve on Mean-Shift by giving the size and orientation of the object is very important in our case, which is why we chose to review this method. It represents the template and density based appearance models in the categorization of Yilmaz et al. [17].
CAMSHIFT was originally developed for a real-time perceptual user interface, in which the application is tracking human faces [1]. The method is based on mean-shift, modified so that it adapts to the dynamically changing color probability distributions of the frame sequence in a video [1]; this is needed because the color distribution of the frame sequence changes over time. CAMSHIFT is now used as a computer interface for controlling computer games.
For use as a computer interface, CAMSHIFT was developed to fulfill the following requirements:
1) Real time
2) Can run on inexpensive consumer cameras without lens calibration
For these purposes, its authors decided to focus on color-based tracking, using a color histogram to track the colored object.
The CAMSHIFT algorithm can be summarized in the following steps [1]:
1) Choose the initial region of interest, which contains the object we want to track.
2) Make a color histogram of that region as the object model.
3) Make a probability distribution of the frame using the color histogram. As a remark, the implementation uses the histogram back projection method.
4) Based on the probability distribution image, find the center of mass of the search window using the mean-shift method.
5) Center the search window at the point found in step 4 and iterate step 4 until convergence.
6) Process the next frame with the search window position from step 5.
2.6.1 Color probability distribution and histogram back projection
To track a colored object, CAMSHIFT needs a probability distribution image. Its authors use the HSV color system, with only the hue component, to make a 1D color histogram of the object. This histogram is stored and used to convert subsequent frames into the corresponding object probability. The probability distribution image itself is made by back-projecting the 1D hue histogram onto the hue image of each frame; the result is called the back projection image. CAMSHIFT then tracks the object based on this back projection image.
Histogram back projection is a technique for finding how well each pixel of an image fits a histogram: each pixel is evaluated according to how probable its value is under the histogram, that is, it is replaced by the value of the histogram bin into which its value falls.
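As an illustration of back projection, the following sketch builds a 1D hue histogram from the object pixels and back-projects it onto a hue image. It assumes OpenCV-style 8-bit hue values in the range 0-179 (`hue_max=180`) and scales the histogram so that its largest bin is 255; the bin count, the scaling and the function names are illustrative assumptions.

```python
def hue_histogram(hues, bins=16, hue_max=180):
    """1D hue histogram of the object region, scaled so the largest bin is 255."""
    hist = [0] * bins
    for h in hues:
        hist[min(h * bins // hue_max, bins - 1)] += 1
    peak = max(hist) or 1  # avoid division by zero for an empty region
    return [255 * c / peak for c in hist]


def back_project(hue_image, hist, hue_max=180):
    """Replace every pixel by the histogram value of its hue bin, giving a
    probability-like image (high where the hue is frequent in the object)."""
    bins = len(hist)
    return [[hist[min(h * bins // hue_max, bins - 1)] for h in row]
            for row in hue_image]
```

Pixels whose hue never occurs in the object model receive 0, so they do not attract the search window.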
2.6.2 Mass center calculation
The mean location of the probability image inside the search window is computed using image moments. Let I(x, y) be the intensity of the discrete probability image at (x, y) within the search window.
1) Zeroth moment calculation using formula (2)
2) First moment calculation using formula (3)
3) Mean search window calculation using formula (4)
This phase gives the center position of the object in every frame. However, this is not enough, since the size of the object can vary over time: for example, if the object moves towards or away from the camera, its size changes. This information can be calculated using second moments, which give not only the length and width of the object but also its orientation.
Figure 2.6 Summary of CAMSHIFT algorithm.
The gray box is the mean-shift algorithm [1]
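Second moments are:
$$M_{20} = \sum_x \sum_y x^2 I(x, y), \qquad M_{02} = \sum_x \sum_y y^2 I(x, y), \qquad M_{11} = \sum_x \sum_y x y\, I(x, y) \tag{5}$$
The orientation is:
$$\theta = \frac{1}{2} \arctan\left( \frac{2\left( \frac{M_{11}}{M_{00}} - x_c y_c \right)}{\left( \frac{M_{20}}{M_{00}} - x_c^2 \right) - \left( \frac{M_{02}}{M_{00}} - y_c^2 \right)} \right) \tag{6}$$
Then the length l and width w from the distribution centroid are
$$l = \sqrt{\frac{(a + c) + \sqrt{b^2 + (a - c)^2}}{2}} \tag{7}$$
$$w = \sqrt{\frac{(a + c) - \sqrt{b^2 + (a - c)^2}}{2}} \tag{8}$$
where a, b and c are
$$a = \frac{M_{20}}{M_{00}} - x_c^2 \tag{9}$$
$$b = 2\left( \frac{M_{11}}{M_{00}} - x_c y_c \right) \tag{10}$$
$$c = \frac{M_{02}}{M_{00}} - y_c^2 \tag{11}$$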
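The second-moment computations in (5)-(11) can be checked with the following sketch, which returns the centroid, the orientation θ, the length l and the width w of a probability image stored as a 2D list `prob[y][x]`. It uses `math.atan2(b, a - c)` instead of a plain arctangent of the ratio, a common variant that stays defined when a = c; the function name is our own.

```python
import math


def camshift_ellipse(prob):
    """Centroid, orientation and axis lengths of a probability image,
    following the moment formulas of CAMSHIFT."""
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for y, row in enumerate(prob):
        for x, p in enumerate(row):
            m00 += p
            m10 += x * p
            m01 += y * p
            m20 += x * x * p
            m02 += y * y * p
            m11 += x * y * p
    if m00 == 0:
        raise ValueError("empty probability image")
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc ** 2              # variance along x
    b = 2 * (m11 / m00 - xc * yc)        # twice the covariance
    c = m02 / m00 - yc ** 2              # variance along y
    theta = 0.5 * math.atan2(b, a - c)   # orientation of the major axis
    root = math.sqrt(b ** 2 + (a - c) ** 2)
    length = math.sqrt(((a + c) + root) / 2)
    width = math.sqrt(max(((a + c) - root) / 2, 0.0))  # guard rounding
    return (xc, yc), theta, length, width
```

A horizontal bar of probability gives θ = 0 and a width near zero, while a vertical bar gives θ = ±π/2.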
2.6.3 CAMSHIFT advantages and disadvantages
With all these characteristics, classic CAMSHIFT has both advantages and disadvantages.
Some advantages:
1) Computationally efficient, with real-time tracking performance (30 fps).
2) Invariant to scaling and rotation.
3) Ignores image distractors as long as they lie outside the search window.
4) Can deal with occlusion as long as the occlusion is not 100%.
5) Insensitive to object deformation [4].
Some disadvantages:
1) Problems with multi-hue objects. If the object has more than one hue, the tracker tends to track the most significant object part and leave the smaller parts untracked. The problem also occurs with a complex background.
2) Because it uses only the hue component, problems may occur when the object passes a background whose colors are similar to those of the object.
3) Sensitive to changing illumination.
4) When the object moves so fast that the target areas in two neighboring frames do not overlap, the tracker often converges to a wrong object [4].
2.7 Local Binary Pattern
Local binary pattern (LBP) is a descriptor that describes each pixel in a region by its texture, calculated from the relative gray levels of its neighboring pixels. LBP is a powerful illumination-invariant texture primitive [16]. The histogram of the binary patterns computed over a region is used for texture description. Figure 2.7 shows how LBP is calculated with the classic LBP and with the center-symmetric local binary pattern (CS-LBP).
LBP is a fairly recent texture descriptor that has been developed rapidly by many researchers and has shown encouraging results in texture description. Besides CS-LBP, illustrated in Figure 2.7, there are also Rotation Invariant Volume LBP (RIV-LBP) and LBP from Three Orthogonal Planes (LBP-TOP) [22]. That is why we chose to review this method. It represents the texture descriptor category in Table 2.1.
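The two codes of Figure 2.7 can be sketched for the pixel at row r, column c of a gray-level image stored as a 2D list. The neighbor ordering in `NEIGHBORS` and the default CS-LBP threshold of 0 are assumptions; any fixed circular ordering gives an equivalent descriptor up to a permutation of the bits.

```python
# Offsets (row, column) of the 8 neighbors, in a fixed circular order.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Classic 8-neighbor LBP: bit i is 1 if neighbor i >= center (0..255)."""
    center = img[r][c]
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBORS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << i
    return code


def cs_lbp_code(img, r, c, threshold=0):
    """CS-LBP: compare the 4 center-symmetric neighbor pairs (0..15).
    Bit i is 1 if neighbor i exceeds its opposite neighbor by > threshold."""
    code = 0
    for i in range(4):
        (dr1, dc1), (dr2, dc2) = NEIGHBORS[i], NEIGHBORS[i + 4]
        if img[r + dr1][c + dc1] - img[r + dr2][c + dc2] > threshold:
            code |= 1 << i
    return code
```

CS-LBP ignores the center value and yields only 16 patterns instead of 256, which makes its histogram more compact.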
Using the publicly shared LBP implementation in [23], we try to use LBP as an object tracker in the following simple way. First, we draw a rectangle that marks the object. We calculate the LBP histogram of this rectangle using Rotation Invariant Volume LBP (RIV-LBP) and store it as the object model. In the next frame, we build a search region twice as large as the object rectangle and slide a window of the same size as the object rectangle inside it. Each time the sliding window moves, we calculate its RIV-LBP histogram and compare it with the model RIV-LBP histogram. The window whose histogram gives the best similarity is assumed to be the object, and the object rectangle is moved to that window. The histogram similarity we use is histogram intersection, as implemented in OpenCV 2.0 [24].
Figure 2.7 LBP and CS-LBP features for a neighborhood of 8 pixels [16].
Figure 2.8 Example of LBP calculation [16].
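The sliding-window search described above can be sketched generically: the descriptor is passed in as a function `describe`, so an RIV-LBP histogram (not implemented here) could be plugged in. Histogram intersection is computed as the sum of bin-wise minima. The function names are hypothetical, and the search region is assumed to lie inside the image.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; a larger value means more similar histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_window(image, model_hist, region, win_w, win_h, describe):
    """Slide a win_w x win_h window over region = (x0, y0, w, h) and return
    the top-left corner whose descriptor histogram best matches the model,
    together with its similarity score."""
    x0, y0, w, h = region
    best, best_score = None, -1.0
    for y in range(y0, y0 + h - win_h + 1):
        for x in range(x0, x0 + w - win_w + 1):
            patch = [row[x:x + win_w] for row in image[y:y + win_h]]
            score = histogram_intersection(describe(patch), model_hist)
            if score > best_score:
                best, best_score = (x, y), score
    return best, best_score
```

This exhaustive search costs one descriptor computation per window position, which partly explains why such a tracker is slow compared with CAMSHIFT.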
Figure 2.9 LBP tracker result on the second video. (a) Object rectangle at frame 1 (b) Object rectangle at frame 33
The result is not what we expected, even for a simple video with a homogeneous plain background and a rigid object of fixed size (Figure 2.9). The object rectangle often moves to a position that, visually, is not the best position of the object in the next frame. We therefore do not base our thesis on improving LBP.
2.8 Beyond Semi-Supervised Online Boosting Tracking
This method is based on an online boosting mechanism that assigns a weight to every feature. With this approach, tracking is treated as a classification problem. In [20], Stalder et al. presented a multiple classifier system that splits the tasks of detection (finding the object of interest), recognition (distinguishing similar objects in a scene) and tracking (retrieving the object to be tracked) into separate classifiers. The purpose of this split is to simplify each classification task.
This method represents the supervised classifiers in the categorization of Yilmaz et al. [17]. It is one of the most recent methods in this category and shows many encouraging results, which is why we chose to study it as the representative of the supervised classifiers category.
For feature selection, the goal of boosting is to minimize the error by selecting a set of N “weak” classification algorithms and combining them into a strong classifier. In [20], the authors describe the online variant, whose main idea is to perform online boosting on selectors rather than directly on the weak classifiers. A selector holds a set of M weak classifiers and selects the one with the lowest estimated error. For tracking, online boosting starts by building an initial classifier from positive samples taken from the object and negative samples taken from the background. The classifier is then evaluated exhaustively on the image at time t+1. The resulting confidence distribution is analyzed and, in the simplest case, its local maximum is taken as the new object position. To adapt to appearance changes of the object (e.g. different illumination) or to a changed background, the classifier is updated and the loop repeats.
The supervised online boosting tracker uses self-labeled data for its updates, whereas semi-supervised online boosting tracking uses unlabeled data.
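To make the role of a selector concrete, here is a simplified sketch of one online-boosting selector and of the resulting strong classifier. Assumptions: the weak classifiers are fixed callables returning +1 or -1 (in practice they are also updated), and the importance-weight update follows the standard online boosting scheme of Oza and Russell. This is not the code of [20] or [21], and the names `Selector` and `strong_confidence` are hypothetical.

```python
import math


class Selector:
    """Holds M weak classifiers and picks the one with the lowest
    importance-weighted error observed so far."""

    def __init__(self, weak_classifiers):
        self.weak = list(weak_classifiers)
        self.right = [1e-3] * len(self.weak)  # small prior avoids 0/0
        self.wrong = [1e-3] * len(self.weak)
        self.best, self.alpha = 0, 0.0

    def update(self, x, y, lam):
        """Update the error estimates with sample x (label y, importance lam)
        and return the importance passed on to the next selector."""
        for i, h in enumerate(self.weak):
            if h(x) == y:
                self.right[i] += lam
            else:
                self.wrong[i] += lam
        errors = [w / (r + w) for r, w in zip(self.right, self.wrong)]
        self.best = min(range(len(errors)), key=errors.__getitem__)
        e = min(max(errors[self.best], 1e-6), 1 - 1e-6)
        self.alpha = 0.5 * math.log((1 - e) / e)  # voting weight
        if self.weak[self.best](x) == y:
            return lam / (2 * (1 - e))  # correctly classified: weight drops
        return lam / (2 * e)            # misclassified: weight grows

    def predict(self, x):
        return self.weak[self.best](x)


def strong_confidence(selectors, x):
    """Weighted vote of the selected weak classifiers. The sign is the label;
    the value can serve as the tracker's confidence at a location."""
    return sum(s.alpha * s.predict(x) for s in selectors)
```

Evaluating `strong_confidence` at every candidate position of the next frame gives the confidence map whose maximum is taken as the new object position.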
In beyond semi-supervised tracking, the authors propose a multiple classifier system (Figure 2.10). The detector is an offline classifier whose purpose is to reliably find the object of interest; it is not updated during tracking. Any kind of object detector can be integrated into the system as long as it is generic and can be applied to any kind of scene. The recognizer is a supervised online classifier that is updated during tracking: its positive training set consists of tracked samples validated by the detector, and its negative training set is composed of hard examples collected from the background at the time of detection. This allows similar objects in a scene to be distinguished. The tracker is a semi-supervised online classifier whose confidence map is analyzed via semi-supervised updates to retrieve a stable maximum.
Figure 2.10 The core classifier system: detector, recognizer and tracker.[20]
They used Haar-like features, histograms of oriented gradients (HOG), and color histograms, and achieved 10 fps on a common 3.0 GHz dual-core PC with 2 GB of RAM [20].
Some of the advantages of this method are:
1) It is currently one of the state-of-the-art supervised/semi-supervised object tracking methods.
2) It can distinguish between very similar objects. The authors give the example of tracking a Coca-Cola bottle next to another, similar Coca-Cola bottle; the method distinguishes which one is the tracked object and which one is not.
3) It can track partially or fully occluded objects, under either static or dynamic occlusion.
4) It can track an object whose texture is similar to that of its background.
5) It can track multiple objects.
6) Face tracking is also possible, with good results.
7) It is able to re-track the object.
The weakness of this method is that it works only if the size of the object does not change from frame to frame. If the size changes (e.g. the object moves away from the viewer), the object is no longer detected. This is a very critical shortcoming in our case. The speed is also not high enough for a real-time application.
This method is actually very promising, and the authors also share their code [21]. Unfortunately, we had some difficulties. First, understanding the code: the implementation of the multiple classifier system, which contains many pattern recognition algorithms, is not easy to understand. Even though this code contains only Haar-like features, meaning it is the simple version, it is still hard to understand. Second, pattern recognition is not our main field. We therefore decided to skip this method and did not study or improve it further.
2.9 Method that We Choose
After studying, experimenting with and observing several state-of-the-art object tracking methods, we chose CAMSHIFT for the following reasons:
1) CAMSHIFT has real-time performance, while some of the other methods do not; in our experiment it reaches 30 fps. This capability is very important for our case (clickable hypervideo), because a hypervideo application needs a tracker with real-time capability. Semi-Supervised Online Boosting Tracking fails to achieve real-time performance, and in our experiments LBP, KLT and SURF also fail.
2) CAMSHIFT is invariant to scaling and rotation, while some of the other methods are not. Scaling and rotation are unavoidable in a real video application such as hypervideo, so this is the next most important characteristic we need, and CAMSHIFT is very robust for scaled and rotated objects. SURF is able to detect a rotating object, but it fails to detect a highly scaled one: in our case, SURF fails when the object becomes too small. Semi-Supervised Online Boosting Tracking certainly fails for a scaled object. Mean-Shift uses a fixed kernel size and moves the kernel towards the mean position, so its size does not change over time, whereas CAMSHIFT adapts to the changing color distribution over time, which makes it scale invariant [5]. According to [17], Mean-Shift is suitable for translational and scaling motion but not for rotational motion, and KLT is only for affine motion and not suitable for translational, rotational or scaling motion.
3) On all of our test videos, CAMSHIFT is one of the methods that track the object with reasonable results, although it starts to fail after some frames. In the first video, the tracker drifts at frame 145; in the third and fourth videos, it drifts at frames 573 and 61, respectively. In the second video, it succeeds in tracking the object, but only some object parts (the propellers). With the procedures described in 2.3 and 2.7, KLT and LBP fail on all of the videos. SURF succeeds in tracking the object in the second video. Nevertheless, it immediately fails to detect the object's interest points in the first and fourth videos. It succeeds in tracking the object in the third video for some frames, but fails when the object becomes too small. We managed to obtain the simple source code of Beyond Semi-supervised Online Boosting tracking, but not the full version, and tried it on our test videos. This simple source code contains only Haar-like features, without the color histogram and Histogram of Oriented Gradients (HOG) features. With this simple version, it tracks the object in the second video successfully. It certainly fails on the third test video, as stated by the author in [33].
4) CAMSHIFT is insensitive to object deformation [4]; a changing object shape is not a problem for it.
5) We believe that we can improve some critical disadvantages of CAMSHIFT within the master thesis duration. We eliminated Semi-Supervised Online Boosting Tracking because it deals mostly with Pattern Recognition, while our main field is image analysis and processing with a specialization in color features (Color in Informatics and Media Technology). We believe that this latter method could be improved to meet our requirements, but the time needed to do so is not feasible within the master thesis duration.
6) CAMSHIFT is closely related to Mean-Shift. However, since CAMSHIFT is an improved version of Mean-Shift that adapts to color distribution changes over time, we believe that CAMSHIFT is a better starting point than Mean-Shift.
2.10 CAMSHIFT/Mean-Shift Improvement in Literatures
Before starting to improve CAMSHIFT, we studied the literature to learn what researchers have done to improve CAMSHIFT/Mean-Shift. After that, we specify which CAMSHIFT improvements are made in this thesis.
2.10.1 Mean-Shift tracking combined with texture histogram
Ning et al. [8] proposed a joint color-texture histogram to represent an object and applied it within the mean-shift framework. The purpose is to improve tracking accuracy and efficiency by complementing conventional color histogram features with texture features, in this case the local binary pattern (LBP). The idea is to improve the object model by describing every object pixel not only by its color information but also by its texture value. Ning et al. [8] combine mean-shift tracking with LBP because LBP is fast to compute and rotation invariant. They claim that the proposed method performs much better than the original color-based method, with fewer iterations, especially when tracking objects whose color appearance is similar to the background.
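The idea of a joint color-texture histogram can be sketched as follows: each pixel votes for the pair (color bin, texture code), so two pixels with the same color but different texture fall into different bins. The bin counts and the flat indexing `color * n_texture + texture` are illustrative assumptions, not the exact binning of [8].

```python
def joint_color_texture_histogram(color_bins, texture_codes, n_color, n_texture):
    """Normalized joint histogram over (color bin, texture code) pairs.

    color_bins[i] in 0..n_color-1 and texture_codes[i] in 0..n_texture-1
    describe pixel i of the object region."""
    hist = [0] * (n_color * n_texture)
    for cb, tb in zip(color_bins, texture_codes):
        hist[cb * n_texture + tb] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]
```

Such a histogram can be back-projected exactly like a hue histogram, with each pixel looked up by its joint (color, texture) index.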
We have described LBP in 2.7. We try LBP once more, with the implementation code in [30], in the following way:
1) We use the code to compute the LBP image of our test video (Figure 2.11).
2) We then calculate the LBP texture histogram using RIV-LBP [23].
3) We apply back projection to obtain a probability image based on texture.
Unfortunately, the result is not what we expected. Back projection should give high intensity (probability) to the pixels whose characteristics are similar to the model histogram, but the result seems to do the opposite (Figure 2.11). We therefore decided not to use texture to improve CAMSHIFT.
Figure 2.11 Comparison of the LBP image and its back projection image. (a) LBP image (b) Back projection image; the intensity of the small toy is low.
2.10.2 CAMSHIFT and Mean-Shift combined with interest points
Another way to improve CAMSHIFT is to combine it with interest point features. Interest point features are well known for their invariance to illumination, rotation and scaling. This advantage is very useful to compensate for the weakness of the color histogram in CAMSHIFT, which is sensitive to illumination changes.
One implementation of this approach is by Ganoun et al. [10]. The aim of their research is to widen the field of application of CAMSHIFT so that it can be applied to gray-level image sequences. They do so by improving the object model, adding feature point information to the color histogram.
In principle, they estimate the displacement of the search window from the displacement of matched interest points, and then use CAMSHIFT to determine the final object position.
The method can be summarized by the following steps [10]:
1) Calculation of the object model using the color histogram and feature points.
2) Calculation of the temporary object displacement by matching feature points between image I_t of the sequence at instant t and image I_t+1 at instant t+1.
3) Determination of a reduced search window centered on the point C_temp calculated in step 2.
4) Calculation of the probability image in the search window.
5) Application of the Mean-Shift algorithm to determine the new object center.
6) Update of the object model.
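Steps 2 and 3 can be sketched as follows. We assume that the temporary displacement is the mean displacement of the matched point pairs (a median would be more robust to wrong matches) and that the reduced window is a rectangle with user-chosen half sizes around C_temp; [10] may define both differently, and the helper names are hypothetical.

```python
def temporary_center(prev_center, matches):
    """Shift the previous object center by the mean displacement of the
    matched feature points. matches: list of ((x_t, y_t), (x_t1, y_t1))."""
    if not matches:
        return prev_center  # no matches: keep the previous center
    dx = sum(q[0] - p[0] for p, q in matches) / len(matches)
    dy = sum(q[1] - p[1] for p, q in matches) / len(matches)
    return prev_center[0] + dx, prev_center[1] + dy


def reduced_window(center, half_w, half_h, frame_w, frame_h):
    """Search window of size (2*half_w) x (2*half_h) centered on C_temp,
    clipped to the frame and returned as (x0, y0, x1, y1)."""
    cx, cy = center
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(frame_w, int(cx + half_w)), min(frame_h, int(cy + half_h)))
```

The probability image and the Mean-Shift iterations of steps 4 and 5 are then computed only inside this reduced window.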
Another implementation of this approach is by Qiu Xuena et al. [9], who use SIFT as the interest point descriptor, combined with spatial features, to create the probability distribution of the tracked object. The spatial features increase robustness when the tracker deals with occlusion. They claim that their method can handle changes in object scale, orientation, view and illumination, and that it can also cope with camera movement.
SIFT was added to increase the robustness of CAMSHIFT under occlusion and when the object has a color similar to the background. Meanwhile, the color feature helps SIFT segment the target, and the spatial feature handles the occluded situation. The probability distribution of the tracked object is represented by a linear weighted combination of the kernel functions of these three features.
The entire algorithm can be summarized as follows [9]:
1) Define a rectangle on the region of interest in the first frame.
2) Compute the color histogram of this region and, at the same time, extract SIFT features within it.
3) In the second frame, center the region of interest on the previous location, with a size of one quarter of the frame. Within this region, let each pixel be the center of a sub-window of the same size as the target region.
4) Calculate the probability density for every sub-window.
5) Determine the final tracking window.
Figure 2.12 SURF and CAMSHIFT 1. (a) Yellow trunk is tracked by CAMSHIFT in frame 33
(b) Air plane is successfully tracked by SURF in frame 300
Figure 2.13 SURF and CAMSHIFT 2. (a) Small scaling toy is successfully tracked by SURF at frame 2 (b) SURF sometimes match
wrong points (c) SURF fails completely when the object skews at frame 360
Since SURF is a publicly available interest point detector and descriptor that is widely used and performs very well, we try SURF once more with the purpose of improving CAMSHIFT. We use the SURF implementation in OpenCV 2.0 [24] and compare it with the previous experiment in 2.4, as follows:
1) Draw a rectangle that covers the object in the first frame.
2) Calculate the classic CAMSHIFT color histogram of the object inside the rectangle and store it as the object's color model.
3) Use SURF inside the rectangle to detect the object's interest points and store them as the object's interest point model.
4) In each frame, use both models to track the object.
The result is more or less the same as in 2.4: good in some cases but not in others. In the first test video (yellow trunk), SURF simply fails to track the interest points, while CAMSHIFT tracks the object almost perfectly. In our second test video (Figure 2.12), SURF helps considerably by giving a fairly precise object rectangle, while CAMSHIFT fails to cover the whole object and only gives an ellipse covering the airplane's two propellers. So for these videos, the two methods complement each other well.
In the third video (Figure 2.13), CAMSHIFT succeeds for 275 frames but then drifts and tracks the background, which has a color similar to the object. SURF manages to track the object for 105 frames, but when the object becomes too small or skewed, SURF fails: the SURF detector can no longer recognize the object, even if we update the interest point model by taking the last successfully tracked object.
Based on these results, we decided not to choose interest points to improve CAMSHIFT.
2.10.3 CAMSHIFT improvement using new HSV model
In [4], G. Tian et al. propose an improved one-dimensional color histogram model combining H, S and V for CAMSHIFT object tracking. The purpose is to improve CAMSHIFT tracking accuracy when the color distributions of the object and the background are similar, or when the background is complex. This method is based on the Munsell 3D color coordinate system, which has been confirmed to be suitable for the human visual system [4].
Based on the optical theory that each color has a corresponding wavelength, they quantize H, S and V into different ranges. In summary, the process is:
1) Divide the color scope: based on the human visual ability to distinguish colors, they divide the H color space into 8 parts, the S color space into 3 parts and the V color space into 3 parts.
2) Quantize the values of H, S and V with different intervals, based on human subjective color perception in different color ranges.
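3) Build a combined one-dimensional feature vector from the 3 color components:
$$G = H Q_S Q_V + S Q_V + V$$
where Q_S and Q_V are the weights of the S and V components. They choose Q_S = 3 and Q_V = 3, which equal the numbers of quantization levels of S and V, so every quantized (H, S, V) combination maps to a distinct value of G. Therefore:
$$G = 9H + 3S + V$$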
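The combined index G can be computed with the following sketch. It assumes 8-bit OpenCV-style ranges (H in 0-179, S and V in 0-255) and uses uniform quantization as a stand-in for the non-uniform intervals of [4], which are not reproduced here; the function names are our own.

```python
def quantize(value, max_value, levels):
    """Uniform quantization of value in [0, max_value] into `levels` bins
    (a stand-in for the perceptual, non-uniform intervals of [4])."""
    return min(int(value * levels / max_value), levels - 1)


def combined_hsv_index(h, s, v, h_max=180, sv_max=255):
    """G = H*Qs*Qv + S*Qv + V with 8 hue levels and Qs = Qv = 3."""
    qs = qv = 3
    H = quantize(h, h_max, 8)
    S = quantize(s, sv_max, qs)
    V = quantize(v, sv_max, qv)
    return H * qs * qv + S * qv + V  # = 9H + 3S + V, in 0..71
```

The resulting 72-bin one-dimensional histogram can replace the hue histogram in CAMSHIFT without changing the rest of the algorithm.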
They show some quite encouraging results. We have not tried this method, but we put it on our list of algorithms for improving CAMSHIFT.
2.10.4 CAMSHIFT improvement using hue-distance and saturation features
J. A. Corrales et al. [3] use a different approach. Their purpose is to improve CAMSHIFT when tracking objects in dynamic backgrounds with similar hue values, using the hue and saturation color components but modifying the hue component: instead of the hue, they use the hue-distance.
The hue-distance is a function that represents each hue value H as a distance from a reference hue value H_ref. The following distance function is used instead of the hue component:
(12)
The hue reference H_ref is the hue value with the highest frequency in the histogram h(x) obtained in the histogram calculation of the first step of the CAMSHIFT algorithm:
$$H_{ref} = \arg\max_x h(x) \tag{13}$$
Figure 2.14 CAMSHIFT with new HSV model. (a) The result image using CAMSHIFT (b) The back projection image using CAMSHIFT (c) The
result image using CAMSHIFT with new HSV model (d) The back projection image using CAMSHIFT with new HSV model[4]
First, a histogram of the hue component of the standard HSV model is calculated in order to obtain the hue reference H_ref using (13). Afterwards, the histogram is recomputed using the hue distance in (12) and used as the target distribution. All subsequent frames are transformed to the HSV model, but with the hue distance in place of the hue component.
Using the hue component alone to obtain the probability distribution image, as in classic CAMSHIFT, is not sufficient when background elements have hue values similar to the target object. In this case, the CAMSHIFT algorithm may wrongly include background elements in the search window.
Two histograms are calculated: one for the hue distance, h_d(H,H_ref)(x), and another for the saturation component, h_S(x). In step 3 of the algorithm, the histogram h_d(H,H_ref)(x) is used to obtain the back projection B_d(H,H_ref)(x, y) of the hue-distance channel, and the histogram h_S(x) is used to obtain the back projection B_S(x, y) of the saturation channel. These two back projections are combined according to the following equations to create a final probability distribution image B(x, y), which the Mean-Shift algorithm uses to find the center of mass:
(14)
(15)
Equation (15) removes from the hue-distance back projection those pixels whose
saturation channel does not match the saturation values of the tracked object. Most of
the background pixels, whose hue values are similar to the tracked object, can be
removed because their saturation values are different. Therefore, only the pixels with
similar hue and saturation to the object are considered.
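A minimal sketch of two pieces of this method: selecting H_ref as the most frequent hue bin, as in (13), and removing from the hue-distance back projection the pixels whose saturation does not match the object. Because equations (12), (14) and (15) are not reproduced here, the sketch assumes that a saturation back-projection value of 0 means "no match", and it does not compute the hue distance itself; the function names are hypothetical.

```python
def reference_hue(hue_hist):
    """H_ref: index of the hue bin with the highest frequency."""
    return max(range(len(hue_hist)), key=hue_hist.__getitem__)


def combine_back_projections(bp_hue_distance, bp_saturation):
    """Keep the hue-distance back projection only where the saturation back
    projection is non-zero, i.e. where the pixel's saturation also occurs
    in the object model (assumed reading of the combination step)."""
    return [[d if s > 0 else 0 for d, s in zip(drow, srow)]
            for drow, srow in zip(bp_hue_distance, bp_saturation)]
```

A background pixel with the object's hue but a different saturation is thus suppressed even though its hue-distance probability is high.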
We have implemented this method following the procedure above, with one modification: the V color component is also included, because we believe that value is also important to take into account in the model. The result is encouraging (Figure 2.15). This method has a purpose similar to that of the new HSV model method in 2.10.3.
2.10.5 CAMSHIFT with improvement of object localization
These methods try to improve the CAMSHIFT object model by improving object localization.
Foreground extraction
• This method tries to increase tracking robustness by giving a very high positive weight to the center of the rectangle and a low negative weight to the parts beyond a given range [12]. The range is circular. An illustration is given in Figure 2.16. The formula is
(16)
where x, y is the pixel coordinate and hi is any value chosen by the user to filter out background color clusters.
• This is still problematic because, in real applications, many objects are not circular. For example, if the object is elongated (Figure 2.17), some background information is taken into the object model, or some object parts are left out of it.
Weighted and Ratio Histogram
This method is in some ways similar to foreground extraction. For pixels inside the object search window, it gives a higher weight to the pixels near the center and 0 to the pixels far from the center in the model histogram calculation [7]. The authors also use a ratio histogram, which gives lower weights to the pixels outside the object search window.
We decided not to choose these methods, since they cannot localize the object effectively. We propose another method, described in 3.1.
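A sketch of a center-weighted model histogram of the kind described for the weighted histogram method. The weight profile 1 - r² (an Epanechnikov-like profile, where r is the normalized distance from the window center) is our assumption; [7] may use a different kernel, and the ratio histogram is not shown. The function name is hypothetical.

```python
def weighted_histogram(values, width, height, bins, value_max):
    """Normalized model histogram in which each pixel votes with a weight
    falling from 1 at the window center to 0 at the window border.
    values is a 2D list indexed values[y][x] covering the search window."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    hist = [0.0] * bins
    for y in range(height):
        for x in range(width):
            # normalized squared distance: 0 at the center, about 1 at the edge
            r2 = ((x - cx) / (width / 2)) ** 2 + ((y - cy) / (height / 2)) ** 2
            weight = max(0.0, 1.0 - r2)
            b = min(int(values[y][x] * bins / value_max), bins - 1)
            hist[b] += weight
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

Colors near the window center therefore dominate the model, while colors at the corners, which are more likely to be background, contribute little or nothing.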
2.10.6 CAMSHIFT improvement using adaptive background (ABCShift)
The aim of this method is to track robustly in two situations where CAMSHIFT fails: first, when the scenery changes due to camera motion, and second, when the tracked object moves across regions of the background with which it shares significant colors [11]. It improves the tracker by modeling the background with a Bayesian probability model.
In summary, the algorithm is[11]:
1) Identify an object region in the first image and train the object model.
2) Center the search window on the estimated object centroid and resize it to
have an area r times greater than the estimated object size.
The centroid position can be calculated using:
(17)
where i is index of all pixels in the search window and ci is the color of pixel i.
Then the position of the centroid is
(18)
where (xi, yi) is the position of pixel i in the search window. At the end of the
iteration the center of the search window is shifted to the new position (xc, yc)
and the procedure is repeated until two consecutive center positions are
within ε of each other.
3) Learn the color distribution, P(C), by building a histogram of the colors of all
pixels within the search window.
Object Tracking: State of The Art and CAMSHIFT Improvement Using Multi-dominant Color Tracking
29
Figure 2.15 CAMSHIFT improvement with hue-distance saturation features.
(a) Tracked image with selected object in rectangle (b) hue distance back projection (c) Saturation back projection (d) Hue distance – Saturation combination back projection[]
Figure 2.16 Foreground extraction.
FEM assigns high positive values to pixels near the center and negative values to pixels toward the edges of the object region [12].
Figure 2.17 Sample of an elongated object.
The foreground extraction method is not suitable for elongated objects.
4) Use Bayes’ law to assign object probabilities, P(O|C), to every pixel in the
search window, creating a 2D distribution of object location:
$$P(O \mid C) = \frac{P(C \mid O)\, P(O)}{P(C)} \qquad (19)$$
where P(O|C) denotes the probability that the pixel represents the tracked
object given its color, P(C|O) is the color model learned for the tracked object
and P(O) and P(C) are the prior probabilities that the pixel represents object
and has the color C respectively.
5) Estimate the new object position as the centroid of this distribution and
estimate the new object size (in pixels) as the sum of all pixel probabilities
within the search window.
6) Repeat steps 2-5 until the object position estimate converges.
7) Return to step 2 for the next image frame.
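The core of one ABCShift iteration (the window update of step 2 and steps 3 to 5) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: pixels are given as quantized color indices in a 2-D list, the object model P(C|O) is a dictionary from color index to probability, P(O) is a fixed scalar prior, and the search window is assumed to be square. All function names are hypothetical.

```python
import math
from collections import Counter


def bayes_object_probability(window_colors, object_model, prior_object):
    """Step 4, Eq. (19): P(O|C) = P(C|O) P(O) / P(C) for every window pixel.

    window_colors: 2-D list of quantized color indices inside the search window.
    object_model:  dict color index -> P(C|O), learned from the object in step 1.
    prior_object:  scalar prior P(O) that a pixel belongs to the object.
    """
    flat = [c for row in window_colors for c in row]
    counts = Counter(flat)  # step 3: color histogram P(C) of the search window
    n = len(flat)
    return [[object_model.get(c, 0.0) * prior_object / (counts[c] / n) for c in row]
            for row in window_colors]


def centroid_and_size(probs, x0=0, y0=0):
    """Step 5, Eqs. (17)-(18): object size M00 and centroid (xc, yc).

    (x0, y0) is the image position of the window's top-left pixel.
    """
    m00 = m10 = m01 = 0.0
    for j, row in enumerate(probs):
        for i, p in enumerate(row):
            m00 += p
            m10 += (x0 + i) * p
            m01 += (y0 + j) * p
    if m00 == 0.0:
        return None  # no object evidence inside the window
    return m10 / m00, m01 / m00, m00


def search_window(xc, yc, size, r):
    """Step 2: window centered on (xc, yc) whose area is r times the object size."""
    side = math.sqrt(r * size)  # square window: an assumption for illustration
    return xc - side / 2, yc - side / 2, side, side
```

Calling these three functions repeatedly until the centroid moves by less than ε corresponds to steps 2 to 6 for one frame.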
The authors show videos confirming their claim that ABCShift gives good results
even when the object passes through a background of similar color.
They applied this method in robotics. The results are encouraging, but multiple
object tracking is not addressed.
2.10.7 CAMSHIFT improvement by background subtraction
This method tries to improve the robustness of CAMSHIFT tracking against complex backgrounds
by modeling the background and subtracting it from every frame.
Background subtraction has been widely used in object tracking when the background
is static. The basic principle is to build a model of the background and subtract it from
each frame of the sequence. What remains after subtraction is, more or less, the moving
object inside the frame. This method assumes that the object is moving;
otherwise, the object will be identified as background.
Common background subtraction methods use the average, the median, a codebook or a
Gaussian mixture model, and implementations of these methods are publicly available.
Average background subtraction code is available in [31], the codebook method is available in
OpenCV 1.0 [24] or higher, and Gaussian mixture model code is available in [32]. For
median background subtraction, we developed our own code by modifying the average background
subtraction code.
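To make the average and median variants concrete, here is a minimal pure-Python sketch on grayscale frames stored as 2-D lists. It is not the code of [31]; the function names and the fixed foreground threshold are illustrative assumptions. Each background pixel is the mean or the median of that pixel over a set of training frames, and a pixel is labeled foreground when it differs from the background by more than the threshold.

```python
from statistics import median


def background_model(frames, method="average"):
    """Per-pixel background from equally sized grayscale frames (2-D lists)."""
    h, w = len(frames[0]), len(frames[0][0])
    model = []
    for y in range(h):
        row = []
        for x in range(w):
            samples = [f[y][x] for f in frames]
            if method == "average":
                row.append(sum(samples) / len(samples))
            elif method == "median":
                row.append(median(samples))
            else:
                raise ValueError("method must be 'average' or 'median'")
        model.append(row)
    return model


def foreground_mask(frame, model, threshold):
    """1 where |frame - background| exceeds the threshold, else 0."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, model)]
```

With three training frames in which a bright object passes over one pixel (values 10, 12, 200), the median background stays at 12 while the average is pulled up to 74, which illustrates why the median is less sensitive to objects moving through the training frames.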
We tried three of these methods (average, median and Gaussian mixture model) combined with CAMSHIFT; the
results at frame 30 of the scaling small toy test video are shown in Figure 2.18.
Figure 2.18 Background subtraction in static background.
First row: result image; second row: foreground image. First column: average; second column: median; third column: Gaussian mixture model.
In conclusion, background subtraction helps CAMSHIFT considerably in videos with a static
background. Moreover, with the help of background subtraction, we can
extract the object contour. Unfortunately, background subtraction does not help in videos
with a dynamic background, because movement in the background is detected
as foreground. The result of using background subtraction at frame 30 of the airplane test video
can be seen in Figure 2.19.
We tried this method because the previous improvements had difficulties in
solving the challenging tracking problem in the third video (Figure 2.19). We realized that
this video has a static background. This information is very useful because
background subtraction usually works very well in this condition. We then
confirmed this by trying background subtraction, which did in fact help CAMSHIFT to
track the object in the third video. This suggests that background subtraction could be used if we
had a technique to detect whether the background is static or dynamic: when the
background is detected as static, background subtraction is used; otherwise, it is not.
However, this would be a separate research topic requiring considerable time. We do not
continue with this method for improving CAMSHIFT because hypervideo cannot be limited to
videos with a static background, and background subtraction is of limited use in videos
with a dynamic background.
2.10.8 The CAMSHIFT improvement method that we choose
After studying, experimenting and observing, we found that the previous improvements
still have some problems. For improving CAMSHIFT using LBP, we
tried our simple procedure as stated in 2.10.1. Unfortunately, the result fell
short of our expectations, so we decided not to use texture to improve CAMSHIFT.
Combining interest points with CAMSHIFT also gave unsatisfying results in our case.
We do not choose the CAMSHIFT improvement using the new HSV model because of the
rigidity of its bin ranges. The CAMSHIFT object model improvements with foreground extraction and
weighted histogram either fail to exclude background information from the model or fail to
include all object information in the model. ABCShift does improve
CAMSHIFT significantly, especially when the object's colors are similar
to the background's colors. Nevertheless, this method is closer to pattern recognition,
which is not our main field of interest, and we want to try another approach. We do
not use background subtraction because it is effective only for videos with a static
background. The only method that we adopt is the hue-distance histogram
explained in 2.10.4, which we modify slightly by using
not only hue-distance and saturation histograms but also a value histogram.
2.10.9 The more specific aim of the master thesis
Based on the discussion above, we decide to address some critical disadvantages of
CAMSHIFT that are either not fully solved by other researchers or do not match the aim of this
thesis. We therefore define the specific aims of this master thesis as follows:
1) Improve the robustness of classic CAMSHIFT for multihued object tracking.
2) Improve the robustness of CAMSHIFT when the object's colors
are similar to the background's colors.
3) Improve the capability of CAMSHIFT to perform multiple object tracking. We did not find
any literature describing an improvement of CAMSHIFT that enables multiple
object tracking.
4) Speed and illuminant change are not our main concern. In [], Colantoni et al.
stated that, using a Graphics Processing Unit (GPU), an image processing task
(e.g. object tracking) can be made up to 10 times faster. As our main field is
image analysis and processing, we focus our work on improving robustness,
while speed will be addressed in our future work or by GPU specialists.
To achieve these aims, we adopt the hue-distance histogram idea as one way to improve
CAMSHIFT when the object's colors are similar to the background's
colors. For object localization, we propose another method, which is described in
3.1. To increase tracking robustness, we also propose a method, which
is explained in 3.2 and 3.6. Multiple object tracking should be straightforward once the
first two problems are solved.
Figure 2.19 Background subtraction in dynamic background.
First row: result image; second row: foreground image. First column: average; second column: median; third column: Gaussian mixture model.
3 Proposed Method
In this thesis, we propose several ways to improve CAMSHIFT. The key methods are
multi-dominant color object localization and tracking the dominant color object parts
separately. This section describes the proposed method; implementation details such as
parameter values are described in the Implementations section (Section 4).
3.1 Object Localization
Drawing a rectangle around the object is the most common way to localize it: the user draws a
rectangle over the object. This is simple and easy to use; however, problems may occur
because the object is usually not rectangular, so some background information is
included in the object model. When this happens, the tracker often drifts and is
not robust.
Another method is to mark points on the object boundary; the selection is completed by
connecting consecutive points with lines. This method can give the exact region of
the object and is suitable for objects with simple shapes. However, it is not
practical for objects with complex or irregular shapes (Figure
3.1), and even for an object with a simple circular shape it is not very practical.
Figure 3.1 A sample of complex shape object
Some recent methods [7][12] do not localize the object precisely: they fail to
capture the exact object region. This motivates us to create a more accurate
object localization method.
3.1.1 Preprocessing
We propose object localization by combining mean-shift segmentation and region
growing. Mean-shift segmentation is a preprocessing step before the object parts are selected.
It segments each part of the object and makes it homogeneous enough to be selected
easily. Mean-shift segmentation smooths the image while preserving
discontinuities [2]. This only holds up to a point: if the object is in
front of a background of very similar color during the localization phase, the object will be
merged with the background. We need this preprocessing step because region
growing alone is not enough, even when the color tolerance is increased. Figure 3.2
gives a simple illustration of this case. Note that this preprocessing is needed only for
the frame used in the object localization step.
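A simplified, single-level version of mean-shift filtering on a grayscale image illustrates how smoothing can preserve discontinuities. This is a sketch under our own assumptions, not OpenCV's pyramid implementation: the spatial window is a square of half-size hs, the range window is an absolute intensity difference of at most hr, and the function name and the iteration limits are hypothetical.

```python
def mean_shift_filter(img, hs, hr, max_iter=5, eps=1.0):
    """Simplified single-level mean-shift filtering of a grayscale image.

    img: 2-D list of intensities. For each pixel, the joint (x, y, value) point is
    moved repeatedly to the mean of the pixels lying within hs spatially and within
    hr in value; the converged value replaces the pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cx, cy, cv = float(x), float(y), float(img[y][x])
            for _ in range(max_iter):
                sx = sy = sv = 0.0
                n = 0
                for j in range(max(0, int(cy - hs)), min(h, int(cy + hs) + 1)):
                    for i in range(max(0, int(cx - hs)), min(w, int(cx + hs) + 1)):
                        v = img[j][i]
                        if abs(v - cv) <= hr:  # range (color) kernel
                            sx += i
                            sy += j
                            sv += v
                            n += 1
                if n == 0:
                    break
                nx, ny, nv = sx / n, sy / n, sv / n
                shift = abs(nx - cx) + abs(ny - cy) + abs(nv - cv)
                cx, cy, cv = nx, ny, nv
                if shift < eps:  # converged
                    break
            out[y][x] = cv
    return out
```

On the one-row image [10, 12, 200, 202] with hs = 1 and hr = 20, the two dark pixels converge to 11 and the two bright pixels to 201: each side is flattened, but the step between them survives because pixels across the edge fall outside the range window.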
(a) (b)
Figure 3.2 Object Localization using only region growing (a) before selection (b) after selection
(a) (b)
Figure 3.3 More precise object localization with only a single click. (a) before selection (b) after selection
An alternative preprocessing step is K-means segmentation: one selects some colors
as means, and every pixel is classified into the nearest of those means. However, this method is
not practical because we have to choose not only the object colors but also the
background colors. If the background contains many colors, which is very common in everyday
videos as well as in hypervideo, practicality becomes an issue; examples are the third and fourth
test videos.
To be more adaptive to the user's needs, we made the segmentation tunable through a text
file. The user can change the segmentation parameter values so that every object part
can be selected. Figure 3.4 shows that we can tune the preprocessing segmentation
parameters as well as the color tolerance for region growing, the number of bins, etc.
Figure 3.4 Text file configuration to tune the parameters
3.1.2 Image color transformation
Next, we transform the image into the HSV color space. We choose HSV because
this model is based on human color perception and is related to the Munsell
three-dimensional color coordinate system, which has been confirmed to be suitable
for human visual perception [4].
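In OpenCV's 8-bit HSV representation, hue is halved to fit in a byte (0 to 179) while saturation and value are scaled to 0 to 255, which is why a hue limit of 180 appears in the parameter tables of Section 5. A minimal sketch of this mapping using Python's standard colorsys module (the function name is ours):

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to HSV in OpenCV's 8-bit convention (H: 0-179, S, V: 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)  # all in [0, 1]
    return round(h * 180) % 180, round(s * 255), round(v * 255)
```

For example, pure yellow (255, 255, 0) maps to hue 30 and pure blue to hue 120, i.e. half of their angles of 60 and 240 degrees.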
3.1.3 Object Selection
The next step is object selection. After mean-shift segmentation, the user chooses the
object by clicking each object part. The click positions become a set of “seed” points
with specific properties. From these seed points, the region grows by appending
neighboring pixels whose properties are similar to those of the seeds [18]. We also allow some
tolerance so that the seeds can grow further until they reach the edges of the
object parts. The selected object parts are considered to contain the dominant
colors of the object, which will be tracked in the tracking phase.
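Seeded region growing of this kind can be sketched as a breadth-first flood fill. In this minimal sketch, which is not OpenCV's flood fill implementation, a 4-connected neighbor joins the region when each of its H, S and V values lies within a per-channel tolerance of the seed pixel; the default tolerances (20, 40, 20) are the values reported in Section 4.1, and hue wrap-around is ignored for simplicity. The function name is hypothetical.

```python
from collections import deque


def region_grow(hsv, seed, tol=(20, 40, 20)):
    """Grow a region from seed (x, y) over a 2-D list of (h, s, v) tuples.

    A 4-connected neighbour joins the region if each channel differs from the
    SEED pixel by at most the corresponding tolerance. Returns a 0/1 mask.
    """
    h, w = len(hsv), len(hsv[0])
    sx, sy = seed
    ref = hsv[sy][sx]
    mask = [[0] * w for _ in range(h)]
    mask[sy][sx] = 1
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not mask[ny][nx]:
                if all(abs(a - b) <= t for a, b, t in zip(hsv[ny][nx], ref, tol)):
                    mask[ny][nx] = 1
                    queue.append((nx, ny))
    return mask
```

In a 2 x 3 test image whose bottom-right pixel (32, 200, 200) is close in color to the seed (30, 200, 200) but separated from it by dissimilar pixels, that pixel is not selected: connectivity, not color alone, keeps similarly colored but separate areas out of the object part.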
3.1.4 Minimum and maximum values storing
Each time an object part is selected, we store the minimum and maximum values of each
color component of that part, that is, hmin, hmax, smin, smax, vmin and
vmax. These values are needed to make the color mask described in Section
3.3.
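Given the mask produced by region growing, the stored extremes are simply per-channel minima and maxima over the selected pixels. A short sketch, with pixels as (h, s, v) tuples in a 2-D list (the function name is hypothetical):

```python
def channel_extremes(hsv, mask):
    """Return (hmin, hmax, smin, smax, vmin, vmax) over pixels whose mask value is 1."""
    pixels = [p for prow, mrow in zip(hsv, mask) for p, m in zip(prow, mrow) if m]
    if not pixels:
        raise ValueError("empty object part")
    hs, ss, vs = zip(*pixels)  # split into per-channel tuples
    return min(hs), max(hs), min(ss), max(ss), min(vs), max(vs)
```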
One advantage of the proposed method is that no surrounding background information is
added to the model. This makes the tracker more robust and avoids the drifting problem.
Another advantage is that the method is more practical for objects with few parts. For
example, the yellow trunk has only one hue, so clicking any part of it
selects it entirely (Figure 3.3). Even if the shape of the
object is complex and irregular, there is no problem as long as its color is
homogeneous.
3.2 Object Modeling
The next step is object modeling. We use color histograms to model the object. We try
to improve CAMSHIFT using only color information of the object: no background
information is used in the process, only the object's information is included.
We use the HSV color space. In the original implementation of CAMSHIFT, Bradski
used only the hue component. This leads to problems when the object
passes through a background with a hue similar to the object's. Some other methods
use only the hue and saturation components. This may be sufficient in some cases, but
in others hue and saturation are not enough to distinguish the
object from its background. The value component often discriminates well
between the object and its background. That is why we use all three components and
build a histogram for each of them.
Even these three components are apparently not always enough to distinguish the object
from its background. Tian et al. [4] proposed a new HSV color model to describe
the object and claim that it improves CAMSHIFT tracking. Corrales et al. [3]
proposed using a hue-distance histogram instead of a hue histogram, as we have
described in 2.10.4, combined with the saturation component.
In this thesis, we combine these ideas and use hue-distance, saturation and
value histograms to model the object, which gives better discrimination between the object and its background.
We quantize each color histogram to a small number of bins, and the number
of bins for each component is the next important factor. Based on our experiments,
30 bins for the hue-distance component, 9 bins for saturation and 6 bins for value
give good results.
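The three-histogram model can be sketched as below. The exact definition of hue-distance is given in 2.10.4, outside this excerpt; here we assume, for illustration only, that it is the circular distance between a pixel's hue and a reference hue of the object part (for instance the hue of the clicked seed) in OpenCV hue units, so distances lie in [0, 90]. The 30/9/6 bin counts are those stated above; the function names are hypothetical.

```python
def hue_distance(h, h_ref, h_range=180):
    """Circular distance between two hues (assumed definition, OpenCV units)."""
    d = abs(h - h_ref) % h_range
    return min(d, h_range - d)


def histogram(values, bins, upper):
    """Equal-width histogram over [0, upper]; a value equal to upper goes in the last bin."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins / upper), bins - 1)] += 1
    return hist


def part_model(pixels, h_ref, bins=(30, 9, 6)):
    """Hue-distance, saturation and value histograms of one object part.

    pixels: list of (h, s, v) tuples with h in 0-179 and s, v in 0-255.
    """
    hd = [hue_distance(h, h_ref) for h, _, _ in pixels]
    sat = [s for _, s, _ in pixels]
    val = [v for _, _, v in pixels]
    return (histogram(hd, bins[0], 90),    # hue distances lie in [0, 90]
            histogram(sat, bins[1], 256),
            histogram(val, bins[2], 256))
```

Measuring distance to a reference hue, rather than the hue itself, treats hues 175 and 5 as close (distance 10), which a plain hue histogram would place in opposite bins.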
3.3 Making Color Mask
A color mask is made for each object part, based on the minimum and
maximum values taken in the object localization step (3.1.4). Each pixel in every
frame is evaluated against those minimum and maximum values: if a pixel lies
inside the minimum to maximum range, it is given the value 1;
otherwise, it is given the value 0.
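The mask is a per-pixel range test on all three components. A minimal sketch, assuming inclusive bounds and the (hmin, hmax, smin, smax, vmin, vmax) tuple stored in 3.1.4 (the function name is hypothetical):

```python
def color_mask(hsv, extremes):
    """1 where a pixel's H, S and V all lie inside the stored ranges (inclusive), else 0."""
    hmin, hmax, smin, smax, vmin, vmax = extremes
    return [[1 if hmin <= h <= hmax and smin <= s <= smax and vmin <= v <= vmax else 0
             for h, s, v in row]
            for row in hsv]
```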
(a) (b) (c)
Figure 3.5 Color mask illustration
(a) Original frame (b) Selected object (airplane body) (c) Color mask
3.4 Segmentation
For the subsequent frames, mean-shift segmentation is carried out to make the object easier to
differentiate from the background. With small spatial and color ranges,
mean-shift smooths the image and removes noise while preserving
discontinuities. Mean-shift also merges some background areas of similar color into one
region, which hopefully has color values outside the object histogram. Figure
3.6 shows some yellow noise in the background whose color is close to the object's color.
This noise is merged with neighboring pixels and assigned color values
that differ from the object's.
(a) (b)
Figure 3.6 Segmentation for smoothing and noise removal of third test video (a) Original frame (b) Segmented frame
3.5 Histogram Back Projection
Histogram back projection evaluates each pixel in the frame sequence against
the histogram model built in the object modeling phase. Before
back projection, we apply the color mask to the frame to pass only pixels that satisfy the
object color ranges. We then back-project the hue-distance, saturation and
value histograms. Each back projection yields an image containing the probability
of each pixel in the frame according to that histogram. We then combine all back
projection images into a single back projection image using the AND operator. This single
back projection image is the final input to the tracker (Figure 3.7).
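The masking, back projection and AND combination can be sketched as follows. The assumptions are ours: each histogram is scaled so that its largest bin maps to 255, pixel values are mapped to bins with the same equal-width rule as the model, and the AND operator is read as a pixel-wise bitwise AND of the 8-bit back projection images (as a bitwise AND in OpenCV would compute); a pixel-wise minimum is a common alternative. The mask is applied to the back projection output rather than to the raw channels, so that masked pixels get probability 0 even when the value 0 would fall into a populated bin (distance 0 is the most object-like hue-distance). Function names are hypothetical.

```python
import operator
from functools import reduce


def back_project(channel, hist, bins, upper):
    """Replace each value by its histogram bin, scaled so the largest bin is 255."""
    peak = max(hist) or 1
    scaled = [round(255 * c / peak) for c in hist]
    return [[scaled[min(int(v * bins / upper), bins - 1)] for v in row] for row in channel]


def combine_and(*images):
    """Pixel-wise bitwise AND of several 8-bit images of equal size."""
    return [[reduce(operator.and_, px) for px in zip(*rows)] for rows in zip(*images)]


def apply_mask(img, mask):
    """Keep pixels where the color mask is 1; set the rest to 0."""
    return [[p if m else 0 for p, m in zip(prow, mrow)] for prow, mrow in zip(img, mask)]
```

For one object part, the tracker input would then be apply_mask(combine_and(bp_hd, bp_s, bp_v), mask), where the three back projections come from the hue-distance, saturation and value histograms.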
Figure 3.7 Histogram Back Projection of first test video
3.6 Tracking
Good localization is the first step towards good tracking, but it does not guarantee
100% accurate tracking. If we treat all the object parts as a single part and compute one
HSV color histogram for it, the object will be difficult to track when it passes through a
background whose colors fall within the object's color range.
One good way to improve tracking robustness is “divide and conquer”:
we split the tracking problem into smaller problems that are easier to solve.
To do so, we propose to track the object parts separately. The object parts
represent the dominant color parts of the whole object. Each object part is modeled
using hue-distance, saturation and value histograms and then tracked. The whole object
rectangle is the maximum rectangle of all object part rectangles (Figure 3.8), and the whole
object center position is the center of this maximum rectangle. The maximum rectangle is
defined as the smallest possible rectangle that covers all the rectangles inside it [5].
We also propose a mechanism to detect when a tracker has lost its object part and to handle
the whole object rectangle accordingly. If the area of the next tracking rectangle of an object
part equals 0, that object part's tracker is declared lost. The rectangle of a lost
tracker is no longer taken into account in the whole object rectangle, which then
considers only the “surviving” tracking rectangles.
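The maximum rectangle and the lost-tracker rule can be sketched as follows, with rectangles as (x, y, width, height) tuples. The function names are hypothetical, and this is not the OpenCV routine used in the implementation.

```python
def whole_object_rect(part_rects):
    """Maximum rectangle (x, y, w, h) covering all surviving object part rectangles.

    A part whose tracking rectangle has zero area is treated as lost and ignored.
    Returns None when every part tracker is lost.
    """
    alive = [r for r in part_rects if r[2] * r[3] > 0]
    if not alive:
        return None
    x0 = min(x for x, _, _, _ in alive)
    y0 = min(y for _, y, _, _ in alive)
    x1 = max(x + w for x, _, w, _ in alive)
    y1 = max(y + h for _, y, _, h in alive)
    return x0, y0, x1 - x0, y1 - y0


def rect_center(rect):
    """Center of an (x, y, w, h) rectangle, used as the whole object position."""
    x, y, w, h = rect
    return x + w / 2, y + h / 2
```

For example, with a body rectangle (10, 10, 20, 30), a trouser rectangle (15, 35, 10, 20) and a lost shoe tracker (40, 0, 0, 0), the lost tracker is ignored and the whole object rectangle is (10, 10, 20, 45).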
Figure 3.8 Maximum rectangle illustration.
Maximum rectangle (thick red) of the body, trouser and shoe rectangles (thin blue).
The proposed method can be summarized in the following steps (Figure 3.9):
1) Apply mean-shift segmentation to the first frame so that the object parts are easier
to choose.
2) Transform the frame into HSV space.
3) Choose each object part by clicking it and perform region growing starting from the click
position.
4) For each object part (red zone, Figure 3.9):
a. Take the minimum and maximum hue, saturation and value of the
object part.
b. Calculate the hue-distance, saturation and value histograms.
c. Make a color mask by evaluating each pixel in the frame against the
minimum and maximum values taken in step 4a.
d. Apply mean-shift segmentation to the image to smooth it
and reduce noise.
e. Back-project the histograms from step 4b and
combine all back projection images.
f. Track the object part based on the combined back projection image and store
the new tracking window.
g. If the tracker is lost, leave it; otherwise, go to step 5.
5) Find the maximum rectangle of all object part rectangles.
6) Return to step 4c using the new tracking windows from step 4f.
Figure 3.9 The proposed method’s schema
4 Implementations and Experiments
This section explains the implementations and experiments in detail, describing
the implementation environment, the library used and the main parameter values.
4.1 Implementations
In the implementation phase, the OpenCV 2.0 library is used, with Microsoft Visual
Studio 2005 as the development environment.
For mean-shift segmentation, we use the mean-shift segmentation implementation in the
OpenCV library. As proposed by Bradski et al. [5], we use hs = 20, hr = 40 and
maximum level = 2, which works well for images of 640 x 480 pixels. For
images of 1280 x 720 pixels (e.g. the first test video), these parameters also give
good segmentation results.
For region growing, we use the flood fill method, which is also available in the OpenCV 2.0
library [24]. Flood fill appends each neighboring pixel based on its color
characteristics: if a neighboring pixel has color characteristics close to those of the seed
(i.e. within the tolerance ranges), the pixel is appended. We use a tolerance of 20
for the hue and value components and 40 for saturation.
As the seed grows, it marks the area with perfect white (H=255, S=255, V=255).
This white area becomes a mask used to find the extreme values of the area and to
calculate the object part histogram. The extreme values are hmin, hmax, smin, smax,
vmin and vmax, corresponding to the hue minimum, hue maximum, saturation
minimum, saturation maximum, value minimum and value maximum. These extreme
values are used to make the color mask of each object part.
In histogram calculation, we use 30 bins for the hue and hue-distance components, 9 bins
for saturation and 6 bins for value. These parameters come from experiments and
observations that gave good results. We also keep the hue histogram. After
calculating the hue-distance, saturation and value histograms, we apply
a threshold to the hue histogram. This threshold is important to retain pixels with close hues
and remove unwanted pixels with distant hues in the back projection. First, we apply a threshold
of 255 to the histogram: any bin whose value is above the threshold is counted as a
peak. The number of close hues is 70% of the number of peaks, rounded to an integer.
For example, with 10 peaks, the number of close hues is 70% x 10 = 7. This means
that only hues with a distance below 7 from the reference hue are taken into the back
projection; all other hues are discarded (Figure 4.1). If no histogram bin exceeds 255,
the hue-distance threshold is set to 1.
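The peak-counting rule above can be written in a few lines. This sketch follows the numbers given in the text (peak level 255, ratio 70%, fallback 1); the rounding mode is not specified in the text, and Python's round, which rounds halves to the nearest even integer, is our assumption. The function name is hypothetical.

```python
def hue_distance_threshold(hue_hist, peak_level=255, ratio=0.7):
    """Number of close hues kept in the back projection.

    A bin whose count exceeds peak_level is a peak; the threshold is ratio times
    the number of peaks, rounded to an integer, or 1 when there is no peak.
    """
    peaks = sum(1 for c in hue_hist if c > peak_level)
    if peaks == 0:
        return 1
    return round(ratio * peaks)
```

With the 10 peaks of Figure 4.1 this gives a threshold of 7.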
Figure 4.1 Hue histogram of the airplane body.
Number of peaks = 10, hue-distance threshold = 7
We smooth the frames by mean-shift segmentation starting from the second frame. We
use spatial range hs = 5, color range hr = 20 and maximum pyramid level = 2. With
these parameters, noise is reduced significantly while discontinuities (e.g. edges) are still
preserved.
The color mask is created by simply evaluating each pixel in the current
frame against the extreme (minimum and maximum) values taken in the
previous step. For histogram back projection, CAMSHIFT tracking and the maximum
rectangle calculation, we use the functions available in the OpenCV 2.0 library [24].
4.2 Experiments Setting
All experiments are carried out on a laptop with an AMD Turion X2 (dual-core) 1.6 GHz processor and 2 GB of RAM.
5 Results and Discussions
This section describes the experimental results, followed by a discussion of them.
5.1 Results
5.1.1 First Experiment Results
In the first video, our proposed object localization method shows its strength.
With a single click, the object is exactly selected without any surrounding background,
i.e. background that is spatially close to the object. This helps to build a robust
histogram and achieve robust tracking. The object is tracked perfectly even though
its shape and orientation change. A partial occlusion occurs in the middle of the video,
but the tracker handles it without any problem. The average frame rate is 2.4 frames per second (fps).
This shows that our object localization method works very well for a single-hue object
in front of a background of very different color.
Figure 5.1 First video result with the proposed method.
First column is the frame, the second column is the back projection image, the third column is the hue histogram with 30 bins at frame 33.
With classic CAMSHIFT, we have to configure the hue, saturation and value
parameters manually. These parameters are used to make a color mask:
every pixel that fits them is assigned 1; otherwise, it is assigned 0.
In the CAMSHIFT implementation example in [24], the hue parameters are set to
hmin=0 and hmax=180, which accepts all hues. Saturation is
set to smin=30 and smax=256, and value to vmin=10 and vmax=256 (Table
5.1).
We tuned these parameters to get the best result with classic
CAMSHIFT, adjusting the saturation and value parameters. For the first
video, we set smin=50 and vmax=150 (Table 5.2).
Extreme Value
Minimum hue (hmin) 0
Maximum hue (hmax) 180 (maximum hue in OpenCV implementation)
Minimum saturation (smin) 30
Maximum saturation (smax) 256
Minimum value (vmin) 10
Maximum value (vmax) 256
Table 5.1 Default parameters of classic CAMSHIFT in [24]
(a) (b) (c)
Figure 5.2 First video result with classic CAMSHIFT at frame 33. (a) The frame (b) back projection image (c) the hue histogram with 16 bins.
(a) (b)
(c) (d)
Figure 5.3 Object localization comparison (a) Object localization using proposed method (b) Object localization using classic CAMSHIFT
(c) Hue histogram using proposed method (d) Hue histogram using classic CAMSHIFT
The result shows that the yellow trunk is tracked successfully at first. Unfortunately, the tracker starts
drifting from frame 145 and tracks the background from that frame onwards. This is due to
the object localization problem: classic CAMSHIFT uses a rectangle to localize the
object, so some background information is very likely to be included in the object
model. As Figure 5.3(b) shows, the rectangle contains background, which is included
in the object model (hue histogram). That is why the blue bin is filled (Figure 5.3(d)).
5.1.2 Second Experiment Results
For the second video (Figure 5.4), localization takes 3 clicks: the first selects the
airplane body and wing, and the second and third select the propellers. Each object part
is tracked separately, and the result is the whole object rectangle (search
window), i.e. the maximum rectangle of the object parts' rectangles. The
experiment shows that both the object parts and the whole object are tracked
successfully. The whole object rectangle bounds the airplane very stably, and
even when clouds act as distractors, the tracker still tracks the object (Figure 5.4 (a)).
The average frame rate is 1.15 fps.
Meanwhile, classic CAMSHIFT runs into problems. It takes the whole hue range,
which in the OpenCV implementation is 0 to 180, and maximum saturation
smax = 256. After selecting the whole airplane, the tracking ellipse becomes very large
when we set the maximum value (vmax) to 150 (Figure 5.5, 1st row). With vmax = 100,
the tracking ellipse moves to the island in the background (Figure 5.5, 2nd row).
Finally, with vmax = 70, the tracking ellipse covers only the propellers (Figure 5.5,
3rd row). This can be explained using the hue histogram and the back projection image.
The back projection image is the projection of the hue histogram onto the hue image of each
frame. With vmax = 150, the hue histogram is close to the color characteristics of the
background, so the background has very high intensity in the back projection image,
while the object has very low intensity. That is why the tracking ellipse covers
almost the whole background. With vmax = 100, some parts of the background have higher
intensity than the object, so when the object passes through those parts, the tracking
ellipse jumps to them. With vmax = 70, the tracking ellipse tracks the object robustly,
but unfortunately only the propellers, not the whole airplane. We also tried different
minimum and maximum saturation values, but this did not improve the tracking.
(a) (b)
(c) (d)
Figure 5.4 Second video result with our proposed method. (a) Tracked image passing through clouds at frame 290 (b) Airplane body back projection image
at frame 10 (c) Left propeller back projection image at frame 10 (d) Right propeller back projection image at frame 10
Figure 5.5 Second video result with classic CAMSHIFT.
First column: the frame; second column: the back projection image; third column: the hue histogram with 16 bins. First row: vmax=150 at frame 95; second row:
vmax=100 at frame 325; third row: vmax=70 at frame 29.
Extreme Values
Minimum hue (hmin) 0
Maximum hue (hmax) 180 (maximum hue in OpenCV implementation)
Minimum saturation (smin) 50
Maximum saturation (smax) 256
Minimum value (vmin) 10
Maximum value (vmax) 150 / 70
Table 5.2 Tuned CAMSHIFT parameters for test video 1 and test video 2
Extreme Values
Minimum hue (hmin) 0
Maximum hue (hmax) 180 (maximum hue in OpenCV implementation)
Minimum saturation (smin) 30
Maximum saturation (smax) 256
Minimum value (vmin) 10
Maximum value (vmax) 70
Table 5.3 Tuned classic CAMSHIFT parameters for test video 3
(a) (b)
(c) (d)
Figure 5.6 Third video result with the proposed method. (a) Tracked image when the object skewed at frame 639 (b) Yellow body part back projection
image at frame 10. (c) Blue trouser back projection image at frame 10 (d) Orange shoes back projection image at frame 10
This is one of the main problems of the classic CAMSHIFT method: with
multihued objects, it often drifts or tracks only some object parts. We also have to
specify the parameters manually for different kinds of videos, which makes it
impractical to use.
5.1.3 Third Experiment Results
In the third video, a problem occurs: the mean-shift segmentation merges the head
with the background, so the head cannot be selected. This problem arises when, in the
localization phase, the object is in front of a background of similar color. We select
the body, trouser and shoes. Despite this localization shortcoming, our proposed method
still tracks the object well (Figure 5.6). The shoes are tracked until frame 855,
when the tracker loses them because they become too small to detect. The trouser is
tracked until frame 105: as it moves away from the camera, its color becomes darker,
which makes its hue hard to detect. The body is tracked successfully until the end of the
video. The average frame rate is 1.96 fps.
With the classic CAMSHIFT method we encounter some problems. As in the second
video, the problem with multihued objects occurs again. We use the same experimental
configuration as before. We do not change the minimum saturation: varying it has little
influence on the result, so we keep the default value smin = 30 (Table 5.3). The best
result is shown in Figure 5.7: as soon as the object passes through a background of
similar color, the tracker drifts and starts tracking the background.
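The drift can be understood from the mean-shift step at the core of CAMSHIFT: the search window is repeatedly moved to the centroid of the back projection values inside it, so similarly colored background pixels inside the window pull it away from the object, and the adaptive window size then grows to include them. The sketch below is a simplified NumPy illustration, not OpenCV's `cv2.CamShift` (which also estimates the ellipse orientation). It assumes a square window and takes the window-size rule s = 2√(M00/256) for 8-bit probability images from Bradski [1]; the names `centroid`, `place` and `camshift_step` are hypothetical.

```python
import numpy as np


def centroid(prob, window):
    """Zeroth moment M00 and centroid (cx, cy) of prob inside window = (x, y, w, h)."""
    x, y, w, h = window
    patch = prob[y:y + h, x:x + w].astype(np.float64)
    m00 = patch.sum()
    if m00 == 0:
        return 0.0, None
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    return m00, (x + (xs * patch).sum() / m00, y + (ys * patch).sum() / m00)


def place(cx, cy, w, h, shape):
    """Window of size w x h centred on (cx, cy), clamped to stay inside the image."""
    rows, cols = shape
    x = int(round(cx - (w - 1) / 2.0))
    y = int(round(cy - (h - 1) / 2.0))
    return (min(max(x, 0), cols - w), min(max(y, 0), rows - h), w, h)


def camshift_step(prob, window, max_iter=20):
    """One frame of a simplified CAMSHIFT; prob is an 8-bit back projection, window a tuple."""
    for _ in range(max_iter):
        _, c = centroid(prob, window)
        if c is None:
            return window  # no colour evidence under the window: the tracker is lost
        new = place(c[0], c[1], window[2], window[3], prob.shape)
        if new == window:
            break  # mean shift has converged
        window = new
    m00, c = centroid(prob, window)
    if c is None:
        return window
    # Assumed size rule (Bradski [1]): side s = 2 * sqrt(M00 / 256) for 8-bit probabilities.
    s = int(round(2.0 * np.sqrt(m00 / 256.0)))
    s = max(3, min(s, prob.shape[0], prob.shape[1]))
    return place(c[0], c[1], s, s, prob.shape)
```

With an isolated 10x10 blob and a 20x20 window centred on it, the window stays where it is. Adding a similarly colored region next to the blob pulls the window sideways and enlarges it (from 20 to 26 pixels in this toy case), which is the mechanism behind the drift in Figure 5.7 and Figure 5.10.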
5.1.4 Fourth Experiment Results
For the fourth video (Figure 5.8), we select one of the football players; the object can
be selected with one click. The results show that the proposed method tracks the object
successfully. In the middle of the sequence, the object is almost fully occluded by an
opposing player. Nevertheless, the tracker is still able to follow the remaining
unoccluded part of the object. A teammate running towards the object also acts as a
distractor, but the tracker still follows the object. The average frame rate is
11.77 fps.
When we test classic CAMSHIFT, we tune the parameters to get the best result (Table
5.4). We set the minimum value vmin = 10, the minimum saturation smin = 10 and the
maximum saturation smax = 256, and take the whole hue range. We vary the maximum
value vmax over 150, 100 and 70; vmax = 70 gives the best tracking result.
Nevertheless, when occlusion occurs, the tracker drifts and tracks the occluder
(Figure 5.10).
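The tuning above amounts to choosing the HSV box that classic CAMSHIFT uses to mask pixels before building and back-projecting its hue histogram. The following NumPy sketch assumes OpenCV's 8-bit HSV convention (H in 0..179, S and V in 0..255), treats the upper bounds as exclusive (so hmax = 180 and smax = 256 admit every hue and saturation), and uses the 16-bin hue histogram shown in Figures 5.7 and 5.9. The function names are hypothetical and this is not OpenCV's own code.

```python
import numpy as np

# Extreme values of Table 5.4 (OpenCV 8-bit HSV: H in 0..179, S and V in 0..255).
EXTREMES = {"hmin": 0, "hmax": 180, "smin": 10, "smax": 256, "vmin": 10, "vmax": 70}


def extreme_value_mask(hsv, ext):
    """True where hmin <= H < hmax, smin <= S < smax and vmin <= V < vmax."""
    h, s, v = (hsv[..., i].astype(np.int32) for i in range(3))
    return ((ext["hmin"] <= h) & (h < ext["hmax"]) &
            (ext["smin"] <= s) & (s < ext["smax"]) &
            (ext["vmin"] <= v) & (v < ext["vmax"]))


def hue_histogram(hsv, mask, bins=16):
    """Hue histogram of the masked pixels, scaled so that the largest bin equals 255."""
    hist, _ = np.histogram(hsv[..., 0][mask], bins=bins, range=(0, 180))
    hist = hist.astype(np.float64)
    return hist * 255.0 / hist.max() if hist.max() > 0 else hist


def back_project(hsv, mask, hist):
    """Replace each pixel by the value of its hue bin; pixels outside the mask get 0."""
    bins = len(hist)
    idx = np.minimum(hsv[..., 0].astype(np.int64) * bins // 180, bins - 1)
    prob = hist[idx]
    prob[~mask] = 0.0
    return np.rint(prob).astype(np.uint8)
```

In use, the histogram is computed once from the selection rectangle (`hue_histogram(hsv[roi], mask[roi])`) and then back-projected on every new frame. Because the model is a single hue histogram taken from a rectangle, any background or occluder that shares those hue bins receives the same probability as the object.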
Figure 5.7 Best result for the third video with classic CAMSHIFT at frame 300. (a) The frame. (b) Back projection image. (c) Hue histogram with 16 bins.
Extreme Values
Minimum hue (hmin) 0
Maximum hue (hmax) 180 (maximum hue in OpenCV implementation)
Minimum saturation (smin) 10
Maximum saturation (smax) 256
Minimum value (vmin) 10
Maximum value (vmax) 70
Table 5.4 Tuned classic CAMSHIFT parameters for test video 4
Figure 5.8 Object (marked with a red rectangle) tracked by the proposed method. (a) The object is almost fully occluded at frame 57. (b) The corresponding back projection image.
Figure 5.9 Best result for the fourth video with classic CAMSHIFT at frame 57. (a) The frame. (b) Back projection image. (c) Hue histogram with 16 bins.
Figure 5.10 Drifting tracker. (a) The object is tracked at frame 54. (b) The object is almost fully occluded at frame 57. (c) The ellipse covers both the object and the occluder at frame 61. (d) The tracker drifts and tracks the occluder at frame 62.
Figure 5.11 Multiple object tracking using our proposed method. (a) Frame 2. (b) Frame 45.
Classic CAMSHIFT cannot track multiple objects. With our proposed method, CAMSHIFT
gains this capability (Figure 5.11). The average frame rate is 4.18 fps.
5.2 Discussion
5.2.1 Some Advantages
Our experimental results show that the proposed method improves CAMSHIFT
significantly. This is due to the following improvements.
First, the object localization is more precise. It prevents the object model from
absorbing information from the surrounding background, so the tracker drifts less.
Classic CAMSHIFT, in contrast, uses a rectangle, which in many cases brings surrounding
background information into the object model. Some other improvements [7,12] also
fail to model the object precisely. We use the default mean-shift segmentation
preprocessing parameters given by Bradski and Kaehler [5], which work very well on
every test video and reduce the need to tune parameters manually. We also allow the
user to tune these preprocessing parameters, so that the method can adapt to the
user's needs. Finally, the proposed method detects the extreme values of the object
automatically, whereas in classic CAMSHIFT they have to be tuned manually.
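The localization and the automatic extreme values can be pictured as follows: after mean-shift segmentation has flattened each region to a near-uniform color, a click seeds a region-growing pass, and the extreme values are read off the grown region instead of being tuned by hand. This sketch rests on illustrative assumptions that are not necessarily those of the proposed method: the segmented image is already available (the mean-shift filtering itself is not shown), growth uses a 4-connected neighbourhood and a per-channel tolerance `tol` against the seed color, and the "extreme values" are the plain per-channel minima and maxima. `grow_region` and `extreme_values` are hypothetical names.

```python
from collections import deque

import numpy as np


def grow_region(seg, seed, tol=10):
    """4-connected region growing on a segmented image, starting at seed = (row, col).

    A neighbour joins the region when every channel differs from the seed colour by at most tol.
    """
    rows, cols = seg.shape[:2]
    ref = seg[seed].astype(np.int32)
    region = np.zeros((rows, cols), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if np.all(np.abs(seg[nr, nc].astype(np.int32) - ref) <= tol):
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region


def extreme_values(hsv, region):
    """Inclusive per-channel min and max of H, S, V over the region.

    Caveat: a red object whose hues wrap around 0/179 would need circular handling.
    """
    pix = hsv[region]
    lo, hi = pix.min(axis=0), pix.max(axis=0)
    return {"hmin": int(lo[0]), "hmax": int(hi[0]),
            "smin": int(lo[1]), "smax": int(hi[1]),
            "vmin": int(lo[2]), "vmax": int(hi[2])}
```

Because growth stops at the segment boundary, the region (and therefore the extreme values) excludes the surrounding background that a selection rectangle would include. A same-colored patch elsewhere in the frame is not picked up unless it is connected to the seed.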
Second, using a hue-distance histogram with a threshold increases the robustness of
CAMSHIFT when the object passes through a background whose color is similar to the
object's. The automatic threshold ensures that only pixels with a very similar hue are
taken into the hue-distance back projection image. In classic CAMSHIFT, relying on the
hue histogram alone makes it difficult to track the object in that situation.
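A hue-distance back projection with a threshold can be sketched as below. The exact formulation is not given in this section, so the following are assumptions: the distance is the circular hue distance (on OpenCV's 0..179 scale) to the median hue of the object part, the histogram is built over those distances, and the "automatic threshold" is taken as a percentile `q` of the part's own distances. The names and the choice of median and percentile are hypothetical.

```python
import numpy as np

HUE_RANGE = 180  # OpenCV 8-bit hue scale: 0..179


def hue_distance(hue, ref):
    """Circular distance between hues, so that hues 0 and 179 count as neighbours."""
    d = np.abs(hue.astype(np.int32) - int(ref)) % HUE_RANGE
    return np.minimum(d, HUE_RANGE - d)


def hue_distance_backprojection(hsv, part_mask, bins=16, q=95.0):
    """Back-project a hue-distance histogram learnt on one (non-empty) object part."""
    hue = hsv[..., 0]
    ref = np.median(hue[part_mask])            # assumed reference hue of the part
    dist = hue_distance(hue, ref)              # distance of every pixel to that hue
    part_dist = dist[part_mask]
    thr = float(np.percentile(part_dist, q))   # assumed "automatic" threshold
    max_d = HUE_RANGE // 2                     # 90: largest possible circular distance
    hist, _ = np.histogram(part_dist, bins=bins, range=(0, max_d + 1))
    hist = hist * 255.0 / hist.max()           # most common distance maps to 255
    idx = np.minimum(dist * bins // (max_d + 1), bins - 1)
    prob = hist[idx]
    prob[dist > thr] = 0.0                     # keep only very similar hues
    return np.rint(prob).astype(np.uint8), thr
```

The threshold matters when the background hue falls in the same histogram bin as the object: without it, such pixels would receive full weight, whereas here any pixel farther from the reference hue than the part's own pixels is set to zero.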
Third, splitting the tracking problem into smaller problems, by tracking the object
parts separately, increases the robustness of tracking multihued objects. Classic
CAMSHIFT and current CAMSHIFT improvement methods track the object as a whole,
which often makes them drift. In terms of robustness, this is one of the main
advantages of our method.
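When the parts are tracked separately, their windows still have to be reported as one object; the conclusions describe this as a "maximum rectangle" that combines the rectangles of all object parts. One plausible reading, sketched here as an assumption, is the smallest axis-aligned rectangle enclosing every part that is still being tracked. `combine_parts` is a hypothetical name and rectangles are assumed to be `(x, y, w, h)` tuples.

```python
def combine_parts(rects):
    """Smallest axis-aligned rectangle (x, y, w, h) enclosing every tracked part.

    Parts whose tracker has been lost are passed as None and ignored.
    """
    live = [r for r in rects if r is not None]
    if not live:
        return None  # every part is lost
    x0 = min(x for x, y, w, h in live)
    y0 = min(y for x, y, w, h in live)
    x1 = max(x + w for x, y, w, h in live)
    y1 = max(y + h for x, y, w, h in live)
    return (x0, y0, x1 - x0, y1 - y0)
```

For example, a body window (40, 10, 20, 30) and a trouser window (42, 38, 16, 25), with the shoes lost, combine into (40, 10, 20, 53). Running one such combination per selected object is also a natural way to extend the scheme to multiple objects.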
Fourth, our proposed method can track multiple objects. We have tried tracking six
objects simultaneously with very good results: all objects were tracked successfully.
We did not find any CAMSHIFT improvement method that supports multiple object
tracking.
5.2.2 Some Limitations
Our proposed method increases the robustness of CAMSHIFT tracking. Nevertheless, it
has some limitations.
First, when the object contains many hues, object localization may not be practical,
and the method may run more slowly because more trackers have to be computed. For
textured objects, such as a running cheetah, the object localization method also
struggles.
Second, if the object passes through a background with exactly the same color (hue,
saturation and value) as the object, the tracker will most likely fail. This is because
we use only color information: a background with exactly the object's color is
considered part of the object.
Third, the tracker cannot re-track lost object parts, again because we use only color
information. Re-tracking based on color alone would very likely give wrong results; to
re-track, other features are needed to model the object.
Fourth, speed. Our method does not achieve real-time performance because of the
separate tracking of the object parts and the additional tasks that increase
robustness. From the start, however, we stated that speed was not our main concern.
6 Conclusions and Future Works
6.1 Conclusions
We have developed several ways to improve the robustness of CAMSHIFT. The
proposed object localization method makes the object model more robust: it largely
prevents surrounding background information from being taken into the object model.
The use of a hue-distance histogram, the separate tracking of dominant-color object
parts, and a maximum rectangle that combines the rectangles of all object parts also
help CAMSHIFT track multihued objects against backgrounds of similar color. Our
experimental results show that the proposed method significantly improves the
robustness of CAMSHIFT in challenging videos.
6.2 Future Works
Our future work will speed up the method using a graphics processing unit (GPU) or
parallel programming on multi-core processors, so that it achieves real-time speed.
Besides that, we propose to extend the method so that it can re-track lost object parts,
and to improve its ability to track textured objects and objects with many hues. These
capabilities are important because many real-world applications need them.
Remaining work also includes improving the tracker for the case where the object has
exactly the same color as the background, and applying the tracker to clickable
hypervideo.
7 Bibliography
[1] Bradski, G. R. 1998. “Computer Vision Face Tracking for Use in a Perceptual
User Interface”. Intel Technology Journal, 2(2), 13-27.
[2] Comaniciu, D. and P. Meer. 2002. “Mean Shift: A Robust Approach Toward
Feature Space Analysis”. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 24(5), 603-619.
[3] J. A. Corrales, P. Gil, F. A. Candelas, F. Torres. 2009. “Tracking based on Hue-
Saturation Features with a Miniaturized Active Vision System”. In Proceedings
Book of 40th International Symposium on Robotics, Asociación Española de
Robótica y Automatización Tecnologías de la Producción – AER-ATP,
Barcelona, Spain. pp.107
[4] Tian, G., Hu, R., Wang, Z., and Fu, Y. 2009. “Improved Object Tracking
Algorithm Based on New HSV Color Probability Model”. In Proceedings of the
6th International Symposium on Neural Networks: Advances in Neural
Networks - Part II, Wuhan, China.
[5] Bradski, G., and Kaehler, A. 2008. Learning OpenCV: Computer Vision with
the OpenCV Library. O'Reilly Media, Inc.
[6] Intel Corporation. 2001. Open Source Computer Vision Library Reference
Manual, 123456-001
[7] J. G. Allen, R. Y. D. Xu, and J. S. Jin. 2004. “Object tracking using camshift
algorithm and multiple quantized feature spaces”, in Proceedings of the Pan-
Sydney area workshop on Visual information processing, ser. ACM
International Conference Proceeding Series, vol. 100. Darlinghurst, Australia:
Australian Computer Society, Inc., pp. 3–7.
[8] J. Ning, L. Zhang, D. Zhang and C. Wu. 2009. “Robust Object Tracking
using Joint Color-Texture Histogram”. International Journal of Pattern
Recognition and Artificial Intelligence, vol. 23, no. 7, pp. 1245–1263. World
Scientific Publishing Company.
[9] Qiu, X., Liu, S., Liu, F. 2009. Kernel-based Target Tracking with Multiple
Features Fusion. Joint 48th IEEE Conference on Decision and Control and
28th Chinese Control Conference, Shanghai, P.R. China.
[10] Ganoun, A., Ould-Dris, N., and Canals, R. 2006. “Tracking System Using
CAMSHIFT and Feature Points”. 14th European Signal Processing Conference
(EUSIPCO 2006), Florence, Italy.
[11] Stolkin, R., I. Florescu, M. Baron, C. Harrier and B. Kocherov. 2008. Efficient
Visual Servoing with the ABCshift Tracking Algorithm. In: IEEE International
Conference on Robotics and Automation, pp. 3219-3224, Pasadena, California,
USA.
[12] Xu, R. Y. D., Allen, J., and Jin, J. S. 2003. Robust real-time tracking of non-rigid
objects. Conferences in Research and Practice in Information Technology,
VIP'03, Sydney, Australia.
[13] K. Fukunaga and L. D. Hostetler. 1975. “The estimation of the gradient of a
density function, with applications in pattern recognition”. IEEE Transactions on
Information Theory, vol. 21, pp. 32-40.
[14] Collins, R. 2007. “Lecture 29: Video Tracking: Mean-Shift” CSE/EE486
Computer Vision I, CSE Department, Penn State University
http://www.cse.psu.edu/~rcollins/CSE486/lecture29.pdf (visited June 2010)
[15] H. Bay, T. Tuytelaars, and L. Van Gool. 2006. SURF: Speeded Up Robust
Features. In ECCV (1), pages 404–417.
[16] M. Heikkila, M. Pietikainen, and C. Schmid. 2009. Description of interest
regions with local binary patterns. Pattern Recognition 42(3):425–436.
[17] A. Yilmaz, O. Javed, M. Shah. 2006. “Object tracking: a survey”. ACM
Computing Surveys, vol. 38, no. 4, pp. 1-45.
[18] R. C. Gonzalez, R. E. Woods, and S. L. Eddins. 2004. Digital Image Processing
Using MATLAB, 1st Edition. Dorling Kindersley, USA.
[19] J. McC. Smith and D. Stotts. 2002. An Extensible Object Tracking Architecture
for Hyperlinking in Real-time and Stored Video Streams, Technical Report
TR02-017, Department of Computer Science Univ of North Carolina at Chapel
Hill, USA.
[20] S. Stalder, H. Grabner, and L. Van Gool. 2009. Beyond Semi-Supervised
Tracking: Tracking Should Be as Simple as Detection, but not Simpler than
Recognition. In Proceedings ICCV’09 WS on On-line Learning for Computer
Vision, 2009
[21] S. Stalder, H. Grabner, and L. Van Gool. Beyond Semi-Supervised Tracking
Code.
http://www.vision.ee.ethz.ch/boostingTrackers/download.htm (visited
February 2010)
[22] M. Pietikainen and G. Zhao. 2009. Local Texture Descriptors in Computer
Vision. Tutorial in: IEEE International Conference on Computer Vision ICCV.
[23] G. Zhao and M. Pietikainen. C++ implementation of spatio-temporal LBP.
http://www.ee.oulu.fi/research/imag/texture/download/STLBP_VC.zip
(visited March 2010)
[24] 2009. OpenCV 2.0 library. Code from web.
http://sourceforge.net/projects/opencvlibrary/ (visited February 2010)
[25] H. Bay, T. Tuytelaars, and L. Van Gool. 2006. Code from web.
http://www.vision.ee.ethz.ch/~surf/download.html (visited February 2010)
[26] J. Shi and C. Tomasi. 1994. Good features to track, Proc. IEEE Comput. Soc.
Conf. Comput. Vision and Pattern Recogn., pages 593-600.
[27] Jean-Yves Bouguet. Pyramidal Implementation of the Lucas Kanade Feature
Tracker, Description of the algorithm. Intel Corporation Microprocessor
Research Labs
[28] D. Stavens. 2007. The OpenCV Library: Computing Optical Flow. Stanford
Artificial Intelligence Lab, USA
[29] D. Comaniciu, V. Ramesh and P. Meer. 2000. Real-time Tracking of Non-Rigid
Objects Using Mean Shift. CVPR.
[30] M. Heikkilä and T. Ahonen. 2009. Code from web.
http://www.ee.oulu.fi/mvg/page/lbp_matlab (visited May 2010)
[31] Code from web. http://opencv.jp/sample/accumulation_of_background.html
(visited May 2010)
[32] Z. Zivkovic. 2004. Improved adaptive Gaussian mixture model for background
subtraction. Code from web.
http://staff.science.uva.nl/~zivkovic/Publications/CvBSLibGMM.zip (visited
May 2010)
[33] S. Stalder, H. Grabner, and L. Van Gool. 2009. Video from web.
http://www.vision.ee.ethz.ch/boostingTrackers/contactBoosting.html (visited
February 2010)
[34] R. Valenti, F. Hageloh. Video from web.
http://student.science.uva.nl/~rvalenti/uva/MIR/movies/soccer.avi
(visited April 2010)
[35] P. Colantoni, N. Boukala, J. Da Rugna. 2003. Fast and Accurate Color Image
Processing Using 3D Graphics Cards, VMV 2003. Munich, Germany