Kapitel 11: Tracking
Tracking Fundamentals: object representation, object detection, object tracking (point, kernel, silhouette), articulated tracking

Slide 1: Kapitel 11 Tracking
Literature: A. Yilmaz, O. Javed, and M. Shah: Object tracking: A survey. ACM Computing Surveys, Vol. 38, No. 4, 1-45, 2006.

Slide 2: Fundamentals (1)

Slide 3: Fundamentals (2)
Applications of object tracking:
- motion-based recognition: human identification based on gait, automatic object detection, etc.
- automated surveillance: monitoring a scene to detect suspicious activities or unlikely events
- video indexing: automatic annotation and retrieval of videos in multimedia databases
- human-computer interaction: gesture recognition, eye-gaze tracking for data input to computers, etc.
- traffic monitoring: real-time gathering of traffic statistics to direct traffic flow
- vehicle navigation: video-based path planning and obstacle avoidance

Slide 4: Fundamentals (3)
Tracking task: In its simplest form, tracking is the problem of estimating the trajectory of an object in the image plane as it moves around a scene. In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video. Depending on the tracking domain, a tracker can additionally provide object-centric information such as orientation, area, or shape of an object.
Two subtasks:
- Build a model of what you want to track.
- Use what you know about where the object was in the previous frame(s) to predict its location in the current frame and restrict the search.
Repeat the two subtasks, possibly updating the model.
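The two-subtask loop above can be sketched as a minimal template tracker. This is an illustrative sketch only: the SSD matching criterion, search radius, and running-average template update are assumptions for the example, not part of the slides.

```python
import numpy as np

def track(frames, init_box):
    """Minimal two-subtask tracking loop: (1) build an appearance model,
    (2) predict from the previous position and search only a small
    neighborhood. init_box = (row, col, height, width) in frame 0."""
    r, c, h, w = init_box
    template = frames[0][r:r+h, c:c+w].astype(float)  # subtask 1: appearance model
    positions = [(r, c)]
    search = 5  # search radius around the prediction (assumes smooth motion)
    for frame in frames[1:]:
        pr, pc = positions[-1]          # subtask 2: prediction = previous position
        best, best_pos = np.inf, (pr, pc)
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr, cc = pr + dr, pc + dc
                if 0 <= rr <= frame.shape[0] - h and 0 <= cc <= frame.shape[1] - w:
                    patch = frame[rr:rr+h, cc:cc+w].astype(float)
                    ssd = np.sum((patch - template) ** 2)  # sum of squared differences
                    if ssd < best:
                        best, best_pos = ssd, (rr, cc)
        positions.append(best_pos)
        # model update (self-adapting template) as a simple running average:
        rr, cc = best_pos
        template = 0.9 * template + 0.1 * frame[rr:rr+h, cc:cc+w].astype(float)
    return positions
```

Restricting the search to a small window is exactly the "restrict the search" constraint from the slide; a larger window tolerates faster motion at higher cost.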
Slide 5: Fundamentals (4)
Tracking objects can be difficult because of:
- loss of information caused by the projection of the 3D world onto a 2D image
- noise in images
- complex object shapes and motion
- nonrigid or articulated nature of objects
- partial and full object occlusions
- scene illumination changes
- real-time processing requirements
Tracking is simplified by imposing constraints:
- Almost all tracking algorithms assume that the object motion is smooth, with no abrupt changes.
- The object motion is often assumed to have constant velocity.
- Prior knowledge about the number and size of objects, or about object appearance and shape.

Slide 6: Object representation (1)
Object representation = shape + appearance.
Shape representations:
- Points. The object is represented by a single point (the centroid) or by a set of points; suitable for tracking objects that occupy small regions in an image.
- Primitive geometric shapes. The object shape is represented by a rectangle, ellipse, etc. Object motion for such representations is usually modeled by a translation, affine, or projective transformation. Though primitive geometric shapes are better suited to simple rigid objects, they are also used for tracking nonrigid objects.

Slide 7: Object representation (2)
- Object silhouette and contour. Contour = boundary of an object; the region inside the contour = silhouette. Silhouette and contour representations are suitable for tracking complex nonrigid shapes.
- Articulated shape models. Articulated objects are composed of body parts (modeled by cylinders or ellipses) held together by joints. Example: the human body is an articulated object with torso, legs, hands, head, and feet connected by joints. The relationships between the parts are governed by kinematic motion models, e.g. joint angles.
- Skeletal models. The object skeleton can be extracted by applying the medial axis transform to the object silhouette.
Skeleton representations can be used to model both articulated and rigid objects.

Slide 8: Object representation (3)
Object representations: (a) centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) control points on object contour, (h) complete object contour, (i) object silhouette.

Slide 9: Object representation (4)
Appearance representations:
- Templates. Formed using simple geometric shapes or silhouettes. Suitable for tracking objects whose poses do not vary considerably during the course of tracking. Self-adaptation of templates during tracking is possible. (http://www.cs.toronto.edu/vis/projects/dudekfaceSequence.html)

Slide 10: Object representation (5)
- Probability densities of object appearance, either parametric (Gaussian, mixture of Gaussians) or nonparametric (histograms, Parzen estimation). Characterize an image region by its statistics; if the statistics differ from those of the background, they should enable tracking. Nonparametric example: a grayscale or color histogram.

Slide 11: Object representation (6)
Parametric: 1D Gaussian distribution.

Slide 12: Object representation (7)
Parametric: n-D Gaussian distribution. Example: a 2D Gaussian centered at (1, 3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction.

Slide 13: Object representation (8)
Parametric: Gaussian mixture models (GMM; see chapter "Bayes Klassifikator").

Slide 14: Object representation (9)
Example: mixture of three Gaussians in 2D space. (a) Contours of constant density for each mixture component. (b) Contours of constant density of the mixture distribution p(x). (c) Surface plot of p(x).
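The n-D Gaussian example from slide 12 (mean (1, 3), standard deviation 3 along roughly (0.878, 0.478) and 1 in the orthogonal direction) can be reconstructed as a covariance matrix from its principal axes. A sketch, with the density function written out explicitly:

```python
import numpy as np

# Principal-axis directions (unit vectors) and standard deviations from the slide.
u1 = np.array([0.878, 0.478])
u1 = u1 / np.linalg.norm(u1)              # first principal direction, std 3
u2 = np.array([-u1[1], u1[0]])            # orthogonal direction, std 1
R = np.column_stack([u1, u2])             # rotation matrix of eigenvectors
S = np.diag([3.0**2, 1.0**2])             # eigenvalues = variances along the axes
mu = np.array([1.0, 3.0])                 # mean at (1, 3)
Sigma = R @ S @ R.T                       # covariance matrix

def gauss_pdf(x, mu, Sigma):
    """Density of an n-D Gaussian N(mu, Sigma) at point x."""
    d = x - mu
    k = len(mu)
    norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
```

Because R is orthogonal, the eigenvalues of Sigma are exactly the variances 9 and 1, and det(Sigma) = 9; this is the standard eigendecomposition view of a covariance matrix.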
Slide 15: Object representation (10)
Object representations are chosen according to the application:
- Point representations are appropriate for objects that appear very small in an image (e.g. tracking distant birds).
- For objects whose shapes can be approximated by rectangles or ellipses, primitive geometric shape representations are more appropriate (e.g. faces).
- For objects with complex shapes, for example humans, a contour- or silhouette-based representation is appropriate (surveillance applications).

Slide 16: Object representation (11)
Feature selection for tracking:
- Color: RGB, L*u*v*, L*a*b*, HSV, etc. There is no last word on which color space is most effective; a variety of color spaces have been used.
- Edges: less sensitive to illumination changes than color features. Algorithms that track the object boundary usually use edges as features. Because of its simplicity and accuracy, the most popular edge detection approach is the Canny edge detector.
- Texture: a measure of the intensity variation of a surface that quantifies properties such as smoothness and regularity.
In general, the most desirable property of a visual feature is its uniqueness, so that objects can be easily distinguished in feature space.

Slide 17: Object representation (12)
Mostly, features are chosen manually by the user depending on the application domain. Among all features, color is one of the most widely used for tracking.
Automatic feature selection (see chapter "Merkmale"):
- Filter methods
- Wrapper methods
- Principal component analysis (PCA): transformation of a number of (possibly) correlated variables into a smaller number of uncorrelated, linearly combined variables called the principal components.
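The PCA transformation described above can be sketched via an eigendecomposition of the feature covariance matrix; function name and interface are illustrative:

```python
import numpy as np

def pca(X, n_components):
    """PCA sketch: project correlated features onto uncorrelated
    principal components. X has shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    W = eigvecs[:, order[:n_components]]     # top principal directions
    return Xc @ W, W                         # projected data + components
```

The projected variables are uncorrelated by construction: their covariance matrix is the diagonal matrix of the retained eigenvalues.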
Slide 18: Object detection (1)
An object detection mechanism is required by every tracking method, either at the beginning or whenever an object first appears in the video.
- Point detectors: find interest points in images that have an expressive texture in their respective localities (see chapter "Detection of Interest Points").
- Segmentation: partition the image into perceptually similar regions.

Slide 19: Object detection (2)
Background subtraction: object detection can be achieved by building a representation of the scene, called the background model, and then finding deviations from the model for each incoming frame. Any significant change of an image region with respect to the background model signifies a moving object. The pixels constituting the changed regions are marked for further processing. Usually, a connected-component algorithm is applied to obtain connected regions corresponding to the objects.

Slide 20: Object detection (3)
Frame differencing of temporally adjacent frames.

Slide 21: Object detection (4)
Image sequence: 5 frames/s.

Slide 22: Object detection (5)
Image subtraction, variant 1. Weakness: a vehicle appears as a double image (from the previous and the current frame); a region of constant intensity is split up.

Slide 23: Object detection (6)
Image subtraction, variant 2. Reference image f_r(r, c): average over a long sequence of images.

Slide 24: Object detection (7)

Slide 25: Object detection (8)
Statistical modeling of the background: learn gradual changes over time with a per-pixel Gaussian, I(x, y) ~ N(μ(x, y), Σ(x, y)), estimated from the color observations in several consecutive frames. Once the background model is derived, the likelihood that the color of every pixel (x, y) in the input frame comes from N(μ(x, y), Σ(x, y)) is computed. Example: C. Stauffer and W. Grimson: Learning patterns of activity using real-time tracking.
IEEE T-PAMI, 22(8): 747-757, 2000.
A pixel in the current frame is checked against the background model by comparing it with every Gaussian in the model until a matching Gaussian is found. If a match is found, the mean and variance of the matched Gaussian are updated; otherwise, a new Gaussian with mean equal to the current pixel color and some initial variance replaces the least probable Gaussian. Each pixel is then classified according to whether its matched distribution represents the background process.

Slide 26: Object detection (9)
Mixture of Gaussians
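The reference-image approach from the slides above (variant 2: subtract an averaged background image and threshold the difference) can be sketched as follows; the learning rate `alpha` and threshold `thresh` are illustrative parameters, not values from the slides:

```python
import numpy as np

def background_subtract(frames, alpha=0.05, thresh=0.2):
    """Background subtraction with a running-average reference image.
    Returns one boolean foreground mask per frame."""
    ref = frames[0].astype(float)        # initial reference image f_r
    masks = []
    for frame in frames:
        diff = np.abs(frame.astype(float) - ref)
        masks.append(diff > thresh)      # significant deviation = moving object
        ref = (1 - alpha) * ref + alpha * frame   # slowly adapt the background
    return masks
```

The slow update lets the reference image follow gradual illumination changes while still flagging fast-moving objects; a connected-component pass on each mask would then yield the object regions.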
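The per-pixel match-and-update step of the Stauffer-Grimson mixture model described above can be sketched for a single grayscale pixel. This is a simplified sketch: the function name, the matching threshold of 2.5 standard deviations, and the constants are illustrative, and the full method additionally ranks components by weight/σ to decide which ones represent the background.

```python
import numpy as np

def update_pixel_model(gaussians, x, lr=0.05, match_sigmas=2.5, init_var=30.0):
    """One update of a per-pixel Gaussian mixture for pixel value x.
    gaussians is a list of [weight, mean, variance] entries."""
    for g in gaussians:
        w, mu, var = g
        if abs(x - mu) < match_sigmas * np.sqrt(var):   # matching Gaussian found
            g[0] = w + lr * (1 - w)                     # increase its weight
            g[1] = mu + lr * (x - mu)                   # pull mean toward x
            g[2] = var + lr * ((x - mu) ** 2 - var)     # update variance
            break
    else:
        # no match: replace the least probable (lowest-weight) Gaussian
        worst = min(range(len(gaussians)), key=lambda i: gaussians[i][0])
        gaussians[worst] = [0.05, float(x), init_var]
    total = sum(g[0] for g in gaussians)                # renormalize the weights
    for g in gaussians:
        g[0] /= total
    return gaussians
```

Running this once per frame for every pixel keeps the mixture adapted to both a slowly changing background and newly appearing foreground colors.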