Region-based tracking using sequences of relevance measures
Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan, sandy@hvrl.icskeio.ac.jp
Bruce Thomas, University of South Australia, D1-07, Mawson Lakes SA 5095, Australia, Bruce.Thomas@unisa.edu.au
Hideo Saito
Figure 1: Region-based tracking in action (first row) and the visualization of the task support system for making a paper craft object (second row).
Copyright is held by the author/owner(s). ISMAR'13, October 1-4, 2013, Adelaide, South Australia.
Abstract
We present preliminary results for a region-based method for detecting and tracking arbitrary shapes. The method is designed to be robust against orientation and scale changes and against occlusions. In this work, we study the effectiveness of sequences of shape descriptors for matching. We detect and track surfaces by matching sequences of descriptors, called relevance measures, with their correspondences in a database. First, we extract stable shapes as detection targets using the Maximally Stable Extremal Regions (MSER) method. Keypoints on the stable shapes are then extracted by simplifying the outlines of the stable regions. Relevance measures, each computed from three keypoints, are then calculated, and sequences of them are composed into descriptors. At runtime, sequences of relevance measures are extracted from the captured image and matched with those in the database. When a region is matched with one in the database, its orientation is estimated and virtual annotations can be superimposed. We apply this approach in an interactive task support system that helps users create paper craft objects.
Author Keywords
Artificial, augmented, virtual realities
Extended Abstracts of the IEEE International Symposium on Mixed and Augmented Reality 2013, Science and Technology Proceedings, 1-4 October 2013, Adelaide, SA, Australia. 978-1-4799-2869-9/13/$31.00 2013 IEEE
ACM Classification Keywords
H.5.1 [Multimedia Information Systems]: Artificial, augmented, and virtual realities.
Introduction
In design processes such as clothes making, designers usually draw shapes or write notes on the fabric. Showing virtual annotations, instead of actually writing the notes on the fabric, to help designers finish their work is an interesting use case for augmented reality.
In order to show virtual information on fabric, we need to register the surface. Once the surface is registered, the camera pose can be estimated and annotations can be superimposed. Conventionally, a rectangular fiducial marker is used to register a planar surface. The marker must be visible on the surface for detection and tracking. Therefore, such a marker is not suitable for a system that requires physical changes to the surface. Random dot markers, on the other hand, can have irregular shapes. However, random dot markers and other texture-based registration methods do not fit texture-less fabric. We can therefore rely only on other features, such as edges or outlines of the shapes drawn on the fabric.
We propose a method that applies a region detection method (MSER) to track shapes on a surface, in order to realize a task support system that uses patterns printed or drawn on the fabric, as illustrated in Figure 2.
Figure 2: Scenario for making a paper craft object. The user cuts and folds a paper to make a paper bag as instructed in the virtual annotations.
Related Works
The registration of shapes or regions has been explored in previous work. Bergig et al. developed an augmented reality application that recognizes handwriting in real time. Their method recognizes 2D drawings and displays the corresponding 3D shapes. Hagbi et al. explored arbitrary shapes for planar registration, using a classification method to search for region templates in a database. Similarly, Donoser et al. proposed a method for tracking regions using MSER. We explore a method similar to that of Donoser et al. and add a robust local feature to the registration process. We simplify the matching process by keeping keypoints from the polygon instead of using many MSER region templates.
Proposed Method
In a nutshell, we propose a tracking method that consists of feature extraction and shape registration, using sequences of relevance measures as descriptors.
Feature extraction starts by applying MSER to the input image, as illustrated in Figure 3. To detect a region, one MSER must be extracted. The border of the MSER is simplified using a relevance measure computed from three consecutive points that form two connected line segments in the shape outline.
The relevance measure is defined in terms of l1 and l2, the lengths of the two connected segments (lines), and the angle between the two segments. The point that connects the two segments is removed from the polygon if its relevance measure is smaller than a threshold. This removal is iterated until only the points with high relevance measures remain (the remaining points are called keypoints). The relevance measures are then recomputed and used to describe each keypoint for shape registration.
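The iterative simplification can be sketched as follows. This is a minimal sketch, not the authors' implementation. Because the exact formula is not reproduced in this text, the code assumes the relevance measure commonly used in discrete curve evolution, K = beta * l1 * l2 / (l1 + l2), where beta is the turn angle at the middle point. Normalizing lengths by the original perimeter, the function names, the threshold value and the `min_points` guard are all assumptions made for illustration.

```python
import math


def relevance(p_prev, p, p_next, perimeter=1.0):
    """Assumed relevance measure K = beta * l1 * l2 / (l1 + l2)."""
    v1 = (p[0] - p_prev[0], p[1] - p_prev[1])
    v2 = (p_next[0] - p[0], p_next[1] - p[1])
    l1 = math.hypot(*v1) / perimeter  # normalized segment lengths
    l2 = math.hypot(*v2) / perimeter
    if l1 == 0.0 or l2 == 0.0:
        return 0.0
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    beta = abs(math.atan2(cross, dot))  # turn angle; 0 for collinear points
    return beta * l1 * l2 / (l1 + l2)


def perimeter_of(polygon):
    """Total length of a closed polygon given as a list of (x, y) points."""
    n = len(polygon)
    return sum(math.dist(polygon[i], polygon[(i + 1) % n]) for i in range(n))


def simplify(polygon, threshold, min_points=4):
    """Repeatedly drop the least relevant vertex while it is below threshold."""
    pts = list(polygon)
    total = perimeter_of(pts)  # kept fixed so scores stay comparable
    while len(pts) > min_points:
        n = len(pts)
        scores = [
            relevance(pts[i - 1], pts[i], pts[(i + 1) % n], total)
            for i in range(n)
        ]
        i_min = min(range(n), key=scores.__getitem__)
        if scores[i_min] >= threshold:
            break  # every remaining point is a keypoint
        del pts[i_min]
    return pts
```

The `min_points` guard keeps at least four points, since the later homography step needs at least four correspondences.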
Feature extraction is performed off-line to build the feature database (hash table) and the shape database. It is also performed on-line to register unknown shapes (see Figure 4).
Figure 3: A stable region is extracted using MSER and its outline polygon is simplified by keeping the keypoints with high relevance measures.
During shape registration, the keypoints of an unidentified region are extracted. The sequences of relevance measures are then computed and the corresponding tuple (region id, keypoint id) is looked up in the hash table (see Figure 4). Since the hash table is a many-to-one relationship, voting with a histogram of matched keypoints is performed to obtain the matched region and keypoints. This process yields keypoint correspondences between a shape captured by the camera and a shape in the database.
Figure 4: Sequences of relevance measures are stored in the database as the indices of the hash table. An index of the hash table refers to a tuple that consists of a region id (region id, for multiple-region detection) and a keypoint id (pt id). $f(r_{n-1}, r_n, r_{n+1})$ is the hashing function for keypoint n, which can be implemented as a string of $(r_{n-1}, r_n, r_{n+1})$.
Technically, we choose three neighbouring relevance measure values to represent a keypoint of a shape. In this case, one keypoint is effectively described by its four neighbouring keypoints, because each relevance measure is itself computed from three consecutive keypoints. We assume that the number of relevance measures used to represent one keypoint can also be increased to four, for example r0, r1, r2, r3 for the keypoint with id = 1.
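The lookup and voting step might look like the sketch below. The text only states that the key can be a string built from three neighbouring relevance values. Quantizing each value with a fixed `step` before building the string, the function names, and the dictionary layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def make_key(r_prev, r, r_next, step=0.05):
    """String key from three neighbouring relevance values (quantized)."""
    return ",".join(str(round(v / step)) for v in (r_prev, r, r_next))


def keys_for_shape(relevances, step=0.05):
    """One key per keypoint of a closed outline (indices wrap around)."""
    n = len(relevances)
    return [
        make_key(relevances[i - 1], relevances[i], relevances[(i + 1) % n], step)
        for i in range(n)
    ]


def build_table(shapes, step=0.05):
    """shapes maps region_id -> list of relevance values, one per keypoint."""
    table = defaultdict(list)
    for region_id, relevances in shapes.items():
        for pt_id, key in enumerate(keys_for_shape(relevances, step)):
            table[key].append((region_id, pt_id))
    return table


def match(table, relevances, step=0.05):
    """Vote for the best region; return (region_id, {query_index: pt_id})."""
    votes = Counter()
    candidates = []
    for q_idx, key in enumerate(keys_for_shape(relevances, step)):
        for region_id, pt_id in table.get(key, []):
            votes[region_id] += 1
            candidates.append((q_idx, region_id, pt_id))
    if not votes:
        return None, {}
    best_region, _ = votes.most_common(1)[0]
    correspondences = {}
    for q_idx, region_id, pt_id in candidates:
        if region_id == best_region:
            correspondences.setdefault(q_idx, pt_id)
    return best_region, correspondences
```

Because each key depends only on neighbouring values, a query outline whose keypoints start at a different index still maps to the stored keypoint ids.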
The camera pose is estimated from a homography that is calculated using at least four keypoint correspondences obtained from shape registration. The camera pose is then optimized with Levenberg-Marquardt by minimizing the re-projection error, that is, the distance between the keypoints projected from the shape database and the keypoints extracted in the captured frames. The camera pose is also refined by considering the keypoint correspondences to the shape detected in the previous frame. These two optimizations produce a stable camera pose.
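As an illustration of the homography step, the sketch below solves for a homography from exactly four correspondences by fixing the last matrix entry to 1 and solving the resulting 8x8 linear system. It is a simplified stand-in under stated assumptions: the function names are hypothetical, more than four correspondences would need a least-squares solution, and the Levenberg-Marquardt refinement described above is not shown.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src, dst):
    """3x3 homography mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and similarly for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def project(h, pt):
    """Apply homography h to a 2D point."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )
```

Here `src` would hold keypoints from the shape database and `dst` the matched keypoints in the captured frame.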
Scenario of task support system
Figure 5: The flow of instructions seen by the user and the state for tracking each individual region. Three templates are prepared beforehand in order to identify the current state of the user's action. The user cuts the region of the paper craft and follows the instructions that are displayed.
We are developing a prototype of the task support system using the proposed region-based tracking. A piece of paper is used as the target surface. The user cuts and folds the paper to make the final product by following the steps illustrated in Figure 5. The information is superimposed virtually over the paper to guide the user, as illustrated in Figure 1.
Evaluation
We assess the accuracy of detection and tracking by calculating the re-projection error under orientation and scale changes and under occlusions. We also evaluated our method by detecting multiple shapes.
We captured the paper craft shape (the template in the scenario) in 1184 image frames. The error is calculated by projecting the keypoints of the template (database) onto the captured image using the computed homography. The average distance from the projected keypoints to the shape outline in the captured image is then plotted as the error (see Figure 6).
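The error described above can be sketched as the mean point-to-polygon distance. This is an illustrative sketch only: it assumes the detected outline is available as a closed polygon of (x, y) points and the homography as a 3x3 nested list. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_outline(p, outline):
    """Distance from p to the nearest edge of a closed polygon."""
    n = len(outline)
    return min(
        point_segment_distance(p, outline[i], outline[(i + 1) % n])
        for i in range(n)
    )


def apply_homography(h, pt):
    """Project a 2D point with a 3x3 homography."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


def reprojection_error(h, template_keypoints, outline):
    """Mean distance (in pixels) of projected keypoints to the outline."""
    projected = [apply_homography(h, p) for p in template_keypoints]
    return sum(distance_to_outline(p, outline) for p in projected) / len(projected)
```

A frame with no detection has no homography, which corresponds to the missing values in the error plot.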
Figure 6: Re-projection error of tracking using the paper craft shape. The error for successful detection and tracking is lower than 1 pixel. Errors higher than 1 pixel and missing values are caused by detection failures, extreme orientations and occlusions.
Orientation and scale changes
To build the shape descriptor, we take into account the ratio of the edge lengths and the angle, which are invariant to scale and rotation changes. The shape descriptor is therefore effective at handling scale changes, as illustrated in Figure 7.
Figure 7: Scale changes.
Likewise, the angle between two edges does not change under rotation, which makes our method robust against rotation changes, as illustrated in Figure 8.
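A quick numerical check of this invariance is sketched below. It assumes the descriptor ingredients are the turn angle at a keypoint and the ratio of the two adjacent edge lengths. The exact combination used by the authors is not given in this text, so `angle_and_ratio` is a hypothetical stand-in.

```python
import math


def angle_and_ratio(p_prev, p, p_next):
    """Turn angle at p and ratio of the two adjacent edge lengths."""
    v1 = (p[0] - p_prev[0], p[1] - p_prev[1])
    v2 = (p_next[0] - p[0], p_next[1] - p[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    angle = abs(math.atan2(cross, dot))
    ratio = math.hypot(*v1) / math.hypot(*v2)
    return angle, ratio


def similarity(pt, scale, theta, tx=0.0, ty=0.0):
    """Scale, rotate by theta (radians), then translate a 2D point."""
    x, y = pt
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)


# Three consecutive outline points before and after a similarity transform
triple = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0)]
moved = [similarity(p, scale=3.5, theta=0.7, tx=5.0, ty=-2.0) for p in triple]

before = angle_and_ratio(*triple)
after = angle_and_ratio(*moved)
```

Up to floating-point error, `before` and `after` hold the same angle and ratio. This holds for in-plane rotation and uniform scaling only; perspective distortion changes both quantities.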
Figure 8: Rotation changes.
To handle orientation changes, we update the descriptor database after every successful tracking step for the next matching. Furthermore, we also use the previous frame in addition to the descriptor informati