SPIE_EI_9026_2014



[Figure 1 image: human silhouette with the Shoulder, Elbow, Wrist, Hip, Knee, and Ankle joints marked.]

    Figure 1: Illustration of specific joints on the human body to be tracked.

images. The former uses depth information generated either by a depth sensor such as the Kinect or by a 3D reconstruction algorithm operating on multiple high-resolution video sources, and is suitable for indoor applications such as gaming consoles and human-interactive systems. The latter is used in surveillance scenarios

which lack any source of depth information, such as video feeds from CCTV cameras monitoring a parking lot or a shopping mall. One of the most recent and popular works is by Shotton et al.6 for locating 3D joint positions in the human body from a depth image acquired by a Kinect sensor. They used a part-based recognition paradigm in which a difficult pose estimation problem is converted to an easier per-pixel classification problem, from which the 3D joint locations are subsequently estimated irrespective of pose, body shape, or clothing. In a more recent approach by Huang et al.,7 the human body pose is estimated and tracked across the scene using information acquired by a multi-camera system. Here, both the skeletal joint positions and the surface deformations (body shape changes) are estimated by fitting a reference surface model to the 3D point reconstructions from the multi-camera system. This approach also makes use of a learning scheme which divides the point reconstructions into rigid body parts for accurate estimation. However, the above research requires high-resolution imagery under controlled lighting or environmental conditions, as well as depth information, either for direct use or for point cloud reconstruction.

One of the earlier and popular works which does not use depth information and relies only on a single video camera to track human motion is by Markus Kohler.8 He designs a Kalman filter to track non-linear human motion by treating the non-linearity as motion with constant velocity and a changing acceleration that can be regarded as white noise. The process noise covariance of the Kalman filter is designed so as to incorporate this changing acceleration. In our proposed algorithm, we use a modification of this Kalman filter and of its process noise covariance design to track certain body joints across the video sequence. Kaaniche et al.4 used the extended Kalman filter to track specific points or corners detected in every frame of the video sequence for the purpose of gesture recognition. Each point is described by a region descriptor, namely the Histogram of Oriented Gradients (HOG), and the Kalman filter tracks the position of the corner using HOG-based region matching. For tracking specific joints, however, this methodology does not suffice, since any corner point which fails to match the previous frame is discarded. Bilinski et al.5 extended this methodology by incorporating object speed and orientation to track multiple objects under occlusion.

In recent years, the problem of human body pose estimation has not been limited to tracking points or corners or to using depth information. Ramakrishna et al.9 proposed an occlusion-aware algorithm which tracks the human body pose in a sequence, modeling the body as a combination of single parts, such as the head and neck, and symmetric part pairs, such as the shoulders, knees, and feet. The important aspect of this algorithm is that it can differentiate between similar-looking parts such as the left and right leg or arm, thereby giving a suitable estimate of the human pose. Such a pose estimation algorithm can aid our tracking mechanism, and vice versa, to obtain more accurate locations of specific joints. In the next section, we describe the theory involved in the various modules of the tracking framework.

    SPIE-IS&T/ Vol. 9026 90260H-3

Downloaded From: http://spiedigitallibrary.org/ on 11/07/2014 Terms of Use: http://spiedl.org/terms


[Block diagram: Input from Camera 1 → Compute Background → Foreground Segmentation → Interest Point Computation → Optical Flow Computation]

    (a) Block Schematic of Optical Flow Computation to compute global velocity. (b) Optical Flow Illustration

    Figure 2: Framework for Computing Optical Flow and Illustration

    3. THEORY

This section describes the theoretical background required for a deeper understanding of the proposed model for joint estimation and tracking. The three main topics covered are: a) Lucas-Kanade optical flow estimation, b) region descriptors and matching, and c) the Kalman filter. Our proposed methodology combines variants of these techniques to estimate and track joints in a low-resolution video, given an initial estimate of the joint locations.

    3.1 Lucas Kanade Optical Flow

Optical flow between two frames of a video sequence estimates the velocity of a point in the real-world scene by finding a relationship between the projections of that point in the corresponding frames. In other words, optical flow measures the velocity or movement of a pixel or region between two time instances. In our case, the point of interest is a joint of the human body, and we need to estimate the velocity of that joint in the current frame given its location in the previous frame. There are two main methods for computing this velocity: the Horn-Schunck method, which imposes a global constraint (i.e., all image pixels are used in determining the velocity of a single pixel), and the Lucas-Kanade10 method, which is more localized (i.e., it considers only a neighborhood region around the point of interest, thereby setting a local constraint). Both methods are based on a single basic equation, I(x, y, t) = I(x + Δx, y + Δy, t + Δt). Here, let us consider that a pixel p = (x, y) at time t has moved to a position p′ = (x + Δx, y + Δy) at time t + Δt. The equation then assumes that the brightness of the pixel remains constant through its movement. This is one of the major assumptions of optical flow. The other assumptions are spatial coherence, where the point describing an object region does not change shape with time, and temporal persistence, where the motion of a pixel or region is due purely to the motion of the object and not to camera movement. For tracking joint regions of the human body, the localized regions remain rigid (they do not change shape) and thus do not violate the spatial coherence assumption. Since our testing scenarios use video sequences captured from a stationary camera with a constant background, the temporal persistence assumption is not violated. So, for our purposes, we employ the Lucas-Kanade (L-K) optical flow estimation technique, which uses a local constraint. The optical flow equation can be derived using a Taylor series expansion of the basic equation and is given by

(∂I/∂x) vx + (∂I/∂y) vy + ∂I/∂t = 0,  or  ∇I · v + It = 0    (1)

where (vx, vy) are the optical flow velocity components of the pixel p = (x, y). As mentioned earlier, the L-K method uses a local constraint. A small window region (local neighborhood) around the point p = (x, y) is considered, and within this neighborhood a weighted least-squares error is minimized. This error is given by

Σ(x,y) W²(x, y) [∇I(x, y) · v + It(x, y)]²    (2)

    Using the above equation and the optical flow constraint equation (Equation 1), we can uniquely computethe solution v. The assumption here is that the optical flow within that local region is constant. But there are


some issues when computing Lucas-Kanade optical flow. One issue arises when the motion in the scene is not small enough, in which case the higher-order terms of the optical flow constraint equation are needed. The alternative is a pyramidal iterative Lucas-Kanade approach, where the image at a particular instant is downsampled to form a Gaussian pyramid and the optical flow is computed at each level. The other issue arises when a point in a local region does not move like its neighbors. This brings us back to our earlier assumption of spatial coherence, where the objects or points to be tracked should be rigid. So, one of the important design criteria is the ideal window size (local region size) for computing the optical flow at a certain point. For the joint tracking problem, this window size depends on the resolution of the video; for poor resolutions, we use a window size of 7 × 7. An illustration of optical flow estimation on the points of a human body silhouette is shown in Figure 2. In the proposed algorithm, we use optical flow estimation in two scenarios: one to compute the global velocity of the motion of the human body, and the other to find a coarse estimate of the location of a particular body joint in the next instant. For the latter, we compute the optical flow for every point surrounding the joint region using the L-K method and then take the median flow.
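The least-squares solution of Equation (2) over a small window can be sketched as follows. This is a minimal single-level (non-pyramidal) NumPy implementation with uniform weights W = 1; the function name and the synthetic blob check are illustrative, not from the paper.

```python
import numpy as np

def lucas_kanade_flow(prev, curr, x, y, win=7):
    """Estimate (vx, vy) at (x, y) by least squares over a win x win window."""
    # Spatial gradients of the first frame and the temporal gradient.
    Iy, Ix = np.gradient(prev.astype(float))
    It = curr.astype(float) - prev.astype(float)
    r = win // 2
    ys, xs = np.mgrid[y - r:y + r + 1, x - r:x + r + 1]
    # Stack the constraint Ix*vx + Iy*vy = -It for every pixel in the window.
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy)

# Synthetic check: a Gaussian blob shifted by one pixel in x between frames.
grid = np.arange(32)
blob = lambda cx: np.exp(-((grid[None, :] - cx) ** 2
                           + (grid[:, None] - 16) ** 2) / 8.0)
vx, vy = lucas_kanade_flow(blob(15.0), blob(16.0), x=15, y=16)
```

For the blob shifted by (1, 0), the recovered flow is close to (1, 0); in practice the pyramidal variant discussed above is used when displacements are large.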

    3.2 Region Descriptors

Region descriptors such as the Histogram of Oriented Gradients (HOG)11 and Local Binary Patterns (LBP)12 describe the edge information and the textural content of a local region, and their combination can be a very effective descriptor for region-based image matching. Many efficient image descriptors exist, such as SIFT and ORB, but they assume that the images are of high resolution. HOG and LBP remain effective in describing an image region in spite of low resolution. Henceforth, we use the combination of HOG and LBP to describe the local neighborhood of a joint region. In the next few sections, we give an overview of these two popular region descriptors.

    3.2.1 Histogram of Oriented Gradients

The histogram of oriented gradients, or HOG descriptor, is a weighted histogram which represents the edge information in a certain region. The histogram is computed from the gradient of the region, which gives the edge magnitude and orientation at each pixel. The HOG is then a weighted histogram of the pixels over the edge orientation, where the weights are the corresponding edge magnitudes. To compute the gradient of the region, we convolve with the derivative operators [−1 0 1] and [−1 0 1]ᵀ to get Gx and Gy. The gradient magnitude is then √(Gx² + Gy²) and the gradient direction is tan⁻¹(Gy/Gx). Dividing the gradient direction into N bins, we accumulate the pixels whose orientation falls into a certain bin, weighting each contribution by its corresponding gradient magnitude.
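The computation above can be sketched as follows; the choice of 9 bins over unsigned orientations (0°-180°) follows common HOG practice and is an assumption, not a value stated in the paper.

```python
import numpy as np

def hog_descriptor(region, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one region,
    using the [-1 0 1] derivative masks described in the text."""
    region = region.astype(float)
    gx = np.zeros_like(region)
    gy = np.zeros_like(region)
    gx[:, 1:-1] = region[:, 2:] - region[:, :-2]   # convolution with [-1 0 1]
    gy[1:-1, :] = region[2:, :] - region[:-2, :]   # and with its transpose
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # unsigned orientation in [0, 180) degrees
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)             # L1-normalize

# A vertical step edge: all gradient energy lands in the 0-degree bin.
patch = np.tile(np.concatenate([np.zeros(8), np.ones(8)]), (16, 1))
h = hog_descriptor(patch)
```

The normalization step makes the descriptor comparable across regions of different contrast, which matters for the matching metric of Section 3.2.3.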

    3.2.2 Local Binary Patterns

The local binary pattern is an image coding scheme which brings out the textural features of a region. For representing a joint region and associating a joint across successive frames, the texture of the region plays a vital part in addition to the edge information. The LBP considers a local neighborhood of 8 × 8 or 16 × 16 within a joint region and labels the neighborhood pixels with either 1 or 0 based on the center pixel value. The coded value representing the local region is then the decimal representation of the neighborhood labels taken in clockwise order. Thus, for every pixel within the joint region, a coded value is generated which represents the underlying texture of its local region. The LBP operator is defined as

LBP(P,R) = Σ(p=0 to P−1) s(gp − gc) 2^p ;  s(z) = 1 if z ≥ 0, 0 otherwise    (3)
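A minimal sketch of the basic LBP operator with P = 8 neighbors at radius 1, computing the code for a single 3 × 3 neighborhood (the clockwise neighbor ordering below is one common convention; the paper does not fix one):

```python
import numpy as np

def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch: threshold the 8 neighbors
    at the center value g_c and read the resulting bits clockwise."""
    c = patch[1, 1]
    # clockwise from the top-left neighbor
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    # sum of s(g_p - g_c) * 2^p as in Equation (3)
    return sum(int(g >= c) << p for p, g in enumerate(neighbors))

patch = np.array([[9, 9, 9],
                  [1, 5, 1],
                  [1, 1, 1]])
code = lbp_code(patch)   # bright top row sets the three lowest bits
```

Applying this at every pixel of the joint region and histogramming the codes yields the texture half of the joint descriptor used below.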


    3.2.3 Joint Region Representation and Matching

As mentioned earlier, the representation of a joint should contain two types of description: one which gives the edge information in a compact and lighting-invariant manner, and another which provides a texture representation. A suitable combination of the HOG (represented as h) and the LBP (represented as l) is used to describe the joint region. Since the HOG and LBP descriptors live in vector spaces which are independent of each other, we form the final feature vector f as the concatenation of the two. The matching between two joint regions is done using the chi-squared metric12 given by

χ²(f1, f2) = Σb (f1(b) − f2(b))² / (f1(b) + f2(b))    (4)

where f1 and f2 are the feature vectors corresponding to a certain joint in successive frames.
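The concatenation and the matching metric of Equation (4) can be sketched as follows (the tiny two-bin histograms are illustrative only; real h and l would come from the HOG and LBP computations above):

```python
import numpy as np

def chi_squared(f1, f2, eps=1e-12):
    """Chi-squared distance of Equation (4); eps guards empty bins."""
    return np.sum((f1 - f2) ** 2 / (f1 + f2 + eps))

# Joint descriptor: concatenation of a HOG histogram h and an LBP histogram l.
h1, l1 = np.array([0.6, 0.4]), np.array([0.2, 0.8])
h2, l2 = np.array([0.5, 0.5]), np.array([0.3, 0.7])
f1 = np.concatenate([h1, l1])
f2 = np.concatenate([h2, l2])

d_same = chi_squared(f1, f1)   # identical regions give distance 0
d_diff = chi_squared(f1, f2)   # differing regions give a positive distance
```

During tracking, the candidate point in the search region with the smallest such distance to the previous frame's joint descriptor is selected.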

    3.3 Kalman Filter

The Kalman filter13,14 is a mean-squared-error estimator which estimates the true value of a quantity through an iterative procedure, where each iteration incorporates a noisy measurement at a time instant. The underlying model of the filter is a set of equations having a state-space representation, given by

xk+1 = A xk + qk ;  zk = H xk + rk    (5)

where xk is the state vector at instant k, A is the transition matrix, H is the measurement matrix, and zk is the measurement vector. qk and rk are random variables drawn from white noise processes with covariances Q = E[qk qkᵀ] and R = E[rk rkᵀ]. We define Pk = E[ek ekᵀ] as the error covariance matrix at instant k, and consider a prior estimate of the state, x̂k⁻, obtained from knowledge of the system, and a posterior estimate, x̂k, obtained after knowing the current measurement zk. The error ek is then defined as the difference between the true state and the posterior estimate of the state (xk − x̂k). To obtain the true value of a response (or state) generated by a process or system, the iterative procedure is to form the prior estimate x̂k⁻ at instant k from the posterior estimate x̂(k−1) at instant k − 1. Then, using the measured value of the response zk at instant k, we compute the innovation (measurement residual) zk − H x̂k⁻ and use it to obtain the posterior estimate x̂k = x̂k⁻ + Kk (zk − H x̂k⁻). The Kalman gain Kk at instant k is given by

Kk = Pk⁻ Hᵀ (H Pk⁻ Hᵀ + R)⁻¹    (6)

where Pk⁻ = E[ek⁻ ek⁻ᵀ], ek⁻ = (xk − x̂k⁻), and Pk = (I − Kk H) Pk⁻. This iterative procedure can be divided into two stages: the time update (prediction stage) and the measurement update (correction stage). Thus, the Kalman filter can be thought of as a process which estimates the state at one instant and then obtains feedback in the form of noisy measurements of the system's response. The time update stage predicts the next state based on the previous one, while the measurement update stage corrects the prior estimate using the obtained measurement to get the posterior estimate. An illustration of this iterative procedure is shown in Figure 3b.
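The two stages can be sketched as a single predict/correct step implementing Equations (5)-(6); the constant-velocity model and the noise settings in the usage example below are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One time update (prediction) followed by one measurement update
    (correction), following Equations (5) and (6)."""
    # time update: prior state and prior error covariance
    x_prior = A @ x
    P_prior = A @ P @ A.T + Q
    # measurement update: gain, posterior state, posterior covariance
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(len(x)) - K @ H) @ P_prior
    return x_post, P_post

# Usage sketch: track a point moving at (1.0, 0.5) pixels/frame with a
# constant-velocity state [x, y, vx, vy] and position-only measurements.
dt = 1.0
A = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
Q, R = 0.01 * np.eye(4), 0.25 * np.eye(2)
x, P = np.zeros(4), np.eye(4)
for k in range(1, 30):
    z = np.array([1.0 * k, 0.5 * k])   # noiseless measurements for the sketch
    x, P = kalman_step(x, P, z, A, H, Q, R)
```

After a few dozen iterations the state converges to the true position and velocity, illustrating how the filter learns the joint's motion from position-only measurements.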

The recursive Kalman filter can also be used for tracking, and in the literature it has been widely applied to tracking points in video sequences. In the proposed algorithm, we use the Kalman filter to track a specific body joint across the scene. This is done by setting the state of the process (which in this case is the human body movement) to the (x, y) coordinates of the joint along with its velocity (vx, vy), giving a state vector xk ∈ R⁴. The measurement vector zk = [xo, yo] ∈ R² is provided either by manually annotated points, by a human body pose estimation algorithm, or, in the proposed algorithm, by the L-K optical flow estimate. By approximating the motion of a joint over a small time interval as a linear function, we design the transition matrix A so that the next state is a linear function of the previous state. As done by Kohler,8 to account for the non-constant velocity often associated with accelerating image structures, we design the process noise covariance matrix Q appropriately.
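One standard way to realize Kohler's idea of treating the changing acceleration as white noise is the discrete white-noise-acceleration model sketched below; the exact Q used in the paper may differ, and sigma_a is an assumed tuning parameter.

```python
import numpy as np

def cv_model(dt=1.0, sigma_a=1.0):
    """Constant-velocity transition matrix A for state [x, y, vx, vy] and a
    process noise Q modeling acceleration as white noise (an assumed design,
    sketching the Kohler-style covariance; sigma_a is a tuning knob)."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # per-axis covariance of the white-noise acceleration model
    q = sigma_a ** 2 * np.array([[dt ** 4 / 4, dt ** 3 / 2],
                                 [dt ** 3 / 2, dt ** 2]])
    Q = np.zeros((4, 4))
    for i in (0, 1):                     # x and y axes are independent
        Q[np.ix_([i, i + 2], [i, i + 2])] = q
    return A, Q

A, Q = cv_model(dt=1.0, sigma_a=1.0)
```

The off-diagonal dt³/2 terms couple position and velocity noise, which is what lets the filter absorb unmodeled acceleration instead of diverging.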

The modification of the Kalman filter recursive algorithm used for joint tracking is shown in Figure 3a. As the figure shows, the measurement is obtained from the optical flow estimate. There are a


    (a) Joint Tracking Algorithm using Kalman Filter

    (b) Iterative/Recursive Kalman Filter Algorithm

    Figure 3: Kalman Filter Overview and its implementation with respect to Joint Tracking

couple of scenarios which need to be handled in order to use the optical flow as a reliable measurement vector. In the first, the optical flow estimate falls within the elliptical search region computed during the prediction phase; this confirms the correctness of the optical flow, making the estimate a suitable measurement vector. The elliptical search region is computed by using the posterior state and the predicted state as the two foci of an ellipse and deriving the major and minor axes from the possible error values in the prior error covariance matrix.4 In the second scenario, the optical flow estimate does not fall within the search region, indicating that it is noisy and unsuitable as a measurement. In that case, we use a region-based matching technique: at every point of the elliptical search region, a region descriptor as described in the previous section is computed, and the point within the search region closest (in the region descriptor space) to the joint's location in the previous frame becomes the new estimate used as the measurement vector.
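The validation gate can be sketched as below. This is one reading of the ellipse construction, with the corrected position at t−1 and the predicted position at t as the two foci and the axis length grown by the prior positional uncertainty; the scale factor (a number of standard deviations) is an assumed tuning knob, not a value from the paper.

```python
import numpy as np

def in_search_ellipse(pt, post_pos, prior_pos, P_prior, scale=3.0):
    """Accept a flow estimate `pt` if it lies inside the elliptical search
    region whose foci are the posterior and predicted joint positions."""
    f1 = np.asarray(post_pos, dtype=float)
    f2 = np.asarray(prior_pos, dtype=float)
    c = np.linalg.norm(f2 - f1) / 2.0              # half the focal distance
    # largest positional standard deviation from the prior error covariance
    sigma = np.sqrt(np.max(np.linalg.eigvalsh(
        np.asarray(P_prior, dtype=float)[:2, :2])))
    a = c + scale * sigma                          # semi-major axis
    # a point is inside the ellipse iff the sum of its distances
    # to the two foci is at most 2a
    d = (np.linalg.norm(np.asarray(pt, float) - f1)
         + np.linalg.norm(np.asarray(pt, float) - f2))
    return bool(d <= 2.0 * a)

# A flow estimate near the prediction is accepted; a far-off one is rejected.
ok = in_search_ellipse([2.3, 0.1], [0, 0], [2, 0], np.eye(4))
bad = in_search_ellipse([40.0, 0.0], [0, 0], [2, 0], np.eye(4))
```

When the gate rejects the flow estimate, the chi-squared descriptor matching of Section 3.2.3 supplies the measurement instead.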

    4. PROPOSED FRAMEWORK

For tracking specific joints of the human body in a low-resolution video, we use an optical flow-based Kalman filter. As illustrated in the overall schematic shown in Figure 4, the first step is to compute the foreground/background model, which is used to estimate the global velocity and to initialize the tracker associated with each joint. Each joint region is described by a region descriptor formed by concatenating the HOG and LBP descriptors. The detection of joints in the initial frame within a window of interest can be done using a pose estimation technique as described in the literature survey; however, for the purposes of this research article and to demonstrate the tracking ability of the framework, we use manually annotated points. Each subsequent detection of a joint in consecutive frames is provided by the optical flow estimate, and this serves


[Figure 4 block diagram: Input Video → Read Frame → Learn Background Model → Foreground Segmentation → Compute Global Velocity (Using L-K) → Initialize Joint Descriptor and Joint Tracker (from Manually Annotated Points / Automatic Pose Estimation) → Kalman Prediction (Time Update), Compute Elliptical Search Region → Estimate joint location in current frame using L-K Optical Flow → Kalman Filter Correction (Measurement Update)]

    Figure 4: Block Schematic of Tracking

as the measurement for the Kalman filter update at the current frame. To validate the optical flow estimate of a joint in the next frame, an elliptical search region is computed based on the Kalman filter's prediction for the next instant and its current state. If the optical flow estimate does not fall within the search region, region descriptor-based matching is performed over the search region. The algorithm steps for joint tracking are summarized below:

1. Compute the background model of the video sequence and extract the foreground in every frame of the video sequence.

2. Extract the first frame (time instant t = 1) of the window of interest from the video sequence. Initialize the optical flow estimation technique using the first frame.

3. Extract the second frame (time instant t = 2). Compute the optical flow between the first and second frames and, subsequently, initialize a Kalman tracker for each body joint using the global velocity (median velocity of the body region) and the manually annotated locations of the body joints at time instant t = 1. The state of the tracker for each body joint is xt = [x, y, vx, vy], where (vx, vy) are the optical flow velocities obtained at location (x, y). This is considered the corrected state x̂(t−1) at time t = 1. Predict the prior state x̂t⁻ of the Kalman filter for t = 2.

4. Using the manual points at t = 2 as the measurement vector, perform the correction phase of the filter to get the posterior state x̂t. Update t ← t + 1.

5. Read in the next frame and compute the optical flow between the current time instant t and the previous time instant t − 1. Update the global velocity associated with the tracker, predict the joint state x̂t⁻ at the next time instant, and compute the elliptical search region based on the predicted state x̂t⁻ and the prior error covariance Pt⁻.

6. Get the detected joint locations obtained by the modified Lucas-Kanade optical flow estimation. Check the correctness of the optical flow match by comparing it with the search region. If the optical flow match is incorrect, perform region-based matching.

7. Use the updated joint location from the optical flow/region-based matching as the measurement vector z = [zx, zy] for the correction phase of the Kalman filter. The correction stage updates the current state using this measurement, and thus we obtain the tracked body joint location at instant t. Update t ← t + 1 and go to Step 5.

    8. Repeat for all remaining frames within that spatio-temporal window of the video sequence.


[Figure 7 plots: tracked points (*) and discrete manual points (+) with polynomial fits of degree 2 and degree 3; X-Coordinate spans roughly 430-530, Y-Coordinate roughly 185-189 for the shoulder panel and 193-199 for the elbow panel.]

    (a) Shoulder Joint

    (b) Elbow Joint

Figure 7: Comparison of tracked locations of the shoulder/elbow joints with their manual locations and the corresponding polynomial fits.


    5. RESULTS AND EXPERIMENTS

The proposed algorithm has been tested on a private dataset provided by the Air Force Institute of Technology, Dayton, OH. It consists of 21 video sequences of a person walking along a track across the face of a building outdoors, with a staircase in the front. Each sequence is divided into five phases, Phase A through Phase E, described as follows.

Phase A: Subject is walking clockwise around the track. The frames of interest are of the subject walking on the cross over the platform.

Phase B: Subject is walking clockwise around the track. The frames of interest are of the subject walking on the grass after the ramp.

Phase C: Subject is walking clockwise around the track. The frames of interest are of the subject walking on the grass after the ramp, on the side of the track away from the building.

Phase D: Subject is walking counter-clockwise around the track. The frames of interest are of the subject walking on the grass before the ramp.

Phase E: Subject is walking counter-clockwise around the track. The frames of interest are of the subject walking on the grass along the ramp.

These five phases of a sequence, along with the background, are illustrated in Figure 5. In this manuscript, we provide test results obtained from all sequences in Phase A, where a person climbs the staircase of the ramp and walks along the platform across the face of the building in a clockwise manner. One of the challenges associated with the tracking mechanism is the very low resolution of the imagery, in which a 17 × 17 neighborhood around the shoulder joint captures the entire upper body of the individual; this is illustrated in Figure 1. The other challenge is the interlacing effects present in the video, which can affect the computation of a good region/joint description. We use a combination of HOG/LBP to describe each joint region, as these descriptors work well on low-resolution imagery. Optical flow is a good mechanism for tracking points in low-resolution imagery and is thus used along with the region descriptors and Kalman filtering for tracking.

As shown in Figure 6, using the manual points in the first frame, we initialize the joint descriptors and the associated Kalman trackers. These trackers then predict the corresponding joint locations in frame 2 and provide a search region in which the true location is present. In frame 2, we compute the optical flow, use the resulting estimate as the measurement in the correction stage of the tracker (if the optical flow estimate falls inside the search region), and obtain the posterior estimate of the joint location. Using this posterior estimate, the Kalman tracker predicts its position in frame 3, and the iteration continues. In Figures 7 and 8, we illustrate the tracked locations of four of the joints in successive frames of the sequence along with the provided manual locations. We also show the polynomial fit obtained from the manual locations; the tracking mechanism in fact captures more of the variation in location (more sinusoidal) than the polynomial fit does, and these variations can be used for applications such as gait identification.

In Table 1, we provide a statistical measure of how close the tracked joint locations are to the manually annotated locations for each sequence associated with a particular subject. This statistical metric15 compares the covariance matrices of the tracked points and the manual points of each sequence and is given by

d(K, Km) = √( Σ(i=1 to n) (log λi(K, Km))² )    (7)

where K is the covariance matrix of the tracked points, Km is the covariance matrix of the manual points, and λi is the i-th generalized eigenvalue, satisfying |λi Km − K| = 0. From experimental observation, we find that sequences whose tracked joint locations stay reasonably close to the manual joint locations have this measure below a threshold of 4, while sequences where the tracking mechanism fails have a measure above 7. We can see that for the shoulder, elbow, and hip joints the tracking mechanism succeeds for the majority of the sequences, whereas for the wrist joint it is partially successful. The knee and ankle joints, however, lose track after a few frames and would require input from an automatic joint detection algorithm to reinitialize the tracker.
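Equation (7) can be computed directly from the generalized eigenvalues, as sketched below (the function name is illustrative; natural logarithms are assumed, matching the Forstner-Moonen metric):

```python
import numpy as np

def covariance_distance(K, Km):
    """Forstner-Moonen metric of Equation (7): square root of the sum of
    squared log generalized eigenvalues of (K, Km)."""
    # the generalized eigenvalues solving |lambda*Km - K| = 0 are the
    # ordinary eigenvalues of Km^-1 K
    lam = np.linalg.eigvals(np.linalg.solve(Km, K)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))

# Toy example: tracked points with 4x the variance of the manual points
# along one axis give generalized eigenvalues {4, 1}, so d = |ln 4|.
K = np.diag([4.0, 1.0])
Km = np.diag([1.0, 1.0])
d = covariance_distance(K, Km)
```

The metric is zero when the two covariances coincide and grows symmetrically whether the tracked spread is larger or smaller than the manual one.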


                          Shoulder   Elbow   Wrist     Hip    Knee   Ankle
Sub 11 (with Coat/Vest)       0.93    1.54    2.51    2.48    7.45    5.00
Sub 11 (No Vest)              1.09    1.56    2.76    2.77    6.72    7.12
Sub 18 (No Vest)              9.31    9.10    7.58    8.19    5.33    6.59
Sub 21 (with Coat/Vest)       8.95    9.75    9.62    8.80    7.95    9.60
Sub 21 (No Vest)             10.94    9.65   14.85   12.01    9.30   11.60
Sub 22 (with Coat/Vest)       1.55    2.92    3.11    1.54    3.33    5.11
Sub 22 (No Vest)              2.10    1.57    2.38    3.48    5.31    3.87
Sub 23 (No Vest)              3.78    3.12    2.31    2.11    5.08    6.94
Sub 24 (with Coat/Vest)       9.97   12.04   13.04   10.60    8.75   10.80
Sub 24 (No Vest)              9.98   10.62   16.79   13.05    6.98    9.84
Sub 25 (with Coat/Vest)       8.99   10.12    9.57   12.01    9.24   12.38
Sub 25 (No Vest)             10.34    9.79   10.61   12.18    7.93    9.00
Sub 26 (with Coat/Vest)       1.51    1.73    2.06    2.01    5.62    9.21
Sub 26 (No Vest)              1.30    3.03    3.87    2.59    5.70    4.74
Sub 27 (with Coat/Vest)      10.52    9.95    8.88   10.70    9.09   11.24
Sub 27 (No Vest)             10.98   10.03    8.99    8.33    8.26    7.61
Sub 3 (with Coat/Vest)        9.84    9.11   10.87   10.92    9.42    9.15
Sub 3 (No Vest)               7.46    6.96    7.21    7.89    8.02    7.58
Sub 7 (with Coat/Vest)        0.39    2.49    2.45    2.26    5.09    6.64
Sub 8 (with Coat/Vest)        1.39    1.05    2.07    0.90    2.87    4.58
Sub 8 (No Vest)               2.02    1.23    1.91    1.30    4.98    5.56

Table 1: Closeness measure between tracked and manual joint locations for each video sequence.

    6. CONCLUSIONS AND FUTURE WORK

We have proposed a body joint tracking algorithm for low-resolution imagery of outdoor sequences. The algorithm combines primitive but effective point tracking techniques, namely optical flow and region-based matching using HOG/LBP, with the learning ability of the Kalman filter. Joints such as the shoulder, elbow, and hip are successfully tracked in most of the sequences, as is the wrist joint. However, the knee and ankle joints are not tracked well, owing to mismatching of the optical flow caused by the very low-resolution imagery and interlacing effects. An important addition we plan for future work is to exploit the fact that the movements of the body joints are related to each other. This crucial aspect is missing from the proposed algorithm, which assumes that each joint moves independently of the others; this stems from the use of an individual Kalman filter for each joint. By using a single Kalman filter to track a graph model connecting the various body joints, the accuracy of the joint tracks could increase substantially. Further improvements can also be made to the framework by including an automatic human pose estimation algorithm, which could provide a crude estimate of the joint locations at crucial time instants where the region/optical flow matching estimates fail.

    ACKNOWLEDGMENTS

This effort was funded by National Science Foundation grant No. 1240734. The authors would also like to thank the National Signature Program and the Air Force Institute of Technology for the use of their captured video data in this study.

    REFERENCES

[1] Ben Shitrit, H., Berclaz, J., Fleuret, F., and Fua, P., "Tracking multiple people under global appearance constraints," in [Computer Vision (ICCV), 2011 IEEE International Conference on], 137-144 (2011).

[2] Shao, J., Zhou, S., and Chellappa, R., "Tracking algorithm using background-foreground motion models and multiple cues," in [Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP '05). IEEE International Conference on], 2, 233-236 (2005).


[3] Lu, W.-L. and Little, J., "Simultaneous tracking and action recognition using the PCA-HOG descriptor," in [Computer and Robot Vision, 2006. The 3rd Canadian Conference on], 66 (2006).

[4] Kaaniche, M. and Bremond, F., "Tracking HOG descriptors for gesture recognition," in [Advanced Video and Signal Based Surveillance, 2009. AVSS '09. Sixth IEEE International Conference on], 140-145 (2009).

[5] Bilinski, P., Bremond, F., and Kaaniche, M. B., "Multiple object tracking with occlusions using HOG descriptors and multi resolution images," in [Crime Detection and Prevention (ICDP 2009), 3rd International Conference on], 1-6 (2009).

[6] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A., "Real-time human pose recognition in parts from single depth images," in [Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on], 1297-1304 (2011).

[7] Huang, C.-H., Boyer, E., and Ilic, S., "Robust human body shape and pose tracking," in [3DV-Conference, 2013 International Conference on], 287-294 (2013).

[8] Kohler, M., [Using the Kalman Filter to Track Human Interactive Motion: Modelling and Initialization of the Kalman Filter for Translational Motion], Forschungsberichte des Fachbereichs Informatik der Universität Dortmund, Dekanat Informatik, Univ. (1997).

[9] Ramakrishna, V., Kanade, T., and Sheikh, Y., "Tracking human pose by tracking symmetric parts," in [Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on], 3728-3735 (2013).

[10] Lucas, B. D. and Kanade, T., "An iterative image registration technique with an application to stereo vision," in [Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2], IJCAI '81, 674-679 (1981).

[11] Dalal, N. and Triggs, B., "Histograms of oriented gradients for human detection," in [Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on], 1, 886-893 (2005).

[12] Ojala, T., Pietikainen, M., and Maenpaa, T., "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(7), 971-987 (2002).

[13] Lacey, T., "Tutorial: The Kalman filter," Georgia Institute of Technology.

[14] Welch, G. and Bishop, G., "An introduction to the Kalman filter," (1995).

[15] Forstner, W. and Moonen, B., "A metric for covariance matrices," (1999).