Applied Soft Computing 47 (2016) 33–46. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/asoc

Robust least squares twin support vector machine for human activity recognition

Reshma Khemchandani*, Sweta Sharma
Department of Computer Science, Faculty of Mathematics and Computer Science, South Asian University, New Delhi, India

* Corresponding author. Tel.: +91 9871182779. E-mail addresses: [email protected] (R. Khemchandani), [email protected] (S. Sharma).

Article history: Received 5 August 2015; Received in revised form 13 April 2016; Accepted 16 May 2016; Available online 24 May 2016.

Keywords: Twin support vector machine; Least squares twin support vector machine; Multi-category classification; Binary tree based structure; Ternary decision tree; Activity recognition

Abstract: Human activity recognition is an active area of research in Computer Vision. One of the challenges of an activity recognition system is the presence of noise between related activity classes, along with the high training and testing time complexity of the system. In this paper, we address these problems by introducing a Robust Least Squares Twin Support Vector Machine (RLS-TWSVM) algorithm. RLS-TWSVM handles the heteroscedastic noise and outliers present in the activity recognition framework. Incremental RLS-TWSVM is proposed to speed up the training phase. Further, we introduce a hierarchical approach with RLS-TWSVM to deal with the multi-category activity recognition problem. Computational comparisons of our proposed approach on four well-known activity recognition datasets, along with real-world machine learning benchmark datasets, have been carried out. Experimental results show that our method is not only fast but also yields significantly better generalization performance and is robust in handling heteroscedastic noise and outliers.

http://dx.doi.org/10.1016/j.asoc.2016.05.025 © 2016 Elsevier B.V. All rights reserved.

1. Introduction

Human Activity Recognition is an active area of research in Computer Vision. It involves automatically detecting, tracking and recognizing human activities from the information acquired through sensors, e.g., a sequence of images captured by video cameras [1]. The research in this area is motivated by many applications in surveillance systems, video annotation and human computer interaction. A large amount of work has been done in this domain over the past decades, as the reviews [1–5] have summarized. Nevertheless, it is still an open and challenging problem. The difficulty of the problem lies in the endless vocabulary of activity classes, varying illumination, occlusion and intra-class differences. Besides this, high training and testing time complexity acts as a major challenge for the efficient implementation of an activity recognition system in large-sized application domains [2].

A human activity is generally represented as a combination of 3D (or 4D) spatio-temporal (depth) features which capture the shape and motion context of the actor(s) [3]. Recently, shape-based methods for feature representation along with the Support Vector Machine (SVM) have attracted much interest [6–10]. Accordingly, the features are extracted from all the training sequences to represent the motion and shape context of the actor, and then each feature descriptor is classified into one of the activity classes using SVM.

The Support Vector Machine (SVM), introduced originally by Vapnik and his co-workers [11], is a discriminative classifier based on the maximal margin property. It has been successfully applied to various pattern recognition applications, including activity recognition [6,8]. One of the main challenges for the classical SVM is its high training time. This drawback challenges the application of SVM to activity recognition systems, which generally have large scale datasets.

Recently, Jayadeva et al. [12] proposed the Twin Support Vector Machine (TWSVM), which has lower computational complexity and better generalization ability when compared with SVMs. TWSVM seeks two non-parallel proximal hyperplanes such that each hyperplane is closer to one of the two classes and is at least unit distance away from the samples of the other class. Kumar et al. [13] introduced the Least Squares Twin Support Vector Machine (LS-TWSVM), where the authors reduce the constraints of TWSVM in the least squares sense, which leads to the solution of the two modified primal QPPs via solving systems of linear equations. This leads to extremely fast and efficient performance. Mozafari et al. [9] introduced the application of LS-TWSVM in an activity recognition system with local space time features. Nasiri et al. [14] proposed the Energy-based least squares twin support vector machine (ELS-TWSVM) on the lines of LS-TWSVM for activity recognition, in which they changed the unit distance constraints



used in LS-TWSVM by some pre-defined energy parameter in order to reduce the effect of intrinsic noise and outliers of spatio-temporal features. However, this parameter has to be externally fixed, which sometimes leads to instability in the problem formulation.

It is evident from the real world that some activity classes (e.g., running and jogging) are more closely related to each other in comparison to other activity classes (e.g., waving). This leads to noise between similar activity classes. However, the idea of such relatedness can be used to deal with the activity recognition problem in a hierarchical manner, where the system is first trained to determine the group of an activity class and, once this group is determined, the system learns to determine the particular activity class. Mozafari et al. [10] introduced a binary tree based hierarchical approach using LS-TWSVM in the activity recognition framework, but the class hierarchy was determined externally and was applied only to the activity classes of walking and jogging.

Taking motivation from the aforementioned ideas, we propose an LS-TWSVM based classifier called RLS-TWSVM, which is more robust to the effect of noise as in the cases discussed above. Further, we propose to determine the activity class hierarchy from the training data itself using a suitable partitioning strategy. We use a binary tree based structure [15] and a ternary decision structure [16] to handle the activity class hierarchy. To deal with the huge size of the training data (mainly because of the large vocabulary of actions and the extensive training required to learn the ways in which different actors can perform the same action differently), we introduce an incremental version of RLS-TWSVM. The incremental version considers only the relevant information which contributes toward improving the performance of the classifier, while discarding irrelevant information.

The contributions of this paper are summed up as follows:

- To deal with the heteroscedastic noise present in the activity recognition framework, this paper proposes a robust and efficient classifier based on LS-TWSVM called the Robust Least Squares Twin Support Vector Machine (RLS-TWSVM), where the parameters are introduced in the optimization problem itself. Further, the solution of the proposed framework is obtained via solving systems of linear equations. For non-linear RLS-TWSVM, we have used the Sherman–Morrison–Woodbury (SMW) formula and the partition method [17] to further reduce the training time for learning the classifier.
- Introduce the incremental version of RLS-TWSVM in the activity recognition framework to deal with computationally intensive activity recognition datasets.
- Introduce the hierarchical framework to deal with the multi-class classification problem of the activity recognition system, in which the hierarchy of classes is determined using the training data and the RLS-TWSVM classifier.

We have used the leave-one-out cross-validation technique for our classification, with a variety of protocols like Leave-One-Actor-Out (L1AO), Leave-One-Sequence-Out (L1SO) and Leave-One-Actor-Action-Out (L1AAO), to label the activities. We show that our method performs well in labelling activities at a much lower computational cost in comparison to other machine learning approaches. Further, computational comparisons of our proposed framework along with other approaches have been reported on various UCI machine learning datasets using a five-fold cross-validation methodology.

This paper is organized as follows. Section 2 gives a brief overview of the related work and the background needed for the proposed work. Section 3 presents the proposed human activity recognition framework. Section 4 presents the experimental results. Section 5 summarizes our contributions along with their limitations and future work.


2. Preliminaries

A human activity recognition problem is generally approached in two phases: action representation, followed by classification and labelling of activities. We begin by presenting a brief review of the related action representation methods, followed by a review of the classification process.

2.1. A brief review of action representation in human activity recognition

In the human activity recognition framework, action representation refers to the process of extracting from a video those features that represent the characteristics of the human motion involved in an activity. Features used to represent a human activity in a video sequence are space time features (3D) and space time-depth features (4D). A comprehensive review of action representation in the human activity recognition framework can be found in [2,3,5]. With the development of recent depth sensor technology (e.g., Microsoft Kinect [18]), depth-based action representation has gained popularity [5,19–21]. However, the depth dimension adds to the computational complexity of the feature extraction process. Therefore, space time features are still popular due to their ease of representation and reduced complexity. Space time feature representations are generally divided into two categories, namely local representations and global representations.

Local representations describe the observation as a collection of independent patches. The local representations are obtained in a bottom-up fashion, where spatio-temporal interest points are detected first and local patches are calculated around these points. Local representations are less sensitive to noise and partial occlusion and do not strictly require background subtraction or tracking [5]. Trajectory-based features [2,22,23] are popular examples of local feature representation.

Global representations, on the other hand, are obtained in a top-down fashion, in which a person is localized first in the image using background subtraction or tracking or a combination of both. Further, the Region of Interest (ROI) is encoded into the feature descriptor. This representation is powerful as it encodes most of the information residing in each frame of the video sequence. Global action representation techniques rely on accurate localization and tracking of the actor. Hence, when the domain allows for good control over these factors, they usually perform well [5].

Since the goal of our classifier is to reduce the effect of noise during the labelling phase, we have used a global representation technique for feature extraction, as global representations are more sensitive to noise when compared to local action representation techniques. In the following part, we limit our discussion to the relevant representation techniques only.

One of the pioneering works among the several global feature representation techniques is that of Efros et al. [24], where the authors used optic flow to capture the motion information inherent in a video sequence. Working on the lines of Efros et al. [24], Danafar and Gheissari [25] developed the idea of a grid based representation, in which the region of interest is divided into horizontal slices that approximately contain the head, body and legs. Similarly, Ali and Shah [26] derived a number of kinematic features from the optic flow. These include divergence, vorticity, symmetry and gradient tensor features. Principal component analysis (PCA) is then applied to determine dominant kinematic modes. On similar lines, Ikizler et al. [27] combined the work of Efros et al. [24] with histograms of oriented line segments. However, the features obtained in [24–26] are solely motion based, which could result in an inadequate interpretation of activity information.

Using a shape-based approach, Zhang et al. [28] derived the shape context from each video frame, where each log-polar bin corresponds to a histogram of motion word frequencies. The limitation of this approach could be that considering only the shape information of the actor can lead to an under-representation of the time context, which would be considered a drawback of shape based representation approaches.

To overcome these limitations, Tran et al. [29] used a combination of silhouette and optic flow for action representation, which they termed the motion-context descriptor. In the motion-context descriptor, the silhouette gives the shape context of the actor(s) and the optic flow represents the time context. In a similar work, Lin et al. [30] combined the idea of localization of interest regions with motion flow fields to extract a 3D shape-motion descriptor, which is further used for matching with a prototype tree. In other state-of-the-art methods, Baumann et al. [31] used the combination of Volume Local Binary Patterns (VLBP) and optic flow to represent the actions. Volume Local Binary Pattern features are used to describe object characteristics in the spatio-temporal domain. Recently, Melfi et al. [6] proposed a Bag of Words (BoW) modelling based approach for human activity recognition, characterizing a spatio-temporal descriptor based on a combination of 3D gradient and textural features. A textural appearance based background subtraction technique was used in [7] with the aim of finding a robust localization of the actor.

In our work, we have used the global representation discussed in Tran et al. [29] for action representation, as these motion-context features are easy to obtain and are discriminative [29], which makes them appropriate for machine learning frameworks like SVMs.

2.2. Related work in the activity labelling process

While performing the labelling task, various methods have been used for classifying human actions, including k-Nearest Neighbour (k-NN) [30,35,36], support vector machines (SVM) [8–10,14,32], boosting-based classifiers [33] and Hidden Markov Models [34]. Various surveys [2–4] suggest that most of the recent research in activity recognition has witnessed the use of support vector machines as a powerful paradigm for activity classification and labelling [8,39–41].

In recent work on activity labelling, the authors in [35,36] used k-NN classifiers. For action representation, Toutati et al. [35] used the idea of the generation and fusion of a set of prototypes generated from different viewpoints of the data from the video sequence. On the other hand, Vishwakarma et al. [36] used a k-NN based classifier on features obtained as a combination of the spatial edge distribution of gradients and the orientation of human silhouettes. Cheema et al. [37] proposed a hierarchical approach using asymmetrical bilinear modelling on the tensorial representation of the action videos to characterize styles of performing different actions. Wang et al. [38] used probabilistic graphical models for activity labelling using Locality-constrained Linear Coding features.

The advantage of using SVMs over the aforementioned approaches is that SVMs are comparatively simpler and computationally efficient. However, an activity recognition problem deals with large samples and also involves a lot of noise due to inter-related activity classes, which proves challenging for SVM, which itself suffers from drawbacks such as high computational time, sensitivity to noise and outliers, and unbalanced datasets.

Various modifications have been carried out on SVM (e.g., LS-SVM [42], TWSVM [12], LS-TWSVM [13]) to overcome the above-mentioned drawbacks. In the following part, we discuss some LS-TWSVM based classifiers which form the base of our proposed work.


2.2.1. Least squares twin support vector machine based classifier

Consider a data set D in which m1 data points belonging to class +1 are represented by matrix A, while m2 data points belonging to class −1 are represented by matrix B. Therefore, the sizes of matrices A and B are (m1 × n) and (m2 × n), respectively, where n is the dimension of the feature space.

Working on the lines of the least squares support vector machine (LS-SVM) [42] and TWSVM [12], Kumar et al. [13] proposed a least squares version of TWSVM called the Least Squares Twin Support Vector Machine (LS-TWSVM). Similar to TWSVM, LS-TWSVM [13] seeks a pair of hyperplanes given as follows:

$$w_1^T x + b_1 = 0 \quad \text{and} \quad w_2^T x + b_2 = 0 \tag{1}$$

The objective was to reduce the constraints of TWSVM in the least squares sense by modifying the inequality constraints into equality constraints. This transforms the QPP formulation used in the case of TWSVM into a system of linear equations. The primal problems of LS-TWSVM are given by

$$\text{(LS-TWSVM 1)}\quad \min_{w_1,\,b_1,\,y_2}\ \frac{1}{2}\lVert Aw_1 + e_1 b_1\rVert^2 + \frac{c_1}{2}\, y_2^T y_2 \quad \text{subject to}\quad -(Bw_1 + e_2 b_1) + y_2 = e_2, \tag{2}$$

$$\text{(LS-TWSVM 2)}\quad \min_{w_2,\,b_2,\,y_1}\ \frac{1}{2}\lVert Bw_2 + e_2 b_2\rVert^2 + \frac{c_2}{2}\, y_1^T y_1 \quad \text{subject to}\quad (Aw_2 + e_1 b_2) + y_1 = e_1, \tag{3}$$

where c1, c2 > 0; e1 and e2 are vectors of ones of appropriate dimensions; and y1 and y2 are error variables.

The constraints of LS-TWSVM require the hyperplane to be at a distance of exactly one from the points of the other class, which may make LS-TWSVM sensitive to outliers [43].
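Since both primal problems reduce to systems of linear equations, the training step can be sketched in a few lines. The closed-form expressions in the comments are obtained by substituting the equality constraints of (2) and (3) into the objectives and setting the gradients to zero; the function names are ours, and NumPy is assumed.

```python
import numpy as np

def lstwsvm_train(A, B, c1, c2):
    """Linear LS-TWSVM: solve the two linear systems obtained by
    eliminating y2 and y1 from problems (2) and (3).
    A: (m1, n) samples of class +1; B: (m2, n) samples of class -1."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])  # H = [A e1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])  # G = [B e2]
    e1, e2 = np.ones(A.shape[0]), np.ones(B.shape[0])
    # From (2): minimize 0.5*||H z1||^2 + (c1/2)*||G z1 + e2||^2
    #   =>  z1 = -(G^T G + (1/c1) H^T H)^{-1} G^T e2
    z1 = -np.linalg.solve(G.T @ G + (1.0 / c1) * (H.T @ H), G.T @ e2)
    # From (3): minimize 0.5*||G z2||^2 + (c2/2)*||e1 - H z2||^2
    #   =>  z2 = (H^T H + (1/c2) G^T G)^{-1} H^T e1
    z2 = np.linalg.solve(H.T @ H + (1.0 / c2) * (G.T @ G), H.T @ e1)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])   # (w1, b1), (w2, b2)

def lstwsvm_predict(x, planes):
    """Assign x to the class of the nearer proximal hyperplane."""
    (w1, b1), (w2, b2) = planes
    return 1 if abs(x @ w1 + b1) <= abs(x @ w2 + b2) else -1
```

Each system is of size (n + 1) × (n + 1), which is what makes LS-TWSVM training fast compared with solving the QPPs of TWSVM.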

Working on the lines of LS-TWSVM, Nasiri et al. [14] proposed the Energy-based least squares twin support vector machine (ELS-TWSVM) for activity recognition, in which they replaced the minimum unit distance measure used in LS-TWSVM by a pre-defined energy parameter in order to reduce the effect of the intrinsic noise and outliers of spatio-temporal features.

The primal problems of ELS-TWSVM are given below:

$$\text{(ELS-TWSVM 1)}\quad \min_{w_1,\,b_1,\,y_2}\ \frac{1}{2}\lVert Aw_1 + e_1 b_1\rVert^2 + \frac{c_1}{2}\, y_2^T y_2 \quad \text{subject to}\quad -(Bw_1 + e_2 b_1) + y_2 = E_1, \tag{4}$$

$$\text{(ELS-TWSVM 2)}\quad \min_{w_2,\,b_2,\,y_1}\ \frac{1}{2}\lVert Bw_2 + e_2 b_2\rVert^2 + \frac{c_2}{2}\, y_1^T y_1 \quad \text{subject to}\quad (Aw_2 + e_1 b_2) + y_1 = E_2, \tag{5}$$

where E1 and E2 are user-defined energy parameters.

On similar lines to LS-TWSVM, the solutions of QPPs (4) and (5) are obtained as follows:

$$[w_1\ b_1]^T = -\left[c_1 G^T G + H^T H\right]^{-1}\left[c_1 G^T E_1\right] \tag{6}$$

$$[w_2\ b_2]^T = \left[c_2 H^T H + G^T G\right]^{-1}\left[c_2 H^T E_2\right] \tag{7}$$

where H = [A e1] and G = [B e2].

A new data point x ∈ R^n is assigned to class i (i = +1 or −1) using the following decision function:

$$f(x) = \begin{cases} +1, & \text{if } \dfrac{\lvert x^T w_1 + b_1\rvert}{\lvert x^T w_2 + b_2\rvert} \le 1,\\[1ex] -1, & \text{otherwise.} \end{cases}$$
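The closed-form solutions (6) and (7), together with the ratio-based decision rule, can be sketched as follows. This is an illustrative helper with names of our choosing; setting E1 and E2 to vectors of ones recovers the LS-TWSVM case.

```python
import numpy as np

def elstwsvm_planes(A, B, c1, c2, E1, E2):
    """ELS-TWSVM planes from the closed forms (6)-(7).
    E1 (length m2) and E2 (length m1) are the user-defined energy
    vectors; E1 = e2 and E2 = e1 recover the LS-TWSVM solution."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])  # H = [A e1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])  # G = [B e2]
    z1 = -np.linalg.solve(c1 * (G.T @ G) + H.T @ H, c1 * (G.T @ E1))  # eq. (6)
    z2 = np.linalg.solve(c2 * (H.T @ H) + G.T @ G, c2 * (H.T @ E2))   # eq. (7)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def decide(x, planes):
    """Decision function: +1 iff |x.w1 + b1| / |x.w2 + b2| <= 1."""
    (w1, b1), (w2, b2) = planes
    return 1 if abs(x @ w1 + b1) <= abs(x @ w2 + b2) else -1
```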

ELS-TWSVM takes advantage of the prior knowledge available in the human activity recognition problem about uncertainty and intra-class variations, and thus improves the performance of activity recognition to some degree [14]. However, the drawback is


that the energy parameter has to be externally fixed, which sometimes leads to instability in the problem formulation and affects the overall prediction accuracy of the system.

2.3. Multi-category classification approaches

SVMs are primarily binary classifiers, but they have been successfully extended to multi-class classification problems. Some of these multi-class classification approaches have been successfully extended to TWSVM and LS-TWSVM as well. Among the various multi-class classification approaches, the most popular are One-Against-All (OAA) [44], One-Against-One (OAO) [45], Half-Against-Half (HAH) [46] and the Ternary-Decision-Structure (TDS) [16].

2.3.1. One-against-all method

One-Against-All (OAA) [44] is one of the most widely used approaches in activity recognition systems. OAA based TWSVM classifiers find K classifiers for a K-class problem, where each class has its TWSVM hyperplane passing through its samples. This approach, however, suffers from the limitations of high training time and the class imbalance problem.

2.3.2. One-against-one method

For a K-class classification problem, the one-against-one (OAO) method [45] needs K(K − 1)/2 classifiers, each of which is trained on the samples of one pair of classes. The decision strategy uses a voting rule, i.e., the unlabelled test sample is assigned to the class with the largest vote.

2.3.3. Binary tree based structure

The binary tree based structure (BTS) is based on the idea of HAH. In the binary tree based approach, we recursively divide the data into two halves and create a binary tree of classifiers, as proposed by Shao et al. [15]. The two halves are denoted by the two nodes of the tree, represented by −1 and +1, and a corresponding TWSVM classifier is obtained that classifies the samples at the right and left nodes. A balanced binary tree based approach using TWSVM determines (K − 1) TWSVM classifiers for a K-class problem. For testing, the binary tree based approach requires at most ⌈log₂ K⌉ TWSVM evaluations.

2.3.4. Ternary decision tree based structure

The Ternary Decision Structure (TDS) is a recently proposed technique by Khemchandani et al. [16], based on a One-Versus-One-Versus-Rest approach. During the training phase, the TDS approach recursively divides the training data into three groups by applying k-means (k = 2) clustering and creates a ternary decision structure of TWSVM classifiers. It creates a decision structure of height ⌈log₃ K⌉. Thus, TDS is an improvement over the OAA multiclass approach with respect to the training time of the classifier.

The TDS approach is more efficient than OAA and the binary tree based approach considering the time required to build the multi-category classifier. Also, a test sample can be classified with ⌈log₃ K⌉ comparisons, which is more efficient than the OAA and binary tree testing times.
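The classifier counts and testing depths quoted above can be illustrated with a short helper. This is a hypothetical sketch; the number of classifiers trained by TDS is not stated in this excerpt, so only tree depths are computed for it.

```python
def tree_depth(K, b):
    """Smallest d with b**d >= K, i.e. ceil(log_b K): worst-case number
    of classifier evaluations to reach a leaf of a balanced b-ary tree."""
    d, n = 0, 1
    while n < K:
        n, d = n * b, d + 1
    return d

def classifiers_trained(K):
    """Binary sub-problems trained by each multi-class scheme above."""
    return {"OAA": K,                  # one classifier per class
            "OAO": K * (K - 1) // 2,   # one per pair of classes
            "BTS": K - 1}              # balanced binary tree of classifiers
```

For example, for K = 8 activity classes, BTS trains 7 classifiers and tests with at most 3 evaluations, while TDS needs at most tree_depth(8, 3) = 2 comparisons.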

3. Human activity recognition framework

In this paper, we consider a model in which the learner is trained on a sequence $(V_1, y_1), (V_2, y_2), \ldots, (V_n, y_n)$ of labelled videos. Video $V_i$ contains $T_i$ frames $v_1^{(i)}, \ldots, v_j^{(i)}, \ldots, v_{T_i}^{(i)}$, where each frame $v_j^{(i)} \in \mathbb{R}^D$ is represented by D features (see Section 3.1 for a description of the features used in this work). The video is annotated with a label $y_i$ denoting an action from a given set $Y = \{1, \ldots, C\}$ of possible activities. The learner's task is to build a classifier, mapping each new video to its correct label.

Fig. 1. An overview of the proposed activity recognition framework.

Here, we discuss the activity recognition framework introduced in this paper, which comprises feature extraction and classification of activities; an overview flowchart is given in Fig. 1.

3.1. Feature descriptor

Given the frames $v_1^{(i)}, \ldots, v_j^{(i)}, \ldots, v_{T_i}^{(i)}$ of video $V_i$, we have used the global feature descriptor proposed by Tran et al. [29] for representing the activity sequence. The authors have shown that their global feature descriptor is efficient and robust in representing human activity sequences, outperforms more complex features, and achieves state-of-the-art performance [29].

Each frame of the video sequence is normalized to a (120 × 120) box while maintaining the aspect ratio. The normalized bounding box is used to compute the silhouette of each frame using a background subtraction method. The frames $v_1^{(i)}, \ldots, v_j^{(i)}, \ldots, v_{T_i}^{(i)}$ are further used to obtain optic flow values using the pyramidal implementation of the Lucas–Kanade algorithm [47]. The optic flow values are then split into horizontal and vertical channels. This gives us optic flow values in two directions, Fx and Fy. These values are smoothed using a median filter to reduce the noise. The silhouette gives us the third set of values, representing the bounding coordinates of the binary image in four directions, namely left, right, top and bottom. Each of these channels is histogrammed using the following technique: divide the normalized bounding box into 2 × 2 sub-windows, and then divide each sub-window into 18 pie slices covering 20° each. The pie slices do not overlap, and the center of the pie is at the center of the sub-window. The values

of each channel are integrated over the domain of every slice. This results in a 72 (2 × 2 × 18)-dimensional histogram. By concatenating the histograms of all 3 channels, we get a 216-dimensional frame descriptor [29].
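The histogramming step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the angle origin of the pie slices, and treating the third (silhouette-derived) channel as a per-pixel map are our assumptions.

```python
import numpy as np

def pie_histogram(channel):
    """Histogram one (120 x 120) channel as described above: split into
    2 x 2 sub-windows; in each, integrate the channel values over 18
    non-overlapping 20-degree pie slices centred at the sub-window
    centre. Yields 4 * 18 = 72 values per channel."""
    assert channel.shape == (120, 120)
    feats = []
    for bi in range(2):
        for bj in range(2):
            sub = channel[bi * 60:(bi + 1) * 60, bj * 60:(bj + 1) * 60]
            ys, xs = np.mgrid[0:60, 0:60]
            ang = np.arctan2(ys - 29.5, xs - 29.5)            # angle about centre
            idx = ((ang + np.pi) / (2 * np.pi) * 18).astype(int) % 18
            feats.append(np.bincount(idx.ravel(),
                                     weights=sub.ravel(), minlength=18))
    return np.concatenate(feats)                              # 72-dimensional

def frame_descriptor(Fx, Fy, silhouette):
    """Concatenate the three channel histograms -> 216-dim descriptor."""
    return np.concatenate([pie_histogram(c) for c in (Fx, Fy, silhouette)])
```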

3.1.1. Motion context

The feature descriptor obtained above gives a very rich representation by incorporating static and dynamic features of each frame of the video sequence [29]. Following Tran et al. [29], we also use the 15 frames around a given frame $v_j^{(i)}$ and split them into 3 blocks of 5 frames, named the past, current and future blocks. We choose 5-frame blocks because a triple of them spans one second of video (at 15 fps). The frame descriptors of a block are stacked together as a (5 × 216 =) 1080-dimensional vector.

The 1080-dimensional block vectors are then projected onto a 70-dimensional vector using Principal Component Analysis [48], where 50, 10 and 10 dimensions come from the current, past and future blocks, respectively. These values were chosen via experiments carried out using different combinations of values. The resulting 70-dimensional motion context descriptor is finally appended to the current frame's 216-dimensional descriptor to form the final 286-dimensional motion context descriptor [29].
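The stacking and projection step can be sketched as below; the P_past, P_cur and P_fut projection matrices are assumed to be PCA bases fitted offline on training data (these names and the offline fitting are our assumptions).

```python
import numpy as np

def motion_context(frame_descs, t, P_past, P_cur, P_fut):
    """286-dim motion-context descriptor for frame t (sketch).
    frame_descs: list of 216-dim frame descriptors. The 15 frames
    t-7 .. t+7 form three 5-frame blocks; each block is stacked into a
    (5 x 216 =) 1080-dim vector and projected with a pre-fitted PCA
    matrix P_* of shape (1080, k), k = 10, 50, 10 respectively; the
    resulting 70 values are appended to the current 216-dim descriptor."""
    past = np.concatenate(frame_descs[t - 7:t - 2])      # frames t-7 .. t-3
    cur = np.concatenate(frame_descs[t - 2:t + 3])       # frames t-2 .. t+2
    fut = np.concatenate(frame_descs[t + 3:t + 8])       # frames t+3 .. t+7
    proj = np.concatenate([past @ P_past, cur @ P_cur, fut @ P_fut])  # 70 dims
    return np.concatenate([frame_descs[t], proj])        # 216 + 70 = 286
```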

3.2. Robust least squares twin support vector machine for human activity recognition

Human activities may involve some very closely related activities, like jumping and skipping, or jogging and running. Such activity classes involve noise due to patterns of other closely related classes. Moreover, the classes of actions often involve some outliers. This problem acts as our motivation to find a robust and efficient classifier which provides the advantage of low computational complexity and is more flexible to noise and outliers.

To obtain a robust model, we introduce a pair of parameters and variables �1 and �2 in order to control noise and outliers.urther the optimal values of �1 and �2 are obtained as a part ofptimization problem. On the lines of Shao et al. [49], we intro-uce an extra regularization term 1/2wT w + b2 with the idea of

mplementing structural risk minimization and also to address theroblem of ill-posed solution encountered by ELS-TWSVM. Theegularization term ensures the positive definiteness of the opti-ization problem involved. We termed this improved LS-TWSVM

s the Robust Least Squares Twin Support Vector Machine (RLS-WSVM). In RLS-TWSVM the patterns of respective classes arelustered around the following hyperplanes.

T1x + b1 = �1 and wT

2x + b2 = �2 (8)

3.2.1. Linear RLS-TWSVM
For the primal problem of RLS-TWSVM, we modify the primal problem of LS-TWSVM as follows:

(RLS-TWSVM 1)
\[
\min_{w_1, b_1, \epsilon_1, \xi} \; \frac{1}{2}\|A w_1 + e_1 b_1\|^2 + \frac{\nu}{2}\epsilon_1^2 + \frac{c_1}{2}(w_1^T w_1 + b_1^2) + \frac{c_2}{2}\xi^T \xi
\quad \text{subject to} \quad -(B w_1 + e_2 b_1) = e_2(1 - \epsilon_1) - \xi, \tag{9}
\]
where c_1, c_2 > 0, e_1 and e_2 are vectors of ones of appropriate dimensions, and ξ is the error variable.

Here, the parameter (1 − ε1) replaces E_1 of ELS-TWSVM; further, ν > 0 controls the bound on the fraction of support vectors, and hence its optimal value ensures that the effect of outliers is taken care of in the activity recognition framework. ELS-TWSVM can thus be thought of as a special case of RLS-TWSVM in which the energy parameter is fixed to the optimal value of (1 − ε1).


Substituting the value of ξ from the equality constraint into the objective function leads to the following unconstrained optimization problem:
\[
\min_{w_1, b_1, \epsilon_1} \; \frac{1}{2}\|A w_1 + e_1 b_1\|^2 + \frac{\nu}{2}\epsilon_1^2 + \frac{c_1}{2}(w_1^T w_1 + b_1^2) + \frac{c_2}{2}\|(B w_1 + e_2 b_1) + e_2(1 - \epsilon_1)\|^2. \tag{10}
\]
Defining z_1 = [w_1; b_1], H = [A \; e_1] and G = [B \; e_2], the above problem becomes
\[
\min_{z_1, \epsilon_1} \; \frac{1}{2} z_1^T H^T H z_1 + \frac{\nu}{2}\epsilon_1^2 + \frac{c_1}{2} z_1^T z_1 + \frac{c_2}{2}\|G z_1 + e_2(1 - \epsilon_1)\|^2. \tag{11}
\]

Setting the gradients of (11) with respect to z_1 and ε_1 to zero gives
\[
H^T H z_1 + c_1 z_1 + c_2 G^T (G z_1 + e_2(1 - \epsilon_1)) = 0, \tag{12}
\]
\[
\nu \epsilon_1 - c_2 e_2^T (G z_1 + e_2(1 - \epsilon_1)) = 0. \tag{13}
\]

Rearranging these equations in matrix form gives
\[
\begin{bmatrix} H^T H + c_1 I + c_2 G^T G & -c_2 G^T e_2 \\ -c_2 e_2^T G & \nu + c_2 e_2^T e_2 \end{bmatrix}
\begin{bmatrix} z_1 \\ \epsilon_1 \end{bmatrix}
=
\begin{bmatrix} -c_2 G^T e_2 \\ c_2 e_2^T e_2 \end{bmatrix},
\]
which leads to the solution
\[
\begin{bmatrix} z_1 \\ \epsilon_1 \end{bmatrix}
=
\begin{bmatrix} H^T H + c_1 I + c_2 G^T G & -c_2 G^T e_2 \\ -c_2 e_2^T G & \nu + c_2 e_2^T e_2 \end{bmatrix}^{-1}
\begin{bmatrix} -c_2 G^T e_2 \\ c_2 e_2^T e_2 \end{bmatrix}, \tag{14}
\]
where I is an identity matrix of appropriate dimensions.

Similarly, the QPP for obtaining the second decision hyperplane w_2^T x + b_2 = ε_2 is given by (15):

(RLS-TWSVM 2)
\[
\min_{w_2, b_2, \epsilon_2, \xi} \; \frac{1}{2}\|B w_2 + e_2 b_2\|^2 + \frac{\nu}{2}\epsilon_2^2 + \frac{c_3}{2}(w_2^T w_2 + b_2^2) + \frac{c_4}{2}\xi^T \xi
\quad \text{subject to} \quad (A w_2 + e_1 b_2) = e_1(1 - \epsilon_2) - \xi, \tag{15}
\]
where c_3, c_4 > 0, e_1 and e_2 are vectors of ones of appropriate dimensions, and ξ is the error variable. The corresponding solution of QPP (15) is
\[
\begin{bmatrix} z_2 \\ \epsilon_2 \end{bmatrix}
=
\begin{bmatrix} G^T G + c_3 I + c_4 H^T H & c_4 H^T e_1 \\ c_4 e_1^T H & \nu + c_4 e_1^T e_1 \end{bmatrix}^{-1}
\begin{bmatrix} c_4 H^T e_1 \\ c_4 e_1^T e_1 \end{bmatrix}, \tag{16}
\]
where z_2 = [w_2; b_2] and I is an identity matrix of appropriate dimension.

A new point x ∈ R^n is assigned to class i (i = 1 or 2) depending on which of the two aforementioned hyperplanes it lies closer to, i.e.
\[
\text{Class}(i) = \arg\min_{i=1,2} \; \frac{|x^T w_i + b_i|}{\|w_i\|}, \tag{17}
\]
where |x^T w_i + b_i| / \|w_i\| is the perpendicular distance of the point x from the plane x^T w_i + b_i = 0, i = 1, 2.
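Training the linear classifier therefore reduces to solving the two small linear systems (14) and (16) and applying the distance rule (17). The following is a sketch under the reconstruction above (helper names are ours, and swapping the class roles is our shortcut for problem (15): it yields the second plane up to a sign flip of (w, b), which does not change the distances in (17)):

```python
import numpy as np

def solve_rls_plane(H, G, c_reg, c_err, nu):
    """Solve system (14): H = [A e] holds the class the plane should
    pass near, G = [B e] the other class.  Returns (w, b, epsilon)."""
    n = H.shape[1]
    e = np.ones((G.shape[0], 1))
    top = np.hstack([H.T @ H + c_reg * np.eye(n) + c_err * G.T @ G,
                     -c_err * G.T @ e])
    bot = np.hstack([-c_err * e.T @ G, [[nu + c_err * len(e)]]])
    rhs = np.vstack([-c_err * G.T @ e, [[c_err * len(e)]]])
    sol = np.linalg.solve(np.vstack([top, bot]), rhs).ravel()
    return sol[:-2], sol[-2], sol[-1]

def rls_twsvm_fit(A, B, c1=0.1, c2=1.0, nu=0.5):
    """Fit both planes and return a predictor using rule (17)."""
    HA = np.hstack([A, np.ones((len(A), 1))])
    HB = np.hstack([B, np.ones((len(B), 1))])
    w1, b1, _ = solve_rls_plane(HA, HB, c1, c2, nu)
    w2, b2, _ = solve_rls_plane(HB, HA, c1, c2, nu)  # classes swapped
    def predict(X):
        d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
        d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
        return np.where(d1 <= d2, 1, 2)
    return predict
```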

To illustrate the effect of ν on LS-TWSVM, the artificially generated Ripley's synthetic dataset [50] has been used. It is a two-dimensional dataset comprising 250 patterns. Figs. 2 and 3 show the geometric interpretations of RLS-TWSVM and LS-TWSVM and confirm the difference between them. The positive and negative samples are denoted by '*' and 'o', respectively, and the error samples are marked by square boxes.

Fig. 2. Geometric interpretation of linear RLS-TWSVM on synthetic dataset.


Fig. 3. Geometric interpretation of linear LS-TWSVM on synthetic dataset.

3.2.2. Non-linear RLS-TWSVM
We extend the linear RLS-TWSVM to the non-linear RLS-TWSVM by considering the following kernel-generated surfaces:
\[
K(x^T, C^T) u_1 + b_1 = \epsilon_1 \quad \text{and} \quad K(x^T, C^T) u_2 + b_2 = \epsilon_2, \tag{18}
\]
where C = [A; B] and K is an arbitrary kernel function. The primal QPP for the first surface is modified, in analogy with the linear QPP (9), as
\[
\min_{u_1, b_1, \epsilon_1, \xi} \; \frac{1}{2}\|K(A, C^T) u_1 + e_1 b_1\|^2 + \frac{\nu}{2}\epsilon_1^2 + \frac{c_1}{2}(u_1^T u_1 + b_1^2) + \frac{c_2}{2}\xi^T \xi
\quad \text{subject to} \quad -(K(B, C^T) u_1 + e_2 b_1) = e_2(1 - \epsilon_1) - \xi, \tag{19}
\]
where c_1, c_2 > 0, e_1 and e_2 are vectors of ones of appropriate dimensions, and ξ is the error variable.

Similarly, the second kernel surface is obtained by solving the following optimization problem:
\[
\min_{u_2, b_2, \epsilon_2, \xi} \; \frac{1}{2}\|K(B, C^T) u_2 + e_2 b_2\|^2 + \frac{\nu}{2}\epsilon_2^2 + \frac{c_3}{2}(u_2^T u_2 + b_2^2) + \frac{c_4}{2}\xi^T \xi
\quad \text{subject to} \quad (K(A, C^T) u_2 + e_1 b_2) = e_1(1 - \epsilon_2) - \xi, \tag{20}
\]
where c_3, c_4 > 0, e_1 and e_2 are vectors of ones of appropriate dimensions, and ξ is the error variable.


By defining N = [K(A, C^T) \; e_1] and M = [K(B, C^T) \; e_2], the solutions of the QPPs (19) and (20) can be derived as (21) and (22), respectively:
\[
\begin{bmatrix} \zeta_1 \\ \epsilon_1 \end{bmatrix}
=
\begin{bmatrix} N^T N + c_1 I + c_2 M^T M & -c_2 M^T e_2 \\ -c_2 e_2^T M & \nu + c_2 e_2^T e_2 \end{bmatrix}^{-1}
\begin{bmatrix} -c_2 M^T e_2 \\ c_2 e_2^T e_2 \end{bmatrix}, \tag{21}
\]
where ζ_1 = [u_1; b_1], and
\[
\begin{bmatrix} \zeta_2 \\ \epsilon_2 \end{bmatrix}
=
\begin{bmatrix} M^T M + c_3 I + c_4 N^T N & c_4 N^T e_1 \\ c_4 e_1^T N & \nu + c_4 e_1^T e_1 \end{bmatrix}^{-1}
\begin{bmatrix} c_4 N^T e_1 \\ c_4 e_1^T e_1 \end{bmatrix}, \tag{22}
\]
where ζ_2 = [u_2; b_2].

The decision criterion for the non-linear RLS-TWSVM is similar to that of the linear RLS-TWSVM.

Further, in order to obtain the inverses of the matrices involved in (14), (16), (21) and (22), we propose to use the Sherman–Morrison–Woodbury (SMW) formula and the partition method [17], which are illustrated in Appendix 1.

The linear RLS-TWSVM obtains its classifier by inverting a matrix of order (n + 2), where n is the dimension of the input space. It can further be noted that the solution of the non-linear RLS-TWSVM via Eqs. (21) and (22) requires the inversion of a matrix of order (m + 1) × (m + 1) twice, which would lead to a high computational cost. However, this cost can be reduced by using the partition method [17] and the SMW formula [17], which involve solving two inverse problems of dimensions m1 × m1 and m2 × m2, where m1 < m and m2 < m, respectively.

3.2.3. Incremental RLS-TWSVM
An activity recognition system is a complex problem in terms of storage and computational power, due to the large number of possible activities an actor can perform and hence the large number of activity classes. An intensive training stage is also required for good performance. The limitations of existing methods in handling large datasets under limited-time scenarios motivated us towards Incremental RLS-TWSVM, which improves the speed of the training phase of the system. It skips the correctly classified frames, which improves the training speed without affecting the generalization ability. Algorithm 1 gives the incremental RLS-TWSVM procedure in the human activity recognition framework.

Consider a dataset D divided into t chunks, such that each chunk is represented by a matrix of dimension m_t × n with the label information stored in an m_t × 1 vector. We generate an incremental linear classifier by retiring part of the old data, represented by the sub-matrix E0 ∈ R^{m_t × n}, and retaining only those data points E1 ⊂ E0 that were wrongly classified by the hyperplanes obtained using E0, along with the points E2 ⊂ E0 that are the support vectors of those hyperplanes. The data points E1 and E2 are appended to the new data in order to obtain the new RLS-TWSVM classifier.

Fig. 4 depicts the flow chart of an activity recognition system using the incremental approach.

Note that since each data block is relatively small compared to the full dataset, we need to store only a small number of data points at a time for finding the classifiers. Furthermore, this incremental process allows us to handle arbitrarily large datasets by successively adding blocks of data.

3.3. Activity hierarchy for classification

Certain human activities (e.g., walking and running) are more closely related to each other than to other activities (for example, waving and pushing). This knowledge of relatedness



Further, we have performed the Friedman test on the results obtained in Tables 2 and 3. The Friedman test [57] is a non-parametric test for comparing three or more related samples, and it makes no assumptions about the underlying distribution of the data.

Table 1
UCI datasets used in the experiments.

Datasets   Class   Features   Size
Iris       3       4          150
Wine       3       13         178
Seeds      3       7          210
Ecoli      5       7          327

Fig. 4. Flowchart depicting incremental approach.

can be used to improve and simplify the system's performance. Activity recognition can be dealt with as a multi-level problem: once the particular group of activity classes is determined, further classification and labelling is done to determine the exact activity being performed by the actor.

The BTS and TDS tree-based approaches exploit this idea and greatly improve the system's time complexity as well. Starting from the root node, at each level the training activities are divided into two (or three) groups using k-means (k = 2) clustering, which groups the closely related activity classes together. These groups form the child nodes. RLS-TWSVM learns a classifier to distinguish between the classes at the child nodes and then recursively keeps doing so until it has reached the bottom, where each leaf node represents a single activity class. A test video is assigned an activity class based on the majority class label assigned to the features from the test video.
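The recursive grouping step can be sketched as follows. This is our own minimal illustration (a tiny k-means over per-class mean feature vectors; the paper does not specify these implementation details), where k = 2 yields a BTS-style tree and k = 3 a TDS-style tree:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means (numpy only); returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels

def build_hierarchy(class_means, classes, k=2):
    """Recursively split the activity classes into k groups by
    clustering their mean feature vectors; each leaf is one class."""
    means = np.asarray(class_means, dtype=float)
    if len(classes) == 1:
        return classes[0]
    kk = min(k, len(classes))
    labels = kmeans(means, kk)
    if len(np.unique(labels)) == 1:          # degenerate split: force one
        labels = np.arange(len(classes)) % kk
    return tuple(build_hierarchy(means[labels == j],
                                 [c for c, l in zip(classes, labels) if l == j],
                                 k)
                 for j in np.unique(labels))
```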

Algorithm 1. Incremental RLS-TWSVM for human activity recognition.

Input: Training dataset D, number of chunks t, parameter values.
Output: Decision parameters.
Training Phase:
Step 0: Divide D into t chunks E(0), E(1), ..., E(t) such that ∪_{i=0}^{t} E(i) = D.
Step 1: Initialize the current increment of data as E0.
Step 2: Use the RLS-TWSVM classifier to obtain the hyperplanes separating the classes present in E0.
Step 3: Let E1 denote the wrongly classified data in the current increment E0 and E2 denote the support vectors corresponding to the decision hyperplanes of E0.
Step 4: Let E(N) denote the new data increment. Re-initialize E0 = E(N) ∪ (E1 ∪ E2) and go to Step 2.
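The chunked loop of Algorithm 1 can be sketched as below. This is a hedged illustration: the nearest-class-mean `fit` is a stand-in for the RLS-TWSVM solver (any fit function returning a predict closure can be plugged in), and for brevity only the misclassified set E1 is carried forward; carrying the support vectors E2 would be handled analogously:

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Stand-in classifier for Step 2 (NOT the RLS-TWSVM solver)."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return lambda Z: classes[np.argmin(((Z[:, None] - means) ** 2).sum(-1),
                                       axis=1)]

def incremental_train(chunks, fit=nearest_mean_fit):
    """chunks: list of (X, y) blocks.  After each block, only the
    wrongly classified old points (E1) are kept and appended to the
    new increment (Steps 3-4 of Algorithm 1)."""
    X0, y0 = chunks[0]
    predict = fit(X0, y0)
    for Xn, yn in chunks[1:]:
        keep = predict(X0) != y0           # E1: misclassified old data
        X0 = np.vstack([Xn, X0[keep]])     # retire the rest of E0
        y0 = np.concatenate([yn, y0[keep]])
        predict = fit(X0, y0)
    return predict
```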

Figs. 5 and 6 illustrate the hierarchy of activity classes obtained using the binary tree based structure (BTS) and the ternary decision structure (TDS), respectively, with the Weizmann dataset [51].

4. Experimental results

The experiments are performed in MATLAB version 8.0 under a Microsoft Windows environment on a machine with a 3.40 GHz CPU and 16 GB RAM. We chose the radial basis function (RBF) kernel for our classifiers because of the non-linear relationship between the action classes and the histogrammed features obtained in the descriptor for the human activity recognition problem.

4.1. Parameter selection

For the UCI benchmark datasets [52], we applied the grid search method [53] to tune the parameters ν, c_i (i = 1 to 4) and the kernel parameter σ. For each dataset, a validation set comprising 10% randomly selected samples from the dataset is used. For our implementation, we have selected the values of c1, c2, c3 and c4 from the range (0, 1]. The parameter ν is tuned in the range 0.1–1. Once the optimal parameters are obtained, the validation set is returned to the training set to find the optimal decision parameters. The mean classification accuracy, along with the standard deviation, is determined using 5-fold cross validation [54].

For the experiments on the activity recognition datasets, the optimal values of the parameters c1, c2, c3, c4, ν and the kernel parameter σ were obtained using grid search with a set comprising 10% of the frames from each video sequence. Once the optimal values of these parameters are obtained, these frames are returned to the training set to obtain the optimal values of the decision parameters.
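The selection procedure above can be sketched generically as follows (our own helper; `score` is any user-supplied function fitting on the training split and returning validation accuracy):

```python
import itertools
import numpy as np

def grid_search(score, grid, X, y, val_frac=0.1, seed=0):
    """Tune parameters on a random 10% validation split; the split is
    afterwards folded back into the training data, as in the text.
    score(params, (X_tr, y_tr), (X_val, y_val)) -> validation accuracy."""
    rng = np.random.default_rng(seed)
    val = np.zeros(len(y), dtype=bool)
    val[rng.choice(len(y), size=max(1, int(val_frac * len(y))),
                   replace=False)] = True
    train, valid = (X[~val], y[~val]), (X[val], y[val])
    keys = sorted(grid)
    best = max((dict(zip(keys, combo))
                for combo in itertools.product(*(grid[k] for k in keys))),
               key=lambda p: score(p, train, valid))
    return best   # refit on the full (train + validation) data afterwards
```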

4.2. UCI benchmark datasets

In order to demonstrate the competence of the proposed work in varied fields and applications, we performed classification experiments on a variety of benchmark datasets. These include UCI datasets [52], which are commonly used in testing the efficacy of machine learning algorithms. The details of the datasets used are summarized in Table 1. The samples are normalized before learning such that the features lie in the range [0, 1].

4.2.1. Classification results
In order to compare the computational performance of RLS-TWSVM, we have also implemented LS-TWSVM [13] and WLS-TWSVM [55] with the OAA, OAO, BTS and TDS approaches. Further, we have also implemented the multiple birth support vector machine (MBSVM) [56].

The classification results obtained via a 5-fold cross-validation strategy using the Gaussian kernel for RLS-TWSVM, LS-TWSVM and WLS-TWSVM with One-Against-All (OAA) and One-Against-One (OAO), along with the results for MBSVM, are reported in Table 2. Results with the BTS and TDS tree-based approaches are reported in Table 3. In all the tables the best results are marked in bold. The tables demonstrate that RLS-TWSVM mostly outperforms the other algorithms in terms of classification accuracy. Further, the results show that the standard deviation of the results obtained via RLS-TWSVM is lower than that of the other algorithms, indicating that RLS-TWSVM is more robust.

Table 1 (continued)

Glass                                       6   9    214
Dermatology                                 6   34   366
Satlog                                      6   36   6435
Optical recognition of handwritten digits   10  64   5620


Fig. 5. Activity class hierarchy of Weizmann dataset using binary tree based approach.

Fig. 6. Activity class hierarchy of Weizmann dataset using ternary decision structure.

Table 2
Classification results on UCI benchmark datasets with Gaussian kernel using the OAA and OAO multi-category approaches. Mean accuracy (%) ± SD; the first four columns use OAA, the last three OAO (pairwise).

Dataset        LS-TWSVM       RLS-TWSVM      WLS-TWSVM      MBSVM          LS-TWSVM       RLS-TWSVM      WLS-TWSVM
Iris           97.33 ± 4.35   98.67 ± 2.98   97.33 ± 3.80   96.67 ± 2.36   99.33 ± 1.49   96.67 ± 3.33   98.67 ± 2.98
Wine           98.86 ± 2.56   98.29 ± 2.56   98.17 ± 4.07   97.14 ± 2.02   98.29 ± 2.56   98.86 ± 1.56   98.17 ± 2.56
Seeds          93.81 ± 6.43   94.29 ± 3.61   91.42 ± 4.87   95.24 ± 1.68   93.33 ± 3.10   94.29 ± 2.71   93.33 ± 3.10
Ecoli          86.15 ± 5.86   87.08 ± 5.06   86.14 ± 3.51   85.54 ± 4.01   86.46 ± 6.47   88 ± 5.03      86.86 ± 5.01
Glass          64.29 ± 9.37   68.1 ± 5.48    62.88 ± 4.37   67.14 ± 5.93   68.67 ± 9.37   70 ± 5.48      69.16 ± 6.43
Dermatology    95.34 ± 2.29   92.88 ± 1.15   87.11 ± 5.42   95.62 ± 1.50   96.16 ± 2.63   98 ± 2.08      93.13 ± 2.26
Satlog         89.29 ± 0.56   89.73 ± 0.85   86.82 ± 0.52   90.85 ± 0.42   91.35 ± 0.17   90.07 ± 0.62   91.37 ± 0.21
Opt digits     98.87 ± 0.32   98.98 ± 0.28   68.91 ± 3.42   97.99 ± 0.43   99.11 ± 0.24   99.20 ± 0.28   99.43 ± 0.06
Mean accuracy  90.49 ± 3.93   91.00 ± 2.74   84.85 ± 3.74   90.77 ± 2.29   91.33 ± 3.25   91.89 ± 2.63   91.26 ± 2.83


Table 3
Classification results on UCI benchmark datasets with Gaussian kernel using the BTS and TDS multi-category approaches. Mean accuracy (%) ± SD; the first three columns use BTS, the last three TDS.

Dataset        LS-TWSVM       RLS-TWSVM      WLS-TWSVM      LS-TWSVM        RLS-TWSVM      WLS-TWSVM
Iris           96.67 ± 2.79   98 ± 2.98      97.33 ± 2.49   97.33 ± 2.79    98.67 ± 2.98   98.33 ± 2.49
Wine           99.43 ± 1.28   98.29 ± 2.56   94.96 ± 3.09   100 ± 0         98.29 ± 2.56   98.17 ± 2.52
Seeds          93.81 ± 6.21   94.76 ± 1.99   93.80 ± 3.61   96.67 ± 3.98    95.71 ± 3.10   92.38 ± 4.57
Ecoli          86.5 ± 6.06    86.77 ± 5.28   85.88 ± 5.91   87.38 ± 2.53    87.69 ± 1.88   84.42 ± 3.67
Glass          63.81 ± 7.37   64.76 ± 7.60   68.80 ± 7.26   66.19 ± 10.83   70 ± 8.52      72.83 ± 9.40
Dermatology    95.34 ± 2.81   96.99 ± 2.81   93.55 ± 4.14   91.51 ± 4.58    96.16 ± 2.45   95.08 ± 3.90
Satlog         89.15 ± 1.11   87.85 ± 0.75   83.43 ± 1.08   89.14 ± 0.84    89.2 ± 0.83    83.80 ± 1.71
Opt digits     99.02 ± 0.51   98.99 ± 0.67   90.94 ± 0.47   98.79 ± 0.62    98.33 ± 0.67   90.64 ± 1.70
Mean accuracy  90.46 ± 3.51   90.80 ± 3.08   88.59 ± 3.57   90.87 ± 3.27    91.75 ± 2.87   89.46 ± 3.74

Table 4
Friedman test ranks for OAA and OAO, computed from the results in Table 2. The first four columns use OAA, the last three OAO.

Dataset      LS-TWSVM  RLS-TWSVM  WLS-TWSVM  MBSVM    LS-TWSVM  RLS-TWSVM  WLS-TWSVM
Iris         2.5       1          2.5        4        1         3          2
Wine         1         2          3          4        2         1          3
Seeds        3         2          4          1        2.5       1          3
Ecoli        2         1          3          4        3         1          2
Glass        3         1          4          2        3         1          2
Dermatology  2         3          4          1        2         1          3
Satlog       3         2          4          1        2         3          1
Opt digits   2         1          4          3        2         3          1
Mean rank    2.31      1.625      3.56       2.5      2.18      1.75       2.125

Table 5
Friedman test ranks for BTS and TDS, computed from the results in Table 3. The first three columns use BTS, the last three TDS.

Dataset      LS-TWSVM  RLS-TWSVM  WLS-TWSVM   LS-TWSVM  RLS-TWSVM  WLS-TWSVM
Iris         3         1          2           3         1          2
Wine         1         3          2           1         2          3
Seeds        2         1          3           1         2          3
Ecoli        2         1          3           2         1          3
Glass        3         2          1           3         2          1
Dermatology  2         1          3           3         1          2
Satlog       1         2          3           2         1          3
Opt digits   1         2          3           1         2          3
Mean rank    1.875     1.625      2.5         2         1.5        2.5

The test ranks the algorithms for each dataset separately, the best-performing algorithm receiving rank 1, the second best rank 2, and so on; in case of ties, average ranks are assigned. It then compares the average ranks of the algorithms, R_j = (1/N) Σ_i r_i^j, where r_i^j is the rank of the jth algorithm on the ith of the N datasets. The results of the Friedman rank analysis are given in Tables 4 and 5. Since the mean rank of RLS-TWSVM is 1.625, 1.75, 1.625 and 1.5 in the case of OAA, OAO, BTS and TDS, respectively, this clearly shows that RLS-TWSVM performed better than the other algorithms reported in this paper.
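The average-rank computation just described can be written compactly (our own helper, with the tie-averaging convention used by the Friedman test):

```python
import numpy as np

def mean_ranks(acc):
    """acc: (N datasets x k algorithms) accuracies.  Rank 1 = best on a
    dataset; ties receive the average of the tied ranks.  Returns the
    average rank R_j of each algorithm over the N datasets."""
    N, k = acc.shape
    ranks = np.empty((N, k))
    for i in range(N):
        order = np.argsort(-acc[i], kind='stable')
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        for v in np.unique(acc[i]):      # average the ranks over ties
            tied = acc[i] == v
            r[tied] = r[tied].mean()
        ranks[i] = r
    return ranks.mean(axis=0)
```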

4.3. Activity recognition datasets

4.3.1. Weizmann dataset
The Weizmann dataset [51] consists of 93 low-resolution (180 × 144 pixels) video sequences of 10 natural actions: walking (walk), running (run), jumping (jump), galloping sideways (side), bending (bend), one-hand waving (wave1), two-hands waving (wave2), jumping in place (pjump), jumping jack (jack), and skipping (skip). Each action is performed by 9 persons. Some example actions belonging to the Weizmann dataset are given in Fig. 7a.

4.3.2. UIUC1 dataset
UIUC1 is a high-resolution (300 px) dataset collected by Tran et al. [29]. It contains 8 different actors and 14 human actions. There are 532 sequences, as the actors repeat the activities. The action sequences are more varied, including walking, raise-1-hand, crawling and pushing. Some example actions belonging to this dataset are given in Fig. 7b.

4.3.3. IXMAS dataset
The IXMAS dataset [58] has 13 action classes (check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick, point, pick up, throw over head and throw from bottom up) performed by 12 subjects, each 3 times. The scene is captured by 5 cameras, each from a different viewing angle. Some example actions belonging to this dataset are given in Fig. 7c.

Fig. 7. Datasets used for the experiments.

Table 6
The variations in the activity datasets.

Dataset    Sequences  Actors  Actions  Views  Dimension of feature set
Weizmann   93         9       10       1      5511 × 286
UIUC1      532        8       14       1      42934 × 286
UMD        100        1       10       2      15380 × 286
IXMAS      36         12      13       1      41802 × 286

4.3.4. UMD dataset
The UMD dataset [59] contains 100 sequences of 10 activities, performed 10 times each by only one actor. Some example actions belonging to this dataset are given in Fig. 7d.

The details of the datasets used are summarised in Table 6. Typical images for some activities are shown in Fig. 7.

4.3.5. Evaluation methodology
We used the following leave-one-out cross-validation methodologies to report our results:

1. Leave 1 Actor Out (L1AO) removes all video sequences of one actor from the training set and uses that actor's video sequences to measure the prediction accuracy.
2. Leave 1 Actor-Action Out (L1AAO) removes all examples of the query activity performed by the query actor from the training set and measures the prediction accuracy. This is a more difficult task than L1AO.
3. Leave One Sequence Out (L1SO) removes only the query video sequence from the training set. If an actor performs every action once, this protocol is equivalent to L1AAO; otherwise it tends to be easier.
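The first two protocols can be expressed as index-split generators over per-sequence actor and action labels (our own helpers, written in the style of grouped cross-validation):

```python
import numpy as np

def leave_one_actor_out(actors):
    """L1AO: yield one (train_idx, test_idx) split per actor."""
    actors = np.asarray(actors)
    for a in np.unique(actors):
        test = actors == a
        yield np.flatnonzero(~test), np.flatnonzero(test)

def leave_one_actor_action_out(actors, actions):
    """L1AAO: yield one split per (actor, action) pair; only the query
    actor's examples of the query action are held out."""
    actors, actions = np.asarray(actors), np.asarray(actions)
    for a in np.unique(actors):
        for act in np.unique(actions[actors == a]):
            test = (actors == a) & (actions == act)
            yield np.flatnonzero(~test), np.flatnonzero(test)
```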

4.3.6. Performance results
Traditionally, human activity recognition systems use SVM for the classification and labelling task with an OAA-based multi-category approach. OAA suffers from high training time and a class imbalance issue. We compared the performance of the binary-tree-based BTS and ternary-decision-tree-based TDS approaches for the activity recognition problem against OAA multi-category TWSVM-based approaches.

Tables 7 and 8 give the activity-wise confusion matrices of RLS-TWSVM and LS-TWSVM for the Weizmann dataset using the L1AO methodology with the OAA approach. Tables 9 and 10 give the activity-wise confusion matrices of RLS-TWSVM and LS-TWSVM for the Weizmann dataset using the L1AAO methodology with the TDS approach. As evident from the confusion matrices, RLS-TWSVM outperforms LS-TWSVM in almost all activity classifications under both methodologies. Even in cases where the noise was higher, as in the case


Table 7
Confusion matrix of L1AO obtained via RLS-TWSVM with the OAA approach using the Weizmann dataset.

        Bend  Jack  Jump  Pjump  Run   Side  Skip  Walk  Wave1  Wave2
Bend    96.7  0     0.2   0      0.2   0     3.0   0     0      0
Jack    0     98.8  0     1.3    0     0     0     0     0      0
Jump    0     0     87.9  0      0     0     11.7  0.4   0      0
Pjump   0     1.5   0     98.5   0     0     0     0     0      0
Run     0     0     0     0      88.5  0     11.5  0     0      0
Side    0     0     0.7   0      0.5   97.5  1.1   0.2   0      0
Skip    0     0     3.2   0      6.3   0     88.9  1.7   0      0
Walk    0     0     0     0      1.6   1.1   0.9   96.4  0      0
Wave1   0     0     0     0      0     0     0     0     99.5   0.5
Wave2   0.8   0.7   0     0      0     0     0     0     0      98.5

Table 8
Confusion matrix of L1AO obtained via LS-TWSVM with the OAA approach using the Weizmann dataset.

        Bend  Jack  Jump  Pjump  Run   Side  Skip  Walk  Wave1  Wave2
Bend    94.3  0     0.1   0.8    0     0.3   3.9   0.1   0.5    0
Jack    0     99.3  0     0.7    0     0     0     0     0      0
Jump    1.2   0     82.4  0      0     0.2   14.6  1.6   0      0
Pjump   0     0.4   0     99.6   0     0     0     0     0      0
Run     0     0     0     0      85.1  0     11.4  3.5   0      0
Side    0     0     0     0      0     87.8  0     12.2  0      0
Skip    0     0     4.8   0      12.4  0.4   77.7  4.6   0      0
Walk    0     0     0     0      0     0.3   0.3   99.4  0      0
Wave1   0.2   0     0     0      0     0.5   0     0     97.7   1.7
Wave2   0     1.3   0     0      0     0     0     0     0      98.7

Table 9
Confusion matrix of L1AAO obtained via RLS-TWSVM with the TDS approach using the Weizmann dataset.

        Bend  Jack  Jump  Pjump  Run   Side  Skip  Walk  Wave1  Wave2
Bend    97.1  0     1.3   0      0     0     0.2   0     0.3    1.1
Jack    0     98.5  0     1.5    0     0     0     0     0      0
Jump    0     0     80.6  0      0     0.2   19.2  0     0      0
Pjump   0     1.1   0     98.9   0     0     0     0     0      0
Run     0     0     0     0      75.4  0     24.6  0     0      0
Side    0     0     0     0      0     97.7  0     2.3   0      0
Skip    0     0     3.2   0      10.3  1.5   83.0  2.1   0      0
Walk    0     0     0     0      0.1   1.0   0.7   98.1  0      0
Wave1   0     0     0     0      0     0     0     0     99.5   0.5
Wave2   0.8   1.5   0     0      0     0     0     0     0.5    97.2

Table 10
Confusion matrix of L1AAO obtained via LS-TWSVM with the TDS approach using the Weizmann dataset.

        Bend  Jack  Jump  Pjump  Run   Side  Skip  Walk  Wave1  Wave2
Bend    98.1  0     0.6   0      0     0     0.8   0     0      0.5
Jack    0     98.2  0     1.8    0     0     0     0     0      0
Jump    0     0     77.5  0      0     0     22.5  0     0      0
Pjump   0.4   1.1   0     98.5   0     0     0     0     0      0
Run     0     0     0     0      73.7  0     26.3  0     0      0
Side    0     0     0     0      0     97.0  0     3.0   0      0
Skip    0     0     3.6   0      7.8   2.3   80.9  5.5   0      0
Walk    0     0     0     0      0.3   1.0   0.6   98.1  0      0
Wave1   0.3   0     0     0      0     0     0     0     99.5   0.2
Wave2   1.8   0.5   0     0      0     0     0     0     0.2    97.6

of activity classes like skip and jump, or skip and run, RLS-TWSVM has been able to reduce the effect of noise. Furthermore, the performance of the tree-based TDS approach is better than that of OAA, as evident from the respective confusion matrices. With the help of the confusion matrices, we set a threshold limit on the maximum number of frames required to be labelled with a particular training activity such that the video sequence can be labelled with that activity.

Table 11 summarizes the results of the proposed method using the L1AO evaluation methodology on the aforementioned activity datasets. The best results are marked in bold. By comparing our results with other closely related approaches, it can be concluded that the prediction accuracy of RLS-TWSVM is better than that of several state-of-the-art methods. The other advantage of using


RLS-TWSVM over other approaches is the much reduced time complexity of the training and testing phases. RLS-TWSVM achieves perfect accuracy on the Weizmann, UIUC1 and UMD datasets. However, it should be noted that it is relatively easy to obtain 100% classification accuracy on the UMD dataset, because the system was trained with one actor performing the same set of actions multiple times. Some approaches, like [34], performed better than the proposed approach on the IXMAS dataset. It should be noted that this comparison [34] used local SIFT features for feature extraction along with a probabilistic HMM classifier trained with different viewpoints.

Table 12 compares the prediction accuracy of RLS-TWSVM with TWSVM and LS-TWSVM on the Weizmann dataset using all three protocols. Similarly, Table 13 gives the


Table 11
Comparison of the performance of the RLS-TWSVM classifier with different multi-category approaches on the four datasets to state-of-the-art methods, as reported in the cited publications. Accuracy in %.

Weizmann:
Tran et al. [29]     89.66   kNN                             2008
Lin et al. [30]      100     kNN                             2009
Ali [26]             95.75   MIL                             2010
Melfi et al. [6]     95.25   SVM                             2013
Touati et al. [35]   92.3    kNN                             2014
Vishwakarma [36]     95.6    kNN                             2016
Bauman et al. [31]   100     Random forest classifier        2016
Vishwakarma [36]     100     SVM                             2016
Proposed method      100     RLS-TWSVM (OAA)                 2016
Proposed method      99.7    RLS-TWSVM (TDS)                 2016
Proposed method      96.4    RLS-TWSVM (BTS)                 2016

UIUC1:
Tran et al. [29]     98.31   kNN                             2008
Manosha et al. [8]   98.84   SVM                             2012
Wang et al. [39]     98.3    SVM                             2013
Lin et al. [32]      98.3    SVM                             2014
Proposed method      100     RLS-TWSVM (OAA)                 2016
Proposed method      100     RLS-TWSVM (TDS)                 2016
Proposed method      99.8    RLS-TWSVM (BTS)                 2016

IXMAS:
Tran et al. [29]     57.82   kNN                             2008
Wu et al. [40]       78.02   MKL-SVM                         2011
Wang et al. [38]     76.5    Probabilistic graphical models  2012
Li et al. [41]       81.22   MKL-SVM                         2012
Cheema [37]          58.48   Symmetric modelling             2014
Bauman et al. [31]   80.55   Random forest classifier        2015
Ji et al. [34]       86.4    HMM                             2015
Proposed method      82.12   RLS-TWSVM (OAA)                 2016
Proposed method      84.9    RLS-TWSVM (TDS)                 2016
Proposed method      78.64   RLS-TWSVM (BTS)                 2016

UMD:
Tran et al. [29]     100     kNN                             2008
Proposed method      100     RLS-TWSVM (OAA)                 2016
Proposed method      100     RLS-TWSVM (TDS)                 2016
Proposed method      100     RLS-TWSVM (BTS)                 2016

comparison of the results obtained via incremental RLS-TWSVM, incremental TWSVM and incremental LS-TWSVM on the UIUC1 dataset. Further, a better prediction accuracy has been obtained by incremental RLS-TWSVM in comparison with incremental

Fig. 8. Comparison of training time for the Weizmann dataset with the first query sequence.


LS-TWSVM. It should be noted here that although tree-based approaches are capable of dealing with the unbalanced class problem, their structure leads to a loss of information at the nodes. So, in some cases, a misclassification at an upper node may propagate



Table 12
Comparison of prediction accuracy on the Weizmann dataset.

              TWSVM   LS-TWSVM   RLS-TWSVM
L1SO   OAA    100     99.7       100
       TDS    100     99.5       99.7
       BTS    100     100        100
L1AO   OAA    100     99.7       96.4
       TDS    99.7    100        99.7
       BTS    99.6    99.5       99.8
L1AAO  OAA    100     100        100
       TDS    100     96         100
       BTS    100     99.4       99.7

Table 13
Comparison of prediction accuracy on the UIUC1 dataset.

              Incremental  Incremental  Incremental
              TWSVM        LS-TWSVM     RLS-TWSVM
L1SO   OAA    93.5         96.64        98.64
       TDS    96.42        96.42        99.8
       BTS    96.24        97.36        100
L1AO   OAA    100          99.96        100
       TDS    99.70        99.70        100
       BTS    100          99.7         99.8
L1AAO  OAA    95.04        97.36        97.36
       TDS    97.36        98.5         99.7
       BTS    97.36        98.5         99.7




to the leaf node of the tree, and hence the classification accuracy of a tree-based classifier may be marginally lower than or similar to that of the OAA-based approach.

In order to demonstrate the capabilities of incremental RLS-TWSVM, we provide a comparison of the training times of TWSVM, LS-TWSVM, RLS-TWSVM and incremental RLS-TWSVM in Fig. 8 for the Weizmann dataset. From the bar chart, it is evident that incremental RLS-TWSVM takes approximately half the training time of the other reported methods.

5. Conclusions

In this paper, we have proposed an improved and robust version of LS-TWSVM, called RLS-TWSVM, to incorporate the effect of outliers and heteroscedastic noise arising from closely related activity classes in the human activity recognition framework. We introduced Incremental RLS-TWSVM to deal with large datasets in the activity recognition problem, and we introduced the idea of an activity class hierarchy to exploit the relatedness among activities. The proposed framework addresses the problems of heteroscedastic noise, high training time and unbalanced datasets. In our experiments, we have used a feature descriptor based on silhouette and optic flow. To investigate the validity and effectiveness of RLS-TWSVM, experiments were conducted on the well-known Weizmann, UIUC1, IXMAS and UMD human action datasets. Experiments have also been carried out on several UCI datasets. The results showed the superior performance of the proposed RLS-TWSVM over other state-of-the-art methods.

In this paper, we have considered datasets where only one actor is performing a single action. However, it would be interesting to explore the application of the proposed approach to scenarios where actor-actor and object-actor interaction is involved. This would further open the window to explore other trajectory- and depth-based features.


Acknowledgements

The authors would like to thank the editor and the anonymous reviewers, whose valuable comments and feedback have helped us to improve the content and presentation of the paper.

Appendix 1.

We now explain how to use the partition method for the inver-sion of a general nonsingular matrix A, that can be partitioned as[

B CE F

]. Here, we assume that the inverse of matrix A, viz. A−1,

can also be partitioned similarly as

[X YW V

]. X, Y, W, and V are

matrices of the same orders as B, C, E and F, respectively, and

are defined as V = [F − EB−1C]−1

; Y = [−B−1CV]; W =− VEB−1; andX = [B−1 − B−1CW]. We observe, that for computation of matrices X,Y, W and V, we require the inverse of matrices B and [F − EB−1C]only.

For example, in the context of RLS-TWSVM, let considerthe 2 × 2 matrix, then in our case the matrix B is of theform NNT + c1I + c2MTM, and matrices C, E, and F are defined asC =− c2MTe2, E =− c2e2

T, F = (c2 + �), respectively. Therefore, for thecomputation of X, Y, W and V, we need to compute the inverse ofmatrices B = NNT + c1I + c2MTM and F, which always exists becauseB and F are positive definite matrices.

Furthermore, following the partition method and applying the SMW formula [17], we obtain the expression
$$(N^TN + c_1I + c_2M^TM)^{-1} = Z - ZM^T\left(\frac{I}{c_2} + MZM^T\right)^{-1}MZ,$$
where
$$Z = \frac{1}{c_1}\left[I - N^T(c_1I + NN^T)^{-1}N\right].$$
Here we require inverting a matrix whose order is the dimension of the input data samples only; the input dimension is usually much smaller than the number of samples.
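As a numerical sanity check of both SMW applications, the sketch below (class sizes, input dimension and parameter values are illustrative assumptions, not values from the paper) verifies that $Z$ equals $(N^TN + c_1I)^{-1}$ and that the SMW expression reproduces the direct inverse of $N^TN + c_1I + c_2M^TM$.

```python
import numpy as np

rng = np.random.default_rng(1)
m1, m2, n = 30, 25, 5        # class sizes and input dimension (illustrative)
N = rng.standard_normal((m1, n))
M = rng.standard_normal((m2, n))
c1, c2 = 0.5, 0.8            # illustrative regularization parameters

# Z = (N^T N + c1 I)^{-1}, computed via the SMW identity
Z = (np.eye(n) - N.T @ np.linalg.inv(c1 * np.eye(m1) + N @ N.T) @ N) / c1
assert np.allclose(Z, np.linalg.inv(N.T @ N + c1 * np.eye(n)))

# Full inverse via the second SMW application
lhs = np.linalg.inv(N.T @ N + c1 * np.eye(n) + c2 * M.T @ M)
rhs = Z - Z @ M.T @ np.linalg.inv(np.eye(m2) / c2 + M @ Z @ M.T) @ M @ Z
assert np.allclose(lhs, rhs)
```

The advantage is that the matrix built from the data never has to be inverted at full size; only the smaller auxiliary systems are solved.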

References

[1] J.K. Aggarwal, X. Lu, Human activity recognition from 3D data: a review, Pattern Recognit. Lett. 48 (2014) 70–80.

[2] Y. Yang, N. Cheng, M. Zhang, Research on activity recognition method based on human motion trajectory features, J. Converg. Inf. Technol. 7 (1) (2012).

[3] G. Cheng, Y. Wan, A.N. Saudagar, K. Namuduri, B.P. Buckles, Advances in Human Action Recognition: A Survey, 2015, arXiv:1501.05964.

[4] J.M. Chaquet, E.J. Carmona, A. Fernández-Caballero, A survey of video datasets for human action and activity recognition, Comput. Vis. Image Underst. 117 (6) (2013) 633–659.

[5] R. Poppe, A survey on vision-based human action recognition, Image Vis. Comput. 28 (6) (2010) 976–990.

[6] R. Melfi, S. Kondra, A. Petrosino, Human activity modeling by spatio temporal textural appearance, Pattern Recognit. Lett. 34 (15) (2013) 1990–1994.

[7] D.K. Vishwakarma, R. Kapoor, Hybrid classifier based human activity recognition using the silhouette and cells, Expert Syst. Appl. 42 (20) (2015) 6957–6965.

[8] K.G. Manosha Chathuramali, R. Rodrigo, Faster human activity recognition with SVM, in: 2012 International Conference on Advances in ICT for Emerging Regions (ICTer), IEEE, 2012, pp. 197–203.

[9] K. Mozafari, J. Nasiri, N.M. Charkari, S. Jalili, Action recognition by local space-time features and least square twin SVM (LS-TSVM), in: 2011 First International Conference on Informatics and Computational Intelligence (ICI), IEEE, 2011, pp. 287–292.

[10] K. Mozafari, J. Nasiri, N.M. Charkari, S. Jalili, Hierarchical least square twin support vector machines based framework for human action recognition, in: 2011 7th Iranian Conference on Machine Vision and Image Processing (MVIP), IEEE, 2011, pp. 1–5.

[11] V. Vapnik, The Nature of Statistical Learning Theory, Springer Science & Business Media, 2013.

[13] M.A. Kumar, M. Gopal, Least squares twin support vector machines for pattern classification, Expert Syst. Appl. 36 (4) (2009) 7535–7543.

[14] J. Nasiri, N.M. Charkari, K. Mozafari, Energy-based model of least squares twin support vector machines for human action recognition, Signal Process. 104 (2014) 248–257.

[15] Y.H. Shao, W.J. Chen, W.B. Huang, Z.M. Yang, N.Y. Deng, The best separating decision tree twin support vector machine for multi-class classification, Proc. Comput. Sci. 17 (2013) 1032–1038.

[16] R. Khemchandani, P. Saigal, Color image classification and retrieval through ternary decision structure based multi-category TWSVM, Neurocomputing 165 (2015) 444–455.

[17] G.H. Golub, C.F. Van Loan, Matrix Computations, vol. 3, JHU Press, 2012.

[18] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, R. Moore, Real-time human pose recognition in parts from single depth images, Commun. ACM 56 (1) (2013) 116–124.

[19] Y. Zhao, Z. Liu, L. Yang, H. Cheng, Combing RGB and depth map features for human activity recognition, in: Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE, 2012.

[20] S. Althloothi, M.H. Mahoor, X. Zhang, R.M. Voyles, Human activity recognition using multi-features and multiple kernel learning, Pattern Recognit. 47 (5) (2014) 1800–1812.

[21] G. Yu, J. Yuan, Z. Liu, Propagative Hough voting for human activity recognition, in: Computer Vision – ECCV 2012, Springer Berlin Heidelberg, 2012, pp. 693–706.

[22] H. Wang, C. Schmid, Action recognition with improved trajectories, in: 2013 IEEE International Conference on Computer Vision (ICCV), IEEE, 2013.

[23] E.A. Mosabbeb, R. Cabral, F. De la Torre, M. Fathy, Multi-label discriminative weakly-supervised human activity recognition and localization, in: Computer Vision – ACCV 2014, Springer International Publishing, 2014, pp. 241–258.

[24] A. Efros, A.C. Berg, G. Mori, J. Malik, Recognizing action at a distance, in: Proceedings of the International Conference on Computer Vision (ICCV'03), vol. 2, Nice, France, October 2003, pp. 726–733.

[25] S. Danafar, N. Gheissari, Action recognition for surveillance applications using optic flow and SVM, in: Proceedings of the Asian Conference on Computer Vision (ACCV'07) – Part 2, Lecture Notes in Computer Science, no. 4844, Tokyo, Japan, November 2007, pp. 457–466.

[26] S. Ali, M. Shah, Human action recognition in videos using kinematic features and multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2) (2010) 288–303.

[27] N. Ikizler, R.G. Cinbis, P. Duygulu, Human action recognition with line and flow histograms, in: Proceedings of the International Conference on Pattern Recognition (ICPR 08), Tampa, FL, December 2008, pp. 1–4.

[28] Z. Zhang, Y. Hu, S. Chan, L.T. Chia, Motion context: a new representation for human action recognition, in: Proceedings of the European Conference on Computer Vision (ECCV'08) – Part 4, Lecture Notes in Computer Science, Marseille, France, October 2008, pp. 817–829.

[29] D. Tran, A. Sorokin, Human activity recognition with metric learning, in: Computer Vision – ECCV 2008, Springer, 2008, pp. 548–561.

[30] Z. Lin, Z. Jiang, L.S. Davis, Recognizing actions by shape-motion prototype trees, in: 2009 IEEE 12th International Conference on Computer Vision, IEEE, 2009, pp. 444–451.

[31] F. Baumann, A. Ehlers, B. Rosenhahn, J. Liao, Recognizing human actions using novel space-time volume binary patterns, Neurocomputing 173 (2016) 54–63.

[32] Y.Y. Lin, J.H. Hua, N.C. Tang, M.H. Chen, H.Y.M. Liao, Depth and skeleton associated action recognition without online accessible RGB-D cameras, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2014, pp. 2617–2624.

[33] X. Yan, Y. Luo, Making full use of spatial-temporal interest points: an AdaBoost approach for action recognition, in: 2010 17th IEEE International Conference on Image Processing (ICIP), IEEE, 2010, pp. 4677–4680.

[34] X. Ji, Z. Ju, C. Wang, C. Wang, Multi-view transition HMMs based view-invariant human action recognition method, Multimed. Tools Appl. (2015) 1–18.

[35] R. Touati, M. Mignotte, MDS-based multi-axial dimensionality reduction model for human action recognition, in: 2014 Canadian Conference on Computer and Robot Vision (CRV), IEEE, 2014, pp. 262–267.

[36] D.K. Vishwakarma, R. Kapoor, A. Dhiman, Unified framework for human activity recognition: an approach using spatial edge distribution and R-transform, AEU Int. J. Electron. Commun. 70 (3) (2016) 341–353.

[37] M.S. Cheema, A. Eweiwi, C. Bauckhage, Human activity recognition by separating style and content, Pattern Recognit. Lett. 50 (2014) 130–138.

[38] Z. Wang, J. Wang, J. Xiao, K.H. Lin, T. Huang, Substructure and boundary modeling for continuous action recognition, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 1330–1337.

[39] H. Wang, A. Klaser, C. Schmid, C.L. Liu, Action recognition by dense trajectories, in: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011, pp. 3169–3176.

[40] X. Wu, D. Xu, L. Duan, J. Luo, Action recognition using context and appearance distribution features, in: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011, pp. 489–496.

[41] R. Li, T. Zickler, Discriminative virtual views for cross-view action recognition, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2855–2862.

[42] T. Van Gestel, J.A. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, J. Vandewalle, Benchmarking least squares support vector machine classifiers, Mach. Learn. 54 (1) (2004) 5–32.

[43] J.A. Suykens, J.D. Brabanter, L. Lukas, J. Vandewalle, Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing 48 (1) (2002) 85–105.

[44] A.C. de Carvalho, A.A. Freitas, A tutorial on multi-label classification techniques, in: Foundations of Computational Intelligence, vol. 5, Springer, 2009, pp. 177–195.

[45] U.H.-G. Kreßel, Pairwise classification and support vector machines, in: Advances in Kernel Methods, MIT Press, 1999.

[46] H. Lei, V. Govindaraju, Half-against-half multi-class support vector machines, in: Multiple Classifier Systems, Springer, 2005, pp. 156–164.

[47] B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: IJCAI, vol. 81, 1981, pp. 674–679.

[48] I.T. Jolliffe, Principal component analysis and factor analysis, Princ. Compon. Anal. (2002) 150–166.

[49] Y.H. Shao, C.H. Zhang, X.B. Wang, N.Y. Deng, Improvements on twin support vector machines, IEEE Trans. Neural Netw. 22 (6) (2011) 962–968.

[50] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.

[51] M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri, Actions as space–time shapes, in: Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2, IEEE, October 2005, pp. 1395–1402.

[52] C. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, 1998.

[53] C.W. Hsu, C.C. Chang, C.J. Lin, A Practical Guide to Support Vector Classification, 2003.

[54] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2012.

[55] J. Chen, G. Ji, Weighted least squares twin support vector machines for pattern classification, in: The 2nd International Conference on Computer and Automation Engineering (ICCAE), vol. 2, IEEE, 2010.

[56] Z. Yang, Y.H. Shao, X. Zhang, Multiple birth support vector machine for multi-class classification, Neural Comput. Appl. 22 (1) (2013) 153–161.

[57] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.

[58] D. Weinland, R. Ronfard, E. Boyer, Free viewpoint action recognition using motion history volumes, Comput. Vis. Image Underst. 104 (2) (2006) 249–257.

[59] A. Veeraraghavan, R. Chellappa, A.K. Roy-Chowdhury, The function space of an activity, in: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, IEEE, 2006.