
4W1H and PSO for Human Activity Recognition

Paper:

4W1H and Particle Swarm Optimization for Human Activity Recognition

Leon Palafox and Hideki Hashimoto
Institute of Industrial Science, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
E-mail: [email protected], [email protected]

[Received February 21, 2011; accepted May 27, 2011]

This paper proposes a paradigm in the forensic area for detecting and categorizing human activities. The presented approach uses five base variables, referred to as 4W1H (“Who,” “When,” “What,” “Where,” and “How”), to describe the context in an environment. The proposed system uses self-organizing maps to classify movements for the “How” variable of 4W1H, as well as particle swarm optimization clustering techniques for the grouping (clustering) of data obtained from observations. The paper describes the hardware settings required for detecting these variables and the system designed to do the sensing.

Keywords: self-organizing maps, particle swarm optimization, 4W1H, activity recognition

1. Introduction

Human Activity Recognition (HAR) systems are present in cities, buildings, and rooms, where they adapt to continuously changing environments. To do this, these systems process the data gathered from sensors in the environment and then modify the environment in consonance with the activities of the people present. HAR is a very large field that includes many challenges (e.g., HAR can focus on a person or a group of people, a single room, or even a whole city). Although the proposed approach could potentially be applied to any setting, the focus of this paper is on intelligent rooms, where the users are few and variables such as objects and places are known. In particular, we use the iSpace [1], which has an adequate set of sensors for recognition tasks (Fig. 1).

Intelligent room settings usually have three components: sensing, classification, and action. The sensing and classification problems are closely related, since the traits of the sensed data (images, video, sound, etc.) dictate which classification tools should be used. Nevertheless, most conventional HAR techniques have flaws. For example, cameras recognize activities using only the human pose [2], often overlooking the multiple characteristics of each scene (e.g., time, place, and environment).

Fig. 1. iSpace sensor setting.

To address this issue, some groups focus on extracting activities using context detection. Works like [3] and [4] showed that sensing extra variables can increase the accuracy of a recognition system.

In this work, we propose describing the actions in an environment using 4W1H. The 4W1H paradigm defines activities as a set of five variables (“Who,” “When,” “What,” “Where,” and “How”) deemed sufficient to describe every action. Furthermore, by defining each activity as a set of these variables, we can mix sensing techniques. For example, we can detect “What” and “Who” using object and subject identification algorithms, and we can use an RFID-tagged environment to sense the “What” variable. Then, we can use clustering and classification techniques to process the different activities given multiple sets of 4W1H.
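The set of five variables above can be represented as a simple record. As a sketch only, the field types below (strings for the categorical variables, a timestamp for “When”) and the example values are illustrative assumptions, not the paper’s actual data format:

```python
from dataclasses import dataclass

@dataclass
class Observation4W1H:
    who: str     # subject identity (e.g., from subject identification)
    when: float  # timestamp of the observation
    what: str    # object involved (e.g., from an RFID-tagged environment)
    where: str   # location label within the room
    how: str     # motion class (e.g., a label produced by the SOM)

# One hypothetical observation: a user writing with a pencil at a desk.
obs = Observation4W1H(who="user_1", when=1302.5, what="pencil",
                      where="desk", how="write")
```

A buffer of such records is then what the clustering stage operates on.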

There are, however, also problems to solve in the 4W1H method. For example, the way a left-handed person uses a pencil differs from the way a right-handed person does, and different people have different ways of doing things. When we use 4W1H, therefore, the “How” variable has an intrinsic complexity and needs a special classification of its own. We need a scheme capable of performing on-line recognition of an increasing number of possible “How”s. To do this, we use a mix of wavelets and self-organizing maps, which showed good results [5] when doing a rough classification of unknown, high-variance inputs.

Vol.15 No.7, 2011 Journal of Advanced Computational Intelligence and Intelligent Informatics 793


The goal of the proposed system is to sense activities in a space. Once we have the 4W1H variables in a buffer, we need to cluster them into groups for identification. We map each set of 4W1H variables to a point in R^5 by assigning categorical values to each variable. In this work, we use a clustering technique based on particle swarm optimization to group these variables. Further, by projecting onto any of the planes, we can visualize the clusters with references such as time, space, and users.
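The categorical-value assignment can be sketched as follows. The codebooks and the specific integer assignments below are hypothetical, since the paper does not give the exact encoding:

```python
# Hypothetical codebooks mapping each categorical 4W1H value to an
# integer; the actual assignments used in the system are not specified.
WHO   = {"user_1": 0, "user_2": 1}
WHAT  = {"pencil": 0, "cup": 1, "book": 2}
WHERE = {"desk": 0, "sofa": 1, "shelf": 2}
HOW   = {"write": 0, "drink": 1, "read": 2}

def to_r5(who, when, what, where, how):
    """Encode one 4W1H observation as a point in R^5 for clustering."""
    return [float(WHO[who]), float(when), float(WHAT[what]),
            float(WHERE[where]), float(HOW[how])]

point = to_r5("user_1", 10.0, "pencil", "desk", "write")
# point is a 5-dimensional vector: [0.0, 10.0, 0.0, 0.0, 0.0]
```

Projecting such points onto, e.g., the (“When,” “Where”) plane then gives the time-space visualization mentioned above.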

The remainder of the paper is organized as follows. We first discuss related work and the algorithms used to solve the problem. We then present and describe the sensing hardware and the sensing system. Finally, we present results from experimental investigations.

2. Preliminaries

2.1. Related Work

Schilit and Theimer [6] first defined context sensing as: “The ability of the system to discover and react to changes in the environment they are located in.” Using this definition, we may also say that a context sensing system is capable of sensing the variables that generate the changes in the environment. Work by Schmidt [7] described how context may help to infer human activities. Schmidt’s work did not set up a set of activities or a pattern, but described context-aware applications. Robertson and Reid [2] used an approach that used position and velocity, in addition to local motion, to describe an activity. This enhanced the recognition rate of their system. Li and Fei-Fei [3] also used context and defined three variables for static images (what, where, and who). Their approach used a Dirichlet mixture model to define activities as a mixture of variables found in the scene. Their work, however, did not investigate the implications of relying on one sensor to get all variables, and their results, while compelling, were limited to static images. Huang et al. [4] used a similar approximation of context. Their work focused on the when, what, and where variables using an arrangement of sensors. They also used a pattern matching algorithm to match sensed data to activities. This pattern matching, however, may suffer from a lack of flexibility in situations with new objects. The presented work differs from these contributions by increasing the number of sensing variables. The presented approach also uses a clustering system, which will allow the system to recognize a wide range of unseen activities.

2.2. Self-Organizing Maps

Kohonen [8] describes Self-Organizing Maps (SOM) as an algorithm that projects a Z-dimensional feature space onto a 2-dimensional map. It places similar elements close to each other, thus preserving the topology of the space. Typically, a SOM is represented by an N × N matrix, where each element is a neuron that has Z weights. In this work, we present K inputs from a Z-dimensional space to the map and compare them with each element of the matrix. A training process updates the weight vectors to bring them closer to the input. After training, the SOM is divided into areas where similar inputs are grouped together. For a full explanation of the updating equations, please refer to [8]. The map then classifies new inputs by measuring the distance between a new input D∗ and each element of the SOM; the input is assigned to the class of the neuron with the minimum distance. Although SOMs are not often used to classify new inputs, Benitez [9] and Cahplot [5] have used SOMs combined with wavelets as a surrogate for classification, with good results.
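The minimum-distance assignment described above can be sketched as follows; the 8 × 8 map size and 3-dimensional feature space are arbitrary toy values, and the map weights are random rather than trained:

```python
import numpy as np

def classify(som, x):
    """Assign input x (shape (Z,)) to the nearest neuron of an N x N SOM
    whose weight array has shape (N, N, Z); returns the (row, col) index."""
    d = np.linalg.norm(som - x, axis=2)   # Euclidean distance to every neuron
    return np.unravel_index(np.argmin(d), d.shape)

rng = np.random.default_rng(0)
som = rng.random((8, 8, 3))               # toy 8x8 map over a 3-D feature space
idx = classify(som, np.array([0.5, 0.5, 0.5]))
```

In the trained case, `idx` would index into a region of the map associated with one “How” class.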

2.3. Particle Swarm Optimization Clustering

Particle Swarm Optimization (PSO) is inspired by nature’s social optimization. Visually, PSO can be imagined as a flock of birds flying in the sky or a school of fish swimming in the water. In any of these groups, every so-called particle has a position, a velocity, and a set of simple instructions to do its tasks. Each individual is a candidate solution for the optimum state, and each individual has the ability to record its best position and regulate its velocity according to its target.

In some applications [10], the PSO algorithm also records the global best solution of the entire population. In this case, the basic algorithm consists of updating each particle’s velocity and position, where the update is driven by the community’s best results toward an optimum of a given fitness function. Work by Omran et al. [11] describes a PSO-based clustering algorithm that is as robust as K-means and sometimes even faster. In Omran’s algorithm, each particle is regarded as a set of K candidate cluster centroids, and each particle is updated in relation to the best distribution of these centroids. The fitness function is typically a multi-objective optimization problem, which minimizes the distance within a cluster and maximizes the separation among clusters.
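A minimal sketch of such a PSO clustering loop, assuming the standard inertia-weight velocity update and, as a simplification, using only the intra-cluster (quantization-error) term of the fitness rather than the full multi-objective function:

```python
import numpy as np

def fitness(centroids, data):
    """Quantization error: mean distance from each point to its nearest
    centroid (the intra-cluster term to be minimized)."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).mean()

def pso_cluster(data, k=2, particles=10, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Each particle encodes k candidate centroids; particles move toward
    their personal best and the global best centroid configuration."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    pos = rng.uniform(data.min(0), data.max(0), size=(particles, k, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, data) for p in pos])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        f = np.array([fitness(p, data) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g

# Toy data: two well-separated groups of 2-D points.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                  rng.normal(10.0, 0.5, (20, 2))])
best = pso_cluster(data, k=2)
```

The inertia weight and acceleration constants above are common textbook values, not parameters taken from [11].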

For this work, we chose this latter algorithm [11] because of its fast convergence, which makes it suitable for an on-line sensing system, as well as its relative ease of implementation.

2.4. Wavelets

The wavelet transform coefficients, given by the inner product of x(t) and the basis functions,

W(ω,n) = ⟨x(t), ψ_ω,n(t)⟩ . . . . . . . . (1)

comprise the time-frequency representation of the original signal. In digital signal processing (as in this work), the forward wavelet transform is typically implemented as a set of tree-structured filter banks. The input signal is divided into contiguous, non-overlapping blocks of samples called frames, and the forward transform is computed frame by frame. This work uses wavelets as a fast way to compress and filter the signal from the MTx sensor. It is expected that this
