
Towards a Framework for Collaborative Video Surveillance System Using Crowdsourcing

Susumu Saito, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, [email protected]

Teppei Nakano, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, [email protected]

Tetsunori Kobayashi, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, [email protected]

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s).
CSCW ’16 Companion, February 27 - March 02, 2016, San Francisco, CA, USA
ACM 978-1-4503-3950-6/16/02.
http://dx.doi.org/10.1145/2818052.2869074

Abstract
This paper proposes a new framework for video surveillance systems for crime prevention. The main purpose of this framework is to help provide reasonable and stable solutions for automated video surveillance systems in a collaborative way. The framework is characterized by a verification process using crowdsourcing after the image analysis process: an automated image analyzer detects as many suspicious events as possible, and human intelligence then filters them, so that both high recall and high precision are achieved. Here we describe the basic mechanisms for collaboration between camera devices, data stores, image analyzers, and surveillance crowds.

Author Keywords
Collaborative video surveillance; framework; crowdsourcing.

ACM Classification Keywords
H.5.3 [Information interfaces and presentation (e.g., HCI)]: Computer-supported cooperative work

Introduction
Today, there is an increasing demand for video surveillance systems with automated detection and alerting functions for crime prevention. For alerting to be sufficient, every suspicious event should be detected and reported, even if its likelihood is quite low. Unfortunately, the number of false alerts increases accordingly, and true emergencies are occasionally overlooked as a result. Therefore, there is a need for automated functions that provide more accurate information, with fewer false alerts and detection failures. Image analysis is a general method for automated detection. However, its performance is limited by the recall-precision tradeoff, as well as by the high volume of behavioral variation and the countless unique scenarios that may occur [1]. Meanwhile, human intelligence is said to be able to solve problems that computers cannot easily recognize [2]. If events detected by high-recall image analyzers are filtered by human verification, both high recall and high precision can be achieved. Doing this in a surveillance system requires a sufficient number of people for precise and prompt filtering, and one of the best ways to obtain them is crowdsourcing. To this end, we propose a framework for automated video surveillance systems that achieves highly accurate automated detection and alerting by using both image analysis and crowdsourcing.

CSCW '16 COMPANION, FEBRUARY 27–MARCH 2, 2016, SAN FRANCISCO, CA, USA

Figure 1: System overview. (1) The camera device captures images and detects suspicious behaviors (low-level) and (2) stores images or features in the data store. (3) The image analyzer detects suspicious behaviors (high-level) and (4) sends a signal to the collaboration service server. (5) Workers of the crowds verify the detected behaviors. (6) The collaboration service server aggregates the verification results, (7) notifies the owner, and (8) sends verification results to the image analyzer as feedback.

Our framework is a structure in which all the components necessary for automated detection and alerting in a video surveillance system can be collaboratively incorporated. Video surveillance cameras, image analyzers for detection, and data stores for security footage are the common components usually required for automatic detection and alerting functions. Currently, a video surveillance system with these components needs to be constructed from scratch, which is time-consuming and usually makes it difficult for vendors to provide reasonable solutions. Although a unified framework is needed for easy development of surveillance systems, none currently exists. Our framework enables collaboration between camera devices, data stores, image analyzers, and surveillance crowds. In this environment, vendors can provide reasonable and stable surveillance services without constructing an entire system from scratch. Individuals can also adopt personal solutions from a large variety of combinations of camera devices and image analyzers.

Related Work
At present, there are platforms for crime prevention using VSaaS (video surveillance as a service), a software model of video surveillance systems that reduces users' concerns about system maintenance [4, 5]. These systems use image processing to detect suspicious behaviors; however, they cannot resolve the detection failure and false alert problems. Gadgil et al. [3] proposed a video annotation system for law enforcement authorities using crowdsourcing. Their system achieved high precision and recall rates by training annotators; however, it cannot be used for real-time surveillance without image processing for pre-detection.

Surveillance Mechanism
In our approach, a verification process using human intelligence is performed after the automated detection process using image analyzers. Our framework includes multiple players, called surveillance crowds, who verify each event detected by the image analyzer. Hiring surveillance crowds to double-check events on behalf of surveillance owners can solve the tradeoff problem by achieving both high recall and high precision.
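The effect of the verification step on the tradeoff can be illustrated with a small worked example. The sketch below is only an illustration: the event counts are invented, and the function names (`precision`, `recall`, `after_crowd_filter`) are hypothetical and not part of the framework.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of raised alerts that correspond to real events."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real events that were alerted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def after_crowd_filter(tp, fp, fn, accepted_true, accepted_false):
    """Return (tp, fp, fn) after the crowd has reviewed every detection.

    accepted_true:  real events the crowd confirmed (at most tp)
    accepted_false: false detections the crowd wrongly confirmed (at most fp)
    Real events rejected by the crowd are counted as additional misses.
    """
    return accepted_true, accepted_false, fn + (tp - accepted_true)


# Assumed numbers: a high-recall analyzer flags 100 clips, 10 of them real.
tp, fp, fn = 10, 90, 0
before = (precision(tp, fp), recall(tp, fn))

# Assumed crowd behavior: confirms all 10 real events, wrongly confirms 2.
tp2, fp2, fn2 = after_crowd_filter(tp, fp, fn, accepted_true=10, accepted_false=2)
after = (precision(tp2, fp2), recall(tp2, fn2))
```

With these assumed numbers, precision rises from 0.10 to about 0.83 while recall stays at 1.0. If the crowd rejected a real event, recall would drop accordingly, which is why the quality of the crowd matters.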

Figure 1 shows an overview of our proposed framework. The surveillance camera device first captures an image and detects motion in it. When motion is detected, security footage is sent to the data store, and the image analyzer checks it for suspicious behavior. If the footage includes suspicious behavior, the analyzer sends a signal to the collaboration service server. As soon as the collaboration service server receives the signal, it sends suspicion notifications to the surveillance crowds' monitoring devices. Workers in the crowds are expected to check the footage immediately after the notification to verify whether suspicious behavior is really present. The collaboration service server aggregates the crowds' verification results and decides whether to notify the camera owner. If notification is determined to be necessary, the owner is notified via the management device. Finally, the verification results are sent to the image analyzer as feedback to improve its performance.

SESSION: POSTERS

To realize this workflow, we provide a mediator server called the collaboration service server. Its primary role is to mediate between the surveillance components, described in the next section, based on "surveillance profiles". A surveillance profile is a configuration that defines the details of the system workflow between surveillance components. Surveillance owners can manage the workflow and can associate monetary rewards with these components, incorporating crowds into the system. In the surveillance process, this server first selects a certain number of the most suitable workers in the crowds to notify, based on the profile. Each worker's individual information, such as presence, recent verification time, failure rate, and response speed, can also be considered. After notifying the workers via their monitoring devices, the server collects their verification results and calculates the seriousness of the suspicious behavior. When a truly suspicious behavior is detected, an alert notification is sent to the owner's management device.
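The selection and aggregation steps described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Worker` fields, the ranking rule (lowest failure rate first, then fastest response), the reliability-weighted vote, and the 0.5 alert threshold are all hypothetical choices, and recent verification time is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    present: bool            # currently reachable on a monitoring device
    failure_rate: float      # share of past verifications judged wrong, 0..1
    avg_response_sec: float  # mean time taken to answer a notification


def select_workers(workers, count):
    """Pick up to `count` present workers, most reliable and fastest first."""
    candidates = [w for w in workers if w.present]
    candidates.sort(key=lambda w: (w.failure_rate, w.avg_response_sec))
    return candidates[:count]


def seriousness(votes, workers_by_id):
    """Reliability-weighted share of 'yes' votes, in the range 0..1.

    votes maps worker_id -> True (suspicious) or False (not suspicious).
    """
    weights = {wid: 1.0 - workers_by_id[wid].failure_rate for wid in votes}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[wid] for wid, vote in votes.items() if vote) / total


def should_alert(score, threshold=0.5):
    """Decide whether the owner's management device is notified."""
    return score >= threshold


pool = [
    Worker("a", True, 0.10, 5.0),
    Worker("b", True, 0.50, 3.0),
    Worker("c", False, 0.00, 1.0),  # absent, so never selected
    Worker("d", True, 0.10, 2.0),
]
chosen = select_workers(pool, 3)
by_id = {w.worker_id: w for w in pool}
score = seriousness({"d": True, "a": True, "b": False}, by_id)
```

Weighting each vote by the worker's past reliability is one simple way to turn individual answers into a single seriousness value; a plain majority vote would be an equally valid starting point.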

Figure 2: A concept of our collaborative framework. All of the components can be incorporated as plug-ins. Camera owners need to pay some amount of money so that the surveillance crowds and the other components can receive small rewards for their contributions.

Components for Collaborative Surveillance
As shown in Figure 2, not only surveillance camera owners but also externally provided camera devices, data stores, and image analyzers can be plugged into our framework.

Camera devices
If device software and protocols are provided, manufacturers can easily produce various types of cameras. These security cameras need to be registered with and authenticated by the collaboration service server, with their default surveillance profiles provided. A camera is designed to act as a simple image analyzer that performs low-level detection to reduce the amount of data. Its algorithm can be installed by the manufacturer or downloaded from the location specified in the profile. When motion is detected, the camera sends security footage and/or extracted features, which are used for the subsequent high-level behavior detection, to the data store.
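The paper does not specify the low-level detection algorithm, so the following is only one plausible sketch: simple frame differencing on grayscale frames represented as nested lists. The function names and both thresholds (`pixel_threshold`, `changed_ratio`) are assumptions chosen for illustration.

```python
def motion_detected(prev_frame, curr_frame, pixel_threshold=25, changed_ratio=0.02):
    """Return True if enough pixels changed between two grayscale frames.

    Frames are lists of rows, each row a list of intensities in 0..255.
    A pixel counts as changed when its intensity differs by more than
    `pixel_threshold`; motion is reported when the changed share of all
    pixels reaches `changed_ratio`.
    """
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= changed_ratio


def blank_frame(width, height, value=0):
    """Build a uniform grayscale frame for testing."""
    return [[value] * width for _ in range(height)]


background = blank_frame(10, 10)
moving = blank_frame(10, 10)
for y in range(3):          # paint a bright 3 x 3 block to simulate an object
    for x in range(3):
        moving[y][x] = 255
```

Running such a check on the camera means that only frames with motion are uploaded, which is the data reduction the low-level stage is meant to provide.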

Data stores
Our framework can accept online data store resources to provide sufficient disk space for each camera. Providers receive a reward when their data stores are used in the framework. Camera owners can choose a suitable data storage service according to their requirements by associating their rewards with the service in the profile.
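The paper does not define the profile format, so the sketch below is purely hypothetical. It shows how a surveillance profile might tie a camera to a chosen data store and image analyzer and attach a reward to each component. Every field name, the per-event reward model, and the integer reward units are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SurveillanceProfile:
    camera_id: str
    data_store: str          # name of the chosen storage service (hypothetical)
    image_analyzer: str      # name of the chosen detection algorithm (hypothetical)
    workers_per_event: int   # how many crowd workers verify each detection
    # Reward per handled event, in arbitrary integer units, keyed by component.
    rewards: dict = field(default_factory=dict)


def cost_per_event(profile: SurveillanceProfile) -> int:
    """Total reward the owner pays for one verified event under this profile."""
    fixed = sum(v for k, v in profile.rewards.items() if k != "worker")
    return fixed + profile.rewards.get("worker", 0) * profile.workers_per_event


profile = SurveillanceProfile(
    camera_id="cam-01",
    data_store="store-A",
    image_analyzer="face-detector",
    workers_per_event=5,
    rewards={"data_store": 2, "image_analyzer": 3, "worker": 1},
)
total = cost_per_event(profile)
```

Keeping rewards in the profile lets an owner compare component combinations by cost before switching, for example from one data store to another.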

Image analyzers
Detection algorithms are designed to be plugged in so that algorithm developers and researchers can produce various types of algorithms. An image analyzer in this framework is in charge of high-level detection, which follows the low-level detection performed by a camera device. Camera owners can choose suitable algorithms. Researchers have a chance to try their algorithms in real situations even if the algorithms do not yet produce highly accurate results.

Surveillance Crowds
Our framework is designed to be open to any crowdsourcing service so that a reasonable verification process is achieved. Crowdsourcing depends on members of the general public, who do not necessarily have knowledge of surveillance. Their role is to receive notifications and to check the images as soon as possible in order to verify events detected by analyzers. Each worker is given a simple multiple-choice question defined by the camera owner. Answering each question is expected to take only a few seconds, and workers receive a small reward for each verification. Because the quality of our detection service depends heavily on the quality of the crowds, their results are recorded so that irresponsible work can be excluded.
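How recorded results might be used to exclude irresponsible work is not detailed in the paper. One simple possibility, sketched here under that caveat, is to flag workers who frequently disagree with the majority answer. The function names, the majority rule, and the `max_rate` and `min_answers` thresholds are all assumptions.

```python
from collections import defaultdict


def majority_says_yes(votes):
    """True if strictly more than half of the votes are 'yes'."""
    yes = sum(1 for v in votes.values() if v)
    return 2 * yes > len(votes)


def disagreement_counts(history):
    """Map worker_id -> [disagreements with the majority, answers given].

    history is a list of events; each event maps worker_id -> bool vote.
    """
    counts = defaultdict(lambda: [0, 0])
    for votes in history:
        consensus = majority_says_yes(votes)
        for wid, vote in votes.items():
            counts[wid][1] += 1
            if vote != consensus:
                counts[wid][0] += 1
    return counts


def excluded_workers(history, max_rate=0.5, min_answers=3):
    """Workers with enough answers whose disagreement rate exceeds max_rate."""
    return {
        wid
        for wid, (wrong, total) in disagreement_counts(history).items()
        if total >= min_answers and wrong / total > max_rate
    }


history = [
    {"a": True, "b": True, "c": False},
    {"a": False, "b": False, "c": True},
    {"a": True, "b": True, "c": False},
    {"a": True, "b": False, "c": True},
]
flagged = excluded_workers(history)
```

The majority answer is only a proxy for the truth, so a production system would likely combine it with owner feedback or known test questions before excluding anyone.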

Figure 3: The camera device used in our prototype system. We used a Logitech HD Webcam C270 as the USB webcam and a Raspberry Pi 2 Model B as the microcomputer, with a wireless USB adapter.

Figure 4: A web interface for surveillance crowds. A sentence describes what the owner wants the crowds to verify. The buttons in the middle step the footage forward and backward. Verification is done by clicking either "Yes" or "No".

Prototype Implementation
We have developed a minimal prototype of our surveillance system to demonstrate the workflow and the verification process. The camera device, the image analyzer, and the collaboration service server are implemented in Node.js¹, a server-side JavaScript environment. For the camera device, we used a USB webcam and a microcomputer (Figure 3). It captures one 320 x 240 image per second and sends it directly to the online database provided by Firebase², since low-level detection is not included in this implementation. The image analysis server executes a sample face detection program provided with OpenCV. As soon as it detects a face in an image saved in the data store, the collaboration service server creates a URL with a unique ID appended and sends it to each worker in the surveillance crowds via e-mail. The surveillance crowds consist of 15 laboratory members, who use either a personal computer or a smartphone for verification. For each notification, a yes/no question is asked on the web interface, and verification takes only a few seconds (Figure 4). Verification results from the crowds are aggregated in the collaboration service server. In a test operation lasting several hours, we observed that some of the false detections made by the face detector could be eliminated according to the verification results.

¹ https://nodejs.org/
² https://www.firebase.com
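The notification step of the prototype, in which each worker receives a URL with a unique ID appended, could look roughly like the sketch below. The prototype itself is written in Node.js; this Python version is only an illustration, and the base URL, the query parameter names (`event`, `id`), and the function name are hypothetical. Sending the e-mail is omitted.

```python
import uuid
from urllib.parse import urlencode

BASE_URL = "https://example.com/verify"  # placeholder, not the prototype's address


def build_verification_links(event_id, worker_emails):
    """Return a dict mapping each worker e-mail to a unique verification URL.

    A fresh random token is generated per worker so that each answer can be
    attributed to the worker who received that link.
    """
    links = {}
    for email in worker_emails:
        token = uuid.uuid4().hex  # 32 hexadecimal characters
        query = urlencode({"event": event_id, "id": token})
        links[email] = f"{BASE_URL}?{query}"
    return links


links = build_verification_links("evt-001", ["w1@example.com", "w2@example.com"])
```

Because the token is random rather than derived from the e-mail address, a link does not reveal who the worker is to anyone who merely sees the URL.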

Conclusion and Future Work
We proposed a framework for video surveillance systems that achieves highly accurate automated detection and alerting functions using crowdsourcing. The main components of video surveillance can be incorporated into the framework and collaborate within it, so that reasonable and stable solutions become available. In future work, we will first implement a function to embed other crowdsourcing services as surveillance crowds and set a real task for image analysis. It will then be necessary to test the feasibility of our minimal prototype and confirm that it can reduce false alerts as well as detection failures. We will also design and implement middleware that incorporates the system components into our framework, while also considering security and privacy.

References
[1] Josep Aguilera et al. 2014. System on Chip (SoC): New generation of video surveillance systems. In Security Technology (ICCST), 2014 International Carnahan Conference on. IEEE, 1–5.

[2] Jeffrey P. Bigham et al. 2010. VizWiz: Nearly Real-time Answers to Visual Questions. In UIST 2010.

[3] Neeraj J. Gadgil et al. 2014. A Web-Based Video Annotation System for Crowdsourcing Surveillance Videos. In Proc. SPIE Vol. 9027: Imaging and Multimedia Analytics in a Web and Mobile World 2014. SPIE, 1–12.

[4] Thanathip Limna and Pichaya Tandayya. 2012. Design for a flexible video surveillance as a service. In Image and Signal Processing (CISP), 2012 5th International Congress on. 197–201.

[5] Andrea Prati, Roberto Vezzani, Michele Fornaciari, and Rita Cucchiara. 2013. Intelligent Video Surveillance as a Service. In Intelligent Multimedia Surveillance Current Trends and Research. Springer International Publishing, 1–16.
