
    Mobile Media Metadata for Mobile Imaging

Marc Davis
Garage Cinema Research
School of Information Management and Systems
University of California at Berkeley
http://garage.sims.berkeley.edu
[email protected]

Risto Sarvas
Helsinki Institute for Information Technology (HIIT)
Helsinki University of Technology
http://www.hiit.fi/risto.sarvas/
[email protected]

    Abstract

Camera phones offer a new platform for digital imaging that integrates capture, programmable processing, networking, and rich user interaction capabilities. This platform can support software that leverages the spatio-temporal context and social community of image capture and (re)use to infer media content. Our Mobile Media Metadata prototype (MMM) uses a small custom client on the camera phone and its XHTML browser along with a metadata server to enable annotation at the time of image capture, leverage contextual metadata and networked metadata resources, and use iterative metadata refinement on the mobile imaging device.

    1. Introduction

Mobile phones with media creation capabilities are rapidly entering the marketplace in the USA and already have significant market presence in Asia and Europe. While camera phones offer a tremendous opportunity to create and share images, sound, and video, they also bring with them the inherent problem of media management. As we capture more and more media every day, the pressing need for solutions to manage media grows ever greater. With hundreds of pictures, human memory and browsing can manage content; when one has many thousands of pictures, the individual pictures one may want are effectively lost. The solution is metadata that describe the content of mobile media. However, we face a conundrum: automatic media analysis cannot represent media content in ways that address human concerns, and purely manual media annotation is too time-consuming for consumers. A third way is to leverage the spatio-temporal context and social community of media capture in mobile devices to infer media content.

While useful work has been done on consumer image annotation [1], the vast majority of prior research on personal image management has assumed that image annotation occurs after image capture in a desktop context. Time lag and context change affect the likelihood of the annotation task being performed as well as the photographer's accurate recall of the contextual information to be assigned to the photograph [2]. Importantly, the devices and usage context of consumer digital photography are undergoing rapid transformation from the traditional camera-to-desktop-to-network image pipeline to an integrated mobile imaging experience. The availability of mobile, networked digital imaging devices with operating systems (e.g., Symbian) and support for high-level programming languages (e.g., Java, C++) using open standards and accessible APIs means that multimedia researchers (and consumers) now have a new platform for the development of digital imaging applications that can leverage:

1) Programmable processing at the point of media capture

2) Device- and network-supplied temporal, spatial, and social metadata

3) Network resources for processing and communication

4) Rich interaction with the camera user

We have explored the integration of capture, processing, and interaction at the point of media capture in our research on Active Capture [3]. Camera phones bring a new dimension to our research by enabling the automated gathering of contextual metadata (temporal, spatial, and social, e.g., user name and presence) to create, infer, and learn annotations of media content.

Related work has looked at leveraging temporal [4] and spatial [5] information for organizing captured images. Furthermore, by utilizing the networking,


Figure 2). Using our client software on the phone, the user captures the image and selects the main subject of the image (Person, Location, Object, or Activity) before uploading it to the server. The server receives the uploaded image and the metadata gathered at the time of capture (main subject, time, date, network cellID, and user name). Based on this metadata, the server searches a repository of previously captured images and their respective metadata for similarities. The images and metadata in the repository are not limited to the user's own images and metadata, but contain every user's annotated media to leverage the advantages of shared metadata.

Using the information from previously captured images that have similar metadata, the server program generates educated guesses for the user (i.e., selection lists with the most probable metadata first). The user receives the server-generated guesses for verification, and confirms, corrects, or augments the metadata. Below we describe the system implementation in more detail by dividing it into the main parts of the metadata creation process described above.
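As a minimal sketch of this capture-upload-guess-verify loop (the class and method names below are illustrative assumptions, not the actual MMM code, which splits the work between a C++ client and a Java server as described in the following subsections):

    // Minimal sketch of the MMM round trip; all names here are hypothetical
    // and only illustrate the steps numbered in Figure 2.
    import java.util.List;

    interface GuessRepository {
        // 2a/2b: search every user's previously annotated images for similar
        // metadata and return candidate annotations, most probable first.
        List<String> guessesFor(String mainSubject, long captureTime,
                                String cellId, String userName);
    }

    class MmmRoundTrip {
        private final GuessRepository repository;

        MmmRoundTrip(GuessRepository repository) { this.repository = repository; }

        // 1a/1b: the phone captures the image, the user picks the main subject,
        // and both are uploaded with time, date, cellID, and user name.
        List<String> onUpload(byte[] image, String mainSubject, long captureTime,
                              String cellId, String userName) {
            return repository.guessesFor(mainSubject, captureTime, cellId, userName);
        }

        // 3: whatever the user confirms, corrects, or adds is stored for reuse.
        void onVerified(byte[] image, String confirmedAnnotation) {
            // persisted so that later guesses (for all users) can build on it
        }
    }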

    3.1. Image capture and metadata gathering

The client-side image capturing, user selection of the main subject, automatic gathering of metadata, and communication with the server were implemented in a C++ application named Image-Gallery. It was developed in cooperation with Futurice (www.futurice.fi) for the Symbian 6.1 operating system on the Nokia 3650 phone. The user captures the image using Image-Gallery, which automatically stores the main subject, time, date, GSM network cellID, and the user name. The image and metadata upload process was implemented in Image-Gallery and on the server side using the Nokia Image Upload API 1.1.
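Purely as an illustration of the record Image-Gallery stores alongside each picture (the actual client is Symbian C++ and uses the Nokia Image Upload API; the Java names below are hypothetical):

    // Hypothetical Java rendering of the metadata gathered at capture time;
    // the real client is a Symbian C++ application, so these names are
    // illustrative only.
    import java.util.Date;
    import java.util.LinkedHashMap;
    import java.util.Map;

    class CaptureRecord {
        // Gathers the fields uploaded with the image: main subject (chosen by
        // the user), time and date, GSM network cellID, and user name.
        static Map<String, String> atCapture(String mainSubject, String cellId, String userName) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("mainSubject", mainSubject);            // Person, Location, Object, or Activity
            record.put("captureTime", new Date().toString());  // time and date of capture
            record.put("cellId", cellId);                      // GSM cell the phone is camped on
            record.put("userName", userName);                  // identity used for social metadata
            return record;
        }
    }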

    3.2. Metadata similarity processing

The server-side metadata similarity processing was implemented in a Java module that provides a set of algorithms for retrieving metadata using the metadata of the image at hand and the repository of previously annotated images. The values returned by the metadata processing and retrieval are the guesses sorted in order of highest probability. In the MMM system we implemented two main sets of algorithms, location guessing and person guessing, which leverage the various intersections of spatial, temporal, and social similarity as illustrated in Figure 3.

[Figure: Venn diagram of the Spatial, Temporal, and Social metadata dimensions; their pairwise intersections are labeled spatial cohorts, temporal cohorts, and familiar strangers.]

Figure 3. Connecting spatial, temporal, and social metadata

The patterns in where, when, and with and of whom individuals, social groups, and cohorts take photographs have discernible regularities that we use to make inferences about photo content. Our location guesser uses a weighted sum of interrelated spatial, temporal, and social features: most recently visited location by the user; most visited location by the user in this cellID around this time; most visited location by the user in this cellID; most visited location by other users in this cellID around this time; and most visited location by other users in this cellID. Interestingly, notions of what it means to "visit" a location and "other users" are complex and rely on spatial, temporal, and social congruities. For example, I can visit a location by taking a photograph there and/or by being photographed there. Other users may be connected to a given user in a variety of intersecting spatial, temporal, and social relations.
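As a rough sketch of how such a weighted-sum scoring could look (the weights, feature names, and normalization below are assumptions for illustration; the paper does not give the actual values):

    // Illustrative weighted-sum location scorer in the spirit of the MMM
    // location guesser; weights and feature functions are assumptions.
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class LocationGuesser {
        // One candidate location with the evidence features described in the
        // text, each assumed normalized to [0, 1] before scoring.
        static class Candidate {
            String location;
            double recentlyVisitedByUser;          // most recently visited by this user
            double visitedByUserInCellAtTime;      // by this user, this cellID, around this time
            double visitedByUserInCell;            // by this user, this cellID
            double visitedByOthersInCellAtTime;    // by other users, this cellID, around this time
            double visitedByOthersInCell;          // by other users, this cellID
        }

        // Hypothetical weights favoring the user's own, temporally close evidence.
        static double score(Candidate c) {
            return 0.30 * c.recentlyVisitedByUser
                 + 0.25 * c.visitedByUserInCellAtTime
                 + 0.20 * c.visitedByUserInCell
                 + 0.15 * c.visitedByOthersInCellAtTime
                 + 0.10 * c.visitedByOthersInCell;
        }

        // Order candidates with the most probable location first, i.e., the
        // order in which they appear in the selection list shown to the user.
        static List<Candidate> rank(List<Candidate> candidates) {
            List<Candidate> sorted = new ArrayList<>(candidates);
            sorted.sort(Comparator.comparingDouble(LocationGuesser::score).reversed());
            return sorted;
        }
    }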

    3.3. Metadata and media sharing and reuse

One of the main design principles in the MMM system is to have the metadata shared and reused among all users of the system. This means that when processing the media and metadata, the system has access not only to the media and metadata the user has created before, but also to the media and metadata everyone else using the system has created.

[Figure: phone client and MMM server; 1a. image capture and gathering of metadata; 1b. image & metadata upload; 2a. metadata similarity processing; 2b. metadata & media sharing & reuse; 3. user verification.]

Figure 2. Mobile Media Metadata (MMM) system overview


While this sharing may raise privacy concerns, several factors can reduce these risks: anonymization of the metadata; aggregate use of the metadata; and leveraging social networks for metadata sharing. Metadata sharing enables us to leverage a variety of shared spatial, temporal, and social relations across many users (see Figure 3) to improve metadata guessing. For example, the probability that a given MMM user has taken a photograph at a given place and time (e.g., at home on the weekend) can be increased by another MMM user having photographed the MMM user before in that place at a similar time.

The images and their respective metadata are stored in an open-source object-oriented database (Ozone 1.1) on the server. The metadata is stored in a faceted hierarchical structure. In our structure the facets were the main subjects of the image: Person, Location, Object, and Activity. The objective of the faceted structure is for the facets to be as independent of each other as possible; in other words, one facet can be described without affecting the others. Faceted hierarchical structures enable vocabulary control during annotation and precise similarity processing for retrieval.
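The paper does not show the actual Ozone schema, so the following is only a sketch, under the assumption that each facet is an independent controlled-vocabulary hierarchy and an annotation simply collects terms from those hierarchies:

    // Illustrative faceted annotation model; these classes are assumptions
    // for explanation only, not the MMM database schema.
    import java.util.ArrayList;
    import java.util.List;

    // The four independent facets used as main subjects in MMM.
    enum Facet { PERSON, LOCATION, OBJECT, ACTIVITY }

    // A node in one facet's controlled-vocabulary hierarchy, e.g.
    // LOCATION: "Berkeley" > "UC Berkeley campus" > "South Hall".
    class FacetTerm {
        final Facet facet;
        final String label;
        final FacetTerm parent;   // null for the facet root

        FacetTerm(Facet facet, String label, FacetTerm parent) {
            this.facet = facet;
            this.label = label;
            this.parent = parent;
        }
    }

    // An annotation assigns terms per facet; because the facets are
    // independent, describing the LOCATION never constrains PERSON,
    // OBJECT, or ACTIVITY.
    class ImageAnnotation {
        final List<FacetTerm> terms = new ArrayList<>();

        void add(FacetTerm term) { terms.add(term); }

        // Similarity processing can compare annotations facet by facet.
        List<FacetTerm> termsFor(Facet facet) {
            List<FacetTerm> result = new ArrayList<>();
            for (FacetTerm t : terms) {
                if (t.facet == facet) result.add(t);
            }
            return result;
        }
    }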

    3.4. User verification

The user verification and system responses were implemented in XHTML forms. After uploading the image and metadata, the client-side Image-Gallery program launches the phone's XHTML browser to a URL given by the server during the uploading. After the server creates the metadata guesses to facilitate the user's annotation work, it creates XHTML pages from the guesses for the client-side browser to present to the user. The dialog between the server and the user is then implemented in the form data sent from the phone to the server, and the XHTML pages created by the server that are rendered by the phone's browser.
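A minimal sketch of what such a server-generated verification page could look like (the markup, form fields, and URL below are assumptions, not the MMM server's actual output):

    // Illustrative generation of an XHTML verification page from ranked
    // guesses; the element names and the form action URL are hypothetical.
    import java.util.List;

    class VerificationPage {
        // Builds a selection list with the most probable guess first, plus a
        // text field so the user can correct or augment the metadata.
        static String render(String facetName, List<String> rankedGuesses) {
            StringBuilder xhtml = new StringBuilder();
            xhtml.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>");
            xhtml.append("<form action=\"/mmm/verify\" method=\"post\">");
            xhtml.append("<p>").append(facetName).append(":</p>");
            xhtml.append("<select name=\"guess\">");
            for (String guess : rankedGuesses) {
                xhtml.append("<option>").append(guess).append("</option>");
            }
            xhtml.append("</select>");
            xhtml.append("<input type=\"text\" name=\"correction\"/>");
            xhtml.append("<input type=\"submit\" value=\"Confirm\"/>");
            xhtml.append("</form></body></html>");
            return xhtml.toString();
        }
    }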

3.5. System deployment

MMM has been deployed since September 2003 and has been used by 40 graduate students and 15 researchers at UC Berkeley's School of Information Management and Systems in a required graduate course, IS202: Information Organization and Retrieval, co-taught by Prof. Marc Davis and Prof. Ray Larson. Students used the MMM prototype and developed personas, scenarios, storyboards, metadata frameworks, and presentations for their concepts for mobile media and metadata creation, sharing, and reuse applications.

    4. Future work

In our future work, we will be integrating media analysis algorithms with our metadata inferencing algorithms, which we are further testing and refining based on the datasets from the four-month trial of our Mobile Media Metadata prototype.

    5. Acknowledgements

The authors would like to thank British Telecom, AT&T Wireless, Futurice, and Nokia for their support of this research, and the members of the Mobile Media Metadata project in Garage Cinema Research at the School of Information Management and Systems at the University of California at Berkeley.

    6. References

[1] A. Kuchinsky, C. Pering, M. L. Creech, D. Freeze, B. Serra, and J. Gwizdka, "FotoFile: A Consumer Multimedia Organization and Retrieval System," presented at the SIGCHI Conference on Human Factors in Computing Systems (CHI '99), Pittsburgh, PA, 1999.

[2] D. Frohlich, A. Kuchinsky, C. Pering, A. Don, and S. Ariss, "Requirements for Photoware," presented at the 2002 ACM Conference on Computer Supported Cooperative Work (CSCW '02), New Orleans, LA, 2002.

[3] M. Davis, "Active Capture: Integrating Human-Computer Interaction and Computer Vision/Audition to Automate Media Capture," presented at the 2003 IEEE Conference on Multimedia and Expo (ICME 2003), Special Session on Moving from Features to Semantics Using Computational Media Aesthetics, Baltimore, MD, 2003.

[4] M. D. Cooper, J. Foote, A. Girgensohn, and L. Wilcox, "Temporal Event Clustering for Digital Photo Collections," presented at ACM Multimedia 2003, Berkeley, CA, 2003.

[5] K. Toyama, R. Logan, and A. Roseway, "Geographic Location Tags on Digital Images," presented at the 11th ACM International Conference on Multimedia (MM 2003), Berkeley, CA, 2003.

[6] R. Sarvas, E. Herrarte, A. Wilhelm, and M. Davis, "Metadata Creation System for Mobile Images," presented at the Second International Conference on Mobile Systems, Applications, and Services (MobiSys 2004), Boston, MA, 2004.

[7] P. Vartiainen, "Using Metadata and Context Information in Sharing Personal Content of Mobile Users," Department of Computer Science, University of Helsinki, Helsinki, Finland, 2003, pp. 67.

[8] M. Naaman, A. Paepcke, and H. Garcia-Molina, "From Where to What: Metadata Sharing for Digital Photographs with Geographic Coordinates," presented at the 10th International Conference on Cooperative Information Systems (CoopIS 2003), Catania, Sicily, 2003.