
Evaluation of Markers for Optical Hand Motion Capture

Katharina Stollenwerk1, Anna Vögele2, Ralf Sarlette2, Björn Krüger3, André Hinkenjann1, Reinhard Klein2

1 Institute of Visual Computing, Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany, [email protected]
2 Institute of Computer Science, University of Bonn, Bonn, Germany, [email protected]
3 Gokhale Method Institute, Palo Alto, CA, USA, [email protected]

Abstract: The work at hand outlines a recording setup for capturing hand and finger movements of musicians. The focus is on a series of baseline experiments on the detectability of coloured markers under different lighting conditions. With the goal of capturing and recording hand and finger movements of musicians in mind, requirements for such a system and existing approaches are analysed and compared. The results of the experiments and the analysis of related work show that the envisioned setup is suited for the expected scenario.

    Keywords: Hand Tracking, Motion Capture, Musical Performance

    1 Introduction and Motivation

In the field of computer graphics, creating realistic and feasible animations of fine motor activity in humans is a challenge. One way to address this challenge is through data-driven animation techniques. Such data-driven techniques require high-fidelity motion data and thus depend on the frameworks used for recording. In recent years, a number of techniques and systems have been explored to record and process high-fidelity motion data. Some examples are the Microsoft Kinect [Zha12], marker-based or markerless optical motion capture systems (e.g. by Vicon [Vic16]), systems tracking angles (e.g. data gloves [TCZDR06]), image-based systems [WP09], and wearable devices [RLS09] comprising accelerometers and gyroscopes. While many of these systems are available to novice and expert users, none of the modalities is well-suited for capturing musical performance due to one or more of these reasons:

1. Impracticability: Some systems like data gloves can drastically interfere with the performance of the human hand. But when it comes to musicians or artists in general, the actions of interest depend on the dexterity of the artist's hand.

2. Inaccuracy: While marker-based systems interfere less with the performance of the hand, they often suffer from self-occlusion. This is particularly relevant for the human hand. The same holds true for depth camera-based setups, where (self-)occlusion causes problems reconstructing the poses of the actor's hand.

3. Expensiveness: Some systems, such as those for optical motion capture, require expensive equipment and are mostly unavailable to normal users (Metcalf et al. [MNC+08]).

4. Spatial resolution: Recording fine-grained details such as subtle changes of hand poses and finger articulation requires a high spatial resolution which many inexpensive systems do not meet (depth cameras, [Had12]).

5. Temporal resolution: Recording fast motion, e.g. of fingers, requires a higher temporal resolution than met by many systems, including the Kinect or data gloves.

6. Interpretation: Many representations produced during motion acquisition have to be mapped to pose information in order to be used for animation or analysis. In particular, accelerations and angular information do not map directly to locations. This would require reconstruction frameworks that are currently lacking.

Addressing these problems is vital in order to meet the visual requirements for animation and processing of animated hands. Moreover, for recording musical performance, unobtrusive and affordable tracking setups are needed that can be used in situations where musicians are able to perform naturally. The fact that many musical instruments like the piano are not very mobile and are highly sensitive to changes in temperature or humidity indicates that a system for recording musical performance needs to be easily transportable. Finally, the markers applied to the hands should not interfere with the artist's performance as such. This rules out gloves, bulky material and sticky paint as marker material.

We propose a setup that addresses most of the above concerns for recording the performance of, e.g., pianists' hand movements. This setup is based on the idea that adding UV light to a low-light scenario with UV-reactive colour-markered hands highlights the markers and mutes the musician's hands and instrument. The special marker configuration aims at fast recovery from lost markers due to, e.g., occlusion.

While our current scenario of choice aims at recording pianists, our approach to hand tracking is not limited to these musicians. It is suited to work with different instruments (e.g. harp, double bass, cello, clarinet, conga) where the musician's hands are recordable with a fixed camera and lighting setup.

    2 Review of Related Work

A range of methods for tracking full-body movement has been developed throughout the past decades. The survey of Moeslund et al. [MHK06] gives an overview of standard techniques in vision-based motion capture. However, not all methods suitable to capture full-body actions are well-suited for hand tracking. This is partly due to the role of subtle details that have a severe influence on the perceived results [JHO10]. Moreover, extreme speed variations in the performance of the fingers have to be handled. Therefore, the focus has lately also shifted towards techniques especially for tracking fine motor activity.

Hand Tracking Approaches for capturing hand and finger movements range across marker-based optical motion capture [MNC+08] (a protocol now named HAWK), position-based methods [MKY+06], image-based methods [EBN+07, BTG+12], data-driven image-based methods employing a coloured glove [WP09], methods based on depth information such as recorded by the Microsoft Kinect [OKA11], and glove-based systems such as CyberGlove [DSD08]. Wheatland et al.'s survey [WWS+15] summarises the state of the art in capturing hand and finger movement, including advantages and drawbacks of the techniques. MacRitchie and Bailey [MB13] compare the suitability of different motion capture technologies for tracking pianists' finger movements and evaluate their own approach.

Combinations of different capturing modalities have been introduced to alleviate some limitations in hand tracking such as visual self-occlusion. Arkenbout et al. [AdWB15], to name one, integrate a data glove into a Kinect-based VR system. But such additions (e.g. gloves, instrumentation for contact force data) only further restrain the musicians in their performance because they hinder flexibility and impair touch contact, or the added components are unable to meet the temporal resolution needed for capturing fast finger movements (e.g. Kinect). Therefore, it is crucial to devise a system that is able to capture a musician's performance without interfering with it.

    3 A Setup for Recording Musicians’ Hand and Finger Movements

Adequate visual capturing of virtuoso musical performance requires a recording setup able to match the performer's speed and variation in order not to limit the artist in their performance. Designing an unobtrusive system, in the sense that there is as little interference with finger movement as possible, is especially important. An approach towards such a setup is outlined in the following. It resembles [MB13], but we forego some of their heuristics through an elaborate marker design and aim at achieving higher precision in 3D estimation through stereo cameras.

    3.1 Recording Setup – Hardware and Lighting

Motion tracking is performed by two high-performance, high-quality Point Grey Grasshopper3 USB 3.0 cameras, each with a Ricoh FL-CC0814-5M 8 mm lens. Both cameras record in parallel at a frame rate of 160 frames/second and a resolution of 1600×1200 pixels. Subject to preliminary music recording tests is a YAMAHA Clavinova CLP 220 digital piano featuring 64-note polyphony, three foot pedals, a built-in speaker system and MIDI terminals. All hardware components are connected to a standard desktop PC (Intel Core i7 4930K, 3.40 GHz) equipped with two state-of-the-art SSDs; the high frame rate and high resolution of the cameras require hard disks with a high writing data rate.
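The required writing data rate can be estimated directly from the camera parameters. The following back-of-the-envelope sketch assumes uncompressed 8-bit raw (Bayer) pixel data; the actual rate depends on the pixel format and any on-camera compression.

```python
# Estimated sustained data rate for the two-camera setup described above,
# assuming uncompressed 8-bit raw (Bayer) images per frame.
width, height = 1600, 1200          # resolution in pixels
fps = 160                           # frames per second per camera
bytes_per_pixel = 1                 # 8-bit raw assumption
num_cameras = 2

bytes_per_second = width * height * bytes_per_pixel * fps * num_cameras
print(f"{bytes_per_second / 1e6:.0f} MB/s")   # roughly 614 MB/s under these assumptions
```

A sustained rate in this range exceeds what a single conventional hard disk can reliably write, which motivates the use of SSDs in the recording PC.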

In order to allow for a natural performance of the recorded artist while maintaining visibility and contrast of the markers, a combination of light sources is set up. Illumination by fluorescent tube lights enables the artist to visually control the performance of the hands. Another light source of UV LEDs is applied in order to improve tracking of colour markers attached to the skin of the hands. For an exemplary recording setup see Fig. 1.

    3.2 Design of the Marker Setup

Markers treated with black-light-reactive neon make-up (UV-reactive make-up) are applied to the skin at 21 locations per hand (42 markers in total). Available colours are orange, red, pink, violet, white, blue, green, and yellow. In order to encounter as few issues from self-occlusion as possible, the designed setup features a number of disc-shaped markers as well as a set of strips of material. Discs indicate the tips of the fingers as well as the base of each finger and thumb, i.e. the start and end position of each finger. Another disc indicates the wrist of the hand. Hence, there are 11 disc-shaped markers per hand (22 discs in total). Marker strips indicate the joints in between the bases and tips of each finger and thumb. That is, they are placed at the distal interphalangeal and the proximal interphalangeal joint of each finger as well as at the interphalangeal and metacarpophalangeal joint of the thumb. All in all, there are ten marker strips on each hand (20 in total), see Fig. 1.

Figure 1: Left: exemplary recording setup (top and side views showing cameras, light sources, and clavier). Middle: image of a hand skeleton [Hel]. Right: envisioned marker setup. Marker labels reflect their position; for fingers this results in an abbreviation consisting of the finger (Thumb, Index, Middle, Ring, and Little) and the covered joint. Joint abbreviations are spelled out: (proximal, distal) interphalangeal joints (PIPJ, DIPJ, IPJ), metacarpophalangeal joint (MCPJ), and trapeziometacarpal joint (TMCJ).
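For reference, the per-hand marker layout can be written down as a small data structure. The following sketch uses label strings modelled on the abbreviations of Fig. 1; the exact labels are illustrative, not a fixed naming scheme of our pipeline.

```python
# Hypothetical enumeration of the 21 markers per hand described above
# (labels follow the Fig. 1 convention: finger initial plus covered joint).
FINGERS = ["I", "M", "R", "L"]                    # index, middle, ring, little

discs  = ["WRIST", "T_TIP", "T_TMCJ"]             # wrist, thumb tip, thumb base
discs += [f + "_TIP"  for f in FINGERS]           # fingertips
discs += [f + "_MCPJ" for f in FINGERS]           # finger bases

strips  = ["T_IPJ", "T_MCPJ"]                     # thumb joints between base and tip
strips += [f + "_DIPJ" for f in FINGERS]          # distal interphalangeal joints
strips += [f + "_PIPJ" for f in FINGERS]          # proximal interphalangeal joints

assert len(discs) == 11 and len(strips) == 10     # 21 markers per hand, 42 in total
```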

    4 Evaluation of Marker Colours and Lighting Setup

    To find an apt setup for recording articulated hand motion, we designed several experiments.

    4.1 Experiments in a Light Box

To assess the quality of a possible markering for recording hand motions in the envisioned setup, we performed experiments in a light box. This box is a four-sided wooden frame coated with standard gray paint (RAL 7037) on the inside. Different types of light sources are installed in the box above the surface where the material is placed for image acquisition.

1. Primary light sources: (a) a set of warm white 3200 K LEDs (YUJI VTC Series High CRI LED Ribbon), (b) a set of normal white 5600 K LEDs (YUJI VTC Series High CRI LED Ribbon), (c) a fluorescent tube light (CH Lighting F18T8/6400K).

2. Secondary light source: a set of UV LEDs with a peak wavelength centred around 400 nm (Müller Licht Modell 400085, see Fig. 2, lower right chart).

Figure 2: Normed spectral distribution of the different light sources used in our experiments. LED light sources for 3200 K and 5600 K were measured at high and low (20%) intensity. Top: high intensities and fluorescent light; bottom: low intensities and UV light source.

These light sources have also been spectrally measured: full-intensity (high) light measurements were taken over 2000 ms, measurements for dimmed light (low) over 20000 ms. Dark measurements were performed for the same timings. Each measurement was conducted 20 times. By subtracting the average of the dark measurements from the average of the light measurements and normalising the result to the maximum value, we obtain normed spectral distributions (Fig. 2).
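A minimal sketch of this dark-frame subtraction and normalisation, assuming the 20 repeated spectrometer readings are available as NumPy arrays (the function and array names are illustrative, not part of our measurement software):

```python
import numpy as np

def normed_spectrum(light_readings, dark_readings):
    """Average repeated spectrometer readings, subtract the dark average,
    and normalise the result to its maximum value.

    light_readings, dark_readings: arrays of shape (n_repeats, n_wavelengths),
    here n_repeats = 20, taken with identical integration times."""
    corrected = light_readings.mean(axis=0) - dark_readings.mean(axis=0)
    corrected = np.clip(corrected, 0.0, None)   # guard against negative noise
    return corrected / corrected.max()
```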

Each of the eight shades of UV-reactive make-up was applied to a patch of paper and placed in the box. Because UV-reactive paint reflects non-UV light differently than non-UV-reactive paint [HDC07], we placed a standard colour reference target next to the painted colour patches. Each scene was recorded with one of the cameras detailed in Section 3.1. Camera parameters (e.g. shutter speed, gain, aperture) were held fixed throughout the experiments through software and hardware. Images of the colour setup were captured under the different types of illuminants (1a-1c). Additionally, in each of these settings, the light sources were dimmed to 20%, serving as a baseline setting for adding UV light (2).

    4.2 Separation of Colour Information

In order to assess how well markers are detectable in the camera images, colour ranges for each shade of make-up were computed under all six lighting conditions (each primary light source with and without additional UV light). This is done by the following procedure:

A patch of 64 × 64 pixels was cut from each of the images and for each of the eight colours. These patches were converted to HSV colour space, as it separates colour information (hue) from colour intensity (saturation) and brightness (value). To account for inhomogeneities in the application of the UV-reactive make-up onto the paper patches, each cut patch was smoothed using a Gaussian filter before further analysis. Our analysis is based on the computation of the minimum and maximum values of the HSV channels, representing the covered range of each channel, as well as each channel's mean value. Results of these computations are shown in Fig. 3.
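A minimal sketch of this per-patch analysis using OpenCV and NumPy; the Gaussian kernel size and the example file name are illustrative assumptions, not the exact parameters of our pipeline:

```python
import cv2
import numpy as np

def hsv_channel_ranges(patch_bgr):
    """Smooth a 64x64 colour patch and return (min, max, mean) per HSV channel.
    Note: OpenCV hue for 8-bit images lies in [0, 180), not [0, 360)."""
    smoothed = cv2.GaussianBlur(patch_bgr, (7, 7), 0)         # kernel size is an assumption
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)
    stats = []
    for c in range(3):                                        # H, S, V
        channel = hsv[:, :, c].astype(np.float32)
        stats.append((channel.min(), channel.max(), channel.mean()))
    return stats

# Example usage with one patch cut from an image recorded under condition (2a):
# patch = cv2.imread("patch_orange_2a.png")                   # hypothetical file name
# print(hsv_channel_ranges(patch))
```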

Figure 3: Top: field of colour patches used in the evaluation of possible colour combinations. Colours from left to right and top to bottom are: orange, red, pink, violet, yellow, green, blue, white. Bottom three rows: hue, saturation and value ranges plotted for each of the colours under setups (1a), (2a), (1b), (2b), (1c) and (2c). Ranges are tinted by their respective colour's mean hue and saturation and a fixed value of 75. Colours are displayed from top to bottom and represent the colours from the field of patches.

Most of the colours are well differentiable from one another based on their covered hue ranges under lighting conditions (1a-1c). For tracking hands, marker colours need to be well distinguishable from skin colour. According to [TP98], the range of skin colour (in HSV) largely coincides with three out of four patches from the upper row of coloured patches in lighting conditions without UV light (namely orange, red, and pink). Because skin reflects only little UV light [SAG99], this effect is much less pronounced in setups with added UV light. This was also the main reason to employ a combination of UV-reactive make-up and added UV light in a low-light setup. Inspection of the hue ranges of colours under lighting conditions (2a-2c) revealed that the colour pairs (orange, red), (red, pink), (yellow, green), and (blue, white) overlap considerably. As a consequence, these colours should either not be used at all or at least not be placed next to each other. For example, leaving out red from the triplet (orange, red, pink) enables colour segmentation of orange and pink based on hue (plus morphological operations) around their respective mean values.
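The colour segmentation mentioned above could, for instance, be realised as a hue threshold around a marker colour's measured range followed by morphological clean-up. The following OpenCV sketch assumes per-colour hue bounds from the analysis of Section 4.2; the saturation/value thresholds and kernel size are illustrative assumptions:

```python
import cv2
import numpy as np

def segment_marker(frame_bgr, hue_lo, hue_hi, sat_min=60, val_min=40):
    """Binary mask of pixels whose hue lies in [hue_lo, hue_hi] (OpenCV hue, 0-179)
    with sufficient saturation and brightness, cleaned by morphological operations."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (hue_lo, sat_min, val_min), (hue_hi, 255, 255))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes in markers
    return mask
```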

    4.3 Assessment of Colours Chosen for Markering

Based on the analysis of hue ranges, we found the colours chosen for markers (see Fig. 1) suited for tracking. For the reasons described in the previous section, we disregarded red. We further left out violet as a colour choice; it naturally hardly reacts to the combination of low-intensity light and UV light and hence exhibited the lowest V-values throughout the experiments.

The UV-reactive colours chosen for markering are (in order of fingers) pink, yellow, blue, orange, green. This has been found suitable for the following reasons: (1) Hue ranges of these colours are well distinguishable from skin colour. (2) Neighbouring colours do not overlap in their hue range; the more two colours' hue ranges overlap, the further away from one another they have been placed on the hand. This ensures that markers can be consistently detected by a combination of simple colour segmentation and geometric constraints in the marker placements. (3) Most colours (except for pink) exhibit high values in the S and/or V channels. This is beneficial as the surroundings are expected to react less to UV light than the markers.
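Criterion (2) can be checked mechanically once per-colour hue ranges are available. The sketch below uses placeholder ranges, not our measured values, and simply verifies that adjacent fingers carry non-overlapping colours:

```python
# Hypothetical check that neighbouring marker colours do not overlap in hue.
# Hue ranges (degrees) would come from the analysis of Section 4.2;
# the numbers below are placeholders only.
finger_order = ["pink", "yellow", "blue", "orange", "green"]   # thumb to little finger
hue_ranges = {"pink": (300, 340), "yellow": (45, 70), "blue": (200, 240),
              "orange": (10, 35), "green": (90, 140)}          # placeholder values

def overlaps(a, b):
    return not (a[1] < b[0] or b[1] < a[0])

for c1, c2 in zip(finger_order, finger_order[1:]):
    assert not overlaps(hue_ranges[c1], hue_ranges[c2]), (c1, c2)
```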

The final decision between white and blue has yet to be made. Both colours reside in approximately the same hue range. While blue is the more saturated colour, white is brighter (value) and could hence be better detectable. Preliminary tests have shown that the clavier also hardly reacts to UV light, which would make white UV-reactive make-up the better choice over its blue counterpart. This final decision will be made once we have evaluated the markering in combination with the clavier.

    5 Conclusions

Our experiments indicate that the use of UV light has advantages and disadvantages. In terms of skin and background distinction, using UV light has a positive effect. However, the separability of pairs of colours based on hue suffers from the UV lighting. The location of markers in our envisioned marker setup alleviates this by placing pairs of colours with overlapping hue ranges further away from each other.

    References

[AdWB15] E. A. Arkenbout, J. C. F. de Winter, and P. Breedveld. Robust hand motion tracking through data fusion of 5DT data glove and Nimble VR Kinect camera measurements. Sensors, 15(12):31644–31671, 2015.

[BTG+12] L. Ballan, A. Taneja, J. Gall, L. Van Gool, and M. Pollefeys. Motion capture of hands in action using discriminative salient points. In European Conference on Computer Vision, pages 640–653. Springer, 2012.

[DSD08] L. Dipietro, A. M. Sabatini, and P. Dario. A survey of glove-based systems and their applications. IEEE Trans. Syst., Man, Cybern., C, 38(4):461–482, 2008.

[EBN+07] A. Erol, G. Bebis, M. Nicolescu, R. Boyle, and X. Twombly. Vision-based hand pose estimation: A review. Comp. Vis. Image Underst., 108(1–2):52–73, 2007.

[Had12] A. Hadjakos. Pianist motion capture with the Kinect depth camera. In Proceedings of the Sound and Music Computing Conference, pages 303–310, 2012.

[HDC07] R. D. Hersch, P. Donzé, and S. Chosson. Color images visible under UV light. ACM Trans. Graph., 26(3), 2007.

[Hel] Hellerhoff. Own work, CC BY-SA 3.0. https://commons.wikimedia.org/w/index.php?curid=12143891.

[JHO10] S. Jörg, J. Hodgins, and C. O'Sullivan. The perception of finger motions. In Proc. Symp. Applied Perception in Graphics and Visualization, pages 129–133. ACM, 2010.

[MB13] J. MacRitchie and N. J. Bailey. Efficient tracking of pianists' finger movements. Journal of New Music Research, 42(1):79–95, 2013.

[MHK06] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst., 104(2):90–126, 2006.

[MKY+06] K. Mitobe, T. Kaiga, T. Yukawa, T. Miura, H. Tamamoto, A. Rodgers, and N. Yoshimura. Development of a motion capture system for a hand using a magnetic three dimensional position sensor. In ACM SIGGRAPH, volume 102, 2006.

[MNC+08] C. D. Metcalf, S. Notley, P. Chappell, J. Burridge, and V. Yule. Validation and application of a computational model for wrist and hand movements using surface markers. IEEE Trans. Biomed. Eng., 55(3):1199–1210, 2008.

[OKA11] I. Oikonomidis, N. Kyriazis, and A. Argyros. Efficient model-based 3D tracking of hand articulations using Kinect. In Proceedings of the British Machine Vision Conference, pages 101.1–101.11. BMVA Press, 2011.

[RLS09] D. Roetenberg, H. Luinge, and P. Slycke. Xsens MVN: Full 6DOF human motion tracking using miniature inertial sensors. Xsens Technologies, Tech. Rep., 2009.

[SAG99] M. Störring, H. J. Andersen, and E. Granum. Skin colour detection under changing lighting conditions. In 7th Symposium on Intelligent Robotics Systems, pages 187–195, 1999.

[TCZDR06] A. Tognetti, N. Carbonaro, G. Zupone, and D. De Rossi. Characterization of a novel data glove based on textile integrated sensors. In Proc. of the 28th IEEE EMBS Annual International Conference, pages 2510–2513, 2006.

[TP98] S. Tsekeridou and I. Pitas. Facial feature extraction in frontal views using biometric analogies. In 9th European Signal Processing Conference (EUSIPCO), pages 1–4, 1998.

[Vic16] Vicon Motion Systems Ltd. Camera systems. http://www.vicon.com/products/camera-systems, 2016.

[WP09] R. Y. Wang and J. Popović. Real-time hand-tracking with a color glove. ACM Trans. Graph., 28(3):63:1–63:8, 2009.

[WWS+15] N. Wheatland, Y. Wang, H. Song, M. Neff, V. Zordan, and S. Jörg. State of the art in hand and finger modeling and animation. Computer Graphics Forum, 34(2):735–760, 2015.

[Zha12] Z. Zhang. Microsoft Kinect sensor and its effect. IEEE Multimedia Mag., 19(2):4–10, 2012.
