There are many standards for describing human body posture and
gestures in the various related fields of human-computer
interaction, psychology, computer graphics and animation, motion
capture, and so forth. This project is an attempt to unify those
approaches and to specify a language that can serve all related
disciplines. We propose ways to achieve a flexible language that
decouples "gesture recognition" methods (in the wide sense) from
the applications that use them. The benefits of such a unifying
approach include fast and easy technology changes without
modifications to the content modules, development of source
methods independently of target applications, and automatic
translators between language descriptors. In terms
of applications, this has immediate benefits for humans in
immersive training and simulation applications, for interaction
with mixed reality environments (e.g., for situational awareness),
for human activity analysis (surveillance), and for automated
translators of cultural body language.
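The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GestureDescriptor` fields, the two stand-in recognizers, and the two target applications are hypothetical names invented for this sketch and are not part of the proposed language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class GestureDescriptor:
    """Hypothetical common descriptor; the field names are assumptions."""
    body_part: str     # e.g. "right_hand"
    action: str        # e.g. "wave"
    confidence: float  # recognizer confidence in [0, 1]


# A source turns raw sensor data into descriptors; a target consumes them.
Source = Callable[[dict], List[GestureDescriptor]]
Target = Callable[[GestureDescriptor], str]


def mocap_source(frame: dict) -> List[GestureDescriptor]:
    """Stand-in for a motion-capture recognizer (hypothetical)."""
    if frame.get("right_hand_speed", 0.0) > 1.0:
        return [GestureDescriptor("right_hand", "wave", 0.9)]
    return []


def vision_source(frame: dict) -> List[GestureDescriptor]:
    """Stand-in for a camera-based recognizer (hypothetical)."""
    if frame.get("hand_blob_oscillates", False):
        return [GestureDescriptor("right_hand", "wave", 0.7)]
    return []


def training_sim_target(g: GestureDescriptor) -> str:
    """Stand-in for an immersive training application."""
    return f"avatar plays '{g.action}' with {g.body_part}"


def surveillance_target(g: GestureDescriptor) -> str:
    """Stand-in for an activity-analysis application."""
    return f"log: {g.action} detected ({g.confidence:.0%})"


def run(source: Source, targets: Dict[str, Target], frame: dict) -> Dict[str, List[str]]:
    """Route every descriptor produced by one source to all targets."""
    descriptors = source(frame)
    return {name: [t(d) for d in descriptors] for name, t in targets.items()}
```

Because both recognizers emit the same descriptor type, swapping `mocap_source` for `vision_source` requires no change to either target, which is the kind of technology change without content modification that the text has in mind.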
Examples
Descriptors
Visemes
The English language requires 22 visemes to describe all mouth
shapes that produce English phonemes. The so-called McGurk effect
shows that both the sound and the visuals matter when perceiving
speech. Microsoft's text-to-speech API uses the 13 Disney visemes,
which were deemed sufficient to render cartoon characters that
speak. A possible mapping between the two sets is shown on this
good page about visemes. Visemes alone are not sufficient for
modern animation demands, primarily because phonemes and visemes
do not simply occur at the same time. Instead, speakers
"co-articulate" phonemes: the mouth prepares for the next phoneme
while the prior one is still sounding. How animation can move to
the anatomically based FACS (Facial Action Coding System)
description is detailed in this viseme article at Gamasutra.
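To make the co-articulation point concrete, the sketch below blends neighboring visemes over time instead of switching between them instantly. It is a deliberately simple linear cross-fade, not a model from the literature and not the CGL notation; `VisemeSegment`, `blend_weights`, the `lead` parameter, and the viseme labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VisemeSegment:
    viseme: str   # illustrative label such as "M" or "AA"
    start: float  # time at which the phoneme starts sounding, in seconds
    end: float    # time at which the phoneme stops sounding, in seconds


def blend_weights(track: List[VisemeSegment], t: float,
                  lead: float = 0.05) -> Dict[str, float]:
    """Return the mouth-shape weights at time t.

    Each viseme starts forming `lead` seconds before its phoneme
    sounds and fades out over the last `lead` seconds of its
    segment, so adjacent visemes overlap (assumed linear ramps).
    """
    weights: Dict[str, float] = {}
    for seg in track:
        ramp_in = (t - (seg.start - lead)) / lead  # reaches 1 at seg.start
        ramp_out = (seg.end - t) / lead            # reaches 0 at seg.end
        w = max(0.0, min(1.0, ramp_in, ramp_out))
        if w > 0.0:
            weights[seg.viseme] = weights.get(seg.viseme, 0.0) + w
    return weights
```

With back-to-back segments, the outgoing ramp of one viseme and the incoming ramp of the next cover the same interval, so the mouth is already partly in the next shape while the current phoneme is still sounding.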
The following example shows how to represent visemes in the
suggested CGL.
Skeleton Tree
This skeleton tree could be employed in a common gesture
language. It is not as detailed as it should be, but any common
gesture language should be extensible to allow for future
integration of finer detail. Unlike most other human skeletons,
this tree includes FACS action units.
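As a rough illustration of such a tree, the Python sketch below builds a very coarse skeleton in which facial action units hang off the face node like any other child, and in which finer detail can be attached later. The node names, the `kind` field, and the choice of joints are assumptions for this sketch, not the tree proposed here; the three action units (AU1, AU12, AU26) are standard FACS units.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    name: str
    kind: str = "joint"  # "joint" or "action_unit" (assumed categories)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Node"]:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def paths(self, prefix: str = "") -> Iterator[str]:
        """Yield the slash-separated path of every node in the subtree."""
        path = f"{prefix}/{self.name}"
        yield path
        for child in self.children:
            yield from child.paths(path)


def build_skeleton() -> Node:
    """Build a deliberately coarse example tree (hypothetical layout)."""
    root = Node("pelvis")
    spine = root.add(Node("spine"))
    head = spine.add(Node("neck")).add(Node("head"))
    face = head.add(Node("face"))
    # FACS action units are ordinary nodes of the same tree.
    face.add(Node("AU1_inner_brow_raiser", kind="action_unit"))
    face.add(Node("AU12_lip_corner_puller", kind="action_unit"))
    face.add(Node("AU26_jaw_drop", kind="action_unit"))
    elbow = spine.add(Node("right_shoulder")).add(Node("right_elbow"))
    elbow.add(Node("right_wrist"))
    return root
```

Extending the tree with finer detail does not disturb existing paths: `build_skeleton().find("right_wrist").add(Node("right_index_finger"))` adds a finger below the wrist while every previously valid path stays valid.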
Publications and Presentations
Presentation to DoN Chief Information Officer, Business Standards
Council, 2/21/06
Kölsch, M. and Martell, C.: Towards a Common Human Gesture
Description Language, Workshop on Mixed Reality User Interfaces,
at VR 2006. (preliminary pdf)
Kölsch, M. and Martell, C.: Body Gestures - Recognition,
Translation, Interpretation, and Display, NPS Technical Report,
upcoming, 2006/1.