Visualization and Analysis for Multimodal Presentation (VAMP)

04:00 PM - 04:25 PM on July 17, 2016, Room CR7

Ben Leong

VAMP is a Python-based toolkit consisting of a set of scripts that operate on synchronized video, audio, and skeletal data, transforming them into feature contours for visualization and analysis. While the core function emphasized in this talk is the application of a recursive parser to extract expressive body language features from the skeletal data captured by a Kinect sensor, VAMP is designed as a general toolkit for developing and standardizing algorithms that extract multimodal features and build automated scoring models for multimodal educational assessments, e.g., public speaking in a classroom.


Body language plays an important role in learning and communication. For example, communication research has demonstrated that mathematical knowledge can be embodied in the gestures made by teachers and students. In addition, speakers use body postures and gestures in oral presentations for communicative purposes. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. The introduction of depth sensors, such as the Microsoft Kinect, into commercial hardware has greatly facilitated research and development in this area. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for extracting multimodal features and creating automated scoring models. In this talk, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community, and we share this endeavor with the Python community at large.
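
To make the idea of a feature contour concrete, here is a minimal sketch that computes one simple expressive feature, the per-frame distance between the two wrists (a rough "body expansiveness" measure), from Kinect-style 3D joint positions. The function name and the feature choice are illustrative assumptions for this abstract, not VAMP's actual API.

```python
import math

def expansiveness_contour(wrist_left, wrist_right):
    """Per-frame Euclidean distance between the wrists.

    Each argument is a sequence of (x, y, z) joint positions in
    meters, one tuple per synchronized frame, as a Kinect skeletal
    stream might provide. Returns a list of floats: one feature
    value per frame, i.e. a simple feature contour over time.
    (Hypothetical example; not VAMP's actual API.)
    """
    return [math.dist(l, r) for l, r in zip(wrist_left, wrist_right)]

# Synthetic 3-frame example: the arms spread apart along the x-axis.
left = [(-0.1, 1.0, 2.0), (-0.3, 1.0, 2.0), (-0.5, 1.0, 2.0)]
right = [(0.1, 1.0, 2.0), (0.3, 1.0, 2.0), (0.5, 1.0, 2.0)]
contour = expansiveness_contour(left, right)  # ≈ [0.2, 0.6, 1.0]
```

A contour like this can then be plotted against the video timeline or aggregated (mean, range, rate of change) into features for a scoring model.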