Video-based activity recognition in trauma resuscitation
International Conference and Workshops on Automatic Face and Gesture Recognition
We present a system for automated transcription of trauma resuscitation in the emergency department (ED). Using video from a single ceiling-mounted camera, our goal is to track and transcribe the medical procedures performed during a patient’s resuscitation, together with their start times and durations. In this multi-agent, multitask setting, we represent procedures as high-level concepts composed of low-level features based on the patient’s pose, scene dynamics, clinician motions and device locations. In particular, the low-level features are transformed into intermediate action attributes (e.g., “hand grasping of an object of interest”), which serve as building blocks for describing procedures. Procedures are expressed as first-order logic statements in an activity grammar that compactly captures spatio-temporal interactions among attributes. The probabilities of feature observations and the logical semantics are combined in a Markov Logic Network (MLN). At runtime, a Markov network is dynamically constructed to represent the hypothesized procedures, their spatio-temporal relationships and the attribute probabilities. Inference on this network determines the most consistent sequence of procedures over time. Our activity model is modular and extensible to a multitude of sensor inputs and detection methods, which makes it adaptable to many activity recognition problems. In this paper, we demonstrate our approach on videos of simulated trauma resuscitations. The recognition accuracy obtained confirms the suitability of our framework.
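To make the MLN combination step more concrete, the sketch below shows the generic MLN idea on a toy problem. It is not the authors’ rule set, which the abstract does not give. In an MLN, a world x has probability P(x) = exp(Σ_i w_i n_i(x)) / Z, where n_i(x) counts the true groundings of weighted formula i. The most consistent assignment (MAP) is therefore the one that maximizes Σ_i w_i n_i(x). Everything concrete here is a hypothetical assumption made for illustration: the predicates `Grasp(t)` and `Active(t)`, the per-frame detector probabilities in `GRASP_PROB`, and the three rule weights. Attribute probabilities enter as soft evidence, modeled as unit clauses weighted by their log-odds. Exhaustive enumeration stands in for the scalable MLN inference a real system would need.

```python
import itertools
import math

# Hypothetical per-frame probabilities that a low-level detector reports the
# attribute "hand grasping an object of interest" (illustrative values only).
GRASP_PROB = [0.9, 0.95, 0.1, 0.05]

# Hypothetical rule weights; in a real MLN these would be learned from data.
W_GRASP_IMPLIES_ACTIVE = 1.5  # Grasp(t) => Active(t)
W_SMOOTH = 1.0                # Active(t) <=> Active(t+1)
W_PRIOR_INACTIVE = 0.8        # !Active(t)


def logit(p: float) -> float:
    """Log-odds of p, used as the weight of a soft-evidence unit clause."""
    return math.log(p / (1.0 - p))


def world_score(grasp, active, probs):
    """Sum of the weights of all satisfied ground formulas: sum_i w_i * n_i(x)."""
    score = 0.0
    for t, p in enumerate(probs):
        # Soft evidence: unit clause Grasp(t), weighted by the detector's log-odds.
        if grasp[t]:
            score += logit(p)
        # Grasp(t) => Active(t) holds unless grasp is true and active is false.
        if (not grasp[t]) or active[t]:
            score += W_GRASP_IMPLIES_ACTIVE
        # Weak prior that the (hypothetical) procedure is not running.
        if not active[t]:
            score += W_PRIOR_INACTIVE
    # Temporal smoothness between consecutive frames.
    for t in range(len(probs) - 1):
        if active[t] == active[t + 1]:
            score += W_SMOOTH
    return score


def map_inference(probs):
    """Exhaustive MAP search over all truth assignments (feasible only for tiny T)."""
    n = len(probs)
    best = None
    for grasp in itertools.product((False, True), repeat=n):
        for active in itertools.product((False, True), repeat=n):
            s = world_score(grasp, active, probs)
            if best is None or s > best[0]:
                best = (s, grasp, active)
    return best


if __name__ == "__main__":
    score, grasp, active = map_inference(GRASP_PROB)
    print("grasp: ", [int(g) for g in grasp])
    print("active:", [int(a) for a in active])
    print(f"score:  {score:.3f}")
```

With these toy weights, the MAP assignment marks the procedure active only in the first two frames, where the grasp evidence is strong. The smoothness rule and the prior settle the remaining frames. Brute force scales as 2^(2T), so it only serves to show what the inference optimizes, not how a practical system would compute it.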