Exploring how context, culture, and character matter in avatar research

已发布 2024年3月18日

作者 Marta Wilczkowiak (SHE/HER) , Principal Science Lead, Mesh Labs Sean Rintel , Senior Principal Research Manager Mar Gonzalez-Franco , Staff Research Scientist

分享这个页面

This research paper was presented at the IEEE VR Workshop Series on Animation in Virtual and Augmented Environments (opens in new tab) (ANIVAE 2024), the premier series on 3D content creation for simulated training in extended reality.

Face-to-face communication is changing, moving beyond physical interaction to include video conferencing and AR/VR platforms, where the participants are represented by avatars. Sophisticated avatars, animated through motion tracking, can realistically portray their human counterparts, but they can also suffer from noise, such as jitter and distortion, reducing their realism. Advances in motion-capture technology aim to reduce such issues, but they come with higher development costs and require additional time due to the need for advanced components. While some noise is inevitable, it’s important to determine acceptable types and levels to efficiently develop and introduce AR/VR devices and avatars to the market. Additionally, understanding how noise impacts avatar-based communication is essential for creating more inclusive avatars that accurately represent diverse cultures and abilities, enhancing the user experience.

In our paper, “Ecological Validity and the Evaluation of Avatar Facial Animation Noise,” presented at ANIVAE 2024, we explore the challenge of evaluating avatar noise without a standardized approach. Traditional methods, which present participants with isolated facial animation noise to gauge perception thresholds, fall short of reflecting real-life avatar interactions. Our approach emphasizes ecological validity—the extent to which experiments mimic real-world conditions—as central in assessing avatar noise. We discovered this significantly influences participants’ response to avatars, highlighting the impact of context on noise perception. Our goal is to improve avatar acceptance, inclusivity, and communication by developing noise evaluation methods that better represent actual experiences.

Seeing the big picture

To set up our study, we animated two avatars using motion capture, as depicted in Figure 1 (A). We recorded the performance of two professional actors enacting a scene between an architect and a client discussing home renovations and examining a 3D model of the proposed design. We used two proprietary characters for the avatars, whose faces were animated with 91 expression blendshapes. This allowed for a broad range of facial expressions and subtle variations in emotions, contributing to a more realistic animation. To examine different dynamics, we created six variations of the scene, changing the characters’ gender, role, and whether they agreed on the renovation plan.

Figure 1: A. Motion capture of a social interaction scenario for the experiment. B. The motion capture was remapped to stylized avatars. C. Participants experienced the scene wearing a HoloLens 2 and responded to questions on a tablet app. D. The avatars’ facial features were degraded with different types of animation noises of varying severity.

Fifty-six participants engaged in two experiments to evaluate the impact of noise on avatar facial animation. The first experiment had low ecological validity. Participants viewed fragmented clips of dialogue through a Microsoft HoloLens 2 device and used a slider to adjust any noise to an acceptable level. The second experiment featured high ecological validity, showing the scene in its full social context. Here, participants used a HoloLens 2 to judge the noise in facial expressions as either “appropriate” or “inappropriate” for the conversation. In contrast to the first experiment, this method considered the social aspects of context, culture, and character.

Results indicate that noise was less distracting when participants viewed the scene in its entirety, revealing a greater tolerance for noise in high ecological validity scenarios. Isolated clips, on the other hand, led to greater annoyance with facial animation noise, suggesting the importance of social context over hyper-realistic animation.

Cultural observations showed that noise perception was influenced by implicit cultural norms, particularly around gender roles and agreement levels. For example, in the second experiment, where participants viewed the conversation within its greater social context (high ecological validity), noise was deemed “appropriate” when the female architect agreed with the male client and “inappropriate” when she disagreed, revealing potential gender biases not observed in reversed gender roles. These findings emphasize the importance of applying high ecological validity in experiments to uncover socio-cultural influences on avatar perception. They also underscore the need to carefully consider context and cultural dynamics in avatar design.

Finally, we explored the character trait of empathy. Participants with lower empathy scores were more critical of noise in context-rich scenarios. This indicates that experiments focusing solely on low ecological validity might overlook important insights on how empathy influences responses to avatar facial animation noise.

Avatars need to be studied in realistic situations

When people communicate, they engage in a complex process influenced by environment, cultural background, and the nonverbal cues they perceive and interpret. By prioritizing high ecological validity in studies on avatar perception, researchers can uncover these socio-cultural influences and trust that their findings are relevant and applicable to real-life interactions within digital spaces.

Our research examines how different combinations of demographic characteristics change the way people react to avatars, and we hope to encourage more inclusivity in avatar design. It’s essential to have an established set of guidelines to achieve this goal, and this work is one step in that direction. While our study’s scope is limited, its methodology can be applied broadly across different devices and settings.

Acknowledgements

We would like to thank Ken Jakubzak, James Clemoes, Cornelia Treptow, Michaela Porubanova, Kerry Read, Daniel McDuff, Marina Kuznetsova and Mathew Lamb for their research collaboration. We would also like to thank Shawn Bruner for providing the characters for the study and Panagiotis Giannakopoulos for leading the animation and motion capture pipelines.