Social Media through Voice: Synthesized Voice Qualities and Self-presentation
- Lotus Zhang
- Lucy Jiang
- Nicole Washington
- Augustina Ao Liu
- Jingyao Shao
- Adam Fourney
- Meredith Ringel Morris
- Leah Findlater
With advances in expressive speech synthesis and conversational understanding, an ever-increasing amount of digital content, including social and personal content, can be consumed through voice. Voice has long been known to convey personal characteristics and emotional states, both of which are prominent aspects of social media. Yet no study has investigated voice design requirements for social media platforms. We interviewed 15 active social media users about their preferences for using synthesized voices to represent their profiles. Our findings show that participants want control over how a voice delivers their content, such as the personality and emotion with which the voice speaks, because these prosodic variations can affect users’ online personas and interfere with impression management. We report participants' motivations for customizing or not customizing voice characteristics in different scenarios, and uncover key challenges around usability and the potential for stereotyping. We argue that synthesized speech for social media should be evaluated not only on listening experience and voice quality but also on its expressivity, degree of customizability, and ability to adapt to context (e.g., social media platforms, groups, individual posts). We discuss how our contribution confirms and extends knowledge of voice technology design and online self-presentation, and offer design considerations for voice personalization in social interactions.