Building tech that reflects our diversity
December 16, 2020
Voice assistant use in the U.S. has been growing steadily since 2017. Estimates suggest that about 40% of all U.S. internet users – more than 122 million people – will use voice assistants by 2021. Those 122 million people, who are adopting voice assistants into their daily routines, form a diverse population that includes male, female, and non-binary individuals. But voice assistants don’t reflect this diversity.
The term non-binary describes people who sometimes or always identify as not being limited to an exclusively male or female gender. In the U.S., 12% of millennials identify as transgender or gender non-conforming and 56% of Gen Z’ers know someone who uses gender-neutral pronouns, such as they/them, to describe themselves. Both figures reflect a growing openness to non-binary identities. U.S. devices, by contrast, overwhelmingly have female voices. Male text-to-speech (TTS) voices have been developed but aren’t often used. This is true even when devices provide male voices as options, because the vast majority of users never touch the default settings on their devices. And so far, there’s been very little effort put toward creating non-binary voice assistants that can represent and include these individuals.
This limited diversity is proving problematic. A 2019 UNESCO report, I’d Blush If I Could, showed that designing only female voice assistants encourages negative behavior, both with the assistants and with real people. Already, 5% of interactions with voice assistants are sexually explicit and 30% are off-topic, abusive, romantic or sexual.
What’s more, interactions with voice assistants are inherently social interactions; in a study conducted by Google, 41% of people said that talking to a voice assistant felt like talking to a friend or another person. But in nearly all of these interactions, current U.S. voice assistant users can only “talk” to a female AI. To address this, Accenture Labs partnered with CereProc to create Sam: a non-binary text-to-speech voice.
While pitch is the first thing that comes to mind when thinking about voices, gender presentation is much more than that. It’s a combination of pitch, intonation, and word choice. And non-binary speech is made up of a combination of male and female speech patterns, meaning there is no one non-binary voice. So, to create our “Sam” voice, we didn’t just alter the pitch of the voice to make it non-binary. We also incorporated transcribed audio data from non-binary individuals to influence speech patterns and intonation.
We included the non-binary community in the design and development of the voice, conducting two surveys to get their feedback throughout the design process and ensure they felt comfortable with the voice representing them. This helped shape the sound of the voice significantly. We got amazing feedback from the community, with one individual even stating:
“…this is my favorite I have ever heard and would buy it in an instant if it were available … I love this voice.”
We’re also planning on conducting a study to better understand how voice assistant users perceive Sam.
We’re proud and excited about this work, but it’s just the beginning. There’s so much opportunity in this space, including localization; right now, the voice has been created for US English, but we’d like to see it expanded for use in other languages and to reflect other dialects or accents.
These and other opportunities are why we’re excited to announce that we’ve open-sourced the audio recordings used to generate Sam, as well as a version that uses an open-source engine and details about how we created our voice. We hope that others will use these tools in the development of their own voice assistants.
We look forward to the day when our voice assistants represent the diversity of the users in the world and when users can choose the voice assistant gender that they prefer. Join us in making this a reality.
For more information about this research, contact Andreea Danielescu. To request access to the open-sourced audio recordings and engine, click here.