The rise of voice assistants is here: it started with Amazon’s Alexa, closely followed by Google Home housing the Google Assistant, which led the likes of Microsoft and Apple to improve Cortana and Siri. Even start-ups are looking to get in on the action! We’re now seeing an increase in voice interactions: by 2020, 50% of all searches are expected to be done by voice and 30% without a screen [3]. What’s more, 55% of people are expected to have a smart speaker in their homes by 2022 [3].

Even with around 70,000 Alexa skills and roughly 5,000 Google Actions available to consumers [2], we still see an abundance of voice interactions fail to grip and please customers. I’ve developed a list of best practices below to help you create simple, gripping, dynamic, frustration-free voice experiences for your customers.

Voice-first approach

With many of today’s devices also supporting a screen display, it’s easy to design and implement a virtual assistant that follows a GUI (Graphical User Interface) pattern. This, however, is not always the best experience. Today’s virtual assistants allow us to move away from the typical IVR (Interactive Voice Response) approach of “Press 1 for billing information, Press 2 for account details” and so on. Instead, you can create conversations that are more fluid and natural. The key to this is always keeping voice as the main interaction and using any displays as support to give a more immersive experience. Not all virtual assistants have screens. Around 1 in 20 smart speaker owners have a smart display [1], so you’ll exclude most users if your voice interactions require the use of a screen. You want to keep your customers focused on the conversation itself, and not distracted by visuals.

A voice-first approach allows customers to get exactly what they need through simple commands. Rapid responses leave customers feeling satisfied. Visuals can enhance the conversation without taking away from the key information provided by voice. Consider a simple interaction with Amazon’s Alexa when asking for the weather.


The customer gets a simple voice response telling them the current weather in their location. The screen shows the same information plus details that are not in the voice response, such as the times during the day when the weather changes. This simple interaction enhances the user’s experience by providing exactly what they asked for through voice, and further information through visuals.
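As a rough illustration of this split, the sketch below always produces a complete spoken answer and attaches visual detail only when the device reports a screen. The `Response` type, the `weather_response` function, and the `has_screen` flag are hypothetical names chosen for this example, not part of any platform SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    speech: str                     # always present: voice carries the answer
    display: Optional[dict] = None  # optional extras for devices with a screen


def weather_response(summary: str, hourly: List[str], has_screen: bool) -> Response:
    """Answer fully by voice; add visual detail only when a screen exists."""
    speech = f"Right now it's {summary}."
    display = {"title": summary, "hourly": hourly} if has_screen else None
    return Response(speech=speech, display=display)
```

The spoken text is identical with or without a screen, so a customer on a speaker-only device never misses the key information.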

Keep it simple

When first creating your use cases for the virtual assistant, keep it simple. It’s easy to want to offer everything you have. Start off with a few important, common user interactions to give a uniform experience. Nothing too intricate or complicated; interacting with voice should be easy.

From this initial build, and any analytics you implement, you can learn how customers interact with your virtual assistant. This feedback can then be used to refine your use cases. It’s important to understand what your customers are asking for and how they want to interact. Try to capture the utterances that customers say regularly but the virtual assistant doesn’t understand, and look to support them in future implementations.
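One lightweight way to capture those missed utterances is to count them whenever the assistant falls back to its "I didn't understand" response. This is a minimal sketch; the `UnhandledUtteranceLog` class is a hypothetical helper, and a real deployment would more likely write to whatever analytics store you already use.

```python
from collections import Counter
from typing import List, Tuple


class UnhandledUtteranceLog:
    """Counts utterances that ended in the fallback response."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, utterance: str) -> None:
        # Normalize case and whitespace so near-identical phrases group together.
        normalized = " ".join(utterance.lower().split())
        self._counts[normalized] += 1

    def top(self, n: int = 5) -> List[Tuple[str, int]]:
        """Return the n most frequent unhandled utterances with their counts."""
        return self._counts.most_common(n)
```

Reviewing the output of `top()` on a regular basis gives a ranked list of candidates for the next set of use cases.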

Slowly, you can start building up the features offered by the virtual assistant, constantly refining them through user feedback.

Interact naturally, give it character, personalize the experience

In addition to keeping it simple, you’ll want to make your virtual assistant sound as natural as possible. The right tone can vary depending on the industry that your virtual assistant sits in, but it should always respond naturally rather than give a rigid, robotic response. With voice, people expect a conversation, not a form to fill out.

A great way to add character to your virtual assistant is through Speech Synthesis Markup Language (SSML). The ability to add audio, breaks, and emphasis, and to modify prosody (the volume, pitch, and rate of the tagged speech) gives your virtual assistant more character and makes it sound more natural.
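To make this concrete, here is a small sketch that assembles an SSML greeting using the `audio`, `emphasis`, `break`, and `prosody` elements. The function name and the example URL are assumptions for illustration, and support for individual tags and attribute values varies between platforms and voices, so check your platform's SSML reference before relying on any of them.

```python
from xml.sax.saxutils import escape, quoteattr


def build_greeting_ssml(name: str, chime_url: str) -> str:
    """Build an SSML greeting with an intro sound, emphasis, a pause and prosody."""
    safe_name = escape(name)  # escape &, < and > so the markup stays well-formed
    return (
        "<speak>"
        f"<audio src={quoteattr(chime_url)}/>"  # short intro sound
        f'Hi <emphasis level="moderate">{safe_name}</emphasis>,'
        '<break time="300ms"/>'  # brief pause before the welcome
        '<prosody rate="slow" pitch="+5%">welcome back!</prosody>'
        "</speak>"
    )


if __name__ == "__main__":
    # Hypothetical audio URL, for illustration only.
    print(build_greeting_ssml("Faisal", "https://example.com/intro.mp3"))
```

Escaping the customer's name matters because any user-supplied text containing `&` or `<` would otherwise produce invalid markup.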

Now that your virtual assistant has some personality, you’ll want to look at personalizing the experience for the user, too. If you have information such as their name available, greet them with it. If they’re a returning user, customize the experience so that they may hear something different and relevant to them the next time they interact with your virtual assistant. Dynamic responses are important—you don’t want to bore your customers with the same speech outputs every time. Find various ways you can respond to the customer while still providing the same information. Depending on your use case, you can take it even further. For example, an Alexa Skill I built that recommends drinks was integrated with a weather API. Depending on the weather in the customer’s location, the skill would recommend a different set of drinks; maybe a hot chocolate for a rainy day or a lemonade for when the sun is out.
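The sketch below shows the general idea of combining context (the weather) with varied phrasing. It is not the code of the skill described above: the drink lists, the templates, and the `recommend_drink` function are made up for this example, and the weather condition is passed in as a plain string rather than fetched from a real weather API.

```python
import random

# Hypothetical mapping from a weather condition to suitable drinks.
DRINKS_BY_WEATHER = {
    "rain": ["a hot chocolate", "a chai latte"],
    "sun": ["a lemonade", "an iced tea"],
}
DEFAULT_DRINKS = ["a coffee"]

# Several phrasings for the same information, so repeat users hear variety.
TEMPLATES = [
    "How about {drink}?",
    "I'd go for {drink} today.",
    "With this weather, {drink} sounds right.",
]


def recommend_drink(condition: str, rng: random.Random) -> str:
    """Pick a drink that suits the weather and phrase it in a varied way."""
    drinks = DRINKS_BY_WEATHER.get(condition, DEFAULT_DRINKS)
    return rng.choice(TEMPLATES).format(drink=rng.choice(drinks))


if __name__ == "__main__":
    print(recommend_drink("rain", random.Random()))
```

Passing the random generator in as a parameter keeps the function easy to test with a fixed seed.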

Take the simple interaction of the two examples below. You can see that with the latter, the experience for the customer is far more pleasant. The personalization and natural interaction with simple things such as saying a name and welcoming a repeat customer can make all the difference to customers returning to your virtual assistant.


Interaction One:

Customer: I’d like to book an appointment

VA: On what day?

Customer: Next Thursday

VA: At which time?

Customer: 3 pm

VA: Your appointment has been booked. An email has been sent to verify. Thank you, goodbye.


Interaction Two:

Customer: I’d like to book an appointment

VA: <audio Intro> Hi Faisal, welcome back! I can certainly help with that. What day would you like to book your appointment?

Customer: Next Thursday

VA: Perfect, I can see we have available slots. Do you have a preference for the time of your appointment?

Customer: 3 pm

VA: Great, that’s available! I’ve gone ahead and booked that for you, and you should receive an email confirmation shortly. Is there anything else I can assist you with today?

Contextual awareness

Human conversations can feel hard to replicate when developing a voice application. We can start off talking about one subject, veer off to a completely unrelated one, and then return to the initial discussion. When we return, we remember what we were talking about and proceed from where we left off.

Many of today’s voice applications don’t factor this in. Instead, they follow a tree-like flow, where one interaction leads to a fixed set of other interactions. Programmatically this makes sense, but it breaks down in practice. It is frustrating when a customer is halfway through a flow, asks for something else, and gets the dreaded response: “I’m sorry, I didn’t quite get that. What did you say?”

To combat this, try to make your voice application as contextually aware as possible. Allow your virtual assistant to suddenly switch from one intent/use case to another and store key information throughout the conversation, so that if the customer chooses to return to that initial flow, they can carry on from where they left off. Below is an example of this when booking a holiday:

Customer: I’d like the cheapest flights available to Barcelona at the end of July, please.

VA: <audio Intro> Hi Faisal, just checking what we have available for you <pause> Okay, the cheapest flights we have are £340 return on the 21st of July. Would you like to hear further details?

Customer: Oh wait, I think I need travel insurance.

VA: Let me verify if you have travel insurance with us <pause> Good news is that you do, however it looks like your travel insurance will run out before the end of July. Would you like me to turn on auto renewal for you?

Customer: Is it going to be more expensive?

VA: Not at all, it will be the same annual price of £29.99 that was paid last year. Is this okay with you?

Customer: Yes, please confirm the auto renewal.

VA: I’ve gone ahead and turned on auto renewal. Shall we carry on with booking your flights? I can provide further details on the £340 return journey to Barcelona on the 21st of July.

Customer: Yes, please tell me more.

The initial intent was to find the cheapest tickets to Barcelona during a specific time period. The user then had an unrelated thought about travel insurance. In this scenario, the virtual assistant stored the key information to be used later and handled the user’s current request rather than forcing them to finish the current flow. The virtual assistant used the stored information from the first intent of finding flights to assist with the travel insurance query. It knows the customer wants to fly around the end of July, so it first checks if they have travel insurance and then goes on to check when it expires and if it’s before the end of July. This contextual awareness allows for a seamless voice interaction with the customer without having to constantly prod them for more information. Lastly, the virtual assistant knows it has completed the actions required of the travel insurance intent. It checks to see what intents/flows have yet to be completed and proceeds to prompt the user to see if they’d like to carry on with that flow, giving a reminder of the information provided before.
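One possible way to model this behavior is a stack of paused flows, each carrying the information gathered so far. The sketch below is a simplified illustration under that assumption; the `Flow` and `Conversation` classes and the slot names are hypothetical, and most voice platforms offer their own session or context mechanisms that you would build on instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Flow:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)  # info gathered so far


class Conversation:
    def __init__(self) -> None:
        self._paused: List[Flow] = []        # interrupted flows, oldest first
        self.current: Optional[Flow] = None  # flow being handled right now

    def start(self, intent: str, **slots: str) -> Flow:
        """Switch to a new intent, pausing the current flow instead of dropping it."""
        if self.current is not None:
            self._paused.append(self.current)
        self.current = Flow(intent, dict(slots))
        return self.current

    def recall(self, slot: str) -> Optional[str]:
        """Look up a slot in the current flow, then in paused flows, newest first."""
        flows = ([self.current] if self.current else []) + self._paused[::-1]
        for flow in flows:
            if slot in flow.slots:
                return flow.slots[slot]
        return None

    def finish(self) -> Optional[Flow]:
        """Complete the current flow and resume the most recently paused one."""
        self.current = self._paused.pop() if self._paused else None
        return self.current


if __name__ == "__main__":
    chat = Conversation()
    chat.start("find_flights", destination="Barcelona", travel_date="end of July")
    chat.start("travel_insurance")          # customer changes topic mid-flow
    print(chat.recall("travel_date"))       # insurance flow reuses the flight date
    resumed = chat.finish()                 # insurance done, go back to flights
    print(resumed.intent, resumed.slots)
```

In the holiday example, `recall` is what lets the insurance check know about the end-of-July trip, and `finish` is what lets the assistant offer to carry on with the £340 flight.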


Keeping the above in mind, you will put yourself in a position to create a great voice experience. However, do remember this is still a relatively new field and even some of the most successful voice experiences are still learning how to best communicate with their users. What’s most important is to test with real people from a broad range of backgrounds and be prepared to make changes and improvements as your voice application starts to grow.







Faisal Valli

Voice Technology Lead
