Implementing Text-to-Speech (TTS) in SUSI Android

Mobile assistants are designed to perform tasks that the user “commands” through by chat UI or speech. The Android OS already provides Text to speech (TTS) and Speech to text (STT) features. This feature is available from Android version 1.6 onward. In this blog post I will show how tts is implemented in SUSI Android and how I fix the issue ‘delay in speech response’.

TextToSpeech class controls the tts engine. To use TextToSpeech class import it in the activity where you want to use text to speech feature.

import android.speech.tts.TextToSpeech;

After you import TextToSpeech class now we need to initialize TextToSpeech

TextToSpeech tts = new TextToSpeech(this,this);

Here first parameter is the Context and the other one is the listener. The listener is  use  to  inform our app that the engine is ready to use. In order to be notified we have to  implement  TextToSpeech.OnInitListener.

TextToSpeech.OnInitListener listener = new  TextToSpeech.OnInitListener {
public void onInit(int status) {
if (status == TextToSpeech.SUCCESS)
tts.setLanguage(Locale.UK/* set the default language*/);

Hence the engine can be initialized asIf status is success then, it means that TTS is initialized successfully and now we can use it. Otherwise, we can’t use it. setLanguage method is used to set language in which we want reply.

TextToSpeech tts = new TextToSpeech(getApplicationContext,listener)

When you use TTS one thing you have to remember that TTS run  on main thread so sometimes it may cause delays in text to speech conversion or it may block UI for a while. It is better to wrap it like below code.

new Handler().post(new Runnable() {
      public void run() {
         tts = new TextToSpeech(getApplicationContext(), listener);

Now our engine is ready to speak, we need simply pass the string we want to read.

tts.speak(text to read,TextToSpeech.QUEUE_FLUSH, null, null);

But before tts.speak, it is important to check for the audio focus change request. It is important because only one audio source can have focus at a time. You can check it using below code.

private AudioManager.OnAudioFocusChangeListener afChangeListener =
           new AudioManager.OnAudioFocusChangeListener() {
                 public void onAudioFocusChange(int focusChange) {
                                                        //check for focus

OnAudioFocusChangeListener is called when audio focus of the system is changed and according to value of focusChange either we stop TTS or keep using it.

AudioManager audiofocus = (AudioManager)                                    getSystemService(Context.AUDIO_SERVICE);

audiofocus is instance of AudioManager class. We need it to call requestAudioFocus method of AudioManager class. requestAudioFocus method returns the status of request for audio focus change. This method requires three parameter  instance of AudioManager.OnAudioFocusChangeListener, stream type and duration hint. If request is granted only then we can we can use tts.speak .

int result = audiofocus.requestAudioFocus(afChangeListener,AudioManager.STREAM_MUSIC, AudioManager.AUDIOFOCUS_GAIN);

if (result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {

tts.speak(text to read,TextToSpeech.QUEUE_FLUSH, null, null);


We were continuously facing issue ‘delay in speech response’ because voiceReply method implementation was wrong. We were initializing TextToSpeech on each call of voiceReply method and since onInit method runs on main thread causing delay in voice response. So I removed it and instead of initializing tts each time I used the tts instance already initialized when activity create.

 String spoken = reply;

textToSpeech.speak(spoken, TextToSpeech.QUEUE_FLUSHnull);

You can also control how the engine read text. Like we can modify pitch and speech rate.