Implementing Text To Speech Settings in SUSI WebChat

SUSI Web Chat has a Text to Speech (TTS) feature that gives voice replies to user queries. The functionality was added using the Speech Synthesis feature of the Web Speech API. The Text to Speech Settings were added to customise the speech output by controlling:

  1. Language
  2. Rate
  3. Pitch

Let us visit SUSI Web Chat and try it out.

First, ensure that the settings have SpeechOutput or SpeechOutputAlways enabled. Then click on the Mic button and ask a query. SUSI responds to your query with a voice reply.

To control the Speech Output, visit Text To Speech Settings in the /settings route.

First, let us look at the language settings. The drop-down list for Language is populated when the app is initialised. The speechSynthesis.onvoiceschanged handler is triggered when the app first loads. In it we call speechSynthesis.getVoices() to get the list of voices, across all languages, that the browser currently supports. We store this list in MessageStore using the ActionTypes.INIT_TTS_VOICES action type.

window.speechSynthesis.onvoiceschanged = function () {
  if (!MessageStore.getTTSInitStatus()) {
    var speechSynthesisVoices = speechSynthesis.getVoices();
    Actions.getTTSLangText(speechSynthesisVoices);
    Actions.initialiseTTSVoices(speechSynthesisVoices);
  }
};
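In some browsers speechSynthesis.getVoices() returns an empty list until the voiceschanged event has fired, which is why the handler above is needed. The following is a minimal sketch of that pattern as a reusable helper. The name loadVoices is hypothetical and not part of the SUSI Web Chat code, and the speech synthesis object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper, not taken from the SUSI Web Chat source.
// Resolves with the voice list, whether it is available immediately or only
// after the `voiceschanged` event fires.
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    // The list is still empty: wait for the browser to announce it.
    // Note: this overwrites any handler already assigned to onvoiceschanged.
    synth.onvoiceschanged = () => resolve(synth.getVoices());
  });
}

// Browser usage, guarded so the snippet is harmless outside a browser.
if (typeof window !== 'undefined' && window.speechSynthesis) {
  loadVoices(window.speechSynthesis).then((voices) => {
    console.log(voices.map((voice) => `${voice.name} (${voice.lang})`));
  });
}
```

Because the helper assigns onvoiceschanged itself, it would replace the handler shown above rather than sit alongside it.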

We also fetch a translation of the text `This is an example of speech synthesis` for every language present in the voice list, using the Google Translate API. This is done once, initially, for all the languages, and the result is stored as the translatedText attribute of each element in the voice list. It is later used when the user wants to listen to an example of speech output for a selected language, rate and pitch.

https://translate.googleapis.com/translate_a/single?client=gtx&sl=en-US&tl=TARGET_LANGUAGE_CODE&dt=t&q=TEXT_TO_BE_TRANSLATED
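To make the two placeholders in this URL concrete, here is a small sketch that fills them in. The helper name buildTranslateUrl is an assumption for illustration only. The query parameters are exactly those in the URL above, and the text is passed through encodeURIComponent so that spaces and special characters do not break the query string.

```javascript
// Hypothetical helper: fills the placeholders of the URL shown above.
const TRANSLATE_ENDPOINT = 'https://translate.googleapis.com/translate_a/single';

function buildTranslateUrl(targetLang, text, sourceLang = 'en-US') {
  const params = [
    ['client', 'gtx'],
    ['sl', sourceLang], // source language
    ['tl', targetLang], // TARGET_LANGUAGE_CODE
    ['dt', 't'],
    ['q', text],        // TEXT_TO_BE_TRANSLATED
  ];
  // encodeURIComponent keeps spaces, '&' and non-ASCII characters from
  // corrupting the query string.
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');
  return `${TRANSLATE_ENDPOINT}?${query}`;
}

console.log(buildTranslateUrl('de-DE', 'This is an example of speech synthesis'));
```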

When the user visits the Text To Speech Settings, the voice list stored in the MessageStore is retrieved and the drop-down menu for Language is populated. The default language is fetched from UserPreferencesStore and highlighted in the drop-down. The list is parsed and rendered as a drop-down using the populateVoiceList() function.

let voiceMenu = voices.map((voice,index) => {
  if(voice.translatedText === null){
    voice.translatedText = this.speechSynthesisExample;
  }
  langCodes.push(voice.lang);
  return(
    <MenuItem value={voice.lang}
              key={index}
              primaryText={voice.name+' ('+voice.lang+')'} />
  );
});

The language selected using this dropdown is only used as the language for the speech output when the server doesn’t specify the language in its response and the browser language is undefined. We then create sliders using Material UI for adjusting speech rate and pitch.

<h4 style={{'marginBottom':'0px'}}><Translate text="Speech Rate"/></h4>
<Slider
  min={0.5}
  max={2}
  value={this.state.rate}
  onChange={this.handleRate} />

The ranges for the sliders are:

  • Rate : 0.5 – 2
  • Pitch : 0 – 2

The default value for both rate and pitch is 1. We create controlled sliders, saving the values in state and using the onChange function to record changes in value. The Reset buttons reset the rate and pitch values to their respective defaults. Once the language, rate and pitch values have been selected, we can click on `Play a short demonstration of speech synthesis` to listen to a voice reply with the chosen settings.

{ this.state.playExample &&
  (
    <VoicePlayer
       play={this.state.play}
       text={voiceOutput.voiceText}
       rate={this.state.rate}
       pitch={this.state.pitch}
       lang={this.state.ttsLanguage}
       onStart={this.onStart}
       onEnd={this.onEnd}
    />
  )
}

We use the VoicePlayer by passing the required props to get the speech output. The onStart and onEnd functions are triggered at the beginning and end of the speech synthesis and are used to control the state from the parent component. The chosen language, rate, pitch and translated text are passed as props to VoicePlayer, which creates a new SpeechSynthesisUtterance() with the passed props and plays the speech output.
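The following is a rough sketch of the kind of work a component like VoicePlayer does with those props. It is not the actual VoicePlayer implementation. The function name speakWithSettings is hypothetical, and the speech synthesis object and utterance constructor are injected as parameters so the sketch can run outside a browser. In a page they would be window.speechSynthesis and window.SpeechSynthesisUtterance.

```javascript
// Minimal sketch, assuming the props map directly onto utterance properties.
function speakWithSettings(synth, Utterance, options) {
  const { text, lang, rate = 1, pitch = 1, onStart, onEnd } = options;
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = rate;   // slider range in the settings: 0.5 to 2
  utterance.pitch = pitch; // slider range in the settings: 0 to 2
  // Only attach callbacks that were actually supplied.
  if (typeof onStart === 'function') utterance.onstart = onStart;
  if (typeof onEnd === 'function') utterance.onend = onEnd;
  synth.speak(utterance);
  return utterance;
}

// Browser usage, guarded so the snippet is harmless outside a browser.
if (typeof window !== 'undefined' && window.speechSynthesis) {
  speakWithSettings(window.speechSynthesis, window.SpeechSynthesisUtterance, {
    text: 'This is an example of speech synthesis',
    lang: 'en-US',
    rate: 1,
    pitch: 1,
    onEnd: () => console.log('finished speaking'),
  });
}
```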

On saving these settings and then using the Mic button to get voice replies we see that the voice output is controlled according to the selected settings.
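The language used for a voice reply follows the fallback order described earlier: the language given by the server response, then the browser language, and only then the language chosen in the settings. A minimal sketch of that order is shown below. The function and parameter names are hypothetical and not taken from the SUSI Web Chat source.

```javascript
// Hypothetical helper illustrating the fallback order for the speech language.
function resolveSpeechLang(serverLang, browserLang, ttsLanguage) {
  const candidates = [serverLang, browserLang, ttsLanguage];
  // Pick the first candidate that is a non-empty string.
  const chosen = candidates.find(
    (lang) => typeof lang === 'string' && lang.trim() !== ''
  );
  // Assumption: fall back to 'en-US' when nothing at all is set.
  return chosen || 'en-US';
}

console.log(resolveSpeechLang(undefined, undefined, 'fr-FR')); // fr-FR
console.log(resolveSpeechLang('de-DE', 'en-GB', 'fr-FR'));     // de-DE
```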

Finally, we have to store the selected settings on the server and ensure that they are pulled when the app is initialised. The format in which these settings are stored on the server is:

Speech Rate

- Used to control rate of speech output.
- SETTING_NAME :  `speechRate`
- SETTING_VALUE : `0.5 - 2`
- DEFAULT_VALUE : `1`
 
Speech Pitch

- Used to control pitch of speech output.
- SETTING_NAME :  `speechPitch`
- SETTING_VALUE : `0 - 2`
- DEFAULT_VALUE : `1`
 
TTS Language

- Used to set the language for Text-To-Speech when the response from the server doesn't specify a language and the browser language is also undefined.
- SETTING_NAME :  `ttsLanguage`
- SETTING_VALUE : `Language Code (string)`
- DEFAULT_VALUE : `en-US`
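When these settings are pulled back from the server, it can help to validate them against the ranges and defaults listed above. The sketch below is one possible way to do that. It assumes the stored values may arrive as strings, and the helper names are hypothetical and not part of the SUSI Web Chat code.

```javascript
// Ranges and defaults as listed above; the helpers themselves are hypothetical.
const TTS_SETTINGS = {
  speechRate: { min: 0.5, max: 2, defaultValue: 1 },
  speechPitch: { min: 0, max: 2, defaultValue: 1 },
};
const DEFAULT_TTS_LANGUAGE = 'en-US';

// Assumption: stored values may be strings, so parse them and fall back to the
// default when a value is missing, not numeric, or outside its range.
function readNumericSetting(name, rawValue) {
  const { min, max, defaultValue } = TTS_SETTINGS[name];
  const value = parseFloat(rawValue);
  if (Number.isNaN(value) || value < min || value > max) {
    return defaultValue;
  }
  return value;
}

function normaliseTTSSettings(stored = {}) {
  const lang = stored.ttsLanguage;
  return {
    speechRate: readNumericSetting('speechRate', stored.speechRate),
    speechPitch: readNumericSetting('speechPitch', stored.speechPitch),
    ttsLanguage:
      typeof lang === 'string' && lang !== '' ? lang : DEFAULT_TTS_LANGUAGE,
  };
}

console.log(normaliseTTSSettings({ speechRate: '1.5', speechPitch: '7' }));
// { speechRate: 1.5, speechPitch: 1, ttsLanguage: 'en-US' }
```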

This is how the Text To Speech Settings were implemented in SUSI Web Chat. The complete code can be found at SUSI Web Chat Repository.

PS: To test whether your browser supports Text To Speech, open your browser console and try the following:

  • var msg = new SpeechSynthesisUtterance('Hello World');
  • window.speechSynthesis.speak(msg);

If you hear speech output, the Speech Synthesis part of the Web Speech API is supported by your browser and the Text To Speech features of SUSI Web Chat will work. The Web Speech API is supported by all recent Chrome browsers, as mentioned in the Web Speech API Mozilla docs. However, there are a few bugs in some Chromium versions; please check out more on how to fix them locally here in this link.
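The same check can be done in code before enabling any speech features. The sketch below only tests whether the two globals used in the console test exist. The helper name supportsSpeechSynthesis is hypothetical, and a positive result does not guarantee that any voices are installed.

```javascript
// Hypothetical helper: checks for the two globals the console test relies on.
function supportsSpeechSynthesis(globalObject) {
  return (
    typeof globalObject === 'object' &&
    globalObject !== null &&
    'speechSynthesis' in globalObject &&
    'SpeechSynthesisUtterance' in globalObject
  );
}

// In a browser, pass `window`; elsewhere an empty object is used instead.
const host = typeof window !== 'undefined' ? window : {};
console.log(supportsSpeechSynthesis(host));
```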
