Speech Synthesis – blog.fossasia.org

Toggling Voice On/Off in SUSI Chromebot

Post author:gabru-md
Post published:February 16, 2018
Post category:CodeHeat FOSSASIA SUSI.AI
Post comments:0 Comments

SUSI Chromebot has a lot of features that make it one of the best projects of FOSSASIA.

Recently Voice/Speech was added to SUSI Chromebot. But there was no option that controlled the fact that whether speech output is needed or not.

The latest addition to SUSI Chromebot is Toggling the Voice of SUSI On or Off.

How was it achieved?

Toggling Voice for SUSI required adding a button and a snippet of Javascript code to the main JS file. The code will take care of the fact whether the voice is to be toggled on or off.

I started off by adding a button to the main HTML file.

<a href=”javascript: void(0)” id=”speak” style=”color: white”><i class=”material-icons” id=”speak-icon”>volume_up</i></a>

The above snippet of HTML code adds a voice button to the top bar of chromebot.

Then there was the major part where the javascript code was to be added to add the functionality to the button.

var shouldSpeak = true;

I started off by creating a variable called as “shouldSpeak” which will determine whether or not SUSI should use the Chrome’s API to speak.

Then I changed the “speakOut()” function and added another parameter to it.

function speakOut(msg,speak=false){

if(speak){

var voiceMsg = new SpeechSynthesisUtterance(msg);

window.speechSynthesis.speak(voiceMsg);

}

}

The above code made sure that susi was only allowed to speak when and only “speak” variable was set to true.

Then “eventListeners” were added to buttons and other things to link the functionality.

document.getElementById(‘speak’).addEventListener(‘click’,changeSpeak);

It adds the events of click to “speak” and associates it with the function “changeSpeak”.

Now the function “changeSpeak” is created as follows. It toggles the on/off mechanism of voice in SUSI Chromebot.

function changeSpeak(){

shouldSpeak = !shouldSpeak;

var SpeakIcon = document.getElementById(‘speak-icon’);

if(!shouldSpeak){

SpeakIcon.innerText = “volume_off”;

}

else{

SpeakIcon.innerText = “volume_up”;

}

console.log(‘Should be speaking? ’ + shouldSpeak);

}

Everytime the user clicks on the icon to toggle on/off voice the icon must also change and this functionality was taken care of by the above piece of code.

Resources

SUSI Chromebot Repository : https://github.com/fossasia/susi_chromebot
Pull Request for the same : https://github.com/fossasia/susi_chromebot/pull/113
Issue for the same : https://github.com/fossasia/susi_chromebot/issues/110
Read about Speech Synthesis : https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis
Learn How to add Event Listeners to Objects : https://www.w3schools.com/jsref/met_document_addeventlistener.asp
What is “javascript : void(0)” : https://stackoverflow.com/questions/1291942/what-does-javascriptvoid0-mean
Read about Material Icons : http://materializecss.com/icons.html

SUSI.AI Chrome Bot and Web Speech: Integrating Speech Synthesis and Recognition

Post author:zamhaq
Post published:October 23, 2017
Post category:CodeHeat FOSSASIA SUSI.AI
Post comments:0 Comments

Susi Chrome Bot is a Chrome extension which is used to communicate with Susi AI. The advantage of having chrome extensions is that they are very accessible for the user to perform certain tasks which sometimes needs the user to move to another tab/site.

In this blog post, we will be going through the process of integrating the web speech API to SUSI Chromebot.

Web Speech API

Web Speech API enables web apps to be able to use voice data. The Web Speech API has two components:

Speech Recognition: Speech recognition gives web apps the ability to recognize voice data from an audio source. Speech recognition provides the speech-to-text service.

Speech Synthesis: Speech synthesis provides the text-to-speech services for the web apps.

Integrating speech synthesis and speech recognition in SUSI Chromebot

Chrome provides the webkitSpeechRecognition() interface which we will use for our speech recognition tasks.

var recognition = new webkitSpeechRecognition();

Now, we have a speech recognition instance recognition. Let us define necessary checks for error detection and resetting the recognizer.

var recognizing;

function reset() {
recognizing = false;
}

recognition.onerror = function(e){
console.log(e.error);
};

recognition.onend = function(){
reset();
};

We now define the toggleStartStop() function that will check if recognition is already being performed in which case it will stop recognition and reset the recognizer, otherwise, it will start recognition.

function toggleStartStop() {
    if (recognizing) {
      recognition.stop();
      reset();
    } else {
      recognition.start();
      recognizing = true;
    }
}

We can then attach an event listener to a mic button which calls the toggleStartStop() function to start or stop our speech recognition.

mic.addEventListener("click", function () {
    toggleStartStop();
});

Finally, when the speech recognizer has some results it calls the onresult event handler. We’ll use this event handler to catch the results returned.

recognition.onresult = function (event) {
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        textarea.value = event.results[i][0].transcript;
        submitForm();
      }
    }
};

The above code snipped tests for the results produced by the speech recognizer and if it’s the final result then it sets textarea value with the result of speech recognition and then we submit that to the backend.

One problem that we might face is the extension not being able to access the microphone. This can be resolved by asking for microphone access from an external tab/window/iframe. For SUSI Chromebot this is being done using an external tab. Pressing on the settings icon makes a new tab which then asks for microphone access from the user. This needs to be done only once, so that does not cause a lot of trouble.

setting.addEventListener("click", function () {
chrome.tabs.create({
url: chrome.runtime.getURL("options.html")
});
});navigator.webkitGetUserMedia({
audio: true
}, function(stream) {
stream.stop();
}, function () {
console.log('no access');
});

In contrast to speech recognition, speech synthesis is very easy to implement.

function speakOutput(msg){
    var voiceMsg = new SpeechSynthesisUtterance(msg);
    window.speechSynthesis.speak(voiceMsg);
}

This function takes a message as input, declares a new SpeechSynthesisUtterance instance and then calls the speak method to convert the text message to voice.

There are many properties and attributes that come with this speech recognition and synthesis interface. This blog post only introduces the very basics.

Resources

Implementing Text To Speech Settings in SUSI WebChat

Post author:udaytheja
Post published:August 30, 2017
Post category:FOSSASIA GSoC SUSI.AI Tutorial
Post comments:0 Comments

SUSI Web Chat has Text to Speech (TTS) Feature where it gives voice replies for user queries. The Text to Speech functionality was added using Speech Synthesis Feature of the Web Speech API. The Text to Speech Settings were added to customise the speech output by controlling features like :

Language
Rate
Pitch

Let us visit SUSI Web Chat and try it out.

First, ensure that the settings have SpeechOutput or SpeechOutputAlways enabled. Then click on the Mic button and ask a query. SUSI responds to your query with a voice reply.

To control the Speech Output, visit Text To Speech Settings in the /settings route.

First, let us look at the language settings. The drop down list for Language is populated when the app is initialised. speechSynthesis.onvoiceschanged function is triggered when the app loads initially. There we call speechSynthesis.getVoices() to get the voice list of all the languages currently supported by that particular browser. We store this in MessageStore using ActionTypes.INIT_TTS_VOICES action type.

window.speechSynthesis.onvoiceschanged = function () {
  if (!MessageStore.getTTSInitStatus()) {
    var speechSynthesisVoices = speechSynthesis.getVoices();
    Actions.getTTSLangText(speechSynthesisVoices);
    Actions.initialiseTTSVoices(speechSynthesisVoices);
  }
};

We also get the translated text for every language present in the voice list for the text – `This is an example of speech synthesis` using google translate API. This is called initially for all the languages and is stored as translatedText attribute in the voice list for each element. This is later used when the user wants to listen to an example of speech output for a selected language, rate and pitch.

https://translate.googleapis.com/translate_a/single?client=gtx&sl=en-US&tl=TARGET_LANGUAGE_CODE&dt=t&q=TEXT_TO_BE_TRANSLATED

When the user visits the Text To Speech Settings, then the voice list stored in the MessageStore is retrieved and the drop down menu for Language is populated. The default language is fetched from UserPreferencesStore and the default language is accordingly highlighted in the dropdown. The list is parsed and populated as a drop down using populateVoiceList() function.

let voiceMenu = voices.map((voice,index) => {
  if(voice.translatedText === null){
    voice.translatedText = this.speechSynthesisExample;
  }
  langCodes.push(voice.lang);
  return(
    <MenuItem value={voice.lang}
              key={index}
              primaryText={voice.name+' ('+voice.lang+')'} />
  );
});

The language selected using this dropdown is only used as the language for the speech output when the server doesn’t specify the language in its response and the browser language is undefined. We then create sliders using Material UI for adjusting speech rate and pitch.

<h4 style={{'marginBottom':'0px'}}><Translate text="Speech Rate"/></h4>
<Slider
  min={0.5}
  max={2}
  value={this.state.rate}
  onChange={this.handleRate} />

The range for the sliders is :

Rate : 0.5 – 2
Pitch : 0 – 2

The default value for both rate and pitch is 1. We create a controlled slider saving the values in state and using onChange function to record change in values. The Reset buttons can be used to reset the rate and pitch values respectively to their default values. Once the language, rate and pitch values have been selected we can click on `Play a short demonstration of speech synthesis` to listen to a voice reply with the chosen settings.

{ this.state.playExample &&
  (
    <VoicePlayer
       play={this.state.play}
       text={voiceOutput.voiceText}
       rate={this.state.rate}
       pitch={this.state.pitch}
       lang={this.state.ttsLanguage}
       onStart={this.onStart}
       onEnd={this.onEnd}
    />
  )
}

We use the VoicePlayer by passing the required props to get the speech output. onStart and onEnd functions are triggered at the beginning and ending of the speech synthesis and are used to control the state from the parent component. Chosen language, rate, pitch and translated text are passed as props to VoicePlayer which creates a new SpeechSynthesisUtterance() with the passed props and plays the speech output.

On saving these settings and then using the Mic button to get voice replies we see that the voice output is controlled according to the selected settings.

Finally, we have to store the selected settings on the server and ensure that these are pulled when the app is initialized. The format in which these settings are stored in the server is :

Speech Rate

- Used to control rate of speech output.
- SETTING_NAME :  `speechRate`
- SETTING_VALUE : `0.5 - 2`
- DEFAULT_VALUE : `1`
 
Speech Pitch

- Used to control pitch of speech output.
- SETTING_NAME :  `speechPitch`
- SETTING_VALUE : `0 - 2`
- DEFAULT_VALUE : `1`
 
TTS Language

- Used to set the language for Text-To-Speech used when the response from server doesnt specify language and the browser language is also undefined.
- SETTING_NAME :  `ttsLanguage`
- SETTING_VALUE : `Language Code (string)`
- DEFAULT_VALUE : `en-US`

This is how the Text To Speech Settings were implemented in SUSI Web Chat. The complete code can be found at SUSI Web Chat Repository.

PS: To test whether your browser supports Text To Speech, open your browser console and try the following :

var msg = new SpeechSynthesisUtterance(‘Hello World’);
window.speechSynthesis.speak(msg)

If you get a speech output then the Web API Speech Synthesis is supported by your browser and Text To Speech features of SUSI Web Chat will work. The Web Speech API has support for all latest Chrome browsers as mentioned in the Web Speech API Mozilla docs.However there are few bugs with some Chromium versions please check out more on how to fix them locally here in this link.

Resources:

Web Speech API Repository – https://github.com/mdn/web-speech-api/
Speech Synthesis Documentation – https://developer.mozilla.org/en-US/docs/Web/API/Window/speechSynthesis
Speech Synthesis Demo – http://mdn.github.io/web-speech-api/speak-easy-synthesis/
Translating Text Guide – https://ctrlq.org/code/19909-google-translate-api
Material UI Documentation – http://www.material-ui.com/#/components/slider