SUSI.AI Chrome Bot and Web Speech: Integrating Speech Synthesis and Recognition

Susi Chrome Bot is a Chrome extension which is used to communicate with Susi AI. The advantage of having chrome extensions is that they are very accessible for the user to perform certain tasks which sometimes needs the user to move to another tab/site.

In this blog post, we will be going through the process of integrating the web speech API to SUSI Chromebot.

Web Speech API

Web Speech API enables web apps to be able to use voice data. The Web Speech API has two components:

Speech Recognition:  Speech recognition gives web apps the ability to recognize voice data from an audio source. Speech recognition provides the speech-to-text service.

Speech Synthesis: Speech synthesis provides the text-to-speech services for the web apps.

Integrating speech synthesis and speech recognition in SUSI Chromebot

Chrome provides the webkitSpeechRecognition() interface which we will use for our speech recognition tasks.

var recognition = new webkitSpeechRecognition();

 

Now, we have a speech recognition instance recognition. Let us define necessary checks for error detection and resetting the recognizer.

var recognizing;

function reset() {
recognizing = false;
}

recognition.onerror = function(e){
console.log(e.error);
};

recognition.onend = function(){
reset();
};

 

We now define the toggleStartStop() function that will check if recognition is already being performed in which case it will stop recognition and reset the recognizer, otherwise, it will start recognition.

function toggleStartStop() {
    if (recognizing) {
      recognition.stop();
      reset();
    } else {
      recognition.start();
      recognizing = true;
    }
}

 

We can then attach an event listener to a mic button which calls the toggleStartStop() function to start or stop our speech recognition.

mic.addEventListener("click", function () {
    toggleStartStop();
});

 

Finally, when the speech recognizer has some results it calls the onresult event handler. We’ll use this event handler to catch the results returned.

recognition.onresult = function (event) {
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        textarea.value = event.results[i][0].transcript;
        submitForm();
      }
    }
};

 

The above code snipped tests for the results produced by the speech recognizer and if it’s the final result then it sets textarea value with the result of speech recognition and then we submit that to the backend.

One problem that we might face is the extension not being able to access the microphone. This can be resolved by asking for microphone access from an external tab/window/iframe. For SUSI Chromebot this is being done using an external tab. Pressing on the settings icon makes a new tab which then asks for microphone access from the user. This needs to be done only once, so that does not cause a lot of trouble.

setting.addEventListener("click", function () {
chrome.tabs.create({
url: chrome.runtime.getURL("options.html")
});
});navigator.webkitGetUserMedia({
audio: true
}, function(stream) {
stream.stop();
}, function () {
console.log('no access');
});

 

In contrast to speech recognition, speech synthesis is very easy to implement.

function speakOutput(msg){
    var voiceMsg = new SpeechSynthesisUtterance(msg);
    window.speechSynthesis.speak(voiceMsg);
}

 

This function takes a message as input, declares a new SpeechSynthesisUtterance instance and then calls the speak method to convert the text message to voice.

There are many properties and attributes that come with this speech recognition and synthesis interface. This blog post only introduces the very basics.

Resources