Web Speech – blog.fossasia.org

Adding Speech Component in Loklak Search

Post author:simsausaurabh
Post published:June 8, 2018
Post category:FOSSASIA GSoC loklak
Post comments:0 Comments

Speech recognition service for voice search is already embedded in Loklak. Now the idea is to use this service and create a new separate component for voice recognition with an interactive and user friendly interface. This blog will cover every single portion of an Angular’s redux based component from writing actions and reducers to the use of created component in other required components.

Creating Action and Reducer Function

The main idea to create an Action is to control the flow of use of Speech Component in Loklak Search. The Speech Component will be called on and off based on this Action.

Here, the first step is to create speech.ts file in actions folder with the following code:

import { Action } from '@ngrx/store';

export const ActionTypes = {
   MODE_CHANGE: '[Speech] Change',
};

export class SearchAction implements Action {
   type = ActionTypes.MODE_CHANGE;

   constructor(public payload: any) {}
}

export type Actions
   = SearchAction;

In the above segment, only one action (MODE_CHANGE) has been created which is like a boolean value which is being returned as true or false i.e. whether the speech component is currently in use or not. This is a basic format to be followed in creating an Action which is being followed in Loklak Search. The next step would be to create speech.ts file in reducers folder with the following code:

import { Action } from '@ngrx/store';
import * as speech from '../actions/speech';
export const MODE_CHANGE = 'MODE_CHANGE';

export interface State {
   speechStatus: boolean;
}
export const initialState: State = {
   speechStatus: false
};
export function reducer(state: State = initialState,
   action: speech.Actions): State {
        switch (action.type) {
            case speech.ActionTypes.MODE_CHANGE: {
                const response = action.payload;
                return Object.assign({}, state,
                {speechStatus: response});
       }
       default: {
           return state;
       }
   }
}
export const getspeechStatus = (state: State) =>
    state.speechStatus;

It follows the format of reducer functions created in Loklak Search. Here, the main key point is the state creation and type of value it is storing i.e. State is containing a speechStatus of type boolean. Defining an initial state with speechStatus value false (Considering initially the Speech Component will not be in use). The reducer function a new state by toggling the input state based on the type of Action created above and it returns the input state by default. At last wrapping the state as a function and returning the state’s speechStatus value.

Third and last step in this section would be to create a selector for the above reducer function in the root reducer index file.

Import and add speech from speech reducer file into the general state in root reducer file. And at last export the created selector function for speech reducer.

import * as fromSpeech from './speech';
export interface State {
   ...
   speech: fromSpeech.State;
}
export const getSpeechState = (state: State) =>
    state.speech;
export const getspeechStatus = createSelector(
    getSpeechState, fromSpeech.getspeechStatus);

Creating Speech Component

Now comes the main part to create and define the functioning of Speech Component. For creating the basic Speech Component, following command is used:

ng generate component app/speech --module=app

It will automatically create and provide Speech Component in app.module.ts. The working structure of Speech Component has been followed as of Google’s voice recognition feature for voice searching. Rather than providing description of each single line of code the following portion will cover the main code responsible for the functioning of Speech Component.

Importing and defining speech service in constructor:

import {
    SpeechService
} from '../services/speech.service';
constructor(
       private speech: SpeechService,
       private store: Store<fromRoot.State>,
       private router: Router
   ) {
       this.resultspage =this.router.url
       .toString().includes('/search');
       if (this.resultspage) {
           this.shadowleft = '-103px';
           this.shadowtop = '-102px';
       }
       this.speechRecognition();
}
speechRecognition() {
    this.speech.record('en_US').subscribe(voice =>
        this.onquery(voice));
}

When the Speech Component is called, speechRecognition() method will start recording speech (It will use the record() method from speech service to record the user voice).

For fluctuating border height and color of voice search icon, a resettimer() method is created.

randomize(min, max) {
    let x;
    x = (Math.random() * (max - min) + min);
    return x;
}
resettimer(recheck: boolean = false) {
    this.subscription.unsubscribe();
    this.timer = Observable.timer(0, 100);
    this.subscription = this.timer.subscribe(t => {
    this.ticks = t;
        if (t % 10 === 0 && t <= 20) {
            this.buttoncolor = '#f44';
            this.miccolor = '#fff';
            this.borderheight =
                this.randomize(0.7, 1);
            if (this.resultspage) {
                this.borderheight =
                    this.randomize(0.35, 0.5);
            }
            if (!recheck) {
                this.resettimer(true);
            }
        }
        if (t === 20) {
            this.borderheight = 0;
        }
        if (t === 30) {
            this.subscription.unsubscribe();
            this.store.dispatch(new speechactions
            .SearchAction(false));
        }
    });
}

The randomize() method provides a random number between min and max value.

To put on check and display status as message on things like whether microphone is working, or user has spoken something, or if the speech is being recorded, based on the time elapsed in calling of speech component and actual voice recording, the following portion of code is written in ngOnInit() method.

ngOnInit() {
    this.timer = Observable.timer(1500, 2000);
    this.subscription = this.timer.subscribe(t => {
        this.ticks = t;
        if (t === 1) {
            this.message = 'Listening...';
        }
        if (t === 4) {
            this.message = 'Please check your 
            microphone and volume levels.';
            this.miccolor = '#C2C2C2';
        }
        if (t === 6) {
            this.subscription.unsubscribe();
            this.store.dispatch(new speechactions
                .SearchAction(false));
        }
    });
}

The logic can be understood as if the elapsed time is 1 sec, it means it is listening to the speaker’s voice. And if the elapsed time is 4 sec, it means there is something wrong and user will be asked to check for the microphone and volume levels. At last if it tends to 6 seconds, then the Speech Component will be called off with the dispatched Action as false which is defined above (That means it is no longer in use).

Embed Speech Component in main App Component

Now comes the last part to use the created featured component in the required place. Code below describes embedding Speech Component in App Component.

Import SpeechService and required modules.

import {
    SpeechService
} from './services/speech.service';
import { Observable } from 'rxjs/Observable';

hidespeech will be used to store the current status of Speech Component (whether its in use or not), and completeQuery$ and searchData store the voice recorded in form Observable and String. completeQuery$ is optional (If the Speech Component is unable to track voice of speaker by any means, then it will not contain any value and hence searchData will be empty).

hidespeech: Observable<any>;
completeQuery$: Observable<any>;
searchData: String;

Creating speech parameter in constructor and store the current status of speech and store it into hidespeech. Based on the subscribed value of hidespeech, speech service’s stoprecord() will be called (To stop recording when the speech recognition completes). After recording stops, store the whole query in completeQuery$.

constructor (
    private speech: SpeechService
) {
    this.hidespeech = store.select(
        fromRoot.getspeechStatus);
    this.hidespeech.subscribe(hidespeech => {
        if (!hidespeech) {
            this.speech.stoprecord();
        }
    });
    this.completeQuery$ = store.select(
        fromRoot.getQuery);
    this.completeQuery$.subscribe(data => {
        this.searchData = data;
    });
}

Add the Speech Component in app.component.html. Now the main logic of calling Speech Component will be based on the subscribed observable value of hidespeech (If false then call Speech Component else not).

<app-speech *ngIf="hidespeech|async"></app-speech>

Using Speech Component in Home and FeedHeader Component

Import Speech Service and speech Action created above, and create hidespeech to store the current status of Speech Component.

import * as speechactions from '../../actions/speech';
import {
    SpeechService
} from '../../services/speech.service';
hidespeech: Observable<boolean>;

Create speech parameter of type SpeechService and store the current status of Speech Component in hidespeech. Dispatch speechactions.SearchAction (payload as true) for inferring that the Speech Component is currently in use.

constructor(
    private speech: SpeechService
) {
    this.hidespeech = store
        .select(fromRoot.getspeechStatus);
}
speechRecognition() {
    this.store.dispatch(
        new speechactions.SearchAction(true));
}

How to use the Speech Component?

Goto Loklak and click on Voice Input Icon. It will popup a screen as below.

Now, speak something to search. E.g. Google, the screen will turn into something like below with the spelled value displayed on screen.

If something goes wrong (Microphone did not work, low volume levels or unrecognisable voice), then screen will show something like:

On successful recognition of speech, the query will be set and the results will be shown as

Similar process is being followed on results page to make a search query using voice.

Resources

Angular docs. Angular.io: Tutorial
Rishabh (2017). Rishabh.io: Create Angular 4 Components

Adding ‘Voice Search’ Feature in Loklak Search

Post author:simsausaurabh
Post published:June 5, 2018
Post category:FOSSASIA GSoC loklak
Post comments:0 Comments

It is beneficial to have a voice search feature embedded in a search engine like loklak.org based on PWA (Progressive Web Application) architecture. This will allow users to make query using voice on phone or computer. For integrating voice search, JavaScript Web Speech API (also known as webkitSpeechRecognition) is used which allows us to add speech recognition feature in any website or a web application.

Integration Process

For using webkitSpeechRecognition, a separate typescript service is being created along with its relevant unit test

speech.service.ts
speech.service.spec.ts

The main idea to implement the voice search was

To create an injectable speech service.
Write methods for starting and stopping voice record in it.
Import and inject the created speech service into root app module.
Use the created speech service into required HomeComponent and FeedHeaderComponent.

Structure of Speech Service

The speech service mainly consists of two methods

record() method which accepts lang as string input which specifies the language code for speech recording and returns a string Observable.

   record(lang: string): Observable<string> {
           return Observable.create(observe => {
                 const webkitSpeechRecognition }: IWindow = 
                        <IWindow>window;
                 this.recognition = new webkitSpeechRecognition();
                 this.recognition.continuous = true;
                 this.recognition.interimResults = true;
                 this.recognition.onresult = take => 
                 this.zone.run(()                            
                 observe.next(take
                 .results.item
                 (take.results.
                 length - 1).
                 item(0).
                 transcript);
                 this.recognition
                 .onerror = 
                 err => observe
                 .error(err);
                 this.recognition.onend 
                 = () => observe
                 .complete();
                 this.recognition.lang = 
                 lang;
                 this.recognition.start( 
             );
        });
     }

Using observe.complete() allows speech recognition action to stop when the user stops speaking.

stoprecord() method which is used to stop the current instance of recording.

 stoprecord() {
       if (this.recognition) {
           this.recognition.stop();
       }
   }

stop() method is used to stop the current instance of speech recognition.

Using Speech Service in required Component

In Loklak Search, the speech service have been included in two main components i.e. HomeComponent and FeedHeaderComponent. The basic idea of using the created speech service in these components is same.

In the TypeScript (not spec.ts) file of the two components, firstly import the SpeechService and create its object in constructor.

constructor( private speech: SpeechService ) { }

Secondly, define a speechRecognition() method and use the created instance or object of SpeechService in it to record speech and set the query as the recorded speech input. Here, default language has been set up as ‘en_US’ i.e. English.

 stoprecord() {
       if (this.recognition) {
           this.recognition.stop();
       }
   }

After user clicks on speech icon, this method will be called and the recorded speech will be set as the query and there will be router navigation to the /search page where the result will be displayed.

Resources

George Ornbo (2014). Shapeshed: The HTML5 Speech Recognition API
Glen Shires, Hans Wennborg (2012). W3C: Web Speech API Specification

SUSI.AI Chrome Bot and Web Speech: Integrating Speech Synthesis and Recognition

Post author:zamhaq
Post published:October 23, 2017
Post category:CodeHeat FOSSASIA SUSI.AI
Post comments:0 Comments

Susi Chrome Bot is a Chrome extension which is used to communicate with Susi AI. The advantage of having chrome extensions is that they are very accessible for the user to perform certain tasks which sometimes needs the user to move to another tab/site.

In this blog post, we will be going through the process of integrating the web speech API to SUSI Chromebot.

Web Speech API

Web Speech API enables web apps to be able to use voice data. The Web Speech API has two components:

Speech Recognition: Speech recognition gives web apps the ability to recognize voice data from an audio source. Speech recognition provides the speech-to-text service.

Speech Synthesis: Speech synthesis provides the text-to-speech services for the web apps.

Integrating speech synthesis and speech recognition in SUSI Chromebot

Chrome provides the webkitSpeechRecognition() interface which we will use for our speech recognition tasks.

var recognition = new webkitSpeechRecognition();

Now, we have a speech recognition instance recognition. Let us define necessary checks for error detection and resetting the recognizer.

var recognizing;

function reset() {
recognizing = false;
}

recognition.onerror = function(e){
console.log(e.error);
};

recognition.onend = function(){
reset();
};

We now define the toggleStartStop() function that will check if recognition is already being performed in which case it will stop recognition and reset the recognizer, otherwise, it will start recognition.

function toggleStartStop() {
    if (recognizing) {
      recognition.stop();
      reset();
    } else {
      recognition.start();
      recognizing = true;
    }
}

We can then attach an event listener to a mic button which calls the toggleStartStop() function to start or stop our speech recognition.

mic.addEventListener("click", function () {
    toggleStartStop();
});

Finally, when the speech recognizer has some results it calls the onresult event handler. We’ll use this event handler to catch the results returned.

recognition.onresult = function (event) {
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        textarea.value = event.results[i][0].transcript;
        submitForm();
      }
    }
};

The above code snipped tests for the results produced by the speech recognizer and if it’s the final result then it sets textarea value with the result of speech recognition and then we submit that to the backend.

One problem that we might face is the extension not being able to access the microphone. This can be resolved by asking for microphone access from an external tab/window/iframe. For SUSI Chromebot this is being done using an external tab. Pressing on the settings icon makes a new tab which then asks for microphone access from the user. This needs to be done only once, so that does not cause a lot of trouble.

setting.addEventListener("click", function () {
chrome.tabs.create({
url: chrome.runtime.getURL("options.html")
});
});navigator.webkitGetUserMedia({
audio: true
}, function(stream) {
stream.stop();
}, function () {
console.log('no access');
});

In contrast to speech recognition, speech synthesis is very easy to implement.

function speakOutput(msg){
    var voiceMsg = new SpeechSynthesisUtterance(msg);
    window.speechSynthesis.speak(voiceMsg);
}

This function takes a message as input, declares a new SpeechSynthesisUtterance instance and then calls the speak method to convert the text message to voice.

There are many properties and attributes that come with this speech recognition and synthesis interface. This blog post only introduces the very basics.