Adding Voice Recognition in Description Dialog Box in Phimpme project

In this blog, I will explain how I added Voice Recognition in a dialog box to describe an image in Phimpme Android application. In Phimpme Android application we have an option to add a description for the image. Sometimes the description can be long. Adding Voice Recognition text to speech will ease the user’s experience to add a description for the image.

Adding appropriate Dialog Box

In order to take input from the user to prompt the Voice Recognition function, I have added an image button in the description dialog box. Since the description dialog box will only contain an EditText and a button will have used material design to make it look better and add caption on top of it.

 

<LinearLayout
   android:id="@+id/rl_description_dialog"
   android:layout_width="match_parent"
   android:layout_height="wrap_content"
   android:orientation="horizontal"
   android:padding="15dp">
   <EditText
       android:id="@+id/description_edittxt"
       android:layout_weight="1"
       android:layout_width="fill_parent"
       android:layout_height="wrap_content"
       android:padding="15dp"
       android:hint="@string/description_hint"
       android:textColorHint="@color/grey"
       android:layout_margin="20dp"
       android:gravity="top|left"
       android:inputType="text" />
   <ImageButton
       android:layout_width="@dimen/mic_image"
       android:layout_height="@dimen/mic_image"
       android:layout_alignRight="@+id/description_edittxt"
       app2:srcCompat="@drawable/ic_mic_black"
       android:layout_gravity="center"
       android:background="@color/transparent"
       android:paddingEnd="10dp"
       android:paddingTop="12dp"
       android:id="@+id/voice_input"/>
</LinearLayout>

Function to prompt dialog box

We have added a function to prompt the dialog box from anywhere in the application. getDescriptionDialog() function is used to prompt the description dialog box. getDescriptionDialog() returns EditText which can be further be used to manipulate the text in the EditText. Please follow the following steps to inflate description dialog box in the activity.

First,

In the getDescriptionDialog() function we will inflate the layout by using getLayoutInflater function. We will pass the layout id as an argument in the function.

public EditText getDescriptionDialog(final ThemedActivity activity, AlertDialog.Builder descriptionDialog){
final View DescriptiondDialogLayout = activity.getLayoutInflater().inflate(R.layout.dialog_description, null);

Second,

Get the TextView in the description dialog box.

final TextView DescriptionDialogTitle = (TextView) DescriptiondDialogLayout.findViewById(R.id.description_dialog_title);

Third,

Present the dialog using cardview to make use of the material design. Then take an instance of the EditText. This EditText can be further used to input text from the user either by text or Voice Recognition.

final CardView DescriptionDialogCard = (CardView) DescriptiondDialogLayout.findViewById(R.id.description_dialog_card);
EditText editxtDescription = (EditText) DescriptiondDialogLayout.findViewById(R.id.description_edittxt);

Fourth,

Set onClickListener when the user clicks the mic image icon. This onClicklistener will prompt the voice Recognition in the activity. We need to specify the language for the speech to text input. In the case of Phimpme its English so “en-US”. We have set the maximum results to 15.  

ImageButton VoiceRecognition = (ImageButton) DescriptiondDialogLayout.findViewById(R.id.voice_input);
VoiceRecognition.setOnClickListener(new View.OnClickListener() {
   @Override
   public void onClick(View v) {
       // This are the intents needed to start the Voice recognizer
       Intent i = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
       i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, "en-US");
       i.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 15); // number of maximum results..
       i.putExtra(RecognizerIntent.EXTRA_PROMPT, R.string.caption_speak);
       startActivityForResult(i, REQ_CODE_SPEECH_INPUT);

   }
});

Putting Text in the EditText

After Voice Recognition prompt ends the onActivityResult function checks to see if the data is received or not.

if (requestCode == REQ_CODE_SPEECH_INPUT && data!=null) {

We get the spoken text from intent data.getString() and store it in ArrayList. To store the received data in a string we need to get the first string from the ArrayList.

ArrayList<String> result = data
       .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
voiceInput = result.get(0);

Setting the received data in the the EditText

editTextDescription.setText(voiceInput);

Conclusion

Using Voice recognition is a quick and simple way to add a long description on the image. It’s Speech to Text feature works without many mistakes and is useful in our Phimpme Android application.

Github

https://github.com/fossasia/phimpme-android

Resources

Tutorial for speech to Text: https://www.androidhive.info/2014/07/android-speech-to-text-tutorial/

To add description dialog box: https://developer.android.com/guide/topics/ui/dialogs.html

Implementation of Text-To-Speech Feature In Susper

Susper has been given a voice search feature through which it provides the user a better experience of search. We introduced to enhance the speech recognition by adding Speech Synthesis or Text-To-Speech feature. The speech synthesis feature should only work when a voice search is attempted.

The idea was to create speech synthesis similar to market leader. Here is the link to YouTube video showing the demo of the feature: Video link

In the video, it will show demo :

  • If a manual search is used then the feature should not work.
  • If voice search is used then the feature should work.

For implementing this feature, we used Speech Synthesis API which is provided with Google Chrome browser 33 and above versions.

window.speechSynthesis.speak(‘Hello world!’); can be used to check whether the browser supports this feature or not.

First, we created an interface:

interface IWindow extends Window {
  SpeechSynthesisUtterance: any;
  speechSynthesis: any;
};

 

Then under @Injectable we created a class for the SpeechSynthesisService.

export class SpeechSynthesisService {
  utterence: any;  constructor(private zone: NgZone) { }  speak(text: string): void {
  const { SpeechSynthesisUtterance }: IWindow = <IWindow>window;
  const { speechSynthesis }: IWindow = <IWindow>window;  this.utterence = new SpeechSynthesisUtterance();
  this.utterence.text = text; // utters text
  this.utterence.lang = ‘en-US’; // default language
  this.utterence.volume = 1; // it can be set between 0 and 1
  this.utterence.rate = 1; // it can be set between 0 and 1
  this.utterence.pitch = 1; // it can be set between 0 and 1  (window as any).speechSynthesis.speak(this.utterence);
}// to pause the queue of utterence
pause(): void {
  const { speechSynthesis }: IWindow = <IWindow>window;
const { SpeechSynthesisUtterance }: IWindow = <IWindow>window;
  this.utterence = new SpeechSynthesisUtterance();
  (window as any).speechSynthesis.pause();
 }
}

 

The above code will implement the feature Text-To-Speech.

The source code for the implementation can be found here: https://github.com/fossasia/susper.com/blob/master/src/app/speech-synthesis.service.ts

We call speech synthesis only when voice search mode is activated. Here we used redux to check whether the mode is ‘speech’ or not. When the mode is ‘speech’ then it should utter the description inside the infobox.

We did the following changes in infobox.component.ts:

import { SpeechSynthesisService } from ../speechsynthesis.service;

speechMode: any;

constructor(private synthesis: SpeechSynthesisService) { }

this.query$ = store.select(fromRoot.getwholequery);
this.query$.subscribe(query => {
  this.keyword = query.query;
  this.speechMode = query.mode;
});

 

And we added a conditional statement to check whether mode is ‘speech’ or not.

// conditional statement
// to check if mode is ‘speech’ or not
if (this.speechMode === speech) {
  this.startSpeaking(this.results[0].description);
}
startSpeaking(description) {
  this.synthesis.speak(description);
  this.synthesis.pause();
}