speech-to-text – blog.fossasia.org

Adding EditText With Google Input Option While Sharing In Phimpme App

Post author:mohitmanuja
Post published:August 20, 2017
Post category:Android FOSSASIA GSoC
Post comments:0 Comments

In Phimpme Android App an user can share images on multiple platforms. While sharing we have also included caption option to enter description about the image. That caption can be entered by using keyboard as well Google Voice Input method. So in this post, I will be explaining how to add edittext with google voice input option.

Let’s get started

Step-1 Add EditText and Mic Button in layout file

<ImageView
       android:layout_width="20dp"
       android:id="@+id/button_mic"
       android:layout_height="20dp"
       android:background="?android:attr/selectableItemBackground"
       android:background="@drawable/ic_mic_black"
       android:scaleType="fitCenter" />
</RelativeLayout>

Caption Option in Share Activity in Phimpme

In Phimpme we have material design dialog box so right now I am using getTextInputDialogBox(). It will prompt the material design dialog box to enter caption to share image on multiple platform.

Step-2

Now we can get caption from edittext easily using the following code.

if (!captionText.isEmpty()) {
  caption = captionText;
  text_caption.setText(caption);
  captionEditText.setSelection(caption.length());
} else {
  caption = null;
  text_caption.setText(caption);
}

Step – 3 (Now add option Google Voice input option)

To use google input option I have added a global function in Utils class. To use that method just call that method with proper arguments.

public static void promptSpeechInput(Activity activity, int requestCode, View parentView, String promtMsg) {
  Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
  intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
  intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
  intent.putExtra(RecognizerIntent.EXTRA_PROMPT, promtMsg);
  try {
      activity.startActivityForResult(intent, requestCode);
  } catch (ActivityNotFoundException a) {
      SnackBarHandler.show(parentView,activity.getString(R.string.speech_not_supported));
  }

}

Just pass the request code to receive the speech text in onActvityResult() method and passs promt message which will be visible to user.

Step – 4 (Set text to caption )

if (requestCode == REQ_CODE_SPEECH_INPUT && data!=null){
       ArrayList<String> result = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
       String voiceInput = result.get(0);
       text_caption.setText(voiceInput);
       caption = voiceInput;
       return;

}

Now we can set the text in caption string right now I am adding text in existing caption i.e If the user enter some text using edittext and after that user clicked on mic button then the extra text will be added after the previous text.

So this is how I have used Google voice input method and made the function globally.

Resources:

Stack overflow example : https://stackoverflow.com/questions/18049157/how-to-programmatically-initiate-a-google-now-voice-search
Another post on Google Voice input : https://www.androidhive.info/2014/07/android-speech-to-text-tutorial/

Adding Voice Recognition in Description Dialog Box in Phimpme project

Post author:Subhankar29
Post published:August 12, 2017
Post category:Android API FOSSASIA GSoC Phimpme Tutorial
Post comments:0 Comments

In this blog, I will explain how I added Voice Recognition in a dialog box to describe an image in Phimpme Android application. In Phimpme Android application we have an option to add a description for the image. Sometimes the description can be long. Adding Voice Recognition text to speech will ease the user’s experience to add a description for the image.

Adding appropriate Dialog Box

In order to take input from the user to prompt the Voice Recognition function, I have added an image button in the description dialog box. Since the description dialog box will only contain an EditText and a button will have used material design to make it look better and add caption on top of it.

<LinearLayout
   android:id="@+id/rl_description_dialog"
   android:layout_width="match_parent"
   android:layout_height="wrap_content"
   android:orientation="horizontal"
   android:padding="15dp">
   <EditText
       android:id="@+id/description_edittxt"
       android:layout_weight="1"
       android:layout_width="fill_parent"
       android:layout_height="wrap_content"
       android:padding="15dp"
       android:hint="@string/description_hint"
       android:textColorHint="@color/grey"
       android:layout_margin="20dp"
       android:gravity="top|left"
       android:inputType="text" />
   <ImageButton
       android:layout_width="@dimen/mic_image"
       android:layout_height="@dimen/mic_image"
       android:layout_alignRight="@+id/description_edittxt"
       app2:srcCompat="@drawable/ic_mic_black"
       android:layout_gravity="center"
       android:background="@color/transparent"
       android:paddingEnd="10dp"
       android:paddingTop="12dp"
       android:id="@+id/voice_input"/>
</LinearLayout>

Function to prompt dialog box

We have added a function to prompt the dialog box from anywhere in the application. getDescriptionDialog() function is used to prompt the description dialog box. getDescriptionDialog() returns EditText which can be further be used to manipulate the text in the EditText. Please follow the following steps to inflate description dialog box in the activity.

First,

In the getDescriptionDialog() function we will inflate the layout by using getLayoutInflater function. We will pass the layout id as an argument in the function.

public EditText getDescriptionDialog(final ThemedActivity activity, AlertDialog.Builder descriptionDialog){
final View DescriptiondDialogLayout = activity.getLayoutInflater().inflate(R.layout.dialog_description, null);

Second,

Get the TextView in the description dialog box.

final TextView DescriptionDialogTitle = (TextView) DescriptiondDialogLayout.findViewById(R.id.description_dialog_title);

Third,

Present the dialog using cardview to make use of the material design. Then take an instance of the EditText. This EditText can be further used to input text from the user either by text or Voice Recognition.

final CardView DescriptionDialogCard = (CardView) DescriptiondDialogLayout.findViewById(R.id.description_dialog_card);
EditText editxtDescription = (EditText) DescriptiondDialogLayout.findViewById(R.id.description_edittxt);

Fourth,

Set onClickListener when the user clicks the mic image icon. This onClicklistener will prompt the voice Recognition in the activity. We need to specify the language for the speech to text input. In the case of Phimpme its English so “en-US”. We have set the maximum results to 15.

ImageButton VoiceRecognition = (ImageButton) DescriptiondDialogLayout.findViewById(R.id.voice_input);
VoiceRecognition.setOnClickListener(new View.OnClickListener() {
   @Override
   public void onClick(View v) {
       // This are the intents needed to start the Voice recognizer
       Intent i = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
       i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, "en-US");
       i.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 15); // number of maximum results..
       i.putExtra(RecognizerIntent.EXTRA_PROMPT, R.string.caption_speak);
       startActivityForResult(i, REQ_CODE_SPEECH_INPUT);

   }
});

Putting Text in the EditText

After Voice Recognition prompt ends the onActivityResult function checks to see if the data is received or not.

if (requestCode == REQ_CODE_SPEECH_INPUT && data!=null) {

We get the spoken text from intent data.getString() and store it in ArrayList. To store the received data in a string we need to get the first string from the ArrayList.

ArrayList<String> result = data
       .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
voiceInput = result.get(0);

Setting the received data in the the EditText

editTextDescription.setText(voiceInput);

Conclusion

Using Voice recognition is a quick and simple way to add a long description on the image. It’s Speech to Text feature works without many mistakes and is useful in our Phimpme Android application.

Github

https://github.com/fossasia/phimpme-android

Resources

Tutorial for speech to Text: https://www.androidhive.info/2014/07/android-speech-to-text-tutorial/

To add description dialog box: https://developer.android.com/guide/topics/ui/dialogs.html

Implementing Speech to Text for Chrome in SUSI Web Chat

Post author:rishiraj824
Post published:July 23, 2017
Post category:FOSSASIA GSoC Open Event SUSI.AI
Post comments:0 Comments

SUSI Web Chat now replies to voice inputs. To achieve this, I made use of the Web Speech API. The voice input saves one from the pain of typing and it’s a much needed feature for the Web Chat and to maintain the similarity with the other SUSI Android and SUSI iOS clients.

To test the feature out in SUSI Web Chat, click on the microphone icon beside the text area on chat.susi.ai.

Say the message once the dialog appears, and you will see the message being sent to the Chat List rendered in text.

Let’s achieve the same result following the steps below.

First, initialize the class Voice Recognition with defaults for the Speech Recognition, for that we create a file VoiceRecognition.js

We first initialize the Speech Recognition API with the window object.
We warn the User with a console message if there is no Speech Recognition API available.
If it’s available call the recognition function using the following line

this.recognition = this.createRecognition(SpeechRecognition)

// Initialise the Speech recognition API
const SpeechRecognition = window.SpeechRecognition
      || window.webkitSpeechRecognition
      || window.mozSpeechRecognition
      || window.msSpeechRecognition
      || window.oSpeechRecognition
    // Warn the user if not available otherwise call the createRecognition function
    if (SpeechRecognition != null) {
      this.recognition = this.createRecognition(SpeechRecognition)
    } else {
      console.warn('The current browser does not support the SpeechRecognition API.');
    }
  }

Then we write the createRecognition function

We set our defaults first as “continuous – true, interimResults – false, and language – ‘en-US’ ”
We pass these options to the recognition object that we created in the above step and finally return the recognition object.

createRecognition = (SpeechRecognition) => {
    const defaults = {
      continuous: true,
      interimResults: false,
      lang: 'en-US'
    }

    const options = Object.assign({}, defaults, this.props)

    let recognition = new SpeechRecognition()

    recognition.continuous = options.continuous
    recognition.interimResults = options.interimResults
    recognition.lang = options.lang

    return recognition
  }

Initialize all the helper functions to be passed as props.

start – This method starts the recognition and invokes the Mic of the browser. It also checks if the browser has the access to the user’s Mic.
stop – Stop method closes the Mic and returns the audio captured so far.
abort – Abort method stops the SpeechRecognition service.
onspeechend – This method is called if there is any inactivity and there is no voice input. Hence, stops the recognition service.
componentWillReceiveProps – This method waits for the stop method and calls it when it has received the stop object.
componentWIllUnmount – This method is invoked just before the component is about to unmount and therefore its function is to abort the Speech Recognition Service
render – We return null as there is nothing to return in this component and all the converted text of the captured Speech will be sent to the parent element.

start = () => {
    this.recognition.start()
  }

  stop = () => {
    this.recognition.stop()
  }

  abort = () => {
    this.recognition.abort()
  }
  onspeechend = () => {
    console.log('no sound detected');
    this.recognition.stop()
  }

  componentWillReceiveProps ({ stop }) {
    if (stop) {
      this.stop()
    }
  }

  componentWillUnmount () {
    this.abort()
  }

  render () {
    return null
  }

Add event listeners to start and stop functions inside componentDidMount() to ensure every action that we want to perform from the parent element is after the component has successfully mounted itself.

start – The start method is set with an action start so that we can pass the required action name to the VoiceRecognition component that we created
end – The end method similarly is set with an action end
After setting up the actions we finally call the bindResult function with the result that we received.

componentDidMount () {
    const events = [
      { name: 'start', action: this.props.onStart },
      { name: 'end', action: this.props.onEnd },
      { name: 'onspeechend', action: this.props.onspeechend }
    ]

    events.forEach(event => {
      this.recognition.addEventListener(event.name, event.action)
    })

    this.recognition.addEventListener('result', this.bindResult)

    this.start()
  }

Bind the result and send it as the props to the parent element.
Combine all interim results of the recognition and send it to the onResult function as finalTranscript
The function bindResult – The function bindResult does all the binding of the interim results that we received and output a final result as finalTranscript.
Lastly, we add the prop validations to ensure the correct props are being passed to our VoiceRecognition component.

// bindResult function
 bindResult = (event) => {
    let interimTranscript = ''
    let finalTranscript = ''
   // Bind all the results to finalTranscript
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        finalTranscript += event.results[i][0].transcript
      } else {
        interimTranscript += event.results[i][0].transcript
      }
    }

    this.props.onResult({ interimTranscript, finalTranscript })
  }
// Add Prop Validations
VoiceRecognition.propTypes = {
  onStart: PropTypes.func,
  onEnd : PropTypes.func,
  onResult: PropTypes.func,
  Onspeechend: PropTypes.func,
  continuous: PropTypes.bool,
  lang: PropTypes.string,
  stop: PropTypes.bool
};
// Finally export the VoiceRecognition Component
export default VoiceRecognition

Lastly, call the VoiceRecogntion component and pass the props from the MessageComposer Section to it in the following way.

Initialize the default state in the constructor inside this.state

this.state = {
      text: '',
      start: false, // Starting the VoiceRecognition
      stop: false, // Stop the VoiceRecognition
      open: false, // Maintain the modal state
      result:'' // Maintain the result state
    };

onStart function to call the VoiceRecognition component only when the Mic Button is pressed.
onEnd to end the Speech Recognition service.
onResult to send the message through the Actions.createMessage() function

onResult = ({interimTranscript,finalTranscript }) => {
    let result = interimTranscript;
    let voiceResponse = false;
    this.setState({result:result});
    if(finalTranscript) {
      result = finalTranscript;
      this.setState({
      start: false,
      result:result,
      stop: false,
      open:false,
      animate:false
      });
      if(this.props.speechOutputAlways || this.props.speechOutput){
        voiceResponse = true;
      }
      Actions.createMessage(result, this.props.threadID, voiceResponse);
      setTimeout(()=>this.setState({result: ''}),400);
      this.Button = <Mic />
    }
  }

Fire the component based on the value of start variable and pass the requisite props as given below in the code.

// Only when the start is ‘true’ call the VoiceRecognition component

    {this.state.start && (
          <VoiceRecognition
            onStart={this.onStart}
            onEnd={this.onEnd}
            onResult={this.onResult}
            continuous={true}
            lang="en-US"
            stop={this.state.stop}
          />
        )}

Update the text in the “Speak Now” Dialog to show the user the Speech to Text conversion

Update the text in the Modal when it is converted from Speech to Text, i.e. when we set the state of the result variable.

{this.state.result !=='' ? this.state.result :
          'Speak Now...'}

To get access to the full code, go to the repository https://github.com/fossasia/chat.susi.ai

Resources

Mozilla Web Speech Recognition API
Web Speech API Concepts
For the component of the Dialog – www.material-ui.com/#/components/dialog
Tutorial from Google – https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
HTML5/JS Speech Recognition – https://shapeshed.com/html5-speech-recognition-api/
Follow up tutorial – https://www.labnol.org/software/add-speech-recognition-to-website/19989/