Full-Screen Speech Input Implementation in SUSI Android App

Full-Screen Speech Input Implementation in SUSI Android App

SUSI Android has some very good features and one of them is, it can take input in speech format from user i.e if the user says anything then it can detect it and convert it to text. This feature is implemented in SUSI Android app using Android’s built-in speech-to-text functionality. You can implement Android’s built-in speech-to-text functionality using either only RecognizerIntent class or SpeechRecognizerIntent class, RecognitionListner interface and RecognizerIntent class. Using former method has some disadvantages:

  • During speech input, it shows a dialog box (as shown here) and it breaks the connection between user and app.
  • We can’t show partial result i.e text to the user but using the later method we can show it.

We used  SpeechRecognizerIntent class, RecognitionListner interface and RecognizerIntent class to implement Android’s built-in speech-to-text functionality in SUSI Android and you know the reason for that. In this blog post, I will show you how I implemented this feature in SUSI Android with new UI.

Layout design

You can give speech input to SUSI Android either by clicking mic button

or using ‘Hi SUSI’ hotword. When you click on mic button or use ‘Hi SUSI’ hotword, you can see a screen where you will give speech input.

Two important part of this layout are:

<TextView

  android:id=“@+id/txtchat”

  android:layout_width=“wrap_content”

  android:layout_height=“wrap_content”

  

 />

  • TextView: It used to show the partial result of speech input i.e it will show converted text (partial) of your speech.
<org.fossasia.susi.ai.speechinputanimation.SpeechProgressView

  android:id=“@+id/speechprogress”

  android:layout_width=“match_parent”

  android:layout_height=“50dp”

  android:layout_margin=“8dp”

  android:layout_gravity=“center”/>

  • SpeechProgressView: It is a custom view which use to show the animation when the user gives speech input. When the user starts speaking, the animation starts. This custom view contains five bars and these five bars animate according to user input.

Full-screen speech input implementation

When the user clicks on mic button or uses ‘Hi SUSI’ hotword, a screen comes where the user can give speech input. As already mentioned I used SpeechRecognizerIntent class, RecognitionListner interface and RecognizerIntent class to implement speech-to-text functionality in SUSI Android. RecognizerIntent class starts an intent and asks for speech input

val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)

intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,

    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)

intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, “com.domain.app”)

intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)

and send it through the speech recognizer. It does it through ACTION_RECOGNIZE_SPEECH. SpeechRecognizer class provides access to the speech recognition service. This service allows access to the speech recognizer and recognition related event occurs RecognitionListner receive notification from SpeechRecognizer class.

recognizer = SpeechRecognizer

      .createSpeechRecognizer(activity.applicationContext)

val listener = object : RecognitionListener {

 //implement all override methods

}

When the user starts speaking, the height of bars changes according to change in sound level. When sound level changes, onRmsChanged method get called where we are calling onRmsChanged method of SpeechProgressView class which is responsible for animating bars according to change in sound level.

override fun onRmsChanged(rmsdB: Float) {

  if (speechprogress != null)

      speechprogress.onRmsChanged(rmsdB)

}

When user finished speaking onEndOfSpeech method get called where we call onEndOfSpeech method of SpeechProgressView class which is responsible for rotating animation. Rotation is used to show that SUSI Android has finished listening and now it is processing your input.

override fun onEndOfSpeech() {

  if (speechprogress != null)

      speechprogress.onEndOfSpeech()

}

In case of any error, onError method get called and in case of successful speech input, onResults method get called. In both cases, we reset bars to their initial position and show chat activity user. The user can again give speech input either by clicking on mic or using ‘Hi SUSI’ hotword.

override fun onResults(results: Bundle) {

  if (speechprogress != null)

      speechprogress.onResultOrOnError()

  activity.supportFragmentManager.popBackStackImmediate()

}

Reference

Close Menu