Full-Screen Speech Input Implementation in SUSI Android App

SUSI Android can take speech input from the user: when the user speaks, the app detects the speech and converts it to text. This feature is implemented in the SUSI Android app using Android’s built-in speech-to-text functionality. You can use that functionality in two ways: with the RecognizerIntent class alone, or with the SpeechRecognizer class, the RecognitionListener interface and the RecognizerIntent class together. The former method has two disadvantages. First, during speech input it shows a dialog box, which breaks the connection between the user and the app. Second, it can’t show the partial result (the text recognized so far) to the user, whereas the latter method can. For these reasons we used the SpeechRecognizer class, the RecognitionListener interface and the RecognizerIntent class in SUSI Android. In this blog post, I will show you how I implemented this feature in SUSI Android with the new UI.

Layout design

You can give speech input to SUSI Android either by clicking the mic button or by using the ‘Hi SUSI’ hotword. In both cases a screen opens where you give speech input. The two important parts of this layout are:

    <TextView
        android:id="@+id/txtchat"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        …
        />

TextView: It is used to show the partial result of speech input, i.e. the text converted so far from your speech.

    <org.fossasia.susi.ai.speechinputanimation.SpeechProgressView
        android:id="@+id/speechprogress"
        android:layout_width="match_parent"
        android:layout_height="50dp"
        android:layout_margin="8dp"
        android:layout_gravity="center"/>

SpeechProgressView: It is a custom view which is used to show the animation when the user gives speech input. When the user starts speaking, the animation starts.
This custom view contains five bars, which animate according to the user’s speech input.

Full-screen speech input implementation

When the user clicks on the mic button or uses the ‘Hi SUSI’ hotword, a screen appears where the user can give speech input. As already mentioned, I used the SpeechRecognizer class, the RecognitionListener interface and the RecognizerIntent class to implement speech-to-text functionality in SUSI Android.

We use the RecognizerIntent class to build an intent that asks for speech input through the ACTION_RECOGNIZE_SPEECH action:

    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, "com.domain.app")
    // Ask the recognizer to report partial results while the user is speaking
    intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)

This intent is then sent through the speech recognizer.

The SpeechRecognizer class provides access to the speech recognition service. When recognition-related events occur, the RecognitionListener receives notifications from the SpeechRecognizer.

    recognizer = SpeechRecognizer
        .createSpeechRecognizer(activity.applicationContext)
    val listener = object : RecognitionListener {
        // implement all override methods
    }

When the user starts speaking, the height of the bars changes according to the sound level. Whenever the sound level changes, the onRmsChanged method gets called. There we call the onRmsChanged method of the SpeechProgressView class, which is responsible for animating the bars according to the change in sound level.

    override fun onRmsChanged(rmsdB: Float) {
        if (speechprogress != null)
            speechprogress.onRmsChanged(rmsdB)
    }

When the user finishes speaking, the onEndOfSpeech method gets called. There we call the onEndOfSpeech method of the SpeechProgressView class, which is responsible for the rotating animation. The rotation shows that SUSI Android has finished listening and is now processing your input.

    override…


Implementing Speech to Text for Chrome in SUSI Web Chat

SUSI Web Chat now responds to voice input. To achieve this, I made use of the Web Speech API. Voice input saves the user the pain of typing. It is a much-needed feature for the Web Chat and keeps it consistent with the SUSI Android and SUSI iOS clients.

To test the feature out in SUSI Web Chat, click on the microphone icon beside the text area on chat.susi.ai. Say the message once the dialog appears, and you will see the message being sent to the Chat List rendered as text. Let’s achieve the same result following the steps below.

First, initialize the VoiceRecognition class with defaults for the Speech Recognition API. For that we create a file VoiceRecognition.js. We first look up the Speech Recognition API on the window object. We warn the user with a console message if no Speech Recognition API is available. If it is available, we call the createRecognition function using the following line:

    this.recognition = this.createRecognition(SpeechRecognition)

The complete initialization step:

    // Initialise the Speech recognition API
    const SpeechRecognition = window.SpeechRecognition ||
      window.webkitSpeechRecognition ||
      window.mozSpeechRecognition ||
      window.msSpeechRecognition ||
      window.oSpeechRecognition
    // Warn the user if not available otherwise call the createRecognition function
    if (SpeechRecognition != null) {
      this.recognition = this.createRecognition(SpeechRecognition)
    } else {
      console.warn('The current browser does not support the SpeechRecognition API.');
    }
    } // closes the enclosing block, whose opening line is not shown in this excerpt

Then we write the createRecognition function. We set our defaults first: continuous: true, interimResults: false and lang: 'en-US'. We merge these defaults with the component’s props, set the resulting options on a new recognition object, and finally return that object.
    createRecognition = (SpeechRecognition) => {
      const defaults = {
        continuous: true,
        interimResults: false,
        lang: 'en-US'
      }
      // Props passed to the component override the defaults
      const options = Object.assign({}, defaults, this.props)
      const recognition = new SpeechRecognition()
      recognition.continuous = options.continuous
      recognition.interimResults = options.interimResults
      recognition.lang = options.lang
      return recognition
    }

Initialize all the helper functions to be passed as props.

start - This method starts the recognition and invokes the Mic of the browser. It also checks whether the browser has access to the user’s Mic.
stop - This method closes the Mic and returns a result from the audio captured so far.
abort - This method stops the SpeechRecognition service without returning a result.
onspeechend - This method is called when speech is no longer detected, i.e. there is inactivity and no voice input. It therefore stops the recognition service.
componentWillReceiveProps - This method calls the stop method when the component receives a truthy stop prop.
componentWillUnmount - This method is invoked just before the component unmounts, so its job is to abort the Speech Recognition service.
render - We return null, as this component has nothing to render; all the converted text of the captured speech is sent to the parent element.

    start = () => {
      this.recognition.start()
    }
    stop = () => {
      this.recognition.stop()
    }
    abort = () => {
      this.recognition.abort()
    }
    onspeechend = () => {
      console.log('no sound detected');
      this.recognition.stop()
    }
    componentWillReceiveProps ({ stop }) {
      if (stop) {
        this.stop()
      }
    }
    componentWillUnmount () {
      this.abort()
    }
    render () {
      return null…
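
The lookup and option-merging steps described in this post can also be tried outside a browser. The following is only a minimal sketch under stated assumptions: FakeSpeechRecognition, resolveSpeechRecognition and fakeWindow are hypothetical names introduced for this illustration, not part of SUSI Web Chat or the Web Speech API, and the stand-in class mimics only the three properties used above.

```javascript
// Hypothetical stand-in for the browser's SpeechRecognition constructor,
// so the sketch can run in Node where no Web Speech API exists.
class FakeSpeechRecognition {
  constructor () {
    this.continuous = false
    this.interimResults = false
    this.lang = ''
  }
}

// Look up a recognition constructor on a window-like object,
// falling back to the webkit-prefixed name; returns null if none is found.
function resolveSpeechRecognition (globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
}

// Same merging logic as in the post, with props passed in explicitly
// instead of being read from this.props.
function createRecognition (SpeechRecognition, props = {}) {
  const defaults = { continuous: true, interimResults: false, lang: 'en-US' }
  const options = Object.assign({}, defaults, props)
  const recognition = new SpeechRecognition()
  recognition.continuous = options.continuous
  recognition.interimResults = options.interimResults
  recognition.lang = options.lang
  return recognition
}

// Hypothetical window-like object exposing only the prefixed constructor.
const fakeWindow = { webkitSpeechRecognition: FakeSpeechRecognition }
const Recognition = resolveSpeechRecognition(fakeWindow)
const recognition = createRecognition(Recognition, { lang: 'en-GB' })

console.log(recognition.continuous, recognition.interimResults, recognition.lang)
// prints: true false en-GB
```

Because Object.assign copies properties from left to right, any value supplied in props (here lang) overrides the corresponding default, while options that are not supplied keep their default values.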
