Using Speech To Text Engine in Susi Android

Susi is an intelligent chatbot that supports speech to text as input. The user can talk to Susi just as they would talk to another person. When the input is given by speech, Susi also replies through text to speech, giving the user a seamless conversational experience.

To achieve speech to text input in Susi Android, or in any other Android application, we have the following options:

  1. Using Android's inbuilt Speech to Text function.
  2. Using the Google Cloud Speech API.

We will talk about each of these.

Using Android inbuilt Speech to Text function

Android provides an inbuilt way to convert speech into text, and it is the easiest of the two options to set up.

This method uses the android.speech package and, in particular, the class android.speech.RecognizerIntent. We fire an intent with the action RecognizerIntent.ACTION_RECOGNIZE_SPEECH, which shows a dialog box that listens for speech input. That Activity converts the speech into text and sends the result back to our calling Activity. Because we need the recognized text back, the intent must be started with startActivityForResult().

The code snippet for this is:

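    private void promptSpeechInput() {
        // Ask the system speech recognizer to listen and transcribe.
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // Required extra: free-form dictation rather than web-search phrasing.
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Optional extra: recognize in the device's default language.
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        // Text shown in the recognizer dialog.
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT,
                getString(R.string.speech_prompt));
        try {
            startActivityForResult(intent, REQ_CODE_SPEECH_INPUT);
        } catch (ActivityNotFoundException a) {
            // No installed app can handle speech recognition on this device.
            Toast.makeText(getApplicationContext(),
                    getString(R.string.speech_not_supported),
                    Toast.LENGTH_SHORT).show();
        }
    }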

In the above code we attach some extra information to the intent before firing it. The speech to text engine uses these extras to decide how to interpret the speech. When invoking RecognizerIntent we must provide the extra RecognizerIntent.EXTRA_LANGUAGE_MODEL; here it is set to RecognizerIntent.LANGUAGE_MODEL_FREE_FORM. The optional extra RecognizerIntent.EXTRA_LANGUAGE selects the recognition language (for example en-US); here it is taken from the device's default locale.
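The Android reference describes EXTRA_LANGUAGE as an IETF language tag such as "en-US". As a minimal sketch in plain Java, a tag of that form can be derived from a Locale with toLanguageTag(). The class and method names below are hypothetical and are not part of Susi Android.

```java
import java.util.Locale;

public class LanguageTagSketch {
    // Hypothetical helper: returns the BCP 47 language tag for a locale,
    // which could then be passed as the EXTRA_LANGUAGE value.
    public static String languageTagFor(Locale locale) {
        return locale.toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(languageTagFor(Locale.US));      // en-US
        System.out.println(languageTagFor(Locale.GERMANY)); // de-DE
    }
}
```

On Android, Locale.toLanguageTag() is available from API level 21, so older devices would need the tag built by hand.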

Once the recognizer finishes, we receive the callback onActivityResult(int requestCode, int resultCode, Intent data), which we override to handle the result. The recognizer converts the speech input to text and sends the result back as an ArrayList<String> under the key RecognizerIntent.EXTRA_RESULTS. This list is generally ordered in descending order of speech recognizer confidence, and it is only present when RESULT_OK is returned in the activity result. We then show the first entry of the list in a text view by calling setText() on it.
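Taking the first entry relies on the list being ordered by confidence. If the recognizer also returns the optional float array RecognizerIntent.EXTRA_CONFIDENCE_SCORES, the choice can be made explicitly. The following plain-Java sketch shows one way to do that; the helper name pickBest and the fallback rules are assumptions made for illustration, not code from Susi Android.

```java
import java.util.Arrays;
import java.util.List;

public class BestResultSketch {
    // Hypothetical helper: returns the transcript with the highest score.
    // Falls back to the first entry when scores are missing or do not match
    // the result list, and returns null when there are no results at all.
    public static String pickBest(List<String> results, float[] scores) {
        if (results == null || results.isEmpty()) {
            return null;
        }
        if (scores == null || scores.length != results.size()) {
            return results.get(0);
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return results.get(best);
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("hello suzy", "hello susi");
        System.out.println(pickBest(results, new float[] {0.41f, 0.92f})); // hello susi
        System.out.println(pickBest(results, null));                       // hello suzy
    }
}
```

In an Activity, the two arguments would come from data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS) and data.getFloatArrayExtra(RecognizerIntent.EXTRA_CONFIDENCE_SCORES).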

The screenshot of the implementation is

Using Google Cloud Speech API

The Google Cloud Speech API enables developers to convert speech to text in real time. It is used in Google Allo and Google Assistant. It is backed by powerful neural network and machine learning algorithms, which make it efficient and fast at the same time. The API is capable of recognizing more than 80 languages. To find more detail about the Google Cloud Speech API, one can refer to the official documentation at this link.

The Google Cloud Speech API is not free; it is billed based on usage. To use it, a developer has to sign up at the Google console and generate an API key. On enabling the Speech API, a JSON file will be created.
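To give an idea of what a request looks like, the sketch below builds the JSON body for the REST method speech:recognize (version v1), which takes a config object and base64-encoded audio. It assumes 16-bit linear PCM audio and a plain language tag that needs no JSON escaping. The class and method names are hypothetical, and this is not necessarily how Susi Android talks to the API.

```java
import java.util.Base64;

public class RecognizeRequestSketch {
    // Hypothetical helper: builds the JSON body for a synchronous recognize
    // request. Assumes raw LINEAR16 (16-bit PCM) audio bytes and a language
    // code such as "en-US" that contains no characters needing escaping.
    public static String buildRequestBody(byte[] pcmAudio, int sampleRateHertz,
                                          String languageCode) {
        String content = Base64.getEncoder().encodeToString(pcmAudio);
        return "{"
                + "\"config\":{"
                + "\"encoding\":\"LINEAR16\","
                + "\"sampleRateHertz\":" + sampleRateHertz + ","
                + "\"languageCode\":\"" + languageCode + "\""
                + "},"
                + "\"audio\":{\"content\":\"" + content + "\"}"
                + "}";
    }

    public static void main(String[] args) {
        byte[] fakeAudio = {0, 1, 2, 3}; // placeholder bytes, not real speech
        System.out.println(buildRequestBody(fakeAudio, 16000, "en-US"));
    }
}
```

The body would be sent with an HTTP POST to https://speech.googleapis.com/v1/speech:recognize, with the API key supplied as the key query parameter. On Android below API level 26, android.util.Base64 would replace java.util.Base64.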

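The onActivityResult() handler for the inbuilt recognizer described earlier looks like this:

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);

        switch (requestCode) {
            case REQ_CODE_SPEECH_INPUT: {
                if (resultCode == RESULT_OK && null != data) {
                    // Candidate transcripts, best match first.
                    ArrayList<String> result = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    if (result != null && !result.isEmpty()) {
                        mVoiceInputTv.setText(result.get(0));
                    }
                }
                break;
            }
        }
    }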

The whole implementation of the API can be found here.