As SUSI is an AI for conversational bots, hotword detection is a top priority for the community. Another requirement was an option for offline hotword detection, so I searched for an API with both capabilities. Sphinx by CMU was the obvious choice: it provides a robust mechanism for hotword detection.
What is CMUSphinx?
CMUSphinx is a leading open-source speech recognition toolkit. It has different modules for the different tasks it needs to perform. SUSI needs the recognizer to be lightweight, so we are using Pocketsphinx. Before going into the integration, let us discuss the basics of speech recognition.
Let us dive into the code and integrate SUSI with Pocketsphinx.
Building Pocketsphinx .AAR file
Clone the sphinxbase, pocketsphinx and pocketsphinx-android repositories into the same folder using the commands below.
git clone http://github.com/cmusphinx/sphinxbase
git clone http://github.com/cmusphinx/pocketsphinx
git clone http://github.com/cmusphinx/pocketsphinx-android
Then import pocketsphinx-android into Android Studio and run the project. The .aar files pocketsphinx-android-5prealpha-debug.aar and pocketsphinx-android-5prealpha-release.aar will be created in build/outputs/aar.
Integrating Susi with Pocketsphinx
In Android Studio, you need to import the AAR generated above into your project. Go to File > New > New Module and choose Import .JAR/.AAR Package. After this, we need to declare the permissions the project requires. Add the following permissions to AndroidManifest.xml.
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Import the following classes into your main activity.
import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;
Next, we need to sync the assets that come with the .aar file into our project. Edit the app/build.gradle build file so that it runs assets.xml, by adding the following code to build.gradle.
ant.importBuild 'assets.xml'
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)
Now all the Gradle import and sync errors should disappear and you should be good to go. You can start your recognizer by adding this code to your activity.
recognizer = defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        .getRecognizer();
recognizer.addListener(this);
Setting up the decoder is a lengthy process that involves many operations, so it is recommended to run it inside an AsyncTask. The following calls register the searches the decoder can run: a keyphrase search for the hotword, grammar-based searches, and a language model search.
// Create keyword-activation search.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

// Create grammar-based searches.
File menuGrammar = new File(assetsDir, "menu.gram");
recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);

// Next search for digits
File digitsGrammar = new File(assetsDir, "digits.gram");
recognizer.addGrammarSearch(DIGITS_SEARCH, digitsGrammar);

// Create language model search.
File languageModel = new File(assetsDir, "weather.dmp");
recognizer.addNgramSearch(FORECAST_SEARCH, languageModel);
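The advice to keep the slow decoder setup off the main thread can be sketched in plain Java. This is only an illustration under assumptions: BackgroundSetup is a hypothetical helper that is not part of Pocketsphinx, and it uses java.util.concurrent as a stand-in for Android's AsyncTask so that the example runs outside Android.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper (not part of Pocketsphinx): runs a slow setup step on a
// background thread so the calling thread is never blocked by it.
class BackgroundSetup {
    static <T> CompletableFuture<T> run(Supplier<T> slowSetup) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<T> future = CompletableFuture.supplyAsync(slowSetup, executor);
        // Release the worker thread once setup has finished, whether it succeeded or failed.
        future.whenComplete((result, error) -> executor.shutdown());
        return future;
    }
}
```

In an activity, the supplier would wrap the recognizer setup shown above, and the result would have to be handed back to the main thread before touching any UI.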
Speech recognition ends at the onEndOfSpeech callback of the recognition listener. There we can call recognizer.stop() or recognizer.cancel(). cancel() discards the recognition, while stop() causes the final result to be passed to you in the onResult callback. During recognition, you will get partial results in the onPartialResult callback.
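A natural place to detect the hotword is onPartialResult, by comparing the text of the partial hypothesis with the keyphrase. The sketch below isolates that comparison in plain Java so it can be tried without a device. HotwordMatcher is a hypothetical helper, not part of Pocketsphinx, and the keyphrase "hey susi" in the usage note is only an assumed example.

```java
import java.util.Locale;

// Hypothetical helper (not part of Pocketsphinx): decides whether the text of
// a partial hypothesis is the configured hotword.
class HotwordMatcher {
    private final String keyphrase;

    HotwordMatcher(String keyphrase) {
        this.keyphrase = normalize(keyphrase);
    }

    // Returns true when the hypothesis text equals the keyphrase,
    // ignoring case and surplus whitespace. A null text never matches.
    boolean matches(String hypothesisText) {
        if (hypothesisText == null) {
            return false;
        }
        return normalize(hypothesisText).equals(keyphrase);
    }

    private static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

Inside onPartialResult, one would first check that the hypothesis is not null, then pass hypothesis.getHypstr() to a matcher created with the keyphrase (for example new HotwordMatcher("hey susi")), and call recognizer.stop() when it matches so that the final result arrives in onResult.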
Now we have integrated Pocketsphinx with SUSI.AI in Android.