Hotword Detection in SUSI Android App using Snowboy

Hotword Detection is as cool as it sounds. What exactly is hotword detection? Hotword detection is a feature in which a device gets activated when it listens to a specific word or a phrase. You must have said “OK Google” or “Hey Cortana” or “Siri” or “Alexa” at least once in your lifetime. These all are hotwords which trigger the specific action attached to them. That specific action can be anything.

Implementing hotword detection from scratch in SUSI Android is not an easy task. You have to define language model, train the model and do various other processes before implementing it in Android. In short, not feasible to implement that along with the code of our Android app. There are many open source projects on hotword detection and speech recognition. They already have done what we need and we can make use of it. One such project is Snowboy. According to Snowboy GitHub repo “Snowboy is a DNN based hotword and wake word detection toolkit.”

Img src: https://snowboy.kitt.ai/

In SUSI Android App, we have used Snowboy for hotword detection with hotword as “susi” (pronounced as ‘suzi’). In this blog, I will tell you how Hotword detection is implemented in SUSI Android app. So, you can just follow the steps and you will be able to implement it in your application too or if you want to contribute in SUSI android app, it may help you a little in knowing the codebase better.

Pre Processing before Implementation

1. Generating Hotword Model

The start of implementation of hotword detection begins with creating a hotword model from snowboy website https://snowboy.kitt.ai/dashboard . Just log in and search for susi and then train it by saying “susi” thrice and download the susi.pmdl file.

There are two types of models:

  1. .pmdl : Personal Model
  2. .umdl : Universal Model

The personal model is specifically trained for you and is instantly available for you to download once you train the hotword by your voice. On the other hand, the Universal model is trained by minimum 500 hundred people and is only available once it is trained. So, we are going to use personal model for now since training of universal model is not yet completed.

Img src: https://snowboy.kitt.ai/

2. Adding some predefined native binary files in your app.

Once you have downloaded the susi.pmdl file and you need to copy some already written native binary file in your app.

    1. In your assets folder, make a directory named snowboy and add your downloaded susi.pmdl file along with this file in it.
    2. Copy this folder and add it in your  /app/src/main/java folder as it is. These are autogenerated swig files. So, don’t change it unless you know what you are doing.
    3. Also, create a new folder in your /app/src/main folder called jniLibs and add these files to it.

Implementation in SUSI Android App

Check out the implementation of Hotword detection in SUSI Android App here

You now have everything ready. Now you just need to implement some code in your MainActivity and you are good to go. Initiate Hotword detection in the app by using below written code snippet. The call initHotword() method from your onCreate method in MainActivity. Make sure you have given permission to RECORD_AUDIO and WRITE_EXTERNAL_STORAGE.

private void initHotword() {
   if (ActivityCompat.checkSelfPermission(this,
           Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED &&
       ActivityCompat.checkSelfPermission(this,
           Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED) {
       AppResCopy.copyResFromAssetsToSD(this);

       recordingThread = new RecordingThread(new Handler() {
           @Override
           public void handleMessage(Message msg) {
               MsgEnum message = MsgEnum.getMsgEnum(msg.what);
               switch(message) {
                   case MSG_ACTIVE:
                       //HOTWORD DETECTED. NOW DO WHATEVER YOU WANT TO DO HERE
                       break;
                   case MSG_INFO:
                       break;
                   case MSG_VAD_SPEECH:
                       break;
                   case MSG_VAD_NOSPEECH:
                       break;
                   case MSG_ERROR:
                       break;
                   default:
                       super.handleMessage(msg);
                       break;
               }
           }
       }, new AudioDataSaver());
   }
}

In the above code under case, MSG_ACTIVE add your code which needs to be executed once hotword is detected. Now, hotword detection is initiated and you just have to start recording and stop recording whenever you want.

To start recording : (can be written in onStart() method)

if(recordingThread !=null && !isDetectionOn) {
    recordingThread.startRecording();
    isDetectionOn = true;
}

To stop recording : (can be written in onStop() method)

if(recordingThread !=null && isDetectionOn){
   recordingThread.stopRecording();
   isDetectionOn = false;
}

So, this this is the simple process to add hotword detection feature in your app.

Though this is great and will word wonderfully for you but it has a little problem and a solution which is discussed below.

Problems for now and future steps

As mentioned above, the susi.pmdl model is a personal model and only trained for your voice. So, it will work wonderfully for you and for people with a similar voice like you but not for everyone. There are two ways to make this solve this problem:

  1. Replace susi.pmdl with susi.umdl: This is very easy way to fix this problem. But to get the susi.umdl file at least 500 people have to train susi hotword on snowboy website. Once that target is achieved, you can download susi.umdl file and replace it with susi.pmdl file and you are good to go.
  2. Train and obtain susi.pmdl file for each user: Right now you trained and used your own personal model for everyone but there is a way you can use every user’s own personal model in the app. Now hotword will be trained by the user himself in the app. Snowboy provides a training API http://docs.kitt.ai/snowboy/#api-v1-train to train the hotword and getting personal model. Quoting snowboy docs “You can define truly customized hotword for each of your end customer. Just ask them to say the hotword 3 times and a model will be trained on the fly!”

Resources

  1. Official docs of Snowboy hotword detection http://docs.kitt.ai/snowboy/#introduction
  2. Code for snowboy hotword detection. It contains readme files which give very nice description about the project. https://github.com/kitt-ai/snowboy
  3. This readme file of snowboy which guides on how to set up snowboy in an android app https://github.com/Kitt-AI/snowboy/blob/master/examples/Android/README.md
Continue ReadingHotword Detection in SUSI Android App using Snowboy

Getting user Location in SUSI Android App and using it for various SUSI Skills

Using user location in skills is a very common phenomenon among various personal assistant like Google Assistant, Siri, Cortana etc. SUSI is no different. SUSI has various skills which uses user’s current location to implement skills. Though skills like “restaurant nearby” or “hotels nearby” are still under process but skills like “Where am I” works perfectly indicating SUSI has all basic requirements to create more advance skills in near future.

So let’s learn about how the SUSI Android App gets location of a user and sends it to SUSI Server where it is used to implement various location based skills.

Sources to find user location in an Android app

There are three sources from which android app gets users location :

  1. GPS
  2. Network
  3. Public IP Address

All three of these have various advantages and disadvantages. The SUSI Android app uses cleverly each of them to always get user location so that it can be used anytime and anywhere.

Some factors for comparison of these three sources :

Factors GPS Network IP Address
Source Satellites Wifi/Cell Tower Public IP address of user’s mobile
Accuracy Most Accurate (20ft) Moderately Accurate (200ft) Least Accurate (5000+ ft)
Requirements GPS in mobile Wifi or sim card Internet connection
Time taken to give location Takes long time to get location Fastest way to get location Fast enough (depends on internet speed)
Battery Consumption High Medium Low
Permission Required User permission required User permission required No permission required
Location Factor Works in outdoors. Does not work near tall buildings Works everywhere Works everywhere

Implementation of location finding feature in SUSI Android App

SUSI Android app very cleverly uses all the advantages of each location finding source to get most accurate location, consume less power and find location in any scenario.

The /susi/chat.json endpoint of SUSI API requires following 7 parameters :

Sno. Parameter Type Requirement
1 q String Compulsory
2 timezoneOffset int Optional
3 longitude double Optional
4 latitude double Optional
5 geosource String Optional
6 language Language  code Optional
7 access_token String Optional

In this blog we will be talking about latitude , longitude and geosource. So, we need these three things to pass as parameters for location related skills. Let’s see how we do that.

Finding location using IP Address: At the starting of app, user location is found by making an API call to ipinfo.io/json . This results in following JSON response having a field “loc” giving location of user (latitude and longitude.

{
  "ip": "YOUR_IP_ADDRESS",
  "city": "YOUR_CITY",
  "region": "YOUR_REGION",
  "country": "YOUR_COUNTRY_CODE",
  "loc": "YOUR_LATITUDE,YOUR_LONGITUDE",
  "org": "YOUR_ISP"
}

By this way we got latitude, longitude and geosource will be “ip” . We find location using IP address only once the app is started because there is no need of finding it again and again as making network calls takes time and drains battery.

So, now we have user’s location but this is not accurate. So, we will now proceed to check if we can find location using network is more accurate than location using IP address.

Finding location using Network Service Provider : To actually use the network provider and find out location requires ACCESS_COARSE_LOCATION permission from user which can be asked during the run time. Also, the location can only be found out using this if user has his location setting is enabled. So, we check following condition.

if (ActivityCompat.checkSelfPermission(mContext, Manifest.permission.ACCESS_COARSE_LOCATION) == PackageManager.PERMISSION_GRANTED) {
}

If permission is granted by user to find location using network provider, we use following code snippet to find location. It updates location of user after every 5 minutes or 10 meters (whichever is achieved first).

locationManager.requestLocationUpdates(LocationManager.NETWORK_PROVIDER, 5 * 60 * 1000, 10, this);
location = locationManager.getLastKnownLocation(LocationManager.NETWORK_PROVIDER);
if (location != null) {
   source = "network";
   canGetLocation = true;
   latitude = location.getLatitude();
   longitude = location.getLongitude();
}

So, whenever we are about to send query to SUSI Server, we take location from Network services, thus updating previous values of latitude, longitude and geosource (found using IP address) with the new values (found using Network Provider), provided the user has granted permission. So, we now have location is from Network Provider which is more accurate than location from IP address. Now we will check if we can find location from GPS or not.

Finding location using GPS Service Provider : Finding location from GPS Provider is almost same as Network Provider. To find location using GPS Provider user must give  ACCESS_FINE_LOCATION permission. We just check if GPS of user is enabled and user has given permission to use GPS and also if GPS can actually provide location. Sometimes, GPS can not provide location because user is indoor. In that cases we leave location from GPS.

So, now if we update previous values of latitude, longitude and geosource (found using Network Provider) with the new values (found using GPS Provider) and send query to SUSI Server.

Summary

To send location to server for location skills, we need latitude, longitude and geosource. We first find these 3 things using IP address (no that accurate). So, geosource will be “ip” for now. Then check if we can find values using network provider. If yes, we update those 3 values with the ones got from network Provider (more accurate). Geosource will change to “network”. Finally, we check if we can find values using GPS provider. If yes, we update those 3 values with the ones got from GPS Provider (most accurate). Geosource will change to “gps”. So, by this way we can find location of user in any circumstance possible. If you want to use location in your app too. Just follow the above steps and you are good to go.

Resources

 

Continue ReadingGetting user Location in SUSI Android App and using it for various SUSI Skills

Implementing Real Time Speech to Text in SUSI Android App

Android has a very interesting feature built in called Speech to Text. As the name suggests, it converts user’s speech into text. So, android provides this feature in form of a recognizer intent. Just the below 4 line code is needed to start recognizer intent to capture speech and convert into text. This was implemented earlier in SUSI Android App too.  Simple, isn’t it?

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
       RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
       "com.domain.app");
startActivityForResult(intent, REQ_CODE_SPEECH_INPUT);

But there is a problem here. It popups an alert dialog box to take a speech input like this :

Although this is great and working but there were many disadvantages of using this :

  1. It breaks connection between user and app.
  2. When user speaks, it first listens to complete sentence and then convert it into text. So, user was not able to see what the first word of sentence was until he says the complete sentence.
  3. There is no way for user to check if the sentence he is speaking is actually converting correctly to speech or not. He only gets to know about it when the sentence is completed.
  4. User can not cancel the conversion of Speech to text if some word is converted wrongly because the user doesn’t even get to know if the word is converted wrongly. It is kind of suspense that a whether the conversion is right or not.

So, now the main challenge was how can we improve the Speech to Text and implement it something like Google now?  So, what google now does is, it converts speech to text alongside taking input making it a real time speech to text convertor. So, basically if you speak second word of sentence, the first word is already converted and displayed. You can also cancel the conversion of rest of the sentence if the first word of sentence is converted wrongly.

I searched this problem a lot on internet, read various blogs and stackoverflow answers, read Documentation of Speech to Text until I finally came across SpeechRecognizer class and RecognitionListener. The plan was to use object of SpeechRecognizer class with RecognizerIntent and RecognitionListener to generate partial results.

So, I just added this line to the the above code snippet of recognizer intent.

intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS,true);

And introduced SpeechRecognizer class and RecognitionListener like this

 SpeechRecognizer recognizer = SpeechRecognizer
   .createSpeechRecognizer(this.getApplicationContext());

RecognitionListener listener = new RecognitionListener() {
   @Override
   public void onResults(Bundle results) {
   }
 
   @Override
   public void onError(int error) {
   }
 
   @Override
   public void onPartialResults(Bundle partialResults) {
    //The main method required. Just keep printing results in this.
   }
};

Now combine SpeechRecognizer class with RecognizerIntent and RecognitionListener with these two lines.

recognizer.setRecognitionListener(listener);
recognizer.startListening(intent);

Everything is now set up. You can now use result in onPartialResults method and keep displaying partial results. So, if you want to speak “My name is…” , “My” will be displayed on screen by the time you speak “name” and “My name” will be displayed on screen by the time you speak “is” . Isn’t it cool?

This is how Realtime Speech to Text is implemented in SUSI Android App. If you want to implement this in your app too, just follow the above steps and you are good to go. So, the results look like these.

  

Resources

Continue ReadingImplementing Real Time Speech to Text in SUSI Android App

Displaying Web Search and Map Preview using Glide in SUSI Android App

SUSI.AI has many skills. Two of which are displaying web search of a certain query and displaying a map of certain position. This post will cover how these things are implemented in Susi Android App and how a preview is displayed using Bumptech’s free and open source Glide library. So, what is glide? According to Glide docs, Glide is a fast and efficient open source media management and image loading framework for Android that wraps media       decoding, memory and disk caching, and resource pooling into a simple and easy to use interface.

Now, lets describe how this framework is used in Susi App. So, let’s cover it’s two uses one by one :

Map Preview :

    

Whenever a user queries something like “Where is Singapore” or “Where am I”, the response from server is a map with latitude, longitude and zoom level. See the “type” which is map. It indicates android app that it needs to display a map with provided values.

“actions”: [
     {
       “type”: “answer”,
       “expression”: “Singapore is a place with a population of 3547809. Here is a map: https://www.openstreetmap.org/#map=13/1.2896698812440377/103.85006683126556”
     },
     {
       “type”: “anchor”,
       “link”: “https://www.openstreetmap.org/#map=13/1.2896698812440377/103.85006683126556”,
       “text”: “Link to Openstreetmap: Singapore”
     },
     {
       “type”: “map”,
       “latitude”: “1.2896698812440377”,
       “longitude”: “103.85006683126556”,
       “zoom”: “13”
     }
   ]
 }]

So, these values are then plugged into the below mentioned url where length x width is size of image to be retrieved. This url links to an image of the map with center and size as defined.

http://staticmap.openstreetmap.de/staticmap.php?center=latitude,longitude&zoom=zoom&size=lengthxwidth

So, now as we have a link to the image to be displayed. We just need something to get the image from that link and display it in the app. That’s where Glide comes to rescue. It loads the image from the link to an ImageView.

Glide.with(currContext).load(mapHelper.getMapURL()).listener(new RequestListener<String, GlideDrawable>() {

@Override
public boolean onException(Exception e, String model, Target<GlideDrawable> target, boolean isFirstResource) {
return false;
}

@Override
public boolean onResourceReady(GlideDrawable resource, String model,
Target<GlideDrawable> target, boolean isFromMemoryCache, boolean isFirstResource){
mapViewHolder.pointer.setVisibility(View.VISIBLE);
return false;
}

}).into(mapViewHolder.mapImage);

So, what the above code does is that it displays image from url mapHelper.getMapURL() to mapViewHolder.mapImage.

So, by this way, we are able to display a preview of the map in the chat itself. The user can then click on the image to load the         complete map.

Web Search Preview :

When a user enters queries like “search for dog”, the response from server is a websearch with the query to be searched,         something like this.

“actions”: [
     {
       “type”: “answer”,
       “expression”: “Here is a web search result:”
     },
     {
       “type”: “websearch”,
       “query”: “dog”
     }
   ]

So, now we know the query to be searched, we can use any search api to get results about that query ad display it in the app. In Susi android app, we have used DuckDuckGo open source api to do that task. We call this url

https://api.duckduckgo.com/?format=json&pretty=1&q=query&ia=meanings

which gives a json response with results. We then use the json     results which contains the link to result, image to be displayed and a short abstract of the result.

{
“Result” : “<a href=\”https://duckduckgo.com/Dog\”>Dog</a>A member of genus Canis that forms part of the wolf-like canids, and is the most widely abundant…”,
“Icon” : {
“URL” : “https://duckduckgo.com/i/16101b42.jpg”,
“Height” : “”,
“Width” : “”
},
“FirstURL” : “https://duckduckgo.com/Dog”,
“Text” : “Dog A member of genus Canis that forms part of the wolf-like canids, and is the most widely abundant…”
}

There are three things we are displaying in the websearch preview, which are Title, text and an image. We display the query as the title, text in json response as text and use glide to get the image from the url provided in the the json response.

glide.load(url)
.crossFade()
.into(websearchholder.previewImageView);

Using the above code, we load the image from the image url to websearchholder.previewImageView().

So, by this way, we are able to display a preview of the search     result. The user can then touch on this preview which opens the     detailed page of the result.

Happy Coding!

Continue ReadingDisplaying Web Search and Map Preview using Glide in SUSI Android App

Detecting and Fixing Memory leaks in Susi Android App

In the fast development of the Susi App, somehow developers missed out some memory leaks in the app. It is a very common mistake that developers do. Most new android developers don’t know much about Memory leaks and how to fix them. Memory leaks makes the app slower and causes crashes due to OutOfMemoryException. To make the susi app more efficient, it is advised to look out for these memory leaks and fix them. This post will focus on teaching developers who’ll be contributing in Susi android App or any other android app about the memory leaks, detecting them and fixing them.

What is a memory leak?

Android system manages memory allocation to run the apps efficiently. When memory runs short, it triggers Garbage Collector (GC) which cleans up the objects which are no longer useful, clearing up memory for other useful objects. But, suppose a case when a non-useful object is referenced from a useful object. In that case Garbage Collector would mark the non-useful object as useful and thus won’t be able to remove it, causing a Memory Leak.

Now, when a memory leak occurs, the app demands for memory from the android system but the android system can only give a certain amount of memory and after that point it will refuse to give more memory and thus causing a OutOfMemoryException and crashing the app. Even if sometime due to memory leaks, the app doesn’t crash but it surely will slow down and skip frames.

Now, few other questions arises, like “How to detect these leaks?” , “What causes these leaks ?” and “How can we fix these?” Let’s cover these one by one.

Detecting Memory Leaks

You can detect memory leaks in android app in two ways :

  1. Using Android Studio
  2. Using Leak Canary

In this post I’ll be describing the way to use Leak Canary to detect Memory Leaks. If you want to know about the way to use android studio to detect leaks, check out this link .

Using Leak Canary for Memory Leak detection :

Leak Canary is a very handy tool when it comes to detecting memory leaks. It shows the memory leak in an another app in the mobile itself. Just add these lines under the dependencies in build.gradle file.

debugCompile ‘com.squareup.leakcanary:leakcanary-android:1.5.1’
releaseCompile ‘com.squareup.leakcanary:leakcanary-android-no-op:1.5.1’
testCompile ‘com.squareup.leakcanary:leakcanary-android-no-op:1.5.1’

And this code here in the MainApplication.java file.

if (LeakCanary.isInAnalyzerProcess(this)) {
// This process is dedicated to LeakCanary for heap analysis.

// You should not init your app in this process.

return
;
}
LeakCanary.install(this);
// Normal app init code…

You are good to go. Now just run the app and if there is a memory leak you will get something like this. It dumps the memory in .hprof file and displays it in another app.

      

Causes of Memory Leaks

There are many causes of memory leaks. I will list a few top of my head but there can be more.

  1. Static Activities and Views : Defining a static variable inside the class definition of the Activity and then setting it to the running instance of that Activity. If this reference is not cleared before the Activity’s lifecycle completes, the Activity will be leaked.
  2. Listeners : When you register a listener, it is advised to unregister it in onDestroy() method to prevent memory leaks. It is not that prominent but may cause memory leaks.
  3. Inner Classes : If you create an instance of Inner Class and maintain a static reference to it, there is a chance of memory leak.
  4. Anonymous Classes : A leak can occur if you declare and instantiate an AsyncTask anonymously inside your Activity. If it continues to perform background work after the Activity has been destroyed, the reference to the Activity will persist and it won’t be garbage collected until after the background task completes.
  5. Handlers and Threads : The very same principle applies to background tasks declared anonymously by a Runnable object and queued up for execution by a Handler object.

Preventing and Fixing Memory Leaks

So, now you know what are the causes of these memory leaks. You just have to be a little more careful while implementing these. Here are some more tips to prevent or fix memory leaks :

  1. Be extra careful when dealing with Inner classes and Anonymous classes. Make them static wherever possible. Use a static inner class with a WeakReference to the outer class if that helps.
  2. Be very careful with a static variable in your activity class because it can reference your activity and cause leak. Be sure to remove the reference in onDestroy().
  3. Unregister all listeners in onDestroy() method.
  4. Always terminate worker threads you initiated on Activity onDestroy().
  5. Make sure that your allocated resources are all collected as expected. Do not always rely on Garbage Collector.
  6. Try using the context-application instead of a context-activity.

Conclusion

So, now if you want to contribute in Susi Android App and implement a feature in it, you can just check if there is a memory leak due to your implementation and fix it for better performance of the app. Also, if you find any other memory leak in the app, do report it on the issue tracker, fix it and make the Susi Android App more efficient.

Happy Coding!

Continue ReadingDetecting and Fixing Memory leaks in Susi Android App