Hotword – blog.fossasia.org

Hotword Recognition in SUSI iOS

Post author:Chashmeet Singh
Post published:July 24, 2017
Post category:FOSSASIA GSoC SUSI.AI Tutorial
Post comments:0 Comments

Hot word recognition is a feature by which a specific action can be performed each time a specific word is spoken. There is a service called Snowboy which helps us achieve this for various clients (for ex: iOS, Android, Raspberry pi, etc.). It is basically a DNN based hotword recognition toolkit.

In this blog, we will learn how to integrate the snowboy hotword detection wrapper in the SUSI iOS client. This service can be used in any open source project but for using it commercially, a commercial license needs to be obtained.

Following are the files that need to be added to the project which are provided by the service itself: snowboy-detect.h libsnowboy-detect.a and a trained model file which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”, the model file can be found here.

The way how snowboy works is that speech is recorded for a few seconds and this data is detected with an already trained model by a specific hotword, now if snowboy returns a 1 means word has been successfully detected else wasn’t.

We start with creation of a wrapper class in Objective-C which can be found wrapper and the bridging header in case this needs to be added to a Swift project. The wrapper contains methods for setting sensitivity, audio gain and running the detection using the buffer. It is a wrapper class built on top of the snowboy-detect.h header file.

Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether it successfully detected the hotword or not:

Create a ViewController class with extensions
- AVAudioRecorderDelegate
- AVAudioPlayerDelegate

since we will be recording speech.

Import AVFoundation
Create a basic layout containing a label which detects whether hotword detected or not and create corresponding `IBOutlet` in the ViewController and a button to trigger the start and stop of recognition.
Create the following variables:
- let WAKE_WORD = “Susi” // hotword used
- let RESOURCE = Bundle.main.path(forResource: “common”, ofType: “res”)
- let MODEL = Bundle.main.path(forResource: “susi”, ofType: “umdl”) //path where the model file is stored
- var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection
- var audioRecorder: AVAudioRecorder! // audio recorder instance
- var audioPlayer: AVAudioPlayer!
- var soundFileURL: URL! //stores the URL of the temp reording file
- var timer: Timer! //timer to fire a function after an interval
- var isStarted = false // variable to check if audio recorder already started
In `viewDidLoad` initialize the wrapper and set sensitivity and audio gain. Recognition best happens when sensitivity is set to `0.5` and audio gain is set to `1.0` according to the docs.

override func viewDidLoad() {
    super.viewDidLoad()
    wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL)
    wrapper.setSensitivity("0.5")
    wrapper.setAudioGain(1.0)
}

Create an `IBAction` for the button to start recognition. This action will be used to start or stop the recording in which the action toggles based on the `isStarted` variable. When true, recording is stopped and the timer invalidated else a timer is started which calls the `startRecording` method with an interval of 4 seconds.

@IBAction func onClickBtn(_ sender: Any) {
  if (isStarted) {
    stopRecording()
    timer.invalidate()
    btn.setTitle("Start", for: .normal)
    isStarted = false
  } else {
    timer = Timer.scheduledTimer(timeInterval: 4, target: self, 
    selector: #selector(startRecording), userInfo: nil, repeats: true)
    timer.fire()
    btn.setTitle("Stop", for: .normal)
    isStarted = true
  }
}

Next, we add the start and stop recording methods.
- First, a temp file is created which stores the recorded audio output
- After which, necessary record configurations are made such as setting the sampling rate.
- The recording is then started and the output stored in the temp file.

func startRecording() {
  do {
    let fileMgr = FileManager.default
    let dirPaths = fileMgr.urls(for: .documentDirectory, in: .userDomainMask)
    soundFileURL = dirPaths[0].appendingPathComponent("temp.wav")
    let recordSettings = [AVEncoderAudioQualityKey: 
    AVAudioQuality.high.rawValue,
    AVEncoderBitRateKey: 128000,
    AVNumberOfChannelsKey: 1,
    AVSampleRateKey: 16000.0] as [String : Any]
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioRecorder = AVAudioRecorder(url: soundFileURL,
settings: recordSettings as [String : AnyObject])
    audioRecorder.delegate = self
    audioRecorder.prepareToRecord()
    audioRecorder.record(forDuration: 2.0)
    instructionLabel.text = "Speak wake word: \(WAKE_WORD)"print("Started recording...")
  } catch let error {
    print("Audio session error: \(error.localizedDescription)")
  }
}

The stop recording method, stops the audioRecorder instance and updates the instruction label to show the same.

func stopRecording() {
  if (audioRecorder != nil && audioRecorder.isRecording) {
    audioRecorder.stop()
  }
  instructionLabel.text = "Stop"
  print("Stopped recording...")
}

The final recognition is done in the `audioRecorderDidFinishRecording` delegate method which runs the snowboy detection function which processes the audio recording in the temp file by creating a buffer and storing the audio in it and giving the wrapper this buffer as input which processes the buffer and returning a `1` is hotword was successfully detected.

func runSnowboy() {
  let file = try! AVAudioFile(forReading: soundFileURL)
  let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, 
  sampleRate: 16000.0, channels: 1, interleaved: false)
  let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length))
  try! file.read(into: buffer)
  let array = Array(UnsafeBufferPointer(start: 
  buffer.floatChannelData![0], count:Int(buffer.frameLength)))
  // print output
  let result = wrapper.runDetection(array, length: Int32(buffer.frameLength))
  print("Result: \(result)")
}

To test this out, click the start button and speak different words and you will notice that once the Hot Word is spoken, log with `result: 1` is printed out.

The snowboy hotword recognition also offers to train the personalized model with the help of Rest Apis for which the docs can be found here. The complete project implementation can be found here.

Sources:

Hot word Detection Documentation: http://docs.kitt.ai/snowboy/
Snow boy Github repository containing sample apps: https://github.com/Kitt-AI/snowboy/tree/master/examples/iOS
A small tutorial using AVAudioRecorder: http://www.techotopia.com/index.php/Recording_Audio_on_iOS_8_with_AVAudioRecorder_in_Swift
A video tutorial on using AVAudioRecorder: https://www.youtube.com/watch?v=TZtf40cBltg

Hotword Detection in SUSI Android App using Snowboy

Post author:chiragw15
Post published:July 22, 2017
Post category:Android FOSSASIA GSoC SUSI.AI
Post comments:0 Comments

Hotword Detection is as cool as it sounds. What exactly is hotword detection? Hotword detection is a feature in which a device gets activated when it listens to a specific word or a phrase. You must have said “OK Google” or “Hey Cortana” or “Siri” or “Alexa” at least once in your lifetime. These all are hotwords which trigger the specific action attached to them. That specific action can be anything.

Implementing hotword detection from scratch in SUSI Android is not an easy task. You have to define language model, train the model and do various other processes before implementing it in Android. In short, not feasible to implement that along with the code of our Android app. There are many open source projects on hotword detection and speech recognition. They already have done what we need and we can make use of it. One such project is Snowboy. According to Snowboy GitHub repo “Snowboy is a DNN based hotword and wake word detection toolkit.”

Img src: https://snowboy.kitt.ai/

In SUSI Android App, we have used Snowboy for hotword detection with hotword as “susi” (pronounced as ‘suzi’). In this blog, I will tell you how Hotword detection is implemented in SUSI Android app. So, you can just follow the steps and you will be able to implement it in your application too or if you want to contribute in SUSI android app, it may help you a little in knowing the codebase better.

Pre Processing before Implementation

1. Generating Hotword Model

The start of implementation of hotword detection begins with creating a hotword model from snowboy website https://snowboy.kitt.ai/dashboard . Just log in and search for susi and then train it by saying “susi” thrice and download the susi.pmdl file.

There are two types of models:

.pmdl : Personal Model
.umdl : Universal Model

The personal model is specifically trained for you and is instantly available for you to download once you train the hotword by your voice. On the other hand, the Universal model is trained by minimum 500 hundred people and is only available once it is trained. So, we are going to use personal model for now since training of universal model is not yet completed.

Img src: https://snowboy.kitt.ai/

2. Adding some predefined native binary files in your app.

Once you have downloaded the susi.pmdl file and you need to copy some already written native binary file in your app.

1. In your assets folder, make a directory named snowboy and add your downloaded susi.pmdl file along with this file in it.
2. Copy this folder and add it in your /app/src/main/java folder as it is. These are autogenerated swig files. So, don’t change it unless you know what you are doing.
3. Also, create a new folder in your /app/src/main folder called jniLibs and add these files to it.

Implementation in SUSI Android App

Check out the implementation of Hotword detection in SUSI Android App here

You now have everything ready. Now you just need to implement some code in your MainActivity and you are good to go. Initiate Hotword detection in the app by using below written code snippet. The call initHotword() method from your onCreate method in MainActivity. Make sure you have given permission to RECORD_AUDIO and WRITE_EXTERNAL_STORAGE.

private void initHotword() {
   if (ActivityCompat.checkSelfPermission(this,
           Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED &&
       ActivityCompat.checkSelfPermission(this,
           Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED) {
       AppResCopy.copyResFromAssetsToSD(this);

       recordingThread = new RecordingThread(new Handler() {
           @Override
           public void handleMessage(Message msg) {
               MsgEnum message = MsgEnum.getMsgEnum(msg.what);
               switch(message) {
                   case MSG_ACTIVE:
                       //HOTWORD DETECTED. NOW DO WHATEVER YOU WANT TO DO HERE
                       break;
                   case MSG_INFO:
                       break;
                   case MSG_VAD_SPEECH:
                       break;
                   case MSG_VAD_NOSPEECH:
                       break;
                   case MSG_ERROR:
                       break;
                   default:
                       super.handleMessage(msg);
                       break;
               }
           }
       }, new AudioDataSaver());
   }
}

In the above code under case, MSG_ACTIVE add your code which needs to be executed once hotword is detected. Now, hotword detection is initiated and you just have to start recording and stop recording whenever you want.

To start recording : (can be written in onStart() method)

if(recordingThread !=null && !isDetectionOn) {
    recordingThread.startRecording();
    isDetectionOn = true;
}

To stop recording : (can be written in onStop() method)

if(recordingThread !=null && isDetectionOn){
   recordingThread.stopRecording();
   isDetectionOn = false;
}

So, this this is the simple process to add hotword detection feature in your app.

Though this is great and will word wonderfully for you but it has a little problem and a solution which is discussed below.

Problems for now and future steps

As mentioned above, the susi.pmdl model is a personal model and only trained for your voice. So, it will work wonderfully for you and for people with a similar voice like you but not for everyone. There are two ways to make this solve this problem:

Replace susi.pmdl with susi.umdl: This is very easy way to fix this problem. But to get the susi.umdl file at least 500 people have to train susi hotword on snowboy website. Once that target is achieved, you can download susi.umdl file and replace it with susi.pmdl file and you are good to go.
Train and obtain susi.pmdl file for each user: Right now you trained and used your own personal model for everyone but there is a way you can use every user’s own personal model in the app. Now hotword will be trained by the user himself in the app. Snowboy provides a training API http://docs.kitt.ai/snowboy/#api-v1-train to train the hotword and getting personal model. Quoting snowboy docs “You can define truly customized hotword for each of your end customer. Just ask them to say the hotword 3 times and a model will be trained on the fly!”

Resources

Official docs of Snowboy hotword detection http://docs.kitt.ai/snowboy/#introduction
Code for snowboy hotword detection. It contains readme files which give very nice description about the project. https://github.com/kitt-ai/snowboy
This readme file of snowboy which guides on how to set up snowboy in an android app https://github.com/Kitt-AI/snowboy/blob/master/examples/Android/README.md