Snowboy – blog.fossasia.org

Hotword Recognition in SUSI iOS

Post author:Chashmeet Singh
Post published:July 24, 2017
Post category:FOSSASIA GSoC SUSI.AI Tutorial
Post comments:0 Comments

Hot word recognition is a feature by which a specific action can be performed each time a specific word is spoken. There is a service called Snowboy which helps us achieve this for various clients (for ex: iOS, Android, Raspberry pi, etc.). It is basically a DNN based hotword recognition toolkit.

In this blog, we will learn how to integrate the snowboy hotword detection wrapper in the SUSI iOS client. This service can be used in any open source project but for using it commercially, a commercial license needs to be obtained.

Following are the files that need to be added to the project which are provided by the service itself: snowboy-detect.h libsnowboy-detect.a and a trained model file which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”, the model file can be found here.

The way how snowboy works is that speech is recorded for a few seconds and this data is detected with an already trained model by a specific hotword, now if snowboy returns a 1 means word has been successfully detected else wasn’t.

We start with creation of a wrapper class in Objective-C which can be found wrapper and the bridging header in case this needs to be added to a Swift project. The wrapper contains methods for setting sensitivity, audio gain and running the detection using the buffer. It is a wrapper class built on top of the snowboy-detect.h header file.

Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether it successfully detected the hotword or not:

Create a ViewController class with extensions
- AVAudioRecorderDelegate
- AVAudioPlayerDelegate

since we will be recording speech.

Import AVFoundation
Create a basic layout containing a label which detects whether hotword detected or not and create corresponding `IBOutlet` in the ViewController and a button to trigger the start and stop of recognition.
Create the following variables:
- let WAKE_WORD = “Susi” // hotword used
- let RESOURCE = Bundle.main.path(forResource: “common”, ofType: “res”)
- let MODEL = Bundle.main.path(forResource: “susi”, ofType: “umdl”) //path where the model file is stored
- var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection
- var audioRecorder: AVAudioRecorder! // audio recorder instance
- var audioPlayer: AVAudioPlayer!
- var soundFileURL: URL! //stores the URL of the temp reording file
- var timer: Timer! //timer to fire a function after an interval
- var isStarted = false // variable to check if audio recorder already started
In `viewDidLoad` initialize the wrapper and set sensitivity and audio gain. Recognition best happens when sensitivity is set to `0.5` and audio gain is set to `1.0` according to the docs.

override func viewDidLoad() {
    super.viewDidLoad()
    wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL)
    wrapper.setSensitivity("0.5")
    wrapper.setAudioGain(1.0)
}

Create an `IBAction` for the button to start recognition. This action will be used to start or stop the recording in which the action toggles based on the `isStarted` variable. When true, recording is stopped and the timer invalidated else a timer is started which calls the `startRecording` method with an interval of 4 seconds.

@IBAction func onClickBtn(_ sender: Any) {
  if (isStarted) {
    stopRecording()
    timer.invalidate()
    btn.setTitle("Start", for: .normal)
    isStarted = false
  } else {
    timer = Timer.scheduledTimer(timeInterval: 4, target: self, 
    selector: #selector(startRecording), userInfo: nil, repeats: true)
    timer.fire()
    btn.setTitle("Stop", for: .normal)
    isStarted = true
  }
}

Next, we add the start and stop recording methods.
- First, a temp file is created which stores the recorded audio output
- After which, necessary record configurations are made such as setting the sampling rate.
- The recording is then started and the output stored in the temp file.

func startRecording() {
  do {
    let fileMgr = FileManager.default
    let dirPaths = fileMgr.urls(for: .documentDirectory, in: .userDomainMask)
    soundFileURL = dirPaths[0].appendingPathComponent("temp.wav")
    let recordSettings = [AVEncoderAudioQualityKey: 
    AVAudioQuality.high.rawValue,
    AVEncoderBitRateKey: 128000,
    AVNumberOfChannelsKey: 1,
    AVSampleRateKey: 16000.0] as [String : Any]
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioRecorder = AVAudioRecorder(url: soundFileURL,
settings: recordSettings as [String : AnyObject])
    audioRecorder.delegate = self
    audioRecorder.prepareToRecord()
    audioRecorder.record(forDuration: 2.0)
    instructionLabel.text = "Speak wake word: \(WAKE_WORD)"print("Started recording...")
  } catch let error {
    print("Audio session error: \(error.localizedDescription)")
  }
}

The stop recording method, stops the audioRecorder instance and updates the instruction label to show the same.

func stopRecording() {
  if (audioRecorder != nil && audioRecorder.isRecording) {
    audioRecorder.stop()
  }
  instructionLabel.text = "Stop"
  print("Stopped recording...")
}

The final recognition is done in the `audioRecorderDidFinishRecording` delegate method which runs the snowboy detection function which processes the audio recording in the temp file by creating a buffer and storing the audio in it and giving the wrapper this buffer as input which processes the buffer and returning a `1` is hotword was successfully detected.

func runSnowboy() {
  let file = try! AVAudioFile(forReading: soundFileURL)
  let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, 
  sampleRate: 16000.0, channels: 1, interleaved: false)
  let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length))
  try! file.read(into: buffer)
  let array = Array(UnsafeBufferPointer(start: 
  buffer.floatChannelData![0], count:Int(buffer.frameLength)))
  // print output
  let result = wrapper.runDetection(array, length: Int32(buffer.frameLength))
  print("Result: \(result)")
}

To test this out, click the start button and speak different words and you will notice that once the Hot Word is spoken, log with `result: 1` is printed out.

The snowboy hotword recognition also offers to train the personalized model with the help of Rest Apis for which the docs can be found here. The complete project implementation can be found here.

Sources:

Hot word Detection Documentation: http://docs.kitt.ai/snowboy/
Snow boy Github repository containing sample apps: https://github.com/Kitt-AI/snowboy/tree/master/examples/iOS
A small tutorial using AVAudioRecorder: http://www.techotopia.com/index.php/Recording_Audio_on_iOS_8_with_AVAudioRecorder_in_Swift
A video tutorial on using AVAudioRecorder: https://www.youtube.com/watch?v=TZtf40cBltg

Managing States in SUSI MagicMirror Module

Post author:betterclever
Post published:July 23, 2017
Post category:FOSSASIA
Post comments:0 Comments

SUSI MagicMirror Module is a module for MagicMirror project by which you can use SUSI directly on MagicMirror. While developing the module, a problem I faced was that we need to manage the flow between the various stages of processing of voice input by the user and displaying SUSI output to the user. This was solved by making state management flow between various states of SUSI MagicMirror Module namely,

Idle State: When SUSI MagicMirror Module is actively listening for a hotword.
Listening State: In this state, the user’s speech input from the microphone is recorded to a file.
Busy State: The user has finished speaking or timed out. Now, we need to transcribe the audio spoken by the user, send the response to SUSI server and speak out the SUSI response.

The flow between these states can be explained by the following diagram:

As clear from the above diagram, transitions are not possible from a state to all other states. Only some transitions are allowed. Thus, we need a mechanism to guarantee only allowed transitions and ensure it triggers on the right time.

For achieving this, we first implement an abstract class State with common properties of a state. We store the information whether a state can transition into some other state in a map allowedTransitions which maps state names “idle”, “listening” and “busy” to their corresponding states. The transition method to transition from one state to another is implemented in the following way.

protected transition(state: State): void {
   if (!this.canTransition(state)) {
       console.error(`Invalid transition to state: ${state}`);
       return;
   }

   this.onExit();
   state.onEnter();
}

private canTransition(state: State): boolean {
   return this.allowedStateTransitions.has(state.name);
}

Here we first check if a transition is valid. Then we exit one state and enter into the supplied state. We also define a state machine that initializes the default state of the Mirror and define valid transitions for each state. Here is the constructor for state machine.

constructor(components: IStateMachineComponents) {
        this.idleState = new IdleState(components);
        this.listeningState = new ListeningState(components);
        this.busyState = new BusyState(components);

        this.idleState.AllowedStateTransitions = new Map<StateName, State>([["listening", this.listeningState]]);
        this.listeningState.AllowedStateTransitions = new Map<StateName, State>([["busy", this.busyState], ["idle", this.idleState]]);
        this.busyState.AllowedStateTransitions = new Map<StateName, State>([["idle", this.idleState]]);

        this.currentState = this.idleState;
        this.currentState.onEnter();
}

Now, the question arises that how do we detect when we need to transition from one state to another. For that we subscribe on the Snowboy Detector Observable. We are using Snowboy library for Hotword Detection. Snowboy detects whether an audio stream is silent, has some sound or whether hotword was spoken. We bind all this information to an observable using the ReactiveX Observable pattern. This gives us a stream of events to which we can subscribe and get the results. It can be understood in the following code snippet.

detector.on("silence", () => {
   this.subject.next(DETECTOR.Silence);
});

detector.on("sound", () => {});

detector.on("error", (error) => {
   console.error(error);
});

detector.on("hotword", (index, hotword) => {
   this.subject.next(DETECTOR.Hotword);
});

public get Observable(): Observable<DETECTOR> {
   return this.subject.asObservable();
}

Now, in the idle state, we subscribe to the values emitted by the observable of the detector to know when a hotword is detected to transition to the listening state. Here is the code snippet for the same.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Hotword:
           this.transition(this.allowedStateTransitions.get("listening"));
           break;
   }
});

In the listening state, we subscribe to the states emitted by the detector observable to find when silence is detected so that we can stop recording the audio stream for processing and move to busy state.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Silence:
           record.stop();
           this.transition(this.allowedStateTransitions.get("busy"));
           break;
   }
});

The task of speaking the audio and displaying results on the screen is done by a renderer. The communication to renderer is done via a RendererCommunicator object using a notification system. We also bind its events to an observable so that we know when SUSI has finished speaking the result. To transition from busy state to idle state, we subscribe to renderer observable in the following manner.

this.rendererSubscription = this.components.rendererCommunicator.Observable.subscribe((type) => {
   if (type === "finishedSpeaking") {
       this.transition(this.allowedStateTransitions.get("idle"));
   }
});

In this way, we transition between various states of MagicMirror Module for SUSI in an efficient manner.

Resources

MagicMirror Project Website: https://magicmirror.builders
RxJS Observable Usage tutorial: http://xgrommx.github.io/rx-book/content/observable/
State Pattern for Software Development: https://en.wikipedia.org/wiki/State_pattern
State Machine pattern in NodeJS: http://www.robert-drummond.com/2015/04/21/event-driven-programming-finite-state-machines-and-nodejs/

Hotword Detection on SUSI MagicMirror with Snowboy

Post author:betterclever
Post published:July 23, 2017
Post category:FOSSASIA GSoC SUSI.AI
Post comments:0 Comments

Magic Mirror in the story “Snow White and the Seven Dwarfs” had one cool feature. The Queen in the story could call Mirror just by saying “Mirror” and then ask it questions. MagicMirror project helps you develop a Mirror quite close to the one in the fable but how cool it would be to have the same feature? Hotword Detection on SUSI MagicMirror Module helps us achieve that.

The hotword detection on SUSI MagicMirror Module was accomplished with the help of Snowboy Hotword Detection Library. Snowboy is a cross platform hotword detection library. We are using the same library for Android, iOS as well as in MagicMirror Module (nodejs).

Snowboy can be added to a Javascript/Typescript project with Node Package Manager (npm) by:

$ npm install --save snowboy

For detecting hotword, we need to record audio continuously from the Microphone. To accomplish the task of recording, we have another npm package node-record-lpcm16. It used SoX binary to record audio. First we need to install SoX using

Linux (Debian based distributions)

$ sudo apt-get install sox libsox-fmt-all

Then, you can install node-record-lpcm16 package using npm using

$ npm install node-record-lpcm16

Then, we need to import it in the needed file using

import * as record from "node-record-lpcm16";

You may then create a new microphone stream using,

const mic = record.start({
   threshold: 0,
   sampleRate: 16000,
   verbose: true,
});

The mic constant here is a NodeJS Readable Stream. So, we can read the incoming data from the Microphone and process it.

We can now process this stream using Detector class of Snowboy. We declare a child class extending Snowboy Hotword Decoder to suit our needs.

import { Detector, Models } from "snowboy";

export class HotwordDetector extends Detector {
  
  1 constructor(models: Models) {
       super({
           resource: `${process.env.CWD}/resources/common.res`,
           models: models,
           audioGain: 2.0,
       });
       this.setUp();
   }

   // other methods
}

First, we create a Snowboy Detector by calling the parent constructor with resource file as common.res and a Snowboy model as argument. Snowboy model is a file which tells the detector which Hotword to listen for. Currently, the module supports hotword Susi but it can be extended to support other hotwords like Mirror too. You can train the hotword for SUSI for your voice and get the latest model file at https://snowboy.kitt.ai/hotword/7915 . You may then replace the susi.pmdl file in resources folder with our own susi.pmdl file for a better experience.

Now, we need to delegate the callback methods of Detector class to know about the current state of detector and take an action on its basis. This is done in the setUp() method.

private setUp(): void {
   this.on("silence", () => {
      // handle silent state
   });

   this.on("sound", () => {
      // handle sound detected state
   });

   this.on("error", (error) => {
      // handle error
   });

   this.on("hotword", (index, hotword) => {
      // hotword detected 
   });
}

If you go into the implementation of Detector class of Snowboy, it extends from NodeJS.WritableStream. So, we can pipe our microphone input read stream to Detector class and it handles all the states. This can be done using

mic.pipe(detector as any);

So, now all the input from Microphone will be processed by Snowboy detector class and we can know when the user has spoken the word “SUSI”. We can start speech recognition and do other changes in User Interface based on the different states.

After this, we can simply say “Susi” followed by our query to ask SUSI on the MagicMirror. A video implementation of the same can be seen here:

Resources:

MagicMirror Project Website: https://magicmirror.builders
SUSI Magic Mirror Repository: https://github.com/fossasia/MMM-SUSI-AI
Snowboy NodeJS official example code: https://github.com/Kitt-AI/snowboy/blob/master/examples/Node/file.js
Documentation for Snowboy usage and tutorial: https://snowboy.kitt.ai/docs
Snowboy nodejs implementation: https://github.com/Kitt-AI/snowboy/blob/master/lib/node/index.ts

Hotword Detection in SUSI Android App using Snowboy

Post author:chiragw15
Post published:July 22, 2017
Post category:Android FOSSASIA GSoC SUSI.AI
Post comments:0 Comments

Hotword Detection is as cool as it sounds. What exactly is hotword detection? Hotword detection is a feature in which a device gets activated when it listens to a specific word or a phrase. You must have said “OK Google” or “Hey Cortana” or “Siri” or “Alexa” at least once in your lifetime. These all are hotwords which trigger the specific action attached to them. That specific action can be anything.

Implementing hotword detection from scratch in SUSI Android is not an easy task. You have to define language model, train the model and do various other processes before implementing it in Android. In short, not feasible to implement that along with the code of our Android app. There are many open source projects on hotword detection and speech recognition. They already have done what we need and we can make use of it. One such project is Snowboy. According to Snowboy GitHub repo “Snowboy is a DNN based hotword and wake word detection toolkit.”

Img src: https://snowboy.kitt.ai/

In SUSI Android App, we have used Snowboy for hotword detection with hotword as “susi” (pronounced as ‘suzi’). In this blog, I will tell you how Hotword detection is implemented in SUSI Android app. So, you can just follow the steps and you will be able to implement it in your application too or if you want to contribute in SUSI android app, it may help you a little in knowing the codebase better.

Pre Processing before Implementation

1. Generating Hotword Model

The start of implementation of hotword detection begins with creating a hotword model from snowboy website https://snowboy.kitt.ai/dashboard . Just log in and search for susi and then train it by saying “susi” thrice and download the susi.pmdl file.

There are two types of models:

.pmdl : Personal Model
.umdl : Universal Model

The personal model is specifically trained for you and is instantly available for you to download once you train the hotword by your voice. On the other hand, the Universal model is trained by minimum 500 hundred people and is only available once it is trained. So, we are going to use personal model for now since training of universal model is not yet completed.

Img src: https://snowboy.kitt.ai/

2. Adding some predefined native binary files in your app.

Once you have downloaded the susi.pmdl file and you need to copy some already written native binary file in your app.

1. In your assets folder, make a directory named snowboy and add your downloaded susi.pmdl file along with this file in it.
2. Copy this folder and add it in your /app/src/main/java folder as it is. These are autogenerated swig files. So, don’t change it unless you know what you are doing.
3. Also, create a new folder in your /app/src/main folder called jniLibs and add these files to it.

Implementation in SUSI Android App

Check out the implementation of Hotword detection in SUSI Android App here

You now have everything ready. Now you just need to implement some code in your MainActivity and you are good to go. Initiate Hotword detection in the app by using below written code snippet. The call initHotword() method from your onCreate method in MainActivity. Make sure you have given permission to RECORD_AUDIO and WRITE_EXTERNAL_STORAGE.

private void initHotword() {
   if (ActivityCompat.checkSelfPermission(this,
           Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED &&
       ActivityCompat.checkSelfPermission(this,
           Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED) {
       AppResCopy.copyResFromAssetsToSD(this);

       recordingThread = new RecordingThread(new Handler() {
           @Override
           public void handleMessage(Message msg) {
               MsgEnum message = MsgEnum.getMsgEnum(msg.what);
               switch(message) {
                   case MSG_ACTIVE:
                       //HOTWORD DETECTED. NOW DO WHATEVER YOU WANT TO DO HERE
                       break;
                   case MSG_INFO:
                       break;
                   case MSG_VAD_SPEECH:
                       break;
                   case MSG_VAD_NOSPEECH:
                       break;
                   case MSG_ERROR:
                       break;
                   default:
                       super.handleMessage(msg);
                       break;
               }
           }
       }, new AudioDataSaver());
   }
}

In the above code under case, MSG_ACTIVE add your code which needs to be executed once hotword is detected. Now, hotword detection is initiated and you just have to start recording and stop recording whenever you want.

To start recording : (can be written in onStart() method)

if(recordingThread !=null && !isDetectionOn) {
    recordingThread.startRecording();
    isDetectionOn = true;
}

To stop recording : (can be written in onStop() method)

if(recordingThread !=null && isDetectionOn){
   recordingThread.stopRecording();
   isDetectionOn = false;
}

So, this this is the simple process to add hotword detection feature in your app.

Though this is great and will word wonderfully for you but it has a little problem and a solution which is discussed below.

Problems for now and future steps

As mentioned above, the susi.pmdl model is a personal model and only trained for your voice. So, it will work wonderfully for you and for people with a similar voice like you but not for everyone. There are two ways to make this solve this problem:

Replace susi.pmdl with susi.umdl: This is very easy way to fix this problem. But to get the susi.umdl file at least 500 people have to train susi hotword on snowboy website. Once that target is achieved, you can download susi.umdl file and replace it with susi.pmdl file and you are good to go.
Train and obtain susi.pmdl file for each user: Right now you trained and used your own personal model for everyone but there is a way you can use every user’s own personal model in the app. Now hotword will be trained by the user himself in the app. Snowboy provides a training API http://docs.kitt.ai/snowboy/#api-v1-train to train the hotword and getting personal model. Quoting snowboy docs “You can define truly customized hotword for each of your end customer. Just ask them to say the hotword 3 times and a model will be trained on the fly!”

Resources

Official docs of Snowboy hotword detection http://docs.kitt.ai/snowboy/#introduction
Code for snowboy hotword detection. It contains readme files which give very nice description about the project. https://github.com/kitt-ai/snowboy
This readme file of snowboy which guides on how to set up snowboy in an android app https://github.com/Kitt-AI/snowboy/blob/master/examples/Android/README.md