Hotword Recognition in SUSI iOS

Hotword recognition is a feature by which a specific action is performed each time a specific word is spoken. A service called Snowboy helps us achieve this on various clients (for example iOS, Android and Raspberry Pi). It is a DNN-based hotword recognition toolkit. In this blog, we will learn how to integrate the Snowboy hotword detection wrapper in the SUSI iOS client. The service can be used in any open source project, but a commercial license needs to be obtained to use it commercially.

The following files, which are provided by the service itself, need to be added to the project: snowboy-detect.h, libsnowboy-detect.a and a trained model file, which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”; the model file can be found here.

Snowboy works as follows: speech is recorded for a few seconds and the recording is checked against a model that has already been trained for a specific hotword. If Snowboy returns 1, the hotword was detected; otherwise it was not.

We start with a wrapper class written in Objective-C, plus the bridging header needed when it is added to a Swift project. The wrapper is built on top of the snowboy-detect.h header file and contains methods for setting the sensitivity, setting the audio gain and running the detection on a buffer.

Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether the hotword was detected:

1. Create a ViewController class that conforms to AVAudioRecorderDelegate and AVAudioPlayerDelegate, since we will be recording speech.

2. Import AVFoundation.

3. Create a basic layout containing a label that shows whether the hotword was detected and a button that starts and stops recognition, and create the corresponding `IBOutlet` in the ViewController.

4. Create the following variables:

    let WAKE_WORD = "Susi" // hotword used
    let RESOURCE = Bundle.main.path(forResource: "common", ofType: "res")
    let MODEL = Bundle.main.path(forResource: "susi", ofType: "umdl") // path where the model file is stored
    var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection
    var audioRecorder: AVAudioRecorder! // audio recorder instance
    var audioPlayer: AVAudioPlayer!
    var soundFileURL: URL! // stores the URL of the temp recording file
    var timer: Timer! // timer to fire a function after an interval
    var isStarted = false // variable to check if audio recorder already started

5. In `viewDidLoad`, initialize the wrapper and set the sensitivity and audio gain. According to the docs, recognition works best with the sensitivity set to `0.5` and the audio gain set to `1.0`.

    override func viewDidLoad() {
        super.viewDidLoad()
        wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL)
        wrapper.setSensitivity("0.5")
        wrapper.setAudioGain(1.0)
    }

6. Create an `IBAction` for the button to start recognition. This action starts or stops the recording, toggling on the `isStarted` variable. When it is true, recording is stopped and the timer is invalidated; otherwise a timer is started…


Managing States in SUSI MagicMirror Module

SUSI MagicMirror Module is a module for the MagicMirror project that lets you use SUSI directly on MagicMirror. While developing the module, a problem I faced was managing the flow between the various stages of processing the user's voice input and displaying the SUSI output to the user. This was solved by defining a state management flow between the states of SUSI MagicMirror Module, namely:

Idle State: SUSI MagicMirror Module is actively listening for a hotword.

Listening State: The user’s speech input from the microphone is recorded to a file.

Busy State: The user has finished speaking or timed out. Now we need to transcribe the audio spoken by the user, send the transcribed query to the SUSI server and speak out the SUSI response.

The flow between these states can be explained by the following diagram:

As is clear from the diagram, a state cannot transition to every other state. Only some transitions are allowed. Thus, we need a mechanism that guarantees only allowed transitions happen and that they are triggered at the right time.

To achieve this, we first implement an abstract class State with the common properties of a state. Whether a state can transition into some other state is stored in a map, allowedStateTransitions, which maps the state names “idle”, “listening” and “busy” to their corresponding states. The transition method for moving from one state to another is implemented in the following way.

    protected transition(state: State): void {
        if (!this.canTransition(state)) {
            console.error(`Invalid transition to state: ${state.name}`);
            return;
        }
        this.onExit();
        state.onEnter();
    }

    private canTransition(state: State): boolean {
        return this.allowedStateTransitions.has(state.name);
    }

Here we first check whether the transition is valid. If it is, we exit the current state and enter the supplied state.

We also define a state machine that initializes the default state of the Mirror and defines the valid transitions for each state.
Here is the constructor for the state machine.

    constructor(components: IStateMachineComponents) {
        this.idleState = new IdleState(components);
        this.listeningState = new ListeningState(components);
        this.busyState = new BusyState(components);

        this.idleState.AllowedStateTransitions = new Map<StateName, State>([
            ["listening", this.listeningState],
        ]);
        this.listeningState.AllowedStateTransitions = new Map<StateName, State>([
            ["busy", this.busyState],
            ["idle", this.idleState],
        ]);
        this.busyState.AllowedStateTransitions = new Map<StateName, State>([
            ["idle", this.idleState],
        ]);

        this.currentState = this.idleState;
        this.currentState.onEnter();
    }

Now the question is how we detect when to transition from one state to another. For that, we subscribe to the Snowboy Detector Observable. We are using the Snowboy library for hotword detection. Snowboy detects whether an audio stream is silent, contains some sound or contains the hotword. We bind all this information to an observable using the ReactiveX Observable pattern. This gives us a stream of events to which we can subscribe to get the results, as the following code snippet shows.

    detector.on("silence", () => {
        this.subject.next(DETECTOR.Silence);
    });
    detector.on("sound", () => {});
    detector.on("error", (error) => {
        console.error(error);
    });
    detector.on("hotword", (index, hotword) => {
        this.subject.next(DETECTOR.Hotword);
    });

    public get Observable(): Observable<DETECTOR> {
        return this.subject.asObservable();
    }

Now, in the idle state, we subscribe to the values emitted by the observable of the detector to know when a hotword…
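The allowed-transition idea described in this excerpt can be tried out in isolation. The following is only a minimal sketch, not the module's actual code: the `State` and `StateMachine` classes here are simplified stand-ins, the `allow` helper and the `history` log are hypothetical names added for illustration, and the `onEnter`/`onExit` hooks are left out. It uses the same three state names and the same transition table as the constructor above.

```typescript
type StateName = "idle" | "listening" | "busy";

// Simplified stand-in for the abstract State class: it only tracks
// which states it is allowed to move into.
class State {
  readonly name: StateName;
  private allowed = new Set<StateName>();

  constructor(name: StateName) {
    this.name = name;
  }

  // Hypothetical helper: register the names of permitted target states.
  allow(...names: StateName[]): void {
    names.forEach((n) => this.allowed.add(n));
  }

  canTransition(next: State): boolean {
    return this.allowed.has(next.name);
  }
}

class StateMachine {
  readonly idle = new State("idle");
  readonly listening = new State("listening");
  readonly busy = new State("busy");
  // Log of entered states, kept only to make the flow observable.
  readonly history: StateName[] = [];
  private current: State = this.idle;

  constructor() {
    // Same transition table as in the excerpt.
    this.idle.allow("listening");
    this.listening.allow("busy", "idle");
    this.busy.allow("idle");
    this.history.push(this.current.name);
  }

  // Returns true if the transition was allowed and applied.
  transition(next: State): boolean {
    if (!this.current.canTransition(next)) {
      return false;
    }
    this.current = next;
    this.history.push(next.name);
    return true;
  }
}

const machine = new StateMachine();
machine.transition(machine.busy);      // rejected: idle may only go to listening
machine.transition(machine.listening); // accepted
```

Returning a boolean instead of logging an error is a choice made here to keep the sketch easy to check; the excerpt's own `transition` method logs and returns `void`.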


Hotword Detection on SUSI MagicMirror with Snowboy

The Magic Mirror in the story “Snow White and the Seven Dwarfs” had one cool feature. The Queen in the story could call the Mirror just by saying “Mirror” and then ask it questions. The MagicMirror project helps you build a mirror quite close to the one in the fable, but how cool would it be to have the same feature? Hotword detection on SUSI MagicMirror Module helps us achieve that.

Hotword detection on SUSI MagicMirror Module was accomplished with the help of the Snowboy Hotword Detection Library. Snowboy is a cross-platform hotword detection library. We are using the same library for Android and iOS as well as in the MagicMirror Module (Node.js).

Snowboy can be added to a JavaScript/TypeScript project with Node Package Manager (npm) by:

    $ npm install --save snowboy

To detect the hotword, we need to record audio continuously from the microphone. For recording, we use another npm package, node-record-lpcm16. It uses the SoX binary to record audio. First we need to install SoX. On Linux (Debian-based distributions):

    $ sudo apt-get install sox libsox-fmt-all

Then you can install the node-record-lpcm16 package with npm:

    $ npm install node-record-lpcm16

Next, we import it in the file where it is needed:

    import * as record from "node-record-lpcm16";

You may then create a new microphone stream:

    const mic = record.start({
        threshold: 0,
        sampleRate: 16000,
        verbose: true,
    });

The mic constant here is a Node.js Readable Stream, so we can read the incoming data from the microphone and process it.

We can now process this stream using the Detector class of Snowboy. We declare a child class extending Snowboy's Detector to suit our needs.
    import { Detector, Models } from "snowboy";

    export class HotwordDetector extends Detector {
        constructor(models: Models) {
            super({
                resource: `${process.env.CWD}/resources/common.res`,
                models: models,
                audioGain: 2.0,
            });
            this.setUp();
        }
        // other methods
    }

First, we create a Snowboy Detector by calling the parent constructor with the resource file common.res and a Snowboy model as arguments. A Snowboy model is a file which tells the detector which hotword to listen for. Currently, the module supports the hotword Susi, but it can be extended to support other hotwords like Mirror too. You can train the hotword for SUSI with your own voice and get the latest model file at https://snowboy.kitt.ai/hotword/7915 . You may then replace the susi.pmdl file in the resources folder with your own susi.pmdl file for a better experience.

Now we need to register handlers for the events of the Detector class, so that we know the current state of the detector and can act on it. This is done in the setUp() method.

    private setUp(): void {
        this.on("silence", () => {
            // handle silent state
        });
        this.on("sound", () => {
            // handle sound detected state
        });
        this.on("error", (error) => {
            // handle error
        });
        this.on("hotword", (index, hotword) => {
            // hotword detected
        });
    }

If you go into the implementation of the Detector class of Snowboy, you will see that it extends NodeJS.WritableStream. So, we can pipe our microphone input read stream to the Detector class and it handles all…
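To see the event wiring of setUp() run without a microphone or the Snowboy binaries, here is a small sketch under loud assumptions: `FakeDetector` and `HotwordCounter` are hypothetical classes invented for this illustration, the detector classifies text chunks rather than audio, and `write` is called by hand in place of a piped stream. It is not how Snowboy detects hotwords; it only mirrors the shape of registering one handler per detector event.

```typescript
type DetectorEvent = "silence" | "sound" | "hotword";
type Listener = () => void;

// Hypothetical stand-in for Snowboy's Detector: it inspects text chunks
// instead of audio so the event flow can be exercised anywhere.
class FakeDetector {
  private listeners = new Map<DetectorEvent, Listener[]>();

  on(event: DetectorEvent, listener: Listener): void {
    const list = this.listeners.get(event) || [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  // Plays the role of data arriving through the piped microphone stream.
  write(chunk: string): void {
    if (chunk.trim() === "") {
      this.emit("silence");
    } else if (chunk.toLowerCase().includes("susi")) {
      this.emit("hotword");
    } else {
      this.emit("sound");
    }
  }

  private emit(event: DetectorEvent): void {
    (this.listeners.get(event) || []).forEach((listener) => listener());
  }
}

// Mirrors the shape of setUp(): one handler per detector event.
class HotwordCounter extends FakeDetector {
  hotwords = 0;
  silences = 0;

  constructor() {
    super();
    this.setUp();
  }

  private setUp(): void {
    this.on("silence", () => {
      this.silences += 1;
    });
    this.on("sound", () => {
      // sound that is not the hotword: nothing to do in this sketch
    });
    this.on("hotword", () => {
      this.hotwords += 1;
    });
  }
}

const detector = new HotwordCounter();
["", "hello there", "hey Susi", "   "].forEach((chunk) => detector.write(chunk));
```

In the real module the handlers would trigger actions such as starting a recording, where this sketch only increments counters.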


Hotword Detection in SUSI Android App using Snowboy

Hotword detection is as cool as it sounds. What exactly is hotword detection? It is a feature in which a device gets activated when it hears a specific word or phrase. You must have said “OK Google”, “Hey Cortana”, “Siri” or “Alexa” at least once in your lifetime. These are all hotwords which trigger the specific action attached to them. That action can be anything.

Implementing hotword detection from scratch in SUSI Android is not an easy task. You have to define a language model, train the model and go through various other processes before implementing it in Android. In short, it is not feasible to implement that along with the code of our Android app. There are many open source projects on hotword detection and speech recognition. They have already done what we need and we can make use of them. One such project is Snowboy. According to the Snowboy GitHub repo, “Snowboy is a DNN based hotword and wake word detection toolkit.”

In SUSI Android App, we have used Snowboy for hotword detection with “susi” (pronounced ‘suzi’) as the hotword. In this blog, I will tell you how hotword detection is implemented in SUSI Android App. You can follow the steps to implement it in your own application too, or, if you want to contribute to SUSI Android App, it may help you get to know the codebase a little better.

Pre Processing before Implementation

1. Generating Hotword Model

Implementation of hotword detection begins with creating a hotword model on the Snowboy website https://snowboy.kitt.ai/dashboard . Just log in, search for susi, train it by saying “susi” three times and download the susi.pmdl file.

There are two types of models:

.pmdl : Personal Model

.umdl : Universal Model

The personal model is trained specifically for you and is available for download as soon as you have trained the hotword with your voice. The universal model, on the other hand, is trained by at least 500 people and is only available once that training is complete. So, we are going to use the personal model for now, since training of the universal model is not yet complete.

2. Adding some predefined native binary files in your app.

Once you have downloaded the susi.pmdl file, you need to copy some prebuilt native binary files into your app.

In your assets folder, make a directory named snowboy and add your downloaded susi.pmdl file to it, along with this file.

Copy this folder and add it to your /app/src/main/java folder as it is. These are autogenerated SWIG files, so don’t change them unless you know what you are doing.

Also, create a new folder called jniLibs in your /app/src/main folder and add these files to it.

Implementation in SUSI Android App

Check out the implementation of hotword detection in SUSI Android App here.

You now have everything ready. Now you just need to implement some code in your…
