Hotword Recognition in SUSI iOS
Hot word recognition is a feature by which a specific action can be performed each time a specific word is spoken. There is a service called Snowboy which helps us achieve this for various clients (for ex: iOS, Android, Raspberry pi, etc.). It is basically a DNN based hotword recognition toolkit. In this blog, we will learn how to integrate the snowboy hotword detection wrapper in the SUSI iOS client. This service can be used in any open source project but for using it commercially, a commercial license needs to be obtained. Following are the files that need to be added to the project which are provided by the service itself: snowboy-detect.h libsnowboy-detect.a and a trained model file which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”, the model file can be found here. The way how snowboy works is that speech is recorded for a few seconds and this data is detected with an already trained model by a specific hotword, now if snowboy returns a 1 means word has been successfully detected else wasn’t. We start with creation of a wrapper class in Objective-C which can be found wrapper and the bridging header in case this needs to be added to a Swift project. The wrapper contains methods for setting sensitivity, audio gain and running the detection using the buffer. It is a wrapper class built on top of the snowboy-detect.h header file. Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether it successfully detected the hotword or not: Create a ViewController class with extensions AVAudioRecorderDelegate AVAudioPlayerDelegate since we will be recording speech. Import AVFoundation Create a basic layout containing a label which detects whether hotword detected or not and create corresponding `IBOutlet` in the ViewController and a button to trigger the start and stop of recognition. Create the following variables: let WAKE_WORD = "Susi" // hotword used let RESOURCE = Bundle.main.path(forResource: "common", ofType: "res") let MODEL = Bundle.main.path(forResource: "susi", ofType: "umdl") //path where the model file is stored var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection var audioRecorder: AVAudioRecorder! // audio recorder instance var audioPlayer: AVAudioPlayer! var soundFileURL: URL! //stores the URL of the temp reording file var timer: Timer! //timer to fire a function after an interval var isStarted = false // variable to check if audio recorder already started In `viewDidLoad` initialize the wrapper and set sensitivity and audio gain. Recognition best happens when sensitivity is set to `0.5` and audio gain is set to `1.0` according to the docs. override func viewDidLoad() { super.viewDidLoad() wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL) wrapper.setSensitivity("0.5") wrapper.setAudioGain(1.0) } Create an `IBAction` for the button to start recognition. This action will be used to start or stop the recording in which the action toggles based on the `isStarted` variable. When true, recording is stopped and the timer invalidated else a timer is started…
