Managing States in SUSI MagicMirror Module

SUSI MagicMirror Module is a module for the MagicMirror project that lets you use SUSI directly on a MagicMirror. While developing the module, one problem I faced was managing the flow between the stages of processing the user's voice input and displaying SUSI's output. I solved this by modelling the module as a set of states with managed transitions between them. The states are:

  • Idle State: When SUSI MagicMirror Module is actively listening for a hotword.
  • Listening State: In this state, the user’s speech input from the microphone is recorded to a file.
  • Busy State: The user has finished speaking or timed out. Now, we need to transcribe the audio spoken by the user, send the transcribed query to the SUSI server and speak out the SUSI response.

The flow between these states can be explained by the following diagram:

As the diagram shows, not every state can transition to every other state. Only some transitions are allowed: idle to listening, listening to busy or back to idle, and busy to idle. Thus, we need a mechanism that guarantees only allowed transitions happen and that they are triggered at the right time.

To achieve this, we first implement an abstract class State that holds the properties common to all states. Whether a state may transition into another state is recorded in a map, allowedStateTransitions, which maps the names of the permitted target states (“idle”, “listening” or “busy”) to the corresponding State objects. The transition method, which moves from one state to another, is implemented as follows.

protected transition(state: State): void {
   if (!this.canTransition(state)) {
       // Log the state's name; interpolating the object itself prints "[object Object]".
       console.error(`Invalid transition to state: ${state.name}`);
       return;
   }

   this.onExit();
   state.onEnter();
}

private canTransition(state: State): boolean {
   return this.allowedStateTransitions.has(state.name);
}

Here we first check whether the transition is valid. If it is, we exit the current state and enter the supplied state. We also define a state machine that creates the states, defines the valid transitions for each state and puts the Mirror in its default state. Here is the constructor of the state machine.

constructor(components: IStateMachineComponents) {
        this.idleState = new IdleState(components);
        this.listeningState = new ListeningState(components);
        this.busyState = new BusyState(components);

        this.idleState.AllowedStateTransitions = new Map<StateName, State>([["listening", this.listeningState]]);
        this.listeningState.AllowedStateTransitions = new Map<StateName, State>([["busy", this.busyState], ["idle", this.idleState]]);
        this.busyState.AllowedStateTransitions = new Map<StateName, State>([["idle", this.idleState]]);

        this.currentState = this.idleState;
        this.currentState.onEnter();
}
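
To see the guard in action, here is a minimal, dependency-free sketch of the same idea. It is not the module's actual code: the State class is simplified to a concrete class that writes to a log, and the go() helper, which tracks the current state by name, is a hypothetical addition for illustration.

```typescript
// Minimal sketch of a guarded state machine. The names mirror the post,
// but the bodies are illustrative assumptions, not the module's code.
type StateName = "idle" | "listening" | "busy";

class State {
    // Filled in by the state machine once all states exist.
    public allowedStateTransitions = new Map<StateName, State>();

    constructor(public readonly name: StateName, private readonly log: string[]) {}

    public onEnter(): void { this.log.push(`enter ${this.name}`); }
    public onExit(): void { this.log.push(`exit ${this.name}`); }

    // Returns true when the transition was performed, false when it was rejected.
    public transition(state: State): boolean {
        if (!this.allowedStateTransitions.has(state.name)) {
            this.log.push(`rejected ${this.name} -> ${state.name}`);
            return false;
        }
        this.onExit();
        state.onEnter();
        return true;
    }
}

class StateMachine {
    public readonly log: string[] = [];
    public current: State;
    private readonly states: Record<StateName, State>;

    constructor() {
        const idle = new State("idle", this.log);
        const listening = new State("listening", this.log);
        const busy = new State("busy", this.log);

        // Same transition table as in the constructor shown in the post.
        idle.allowedStateTransitions.set("listening", listening);
        listening.allowedStateTransitions.set("busy", busy).set("idle", idle);
        busy.allowedStateTransitions.set("idle", idle);

        this.states = { idle, listening, busy };
        this.current = idle;
        this.current.onEnter();
    }

    // Hypothetical helper: attempt a transition by name and track the current state.
    public go(name: StateName): boolean {
        const target = this.states[name];
        const ok = this.current.transition(target);
        if (ok) {
            this.current = target;
        }
        return ok;
    }
}

const machine = new StateMachine();
const idleToBusy = machine.go("busy");           // rejected: idle may only go to listening
const idleToListening = machine.go("listening"); // allowed
const listeningToBusy = machine.go("busy");      // allowed
const busyToIdle = machine.go("idle");           // allowed
```

Keeping the transition table in one place means a new state only needs an entry in the constructor, and every illegal jump is rejected by the same check.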

Now the question arises: how do we detect when to transition from one state to another? For that, we subscribe to the Snowboy detector observable. We use the Snowboy library for hotword detection. Snowboy reports whether an audio stream is silent, contains some sound, or contains the hotword. We bind this information to an observable using the ReactiveX Observable pattern, which gives us a stream of events that we can subscribe to. The following code snippet shows how.

detector.on("silence", () => {
   this.subject.next(DETECTOR.Silence);
});

detector.on("sound", () => {});

detector.on("error", (error) => {
   console.error(error);
});

detector.on("hotword", (index, hotword) => {
   this.subject.next(DETECTOR.Hotword);
});

public get Observable(): Observable<DETECTOR> {
   return this.subject.asObservable();
}
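
The wiring above can be tried out without Snowboy or RxJS. In the sketch below, TinySubject is a small stand-in for an RxJS Subject and FakeDetector is a stand-in for the Snowboy detector; both are assumptions made only so that the example runs on its own.

```typescript
// Dependency-free sketch: TinySubject stands in for an RxJS Subject and
// FakeDetector stands in for the Snowboy detector (both are assumptions).
enum DETECTOR { Silence, Hotword }

type Listener<T> = (value: T) => void;

interface TinySubscription { unsubscribe(): void; }

class TinySubject<T> {
    private listeners = new Set<Listener<T>>();

    // Push a value to every current subscriber.
    public next(value: T): void {
        this.listeners.forEach((listener) => listener(value));
    }

    public subscribe(listener: Listener<T>): TinySubscription {
        this.listeners.add(listener);
        return { unsubscribe: () => { this.listeners.delete(listener); } };
    }
}

class FakeDetector {
    private handlers = new Map<string, Array<() => void>>();

    public on(event: string, handler: () => void): void {
        const list = this.handlers.get(event) ?? [];
        list.push(handler);
        this.handlers.set(event, list);
    }

    public emit(event: string): void {
        (this.handlers.get(event) ?? []).forEach((handler) => handler());
    }
}

const detector = new FakeDetector();
const subject = new TinySubject<DETECTOR>();

// Same wiring as in the post: detector events are forwarded into the stream.
detector.on("silence", () => subject.next(DETECTOR.Silence));
detector.on("hotword", () => subject.next(DETECTOR.Hotword));

const received: DETECTOR[] = [];
const subscription = subject.subscribe((value) => received.push(value));

detector.emit("hotword");
detector.emit("sound");    // no handler forwards "sound", so nothing reaches the stream
detector.emit("silence");
subscription.unsubscribe();
detector.emit("hotword");  // ignored: no subscribers remain
```

The benefit of this indirection is that states never touch the detector directly. They only see a stream of DETECTOR values and can stop listening at any time by unsubscribing.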

Now, in the idle state, we subscribe to the values emitted by the observable of the detector to know when a hotword is detected to transition to the listening state. Here is the code snippet for the same.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Hotword:
           this.transition(this.allowedStateTransitions.get("listening"));
           break;
   }
});

In the listening state, we subscribe to the values emitted by the detector observable to find out when silence is detected, so that we can stop recording the audio stream, hand it over for processing and move to the busy state.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Silence:
           record.stop();
           this.transition(this.allowedStateTransitions.get("busy"));
           break;
   }
});

The task of speaking the audio and displaying results on the screen is done by a renderer. Communication with the renderer goes through a RendererCommunicator object using a notification system. We also bind its events to an observable so that we know when SUSI has finished speaking the result. To transition from the busy state to the idle state, we subscribe to the renderer observable in the following manner.

this.rendererSubscription = this.components.rendererCommunicator.Observable.subscribe((type) => {
   if (type === "finishedSpeaking") {
       this.transition(this.allowedStateTransitions.get("idle"));
   }
});
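
Each snippet above stores its subscription in a field, which suggests that a state releases it again when it is left, so that an inactive state does not react to events meant for the active one. The sketch below illustrates that lifecycle. The unsubscribe-on-exit behaviour is an assumption rather than something taken from the module, and EventStream is a minimal stand-in for an RxJS Subject.

```typescript
// Sketch only: EventStream is a minimal stand-in for an RxJS Subject, and the
// unsubscribe-on-exit behaviour is an assumption, not taken from the module.
type Handler<T> = (value: T) => void;

class EventStream<T> {
    private handlers = new Set<Handler<T>>();

    public next(value: T): void {
        this.handlers.forEach((handler) => handler(value));
    }

    public subscribe(handler: Handler<T>): { unsubscribe(): void } {
        this.handlers.add(handler);
        return { unsubscribe: () => { this.handlers.delete(handler); } };
    }
}

type DetectorEvent = "hotword" | "silence";

class IdleStateSketch {
    public hotwordsSeen = 0;
    private detectorSubscription?: { unsubscribe(): void };

    constructor(private readonly detectorStream: EventStream<DetectorEvent>) {}

    // Start reacting to detector events when the state becomes active.
    public onEnter(): void {
        this.detectorSubscription = this.detectorStream.subscribe((value) => {
            if (value === "hotword") {
                this.hotwordsSeen += 1;
            }
        });
    }

    // Stop reacting once the state is left.
    public onExit(): void {
        this.detectorSubscription?.unsubscribe();
        this.detectorSubscription = undefined;
    }
}

const detectorStream = new EventStream<DetectorEvent>();
const idleSketch = new IdleStateSketch(detectorStream);

idleSketch.onEnter();
detectorStream.next("hotword");  // counted: the idle state is active
idleSketch.onExit();
detectorStream.next("hotword");  // not counted: the subscription was released
```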

In this way, every transition between the states of the SUSI MagicMirror Module goes through a single checked path and is triggered by the event that warrants it.


Hotword Detection on SUSI MagicMirror with Snowboy

The Magic Mirror in the story “Snow White and the Seven Dwarfs” had one cool feature: the Queen could summon the Mirror just by saying “Mirror” and then ask it questions. The MagicMirror project helps you build a mirror quite close to the one in the fable, but how cool would it be to have that same feature? Hotword detection in the SUSI MagicMirror Module helps us achieve that.

Hotword detection in the SUSI MagicMirror Module was accomplished with the help of the Snowboy Hotword Detection Library. Snowboy is a cross-platform hotword detection library. We use the same library for Android, iOS and the MagicMirror Module (Node.js).

Snowboy can be added to a JavaScript/TypeScript project with Node Package Manager (npm) by:

$ npm install --save snowboy

To detect the hotword, we need to record audio continuously from the microphone. For recording, we use another npm package, node-record-lpcm16. It uses the SoX binary to record audio, so first we need to install SoX using

Linux (Debian based distributions)

$ sudo apt-get install sox libsox-fmt-all

Then, you can install the node-record-lpcm16 package with npm using

$ npm install node-record-lpcm16

Then, we need to import it in the file where it is used:

import * as record from "node-record-lpcm16";

You may then create a new microphone stream using,

const mic = record.start({
   threshold: 0,
   sampleRate: 16000,
   verbose: true,
});

The mic constant here is a Node.js Readable stream. So, we can read the incoming data from the microphone and process it.

We can now process this stream using the Detector class of Snowboy. We declare a child class extending the Snowboy Detector to suit our needs.

import { Detector, Models } from "snowboy";

export class HotwordDetector extends Detector {
  
   constructor(models: Models) {
       super({
           resource: `${process.env.CWD}/resources/common.res`,
           models: models,
           audioGain: 2.0,
       });
       this.setUp();
   }

   // other methods
}

First, we create a Snowboy detector by calling the parent constructor with common.res as the resource file and a Snowboy model as an argument. A Snowboy model is a file that tells the detector which hotword to listen for. Currently, the module supports the hotword “Susi”, but it can be extended to support other hotwords such as “Mirror”. You can train the SUSI hotword for your own voice and get the latest model file at https://snowboy.kitt.ai/hotword/7915 . You may then replace the susi.pmdl file in the resources folder with your own susi.pmdl file for a better experience.

Now, we need to register callbacks for the events of the Detector class so that we know the current state of the detector and can act on it. This is done in the setUp() method.

private setUp(): void {
   this.on("silence", () => {
      // handle silent state
   });

   this.on("sound", () => {
      // handle sound detected state
   });

   this.on("error", (error) => {
      // handle error
   });

   this.on("hotword", (index, hotword) => {
      // hotword detected 
   });
}

If you look at the implementation of Snowboy's Detector class, you will see that it extends NodeJS.WritableStream. So, we can pipe our microphone input stream into the detector, which then handles all the states. This can be done using

mic.pipe(detector as any);

Now all the input from the microphone is processed by the Snowboy detector, and we know when the user has spoken the word “SUSI”. We can then start speech recognition and make other changes in the user interface based on the different states.

After this, we can simply say “Susi” followed by our query to ask SUSI on the MagicMirror. A video implementation of the same can be seen here: 
