Managing States in SUSI MagicMirror Module

SUSI MagicMirror Module is a module for the MagicMirror project that lets you use SUSI directly on your MagicMirror. While developing the module, one problem I faced was managing the flow between the various stages of processing the user's voice input and displaying SUSI's output. This was solved by introducing state management that moves between the following states of the SUSI MagicMirror Module:

  • Idle State: When SUSI MagicMirror Module is actively listening for a hotword.
  • Listening State: In this state, the user’s speech input from the microphone is recorded to a file.
  • Busy State: The user has finished speaking or timed out. Now we need to transcribe the audio spoken by the user, send the transcribed query to the SUSI server, and speak out SUSI's response.

The flow between these states can be explained by the following diagram:

As the diagram shows, a state cannot transition to every other state; only certain transitions are allowed. Thus, we need a mechanism that permits only valid transitions and triggers them at the right time.

To achieve this, we first implement an abstract class State with the properties common to every state. Whether a state may transition into another is stored in a map, allowedStateTransitions, which maps the state names "idle", "listening" and "busy" to their corresponding State objects. The transition method for moving from one state to another is implemented in the following way.

protected transition(state: State): void {
   if (!this.canTransition(state)) {
       console.error(`Invalid transition to state: ${state.name}`);
       return;
   }

   // Exit the current state, then enter the new one
   this.onExit();
   state.onEnter();
}

private canTransition(state: State): boolean {
   // A transition is valid only if the target state's name is in the map
   return this.allowedStateTransitions.has(state.name);
}

Here we first check whether the transition is valid; we then exit the current state and enter the supplied state. We also define a state machine that initializes the Mirror's default state and defines the valid transitions for each state. Here is the state machine's constructor.

constructor(components: IStateMachineComponents) {
        this.idleState = new IdleState(components);
        this.listeningState = new ListeningState(components);
        this.busyState = new BusyState(components);

        this.idleState.AllowedStateTransitions = new Map<StateName, State>([["listening", this.listeningState]]);
        this.listeningState.AllowedStateTransitions = new Map<StateName, State>([["busy", this.busyState], ["idle", this.idleState]]);
        this.busyState.AllowedStateTransitions = new Map<StateName, State>([["idle", this.idleState]]);

        this.currentState = this.idleState;
        this.currentState.onEnter();
}

Now the question arises: how do we detect when to transition from one state to another? For that, we subscribe to the Snowboy detector observable. We use the Snowboy library for hotword detection. Snowboy reports whether an audio stream is silent, contains sound, or contains the hotword. We bind this information to an observable using the ReactiveX Observable pattern, which gives us a stream of events we can subscribe to. It can be understood from the following code snippet.

detector.on("silence", () => {
   this.subject.next(DETECTOR.Silence);
});

detector.on("sound", () => {});

detector.on("error", (error) => {
   console.error(error);
});

detector.on("hotword", (index, hotword) => {
   this.subject.next(DETECTOR.Hotword);
});
public get Observable(): Observable<DETECTOR> {
   return this.subject.asObservable();
}

Now, in the idle state, we subscribe to the values emitted by the detector's observable so that we know when a hotword is detected and can transition to the listening state. Here is the code snippet for the same.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
       switch (value) {
           case DETECTOR.Hotword:
               this.transition(this.allowedStateTransitions.get("listening"));
               break;
       }
   });

In the listening state, we subscribe to the values emitted by the detector's observable to find out when silence is detected, so that we can stop recording the audio stream for processing and move to the busy state.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
       switch (value) {
           case DETECTOR.Silence:
               record.stop();
               this.transition(this.allowedStateTransitions.get("busy"));
               break;
       }
   });

The task of speaking the audio and displaying results on the screen is done by a renderer. Communication with the renderer happens via a RendererCommunicator object using a notification system. We also bind its events to an observable so that we know when SUSI has finished speaking the result. To transition from the busy state to the idle state, we subscribe to the renderer observable in the following manner.

this.rendererSubscription = this.components.rendererCommunicator.Observable.subscribe((type) => {
   if (type === "finishedSpeaking") {
       this.transition(this.allowedStateTransitions.get("idle"));
   }
});

In this way, we transition between the various states of the SUSI MagicMirror Module in an efficient manner.


Hotword Detection on SUSI MagicMirror with Snowboy

The Magic Mirror in the story "Snow White and the Seven Dwarfs" had one cool feature: the Queen could summon the Mirror just by saying "Mirror" and then ask it questions. The MagicMirror project helps you build a mirror quite close to the one in the fable, but how cool would it be to have the same feature? Hotword detection on the SUSI MagicMirror Module helps us achieve that.

The hotword detection on the SUSI MagicMirror Module was accomplished with the help of the Snowboy hotword detection library. Snowboy is a cross-platform hotword detection library. We use the same library for Android and iOS as well as in the MagicMirror Module (Node.js).

Snowboy can be added to a Javascript/Typescript project with Node Package Manager (npm) by:

$ npm install --save snowboy

For detecting the hotword, we need to record audio continuously from the microphone. To accomplish the task of recording, we have another npm package, node-record-lpcm16. It uses the SoX binary to record audio. First, we need to install SoX.

Linux (Debian-based distributions):

$ sudo apt-get install sox libsox-fmt-all

Then, you can install the node-record-lpcm16 package using npm:

$ npm install node-record-lpcm16

Then, we need to import it in the required file:

import * as record from "node-record-lpcm16";

You may then create a new microphone stream using:

const mic = record.start({
   threshold: 0,
   sampleRate: 16000,
   verbose: true,
});

The mic constant here is a Node.js readable stream, so we can read the incoming data from the microphone and process it.

We can now process this stream using the Detector class of Snowboy. We declare a child class extending the Snowboy Detector to suit our needs.

import { Detector, Models } from "snowboy";

export class HotwordDetector extends Detector {

   constructor(models: Models) {
       super({
           resource: `${process.env.CWD}/resources/common.res`,
           models: models,
           audioGain: 2.0,
       });
       this.setUp();
   }

   // other methods
}

First, we create a Snowboy detector by calling the parent constructor with the resource file common.res and a Snowboy model as arguments. A Snowboy model is a file that tells the detector which hotword to listen for. Currently, the module supports the hotword "Susi", but it can be extended to support other hotwords like "Mirror" too. You can train the hotword "Susi" for your own voice and get the latest model file at https://snowboy.kitt.ai/hotword/7915 . You may then replace the susi.pmdl file in the resources folder with your own susi.pmdl file for a better experience.

Now, we need to attach handlers for the Detector class's callbacks so we know the current state of the detector and can act on it. This is done in the setUp() method.

private setUp(): void {
   this.on("silence", () => {
      // handle silent state
   });

   this.on("sound", () => {
      // handle sound detected state
   });

   this.on("error", (error) => {
      // handle error
   });

   this.on("hotword", (index, hotword) => {
      // hotword detected 
   });
}

If you look into the implementation of Snowboy's Detector class, it extends NodeJS.WritableStream. So, we can pipe our microphone input stream into the detector, and it handles all the states. This can be done using:

mic.pipe(detector as any);

So now, all the input from the microphone will be processed by the Snowboy detector class, and we can know when the user has spoken the word "Susi". We can then start speech recognition and make other changes in the user interface based on the different states.

After this, we can simply say “Susi” followed by our query to ask SUSI on the MagicMirror. A video implementation of the same can be seen here: 


Understanding the working of SUSI Hardware

SUSI on Hardware is the latest addition to the full suite of SUSI apps. Being a hardware project, one might feel it is too complex; however, that is not the case.

The solution is primarily built on a Raspberry Pi, which, however small it may be, is a computer. Most things you expect to work on a normal computer work on a Raspberry Pi as well, with the added advantages of its small size and General Purpose I/O access. But it comes with the caveat of an ARM CPU, which may not support all applications that mainly target x86.
There are a few other development boards, from Intel for example, that use the x86/x64 architecture.

While working on the project, I did not want to tie it to a specific board or set of hardware, so all the components used were chosen to be cross-platform.

Components that make Susi Hardware

SUSI Server

SUSI Server is the most important component of any SUSI project. It handles all the user queries, which can be supplied via a REST API, and returns answers in a convenient format for clients. It also provides AAA (Authentication, Authorization and Accounting) support for managing user accounts across platforms.

Github Repository: https://github.com/fossasia/susi_server
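
To give a feel for the API, here is a minimal sketch of querying the server's chat endpoint from Python with the requests library. The endpoint path and response layout are assumptions based on susi_server at the time of writing; verify them against the repository.

# Sketch of a SUSI Server query -- the endpoint and JSON layout below are
# assumptions based on susi_server at the time of writing.
import requests

def ask_susi(query, base_url="http://api.susi.ai"):
    response = requests.get(base_url + "/susi/chat.json", params={"q": query})
    response.raise_for_status()
    data = response.json()
    # The answer text lives inside the first action of the first answer
    return data["answers"][0]["actions"][0]["expression"]

print(ask_susi("Hello"))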

Susi Python Library

The Susi Python library was developed along with the Susi Hardware project. It can work independently of the hardware project and can be included in any Python project that needs Susi intelligence. It provides easy access to the Susi Server REST API through simple Python methods.
Github Repository: https://github.com/fossasia/susi_api_wrapper
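
As a rough usage sketch (the ask() method name is an assumption for illustration; consult the repository's README for the actual API):

# Hypothetical usage of the susi_python wrapper -- ask() is an assumed
# method name; check the repository's README for the real API.
import susi

reply = susi.ask("What is the capital of Germany?")
print(reply)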

Python Speech Recognition Library

A major advantage of using Python is that in most cases you do not need to reinvent the wheel; someone has already done the work for you. The Python SpeechRecognition library provides speech recognition from a microphone as well as from an audio sample. It supports a number of speech API providers such as the Google Speech API, Wit.ai, IBM Watson Speech-to-Text and many more.
This leaves us free to choose any of the speech recognition providers. For now, we are using the Google Speech API and the IBM Watson Speech API.
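
For illustration, here is a minimal sketch of transcribing one utterance from the microphone with the SpeechRecognition library; the Google recognizer used here is just one of the supported providers.

# Transcribe one utterance from the microphone using the SpeechRecognition
# library with the Google Speech API backend (illustrative provider choice).
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    print("Say something...")
    audio = recognizer.listen(source)

try:
    print("You said:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as error:
    print("API request failed:", error)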


PyPI Package: https://pypi.python.org/pypi/SpeechRecognition/
Github Repository: https://github.com/Uberi/speech_recognition

PocketSphinx for Hotword Detection

CMU PocketSphinx is an open-source offline speech recognition library. We have used PocketSphinx to enable hotword detection in Susi Hardware so that you can interact with Susi hands-free.
More information on how it works can be found in my other blog post.

Github Repository: https://github.com/cmusphinx/pocketsphinx

Flite Speech Synthesis System

CMU Flite (Festival-Lite) is a small-sized, fast and open source speech synthesis engine developed by Carnegie Mellon University.
More information on its integration and usage in Susi can be found in my other blog post.

Project Website: http://www.festvox.org/flite/

The working of all these components together can be explained by the diagram below.


Hotword Detection with Pocketsphinx for SUSI.AI

Susi has many apps across all the major platforms. The latest addition to them is Susi Hardware, which allows you to set up Susi on a hardware device like a Raspberry Pi.

Susi Hardware could already interact at the push of a button, but it is always cooler and more convenient to call your assistant at any time rather than using a button.

Hotword Detection helps achieve that. Hotword detection involves running a process in the background that continuously listens for voice. On noticing an utterance, we check whether it contains the desired word. Hotword detection and its integration with SUSI.AI can be explained using the diagram below:

What is PocketSphinx?

PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop.

PocketSphinx is free and open source software. It has various applications, but we utilize its power to detect a keyword (the hotword) in a spoken phrase.

Official Github Repository: https://github.com/cmusphinx/pocketsphinx

Installing PocketSphinx

We shall be using PocketSphinx with Python. The latest version of it can be installed by:

pip install pocketsphinx

If you are using a Raspberry Pi or another ARM-based board, or Python 3.6, the above step will build and install PocketSphinx from source, since the author does not provide a prebuilt Python wheel for these. For that, you may additionally need to install swig:

sudo apt install swig

How to detect Hotword with PocketSphinx?

PocketSphinx can be used from various languages. For Susi Hardware, I am using Python 3.

Steps:

      • Import os, PyAudio and PocketSphinx (os is used later to locate the model directory bundled with the pip package)
        import os

        import pocketsphinx
        import pyaudio

      • Create a decoder with a certain model; we are using the en-us model and the default English-US dictionary (cmudict-en-us.dict, as shipped with the pip package). Specify a keyphrase for your application; for Susi AI, we are using "susi" as the hotword. Finally, build the decoder from this configuration and start an utterance.
        pocketsphinx_dir = os.path.dirname(pocketsphinx.__file__)
        model_dir = os.path.join(pocketsphinx_dir, 'model')

        config = pocketsphinx.Decoder.default_config()
        config.set_string('-hmm', os.path.join(model_dir, 'en-us'))
        config.set_string('-keyphrase', 'susi')
        config.set_string('-dict', os.path.join(model_dir, 'cmudict-en-us.dict'))
        config.set_float('-kws_threshold', 1e-20)  # a typical starting value; tune it

        decoder = pocketsphinx.Decoder(config)
        decoder.start_utt()

      • Start a PyAudio stream from the microphone input.
        p = pyaudio.PyAudio()

        stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=20480)
        stream.start_stream()

      • In a forever-running loop, read from the stream in chunks of 1024 frames and process the buffer if it is not empty.
        buf = stream.read(1024)
        if buf:
            decoder.process_raw(buf, False, False)
        else:
            break

      • Check for the hotword and start speech recognition if the hotword is detected. After returning from the method, restart detection.

        if decoder.hyp() is not None:
            print("Hotword Detected")
            decoder.end_utt()
            start_speech_recognition()
            decoder.start_utt()

      • Run the app. If detection doesn't seem to work well, adjust kws_threshold in step 2 to give optimal results. A consolidated sketch of all these steps follows below.
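
Putting the steps together, here is a minimal, self-contained sketch of the detection loop. It is an illustrative sketch rather than the exact module code: start_speech_recognition() stands in for your own handler, and kws_threshold is a value you will want to tune.

import os

import pocketsphinx
import pyaudio

def start_speech_recognition():
    # Placeholder -- plug in the actual speech recognition phase here
    print("...listening for a query...")

# Locate the acoustic model and dictionary bundled with the pip package
pocketsphinx_dir = os.path.dirname(pocketsphinx.__file__)
model_dir = os.path.join(pocketsphinx_dir, 'model')

config = pocketsphinx.Decoder.default_config()
config.set_string('-hmm', os.path.join(model_dir, 'en-us'))
config.set_string('-keyphrase', 'susi')
config.set_string('-dict', os.path.join(model_dir, 'cmudict-en-us.dict'))
config.set_float('-kws_threshold', 1e-20)  # tune for your microphone/environment

decoder = pocketsphinx.Decoder(config)
decoder.start_utt()

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=20480)
stream.start_stream()

while True:
    buf = stream.read(1024)
    if not buf:
        break
    decoder.process_raw(buf, False, False)
    if decoder.hyp() is not None:  # keyphrase spotted in the audio
        print("Hotword Detected")
        decoder.end_utt()
        start_speech_recognition()
        decoder.start_utt()  # resume hotword detection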

In this way, hotword detection can be added to your Python project. You may also develop some cool hacks with our AI-powered assistant Susi using voice control.
Check the repository for more info: https://github.com/fossasia/susi_hardware


Giving a Voice to Susi

Susi AI already has various apps and is available as a chatbot on various messaging platforms. We are going a step further to make an SDK available for Susi that can be integrated into any hardware device (say speakers, toys, your bicycle, etc.; the possibilities are endless).

One of the problems I encountered while making a prototype for this was selecting an appropriate Text-to-Speech (TTS) engine.

It was a challenge, since on platforms like Android and iOS you can easily use the TTS engines bundled with the platform via a platform-specific API, and these are well optimized and give good performance. The same was difficult on a hardware device that runs only Linux, with no TTS provided by default.

Thus, I explored some possibilities:

eSpeak TTS: eSpeak TTS (http://espeak.sourceforge.net/) was the first option considered for the task.
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.

The major advantage of eSpeak is its small size (2 MB) and small memory footprint, which is advantageous on low-memory hardware like the Orange Pi Zero or Raspberry Pi Zero.


Setting up eSpeak was easy, but along with its advantages there were some drawbacks too:

  • The voice synthesis was quite robotic.
  • Very few voices were available.

Festival TTS: Festival offers a general framework for building speech synthesis systems, as well as examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, from Java, and via an Emacs interface.


Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

Installing Festival:

On Arch Linux, it was pretty straightforward:

sudo pacman -S festival

There is a full wiki page dedicated to it (https://wiki.archlinux.org/index.php/Festival).

Testing Festival

Festival has an interactive interpreter to test it out, which can be invoked by running the festival command.
You may test a TTS output using:

festival> (SayText "Hi!! I am Susi")

But the default voice in Festival is still robotic and male. You don't want your personal assistant to scare you when you speak to her.

Thus, I searched for the best female voices available for Festival.

After looking at a discussion thread, https://ubuntuforums.org/showthread.php?t=751169 ,

I found that CMU-Arctic and HTS are some of the best voice sets for Festival.

In Arch Linux, additional voice packs are supplied in two additional packages:

festival-us and festival-english

Installation is straightforward:

sudo pacman -S festival-us festival-english

Now, in the Festival REPL, we can test out our new voices.

To see all available voices:

festival> (voice.list)
(rab_diphone kal_diphone cmu_us_rms_cg cmu_us_awb_cg cmu_us_slt_cg)

Testing out a voice

festival> (voice_cmu_us_awb_cg)
cmu_us_awb_cg
festival> (SayText "Hi!! I am Susi")

After testing out all the voices with many different phrases, cmu_us_slt_cg felt like the most appropriate voice.

Setting Voice as Default

The voice may be set as default by adding the following line to .festivalrc:

 (set! voice_default voice_cmu_us_slt_cg)

Now festival will be invoked with this voice as default.

You may make Festival read a file aloud using:

$ festival --tts <filename>

Calling Festival from Python

This was accomplished by first writing the response to a file and then calling Festival in a subprocess to speak it:

import subprocess

def speak(text):
    filename = '.response'
    with open(filename, 'w') as file:
        file.write(text)
    # Call festival tts to speak the response from Susi
    subprocess.call('festival --tts ' + filename, shell=True)

 

So, Festival works pretty well on a moderately powerful machine, but while trying it on a Raspberry Pi with a custom voice, there was a noticeable delay in synthesis, so I searched for more alternatives.

Flite TTS: CMU Flite (festival-lite) is a small, fast run-time open source text-to-speech synthesis engine developed at CMU, primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text-to-speech engine to Festival, for voices built using the FestVox suite of voice-building tools.
Flite gives a noticeable speed improvement: it produces Festival-like output in about a tenth of the time taken by Festival.

Installing Flite TTS

On Raspberry Pi (Raspbian), the version in the Raspbian Jessie repository is Flite 1.4, which does not support using an external voice file, so we need to compile and install from source:

$ wget http://www.festvox.org/flite/packed/latest/flite-2.0.0-release.tar.bz2
$ tar xf flite-2.0.0-release.tar.bz2
$ cd flite-2.0.0-release/
$ ./configure
$ make
$ sudo make install

Now, you may download the .flitevox file for the voice you wish to use from:

http://www.festvox.org/flite/packed/latest/voices/

Flite can then be invoked using:

$ flite -voice file://<flitevox_file_path> -f <filepath-to-read>

You may save the output audio stream to a file using:

$ flite -voice file://<flitevox_file_path> -f <filepath-to-read> -o output.wav

Now, you can play back the audio. All of this can be invoked from Python using:

import os

def speak_flite_tts(text):
    filename = '.response'
    with open(filename, 'w') as file:
        file.write(text)
    # Call flite tts to synthesize the response from Susi into a wav file
    flite_speech_file = 'cmu_us_slt.flitevox'
    command = 'flite -voice file://{0} -f {1} -o output.wav'.format(flite_speech_file, filename)
    print(command)
    os.system(command)
    # Play back the synthesized audio
    os.system('aplay output.wav')

 

In this way, we added offline text-to-speech support to the Susi Hardware SDK.


Deploying Susi Server on Google Cloud with Kubernetes

Susi (acronym for Scientific User Support Intelligence) is an advanced AI made by people at FOSSASIA. It is an AI made by the people and for the people.
Susi is an Open Source Project under LGPL Licence.

SUSI.AI already has many skills, and anyone can add new skills through simple console rules.

If you want to participate in the development of the SUSI server you can start by learning to deploy it on a cloud system like Google Cloud.

This way whenever you make a change to Susi Server, you can test it out on various Susi Apps instantly.

Google Cloud with Kubernetes provides this ability. Let's dig deeper into what Google Cloud Platform and Kubernetes are.

What is Google Cloud Platform?

Google Cloud Platform lets you build and host applications and websites, store data, and analyze data on Google's scalable infrastructure.
Google Cloud Platform (at the time of writing this article) also provides free credits worth $300, valid for one year, for testing out the platform and your applications.

What is Kubernetes?

Kubernetes is an open-source system for automated deployment, management and scaling of containerized applications. It makes it easy to roll out updates to your application with simple commands from your development machine and to scale horizontally by adding more nodes as demand increases.

Deploying Susi Server on Kubernetes

Deploying Susi Server on Kubernetes is a fairly easy task. Follow these steps to get it running.

Create a Google Cloud Account

Sign up for a Google Cloud account (https://cloud.google.com/free-trial/) and get $300 in credits for initial use.

Create a New Project

After a successful signup, create a new project on the Google Cloud Console.
Let's name it Susi-Kubernetes.

You will be provided a Project ID. Remember it for further reference.

Install Google Cloud SDK and kubectl

Go to https://cloud.google.com/sdk/ and follow the instructions to set up the Google Cloud SDK on your OS.

After installing the Google Cloud SDK, run:

gcloud components install kubectl

This will install kubectl for interacting with Kubernetes.

Login and setup project

1. Log in to your Google Cloud account using:

$ gcloud auth login

2. List all the projects using:

$ gcloud config list project
[core]
project = <PROJECT_ID>

3. Select your project:

$ gcloud config set project <PROJECT_ID>

4. Install JDK 8 for the susi_server setup and set it as the default JDK.

5. Clone your fork of the Susi Server repository:

$ git clone https://github.com/<your_username>/susi_server.git
$ cd susi_server/

6. Build the project and run Susi Server locally:

$ ./gradlew build
$ bin/start.sh

Susi Server should now be running, with the web interface accessible at http://localhost:4000

Install Docker and build Docker image for Susi

  1. Install Docker.
    Debian and derivatives: sudo apt install docker
    Arch Linux: sudo pacman -S docker
  2. Build the Docker image for Susi:
    $ docker build -t gcr.io/<Project_id>/susi:v1 .
  3. Push the image to the Google Container Registry, private to your project:
    $ gcloud docker -- push gcr.io/<Project_id>/susi:v1

Create Cluster and Deploy your Susi Server there

  1. Create a cluster. You may specify a different zone, number of nodes and machine type depending upon your requirements:
    $ gcloud container clusters create <Cluster-Name> --num-nodes 2 --machine-type n1-standard-1 --zone us-central1-c
  2. Run your deployment. You may specify any name for the deployment:
    $ kubectl run <deployment_name> --image=gcr.io/<Project_id>/susi:v1 --port=80
    $ kubectl get deployments
    $ kubectl expose deployment <deployment_name> --type=LoadBalancer
  3. Check your deployment and get the public IP for access:
    $ kubectl get services
    NAME         CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
    kubernetes   10.3.240.1     <none>          443/TCP        1d
    susi         10.3.241.145   <PUBLIC_IP>     80:31155/TCP   1d
  4. Go to the provided public IP to check whether Susi Server is running.

Congratulations, you have successfully set up Susi Server on Google Cloud with Kubernetes.

Updating the deployment

The next step is to update the deployment when you wish to roll out changes. To do so:

Build Docker Image and Push it to Google Container Registry

$ docker build -t gcr.io/<Project_Id>/susi:v2 .
$ gcloud docker -- push gcr.io/<Project_Id>/susi:v2

Update Deployment Image with Kubernetes

$ kubectl set image deployment/<Deployment_Name> \
  <Deployment_Name>=gcr.io/<Project_id>/susi:v2
deployment "<Deployment_Name>" image updated

Go to the public IP to see the changes.

That's it. Now you have a fully running Susi Server on your own Google Cloud cluster, managed with Kubernetes.
