Setting up SUSI Desktop Locally for Development and Using Webview Tag and Adding Event Listeners

SUSI Desktop is a cross-platform desktop application based on Electron. It presently uses chat.susi.ai as a submodule and allows users to interact with SUSI right from their desktop.

Any Electron app essentially comprises the following components:

    • Main Process (manages windows and other interactions with the operating system)
    • Renderer Process (manages the view inside the BrowserWindow)
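The two processes talk to each other purely by message passing. As a rough illustration of that pattern, runnable outside Electron, here is a hypothetical sketch in which a single Node EventEmitter stands in for Electron's real ipcMain/ipcRenderer/webContents bridge (which only exists inside a running Electron app):

```typescript
import { EventEmitter } from "events";

// Illustration only: one EventEmitter stands in for Electron's
// main<->renderer IPC bridge so the pattern can run anywhere.
const bridge = new EventEmitter();

// "Renderer process": listens for messages about the window.
const rendererLog: string[] = [];
bridge.on("focus", () => rendererLog.push("renderer saw focus"));

// "Main process": forwards an OS-level window event to the renderer,
// much like win.webContents.send("focus") does in the snippets below.
function onWindowFocus(): void {
  bridge.emit("focus");
}

onWindowFocus();
```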

Steps to set up the development environment

      • Clone the repo locally.
$ git clone https://github.com/fossasia/susi_desktop.git
$ cd susi_desktop
      • Install the dependencies listed in the package.json file.
$ npm install
      • Start the app using the start script.
$ npm start

Structure of the project

The project was restructured to keep the working environments of the Main and Renderer processes separate, which makes the codebase easier to read and debug. This is how the current project is structured.

The root directory of the project contains a directory ‘app’, which holds our Electron application. Alongside it is a package.json, which contains information about the project and the modules required for building it, as well as other GitHub helper files.

Inside the app directory:

  • Main – Files for managing the main process of the app
  • Renderer – Files for managing the renderer process of the app
  • Resources – Icons for the app and the tray/media files
Webview Tag

    The webview tag displays external web content in an isolated frame and process. It is used to load chat.susi.ai inside the BrowserWindow:

    <webview src="https://chat.susi.ai/"></webview>
    

    Adding event listeners to the app

    Various electron APIs were used to give a native feel to the application.

  • Send focus to the window's WebContents when the app window gains focus.
  • win.on('focus', () => {
    	win.webContents.send('focus');
    });
    
  • Display the window only once the DOM has completely loaded.
  • const page = mainWindow.webContents;
    ...
    page.on('dom-ready', () => {
    	mainWindow.show();
    });
    
  • Display the window on the ‘ready-to-show’ event.
  • win.once('ready-to-show', () => {
    	win.show();
    });
    

    Resources

    1. A quick article by Cameron Nokes on Medium to understand Electron’s main and renderer processes.
    2. Official documentation about the webview tag at https://electron.atom.io/docs/api/webview-tag/
    3. Read more about electron processes at https://electronjs.org/docs/glossary#process
    4. SUSI Desktop repository at https://github.com/fossasia/susi_desktop.

    Enhancing SUSI Desktop to Display a Loading Animation and Auto-Hide Menu Bar by Default

    SUSI Desktop is a cross-platform desktop application based on Electron which presently uses chat.susi.ai as a submodule and allows users to interact with SUSI right from their desktop. The benefit of using chat.susi.ai as a submodule is that the app inherits all the features the web app offers and serves them in a nicely built native application.

    Display a loading animation during DOM load.

    Electron apps should give a native feel rather than feeling like they are just rendering some DOM, so it would be great to display a loading animation while the web content is actually loading; the gif below depicts how I implemented that.
    Electron provides a nice, easy-to-use API for handling BrowserWindow and WebContents events. I read through the official docs and came up with a simple solution, as depicted in the snippet below.

    onload = function () {
    	const webview = document.querySelector('webview');
    	const loading = document.querySelector('#loading');
    
    	function onStopLoad() {
    		loading.classList.add('hide');
    	}
    
    	function onStartLoad() {
    		loading.classList.remove('hide');
    	}
    
    	webview.addEventListener('did-stop-loading', onStopLoad);
    	webview.addEventListener('did-start-loading', onStartLoad);
    };
    

    Hiding the menu bar by default

    Menu bars are useful but take up space in the main window, so I hid the menu bar by default; users can toggle its display by pressing the Alt key at any time. To achieve this, I used the autoHideMenuBar property of the BrowserWindow class while creating the window object.

    const win = new BrowserWindow({
    	show: false,
    	autoHideMenuBar: true
    });
    

    Resources

    1. More information about BrowserWindow class in the official documentation at electron.atom.io.
    2. Follow a quick tutorial to kickstart creating apps with electron at https://www.youtube.com/watch?v=jKzBJAowmGg.
    3. SUSI Desktop repository at https://github.com/fossasia/susi_desktop.

    Sending Data between components of SUSI MagicMirror Module

    SUSI MagicMirror module is a module to add the SUSI assistant right on your MagicMirror. The software for MagicMirror consists of an Electron app to which modules can be added easily. Since there are many modules, some functionality requires interaction between modules by transferring information. MagicMirror also provides a node_helper script that lets a module perform background tasks; therefore, a mechanism to transfer information from node_helper to the various components of a module is also needed.

    MagicMirror provides an inbuilt module notification system that can be used to send notifications across modules, and a socket notification system to send information between node_helper and the various components of the system.

    Our codebase for SUSI MagicMirror is divided mainly into two parts: a Main module that handles the whole process of hotword detection, speech recognition, calling the SUSI API and saving audio after Text to Speech, and a Renderer module that manages the display of content on the mirror screen and plays back the file obtained by speech synthesis. Plainly put, the Main module handles the backend logic of the application and the Renderer handles the frontend. The Main and Renderer modules work on different layers of the application, so we need a mechanism to facilitate communication between them. A schematic of the flow that needs to be maintained is highlighted below:

    As you can see in the above diagram, we need to transfer a lot of information between the components. We display animation and text based on the current state of recognition in the Renderer module, so we need to transfer this information frequently. This task is accomplished using the inbuilt socket notification system in MagicMirror. For every event, such as when the system enters the listening, busy or recognized-speech state, we need to pass a message to the renderer. To achieve this, we made a rendererSend function to send notifications to the renderer.

    const rendererSend =  (event: NotificationType , payload: any) => {
       this.sendSocketNotification(event, payload);
    }
    

    This function takes an event and a payload as arguments. Event tells which event occurred and payload is any data that we wish to send. This method in turn calls the method provided by MagicMirror module to send socket notifications within the module.

    When certain events occur like when system enters busy state or listening state, we trigger the rendererSend call to send a socket notification to the module. The rendererSend method is supplied in the State Machine Components available to every state. The task of sending notifications can be done using the code snippet as follows:

    // system enters busy state
    this.components.rendererSend("busy", {});
    
    // send speech recognition hypothesis text to renderer
    this.components.rendererSend("recognized", {text: recognizedText});
    
    // send susi api output json to renderer to display interactive results while Speech Output is performed
    this.components.rendererSend("speak", {data: susiResponse});
    

    The socket notification sent via the above method is received in the SUSI module via a callback called socketNotificationReceived. We need to provide an implementation of this callback while registering the module with MagicMirror. So, we register the MMM-SUSI-AI module with a definition for the socketNotificationReceived method.

    Module.register("MMM-SUSI-AI", {
       // other function definitions
       // ...

       // define socketNotificationReceived function
       socketNotificationReceived: function (notification, payload) {
           susiMirror.receivedNotification(notification, payload);
       },
       // ...
    });
    

    In this way, we forward every notification received to the susiMirror object in the renderer module by calling its receivedNotification method.

    We can now receive all the notifications in the SusiMirror and update UI. To handle notifications, we define receivedNotification method as follows:

    public receivedNotification(type: NotificationType, payload: any): void {
       this.visualizer.setMode(type);
       switch (type) {
           case "idle":
               // handle idle state
               break;
           case "listening":
               // handle listening state
               break;
           case "busy":
               // handle busy state
               break;
           case "recognized":
               // handle recognized state. This notification also contains a payload with the hypothesis text
               break;
           case "speak":
               // handle speaking state. We need to play back the audio file and display text on screen for the SUSI output. The notification payload contains the SUSI response
               break;
       }
    }
    

    In this way, we utilize the Socket Notification System provided by the MagicMirror Electron Application to send data across the components of Magic Mirror module for SUSI AI.
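    To see the whole round trip in isolation, here is a hypothetical, self-contained sketch: the socket transport is replaced by a direct callback, while NotificationType and the handler names mirror the snippets above.

```typescript
// NotificationType mirrors the event names used in the snippets above.
type NotificationType = "idle" | "listening" | "busy" | "recognized" | "speak";

// Everything the "renderer" receives, kept for inspection.
const rendererLog: Array<{ type: NotificationType; payload: any }> = [];

// Renderer side: the analogue of susiMirror.receivedNotification.
function receivedNotification(type: NotificationType, payload: any): void {
  rendererLog.push({ type, payload });
}

// Main side: rendererSend forwards an event and payload, the way
// sendSocketNotification does inside the real module.
const rendererSend = (event: NotificationType, payload: any) =>
  receivedNotification(event, payload);

// The main module emits events as the system changes state.
rendererSend("busy", {});
rendererSend("recognized", { text: "hello susi" });
```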

    Resources

    Managing States in SUSI MagicMirror Module

    SUSI MagicMirror Module is a module for the MagicMirror project with which you can use SUSI directly on your MagicMirror. While developing the module, a problem I faced was managing the flow between the various stages of processing the user's voice input and displaying SUSI's output. This was solved by introducing state management between the various states of the SUSI MagicMirror Module, namely:

    • Idle State: When SUSI MagicMirror Module is actively listening for a hotword.
    • Listening State: In this state, the user’s speech input from the microphone is recorded to a file.
    • Busy State: The user has finished speaking or timed out. Now, we need to transcribe the audio spoken by the user, send the response to SUSI server and speak out the SUSI response.

    The flow between these states can be explained by the following diagram:

    As is clear from the above diagram, a state cannot transition to every other state; only some transitions are allowed. Thus, we need a mechanism that guarantees only allowed transitions and ensures they trigger at the right time.

    For achieving this, we first implement an abstract class State with the common properties of a state. We store the information about whether a state can transition into some other state in a map allowedStateTransitions, which maps the state names “idle”, “listening” and “busy” to their corresponding states. The transition method to move from one state to another is implemented in the following way.

    protected transition(state: State): void {
       if (!this.canTransition(state)) {
           console.error(`Invalid transition to state: ${state.name}`);
           return;
       }
    
       this.onExit();
       state.onEnter();
    }
    
    private canTransition(state: State): boolean {
       return this.allowedStateTransitions.has(state.name);
    }
    

    Here we first check whether a transition is valid; then we exit the current state and enter the supplied state. We also define a state machine that initializes the default state of the mirror and defines the valid transitions for each state. Here is the constructor of the state machine.

    constructor(components: IStateMachineComponents) {
            this.idleState = new IdleState(components);
            this.listeningState = new ListeningState(components);
            this.busyState = new BusyState(components);
    
            this.idleState.AllowedStateTransitions = new Map<StateName, State>([["listening", this.listeningState]]);
            this.listeningState.AllowedStateTransitions = new Map<StateName, State>([["busy", this.busyState], ["idle", this.idleState]]);
            this.busyState.AllowedStateTransitions = new Map<StateName, State>([["idle", this.idleState]]);
    
            this.currentState = this.idleState;
            this.currentState.onEnter();
    }
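    For context, the abstract State class that the transition() snippet and the constructor above build on might look roughly like the following hypothetical sketch (with a minimal concrete state added here just to exercise the transition guard; the real class carries more, such as the state machine components):

```typescript
type StateName = "idle" | "listening" | "busy";

// A rough sketch of the State base class; details are assumptions.
abstract class State {
  public allowedStateTransitions: Map<StateName, State> = new Map();

  constructor(public readonly name: StateName) {}

  public onEnter(): void { /* subscribe to the events this state cares about */ }
  public onExit(): void { /* tear the subscriptions down */ }

  protected transition(state: State): void {
    if (!this.canTransition(state)) {
      console.error(`Invalid transition to state: ${state.name}`);
      return;
    }
    this.onExit();
    state.onEnter();
  }

  private canTransition(state: State): boolean {
    return this.allowedStateTransitions.has(state.name);
  }
}

// Minimal concrete state that records every state it enters.
const entered: StateName[] = [];
class TestState extends State {
  public onEnter(): void { entered.push(this.name); }
  public tryGo(target: State): void { this.transition(target); }
}

const idle = new TestState("idle");
const listening = new TestState("listening");
idle.allowedStateTransitions = new Map<StateName, State>([["listening", listening]]);

idle.tryGo(listening);  // allowed: listening.onEnter() runs
listening.tryGo(idle);  // rejected: "idle" is not in listening's map
```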
    

    Now the question arises: how do we detect when to transition from one state to another? For that we subscribe to the Snowboy detector observable. We use the Snowboy library for hotword detection. Snowboy detects whether an audio stream is silent, contains some sound, or contains the hotword. We bind all this information to an observable using the ReactiveX Observable pattern, which gives us a stream of events we can subscribe to. It can be understood from the following code snippet.

    detector.on("silence", () => {
       this.subject.next(DETECTOR.Silence);
    });
    
    detector.on("sound", () => {});
    
    detector.on("error", (error) => {
       console.error(error);
    });
    
    detector.on("hotword", (index, hotword) => {
       this.subject.next(DETECTOR.Hotword);
    });
    
    public get Observable(): Observable<DETECTOR> {
       return this.subject.asObservable();
    }
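    For readers unfamiliar with the pattern, here is a tiny hand-rolled subject, a deliberately simplified stand-in for the RxJS Subject used above (supporting only next and subscribe), showing how the detector callbacks become a stream of events:

```typescript
// Detector outcomes, as in the snippet above.
enum DETECTOR { Silence, Sound, Hotword }

// A minimal subject: every subscriber is called for each pushed value.
class SimpleSubject<T> {
  private listeners: Array<(value: T) => void> = [];

  public next(value: T): void {
    this.listeners.forEach((listener) => listener(value));
  }

  public subscribe(listener: (value: T) => void): void {
    this.listeners.push(listener);
  }
}

const subject = new SimpleSubject<DETECTOR>();

// A state (e.g. the idle state) subscribes to the detector events.
const seen: DETECTOR[] = [];
subject.subscribe((value) => seen.push(value));

// The detector callbacks push events, as in detector.on("hotword", ...).
subject.next(DETECTOR.Hotword);
subject.next(DETECTOR.Silence);
```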
    

    Now, in the idle state, we subscribe to the values emitted by the observable of the detector to know when a hotword is detected to transition to the listening state. Here is the code snippet for the same.

    this.detectorSubscription = this.components.detector.Observable.subscribe(
       (value) => {
       switch (value) {
           case DETECTOR.Hotword:
               this.transition(this.allowedStateTransitions.get("listening"));
               break;
       }
    });
    

    In the listening state, we subscribe to the states emitted by the detector observable to find when silence is detected so that we can stop recording the audio stream for processing and move to busy state.

    this.detectorSubscription = this.components.detector.Observable.subscribe(
       (value) => {
       switch (value) {
           case DETECTOR.Silence:
               record.stop();
               this.transition(this.allowedStateTransitions.get("busy"));
               break;
       }
    });
    

    The task of speaking the audio and displaying results on the screen is done by the renderer. Communication with the renderer happens via a RendererCommunicator object using a notification system. We also bind its events to an observable so that we know when SUSI has finished speaking the result. To transition from the busy state to the idle state, we subscribe to the renderer observable in the following manner.

    this.rendererSubscription = this.components.rendererCommunicator.Observable.subscribe((type) => {
       if (type === "finishedSpeaking") {
           this.transition(this.allowedStateTransitions.get("idle"));
       }
    });
    

    In this way, we transition between various states of MagicMirror Module for SUSI in an efficient manner.

    Resources

    Hotword Detection on SUSI MagicMirror with Snowboy

    The Magic Mirror in the story “Snow White and the Seven Dwarfs” had one cool feature: the Queen could call the Mirror just by saying “Mirror” and then ask it questions. The MagicMirror project helps you develop a mirror quite close to the one in the fable, but how cool would it be to have the same feature? Hotword detection on the SUSI MagicMirror Module helps us achieve that.

    The hotword detection on the SUSI MagicMirror Module was accomplished with the help of the Snowboy hotword detection library. Snowboy is a cross-platform hotword detection library; we use the same library for Android, iOS, and the MagicMirror Module (Node.js).

    Snowboy can be added to a JavaScript/TypeScript project with the Node Package Manager (npm):

    $ npm install --save snowboy
    

    For detecting the hotword, we need to record audio continuously from the microphone. To accomplish the task of recording, we have another npm package, node-record-lpcm16. It uses the SoX binary to record audio, so first we need to install SoX.

    Linux (Debian-based distributions)

    $ sudo apt-get install sox libsox-fmt-all
    

    Then, you can install the node-record-lpcm16 package using npm:

    $ npm install node-record-lpcm16
    

    Then, we import it in the required file:

    import * as record from "node-record-lpcm16";
    

    You may then create a new microphone stream using:

    const mic = record.start({
       threshold: 0,
       sampleRate: 16000,
       verbose: true,
    });
    

    The mic constant here is a NodeJS Readable Stream. So, we can read the incoming data from the Microphone and process it.

    We can now process this stream using the Detector class of Snowboy. We declare a child class extending the Snowboy Detector to suit our needs.

    import { Detector, Models } from "snowboy";

    export class HotwordDetector extends Detector {

       constructor(models: Models) {
           super({
               resource: `${process.env.CWD}/resources/common.res`,
               models: models,
               audioGain: 2.0,
           });
           this.setUp();
       }

       // other methods
    }

    First, we create a Snowboy Detector by calling the parent constructor with the resource file common.res and a Snowboy model as arguments. A Snowboy model is a file that tells the detector which hotword to listen for. Currently, the module supports the hotword “Susi”, but it can be extended to support other hotwords like “Mirror” too. You can train the SUSI hotword with your own voice and get the latest model file at https://snowboy.kitt.ai/hotword/7915. You may then replace the susi.pmdl file in the resources folder with your own susi.pmdl file for a better experience.

    Now, we need to handle the callback events of the Detector class so that we know the current state of the detector and can act on it. This is done in the setUp() method.

    private setUp(): void {
       this.on("silence", () => {
          // handle silent state
       });
    
       this.on("sound", () => {
          // handle sound detected state
       });
    
       this.on("error", (error) => {
          // handle error
       });
    
       this.on("hotword", (index, hotword) => {
          // hotword detected 
       });
    }
    

    If you look into the implementation of Snowboy's Detector class, it extends NodeJS.WritableStream. So, we can pipe our microphone input stream into the Detector class, and it will handle all the states. This can be done using:

    mic.pipe(detector as any);
    

    Now all the input from the microphone is processed by the Snowboy Detector class, and we know when the user has spoken the word “Susi”. We can then start speech recognition and make other changes in the user interface based on the different states.
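    Because the detector is simply a writable stream, the piping can be exercised with plain Node streams. In this hypothetical sketch, a byte-counting Writable stands in for the Snowboy detector and a buffer of zeroed samples stands in for microphone input:

```typescript
import { PassThrough, Writable } from "stream";

// Stand-in for the microphone stream returned by record.start(...).
const mic = new PassThrough();

// Stand-in for the Snowboy detector: a Writable that counts the audio
// bytes it receives (a real detector would scan them for the hotword).
let bytesSeen = 0;
const detector = new Writable({
  write(chunk: Buffer, _encoding, callback) {
    bytesSeen += chunk.length;
    callback();
  },
});

// Same wiring as mic.pipe(detector as any) in the module.
mic.pipe(detector);

// Push one chunk straight into the detector for illustration:
// 1024 bytes of 16 kHz, 16-bit mono silence.
detector.write(Buffer.alloc(1024));
```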

    After this, we can simply say “Susi” followed by our query to ask SUSI on the MagicMirror. A video implementation of the same can be seen here: 

    Resources: