Implementing Version Control System for SUSI Skill CMS

SUSI Skill CMS now has a version control system where users can browse through all the previous revisions of a skill and roll back to a selected version. Users can modify existing skills and push the changes. So a skill could have been edited many times by the same or different users and so have many revisions. The version control functionalities help users to :

  • Browse through all the revisions of a selected skill
  • View the content of a selected revision
  • Compare any two selected revisions highlighting the changes
  • Option to edit and rollback to a selected revision.

Let us visit SUSI Skill CMS and try it out.

  1. Select a skill
  2. Click on versions button
  3. A table populated with previous revisions is displayed

  1. Clicking on a single revision opens the content of that version
  2. Selecting 2 versions and clicking on compare selected versions loads the content of the 2 selected revisions and shows the differences between the two.
  3. Clicking on Undo loads the selected revision and the latest version of that skill, highlighting the differences and also an editor loaded with the code of the selected revision to make changes and save to roll back.

How was this implemented?

Firstly, to get the previous revisions of a selected skill, we need the skills meta data including model, group, language and skill name which is used to make an ajax call to the server using the endpoint :

We create a new component SkillVersion and pass the skill meta data in the pathname while accessing that component. The path where SkillVersion component is loaded is /:category/:skill/versions/:lang . We parse this data from the path and set our state with skill meta data. In componentDidMount we use this data to make the ajax call to the server to get all previous version data and update our state. A sample response from getSkillHistory endpoint looks like :

  "session": {
    "identity": {
      "type": "",
      "name": "",
  "commits": [
      "commitRev": "",
      "author_mail": "AUTHOR_MAIL_ID",
      "author": "AUTOR_NAME",
      "commitID": "COMMIT_ID",
      "commit_message": "COMMIT_MESSAGE",
     "commitName": "COMMIT_NAME",
     "commitDate": "COMMIT_DATE"
  "accepted": TRUE/FALSE

We now populate the table with the obtained revision history. We used Material UI Table for tabulating the data. The first 2 columns of the table have radio buttons to select any 2 revisions. The left side radio buttons are for selecting the older versions and the right side radio buttons to select the more recent versions. We keep track of the selected versions through onCheck function of the radio buttons and updating state accordingly.

if(side === 'right'){
  if(!(index >= currLeft)){
    rightChecks[index] = true;
    currRight = index;
else if(side === 'left'){
  if(!(index <= currRight)){
    leftChecks[index] = true;
    currLeft = index;
  currLeftChecked: currLeft,
  currRightChecked: currRight,
  leftChecks: leftChecks,
  rightChecks: rightChecks,

Once 2 versions are selected and we click on compare selected versions button, we get the currently selected versions stored from getCheckedCommits function and we are redirected to /:category/:skill/compare/:lang/:oldid/:recentid where we pass the selected 2 revisions commitIDs in the URL.

{(this.state.commitsChecked.length === 2) &&
<Link to={{
  pathname: '/'+this.state.skillMeta.groupValue+
    label='Compare Selected Versions'

SkillHistory Component is now loaded and the 2 selected revisions commitIDs are parsed from the URL pathname. Once we have the commitIDs we make ajax calls to the server to get the code for that particular commit. The skill meta data is also parsed from the URL path which is required to make the server call to getFileAtCommitID.

We make the ajax calls in componentDidMount and update the state with the received data. A sample response from getFileAtCommitID looks like :

  "accepted": TRUE/FALSE,
  "file": "CONTENT",
  "session": {
    "identity": {
       "type": "",
       "name": "",

We populate the code of each revision in an editor. We used react-ace as our editor component where we use the value prop to populate the content and display it in read-only mode.

  height= '400px'
  editorProps={{$blockScrolling: true}}

We then show the differences between the 2 selected versions content. To compare and highlight the differences, we used react-diff package which takes in the content of both the commits as inputA and inputB props and we compare character by character using the type chars prop. Here input A is compared with input B. The component compares and returns the highlighted element which we display in a scrollable div preventing overflows.

{/* latest code should be inputB */}

Clicking on Undo then redirects to /:category/:skill/edit/:lang/:latestid/:revertid where latest id is the commitID of the latest revision and revert id is the commitID of the oldest commit ID selected amongst the 2 commits selected initially. This redirects to SkillRollBack component where we again parse the skill meta data and the commit IDs from the URL pathname and call getFileAtCommitID to get the content for the latest and the reverting commit and again populate the content in editor using react-ace and also show the differences using react-diff and finally load the modify skill component where an editor is preloaded with the content of the reverting commit and a similar interface like modify skill is shown where user can edit the content of the reverting commit and push the changes.

let baseUrl = this.getSkillAtCommitIDUrl() ;
let self = this;
var url1 = baseUrl + self.state.latestCommit;
  url: url1,
  jsonpCallback: 'pc',
  dataType: 'jsonp',
  jsonp: 'callback',
  crossDomain: true,
  success: function (data1) {
    var url2 = baseUrl + self.state.revertingCommit;
      url: url2,
      jsonpCallback: 'pd',
      dataType: 'jsonp',
      jsonp: 'callback',
      crossDomain: true,
      success: function (data2) {

Here, we make nested ajax calls to maintain synchronization and update state after we receive data from both the calls else if we make ajax calls in a loop, then the second ajax call doesn’t wait for the first one to finish and is most likely to fail.

This is how the skill version system was implemented in SUSI Skill CMS. You can find the complete code at SUSI Skill CMS Repository. Feel free to contribute.


Fetching Images for RSS Responses in SUSI Web Chat

Initially, SUSI Web Chat rendered RSS action type responses like this:

The response from the server initially only contained

  • Title
  • Description
  • Link

We needed to improvise the web search & RSS results display and also add images for the results.

The web search & RSS results are now rendered as :

How was this implemented?

SUSI AI uses Yacy to fetchRSSs feeds. Firstly the server using the console process to return the RSS feeds from Yacy needs to be configured to return images too.


In a console process, we provide the URL needed to fetch data from, the query parameter needed to be passed to the URL and the path to look for the answer in the API response.

  • url = <url>   – the URL to the remote JSON service which will be used to retrieve information. It must contain a $query$ string.
  • test = <parameter> – the parameter that will replace the $query$ string inside the given URL. It is required to test the service.

Here the URL used is :

To include images in RSS action responses, we need to parse the images also from the Yacy response. For this, we need to add `image` in the selection rule while calling the console process

    "expression":"SELECT title,description,link FROM yacy WHERE query='$1$';"

Now the response from the server for RSS action type will also include `image` along with title, description, and link. An example response for the query `Google` :

  "title": "Terms of Service | Google Analytics \u2013 Google",
  "description": "Read Google Analytics terms of service.",
  "link": "",
  "image":   "",

However, the results at times, do not contain images because there are none stored in the index. This may happen if the result comes from p2p transmission within Yacy where no images are transmitted. So in cases where images are not returned by the server, we use the link preview service to preview the link and fetch the image.

The endpoint for previewing the link is :


On the client side, we first search the response for data objects with images in API actions. And the amongst the remaining data objects in answers[0].data, we preview the link to fetch image keeping a check on the count. This needs to be performed for processing the history cognitions too.To preview the remaining links in a loop, we cannot make ajax calls directly in a loop. To handle this, nested ajax calls are made using the function previewURLForImage() where we loop through the remaining links and on the success we decrement the count and call previewURLForImage() on the next link and on error we try previewURLForImage() on the next link without decrementing the count.

success: function (rssResponse) {
    respData.image = rssResponse.image;
    respData.descriptionShort = rssResponse.descriptionShort;
  if(receivedMessage.rssResults.length === count ||
    j === remainingDataIndices.length - 1){
    let message = ChatMessageUtils.getSUSIMessageData(receivedMessage, currentThreadID);
      type: ActionTypes.CREATE_SUSI_MESSAGE,

And we store the results as rssResults which are used in MessageListItems to fetch the data and render. The nested calling of previewURLForImage() ends when we have the required count of results or we have finished trying all links for previewing images. We then dispatch the message to the message store. We now improvise the UI. I used Material UI Cards to display the results and for the carousel like display, react-slick.

<Card className={cardClass} key={i} onClick={() => {,'_blank')
  {tile.image &&
        <img src={tile.image} alt="" className='card-img'/>
  <CardTitle title={tile.title} titleStyle={titleStyle}/>
    <div className='card-text'>{cardText}</div>
    <div className='card-url'>{urlDomain(}</div>

We used the full width of the message section to display the results by not wrapping the result in message-list-item class. The entire card is hyperlinked to the link. Along with title and description, the URL info is also shown at the bottom right. To get the domain name from the link, urlDomain() function is used which makes use of the HTML anchor tag to get the domain info.

function urlDomain(data) {
  var a = document.createElement('a');
  a.href = data;
  return a.hostname;

To prevent stretching of images we use `object-fit: contain;` to make the images fit the image container and align it to the middle.

We finally have our RSS results with images and an improvised UI. The complete code can be found at SUSI WebChat Repo. Feel free to contribute


Implementing Text To Speech Settings in SUSI WebChat

SUSI Web Chat has Text to Speech (TTS) Feature where it gives voice replies for user queries. The Text to Speech functionality was added using Speech Synthesis Feature of the Web Speech API. The Text to Speech Settings were added to customise the speech output by controlling features like :

  1. Language
  2. Rate
  3. Pitch

Let us visit SUSI Web Chat and try it out.

First, ensure that the settings have SpeechOutput or SpeechOutputAlways enabled. Then click on the Mic button and ask a query. SUSI responds to your query with a voice reply.

To control the Speech Output, visit Text To Speech Settings in the /settings route.

First, let us look at the language settings. The drop down list for Language is populated when the app is initialised. speechSynthesis.onvoiceschanged function is triggered when the app loads initially. There we call speechSynthesis.getVoices() to get the voice list of all the languages currently supported by that particular browser. We store this in MessageStore using ActionTypes.INIT_TTS_VOICES action type.

window.speechSynthesis.onvoiceschanged = function () {
  if (!MessageStore.getTTSInitStatus()) {
    var speechSynthesisVoices = speechSynthesis.getVoices();

We also get the translated text for every language present in the voice list for the text – `This is an example of speech synthesis` using google translate API. This is called initially for all the languages and is stored as translatedText attribute in the voice list for each element. This is later used when the user wants to listen to an example of speech output for a selected language, rate and pitch.

When the user visits the Text To Speech Settings, then the voice list stored in the MessageStore is retrieved and the drop down menu for Language is populated. The default language is fetched from UserPreferencesStore and the default language is accordingly highlighted in the dropdown. The list is parsed and populated as a drop down using populateVoiceList() function.

let voiceMenu =,index) => {
  if(voice.translatedText === null){
    voice.translatedText = this.speechSynthesisExample;
    <MenuItem value={voice.lang}
              primaryText={' ('+voice.lang+')'} />

The language selected using this dropdown is only used as the language for the speech output when the server doesn’t specify the language in its response and the browser language is undefined. We then create sliders using Material UI for adjusting speech rate and pitch.

<h4 style={{'marginBottom':'0px'}}><Translate text="Speech Rate"/></h4>
  onChange={this.handleRate} />

The range for the sliders is :

  • Rate : 0.5 – 2
  • Pitch : 0 – 2

The default value for both rate and pitch is 1. We create a controlled slider saving the values in state and using onChange function to record change in values. The Reset buttons can be used to reset the rate and pitch values respectively to their default values. Once the language, rate and pitch values have been selected we can click on `Play a short demonstration of speech synthesis`  to listen to a voice reply with the chosen settings.

{ this.state.playExample &&

We use the VoicePlayer by passing the required props to get the speech output. onStart and onEnd functions are triggered at the beginning and ending of the speech synthesis and are used to control the state from the parent component. Chosen language, rate, pitch and translated text are passed as props to VoicePlayer which creates a new SpeechSynthesisUtterance() with the passed props and plays the speech output.

On saving these settings and then using the Mic button to get voice replies we see that the voice output is controlled according to the selected settings.

Finally, we have to store the selected settings on the server and ensure that these are pulled when the app is initialized. The format in which these settings are stored in the server is :

Speech Rate

- Used to control rate of speech output.
- SETTING_NAME :  `speechRate`
- SETTING_VALUE : `0.5 - 2`
Speech Pitch

- Used to control pitch of speech output.
- SETTING_NAME :  `speechPitch`
- SETTING_VALUE : `0 - 2`
TTS Language

- Used to set the language for Text-To-Speech used when the response from server doesnt specify language and the browser language is also undefined.
- SETTING_NAME :  `ttsLanguage`
- SETTING_VALUE : `Language Code (string)`

This is how the Text To Speech Settings were implemented in SUSI Web Chat. The complete code can be found at SUSI Web Chat Repository.

PS: To test whether your browser supports Text To Speech, open your browser console and try the following :

  • var msg = new SpeechSynthesisUtterance(‘Hello World’);
  • window.speechSynthesis.speak(msg)

If you get a speech output then the Web API Speech Synthesis is supported by your browser and Text To Speech features of SUSI Web Chat will work. The Web Speech API has support for all latest Chrome browsers as mentioned in the Web Speech API Mozilla docs.However there are few bugs with some Chromium versions please check out more on how to fix them locally here in this link.




Generating Map Action Responses in SUSI AI

SUSI AI responds to location related user queries with a Map action response. The different types of responses are referred to as actions which tell the client how to render the answer. One such action type is the Map action type. The map action contains latitude, longitude and zoom values telling the client to correspondingly render a map with the given location.

Let us visit SUSI Web Chat and try it out.

Query: Where is London

Response: (API Response)

The API Response actions contain text describing the specified location, an anchor with text ‘Here is a map` linked to openstreetmaps and a map with the location coordinates.

Let us look at how this is implemented on server.

For location related queries, the key where is used as an identifier. Once the query is matched with this key, a regular expression `where is (?:(?:a )*)(.*)` is used to parse the location name.

"keys"   : ["where"],
"phrases": [
  {"type":"regex", "expression":"where is (?:(?:a )*)(.*)"},

The parsed location name is stored in $1$ and is used to make API calls to fetch information about the place and its location. Console process is used to fetch required data from an API.

"process": [
    "expression":"SELECT location[0] AS lon, location[1] AS lat FROM locations WHERE query='$1$';"},
    "expression":"SELECT object AS locationInfo FROM location-info WHERE query='$1$';"}

Here, we need to make two API calls :

  • For getting information about the place
  • For getting the location coordinates

First let us look at how a Console Process works. In a console process we provide the URL needed to fetch data from, the query parameter needed to be passed to the URL and the path to look for the answer in the API response.

  • url = <url>   – the url to the remote json service which will be used to retrieve information. It must contain a $query$ string.
  • test = <parameter> – the parameter that will replace the $query$ string inside the given url. It is required to test the service.

For getting the information about the place, we used Wikipedia API. We name this console process as location-info and added the required attributes to run it and fetch data from the API.

"location-info": {
  "license":"Copyright by Wikipedia,"

The attributes used are :

  • url : The Media WIKI API endpoint
  • test : The Location name which will be appended to the url before making the API call.
  • parser : Specifies the response type for parsing the answer
  • path : Points to the location in the response where the required answer is present

The API endpoint called is of the following format :

For the query where is london, the API call made returns

  ["London  is the capital and most populous city of England and the United Kingdom."],

The path $.[2] points to the third element of the array i.e “London  is the capital and most populous city of England and the United Kingdom.” which is stored in $locationInfo$.

Similarly to get the location coordinates, another API call is made to loklak API.

"locations": {
  "license":"Copyright by GeoNames"

The location coordinates are found in $.data.location in the API response. The location coordinates are stored as latitude and longitude in $lat$ and $lon$ respectively.

Finally we have description about the location and its coordinates, so we create the actions to be put in the server response.

The first action is of type answer and the text to be displayed is given by $locationInfo$ where the data from wikipedia API response is stored.


The second action is of type anchor. The text to be displayed is `Here is a map` and it must be hyperlinked to openstreetmaps with the obtained $lat$ and $lon$.

  "text":"Here is a map"

The last action is of type map which is populated for latitude and longitude using $lat$ and $lon$ respectively and the zoom value is specified to be 13.


Final output from the server will now contain the three actions with the required data obtained from the respective API calls made. For the sample query `where is london` , the actions will look like :

"actions": [
    "type": "answer",
    "language": "en",
    "expression": "London  is the capital and most populous city of England and the United Kingdom."
    "type": "anchor",
    "link":   "",
    "text": "Here is a map",
    "language": "en"
    "type": "map",
    "latitude": "51.51279067225417",
    "longitude": "-0.09184009399817228",
    "zoom": "13",
    "language": "en"

This is how the map action responses are generated for location related queries. The complete code can be found at SUSI AI Server Repository.


Implementing the Feedback Functionality in SUSI Web Chat

SUSI AI now has a feedback feature where it collects user’s feedback for every response to learn and improve itself. The first step towards guided learning is building a dataset through a feedback mechanism which can be used to learn from and improvise the skill selection mechanism responsible for answering the user queries.

The flow behind the feedback mechanism is :

  1. For every SUSI response show thumbs up and thumbs down buttons.
  2. For the older messages, the feedback thumbs are disabled and only display the feedback already given. The user cannot change the feedback already given.
  3. For the latest SUSI response the user can change his feedback by clicking on thumbs up if he likes the response, else on thumbs down, until he gives a new query.
  4. When the new query is given by the user, the feedback recorded for the previous response is sent to the server.

Let’s visit SUSI Web Chat and try this out.

We can find the feedback thumbs for the response messages. The user cannot change the feedback he has already given for previous messages. For the latest message the user can toggle feedback until he sends the next query.

How is this implemented?

We first design the UI for feedback thumbs using Material UI SVG Icons. We need a separate component for the feedback UI because we need to store the state of feedback as positive or negative because we are allowing the user to change his feedback for the latest response until a new query is sent. And whenever the user clicks on a thumb, we update the state of the component as positive or negative accordingly.

import ThumbUp from 'material-ui/svg-icons/action/thumb-up';
import ThumbDown from 'material-ui/svg-icons/action/thumb-down';

feedbackButtons = (
  <span className='feedback' style={feedbackStyle}>

The next step is to store the response in Message Store using saveFeedback Action. This will help us later to send the feedback to the server by querying it from the Message Store. The Action calls the Dispatcher with FEEDBACK_RECEIVED ActionType which is collected in the MessageStore and the feedback is updated in the Message Store.

let feedback = this.state.skill;

if(!(Object.keys(feedback).length === 0 &&    
feedback.constructor === Object)){
  feedback.rating = rating; = rating;

case ActionTypes.FEEDBACK_RECEIVED: {
  _feedback =;

The final step is to send the feedback to the server. The server endpoint to store feedback for a skill requires other parameters apart from feedback to identify the skill. The server response contains an attribute `skills` which gives the path of the skill used to answer that query. From that path we need to parse :

  • Model : Highest level of abstraction for categorising skills
  • Group : Different groups under a model
  • Language : Language of the skill
  • Skill : Name of the skill

For example, for the query `what is the capital of germany` , the skills object is

"skills": ["/susi_skill_data/models/general/smalltalk/en/English-Standalone-aiml2susi.txt"]

So, for this skill,

    • Model : general
    • Group : smalltalk
    • Language : en
    • Skill : English-Standalone-aiml2susi

The server endpoint to store feedback for a particular skill is :


Where Model, Group, Language and Skill are parsed from the skill attribute of server response as discussed above and the Rating is either positive or negative and is collected from the user when he clicks on feedback thumbs.

When a new query is sent, the sendFeedback Action is triggered with the required attributes to make the server call to store feedback on server. The client then makes an Ajax call to the rateSkill endpoint to send the feedback to the server.

let url = BASE_URL+'/cms/rateSkill.json?'+

  url: url,
  dataType: 'jsonp',
  crossDomain: true,
  timeout: 3000,
  async: false,
  success: function (response) {
  error: function(errorThrown){

This is how the feedback feedback mechanism works in SUSI Web Chat. The entire code can be found at SUSI Web Chat Repository.



Adding a Scroll To Bottom button in SUSI WebChat

SUSI Web Chat now has a scroll-to-bottom button which helps the users to scroll the app automatically to the bottom of the scroll area on button click. When the chat history is lengthy and the user has to scroll down manually it results in a bad UX. So the basic requirements of this scroll-to-bottom button are:

  1. The button must only be displayed when the user has scrolled up the message section
  2. On clicking the scroll-to-bottom button, the scroll area must be automatically scrolled to bottom.

Let’s visit SUSI Web Chat and try this out.

The button is not visible until there are enough messages to enable scrolling and the user has scrolled up. On clicking the button, the app automatically scrolls to the bottom pointing to the most recent message.

How was this implemented?

We first design our scroll-to-bottom button using Material UI  Floating Action Button and SVG Icons.

import FloatingActionButton from 'material-ui/FloatingActionButton';
import NavigateDown from 'material-ui/svg-icons/navigation/expand-more';

The button needs to be styled to be displayed at a fixed position on the bottom right corner of the message section. Positioning it on top of MessageSection above the MessageComposer, the button is also aligned with respect to the edges.

const scrollBottomStyle = {
  button : {
    float: 'right',
    marginRight: '5px',
    marginBottom: '10px',
  backgroundColor: '#fcfcfc',
  icon : {
    fill: UserPreferencesStore.getTheme()==='light' ? '#90a4ae' : '#7eaaaf'

The button must only be displayed when the user has scrolled up. To implement this we need a state variable showScrollBottom which must be set to true or false accordingly based on the scroll offset.

{this.state.showScrollBottom &&
  <div className='scrollBottom'>
    <FloatingActionButton mini={true}
      <NavigateDown />

Now we have to set our state variable showScrollBottom corresponding to the scroll offset. It must be set to true is the user has scrolled up and false if the scrollbar is already at the bottom. To implement this we need to listen to the scrolling events. We used react-custom-scrollbars for the scroll area wrapping the message section. We can listen to the scrolling events using the onScroll props. We also need to tag the scroll area using refs to access the scroll area instead of using findDOMNode as it is being deprecated.

import { Scrollbars } from 'react-custom-scrollbars';

  ref={(ref) => { this.scrollarea = ref; }}

Now, whenever a scroll action is performed, the onScroll() function is triggered. We now have to know if the scroll bar is at the bottom or not. We make use of the scroll area’s props to get the scroll offsets. The getValues() function returns an object containing different scroll offsets and scroll area dimensions. We are interested in which tells about the scroll-top’s progress from 0 to 1 i.e when the scroll bar is at the top most point is 0 and when its on the bottom most point, is 1. So whenever is 1, showScrollBottom is false else true.

onScroll = () => {
  let scrollarea = this.scrollarea;
    let scrollValues = scrollarea.getValues();
    if( === 1){
        showScrollBottom: false,
    else if(!this.state.showScrollBottom){
        showScrollBottom: true,

Finally, we need to scroll the chat app to the bottom on button click. Whenever showScrollBottom is updated, the state is changed, so componentDidUpdate is triggered which calls the _scrollToBottom() function. But we should change this to avoid scrolling to bottom on showScrollBottom update and the user is intending to scroll here. We use the function forcedScrollToBottom to be triggered on clicking the scroll-to-bottom button, which resets the scrollTop value to the height of the scroll area, thus pointing the scrollbar to the bottom.

forcedScrollToBottom = () => {
  let ul = this.scrollarea;
  if (ul) {

We don’t have to worry about resetting showScrollBottom on forced scroll to bottom as the scrolling will trigger the onScroll function where the showScrollBottom state is handled accordingly.

This is how the scroll to bottom button has been implemented in SUSI Web Chat. The entire code can be found at SUSI Web Chat Repository.



Adding IBM Watson TTS Support in Susi Assistant on Raspberry Pi

Susi Hardware project aims at creating a smart assistant for your home that you can run on your Raspberry Pi or similar Development Boards.
I previously wrote a blog on choosing a perfect Text to Speech engine for Susi AI and had used Flite as the solution for it. While Flite is an Open Source solution that can run locally on a client, it does not provide the same quality of voice and speed as cloud providers. We always crave for a more natural voice for better interaction with our assistant. It is always good to have more options. We, therefore, added IBM Watson Text to Speech API in SUSI Hardware project.

IBM Watson TTS can be added to a Python Project easily using the IBM Watson Developer SDK.

For using the IBM Watson Developer SDK for Text to Speech, first of all, we need to sign up for Bluemix

After that, we will get the empty dashboard without any service added currently. We need to create a Text to Speech Service. To do so, click on Create Watson Service button


Select Watson on the left pane and then select Text to Speech service from the list.

Select the standard plan from the options and then click on create button.

You will get service credentials for your newly created text to speech service. Save it for future reference.

After that, we need to add Watson developer cloud python package.

sudo pip3 install watson-developer-cloud

On Ubuntu with Python 3.5 watson-developer-cloud has some extra dependencies. Install them using the following command.

sudo apt install libssl-dev

Now we can add Text to Speech to our project. For that, we need to first import TextToSpeechV1 library. It can be added using following import statement.

from watson_developer_cloud import TextToSpeechV1

Now we need to create a new TextToSpeechV1 object using the Service Credentials we created earlier.

text_to_speech = TextToSpeechV1(

We can now perform synthesis of a text input and write the incoming speech stream from IBM Watson API to a file.

with open('output.wav', 'wb') as audio_file:
       text_to_speech.synthesize(text, accept='audio/wav’, voice='en-US_AllisonVoice'))

In the above code snippet,  we are opening an output file ‘output.wav’ for writing. We then write the binary audio data returned by text_to_speech.synthesize method. IBM Watson provides many free voices. We supply an argument specifying which voice we need to use. We are using English female ‘en-US_AllisonVoice’. You may test out more voices in the online demo here and select the voice that you find best.

We can play the ‘output.wav’ file using the play command from SoX. To do so, we need to install SoX binary.

sudo apt install sox libsox-fmt-all

We can play the file easily now using the following code.

import os
os.system('play output.wav')

The above code invokes the ‘play’ command from the SoX package to play the audio file. We can also use PyAudio to play the audio file but it would require us to manage the audio thread separately. Thus, SoX is a better solution.


Setup SUSI Assistant on Raspberry Pi in under 30 minutes

With our ever growing list of list of platforms supported by Susi AI, we now have a client that can run on Raspberry Pi and you can access it hands-free!! Here is a video that you can refer for its working.

But it might have left you wondering how you can replicate such a setup yourself? It is fairly easy and will be done fairly easy. Just follow the following instructions.

You need to have following hardware in order to have your own SUSI Assistant running on Raspberry Pi.

  • A Raspberry Pi (prefer 2 or 3) with Raspbian Jessie OS.
  • A stable internet connection.  ( Recommended 4 Mbps )
  • A USB Microphone /  USB Webcam with Microphone. You may buy one like this.
  • A Speaker that connects through 3.5mm jack. You may buy one like this.

After you get all the above items in order, you need to get access to a terminal of your Raspberry Pi. You can have that by either connecting a monitor to Raspberry Pi temporarily or by connecting to Raspberry Pi over SSH.

Once this is done, next step is the installation of the dependencies. The installation of the SUSI on Raspberry is automated after dependencies are installed. Run the following command on Raspberry Pi terminal.

sudo apt install git swig3.0 portaudio19-dev pulseaudio libpulse-dev unzip sox libatlas-dev libatlas-base-dev libsox-fmt-all python3

After this, you may check if your output and input devices are working alright. For this, run rec recording.wav . It will start recording audio and saving it to a file named recording.wav. Play back the file using play recording.wav If you hear your audio clearly, setup is done right else you need to configure your Audio Devices correctly.  Most of the time the configuration of Audio works out the box and devices are plug and play so you would not encounter any errors. If you are successful in configuring your devices, install extra dependencies for SUSI Hardware by running the automated install script. In your terminal run,

$ git clone
$ cd susi_hardware
$ ./ 

This will install all the remaining dependencies. After the above step is complete, you may run configuration file generator script to choose the Text to Speech and Speech to Text service according to your wish. For doing so, you need to run

$ python3

Follow the instructions in the script. It will ask you to configure the default service for Text to Speech and Speech to Text and other options. After the configuration is complete, you can simply run the following command to start SUSI.

$ python3

This will start SUSI in a continuously listening mode. You may invoke SUSI anytime, just by saying SUSI followed by a query. The query will be answered by SUSI subsequently.

Since configurations for different hardware devices may vary, you may encounter some problems. In such a scenario, you may refer to the following resources to solve the issues.


Implementing Text-to-Speech (TTS) in SUSI Android

Mobile assistants are designed to perform tasks that the user “commands” through by chat UI or speech. The Android OS already provides Text to speech (TTS) and Speech to text (STT) features. This feature is available from Android version 1.6 onward. In this blog post I will show how tts is implemented in SUSI Android and how I fix the issue ‘delay in speech response’.

TextToSpeech class controls the tts engine. To use TextToSpeech class import it in the activity where you want to use text to speech feature.

import android.speech.tts.TextToSpeech;

After you import TextToSpeech class now we need to initialize TextToSpeech

TextToSpeech tts = new TextToSpeech(this,this);

Here first parameter is the Context and the other one is the listener. The listener is  use  to  inform our app that the engine is ready to use. In order to be notified we have to  implement  TextToSpeech.OnInitListener.

TextToSpeech.OnInitListener listener = new  TextToSpeech.OnInitListener {
public void onInit(int status) {
if (status == TextToSpeech.SUCCESS)
tts.setLanguage(Locale.UK/* set the default language*/);

Hence the engine can be initialized asIf status is success then, it means that TTS is initialized successfully and now we can use it. Otherwise, we can’t use it. setLanguage method is used to set language in which we want reply.

TextToSpeech tts = new TextToSpeech(getApplicationContext,listener)

When you use TTS one thing you have to remember that TTS run  on main thread so sometimes it may cause delays in text to speech conversion or it may block UI for a while. It is better to wrap it like below code.

new Handler().post(new Runnable() {
      public void run() {
         tts = new TextToSpeech(getApplicationContext(), listener);

Now our engine is ready to speak, we need simply pass the string we want to read.

tts.speak(text to read,TextToSpeech.QUEUE_FLUSH, null, null);

But before tts.speak, it is important to check for the audio focus change request. It is important because only one audio source can have focus at a time. You can check it using below code.

private AudioManager.OnAudioFocusChangeListener afChangeListener =
           new AudioManager.OnAudioFocusChangeListener() {
                 public void onAudioFocusChange(int focusChange) {
                                                        //check for focus

OnAudioFocusChangeListener is called when audio focus of the system is changed and according to value of focusChange either we stop TTS or keep using it.

AudioManager audiofocus = (AudioManager)                                    getSystemService(Context.AUDIO_SERVICE);

audiofocus is instance of AudioManager class. We need it to call requestAudioFocus method of AudioManager class. requestAudioFocus method returns the status of request for audio focus change. This method requires three parameter  instance of AudioManager.OnAudioFocusChangeListener, stream type and duration hint. If request is granted only then we can we can use tts.speak .

int result = audiofocus.requestAudioFocus(afChangeListener,AudioManager.STREAM_MUSIC, AudioManager.AUDIOFOCUS_GAIN);

if (result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {

tts.speak(text to read,TextToSpeech.QUEUE_FLUSH, null, null);


We were continuously facing issue ‘delay in speech response’ because voiceReply method implementation was wrong. We were initializing TextToSpeech on each call of voiceReply method and since onInit method runs on main thread causing delay in voice response. So I removed it and instead of initializing tts each time I used the tts instance already initialized when activity create.

 String spoken = reply;

textToSpeech.speak(spoken, TextToSpeech.QUEUE_FLUSHnull);

You can also control how the engine read text. Like we can modify pitch and speech rate.




Managing States in SUSI MagicMirror Module

SUSI MagicMirror Module is a module for MagicMirror project by which you can use SUSI directly on MagicMirror. While developing the module, a problem I faced was that we need to manage the flow between the various stages of processing of voice input by the user and displaying SUSI output to the user. This was solved by making state management flow between various states of SUSI MagicMirror Module namely,

  • Idle State: When SUSI MagicMirror Module is actively listening for a hotword.
  • Listening State: In this state, the user’s speech input from the microphone is recorded to a file.
  • Busy State: The user has finished speaking or timed out. Now, we need to transcribe the audio spoken by the user, send the response to SUSI server and speak out the SUSI response.

The flow between these states can be explained by the following diagram:

As clear from the above diagram, transitions are not possible from a state to all other states. Only some transitions are allowed. Thus, we need a mechanism to guarantee only allowed transitions and ensure it triggers on the right time.

For achieving this, we first implement an abstract class State with common properties of a state. We store the information whether a state can transition into some other state in a map allowedTransitions which maps state names “idle”, “listening” and “busy” to their corresponding states. The transition method to transition from one state to another is implemented in the following way.

protected transition(state: State): void {
   if (!this.canTransition(state)) {
       console.error(`Invalid transition to state: ${state}`);


private canTransition(state: State): boolean {
   return this.allowedStateTransitions.has(;

Here we first check if a transition is valid. Then we exit one state and enter into the supplied state.  We also define a state machine that initializes the default state of the Mirror and define valid transitions for each state. Here is the constructor for state machine.

constructor(components: IStateMachineComponents) {
        this.idleState = new IdleState(components);
        this.listeningState = new ListeningState(components);
        this.busyState = new BusyState(components);

        this.idleState.AllowedStateTransitions = new Map<StateName, State>([["listening", this.listeningState]]);
        this.listeningState.AllowedStateTransitions = new Map<StateName, State>([["busy", this.busyState], ["idle", this.idleState]]);
        this.busyState.AllowedStateTransitions = new Map<StateName, State>([["idle", this.idleState]]);

        this.currentState = this.idleState;

Now, the question arises that how do we detect when we need to transition from one state to another. For that we subscribe on the Snowboy Detector Observable. We are using Snowboy library for Hotword Detection. Snowboy detects whether an audio stream is silent, has some sound or whether hotword was spoken. We bind all this information to an observable using the ReactiveX Observable pattern. This gives us a stream of events to which we can subscribe and get the results. It can be understood in the following code snippet.

detector.on("silence", () => {;

detector.on("sound", () => {});

detector.on("error", (error) => {

detector.on("hotword", (index, hotword) => {;
public get Observable(): Observable<DETECTOR> {
   return this.subject.asObservable();

Now, in the idle state, we subscribe to the values emitted by the observable of the detector to know when a hotword is detected to transition to the listening state. Here is the code snippet for the same.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Hotword:

In the listening state, we subscribe to the states emitted by the detector observable to find when silence is detected so that we can stop recording the audio stream for processing and move to busy state.

this.detectorSubscription = this.components.detector.Observable.subscribe(
   (value) => {
   switch (value) {
       case DETECTOR.Silence:

The task of speaking the audio and displaying results on the screen is done by a renderer. The communication to renderer is done via a RendererCommunicator object using a notification system. We also bind its events to an observable so that we know when SUSI has finished speaking the result. To transition from busy state to idle state, we subscribe to renderer observable in the following manner.

this.rendererSubscription = this.components.rendererCommunicator.Observable.subscribe((type) => {
   if (type === "finishedSpeaking") {

In this way, we transition between various states of MagicMirror Module for SUSI in an efficient manner.