Giving a Voice to Susi

Susi AI already has various apps and is available as a chatbot on various messaging platforms. We are going a step further by making an SDK available for Susi that can be integrated into any hardware device (say speakers, toys, or your bicycle; the possibilities are endless).

One of the problems I encountered while building a prototype for this was selecting an appropriate Text to Speech (TTS) engine.

It was a challenge because, on platforms like Android and iOS, you can easily use the TTS engines bundled with the platform through a platform-specific API, and those engines are well optimized and perform well. The same was difficult on a hardware device that runs only Linux, with no TTS engine provided by default.

Thus, I explored some possibilities.

eSpeak TTS: eSpeak TTS (http://espeak.sourceforge.net/) was the first option considered for the task.
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.

The major advantage of eSpeak is its small size (2MB) and small memory footprint, which is valuable on low-memory hardware like the Orange Pi Zero or Raspberry Pi Zero.


Setting up eSpeak was easy, but along with its advantages there were some drawbacks too:

  • The voice synthesis was quite robotic.
  • Very few voices were available.
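
For anyone who wants to hear the difference themselves, here is a minimal sketch of driving eSpeak from Python. It assumes the espeak binary is on the PATH. The helper names build_espeak_command and speak_espeak are hypothetical (they are not part of the Susi SDK), and the options used are eSpeak's -v (voice), -s (speed in words per minute) and -w (write a WAV file).

```python
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=150, wav_path=None):
    """Build the argument list for an espeak call (hypothetical helper)."""
    command = ["espeak", "-v", voice, "-s", str(speed_wpm)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        command += ["-w", wav_path]
    command.append(text)
    return command


def speak_espeak(text, **options):
    """Speak text with eSpeak; assumes the espeak binary is installed."""
    return subprocess.call(build_espeak_command(text, **options))
```

Calling speak_espeak("Hi!! I am Susi") should play the phrase through the default audio device, which is enough to judge how robotic the voice sounds.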

Festival TTS: Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface.


Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

Installing Festival:

On Arch Linux, it was pretty straightforward.

sudo pacman -S festival

There is a full wiki page dedicated to it: https://wiki.archlinux.org/index.php/Festival

Testing Festival

Festival has an interactive interpreter for testing it out. It can be invoked by running the festival command with no arguments.
At its prompt, you may test TTS output using:

festival> (SayText "Hi!! I am Susi")

But the default voice in Festival is still robotic and male. You don't want your personal assistant to scare you when you speak to her.

Thus, I searched for the best female voices available for Festival.

After looking at the discussion in this thread, https://ubuntuforums.org/showthread.php?t=751169 ,

I found that CMU-Arctic and HTS are some of the best voice sets for Festival.

On Arch Linux, additional voice packs are supplied in two further packages,

festival-us and festival-english

Installation is straightforward:

sudo pacman -S festival-us festival-english

Now, in the Festival REPL, we can test out our new voices.

To see all available voices

festival> (voice.list)
(rab_diphone kal_diphone cmu_us_rms_cg cmu_us_awb_cg cmu_us_slt_cg)

Testing out a voice

festival> (voice_cmu_us_awb_cg)
cmu_us_awb_cg
festival> (SayText "Hi!! I am Susi")

After testing all the voices this way with many different phrases, cmu_us_slt_cg felt like the most appropriate voice.
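
Typing the same two commands for every voice gets tedious, so here is a small sketch that generates them. The helper name festival_audition_commands is hypothetical, and it assumes each voice name is one of the symbols printed by (voice.list), so that prefixing it with voice_ gives the selection command used above. To keep things simple it refuses phrases containing quotes or backslashes instead of trying to escape them.

```python
def festival_audition_commands(voices, phrase):
    """Return Festival REPL lines that try `phrase` with each voice.

    Hypothetical helper: voice names are the symbols printed by (voice.list).
    """
    if '"' in phrase or "\\" in phrase:
        # Keep the sketch simple: no escaping of Scheme string literals.
        raise ValueError("phrase must not contain quotes or backslashes")
    lines = []
    for voice in voices:
        lines.append("(voice_{0})".format(voice))       # select the voice
        lines.append('(SayText "{0}")'.format(phrase))  # speak with it
    return lines


if __name__ == "__main__":
    voices = ["cmu_us_rms_cg", "cmu_us_awb_cg", "cmu_us_slt_cg"]
    print("\n".join(festival_audition_commands(voices, "Hi!! I am Susi")))
```

The printed lines can be pasted at the festival> prompt one voice at a time.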

Setting Voice as Default

A voice may be set as the default by adding the following line to .festivalrc

 (set! voice_default voice_cmu_us_slt_cg)

Now Festival will start with this voice as the default.
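
If you are setting up several devices, this step can be scripted. The following is only a sketch: ensure_default_voice is a hypothetical helper, it assumes the init file lives at ~/.festivalrc unless another path is passed, and it writes the line in exactly the form shown above. It skips the write when the line is already present, so it is safe to run more than once.

```python
import os


def ensure_default_voice(voice, rc_path=None):
    """Append a default-voice line to a Festival init file if it is missing.

    Hypothetical helper; rc_path defaults to ~/.festivalrc (an assumption).
    Returns True if the file was modified.
    """
    if rc_path is None:
        rc_path = os.path.expanduser("~/.festivalrc")
    line = "(set! voice_default voice_{0})".format(voice)
    existing = ""
    if os.path.exists(rc_path):
        with open(rc_path) as rc_file:
            existing = rc_file.read()
    if line in existing:
        return False  # already configured, nothing to do
    with open(rc_path, "a") as rc_file:
        if existing and not existing.endswith("\n"):
            rc_file.write("\n")  # keep the new setting on its own line
        rc_file.write(line + "\n")
    return True
```

For example, ensure_default_voice("cmu_us_slt_cg") would add the setting for the voice chosen above.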

You may have Festival read a file aloud using

$ festival --tts <filename>

Calling Festival from Python

This was accomplished by first writing the text to a file, and then calling a subprocess that speaks it using festival --tts.

import subprocess

def speak(text):
    # Write the response to a file that festival can read
    filename = '.response'
    with open(filename, 'w') as response_file:
        response_file.write(text)
    # Call festival tts to speak the response by Susi
    subprocess.call(['festival', '--tts', filename])
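
The temporary file is not strictly needed. As a hedged alternative, the sketch below pipes the text straight into festival, assuming festival is installed and that festival --tts reads from standard input when no file name is given. The function name speak_via_stdin and its runner parameter are hypothetical; runner only exists so the call can be swapped out when festival is not available.

```python
import subprocess


def speak_via_stdin(text, runner=subprocess.run):
    """Pipe text straight into `festival --tts` (no temporary file).

    Assumption: festival is installed and reads stdin when no file is given.
    `runner` is a hypothetical hook so the call can be replaced in tests.
    """
    return runner(["festival", "--tts"], input=text.encode("utf-8"))
```

Passing the arguments as a list also avoids going through the shell, so the text never needs quoting.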

 

So, Festival works pretty well on a moderately powerful machine, but when I tried it on a Raspberry Pi with a custom voice, there was a noticeable delay during synthesis, so I searched for more alternatives.

Flite TTS: CMU Flite (festival-lite) is a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
Flite gives a noticeable speed improvement. It produces Festival-like output in about one tenth of the time taken by Festival.
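
Timings like this depend a lot on the board and the voice, so it is worth measuring on your own hardware. Here is a minimal sketch of a timing helper; time_command is a hypothetical name, and the command in the example is just a placeholder that runs the Python interpreter, to be replaced with the Festival and Flite invocations you want to compare. Note that it measures the whole command, so playback time is included unless the engine writes to a file.

```python
import subprocess
import sys
import time


def time_command(command, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.call(command)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; replace it with the
    # festival and flite invocations you want to compare.
    print(time_command([sys.executable, "-c", "pass"]))
```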

Installing Flite TTS

On Raspberry Pi (Raspbian), the version in the Raspbian Jessie repository is Flite 1.4, which does not support loading an external voice file, so we need to compile and install from source.

$ wget http://www.festvox.org/flite/packed/latest/flite-2.0.0-release.tar.bz2
$ tar xf flite-2.0.0-release.tar.bz2
$ cd flite-2.0.0-release/
$ ./configure
$ make
$ sudo make install

Now, you may download the flitevox file for the voice you wish to use from

http://www.festvox.org/flite/packed/latest/voices/

Flite can then be invoked with that voice using

$ flite -voice file://<flitevox_file_path> -f <filepath-to-read>

You may save the output audio to a file using

$ flite -voice file://<flitevox_file_path> -f <filepath-to-read> -o output.wav

Now you can play back the audio. All of this can be invoked from Python using:

import subprocess

def speak_flite_tts(text):
    # Write the response to a file that flite can read
    filename = '.response'
    with open(filename, 'w') as response_file:
        response_file.write(text)
    # Call flite tts to synthesize the response by Susi into output.wav
    flite_speech_file = 'cmu_us_slt.flitevox'
    voice = 'file://{0}'.format(flite_speech_file)
    subprocess.call(['flite', '-v', '-voice', voice, '-f', filename, '-o', 'output.wav'])
    # Play back the synthesized audio
    subprocess.call(['aplay', 'output.wav'])
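
A possible simplification, shown here only as a sketch: Flite also accepts the text directly through its -t option, so the .response file can be skipped. The helper names build_flite_command and speak_flite_text are hypothetical, and the code assumes flite is on the PATH. When no output file is given, flite plays the audio itself.

```python
import subprocess


def build_flite_command(text, voice_file=None, wav_path=None):
    """Build a flite argument list that passes the text with -t."""
    command = ["flite"]
    if voice_file is not None:
        command += ["-voice", "file://{0}".format(voice_file)]
    command += ["-t", text]
    if wav_path is not None:
        command += ["-o", wav_path]  # without -o, flite plays the audio itself
    return command


def speak_flite_text(text, voice_file=None):
    """Speak text with flite directly; assumes flite is installed."""
    return subprocess.call(build_flite_command(text, voice_file))
```

For example, speak_flite_text("Hi!! I am Susi", "cmu_us_slt.flitevox") would speak the phrase with the voice file used above.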

 

In this way, we added offline Text to Speech support to the Susi Hardware SDK.

Published by

betterclever

GSoC Student Developer at FOSSASIA