Custom UI Implementation for Web Search and RSS actions in SUSI iOS Using Kingfisher for Image Caching

The SUSI Server is an AI powered server which is capable of responding to intelligent answers based on user’s queries. The queries to the susi server are obtained either as a websearch using the application or as an RSS feed. Two of the actions are websearch and RSS. These actions as the name suggests respond to queries based on search results from the web which are rendered in the clients. In order to use use these action types and display them in the SUSI iOS client, we need to first parse the actions looking for these action types and then creating a custom UI for them to display them.

To start with, we need to make send the query to the server to receive an intelligent response from the server. This response is parsed into different action types supported by the server and saved into relevant objects. Here, we check the action types by looping through the answers array containing the actions and based on that, we save the data for that action.

if type == ActionType.rss.rawValue {
   message.actionType = ActionType.rss.rawValue
   message.rssData = RSSAction(data: data, actionObject: action)
} else if type == ActionType.websearch.rawValue {
   message.actionType = ActionType.websearch.rawValue
   message.message = action[Client.ChatKeys.Query] as? String ?? ""
}

Here, we parsed the data response from the server and looked for the rss and websearch action type followed by which we saved the data we received from the server for each of the action types in their own objects.

Next, when a message object is created, we insert it into the dataSource item by appending it and use the `insertItems(at: [IndexPath])` method of collection view to insert them into the views at a particular index. Before adding them, we need to create a Custom UI for them. This UI will consist of a Collection View which is scrollable in the horizontal direction inside a CollectionView Cell. To start with this, we create a new class called `WebsearchCollectionView` which will be a `UIView` consisting of a `UICollectionView`.

We start by adding a collection view into the UIView inside the `init` method by overriding it. Declare a collection view using flow layout and scroll direction set to `horizontal`. Also, hide the scroll indicators and assign the delegate and datasource to `self`.

Now to populate this collection view, we need to specify the number of items that will show up. For this, we make use of the `message` variable declared. We use the `websearchData` in case of websearch action and `rssData` otherwise.

Now to specify the number of cells, we use the below method which returns the number of rss or websearch action objects and defaults to 0 such cells.

func collectionView(_ collectionView: UICollectionView, numberOfItemsInSection section: Int) -> Int {
    if let rssData = message?.rssData {
        return rssData.count
    } else if let webData = message?.websearchData {
        return webData.count
    }
    return 0
}

We display the title, description and image for each object for which we need to create a UI for the cells. Let’s start by creating a Custom Collection View cell with the imageview and 2 labels for title and description.

The imageview is given a contentMode of `aspectFit` and assigned a placeholder image in case the image doesn’t exist. The title and description labels are assigned the same font size, the title being bolder and both are center aligned.

class WebsearchCell: BaseCell {

   var imageView: UIImageView = {
       let iv = UIImageView()
       iv.contentMode = .scaleAspectFit
       iv.image = UIImage(named: "placeholder")
       return iv
   }()

   let titleLabel: UILabel = {
       let label = UILabel()
       label.textColor = .black
       label.font = UIFont.boldSystemFont(ofSize: 14)
       label.textAlignment = .center
       label.numberOfLines = 2
       label.backgroundColor = Color.grey.lighten3
       return label
   }()

   let descriptionLabel: UILabel = {
       let label = UILabel()
       label.font = UIFont.systemFont(ofSize: 14)
       label.textAlignment = .center
       return label
   }()

}

Next, we add constraints for each such view adding a title and description label adding a small margin on both sides of the cell for a cleaner UI.

addSubview(imageView)
addSubview(titleLabel)
addSubview(descriptionLabel)
descriptionLabel.numberOfLines = 5
addConstraintsWithFormat(format: "H:|-4-[v0(\(frame.width * 0.4))]-4-[v1]-4-|", views: imageView, titleLabel)
addConstraintsWithFormat(format: "|-\(frame.width * 0.4 + 8)-[v0]-4-|", views: descriptionLabel)
addConstraintsWithFormat(format: "V:|-4-[v0]-4-|", views: imageView)
addConstraintsWithFormat(format: "V:|-4-[v0(44)]-4-[v1]-4-|", views: titleLabel, descriptionLabel)

Now to use this custom cell, we first need to register it with the collection view and then we can use it easily in the `cellForItemAt` method. Since we are using Kingfisher for image caching, we use the `.kf.setImage` method to download the image from a URL and cache it as soon as it is downloaded.

collectionView.register(WebsearchCell.self, forCellWithReuseIdentifier: cellId)
 func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
       if let cell = collectionView.dequeueReusableCell(withReuseIdentifier: cellId, for: indexPath) as? WebsearchCell {
           cell.backgroundColor = .white

           if message?.actionType == ActionType.rss.rawValue {
               let feed = message?.rssData?.rssFeed[indexPath.item]
               cell.titleLabel.text = feed?.title
               cell.descriptionLabel.text = feed?.desc                cell.imageView.kf.setImage(with: URL(string: feed?.rssData?.image))
           } else if message?.actionType == ActionType.websearch.rawValue {
               let webData = message?.websearchData[indexPath.item]
               cell.titleLabel.text = webData?.title
               cell.descriptionLabel.text = webData?.desc.html2String                cell.imageView.kf.setImage(with: URL(string: feed?.webData?.image))
           }
           return cell
       } else {
           return UICollectionViewCell()
       }
   }

We check the action type and assign data based on that. Also, for the images, if they don’t exist a placeholder is added.

Since we are done with the Custom UI of the cell, we need to add it to the chat cell. For that, we add this `UIView` as a subview. For reasons of reusability, the cell was extracted into a separate one and call the `prepareForReuse()` method for reusability. This is followed by adding a subview and setting constraints for each cell and assigning the message object.

class RSSCell: ChatMessageCell {

   var message: Message? {
       didSet {
           self.addWebsearchView()
       }
   }

   lazy var websearchView: WebsearchCollectionView = {
       let view = WebsearchCollectionView()
       return view
   }()

   override func setupViews() {
       super.setupViews()
       prepareForReuse()
   }

   func addWebsearchView() {
       self.addSubview(websearchView)
       self.addConstraintsWithFormat(format: "H:|[v0]|", views: websearchView)
       self.addConstraintsWithFormat(format: "V:[v0(135)]", views: websearchView)
       websearchView.message = message
   }

}
if message.actionType == ActionType.rss.rawValue || message.actionType == ActionType.websearch.rawValue {
   if let cell = collectionView.dequeueReusableCell(withReuseIdentifier: ControllerConstants.rssCell, for: indexPath) as? RSSCell {
       cell.message = message
       return cell
   } else {
       return UICollectionViewCell()
   }
}

This is all we need to add a custom UI to a chat cell in SUSI iOS, very simple and clean.

Check the screenshot below, the app in action.

Resources:

Hotword Recognition in SUSI iOS

Hot word recognition is a feature by which a specific action can be performed each time a specific word is spoken. There is a service called Snowboy which helps us achieve this for various clients (for ex: iOS, Android, Raspberry pi, etc.). It is basically a DNN based hotword recognition toolkit.

In this blog, we will learn how to integrate the snowboy hotword detection wrapper in the SUSI iOS client. This service can be used in any open source project but for using it commercially, a commercial license needs to be obtained.

Following are the files that need to be added to the project which are provided by the service itself: snowboy-detect.h libsnowboy-detect.a and a trained model file which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”, the model file can be found here.

The way how snowboy works is that speech is recorded for a few seconds and this data is detected with an already trained model by a specific hotword, now if snowboy returns a 1 means word has been successfully detected else wasn’t.

We start with creation of a wrapper class in Objective-C which can be found wrapper and the bridging header in case this needs to be added to a Swift project. The wrapper contains methods for setting sensitivity, audio gain and running the detection using the buffer. It is a wrapper class built on top of the snowboy-detect.h header file.

Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether it successfully detected the hotword or not:

  • Create a ViewController class with extensions
    • AVAudioRecorderDelegate
    • AVAudioPlayerDelegate

since we will be recording speech.

  • Import AVFoundation
  • Create a basic layout containing a label which detects whether hotword detected or not and create corresponding `IBOutlet` in the ViewController and a button to trigger the start and stop of recognition.
  • Create the following variables:
    • let WAKE_WORD = “Susi” // hotword used
    • let RESOURCE = Bundle.main.path(forResource: “common”, ofType: “res”)
    • let MODEL = Bundle.main.path(forResource: “susi”, ofType: “umdl”) //path where the model file is stored
    • var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection
    • var audioRecorder: AVAudioRecorder! // audio recorder instance
    • var audioPlayer: AVAudioPlayer!
    • var soundFileURL: URL! //stores the URL of the temp reording file
    • var timer: Timer! //timer to fire a function after an interval
    • var isStarted = false // variable to check if audio recorder already started
  • In `viewDidLoad` initialize the wrapper and set sensitivity and audio gain. Recognition best happens when sensitivity is set to `0.5` and audio gain is set to `1.0` according to the docs.
override func viewDidLoad() {
    super.viewDidLoad()
    wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL)
    wrapper.setSensitivity("0.5")
    wrapper.setAudioGain(1.0)
}
  • Create an `IBAction` for the button to start recognition. This action will be used to start or stop the recording in which the action toggles based on the `isStarted` variable. When true, recording is stopped and the timer invalidated else a timer is started which calls the `startRecording` method with an interval of 4 seconds.
@IBAction func onClickBtn(_ sender: Any) {
  if (isStarted) {
    stopRecording()
    timer.invalidate()
    btn.setTitle("Start", for: .normal)
    isStarted = false
  } else {
    timer = Timer.scheduledTimer(timeInterval: 4, target: self, 
    selector: #selector(startRecording), userInfo: nil, repeats: true)
    timer.fire()
    btn.setTitle("Stop", for: .normal)
    isStarted = true
  }
}
  • Next, we add the start and stop recording methods.
    • First, a temp file is created which stores the recorded audio output
    • After which, necessary record configurations are made such as setting the sampling rate.
    • The recording is then started and the output stored in the temp file.
func startRecording() {
  do {
    let fileMgr = FileManager.default
    let dirPaths = fileMgr.urls(for: .documentDirectory, in: .userDomainMask)
    soundFileURL = dirPaths[0].appendingPathComponent("temp.wav")
    let recordSettings = [AVEncoderAudioQualityKey: 
    AVAudioQuality.high.rawValue,
    AVEncoderBitRateKey: 128000,
    AVNumberOfChannelsKey: 1,
    AVSampleRateKey: 16000.0] as [String : Any]
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioRecorder = AVAudioRecorder(url: soundFileURL,
settings: recordSettings as [String : AnyObject])
    audioRecorder.delegate = self
    audioRecorder.prepareToRecord()
    audioRecorder.record(forDuration: 2.0)
    instructionLabel.text = "Speak wake word: \(WAKE_WORD)"print("Started recording...")
  } catch let error {
    print("Audio session error: \(error.localizedDescription)")
  }
}
  • The stop recording method, stops the audioRecorder instance and updates the instruction label to show the same.
func stopRecording() {
  if (audioRecorder != nil && audioRecorder.isRecording) {
    audioRecorder.stop()
  }
  instructionLabel.text = "Stop"
  print("Stopped recording...")
}

The final recognition is done in the `audioRecorderDidFinishRecording` delegate method which runs the snowboy detection function which processes the audio recording in the temp file by creating a buffer and storing the audio in it and giving the wrapper this buffer as input which processes the buffer and returning a `1` is hotword was successfully detected.

func runSnowboy() {
  let file = try! AVAudioFile(forReading: soundFileURL)
  let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, 
  sampleRate: 16000.0, channels: 1, interleaved: false)
  let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length))
  try! file.read(into: buffer)
  let array = Array(UnsafeBufferPointer(start: 
  buffer.floatChannelData![0], count:Int(buffer.frameLength)))
  // print output
  let result = wrapper.runDetection(array, length: Int32(buffer.frameLength))
  print("Result: \(result)")
}

To test this out, click the start button and speak different words and you will notice that once the Hot Word is spoken, log with `result: 1` is printed out.

The snowboy hotword recognition also offers to train the personalized model with the help of Rest Apis for which the docs can be found here. The complete project implementation can be found here.

Sources:

Using CoreLocation in SUSI iOS

The SUSI Server responds with intelligent answers to the user’s queries. To make these answers better, the server makes use of the user’s location which is sent as a parameter to the query request each time. To implement this feature in the SUSI iOS client, we use the CoreLocation framework provided by Apple which helps us to get the user’s location coordinates and add them as a parameter to each request made.

In order to start with using the CoreLocation framework, we first import it inside the view controller.

import CoreLocation

Now, we create a variable of type CLLocationManager which will help us to use the actual functionality.

// Location Manager
var locationManager = CLLocationManager()

The location manager has some delegate methods which give an option to get the maximum accuracy for a user’s location.  To set that, we need the controller to conform to the CLLocationManagerDelegate, so we create an extension of the view controller conforming to this.

extension MainViewController: CLLocationManagerDelegate {

   // use functionality

}

Next, we set the manager delegate.

locationManager.delegate = self

And create a method to ask for using the user’s location and set the delegate properties.

func configureLocationManager() {
       locationManager.delegate = self
       if CLLocationManager.authorizationStatus() == .notDetermined || CLLocationManager.authorizationStatus() == .denied {
           self.locationManager.requestWhenInUseAuthorization()
       }

       locationManager.distanceFilter = kCLDistanceFilterNone
       locationManager.desiredAccuracy = kCLLocationAccuracyBest
}

Here, we ask for the user location if it was previously denied or is not yet determined and following that, we set the `distanceFilter` as kCLDistanceFilterNone  and `desiredAccuray` as kCLLocationAccuracyBest.. Finally, we are left with starting to update the location which we do by:

locationManager.startUpdatingLocation()

We call this method inside viewDidLoad to start updation of the location when the view first loads. The complete extension looks like below:

extension MainViewController: CLLocationManagerDelegate {

   // Configures Location Manager
   func configureLocationManager() {
       locationManager.delegate = self
       if CLLocationManager.authorizationStatus() == .notDetermined || CLLocationManager.authorizationStatus() == .denied {
           self.locationManager.requestWhenInUseAuthorization()
       }

       locationManager.distanceFilter = kCLDistanceFilterNone
       locationManager.desiredAccuracy = kCLLocationAccuracyBest
       locationManager.startUpdatingLocation()
   }

}

Now, it’s very easy to use the location manager and get the coordinates and add it to the params for each request.

if let location = locationManager.location {
   params[Client.ChatKeys.Latitude] = location.coordinate.latitude as AnyObject
   params[Client.ChatKeys.Longitude] = location.coordinate.longitude as AnyObject
}

Now the params which is a dictionary object is added to each request made so that the user get’s the most accurate results for each query he makes.

References:

Calculation of the Frame Size of the Chat Bubble in SUSI iOS

We receive intelligent responses from the SUSI Server based on our query. Each response contains a different set of actions and the content of the action can be of variable sizes, map, string, table, pie chart, etc. To make the chat bubble size dynamic in the SUSI iOS client, we need to check the action type. For each action, we calculate a different frame size which makes the size of the chat bubble dynamic and hence solving the issue of dynamic size of these bubbles.

In order to calculate the frame size, as mentioned above, we need to check the action type of that message. Let’s start by first making the API call sending the query and getting the action types as a response.

func queryResponse(_ params: [String : AnyObject], _ completion: @escaping(_ messages: List<Message>?, _ success: Bool, _ error: String?) -> Void) {

   let url = getApiUrl(UserDefaults.standard.object(forKey: ControllerConstants.UserDefaultsKeys.ipAddress) as! String, Methods.Chat)

       _ = makeRequest(url, .get, [:], parameters: params, completion: { (results, message) in
           if let _ = message {
               completion(nil, false, ResponseMessages.ServerError)
           } else if let results = results {

               guard let response = results as? [String : AnyObject] else {
                   completion(nil, false, ResponseMessages.InvalidParams)
                   return
               }

               let messages = Message.getAllActions(data: response)
               completion(messages, true, nil)
           }
           return
})
}

Here, we are sending the query in the params dictionary. The `makeRequest` method makes the actual API call and returns a results object and an error object if any which default to `nil`. First, we check if the error variable is `nil` or not and if it is, we parse the complete response by using a helper method created in the Message object called `getAllActions`. This basically takes the response and gives us a list of messages of all action types returned for that query.

In order to display this in the UI, we need to call this method in the View Controller to actually use the result. Here is how we call this method.

var params: [String : AnyObject] = [
 Client.WebsearchKeys.Query: inputTextView.text! as AnyObject,
 Client.ChatKeys.TimeZoneOffset: ControllerConstants.timeZone as AnyObject,
       Client.ChatKeys.Language: Locale.current.languageCode as AnyObject
]

if let location = locationManager.location {
 params[Client.ChatKeys.Latitude] = location.coordinate.latitude as AnyObject
       params[Client.ChatKeys.Longitude] = location.coordinate.longitude as AnyObject
}

if let userData = UserDefaults.standard.dictionary(forKey: ControllerConstants.UserDefaultsKeys.user) as [String : AnyObject]? {
 let user = User(dictionary: userData)
        params[Client.ChatKeys.AccessToken] = user.accessToken as AnyObject
       }

Client.sharedInstance.queryResponse(params) { (messages, success, _) in
 DispatchQueue.main.async {
 if success {
 self.collectionView?.performBatchUpdates({
                            for message in messages! {
                                try! self.realm.write {
                                    self.realm.add(message)
                                    self.messages.append(message)
                                    let indexPath = IndexPath(item: self.messages.count - 1, section: 0)
                                    self.collectionView?.insertItems(at: [indexPath])
                                }
                            }
                        }, completion: { (_) in
                            self.scrollToLast()
                     })
 }
 }
}

Here, we are creating a params object sending the query and some additional parameters such as time zone, location coordinates and access token identifying the user. After the response is received, which contains a list of messages, we use a method called `performBatchUpdates` on the collection view where we loop through all the messages, writing each one of them to the database and then adding at the end of the collection view. Here we got all the messages inside the `messages` list object where each message can be checked for its action type and a frame can be calculated for the same.

Since the frame for each cell is returned in the `sizeForItemAt` delegate method of the collectionViewDelegate, we first grab the message using its indexPath and check the action type for each such message added to the collection view.

if message.actionType == ActionType.map.rawValue {
   // map action
} else if message.actionType == ActionType.rss.rawValue {
   // rss action type
} else if message.actionType == ActionType.websearch.rawValue {
   // web search action
}

Since map action will be a static map, we use a hard coded value for the map’s height and the width equals to the width of the cell frame.

CGSize(width: view.frame.width, height: Constants.mapActionHeight)

Next, web search and rss action having the same UI, will have the same frame size but the number of cells inside each of the frame for these action depends on number of responses were received from the server.

CGSize(width: view.frame.width, height: Constants.rssActionHeight)

And the check can be condensed as well, instead of checking each action separately, we use a `||` (pipes) or an `OR`.

else if message.actionType == ActionType.rss.rawValue ||
            message.actionType == ActionType.websearch.rawValue {
           // web search and rss action
 }

The anchor and answer action types, are supposed to display a string in the chat bubble. So the chat bubble size can be calculated using the following method:

let size = CGSize(width: Constants.maxCellWidth, height: Constants.maxCellHeight)
let options = NSStringDrawingOptions.usesFontLeading.union(.usesLineFragmentOrigin)
let estimatedFrame = NSString(string: messageBody).boundingRect(with: size, options: options, attributes: [NSFontAttributeName: UIFont.systemFont(ofSize: Constants.messageFontSize)], context: nil)

Here, we first create a maximum frame size that can exist. Then, using the drawingOptions create an options variable. The actual calculation of the frame happens in the last method where we use the complete message string and this returns us the actual frame for the above action types. Use the above method to get the frame in the `CGRect` type.

Below, is the complete method used for this calculation:

func collectionView(_ collectionView: UICollectionView, layout collectionViewLayout: UICollectionViewLayout, sizeForItemAt indexPath: IndexPath) -> CGSize {
        let message = messages[indexPath.row]
 
        let estimatedFrame = self.estimatedFrame(messageBody: message.message)
        if message.actionType == ActionType.map.rawValue {
            return CGSize(width: view.frame.width, height: Constants.mapActionHeight)
        } else if message.actionType == ActionType.rss.rawValue ||
            message.actionType == ActionType.websearch.rawValue {
            return CGSize(width: view.frame.width, height: Constants.rssActionType)
        }
        return CGSize(width: view.frame.width, height: estimatedFrame.height + Constants.answerActionMargin)
    }

rn CGSize(width: view.frame.width, height: estimatedFrame.height + Constants.answerActionMargin)
}

Below are the results of the app in action.

References:

Implementing Speech To Text in SUSI iOS

SUSI being an intelligent bot has the capabilities by which the user can provide input in a hands-free mode by talking and not requiring to even lift the phone for typing. The speech to text feature is available in SUSI iOS with the help of the Speech framework which was released alongside iOS 10 which enables continuous speech detection and transcription. The detection is really fast and supports around 50 languages and dialects from Arabic to Vietnamese. The speech recognition API does its heavy tasks of detection on Apple’s servers which requires an internet connection. The same API is also not always available on all newer devices and also provides the ability to check if a specific language is supported at a particular time.

How to use the Speech to Text feature?

  • Go to the view controller and import the speech framework
  • Now, because the speech is transmitted over the internet and uses Apple’s servers for computation, we need to ask the user for permissions to use the microphone and speech recognition feature. Add the following two keys to the Info.plist file which displays alerts asking user permission to use speech recognition and for accessing the microphone. Add a specific sentence for each key string which will be displayed to the user in the alerts.
    1. NSSpeechRecognitionUsageDescription
    2. NSMicrophoneUsageDescription

The prompts appear automatically when the functionality is used in the app. Since we already have the Hot word recognition enabled, the microphone alert would show up automatically after login and the speech one shows after the microphone button is tapped.

3) To request the user for authorization for Speech Recognition, we use the method SFSpeechRecognizer.requestAuthorization.

func configureSpeechRecognizer() {
        speechRecognizer?.delegate = self

        SFSpeechRecognizer.requestAuthorization { (authStatus) in
            var isEnabled = false

            switch authStatus {
            case .authorized:
                print("Autorized speech")
                isEnabled = true
            case .denied:
                print("Denied speech")
                isEnabled = false
            case .restricted:
                print("speech restricted")
                isEnabled = false
            case .notDetermined:
                print("not determined")
                isEnabled = false
            }

            OperationQueue.main.addOperation {

                // handle button enable/disable

                self.sendButton.tag = isEnabled ? 0 : 1

                self.addTargetSendButton()
            }
        }
    }

4)   Now, we create instances of the AVAudioEngine, SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest,SFSpeechRecognitionTask

let speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: "en-US"))
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?
let audioEngine = AVAudioEngine()

5)  Create a method called `readAndRecognizeSpeech`. Here, we do all the recognition related stuff. We first check if the recognitionTask is running or not and if it does we cancel the task.

if recognitionTask != nil {
  recognitionTask?.cancel()
  recognitionTask = nil
}

6)  Now, create an instance of AVAudioSession to prepare the audio recording where we set the category of the session as recording, the mode and activate it. Since these might throw an exception, they are added inside the do catch block.

let audioSession = AVAudioSession.sharedInstance()

do {

    try audioSession.setCategory(AVAudioSessionCategoryRecord)

    try audioSession.setMode(AVAudioSessionModeMeasurement)

    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)

} catch {

    print("audioSession properties weren't set because of an error.")

}

7)  Instantiate the recognitionRequest.

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

8) Check if the device has an audio input else throw an error.

guard let inputNode = audioEngine.inputNode else {

fatalError("Audio engine has no input node")

}

9)  Enable recognitionRequest to report partial results and start the recognitionTask.

recognitionRequest.shouldReportPartialResults = true

recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

  var isFinal = false // to indicate if final result

  if result != nil {

    self.inputTextView.text = result?.bestTranscription.formattedString

    isFinal = (result?.isFinal)!

  }

  if error != nil || isFinal {

    self.audioEngine.stop()

    inputNode.removeTap(onBus: 0)

    self.recognitionRequest = nil

    self.recognitionTask = nil

  }
})

10) Next, we start with writing the method that performs the actual speech recognition. This will record and process the speech continuously.

  • First, we create a singleton for the incoming audio using .inputNode
  • .installTap configures the node and sets up the buffer size and the format
let recordingFormat = inputNode.outputFormat(forBus: 0)

inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, _) in

    self.recognitionRequest?.append(buffer)

}

11)  Next, we prepare and start the audio engine.

audioEngine.prepare()

do {

  try audioEngine.start()

} catch {

  print("audioEngine couldn't start because of an error.")

}

12)  Create a method that stops the Speech recognition.

func stopSTT() {

    print("audioEngine stopped")

    audioEngine.inputNode?.removeTap(onBus: 0)

    audioEngine.stop()

    recognitionRequest?.endAudio()

    indicatorView.removeFromSuperview()



    if inputTextView.text.isEmpty {

        self.sendButton.setImage(UIImage(named: ControllerConstants.mic), for: .normal)

    } else {

        self.sendButton.setImage(UIImage(named: ControllerConstants.send), for: .normal)

    }

        self.inputTextView.isUserInteractionEnabled = true
}

13)  Update the view when the speech recognition is running indicating the user its status. Add below code just below audio engine preparation.

// Listening indicator swift

self.indicatorView.frame = self.sendButton.frame

self.indicatorView.isUserInteractionEnabled = true

let gesture: UITapGestureRecognizer = UITapGestureRecognizer(target: self, action: #selector(startSTT))

gesture.numberOfTapsRequired = 1

self.indicatorView.addGestureRecognizer(gesture)
self.sendButton.setImage(UIImage(), for: .normal)

indicatorView.startAnimating()

self.sendButton.addSubview(indicatorView)

self.sendButton.addConstraintsWithFormat(format: "V:|[v0(24)]|", views: indicatorView)

self.sendButton.addConstraintsWithFormat(format: "H:|[v0(24)]|", views: indicatorView)

self.inputTextView.isUserInteractionEnabled = false

The screenshot of the implementation is below:

       

References

Youtube videos in the SUSI iOS Client

The iOS and android client already have the functionality to play videos based on the queries. In order to implement this feature of playing videos in the iOS client, we use the Youtube Data API v3. The task here was to create an UI/UX for the playing of videos within the app. An API call is made initially to fetch the youtube videos based on the query and they video ID of the first object is extracted and used to play the video.

The API endpoint for youtube data API looks like:

https://www.googleapis.com/youtube/v3/search?part=snippet&q={query}&key={your_api_key}

Using this we get the following result: ( I am adding only the first item which is required since the response is too long )

Path: $.items[0]

{

 "kind": "youtube#searchResult",

 "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/oR-eA572vNoma1XIhrbsFTotfTY\"",

 "id": {

"kind": "youtube#channel",

"channelId": "UCQprMsG-raCIMlBudm20iLQ"

 },

 "snippet": {

"publishedAt": "2015-01-01T11:06:00.000Z",

"channelId": "UCQprMsG-raCIMlBudm20iLQ",

"title": "FOSSASIA",

"description": "FOSSASIA is supporting the development of Free and Open Source technologies for social change in Asia. The annual FOSSASIA Summit brings together ...",

"thumbnails": {

"default": {

"url": "https://yt3.ggpht.com/-CP18cWbo34A/AAAAAAAAAAI/AAAAAAAAAAA/kEmIgO8OjCk/s88-c-k-no-mo-rj-c0xffffff/photo.jpg"

},

"medium": {

"url": "https://yt3.ggpht.com/-CP18cWbo34A/AAAAAAAAAAI/AAAAAAAAAAA/kEmIgO8OjCk/s240-c-k-no-mo-rj-c0xffffff/photo.jpg"

},

"high": {

"url": "https://yt3.ggpht.com/-CP18cWbo34A/AAAAAAAAAAI/AAAAAAAAAAA/kEmIgO8OjCk/s240-c-k-no-mo-rj-c0xffffff/photo.jpg"

}

},

"channelTitle": "FOSSASIA",

"liveBroadcastContent": "upcoming"

 }

}

We parse the above object to grab the videoID, based on the query,  we will use code below:

if let itemsObject = response[Client.YoutubeResponseKeys.Items] as? [[String : AnyObject]] {
    if let items = itemsObject[0][Client.YoutubeResponseKeys.ID] as? [String : AnyObject] {
         let videoID = items[Client.YoutubeResponseKeys.VideoID] as? String
         completion(videoID, true, nil)
    }
}

This videoID is returned to the Controller where this method was called.

Now, we begin with designing the UI for the same. First of all, we need a view on which the youtube video will be played and this view would help dismiss the video by clicking on it.

First, we add the blackView to the entire screen.

// declaration
let blackView = UIView()

// Add backgroundView
func addBackgroundView() {

   If let window = UIApplication.shared.keyWindow {

           self.view.addSubview(blackView) 

           // Cover the entire screen
           blackView.frame = window.frame

           blackView.backgroundColor = UIColor(white: 0, alpha: 0.5)
           blackView.addGestureRecognizer(UITapGestureRecognizer(target: self, action: #selector(handleDismiss)))

   }

}

func handleDismiss() {
   UIView.animate(withDuration: 0.5, delay: 0, usingSpringWithDamping: 1, initialSpringVelocity: 1, options: .curveEaseOut, animations: {
       self.blackView.removeFromSuperview()
   }, completion: nil)
}

Next, we add the YoutubePlayerView. For this we use the Pod `YoutubePlayer`. Since, it had a few warnings showing as well as some videos not being played I had to make fixes to the original pod and use my own customized version ( available here ).

// Youtube Player
lazy var youtubePlayer: YouTubePlayerView = {
    let frame = CGRect(x: 0, y: 0, width: self.view.frame.width - 16, height: self.view.frame.height * 1 / 3)
    let player = YouTubePlayerView(frame: frame)
    return player
}()

// Shows Youtube Player

func addYotubePlayer(_ videoID: String) {
    if let window = UIApplication.shared.keyWindow {

       // Add YoutubePlayer view on top of blackView
        self.blackView.addSubview(self.youtubePlayer)
        // Calculate and set frame

       let centerX = UIScreen.main.bounds.size.width / 2
        let centerY = UIScreen.main.bounds.size.height / 3
        self.youtubePlayer.center = CGPoint(x: centerX, y: centerY)

       // Load Player using the Video ID 
        self.youtubePlayer.loadVideoID(videoID)

        blackView.alpha = 0
        youtubePlayer.alpha = 0

        UIView.animate(withDuration: 0.5, delay: 0, usingSpringWithDamping: 1, initialSpringVelocity: 1, options: .curveEaseOut, animations: {
            self.blackView.alpha = 1
            self.youtubePlayer.alpha = 1
       }, completion: nil)
    }
}

We are set with the UI and the only thing we are left with is to actually call the API in the client and after getting the `videoID` from that we call the above method passing this `videoID`. Before calling we need to check whether our query contains the play action or not and if it does we make the API call and add the player.

if let text = inputTextField.text {
    if text.contains("play") || text.contains("Play") {
        let query = text.replacingOccurrences(of: "play", with: "").replacingOccurrences(of: "Play", with: "")
        Client.sharedInstance.searchYotubeVideos(query) { (videoID, _, _) in
            DispatchQueue.main.async {
                if let videoID = videoID {
                    self.addYotubePlayer(videoID)
                 }
             }
          }
    }

}

We are all set now!Below is the output for the Youtube Player:

Websearch and Link Preview support in SUSI iOS

The SUSI.AI server responds to API calls with answers to the queries made. These answers might contain an action, for example a web search, where the client needs to make a web search request to fetch different web pages based on the query. Thus, we need to add a link preview in the iOS Client for each such page extracting and displaying the title, description and a main image of the webpage.

At first we make the API call adding the query to the query parameter and get the result from it.

API Call:

http://api.susi.ai/susi/chat.json?timezoneOffset=-330&q=amazon

And get the following result:

{

"query": "amazon",

"count": 1,

"client_id": "aG9zdF8xMDguMTYyLjI0Ni43OQ==",

"query_date": "2017-06-02T14:34:15.675Z",

"answers": [

{

"data": [{

"0": "amazon",

"1": "amazon",

"timezoneOffset": "-330"

}],

"metadata": {

"count": 1

},

"actions": [{

"type": "answer",

"expression": "I don't know how to answer this. Here is a web search result:"

},

{

"type": "websearch",

"query": "amazon"

}]

}],

"answer_date": "2017-06-02T14:34:15.773Z",

"answer_time": 98,

"language": "en",

"session": {

"identity": {

"type": "host",

"name": "108.162.246.79",

"anonymous": true

}

}

}

After parsing this response, we first recognise the type of action that needs to be performed, here we get `websearch` which means we need to make a web search for the query. Here, we use `DuckDuckGo’s` API to get the result.

API Call to DuckDuckGo:

http://api.duckduckgo.com/?q=amazon&format=json

I am adding just the first object of the required data since the API response is too long.

Path: $.RelatedTopics[0]

{

 "Result": "<a href=\"https://duckduckgo.com/Amazon.com\">Amazon.com</a>Amazon.com, also called Amazon, is an American electronic commerce and cloud computing company...",

 "Icon": {

"URL": "https://duckduckgo.com/i/d404ba24.png",

"Height": "",

"Width": ""

},

 "FirstURL": "https://duckduckgo.com/Amazon.com",

 "Text": "Amazon.com Amazon.com, also called Amazon, is an American electronic commerce and cloud computing company..."

}

For the link preview, we need an image logo, URL and a description for the same so, here we will use the `Icon.URL` and `Text` key. We have our own class to parse this data into an object.

class WebsearchResult: NSObject {
   
   var image: String = "no-image"
   var info: String = "No data found"
   var url: String = "https://duckduckgo.com/"
   var query: String = ""
   
   init(dictionary: [String:AnyObject]) {
       
       if let relatedTopics = dictionary[Client.WebsearchKeys.RelatedTopics] as? [[String : AnyObject]] {
           
           if let icon = relatedTopics[0][Client.WebsearchKeys.Icon] as? [String : String] {
               if let image = icon[Client.WebsearchKeys.Icon] {
                   self.image = image
               }
           }
           
           if let url = relatedTopics[0][Client.WebsearchKeys.FirstURL] as? String {
               self.url = url
           }
           
           if let info = relatedTopics[0][Client.WebsearchKeys.Text] as? String {
               self.info = info
           }
           
           if let query = dictionary[Client.WebsearchKeys.Heading] as? String {
               let string = query.lowercased().replacingOccurrences(of: " ", with: "+")
               self.query = string
           }   
       }   
   }   
}

We now have the data and only thing left is to display it in the UI.

Within the Chat Bubble, we need to add a container view which will contain the image, and the text description.

 let websearchContentView = UIView()
    
    let searchImageView: UIImageView = {
        let imageView = UIImageView(frame: CGRect(x: 0, y: 0, width: 44, height: 44))
        imageView.contentMode = .scaleAspectFit

       // Placeholder image assigned
        imageView.image = UIImage(named: "no-image")
        return imageView
    }()
    
    let websiteText: UILabel = {
        let label = UILabel()
        label.textColor = .white
        return label
    }()

   func addLinkPreview(_ frame: CGRect) {
        textBubbleView.addSubview(websearchContentView)
        websearchContentView.backgroundColor = .lightGray
        websearchContentView.frame = frame
        
        websearchContentView.addSubview(searchImageView)
        websearchContentView.addSubview(websiteText)




       // Add constraints in UI
        websearchContentView.addConstraintsWithFormat(format: "H:|-4-[v0(44)]-4-[v1]-4-|", views: searchImageView, websiteText)
        websearchContentView.addConstraintsWithFormat(format: "V:|-4-[v0]-4-|", views: searchImageView)
        websearchContentView.addConstraintsWithFormat(format: "V:|-4-[v0(44)]-4-|", views: websiteText)
    }

Next, in the Collection View, while checking other action types, we add checking for `websearch` and then call the API there followed by adding frame size and calling the `addLinkPreview` function.

else if message.responseType == Message.ResponseTypes.websearch {
  let params = [
    Client.WebsearchKeys.Query: message.query!,
    Client.WebsearchKeys.Format: "json"
  ]

  Client.sharedInstance.websearch(params, { (results, success, error) in                      
    if success {
      cell.message?.websearchData = results
      message.websearchData = results
      self.collectionView?.reloadData()
      self.scrollToLast()
    } else {
      print(error)
    }
  })
                    
  cell.messageTextView.frame = CGRect(x: 16, y: 0, width: estimatedFrame.width + 16, height: estimatedFrame.height + 30)
  

 cell.textBubbleView.frame = CGRect(x: 4, y: -4, width: estimatedFrame.width + 16 + 8 + 16, height: estimatedFrame.height + 20 + 6 + 64)
                    
  let frame = CGRect(x: 16, y: estimatedFrame.height + 20, width: estimatedFrame.width + 16 - 4, height: 60 - 8)
  

 cell.addLinkPreview(frame)

}

And set the collection View cell’s size.

else if message.responseType == Message.ResponseTypes.websearch {
  return CGSize(width: view.frame.width, height: estimatedFrame.height + 20 + 64)
}

And we are done 🙂

Here is the final version how this would look like on the device:

 

Susi AI Skill Development

What is Susi?

Susi is an open source intelligent personal assistant which has the capability to learn and respond better to queries. It is also capable of making to-do lists, setting alarms, providing weather and traffic info all in real time. Susi responds based on skills.

What is a skill? How do we teach a skill?

A skill is a piece of code which performs a set of actions in order to respond to the user’s query. These skills are based on pattern matching which help them mapping the user’s query to a specific skill and responding accordingly. Teaching a skill to Susi is surprisingly very easy to implement. One can take a look at the Susi Skill Development Tutorial and a video workshop by Michael Christen.

I will try to give a basic idea on how to create a skill, it’s basic structure and some of the skills I developed in the first week.

Prepare to create a skill:

  • Head over to http://dream.susi.ai
  • Create a etherpad with some relevant name
  • Delete all text currently present in there
  • Start writing your skill

Adding to this, for testing a skill one can head over to Susi Web Chat Interface.

Basic Structure for calling an API:

<Regular expression to be matched here>

!console:<response given to the user>
 {
 "url":"<API endpoint>",
 "path":"<Json path here>"
 }
 eol

So, let me explain this line by line.

  1. The regular expression is the one to which the user’s query is matched first.
  2. The console is meant to output the actual response the user sees as response.
  3. In place of the “url”, the API endpoint is passed in.
  4. “path” here specifies how we traverse through the response Json or Jsonp to get the object, starts with “$.”.
  5. At last, “eol” which is the end-of-line marks the end of a skill.

Let’s take an example for better understanding of this:

random gif
!console: $url$
{
    "url" : "http://api.giphy.com/v1/gifs/trending?api_key=dc6zaTOxFJmzC",
    "path" : "$.data[0].images.fixed_height"
}
eol 

 

This skill responds with a link to a random gif.

Steps involved:

  1. Match the string “random gif” with the user’s query.
  2. On successful match, make an API call to the API endpoint specified in “url”
  3. On response, extract the object at the specified path in the json under “path”
  4. Respond to the user with the “url” key’s value which would here be an URL of a GIF.

Let’s try it out on Susi Web Chat. For this, you will first have to load your skill using the dream command followed by etherpad name: dream <etherpad name>. And then you can start testing your skill.

So, we queried “random gif” and we got a response “Click Here!”. The complete URL didn’t show up because all the URLs are currently parsed and a hyperlink for each is created. So try clicking on it to find a GIF.

 

Now, let’s look at one more skill I developed during this period.

# Returns the name of the president of a country

 president of *|who is the president of *| president *
 !console:$plaintext$
 {      "url":"https://api.wolframalpha.com/v2/query?input=president+$1$&output=JSON&appid=9WA6XR-26EWTGEVTE&includepodid=Result",
   "path" : "$.queryresult.pods[0].subpods[0]"
 }
 eol

 

Let’s understand this step by step:

  1. We have here “president of *|who is the president of *| president *”, which means the user’s query matches with anyone of the following because of the use of pipe symbol “|”. The “*” here replaces a word or a list of words, which can be accessed like: “${index}$”  where index is replaced by the position of the “*” in the expression starting from 1.
  2. Now we have something new in the URL. See that  $1$  inside the URL? On runtime, that is replaced with the content of the “*” variable. So if a user puts in query like: “president of usa”, “usa” is mapped to $1$ and is replaced in the URL and appropriate API request is made.
  3. Then the path is traversed in the json response and the value of the “plaintext” key is used to respond to the user.

 

It’s now time to try it out on Susi Web Chat.

So, we got our desired response here, i.e., the name of the president of usa.