Custom UI Implementation for Web Search and RSS actions in SUSI iOS Using Kingfisher for Image Caching

The SUSI Server is an AI powered server which is capable of responding to intelligent answers based on user’s queries. The queries to the susi server are obtained either as a websearch using the application or as an RSS feed. Two of the actions are websearch and RSS. These actions as the name suggests respond to queries based on search results from the web which are rendered in the clients. In order to use use these action types and display them in the SUSI iOS client, we need to first parse the actions looking for these action types and then creating a custom UI for them to display them.

To start with, we need to make send the query to the server to receive an intelligent response from the server. This response is parsed into different action types supported by the server and saved into relevant objects. Here, we check the action types by looping through the answers array containing the actions and based on that, we save the data for that action.

if type == ActionType.rss.rawValue {
   message.actionType = ActionType.rss.rawValue
   message.rssData = RSSAction(data: data, actionObject: action)
} else if type == ActionType.websearch.rawValue {
   message.actionType = ActionType.websearch.rawValue
   message.message = action[Client.ChatKeys.Query] as? String ?? ""
}

Here, we parsed the data response from the server and looked for the rss and websearch action type followed by which we saved the data we received from the server for each of the action types in their own objects.

Next, when a message object is created, we insert it into the dataSource item by appending it and use the `insertItems(at: [IndexPath])` method of collection view to insert them into the views at a particular index. Before adding them, we need to create a Custom UI for them. This UI will consist of a Collection View which is scrollable in the horizontal direction inside a CollectionView Cell. To start with this, we create a new class called `WebsearchCollectionView` which will be a `UIView` consisting of a `UICollectionView`.

We start by adding a collection view into the UIView inside the `init` method by overriding it. Declare a collection view using flow layout and scroll direction set to `horizontal`. Also, hide the scroll indicators and assign the delegate and datasource to `self`.

Now to populate this collection view, we need to specify the number of items that will show up. For this, we make use of the `message` variable declared. We use the `websearchData` in case of websearch action and `rssData` otherwise.

Now to specify the number of cells, we use the below method which returns the number of rss or websearch action objects and defaults to 0 such cells.

func collectionView(_ collectionView: UICollectionView, numberOfItemsInSection section: Int) -> Int {
    if let rssData = message?.rssData {
        return rssData.count
    } else if let webData = message?.websearchData {
        return webData.count
    }
    return 0
}

We display the title, description and image for each object for which we need to create a UI for the cells. Let’s start by creating a Custom Collection View cell with the imageview and 2 labels for title and description.

The imageview is given a contentMode of `aspectFit` and assigned a placeholder image in case the image doesn’t exist. The title and description labels are assigned the same font size, the title being bolder and both are center aligned.

class WebsearchCell: BaseCell {

   var imageView: UIImageView = {
       let iv = UIImageView()
       iv.contentMode = .scaleAspectFit
       iv.image = UIImage(named: "placeholder")
       return iv
   }()

   let titleLabel: UILabel = {
       let label = UILabel()
       label.textColor = .black
       label.font = UIFont.boldSystemFont(ofSize: 14)
       label.textAlignment = .center
       label.numberOfLines = 2
       label.backgroundColor = Color.grey.lighten3
       return label
   }()

   let descriptionLabel: UILabel = {
       let label = UILabel()
       label.font = UIFont.systemFont(ofSize: 14)
       label.textAlignment = .center
       return label
   }()

}

Next, we add constraints for each such view adding a title and description label adding a small margin on both sides of the cell for a cleaner UI.

addSubview(imageView)
addSubview(titleLabel)
addSubview(descriptionLabel)
descriptionLabel.numberOfLines = 5
addConstraintsWithFormat(format: "H:|-4-[v0(\(frame.width * 0.4))]-4-[v1]-4-|", views: imageView, titleLabel)
addConstraintsWithFormat(format: "|-\(frame.width * 0.4 + 8)-[v0]-4-|", views: descriptionLabel)
addConstraintsWithFormat(format: "V:|-4-[v0]-4-|", views: imageView)
addConstraintsWithFormat(format: "V:|-4-[v0(44)]-4-[v1]-4-|", views: titleLabel, descriptionLabel)

Now to use this custom cell, we first need to register it with the collection view and then we can use it easily in the `cellForItemAt` method. Since we are using Kingfisher for image caching, we use the `.kf.setImage` method to download the image from a URL and cache it as soon as it is downloaded.

collectionView.register(WebsearchCell.self, forCellWithReuseIdentifier: cellId)
 func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
       if let cell = collectionView.dequeueReusableCell(withReuseIdentifier: cellId, for: indexPath) as? WebsearchCell {
           cell.backgroundColor = .white

           if message?.actionType == ActionType.rss.rawValue {
               let feed = message?.rssData?.rssFeed[indexPath.item]
               cell.titleLabel.text = feed?.title
               cell.descriptionLabel.text = feed?.desc                cell.imageView.kf.setImage(with: URL(string: feed?.rssData?.image))
           } else if message?.actionType == ActionType.websearch.rawValue {
               let webData = message?.websearchData[indexPath.item]
               cell.titleLabel.text = webData?.title
               cell.descriptionLabel.text = webData?.desc.html2String                cell.imageView.kf.setImage(with: URL(string: feed?.webData?.image))
           }
           return cell
       } else {
           return UICollectionViewCell()
       }
   }

We check the action type and assign data based on that. Also, for the images, if they don’t exist a placeholder is added.

Since we are done with the Custom UI of the cell, we need to add it to the chat cell. For that, we add this `UIView` as a subview. For reasons of reusability, the cell was extracted into a separate one and call the `prepareForReuse()` method for reusability. This is followed by adding a subview and setting constraints for each cell and assigning the message object.

class RSSCell: ChatMessageCell {

   var message: Message? {
       didSet {
           self.addWebsearchView()
       }
   }

   lazy var websearchView: WebsearchCollectionView = {
       let view = WebsearchCollectionView()
       return view
   }()

   override func setupViews() {
       super.setupViews()
       prepareForReuse()
   }

   func addWebsearchView() {
       self.addSubview(websearchView)
       self.addConstraintsWithFormat(format: "H:|[v0]|", views: websearchView)
       self.addConstraintsWithFormat(format: "V:[v0(135)]", views: websearchView)
       websearchView.message = message
   }

}
if message.actionType == ActionType.rss.rawValue || message.actionType == ActionType.websearch.rawValue {
   if let cell = collectionView.dequeueReusableCell(withReuseIdentifier: ControllerConstants.rssCell, for: indexPath) as? RSSCell {
       cell.message = message
       return cell
   } else {
       return UICollectionViewCell()
   }
}

This is all we need to add a custom UI to a chat cell in SUSI iOS, very simple and clean.

Check the screenshot below, the app in action.

Resources:

Continue Reading

Hotword Recognition in SUSI iOS

Hot word recognition is a feature by which a specific action can be performed each time a specific word is spoken. There is a service called Snowboy which helps us achieve this for various clients (for ex: iOS, Android, Raspberry pi, etc.). It is basically a DNN based hotword recognition toolkit.

In this blog, we will learn how to integrate the snowboy hotword detection wrapper in the SUSI iOS client. This service can be used in any open source project but for using it commercially, a commercial license needs to be obtained.

Following are the files that need to be added to the project which are provided by the service itself: snowboy-detect.h libsnowboy-detect.a and a trained model file which can be created using their online service: snowboy.kitt.ai. For the sake of this blog, we will be using the hotword “Susi”, the model file can be found here.

The way how snowboy works is that speech is recorded for a few seconds and this data is detected with an already trained model by a specific hotword, now if snowboy returns a 1 means word has been successfully detected else wasn’t.

We start with creation of a wrapper class in Objective-C which can be found wrapper and the bridging header in case this needs to be added to a Swift project. The wrapper contains methods for setting sensitivity, audio gain and running the detection using the buffer. It is a wrapper class built on top of the snowboy-detect.h header file.

Let’s initialize the service and run it. Below are the steps followed to enable hotword recognition and print out whether it successfully detected the hotword or not:

  • Create a ViewController class with extensions
    • AVAudioRecorderDelegate
    • AVAudioPlayerDelegate

since we will be recording speech.

  • Import AVFoundation
  • Create a basic layout containing a label which detects whether hotword detected or not and create corresponding `IBOutlet` in the ViewController and a button to trigger the start and stop of recognition.
  • Create the following variables:
    • let WAKE_WORD = “Susi” // hotword used
    • let RESOURCE = Bundle.main.path(forResource: “common”, ofType: “res”)
    • let MODEL = Bundle.main.path(forResource: “susi”, ofType: “umdl”) //path where the model file is stored
    • var wrapper: SnowboyWrapper! = nil // wrapper instance for running detection
    • var audioRecorder: AVAudioRecorder! // audio recorder instance
    • var audioPlayer: AVAudioPlayer!
    • var soundFileURL: URL! //stores the URL of the temp reording file
    • var timer: Timer! //timer to fire a function after an interval
    • var isStarted = false // variable to check if audio recorder already started
  • In `viewDidLoad` initialize the wrapper and set sensitivity and audio gain. Recognition best happens when sensitivity is set to `0.5` and audio gain is set to `1.0` according to the docs.
override func viewDidLoad() {
    super.viewDidLoad()
    wrapper = SnowboyWrapper(resources: RESOURCE, modelStr: MODEL)
    wrapper.setSensitivity("0.5")
    wrapper.setAudioGain(1.0)
}
  • Create an `IBAction` for the button to start recognition. This action will be used to start or stop the recording in which the action toggles based on the `isStarted` variable. When true, recording is stopped and the timer invalidated else a timer is started which calls the `startRecording` method with an interval of 4 seconds.
@IBAction func onClickBtn(_ sender: Any) {
  if (isStarted) {
    stopRecording()
    timer.invalidate()
    btn.setTitle("Start", for: .normal)
    isStarted = false
  } else {
    timer = Timer.scheduledTimer(timeInterval: 4, target: self, 
    selector: #selector(startRecording), userInfo: nil, repeats: true)
    timer.fire()
    btn.setTitle("Stop", for: .normal)
    isStarted = true
  }
}
  • Next, we add the start and stop recording methods.
    • First, a temp file is created which stores the recorded audio output
    • After which, necessary record configurations are made such as setting the sampling rate.
    • The recording is then started and the output stored in the temp file.
func startRecording() {
  do {
    let fileMgr = FileManager.default
    let dirPaths = fileMgr.urls(for: .documentDirectory, in: .userDomainMask)
    soundFileURL = dirPaths[0].appendingPathComponent("temp.wav")
    let recordSettings = [AVEncoderAudioQualityKey: 
    AVAudioQuality.high.rawValue,
    AVEncoderBitRateKey: 128000,
    AVNumberOfChannelsKey: 1,
    AVSampleRateKey: 16000.0] as [String : Any]
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioRecorder = AVAudioRecorder(url: soundFileURL,
settings: recordSettings as [String : AnyObject])
    audioRecorder.delegate = self
    audioRecorder.prepareToRecord()
    audioRecorder.record(forDuration: 2.0)
    instructionLabel.text = "Speak wake word: \(WAKE_WORD)"print("Started recording...")
  } catch let error {
    print("Audio session error: \(error.localizedDescription)")
  }
}
  • The stop recording method, stops the audioRecorder instance and updates the instruction label to show the same.
func stopRecording() {
  if (audioRecorder != nil && audioRecorder.isRecording) {
    audioRecorder.stop()
  }
  instructionLabel.text = "Stop"
  print("Stopped recording...")
}

The final recognition is done in the `audioRecorderDidFinishRecording` delegate method which runs the snowboy detection function which processes the audio recording in the temp file by creating a buffer and storing the audio in it and giving the wrapper this buffer as input which processes the buffer and returning a `1` is hotword was successfully detected.

func runSnowboy() {
  let file = try! AVAudioFile(forReading: soundFileURL)
  let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, 
  sampleRate: 16000.0, channels: 1, interleaved: false)
  let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length))
  try! file.read(into: buffer)
  let array = Array(UnsafeBufferPointer(start: 
  buffer.floatChannelData![0], count:Int(buffer.frameLength)))
  // print output
  let result = wrapper.runDetection(array, length: Int32(buffer.frameLength))
  print("Result: \(result)")
}

To test this out, click the start button and speak different words and you will notice that once the Hot Word is spoken, log with `result: 1` is printed out.

The snowboy hotword recognition also offers to train the personalized model with the help of Rest Apis for which the docs can be found here. The complete project implementation can be found here.

Sources:

Continue Reading

Calculation of the Frame Size of the Chat Bubble in SUSI iOS

We receive intelligent responses from the SUSI Server based on our query. Each response contains a different set of actions and the content of the action can be of variable sizes, map, string, table, pie chart, etc. To make the chat bubble size dynamic in the SUSI iOS client, we need to check the action type. For each action, we calculate a different frame size which makes the size of the chat bubble dynamic and hence solving the issue of dynamic size of these bubbles.

In order to calculate the frame size, as mentioned above, we need to check the action type of that message. Let’s start by first making the API call sending the query and getting the action types as a response.

func queryResponse(_ params: [String : AnyObject], _ completion: @escaping(_ messages: List<Message>?, _ success: Bool, _ error: String?) -> Void) {

   let url = getApiUrl(UserDefaults.standard.object(forKey: ControllerConstants.UserDefaultsKeys.ipAddress) as! String, Methods.Chat)

       _ = makeRequest(url, .get, [:], parameters: params, completion: { (results, message) in
           if let _ = message {
               completion(nil, false, ResponseMessages.ServerError)
           } else if let results = results {

               guard let response = results as? [String : AnyObject] else {
                   completion(nil, false, ResponseMessages.InvalidParams)
                   return
               }

               let messages = Message.getAllActions(data: response)
               completion(messages, true, nil)
           }
           return
})
}

Here, we are sending the query in the params dictionary. The `makeRequest` method makes the actual API call and returns a results object and an error object if any which default to `nil`. First, we check if the error variable is `nil` or not and if it is, we parse the complete response by using a helper method created in the Message object called `getAllActions`. This basically takes the response and gives us a list of messages of all action types returned for that query.

In order to display this in the UI, we need to call this method in the View Controller to actually use the result. Here is how we call this method.

var params: [String : AnyObject] = [
 Client.WebsearchKeys.Query: inputTextView.text! as AnyObject,
 Client.ChatKeys.TimeZoneOffset: ControllerConstants.timeZone as AnyObject,
       Client.ChatKeys.Language: Locale.current.languageCode as AnyObject
]

if let location = locationManager.location {
 params[Client.ChatKeys.Latitude] = location.coordinate.latitude as AnyObject
       params[Client.ChatKeys.Longitude] = location.coordinate.longitude as AnyObject
}

if let userData = UserDefaults.standard.dictionary(forKey: ControllerConstants.UserDefaultsKeys.user) as [String : AnyObject]? {
 let user = User(dictionary: userData)
        params[Client.ChatKeys.AccessToken] = user.accessToken as AnyObject
       }

Client.sharedInstance.queryResponse(params) { (messages, success, _) in
 DispatchQueue.main.async {
 if success {
 self.collectionView?.performBatchUpdates({
                            for message in messages! {
                                try! self.realm.write {
                                    self.realm.add(message)
                                    self.messages.append(message)
                                    let indexPath = IndexPath(item: self.messages.count - 1, section: 0)
                                    self.collectionView?.insertItems(at: [indexPath])
                                }
                            }
                        }, completion: { (_) in
                            self.scrollToLast()
                     })
 }
 }
}

Here, we are creating a params object sending the query and some additional parameters such as time zone, location coordinates and access token identifying the user. After the response is received, which contains a list of messages, we use a method called `performBatchUpdates` on the collection view where we loop through all the messages, writing each one of them to the database and then adding at the end of the collection view. Here we got all the messages inside the `messages` list object where each message can be checked for its action type and a frame can be calculated for the same.

Since the frame for each cell is returned in the `sizeForItemAt` delegate method of the collectionViewDelegate, we first grab the message using its indexPath and check the action type for each such message added to the collection view.

if message.actionType == ActionType.map.rawValue {
   // map action
} else if message.actionType == ActionType.rss.rawValue {
   // rss action type
} else if message.actionType == ActionType.websearch.rawValue {
   // web search action
}

Since map action will be a static map, we use a hard coded value for the map’s height and the width equals to the width of the cell frame.

CGSize(width: view.frame.width, height: Constants.mapActionHeight)

Next, web search and rss action having the same UI, will have the same frame size but the number of cells inside each of the frame for these action depends on number of responses were received from the server.

CGSize(width: view.frame.width, height: Constants.rssActionHeight)

And the check can be condensed as well, instead of checking each action separately, we use a `||` (pipes) or an `OR`.

else if message.actionType == ActionType.rss.rawValue ||
            message.actionType == ActionType.websearch.rawValue {
           // web search and rss action
 }

The anchor and answer action types, are supposed to display a string in the chat bubble. So the chat bubble size can be calculated using the following method:

let size = CGSize(width: Constants.maxCellWidth, height: Constants.maxCellHeight)
let options = NSStringDrawingOptions.usesFontLeading.union(.usesLineFragmentOrigin)
let estimatedFrame = NSString(string: messageBody).boundingRect(with: size, options: options, attributes: [NSFontAttributeName: UIFont.systemFont(ofSize: Constants.messageFontSize)], context: nil)

Here, we first create a maximum frame size that can exist. Then, using the drawingOptions create an options variable. The actual calculation of the frame happens in the last method where we use the complete message string and this returns us the actual frame for the above action types. Use the above method to get the frame in the `CGRect` type.

Below, is the complete method used for this calculation:

func collectionView(_ collectionView: UICollectionView, layout collectionViewLayout: UICollectionViewLayout, sizeForItemAt indexPath: IndexPath) -> CGSize {
        let message = messages[indexPath.row]
 
        let estimatedFrame = self.estimatedFrame(messageBody: message.message)
        if message.actionType == ActionType.map.rawValue {
            return CGSize(width: view.frame.width, height: Constants.mapActionHeight)
        } else if message.actionType == ActionType.rss.rawValue ||
            message.actionType == ActionType.websearch.rawValue {
            return CGSize(width: view.frame.width, height: Constants.rssActionType)
        }
        return CGSize(width: view.frame.width, height: estimatedFrame.height + Constants.answerActionMargin)
    }

rn CGSize(width: view.frame.width, height: estimatedFrame.height + Constants.answerActionMargin)
}

Below are the results of the app in action.

References:

Continue Reading
Close Menu