Set the proper content type when uploading files to S3 with python-magic

In the open-event-orga-server project, we have been using Amazon S3 storage for a long time now. At some point we encountered an issue: no matter what the file type was, the Content-Type when retrieving these files from the storage solution was always application/octet-stream.

An example response when retrieving an image from S3 was as follows:


Accept-Ranges: bytes
Content-Disposition: attachment; filename=HansBakker_111.jpg
Content-Length: 56060
Content-Type: application/octet-stream
Date: Fri, 09 Sep 2016 10:51:06 GMT
ETag: "964b1d839a9261fb0b159e960ceb4cf9"
Last-Modified: Tue, 06 Sep 2016 05:06:23 GMT
Server: AmazonS3
x-amz-id-2: 1GnO0Ta1e+qUE96Qgjm5ZyfyuhMetjc7vfX8UWEsE4fkZRBAuGx9gQwozidTroDVO/SU3BusCZs=
x-amz-request-id: ACF274542E950116

As seen above, instead of image/jpeg the Content-Type provided is application/octet-stream. While uploading the files we were not setting the content type explicitly, which turned out to be the root of the problem.

It was decided that we would provide the content type explicitly, so it was time to choose an efficient library for determining the file type based on the content of the file rather than the file extension. After researching the available libraries, python-magic seemed the obvious choice. python-magic is a Python interface to the libmagic file type identification library; libmagic identifies file types by checking their headers against a predefined list of file types.

Here is an example straight from python-magic's readme on its usage:


>>> import magic
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
>>> magic.from_buffer(open("testdata/test.pdf").read(1024))
'PDF document, version 1.2'
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'


Given below is a code snippet of the S3 upload function in the project:


import magic

file_data = file.read()
file_mime = magic.from_buffer(file_data, mime=True)
size = len(file_data)
# k is defined as k = Key(bucket) in previous code
sent = k.set_contents_from_string(
    file_data,
    headers={
        'Content-Disposition': 'attachment; filename=%s' % filename,
        'Content-Type': '%s' % file_mime
    }
)
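
To verify the fix, you can fetch the key's metadata again after uploading. Below is a minimal sketch using boto, the library the snippet above is based on; the bucket and key names are placeholders:

from boto.s3.connection import S3Connection

conn = S3Connection()                        # reads AWS credentials from the environment
bucket = conn.get_bucket('my-event-bucket')  # placeholder bucket name
key = bucket.get_key('HansBakker_111.jpg')   # issues a HEAD request for this key
print(key.content_type)                      # now 'image/jpeg' instead of 'application/octet-stream'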


One thing to note: python-magic uses libmagic as a dependency, and many distros do not come with the libmagic development package (libmagic-dev) pre-installed, so make sure you install it explicitly. (Installation instructions may vary per distro; the Python binding itself is available on PyPI as python-magic.)


sudo apt-get install libmagic-dev

Voila! Now every file you retrieve will come back with the proper content type.


Deploying an Angular 2 application using GitHub Pages

In recent months I have started working with Angular 2, as my project is based on this tech stack. Angular 2 is one of the popular JavaScript frameworks. The project, 'Susper', is currently in development under FOSSASIA. At FOSSASIA everyone follows good development practices, and one of them is providing a live preview of the fix made in a pull request for a particular issue. Deploying test pages to GitHub Pages turned out not to be as simple as it looks. I read a lot of Stack Overflow answers and searched Google for a solution, and in this blog I'll share the one I arrived at.

I'm assuming your Angular 2 app uses webpack and that the latest version of Angular is installed. First, make sure the Angular CLI is up to date; if not, update it. You must update both the global package and the local package of your project.

Global package:

npm uninstall -g @angular/cli
npm cache clean
npm install -g @angular/cli@latest

Local package:

rm -rf node_modules dist
npm install --save-dev @angular/cli@latest
npm install

NOTE – To install the local packages, you must be inside the project folder.

To make deployments easier, follow these steps after updating global and local packages –

Install angular-cli-ghpages :

npm i -g angular-cli-ghpages

This command is similar to the old github-pages:deploy command of @angular/cli, and the script works great with Travis CI.
After installing, you should see the change in package.json as well:

"devDependencies": {
    "angular-cli-ghpages": "^0.5.0"
}

After updating the global and local packages you will notice that a new folder named 'node_modules' has been created. Now the magic comes into play!

Add a deploy script:

In the package.json file add the following deploy script:

"scripts": {
    "deploy": "ng build --prod --aot --base-href=/project_repo_name/ && cp ./dist/index.html ./dist/404.html && ./node_modules/.bin/angular-cli-ghpages --no-silent"
}

We have set up the required dependencies to deploy a test page. Now let's generate a live preview:

Steps:
git checkout working_branch
ng build
npm run deploy

We have successfully deployed the repository to GitHub Pages. The live preview is available at:

https://yourusername.github.io/project_name

How did it work out?

Well, this is the easiest way to deploy any Angular 2 app on GitHub Pages. The only disadvantage of deploying to GitHub Pages is that we always have to perform a manual build before providing a live preview whenever changes have been made on that particular branch.

Detecting and Fixing Memory Leaks in the Susi Android App

During the fast-paced development of the Susi App, developers somehow missed some memory leaks in the app. It is a very common mistake that developers make: most new Android developers don't know much about memory leaks and how to fix them. Memory leaks make the app slower and cause crashes due to OutOfMemoryError. To make the Susi app more efficient, it is advisable to look out for these leaks and fix them. This post focuses on teaching developers who'll be contributing to the Susi Android app, or any other Android app, what memory leaks are, how to detect them and how to fix them.

What is a memory leak?

The Android system manages memory allocation to run apps efficiently. When memory runs short, it triggers the Garbage Collector (GC), which cleans up objects that are no longer useful, freeing memory for useful objects. But suppose a non-useful object is still referenced from a useful object. In that case the Garbage Collector treats the non-useful object as reachable and thus won't be able to remove it, causing a memory leak.

Now, when a memory leak occurs, the app keeps demanding memory from the Android system, but the system can only give a certain amount; after that point it refuses to give more, causing an OutOfMemoryError and crashing the app. Even when a leaking app doesn't crash, it will surely slow down and skip frames.

A few other questions arise, like "How do we detect these leaks?", "What causes them?" and "How can we fix them?" Let's cover these one by one.

Detecting Memory Leaks

You can detect memory leaks in an Android app in two ways:

  1. Using Android Studio
  2. Using Leak Canary

In this post I'll describe how to use Leak Canary to detect memory leaks. If you want to know how to use Android Studio to detect leaks, check out this link.

Using Leak Canary for Memory Leak detection :

Leak Canary is a very handy tool when it comes to detecting memory leaks. It shows the memory leak in a separate app on the device itself. Just add these lines under dependencies in the build.gradle file:

debugCompile 'com.squareup.leakcanary:leakcanary-android:1.5.1'
releaseCompile 'com.squareup.leakcanary:leakcanary-android-no-op:1.5.1'
testCompile 'com.squareup.leakcanary:leakcanary-android-no-op:1.5.1'

And add this code in the MainApplication.java file:

if (LeakCanary.isInAnalyzerProcess(this)) {
    // This process is dedicated to LeakCanary for heap analysis.
    // You should not init your app in this process.
    return;
}
LeakCanary.install(this);
// Normal app init code...

You are good to go. Now just run the app; if there is a memory leak, Leak Canary dumps the heap to a .hprof file and displays the leak in its companion app.

Causes of Memory Leaks

There are many causes of memory leaks. I will list a few off the top of my head, but there can be more.

  1. Static Activities and Views : Defining a static variable inside the class definition of the Activity and then setting it to the running instance of that Activity. If this reference is not cleared before the Activity's lifecycle completes, the Activity will be leaked.
  2. Listeners : When you register a listener, it is advised to unregister it in the onDestroy() method to prevent memory leaks. This is not that prominent a cause, but it may lead to leaks.
  3. Inner Classes : A (non-static) inner class instance holds an implicit reference to its enclosing instance; if you keep a static reference to such an instance, the enclosing Activity is leaked.
  4. Anonymous Classes : A leak can occur if you declare and instantiate an AsyncTask anonymously inside your Activity. If it continues to perform background work after the Activity has been destroyed, the reference to the Activity persists, and the Activity won't be garbage collected until the background task completes.
  5. Handlers and Threads : The very same principle applies to background tasks declared anonymously as a Runnable and queued up for execution by a Handler.

Preventing and Fixing Memory Leaks

So, now you know the causes of these memory leaks; you just have to be a little more careful around them. Here are some tips to prevent or fix memory leaks:

  1. Be extra careful when dealing with inner classes and anonymous classes. Make them static wherever possible. Use a static inner class with a WeakReference to the outer class if that helps.
  2. Be very careful with static variables in your Activity class, because they can reference your Activity and cause a leak. Be sure to clear the reference in onDestroy().
  3. Unregister all listeners in the onDestroy() method.
  4. Always terminate worker threads you initiated when the Activity is destroyed.
  5. Make sure that your allocated resources are all released as expected. Do not always rely on the Garbage Collector.
  6. Try using the application context instead of the activity context wherever the context will outlive the Activity.

Conclusion

So, if you now want to contribute to the Susi Android app and implement a feature, you can check whether your implementation introduces a memory leak and fix it for better performance of the app. Also, if you find any other memory leak in the app, do report it on the issue tracker, fix it, and make the Susi Android app more efficient.

Happy Coding!

Porting PSLab Libraries – Python to Java

PSLab has existing communication libraries and sensor files in Python which were created during the development of Python Desktop Application.

The initial task and challenge was porting this existing code to Java for use in the Android app. Since the Python libraries also followed an object-oriented model of programming, the ported Java code could keep a similar structure and organization.

Common problems faced while porting from Python to Java

  • The most common problem is explicitly assigning data types to variables in Java, since Python manages data types on its own. However, most of the time the data types are quite evident from the context of their use, and understanding the purpose of the code makes the task much simpler.
  • Another task was migrating the Python data structures to their corresponding Java counterparts: a List in Python corresponds to an ArrayList in Java, a Dictionary to a HashMap, and so on.
  • Some sections of the code use highly efficient libraries like Numpy and Scipy for mathematical functions, and finding their Java counterparts was a challenge. This was partly solved by using Apache Commons Math, a library dedicated to mathematical functions. Some of the functions were implemented directly using this library; for the rest, the code was written from scratch after understanding the structure and function of the Numpy methods.

Here are some of the steps we followed while porting the code from Python to Java:

  • Matching corresponding data-structures

The Dictionary in python…

Gain_scaling = OrderedDict ([('GAIN_TWOTHIRDS', 0.1875), ('GAIN_ONE', 0.125), ('GAIN_TWO', 0.0625), ('GAIN_FOUR', 0.03125), ('GAIN_EIGHT', 0.015625), ('GAIN_SIXTEEN', 0.0078125)])

…was mapped to the corresponding Java HashMap in the manner given below. A point to note here: elements can be added to a HashMap only from a method body (such as a constructor), not at the time of declaration of the HashMap.

private HashMap<String, Double> gainScaling = new HashMap<String, Double>();

gainScaling.put("GAIN_TWOTHIRDS",0.1875);
gainScaling.put("GAIN_ONE",0.125);
gainScaling.put("GAIN_TWO",0.0625);
gainScaling.put("GAIN_FOUR",0.03125);
gainScaling.put("GAIN_EIGHT",0.015625);
gainScaling.put("GAIN_SIXTEEN",0.0078125);

Similarly, a List in Python can be converted to the corresponding ArrayList in Java.

  • Assigning data types and access modifiers to corresponding variables in Java

The Python declarations…

POWER_ON = 0x01
gain_choices = [RES_500mLx, RES_1000mLx, RES_4000mLx]
gain_literal_choices = ['500mLx', '1000mLx', '4000mLx']
scaling = [2, 1, .25]

…map to the following Java fields:

private int POWER_ON = 0x01;
public int[] gainChoices = {RES_500mLx, RES_1000mLx, RES_4000mLx};
public String[] gainLiteralChoices = {"500mLx", "1000mLx", "4000mLx"};
public double[] scaling = {2, 1, 0.25};

Assigning data types and the corresponding access modifiers can get tricky at times. Understanding the code is essential to know whether a variable is limited to the class or needs to be accessed outside it, and whether it is an int, short, float or double, etc.

  • Porting Numpy & Scipy functions to Java using Apache Commons Math

For example, this piece of code gives the pitch of acceleration. It uses mathematical functions like arc-tan.

pitchAcc = np.arctan2(accData[1], accData[2]) * 180 / np.pi

The corresponding version of arc-tan is used in Java:

double pitchAcc = Math.atan2(accelerometerData[1], accelerometerData[2]) * 180 / pi;

  • Porting by writing the code for Numpy and Scipy functions explicitly

In the code below, rfftfreq is used to calculate the Discrete Fourier Transform (DFT) sample frequencies.

freqs = self.fftpack.rfftfreq(N, d=(xReal[1] - xReal[0]) / (2 * np.pi))

Since hardly any Java library provides an rfftfreq equivalent, the corresponding code was written from scratch:

// Mirrors scipy.fftpack.rfftfreq: returns [0, 1, 1, 2, 2, ...] / (n * space)
double[] rfftFrequency(int n, double space){
    double[] returnArray = new double[n + 1];
    for(int i = 0; i < n + 1; i++){
        returnArray[i] = Math.floor(i / 2) / (n * space);
    }
    return Arrays.copyOfRange(returnArray, 1, returnArray.length);
}

Once the porting of all communication libraries and sensor files is done, feature testing can begin. Currently, ongoing development includes porting some of the remaining files and working on the best possible user interface.

Using custom themes with Yaydoc to build documentation

What is Yaydoc?

Yaydoc aims to be a one stop solution for all your documentation needs. It is continuously integrated with your repository and builds the site on each commit. One of its primary aims is to minimize user configuration. It is currently in active development.

Why Themes?

Themes give the user the ability to generate visually different sites from the same markup documents without any configuration. It is one of the many features Yaydoc inherits from sphinx.

Now, sphinx comes with 10 built-in themes, but there are many more custom themes available on PyPI, the official Python package repository. To use these custom themes, sphinx requires some setup. Yaydoc, being an automated system, needs to perform those tasks automatically.

To use a custom theme which has been installed, sphinx needs to know the name of the theme and where to find it. We do that by specifying two variables in the sphinx configuration file: html_theme and html_theme_path. Custom themes usually provide a method that can be called to get the theme path, commonly named get_html_theme_path, but that is not always the case, so we have no way to find the appropriate method automatically.

So how do we get the path of an installed theme just from its name, and how do we add it to the generated configuration file?

The configuration file is generated by the sphinx-quickstart command which Yaydoc uses to initialize the documentation directory. We can override the default generated files by providing our own project templates. The templates are based on the Jinja2 template engine.

Firstly, I replaced

html_theme = 'alabaster'

with

html_theme = '{{ html_theme }}'

This provides us the ability to pass the name of the theme as a parameter to sphinx-quickstart. The user now has the option to choose between the 10 built-in themes. For custom themes, however, it is a different story; I had to solve two major issues.

  • The name of the package and the theme may differ.
  • We also need the absolute path to the theme.

The following code snippet solves the above mentioned problems.

{% if html_theme in (['alabaster', 'classic', 'sphinxdoc', 'scrolls',
'agogo', 'traditional', 'nature', 'haiku',
'pyramid', 'bizstyle'])
%}
# Theme is builtin. Just set the name
html_theme = '{{ html_theme }}'
{% else %}
# Theme is a custom python package. Let's install it.
import pip
exitcode = pip.main(['install', '{{ html_theme }}'])
if exitcode:
    # Non-zero exit code
    print("""{0} is not available on pypi. Please ensure the theme can be installed using 'pip install {0}'.""".format('{{ html_theme }}'), file=sys.stderr)
else:
    import {{ html_theme }}
    def get_path_to_theme():
        package_path = os.path.dirname({{ html_theme }}.__file__)
        for root, dirs, files in os.walk(package_path):
            if 'theme.conf' in files:
                return root
    path_to_theme = get_path_to_theme()
    if path_to_theme is None:
        print("\n{0} does not appear to be a sphinx theme.".format('{{ html_theme }}'), file=sys.stderr)
        html_theme = 'alabaster'
    else:
        html_theme = os.path.basename(path_to_theme)
        html_theme_path = [os.path.abspath(os.path.join(path_to_theme, os.pardir))]
{% endif %}

It performs the following tasks in order:

  • It first checks if the provided theme is one of the built-in themes. If that is indeed the case, we just set the html_theme config value to the name of the theme.
  • Otherwise, it installs the package using pip.
  • Now __file__ has a special meaning in Python: it gives us the path of the module. We use it to get the path of the installed package.
  • Each sphinx theme must have a file named `theme.conf` which defines several properties of the theme. We do a recursive search for that file.
  • We set html_theme to the name of the directory which contains that file, and html_theme_path to its parent directory (see the standalone sketch below).
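
As a standalone illustration of the same lookup outside the Jinja2 template, here is a short sketch for the hypothetical case where the user picked sphinx_rtd_theme; it assumes the package has already been installed with pip:

import os
import sphinx_rtd_theme  # assumes `pip install sphinx_rtd_theme` has been run

# Walk the installed package looking for theme.conf, just as the template does
package_path = os.path.dirname(sphinx_rtd_theme.__file__)
path_to_theme = None
for root, dirs, files in os.walk(package_path):
    if 'theme.conf' in files:
        path_to_theme = root
        break

html_theme = os.path.basename(path_to_theme)        # 'sphinx_rtd_theme'
html_theme_path = [os.path.abspath(os.path.join(path_to_theme, os.pardir))]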

Now let's see everything in action. Here are four pages created by Yaydoc from a single markup document with no user configuration.

Now you can choose among the many themes available on PyPI. You can even create your own theme. Follow this blog to get more insights and the latest news about Yaydoc.

How to teach SUSI skills calling an External API

SUSI is an intelligent personal assistant. SUSI can learn skills to understand and respond to user queries better. A skill is taught using rules. Writing rules is an easy task and one doesn't need any programming background. Anyone can start contributing. Check out these tutorials and watch this video to get started with teaching SUSI.

SUSI can be taught to call external APIs to answer user queries.

While writing skills, we first mention string patterns to match the user's query and then tell SUSI what to do with the matched pattern. The pattern matching is similar to regular expressions, and we can retrieve the matched parameters using the $<parameter number>$ notation.

Example:

 My name is *
 Hi $1$!

When the user inputs "My name is Uday", it is matched with "My name is *" and "Uday" is stored in $1$. So the output given is "Hi Uday!".

SUSI can call an external API to reply to a user query. The API endpoint or URL, when called, must return a JSON or JSONP response for SUSI to be able to parse it and retrieve the answer.

Rule Format for a skill calling an external API

The rule format for calling an external API is:

<regular expression for pattern matching>
!console: <return answer using $object$ or $required_key$>
{
  "url": "<API endpoint or url>",
  "path": "$.<key in the API response to find the answer>"
}
eol

  • url is the API endpoint to be called, which must return a JSON or JSONP response.
    Parameters can be added to the url, if any, using the $<n>$ notation.
  • path helps SUSI know where to look for the answer in the returned response.
    If the path points to a root element, the answer is stored in $object$; otherwise we can query $key$ to get the answer, which is the value of that key under the path.
  • eol (end of line) indicates the end of the rule.

Understanding the Path Attribute

Let us understand the path attribute better through some test cases.

In each test case we discuss what the path should be and how to retrieve a given required answer from the JSON response of an API. (A small Python illustration follows the test cases.)

  1. API response in JSON:

    {
       "Key1" : "Value1"
    }

Required answer : Value1
Path : $.Key1   =>   Retrieve Answer: $object$

  2. API response in JSON:

    {
      "Key1" : [{"Key11" : "Value11"}]
    }

Required answer : Value11
Path : $.Key1[0]   =>  Retrieve Answer: $Key11$
Path : $.Key1[0].Key11   => Retrieve Answer: $object$

  3. API response in JSON:

    {
      "Key1" : {"Key11" : "Value11"}
    }

Required answer : Value11
Path : $.Key1  => Retrieve Answer: $Key11$
Path : $.Key1.Key11  => Retrieve Answer: $object$

  4. API response in JSON:

    {
      "Key1" : {
                 "Key11" : "Value11",
                 "Key12" : "Value12"
               }
    }

Required answer : Value11 , Value12
Path : $.Key1  => Retrieve Answer: $Key11$ , $Key12$
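
To see the same traversal outside SUSI, here is a rough Python sketch (purely illustrative, not SUSI's actual implementation) of how the path in test case 2 walks the parsed JSON:

import json

response = json.loads('{"Key1": [{"Key11": "Value11"}]}')

# "$.Key1[0]" selects the first element of the list under Key1
node = response["Key1"][0]

# $Key11$ then reads that key from the selected object
print(node["Key11"])                 # -> Value11

# with "$.Key1[0].Key11" the path points at the value itself,
# so it is available directly as $object$
print(response["Key1"][0]["Key11"])  # -> Value11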

Where to write these rules?

Now that we know how to write rules, let's see where to write them.

We use etherpads to write and test rules, and once we finish testing a rule we can push it to the repo.

Steps to open, write and test rules:

  1. Open a new etherpad with a desired name <etherpad name> at http://dream.susi.ai/
  2. Write your skill code in the etherpad following the format explained above.
  3. Now, to test your skill, start a conversation with SUSI at http://susi.ai/chat.
  4. Load your skills by typing dream <etherpad name> and wait for a response saying dreaming enabled for <etherpad name>.
  5. Test your skill, and repeat step 4 every time you make changes to the code in your etherpad.
  6. After you are done testing, type stop dreaming, and if you are satisfied with your skill do send a PR to help SUSI learn.

Examples

Let us try an example to understand this better.

1. Plot of a TV Show

Tvmaze is an open TV API that provides information about TV shows. Let us write a rule to get the plot of a TV show. We can find many such APIs; check out this link listing a few of them.

  1. Open an etherpad at http://dream.susi.ai/ named tvshowplot.

  2. Enter the code to query the plot of a TV show in the etherpad at http://dream.susi.ai/p/tvshowplot:

  * plot of *|* summary of *
  !console:$object$
  {
      "url":"http://api.tvmaze.com/singlesearch/shows?q=$2$",
      "path":"$.summary"
  }
  eol

  3. Now let's test our skill by starting a conversation with SUSI at http://susi.ai/chat.
  • User Query: dream tvshowplot
    Response: dreaming enabled for tvshowplot
  • User Query: what is the plot of legion
    Response: Legion introduces the story of David Haller: Since he was a teenager, David has struggled with mental illness. Diagnosed as schizophrenic, David has been in and out of psychiatric hospitals for years. But after a strange encounter with a fellow patient, he’s confronted with the possibility that the voices he hears and the visions he sees might be real. He’s based on the Marvel comics character Legion, the son of X-Men founder Charles Xavier (played by Patrick Stewart and James McAvoy in the films), first introduced in 1985.

Intermediate Processing:

Pattern Matching : $1$ = "what is the" ; $2$ = "legion"

Url : http://api.tvmaze.com/singlesearch/shows?q=legion

API response:

{
   "id": 6393,
   "url": "http:\/\/www.tvmaze.com\/shows\/6393\/legion",
   "name": "Legion",
   "type": "Scripted",
   "language": "English",
   "genres": [
     "Drama",
     "Action",
     "Science-Fiction"
   ], 
   "summary": "<p><strong>Legion<\/strong> introduces the story of David Haller: Since he was a teenager, David has struggled with mental illness. Diagnosed as schizophrenic, David has been in and out of psychiatric hospitals for years. But after a strange encounter with a fellow patient, he's confronted with the possibility that the voices he hears and the visions he sees might be real. He's based on the Marvel comics character Legion, the son of X-Men founder Charles Xavier (played by Patrick Stewart and James McAvoy in the films), first introduced in 1985.<\/p>",
   "updated": 1491955072,  
 }

Note: The API response has been trimmed to show only the relevant content.

Path : $.summary

Retrieving the answer: our required answer in the API response is under the key summary, and since the path points at it directly (a root element), it is retrieved using $object$.

2. Cooking Recipes

Let us try it out with another API.
Recipe Puppy is a cooking recipe API where users can query various recipes.

  1. Open an etherpad at http://dream.susi.ai/ named recipe.

  2. Enter the code to query a recipe in the etherpad at http://dream.susi.ai/p/recipe:

#Gives recipes and links to cook a dish
* cook *
!console:<p>To cook <strong>$title$</strong> : <br>The ingredients required are: $ingredients$. <br> For instructions to prepare the dish $href$ </p>
{
  "url":"http://www.recipepuppy.com/api/?q=$2$",
  "path":"$.results"
}
eol

  3. Now let's test our skill by starting a conversation with SUSI at http://susi.ai/chat.
  • User Query: dream recipe
    Response: dreaming enabled for recipe
  • User Query: how to cook chicken biryani
    Response: To cook Chicken Biryani Recipe :
    The ingredients required are: chicken, seeds, chicken broth, rice, butter, peas, garlic, red onions, cardamom, curry paste, olive oil, tomato, coriander, cumin, brown sugar, tumeric.
    For instructions to prepare the dish Click Here!

Intermediate Processing:

Pattern Matching : $1$ = "how to" ; $2$ = "chicken biryani"

Url : http://www.recipepuppy.com/api/?q=chicken biryani

API response:

{
   "title": "Recipe Puppy",
   "version": 0.1,
   "href": "http:\/\/www.recipepuppy.com\/",
   "results": [
     {
       "title": "Chicken Biryani Recipe",
       "href": "http:\/\/www.grouprecipes.com\/53040\/chicken-biryani.html",
       "ingredients": "chicken, seeds, chicken broth, rice, butter, peas, garlic, red onions, cardamom, curry paste, olive oil, tomato, coriander, cumin, brown sugar, tumeric",
       "thumbnail": "http:\/\/img.recipepuppy.com\/413822.jpg"
     },
 ]
 }

Note: The API response has been trimmed to show only the relevant content.

Path : $.results[0]

Retrieving the answer: our required answer in the API response lies under the key results. Since its value is an array, we take the first element, and since that element is a dictionary we use its keys ($title$, $ingredients$, $href$) in the answer accordingly. The $href$ is rendered as "Click Here", hyperlinked to the actual URL.

We have successfully taught SUSI a skill which tells users about the plot of a TV show, and a skill to query recipes. Following a similar procedure, we can make use of other APIs and teach SUSI several new skills.
Cheers!

Deploying Susi Server on Google Cloud with Kubernetes

Susi (acronym for Scientific User Support Intelligence) is an advanced AI made by people at FOSSASIA. It is an AI made by the people and for the people.
Susi is an open source project under the LGPL licence.

SUSI.AI already has many Skills and anyone can add new skills through simple console rules.

If you want to participate in the development of the SUSI server you can start by learning to deploy it on a cloud system like Google Cloud.

This way whenever you make a change to Susi Server, you can test it out on various Susi Apps instantly.

Google Cloud with Kubernetes provides this ability. Let's dig into what Google Cloud Platform and Kubernetes are.

What is Google Cloud Platform?

Google Cloud Platform lets you build and host applications and websites, store data, and analyze data on Google's scalable infrastructure.
At the time of writing this article, Google Cloud Platform also provides free credits worth $300, valid for one year, for testing out the platform and your applications.

What is Kubernetes?

Kubernetes is an open-source system for automated deployment, management and scaling of containerized applications. It makes it easy to roll out updates to your application with simple commands from your development machine, and to scale horizontally by adding more nodes to the cluster as demand increases.

Deploying Susi Server on Kubernetes

Deploying the Susi Server on Kubernetes is a fairly easy task. Follow these steps to get it running.

Create a Google Cloud Account

Sign up for a Google Cloud account (https://cloud.google.com/free-trial/) and get $300 in credits for initial use.

Create a New Project

After a successful sign-up, create a new project on the Google Cloud Console. Let's name it Susi-Kubernetes.

You will be provided with a Project ID; remember it for further reference.

Install Google Cloud SDK and kubectl

Go to https://cloud.google.com/sdk/ and follow the instructions to set up the Google Cloud SDK on your OS.

After installing the Google Cloud SDK, run

gcloud components install kubectl

This will install kubectl for interacting with Kubernetes.

Log in and set up the project

  1. Login to your Google Cloud Account using
$ gcloud auth login

2. Check the currently configured project using

$ gcloud config list project
[core]
project = <PROJECT_ID>

3. Select your project

$ gcloud config set project <PROJECT_ID>

4. Install JDK 8 (needed to build susi_server) and set it as the default.

5. Clone your fork of the Susi Server Repository

$ git clone https://github.com/<your_username>/susi_server.git
$ cd susi_server/

6. Build project and run Susi Server locally

$ ./gradlew build
$ bin/start.sh

The Susi server should now have started, and its web interface is accessible at http://localhost:4000

Install Docker and build Docker image for Susi

  1. Install Docker.
    Debian and derivatives:  sudo apt install docker
    Arch Linux:   sudo pacman -S docker 
  2. Build Docker Image for Susi
    $ docker build -t gcr.io/<Project_id>/susi:v1 .
  3. Push Image to Google Container Registry private to your project.
$ gcloud docker -- push gcr.io/<Project_id>/susi:v1

Create Cluster and Deploy your Susi Server there

  1. Create a cluster. You may specify a different zone, number of nodes and machine type depending on your requirements.
    $ gcloud container clusters create <Cluster-Name> --num-nodes 2 --machine-type n1-standard-1 --zone us-central1-c
  2. Run your deployment. You may specify any name for the deployment (the service listing below assumes it was named susi).
    $ kubectl run <deployment_name> --image=gcr.io/<Project_id>/susi:v1 --port=80
    $ kubectl get deployments
    $ kubectl expose deployment <deployment_name> --type=LoadBalancer
  3. Check your deployment and get Public IP for Access.
    $ kubectl get services
    NAME         CLUSTER-IP     EXTERNAL-IP     PORT(S)       AGE
    kubernetes   10.3.240.1     <none>          443/TCP        1d
    susi         10.3.241.145   <PUBLIC_IP>     80:31155/TCP   1d
  4. Go to the provided public IP to check if the Susi Server is running.

Congratulations, you have successfully set up the Susi Server on Google Cloud with Kubernetes.

Updating the deployment

The next step is to update the deployment when you wish to roll out changes. To do so:

Build Docker Image and Push it to Google Container Registry

$ docker build -t gcr.io/<Project_Id>/susi:v2 .
$ gcloud docker -- push gcr.io/<Project_Id>/susi:v2

Update Deployment Image with Kubernetes

$ kubectl set image deployment/<Deployment_Name> \
  <Deployment_Name>=gcr.io/<Project_id>/susi:v2
deployment "<Deployment_Name>" image updated

Go to the public IP to see the changes.

That's it. You now have a fully running Susi Server on your own Google Cloud cluster, managed with Kubernetes.

Susi AI Skill Development

What is Susi?

Susi is an open source intelligent personal assistant which has the capability to learn and respond to queries better over time. It is also capable of making to-do lists, setting alarms, and providing weather and traffic info, all in real time. Susi responds based on skills.

What is a skill? How do we teach a skill?

A skill is a piece of code which performs a set of actions in order to respond to the user's query. These skills are based on pattern matching, which helps map the user's query to a specific skill and respond accordingly. Teaching a skill to Susi is surprisingly easy. One can take a look at the Susi Skill Development Tutorial and a video workshop by Michael Christen.

I will try to give a basic idea of how to create a skill, its basic structure, and some of the skills I developed in the first week.

Prepare to create a skill:

  • Head over to http://dream.susi.ai
  • Create an etherpad with some relevant name
  • Delete all text currently present in there
  • Start writing your skill

Adding to this, for testing a skill one can head over to Susi Web Chat Interface.

Basic Structure for calling an API:

<Regular expression to be matched here>

!console:<response given to the user>
 {
 "url":"<API endpoint>",
 "path":"<Json path here>"
 }
 eol

So, let me explain this line by line.

  1. The regular expression is what the user's query is matched against first.
  2. The console action outputs the actual response the user sees.
  3. In place of "url", the API endpoint is passed in.
  4. "path" specifies how we traverse the response JSON or JSONP to get the object; it starts with "$.".
  5. At last, "eol", the end-of-line marker, marks the end of a skill.

Let’s take an example for better understanding of this:

random gif
!console: $url$
{
    "url" : "http://api.giphy.com/v1/gifs/trending?api_key=dc6zaTOxFJmzC",
    "path" : "$.data[0].images.fixed_height"
}
eol 


This skill responds with a link to a random gif.

Steps involved:

  1. Match the string "random gif" with the user's query.
  2. On a successful match, make an API call to the API endpoint specified in "url".
  3. On response, extract the object at the path specified under "path" in the JSON.
  4. Respond to the user with the value of the "url" key, which here is the URL of a GIF.

Let's try it out on Susi Web Chat. For this, you will first have to load your skill using the dream command followed by the etherpad name: dream <etherpad name>. Then you can start testing your skill.

So, we queried "random gif" and got the response "Click Here!". The complete URL didn't show up because all URLs are currently parsed and a hyperlink is created for each. Try clicking on it to find a GIF.


Now, let’s look at one more skill I developed during this period.

# Returns the name of the president of a country

president of *|who is the president of *|president *
!console:$plaintext$
{
  "url":"https://api.wolframalpha.com/v2/query?input=president+$1$&output=JSON&appid=9WA6XR-26EWTGEVTE&includepodid=Result",
  "path" : "$.queryresult.pods[0].subpods[0]"
}
eol


Let's understand this step by step:

  1. We have here "president of *|who is the president of *|president *", which means the user's query matches any one of the alternatives separated by the pipe symbol "|". The "*" matches a word or a list of words, which can be accessed as $<index>$, where index is the position of the "*" in the expression, starting from 1.
  2. Now we have something new in the URL. See that $1$ inside the URL? At runtime, it is replaced with the content matched by the "*". So if a user asks "president of usa", "usa" is mapped to $1$, substituted into the URL, and the appropriate API request is made.
  3. Then the path is traversed in the JSON response, and the value of the "plaintext" key is used to respond to the user.


It’s now time to try it out on Susi Web Chat.

So, we got our desired response here, i.e., the name of the president of the USA.

Displaying error notifications in whatsTrending? app

The issue I am solving in the whatsTrending app is displaying error notifications when the date fields and the count field fail validation, i.e., when a user enters invalid data. Specifically, we want to display error notifications for junk values, dates in formats other than YYYY-MM-DD, and any other invalid data in the whatsTrending app's filter options.

The whatsTrending app is a web app that shows the top trending hashtags of twitter messages in a given date range using tweets collected by the loklak search engine. Users can also limit the number of top hash tags they want to see and use filters with start and end dates.

App to know trending hashtags on twitter

What is the problem? The date fields and the count field are not validated, which means junk values and dates in formats other than YYYY-MM-DD do not produce any error.

So how can the problem be solved? The format (pattern) of the date can be verified with a regular expression. A regular expression describes a pattern in a given text, so the format-checking problem can be described as finding the pattern YYYY-MM-DD in the input date, where Y, M and D are digits, with the pattern anchored to the whole input.

More detailed information about regex can be found here.

The regex for this pattern is:

/^\d{4}-\d{2}-\d{2}$/

The pattern says there should be four digits, followed by '-', then two digits, another '-', and then two more digits; the ^ and $ anchors ensure the whole input matches the pattern.

This can be implemented in the following way:

$scope.isValidDate = function(dateString) {
    var regEx = /^\d{4}-\d{2}-\d{2}$/;
    if (dateString.match(regEx) === null) {
        return false;
    }

    // Split into [year, month, day] and convert each part to an integer
    var dateComp = dateString.split('-');
    var i = 0;
    for (i = 0; i < dateComp.length; i++) {
        dateComp[i] = parseInt(dateComp[i], 10);
    }
    if (dateComp.length > 3) {
        return false;
    }

    // Check the month and day ranges
    if (dateComp[1] > 12 || dateComp[1] <= 0) {
        return false;
    }
    if (dateComp[2] > 31 || dateComp[2] <= 0) {
        return false;
    }
    // April, June, September and November have only 30 days
    if (((dateComp[1] === 4) || (dateComp[1] === 6) || (dateComp[1] === 9) || (dateComp[1] === 11)) && (dateComp[2] > 30)) {
        return false;
    }

    // February: account for leap years
    if (dateComp[1] === 2) {
        if (((dateComp[0] % 4 === 0) && (dateComp[0] % 100 !== 0)) || (dateComp[0] % 400 === 0)) {
            if (dateComp[2] > 29) {
                return false;
            }
        } else {
            if (dateComp[2] > 28) {
                return false;
            }
        }
    }

    return true;
}

The first part of the code checks for the above-mentioned pattern in the input; if it is not found, it returns false. If it is found, we split the date into a list containing year, month and day, and each component is converted to an integer. Then further validation is done on the month and day, as can be seen in the code above: the ranges of the month and day are checked, and leap-year checking is done for February.

In the same way, the count field is also validated. The regex for this field is much simpler: we just need to check that the input consists only of digits and nothing else.
So the regex for this is:

 /^[0-9]+$/

This matches a repetition of digits in the range 0-9 and nothing else. We search for this pattern in the text; if it is found we return true, else false. The function for this is as follows:

$scope.isNumber = function(numString) {
        var regEx = /^[0-9]+$/;
        return String(numString).match(regEx) != null;
    }

Next we need to call these functions and check whether there is any error; if there is, we need to display it. This can be done using a modal. Bootstrap has an inbuilt modal which can be invoked using JavaScript.

Showing error using modal

At first we need to define the modal and its content (empty if necessary, as in this case) using HTML. The HTML code for this can be found here.

A small yet nice tutorial on the Bootstrap modal can be found here.
Next we need to set the content of the modal and invoke it from our JS file on encountering an error.

$scope.displayErrorModal = function(val, type) {
    if (type === 0) {
        // Date validation
        if (!$scope.isValidDate(val)) {
            $scope.loading = false;
            $('.modal-body').html('Please enter valid date in YYYY-MM-DD format');
            $('#myModal').modal('show');
            return false;
        }
    } else {
        // Number validation
        if (!$scope.isNumber(val)) {
            $scope.loading = false;
            $('.modal-body').html('Please enter a valid number');
            $('#myModal').modal('show');
            return false;
        }
    }
    return true;
}

The above function accepts a parameter val and another parameter type. The type parameter tells which validation needs to be performed, date validation or number validation; the function calls one of the previous two methods accordingly, passing val, the value to be validated. If the validation fails, it sets the content of the modal using $('.modal-body').html('your content') and then invokes it using $('#modalID').modal('show'). This displays a nice modal on the page and the user is notified about the error.

So this is it for this post. Thanks for reading. My next post will be on fixing the design of the boilerplate app.

Using Cloud storage for event exports

The Open Event orga server gives organizers the ability to create a complete export of the event they created. Currently, when an organizer triggers an export in the orga server, a Celery job is queued to complete the export task asynchronously. The organizer gets the download button enabled once the export is ready.

Until now the main issue was the storage of those export zip files: all of them were stored directly in local storage, and not even through the storage module created in the orga server.

local storage path

On a mission to solve this, I followed three simple steps (sketched in code below):

  1. Wait for shutil.make_archive to finish creating the archive in local storage.
  2. Copy the created archive to the storage specified by the user.
  3. Delete the local archive.

The easiest part was uploading the files to the different storage backends (s3, gs, local), as we already have a storage helper:

def upload(uploaded_file, key, **kwargs):
    """
    Upload handler
    """

The most important logic for this issue resides in the following code snippet.

    dir_path = dir_path + ".zip"

    storage_path = UPLOAD_PATHS['exports']['zip'].format(
        event_id=event_id
    )
    uploaded_file = UploadedFile(dir_path, dir_path.rsplit('/', 1)[1])
    storage_url = upload(uploaded_file, storage_path)

    # local storage (neither s3 nor gs): serve the file directly
    if get_settings()['storage_place'] != "s3" and get_settings()['storage_place'] != 'gs':
        storage_url = app.config['BASE_DIR'] + storage_url.replace("/serve_", "/")
    return storage_url

From the above snippet it is clear that we are extending the zip-creation process: once the zip is created, we build the storage path for the chosen storage and upload the file there. The only part that takes a moment to understand is the condition at the end (note it must use and, not or, so that the URL rewrite happens only when the storage is neither s3 nor gs, i.e. local):

if get_settings()['storage_place'] != "s3" and get_settings()['storage_place'] != 'gs':
        storage_url = app.config['BASE_DIR'] + storage_url.replace("/serve_","/")

Initially the plan was simply to serve the files through "serve_static", but the test cases expected a file at the actual location, so I had to remove the "serve_" part for local storage; with that change the three steps work fine.

The next thing to discuss about this storage process is a feature to delete old exports. One reason to keep them, though, is that an old backup of your event will always be there in our cloud storage.