Automatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

One goal for the next version of the Open Event project is to allow an automatic import of events from various event listing sites. We will implement this using Open Event Import APIs and two additional modules: Query Server and Event Collect. The idea is to run the modules as micro-services or as stand-alone solutions.

Query Server
The query server is, as the name suggests, a query processor. As we are moving towards an API-centric approach for the server, query-server also has API endpoints (v1). Using this API you can get the data from the server in the requested format (XML or JSON). The API itself is quite intuitive.

API to get data from query-server

GET /api/v1/search/<search-engine>?query=<query>&format=<format>

Sample Response Header

 Cache-Control: no-cache
 Connection: keep-alive
 Content-Length: 1395
 Content-Type: application/xml; charset=utf-8
 Date: Wed, 24 May 2017 08:33:42 GMT
 Server: Werkzeug/0.12.1 Python/2.7.13
 Via: 1.1 vegur
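
For illustration, here is how this endpoint could be called from Python with the requests library (a minimal sketch; the base URL is a placeholder for wherever the query-server is deployed):

import requests

BASE_URL = "http://localhost:7001"  # placeholder deployment URL

response = requests.get(
    BASE_URL + "/api/v1/search/google",
    params={"query": "open source events delhi", "format": "json"},
)
print(response.status_code)
print(response.json())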

The server is built in Flask. The GitHub repository of the server contains a simple Bootstrap front-end, which is used as a testing ground for results. The query string is passed to the search engine result scraper scraper.py, which is based on the scraper at searss. This scraper takes the search engine (presently Google, Bing, DuckDuckGo or Yahoo) as an additional input and searches on that engine. The output from the scraper, which can be XML or JSON depending on the API parameters, is returned, while the search query is stored in a MongoDB database with an index on the query string. This is done keeping in mind the capabilities to be added later for using Kibana analysis tools.
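
A rough sketch of how the query could be stored with an index on the query string using PyMongo (the database, collection and field names here are assumptions, not the project's actual schema):

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db = client["query_server"]      # hypothetical database name
queries = db["queries"]          # hypothetical collection name

# Index the query string so stored searches can be looked up quickly,
# e.g. for later analysis with Kibana-style tooling.
queries.create_index([("query", ASCENDING)])
queries.insert_one({"query": "open source events delhi", "engine": "google"})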

The frontend prettifies results with the help of PrismJS. The query-server will be used for the initial listing of events from different search engines. This will be accessed through the following API.

The query-server app can be accessed on Heroku.

➢ api/list: Provides an initial list of events (titles and links) to be displayed in Open Event search results.

When an event is searched on Open Event, the query is passed on to query-server, where a search is made by calling scraper.py, appending some details for better event hunting. Recent developments with Google include their event search feature. In the Google search app, event searches take over when Google detects that a user is looking for an event.

The feed from the scraper is parsed for events inside query-server to generate a list containing event titles and links. Each event in this list is then searched for in the database to check if it already exists. We will be using Elasticsearch to achieve fuzzy searching for events in the Open Event database, since Elasticsearch is already planned for the API.

One example of what we wish to achieve by implementing this type of search in the database follows. The user may search for

-Google Cloud Event Delhi
-Google Event, Delhi
-Google Cloud, Delhi
-google cloud delhi
-Google Cloud Onboard Delhi
-Google Delhi Cloud event

All these searches should match “Google Cloud Onboard Event, Delhi” with good accuracy. After duplicates and events that already exist in the database have been removed from this list, each remaining event is rendered on the search frontend of Open Event as a separate result. The user can click on any of these events, which will make a call to Event Collect.
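
The fuzzy lookup itself could be expressed with the official Elasticsearch Python client along these lines (a minimal sketch; the index and field names are assumptions for illustration):

from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.search(
    index="events",                # hypothetical index name
    body={
        "query": {
            "match": {
                "name": {          # hypothetical field holding the event title
                    "query": "google cloud delhi",
                    "fuzziness": "AUTO",
                    "operator": "and",
                }
            }
        }
    },
)
for hit in result["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_score"])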

Event Collect

The event collect project is developed as a separate module which has two parts

● Site specific scrapers
In its present state, Event Collect has scrapers for Eventbrite and Ticketleap which, given a query, scrape the respective site's search results and download a JSON file for each event using Loklak's API.
The scrapers can be developed in any form, and any number of scrapers/scraping tools can be added, as long as they align with the Open Event Import API's data format. Writing tests for these against the current API formats will take care of this. This part will be covered by using a JSON validator to check against a pre-generated schema (a small sketch of such a check follows this list).

● REST APIs
The scrapers are exposed through a set of APIs, which will include, but are not limited to:
➢ api/fetch-event: to scrape any event given its link and compose the data in a predefined JSON format which will be generated based on the Open Event Import API. When this function is called on an event link, scrapers are invoked which collect event data such as event, meta, forms etc. This data will be validated against the generated JSON schema.
➢ api/export: to export all the JSON data containing event information into the Open Event Server. As and when the scraping is complete, the data will be added into Open Event's database as a new event.
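
As referenced above, a small sketch of what the schema check could look like with the jsonschema library (the schema below is a trimmed placeholder, not the actual Import API schema):

from jsonschema import validate, ValidationError

EVENT_SCHEMA = {                     # placeholder for the pre-generated schema
    "type": "object",
    "required": ["name", "starts-at", "ends-at"],
    "properties": {
        "name": {"type": "string"},
        "starts-at": {"type": "string"},
        "ends-at": {"type": "string"},
    },
}

def is_valid_event(event_data):
    try:
        validate(event_data, EVENT_SCHEMA)
        return True
    except ValidationError:
        return False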

How the Import works

The following graphic shows how the import works.




Let's dive into the workflow. As the diagram illustrates, the 'search' functionality makes a call to the api/list endpoint provided by query-server, which returns events' titles and links from the parsed XML/JSON feed. This list is displayed as Open Event's search results. Once the results are displayed, the user can click on any of the events. When the user clicks on an event, that event is searched for in Open Event's database. Two things can happen now:

  • The event page loads if the event is found.
  • If the event does not already exist in the database, clicking on the event will:

➢ Insert this event’s title and link in the database and get the event_id

➢ Make a call to api/fetch-event in event-collect which then invokes a site-specific scraper to fetch data about the event the user has chosen

➢ When the data is scraped, it is imported into the Open Event database using the previously generated event_id. The page will be loaded using jQuery AJAX as and when the scraping is done. When the imports are done, the search page refreshes with the new results. The Open Event Orga Server exposes a well-documented REST API that can be used by external services to access the data.

ember.js – the right choice for the Open Event Front-end

With the development of the API server for the Open Event project, we needed to decide which framework to choose for the new Open Event front-end. With the plethora of JavaScript frameworks available, it got really difficult to decide which one is actually the right choice. Every month a new framework arrives, and the existing ones keep updating themselves actively. We decided to go with Ember.js. This article covers the Ember.js framework, highlights its advantages over others and demonstrates its usefulness.

Ember.js is an open-source JavaScript front-end framework for creating web applications that uses the Model-View-Controller (MVC) approach. The framework provides universal data binding. Its focus lies on scalability.

Why is Ember JS great?

Convention over configuration – It does all the heavy lifting.

Ember.js mandates best practices, enforces naming conventions and generates the boilerplate code for the various components and routes itself. This has advantages other than uniformity. It is easier for other developers to join the project and start working right away, instead of spending hours on the existing codebase to understand it, as the core structure of all Ember apps is similar. To get an Ember app started with a basic route, the user doesn't have to do much; Ember does all the heavy lifting.

ember new my-app
ember server

After installing Ember CLI, these two commands are all it takes to create and serve your app.

Ember CLI

Similar to Ruby on Rails, Ember has a powerful CLI. It can be used to generate boilerplate code for components, routes, tests and much more. Testing is possible via the CLI as well.

ember generate component my-component
ember generate route my-route
ember test

These are some examples that show how easy it is to manage the code via the Ember CLI.

Tests.Tests.Tests.

Ember.js makes it incredibly easy to use a test-first approach. Integration tests, acceptance tests and unit tests are built into the framework and can be generated from the CLI itself. The documentation on them is well written, and it's really easy to customise them.

ember generate acceptance-test my-test

This is all it takes to set up the entire boilerplate for the test, which you can then customise.

Excellent documentation and guides

Ember.js has some of the best documentation available for a framework. The guides are a breeze to follow. If you are starting out with Ember, it is highly recommended to build the demo app from the official Ember Guides; that should be enough to get familiar with Ember.

Ember Guides is all you need to get started.

Ember Data

Ember sports one of the best-implemented API data fetching capabilities. Fetching and using data in your app is a breeze, because Ember comes with an inbuilt data management library, Ember Data.

To generate a data model via the Ember CLI, all you have to do is:

ember generate model my-model

Where is it being used?

Ember has a huge community and is being used all around. This article focuses on its salient features via the example of FOSSASIA's Open Event Orga Server project. The organizer server is primarily based on Flask, with Jinja2 used for rendering templates. At a small scale it was efficient to have both the front end and back end of the server together, but as it grew larger with more refined features, it became tough to keep track of all the minor edits and customizations of the front end, and the code started to become complex. That gave birth to the new project, Open Event Front End, which is based on Ember.js and will be covered next week.

With the orga server being converted into a fully functional API, the back end and the front end will be decoupled, thereby making the code much cleaner and easier to understand for other developers who may wish to contribute in the future. Also, since the new front end is being designed with Ember.js, its UI will have a lot of enhanced features, and enforcing uniformity across the design will be much easier with the help of components in Ember. For instance, instead of making multiple copies of the same code, components are used to avoid repetition and ensure uniformity (a change in one place will reflect everywhere).

<div class="{{if isWide 'event wide ui grid row'}}">
  {{#if isWide}}
    {{#unless device.isMobile}}
      <div class="ui card three wide computer six wide tablet column">
        <a class="image" href="{{href-to 'public' event.identifier}}">
          {{widgets/safe-image src=(if event.large event.large event.placeholderUrl)}}
        </a>
      </div>
    {{/unless}}
  {{/if}}
  <div class="ui card {{unless isWide 'event fluid' 'thirteen wide computer ten wide tablet sixteen wide mobile column'}}">
    {{#unless isWide}}
      <a class="image" href="{{href-to 'public' event.identifier}}">
        {{widgets/safe-image src=(if event.large event.large event.placeholderUrl)}}
      </a>
    {{/unless}}
    <div class="main content">
      <a class="header" href="{{href-to 'public' event.identifier}}">
        <span>{{event.name}}</span>
      </a>
      <div class="meta">
        <span class="date">
          {{moment-format event.startTime 'ddd, MMM DD HH:mm A'}}
        </span>
      </div>
      <div class="description">
        {{event.shortLocationName}}
      </div>
    </div>
    <div class="extra content small text">
      <span class="right floated">
        <i role="button" class="share alternate link icon" {{action shareEvent event}}></i>
      </span>
      <span>
        {{#if isYield}}
          {{yield}}
        {{else}}
          {{#each tags as |tag|}}
            <a>{{tag}}</a>
          {{/each}}
        {{/if}}
      </span>
    </div>
  </div>
</div>

This is a perfect example of the power of components in Ember: a component for displaying event information in a card format which, in addition to being rendered differently for various screen sizes, can act differently based on the parameters passed to it, thereby avoiding the redundancy of writing separate components for the same purpose.

Ember is a step forward towards the future of the web. With the help of Babel.js it is possible to write ES6/2015 syntax and not worry about its compatibility with browsers; Babel takes care of that.

The following is perfectly valid ES6 and will be compatible with the majority of supported browsers.

actions: {
  submit() {
    // ES6 arrow function keeps `this` bound to the component
    this.onValid(() => {
    });
  }
}

 

Some references used for the blog article:

  1. https://www.codeschool.com/blog/2015/10/26/7-reasons-to-use-ember-js/
  2. https://www.quora.com/What-are-the-advantages-of-using-Ember-js
  3. Official Ember Guides: https://guides.emberjs.com

 
This page/product/etc. is unaffiliated with the Ember project. Ember is a trademark of Tilde Inc.

DetachedInstanceError: dealing with Celery, Flask’s app context and SQLAlchemy

In the Open Event Server project, we chose to go with Celery for async background tasks. From the official website:

What is celery?

Celery is an asynchronous task queue/job queue based on distributed message passing.

What are tasks?

The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing.

After the tasks had been set up, an error constantly came up whenever a task was called.

The error was:

DetachedInstanceError: Instance <User at 0x7f358a4e9550> is not bound to a Session; attribute refresh operation cannot proceed

The above error usually occurs when you try to access an object whose session has been closed, for example after an explicit session.close() call, or when its attributes were expired by a session.commit() and can no longer be refreshed.
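
The snippet below is a self-contained sketch (not project code) that reproduces the error: commit() expires the loaded attributes and close() detaches the instance, so the attribute access can no longer be refreshed.

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(User(name='demo'))
session.commit()

user = session.query(User).first()
session.commit()   # expires the loaded attributes of `user`
session.close()    # detaches `user` from the session
print(user.name)   # raises DetachedInstanceError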

The celery tasks in question were performing some database operations. So the first thought was that maybe these operations might be causing the error. To test this theory, the celery task was changed to:

@celery.task(name='lorem.ipsum')
def lorem_ipsum():
    pass

But sadly, the error still remained. This showed that the celery task itself was fine and that the session was being closed whenever the celery task was called. The method from which the celery task was being called was of the following form:

def restore_session(session_id):
    session = DataGetter.get_session(session_id)
    session.deleted_at = None
    lorem_ipsum.delay()
    save_to_db(session, "Session restored from Trash")
    update_version(session.event_id, False, 'sessions_ver')


In our app, the app_context was not being passed whenever a celery task was initiated. Thus, the celery task, whenever called, closed the previous app_context eventually closing the session along with it. The solution to this error would be to follow the pattern as suggested on http://flask.pocoo.org/docs/0.12/patterns/celery/.

def make_celery(app):
    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    task_base = celery.Task

    class ContextTask(task_base):
        abstract = True

        def __call__(self, *args, **kwargs):
            if current_app.config['TESTING']:
                with app.test_request_context():
                    return task_base.__call__(self, *args, **kwargs)
            with app.app_context():
                return task_base.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery

celery = make_celery(current_app)


The __call__ method ensures that each celery task is provided with a proper app context to work with.
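
With this pattern in place, a task along the following lines (a hypothetical sketch reusing the helpers from restore_session above, not the project's actual task) can safely touch the database, because it runs inside the pushed app context:

@celery.task(name='session.restore')
def restore_session_task(session_id):
    # Runs inside app.app_context() thanks to ContextTask,
    # so the SQLAlchemy session stays usable here.
    session = DataGetter.get_session(session_id)
    session.deleted_at = None
    save_to_db(session, "Session restored from Trash")
    update_version(session.event_id, False, 'sessions_ver')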

 

Event-driven programming in Flask with Blinker signals

Setting up blinker:

The Open Event Project offers event managers a platform to organize all kinds of events, including concerts, conferences, summits and regular meetups. In the server part of the project, the issue at hand was to perform multiple tasks in the background (we use celery for this) whenever changes occurred within the event or the speakers/sessions associated with it.

The usual approach to this would be to apply a function call after any relevant changes are made. But the statements making these changes were distributed all over the project in multiple places. It would be cumbersome to add 3-4 function calls (which are irrelevant to the function in which they are being executed) in so many places. Moreover, the code would become unstructured and really hard to maintain over time.

That's when signals came to our rescue. Since Flask 0.6, there is integrated support for signalling in Flask; refer to http://flask.pocoo.org/docs/latest/signals/. The Blinker library is used here to implement signals. If you're coming from some other language, signals are analogous to events.

Given below is the code to create named signals in a custom namespace:


from blinker import Namespace

event_signals = Namespace()
speakers_modified = event_signals.signal('event_json_modified')

If you want to emit a signal, you can do so by calling the send() method:


speakers_modified.send(current_app._get_current_object(), event_id=event.id, speaker_id=speaker.id)

From the user guide itself:

“ Try to always pick a good sender. If you have a class that is emitting a signal, pass self as sender. If you are emitting a signal from a random function, you can pass current_app._get_current_object() as sender. “

To subscribe to a signal, blinker provides neat decorator based signal subscriptions.


@speakers_modified.connect
def name_of_signal_handler(app, **kwargs):
    # handle the signal here, e.g. read kwargs['event_id']
    pass

 

Some Design Decisions:

When a signal is sent, it may carry lots of information which your handler may or may not want; e.g. when you have multiple subscribers listening to the same signal, some of the information sent by the signal may not be of use to your specific function. Thus we decided to enforce the pattern below to ensure flexibility throughout the project.


@speakers_modified.connect
def new_handler(app, **kwargs):
    # do whatever you want to do with kwargs['event_id']
    pass

In this case, the function new_handler needs to perform some task solely based on the event_id. If the function were of the form def new_handler(app, event_id), an error would be raised as soon as the signal is sent with any additional arguments. A big plus of this approach is that if you later want to send some more information with the signal, for example a speaker_name, this pattern ensures that no error is raised by any of the subscribers defined before that change was made.
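
A compact, self-contained sketch of this behaviour (the names are illustrative, not project code):

from blinker import Namespace

event_signals = Namespace()
speakers_modified = event_signals.signal('speakers_modified')

@speakers_modified.connect
def old_handler(sender, **kwargs):
    # Written before speaker_name existed; it keeps working because it
    # only picks what it needs out of kwargs.
    print('event updated:', kwargs['event_id'])

@speakers_modified.connect
def new_handler(sender, **kwargs):
    print('speaker changed:', kwargs.get('speaker_name'))

# Sending extra data does not break old_handler.
speakers_modified.send('app', event_id=1, speaker_id=7, speaker_name='Jane Doe')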

When to use signals and when not ?

The call to send a signal will, of course, lie inside another function. The signal and that function should be independent of each other. If the task done by any of the signal subscribers even remotely affects your current function, a signal shouldn't be used; use a function call instead.

How to turn off signals while testing?

When in testing mode, signals may slow down your testing, as signal subscribers which are completely independent of the function being tested will be executed numerous times. To turn off execution of the signal subscribers, you have to make a small change to the send function of the blinker library.

Below is what we have done. The approach to turn it off may differ from project to project as the method of testing differs. Refer https://github.com/jek/blinker/blob/master/blinker/base.py#L241 for the original function.


from blinker import Namespace, Signal

# `app` below refers to the Flask application object
def new_send(self, *sender, **kwargs):
    if len(sender) == 0:
        sender = None
    elif len(sender) > 1:
        raise TypeError('send() accepts only one positional argument, '
                        '%s given' % len(sender))
    else:
        sender = sender[0]
    # only this line was changed
    if not self.receivers or app.config['TESTING']:
        return []
    else:
        return [(receiver, receiver(sender, **kwargs))
                for receiver in self.receivers_for(sender)]

Signal.send = new_send

event_signals = Namespace()
# and so on ....

That’s all for now. Have some fun signaling 😉 .

 

Set proper content type when uploading files on s3 with python-magic

In the open-event-orga-server project, we have been using Amazon S3 storage for a long time. After some time we encountered an issue: no matter what the file type was, the Content-Type when retrieving these files from the storage solution was application/octet-stream.

An example response when retrieving an image from s3 was as follows:


Accept-Ranges →bytes
Content-Disposition →attachment; filename=HansBakker_111.jpg
Content-Length →56060
Content-Type →application/octet-stream
Date →Fri, 09 Sep 2016 10:51:06 GMT
ETag →"964b1d839a9261fb0b159e960ceb4cf9"
Last-Modified →Tue, 06 Sep 2016 05:06:23 GMT
Server →AmazonS3
x-amz-id-2 →1GnO0Ta1e+qUE96Qgjm5ZyfyuhMetjc7vfX8UWEsE4fkZRBAuGx9gQwozidTroDVO/SU3BusCZs=
x-amz-request-id →ACF274542E950116

 

As seen above, instead of image/jpeg the Content-Type provided is application/octet-stream. While uploading the files, we were not providing the content type explicitly, which seemed to be the root of the problem.

It was decided that we would provide the content type explicitly, so it was time to choose an efficient library to determine the file type based on the content of the file and not the file extension. After researching the available libraries, python-magic seemed to be the obvious choice. python-magic is a Python interface to the libmagic file type identification library. libmagic identifies file types by checking their headers against a predefined list of file types.

Here is an example straight from python-magic's readme on its usage:


>>> import magic
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
>>> magic.from_buffer(open("testdata/test.pdf").read(1024))
'PDF document, version 1.2'
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'

 

Given below is a code snippet for the s3 upload function in the project:


file_data = file.read()
file_mime = magic.from_buffer(file_data, mime=True)
size = len(file_data)
# k is defined as k = Key(bucket) in previous code
sent = k.set_contents_from_string(
    file_data,
    headers={
        'Content-Disposition': 'attachment; filename=%s' % filename,
        'Content-Type': '%s' % file_mime
    }
)

 

One thing to note: since python-magic depends on libmagic and many distros do not come with the libmagic development files pre-installed, make sure you install libmagic-dev explicitly. (Installation instructions may vary per distro.)


sudo apt-get install libmagic-dev

Voila! Now, when retrieving each and every file, you'll get the proper content type.

 

Using custom themes with Yaydoc to build documentation

What is Yaydoc?

Yaydoc aims to be a one-stop solution for all your documentation needs. It is continuously integrated with your repository and builds the site on each commit. One of its primary aims is to minimize user configuration. It is currently in active development.

Why Themes?

Themes give the user the ability to generate visually different sites from the same markup documents without any configuration. This is one of the many features Yaydoc inherits from Sphinx.

Now Sphinx comes with 10 built-in themes, but there are many more custom themes available on PyPI, the official Python package repository. To use these custom themes, Sphinx requires some setup. But Yaydoc, being an automated system, needs to perform those tasks automatically.

To use a custom theme which has been installed, Sphinx needs to know the name of the theme and where to find it. We do that by specifying two variables in the Sphinx configuration file: html_theme and html_theme_path respectively. Custom themes usually provide a method that can be called to get the html_theme_path of the theme, conventionally named get_html_theme_path, but that is not always the case, so we have no way to find the appropriate method automatically.
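
For a single known theme, the manual setup in conf.py looks roughly like this (sphinx_rtd_theme is just one example of a PyPI theme that exposes such a helper); Yaydoc's problem is doing the equivalent for an arbitrary theme name:

# conf.py -- manual setup for one specific custom theme (example only)
import sphinx_rtd_theme

html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]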

So how do we get the path of an installed theme just by its name, and how do we add it to the generated configuration file?

The configuration file is generated by the sphinx-quickstart command which Yaydoc uses to initialize the documentation directory. We can override the default generated files by providing our own project templates. The templates are based on the Jinja2 template engine.

Firstly, I replaced

html_theme = 'alabaster'

With

html_theme = '{{ html_theme }}'

This gives us the ability to pass the name of the theme as a parameter to sphinx-quickstart. Now the user has the option to choose between the 10 built-in themes. For custom themes, however, it is a different story. I had to solve two major issues.

  • The name of the package and the theme may differ.
  • We also need the absolute path to the theme.

The following code snippet solves the above mentioned problems.

{% if html_theme in (['alabaster', 'classic', 'sphinxdoc', 'scrolls',
'agogo', 'traditional', 'nature', 'haiku',
'pyramid', 'bizstyle'])
%}
# Theme is builtin. Just set the name
html_theme = '{{ html_theme }}'
{% else %}
# Theme is a custom python package. Lets install it.
import pip
exitcode = pip.main(['install', '{{ html_theme }}'])
if exitcode:
    # Non-zero exit code
    print("""{0} is not available on pypi. Please ensure the theme can be installed using 'pip install {0}'.""".format('{{ html_theme }}'), file=sys.stderr)
else:
    import {{ html_theme }}
    def get_path_to_theme():
        package_path = os.path.dirname({{ html_theme }}.__file__)
        for root, dirs, files in os.walk(package_path):
            if 'theme.conf' in files:
                return root
    path_to_theme = get_path_to_theme()
    if path_to_theme is None:
        print("\n{0} does not appear to be a sphinx theme.".format('{{ html_theme }}'), file=sys.stderr)
        html_theme = 'alabaster'
    else:
        html_theme = os.path.basename(path_to_theme)
        html_theme_path = [os.path.abspath(os.path.join(path_to_theme, os.pardir))]
{% endif %}

It performs the following tasks in order:

  • It first checks if the provided theme is one of the built-in themes. If that is indeed the case, we just set the html_theme config value to the name of the theme.
  • Otherwise, it installs the package using pip.
  • Now __file__ has a special meaning in Python: it gives us the path of the module. We use it to get the path of the installed package.
  • Each Sphinx theme must have a file named `theme.conf` which defines several properties of the theme. We do a recursive search for that file.
  • We set html_theme to the name of the directory which contains that file, and html_theme_path to its parent directory.

Now let's see everything in action. Here are four pages created by Yaydoc from a single markup document with no user configuration.

 

Now you can choose between many of the themes available on PyPI. You can even create your own theme. Follow this blog to get more insights and the latest news about Yaydoc.

 

How to teach SUSI skills calling an External API

SUSI is an intelligent personal assistant. SUSI can learn skills to understand and respond to user queries better. A skill is taught using rules. Writing rules is an easy task, and one doesn't need any programming background either. Anyone can start contributing. Check out these tutorials and do watch this video to get started and start teaching SUSI.

SUSI can be taught to call external APIs to answer user queries.

While writing skills we first mention string patterns to match the user’s query and then tell SUSI what to do with the matched pattern. The pattern matching is similar to regular expressions and we can also retrieve the matched parameters using $<parameter number>$ notation.

Example :

 My name is *
 Hi $1$!

When the user inputs “My name is Uday” , it is matched with “My name is *” and “Uday” is stored in $1$. So the output given is “Hi Uday!”.

SUSI can call an external API to reply to user query. An API endpoint or url when called must return a JSON or JSONP response for SUSI to be able to parse the response and retrieve the answer.

Rule Format for a skill calling an external API

The rule format for calling an external API is:

<regular expression for pattern matching>
!console: <return answer using $object$ or $required_key$>
{
  "url": "<API endpoint or url>",
  "path": "$.<key in the API response to find the answer>"
}
eol

  • url is the API endpoint to be called; it must return a JSON or JSONP response.
    Parameters to the url, if any, can be added using the $$ notation.
  • path tells SUSI where to look for the answer in the returned response.
    If the path points to a root element, then the answer is stored in $object$; otherwise we can query $key$ to get the answer, which is the value of that key under the path.
  • eol (end of line) indicates the end of the rule.

Understanding the Path Attribute

Let us understand the Path attribute better through some test cases.

In each of the test cases we discuss what the path should be and how to retrieve the answer for a given required answer from the JSON response of an API.

  1. API response in JSON:

    {
      "Key1": "Value1"
    }

Required answer: Value1
Path: $.Key1  =>  Retrieve Answer: $object$

 

  2. API response in JSON:

    {
      "Key1": [{"Key11": "Value11"}]
    }

Required answer: Value11
Path: $.Key1[0]  =>  Retrieve Answer: $Key11$
Path: $.Key1[0].Key11  =>  Retrieve Answer: $object$

 

  3. API response in JSON:

    {
      "Key1": {"Key11": "Value11"}
    }

Required answer: Value11
Path: $.Key1  =>  Retrieve Answer: $Key11$
Path: $.Key1.Key11  =>  Retrieve Answer: $object$

 

  4. API response in JSON:

    {
      "Key1": {
        "Key11": "Value11",
        "Key12": "Value12"
      }
    }

Required answer: Value11, Value12
Path: $.Key1  =>  Retrieve Answer: $Key11$, $Key12$

Where to write these rules?

Now that we know how to write rules, let's see where to write them.

We use etherpads to write and test rules and once we finish testing our rule we can push those rules to the repo.

Steps to open, write and test rules:

  1. Open a new etherpad with a desired name <etherpad name> at http://dream.susi.ai/
  2. Write your skills code in the etherpad following the code format explained above.
  3. Now, to test your skill let’s chat with susi. Start a conversation with susi at http://susi.ai/chat to test your skills.
  4. Load your skills by typing dream <etherpad name> and wait for a response saying dreaming enabled for <etherpad name>
  5. Test your skill and follow step 4 every time you make changes to the code in your etherpad.
  6. After you are done testing, type stop dreaming and if you are satisfied with your skill do send a PR to help susi learn.

Examples

Let us try an example to understand this better.

1. Plot of a TV Show

TVmaze is an open TV API that provides information about TV shows. Let us write a rule to find the plot of a TV show. We can find many such APIs; check out this link listing a few of them.

  1. Open an etherpad at http://dream.susi.ai/ named tvshowplot.

  2. Enter the code to query the plot of a TV show in the etherpad at http://dream.susi.ai/p/tvshowplot:

  * plot of *|* summary of *
  !console:$object$
  {
      "url":"http://api.tvmaze.com/singlesearch/shows?q=$2$",
      "path":"$.summary"
  }
  eol
  3. Now let's test our skill by starting a conversation with SUSI at http://susi.ai/chat.
  • User Query: dream tvshowplot
    Response:  dreaming enabled  for tvshowplot
  • User Query: what is the plot of legion
    Response: Legion introduces the story of David Haller: Since he was a teenager, David has struggled with mental illness. Diagnosed as schizophrenic, David has been in and out of psychiatric hospitals for years. But after a strange encounter with a fellow patient, he’s confronted with the possibility that the voices he hears and the visions he sees might be real. He’s based on the Marvel comics character Legion, the son of X-Men founder Charles Xavier (played by Patrick Stewart and James McAvoy in the films), first introduced in 1985.

Intermediate Processing:

Pattern Matching: $1$ = "what is the"; $2$ = "legion"

Url : http://api.tvmaze.com/singlesearch/shows?q=legion

API response:

{
   "id": 6393,
   "url": "http:\/\/www.tvmaze.com\/shows\/6393\/legion",
   "name": "Legion",
   "type": "Scripted",
   "language": "English",
   "genres": [
     "Drama",
     "Action",
     "Science-Fiction"
   ], 
   "summary": "<p><strong>Legion<\/strong> introduces the story of David Haller: Since he was a teenager, David has struggled with mental illness. Diagnosed as schizophrenic, David has been in and out of psychiatric hospitals for years. But after a strange encounter with a fellow patient, he's confronted with the possibility that the voices he hears and the visions he sees might be real. He's based on the Marvel comics character Legion, the son of X-Men founder Charles Xavier (played by Patrick Stewart and James McAvoy in the films), first introduced in 1985.<\/p>",
   "updated": 1491955072,  
 }

Note: The API response has been trimmed to show only the relevant content.

Path : $.summary

Retrieving Answer: Our required answer in the API response is under the key summary and is retrieved using $object$, since it is a root element.

 

Screenshots:

2. Cooking Recipes

Let us try it out with another API.
Recipe Puppy is a cooking recipe API where users can query various recipes.

  1. Open an etherpad at http://dream.susi.ai/ named recipe.

  2. Enter the code to query a recipe in the etherpad at http://dream.susi.ai/p/recipe:
#Gives recipes and links to cook a dish
* cook *
!console:<p>To cook <strong>$title$</strong>: <br>The ingredients required are: $ingredients$. <br>For instructions to prepare the dish $href$</p>
{
  "url":"http://www.recipepuppy.com/api/?q=$2$",
  "path":"$.results"
}
eol
  3. Now let's test our skill by starting a conversation with SUSI at http://susi.ai/chat.
  • User Query: dream recipe
    Response:  dreaming enabled  for recipe
  • User Query: how to cook chicken biryani
    Response: To cook Chicken Biryani Recipe:
    The ingredients required are: chicken, seeds, chicken broth, rice, butter, peas, garlic, red onions, cardamom, curry paste, olive oil, tomato, coriander, cumin, brown sugar, tumeric.
    For instructions to prepare the dish Click Here!

Intermediate Processing:

Pattern Matching: $1$ = "how to"; $2$ = "chicken biryani"

Url : http://www.recipepuppy.com/api/?q=chicken biryani

API response:

{
   "title": "Recipe Puppy",
   "version": 0.1,
   "href": "http:\/\/www.recipepuppy.com\/",
   "results": [
     {
       "title": "Chicken Biryani Recipe",
       "href": "http:\/\/www.grouprecipes.com\/53040\/chicken-biryani.html",
       "ingredients": "chicken, seeds, chicken broth, rice, butter, peas, garlic, red onions, cardamom, curry paste, olive oil, tomato, coriander, cumin, brown sugar, tumeric",
       "thumbnail": "http:\/\/img.recipepuppy.com\/413822.jpg"
     },
 ]
 }

Note: The API response has been trimmed to show only the relevant content.

Path : $.results[0]

Retrieving Answer: Our required answer in the API response is under the key results. Since it's an array, we use the first element, and since that element is a dictionary, we use its keys correspondingly to compose the answer. The $href$ is rendered as "Click Here", hyperlinked to the actual url.

 

Screenshots:

 

We have successfully taught SUSI a skill which tells users about the plot of a TV show and a skill to query recipes.
Cheers!
Following a similar procedure, we can make use of other APIs and teach SUSI several new skills.

 

Generating a documentation site from markup documents with Sphinx and Pandoc

Generating a fully fledged website from a set of markup documents is no easy feat, but the wonderful tool Sphinx certainly makes the task easier. Sphinx does the heavy lifting of generating a website with built-in JavaScript-based search. But sometimes it's not enough.

This week we were faced with two issues related to documentation generation on loklak_server and susi_server. First let me give you some context. Sphinx requires an index.rst file within /docs/ which it uses to generate the first page of the site. A very obvious way to fill it, which helps us avoid unnecessary duplication, is to use the include directive of reStructuredText to include the README file from the root of the repository.

This leads to the following two problems:

  • The include directive can only properly include a reStructuredText document, not a Markdown one. Given a Markdown document, it tries to parse the Markdown as reStructuredText, which leads to errors.
  • Any relative links in README break when it is included in another folder.

To fix the first issue, I used pypandoc, a thin wrapper around Pandoc. Pandoc is a wonderful command line tool which allows us to convert documents from one markup format to another. From the official Pandoc website itself,

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

pypandoc requires a working installation of Pandoc, which can be downloaded and installed automatically using a single line of code.

pypandoc.download_pandoc()

This gives us a cross-platform way to download pandoc without worrying about the current platform. Now, pypandoc leaves the installer in the current working directory after download, which is fine locally, but creates a problem when run on remote systems like Travis. The installer could get committed accidentally to the repository. To solve this, I had to take a look at the source code of pypandoc and call an internal method which pypandoc basically uses to set the name of the installer. I use that method to find out the name of the file and then delete it after installation is over. This is one of many benefits of open-source projects. Had pypandoc not been open source, I would not have been able to do that.

url = pypandoc.pandoc_download._get_pandoc_urls()[0][pf]
filename = url.split('/')[-1]
os.remove(filename)

Here pf is the current platform, which can be one of 'win32', 'linux' or 'darwin'.
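
The conversion itself is then a one-liner; here is a sketch of converting the README to reStructuredText (the exact invocation and file names in the project may differ):

import pypandoc

# Convert the Markdown README into reStructuredText so that it can be
# pulled into docs/index.rst via the include directive.
rst_content = pypandoc.convert_file('README.md', 'rst')
with open('docs/_README.rst', 'w') as f:
    f.write(rst_content)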

Now let's take a look at our second issue. To solve it, I used regular expressions to capture any relative links. Capturing links was easy. All links in reStructuredText are in the following format:

`Title <url>`__

Similarly, links in Markdown are in the following format:

[Title](url)

Regular expressions were the perfect candidate to solve this. To detect which links were relative and needed to be fixed, I checked which links start with the docs/ directory, and then all I had to do was remove the docs/ prefix from those links.
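
A rough sketch of that idea in Python (the patterns here are simplified assumptions, not the project's exact expressions):

import re

# Markdown links like [Title](docs/page.md) and reStructuredText links like
# `Title <docs/page.rst>`__ lose their docs/ prefix so that they resolve
# correctly once the README is included from within docs/.
MD_LINK = re.compile(r'\[([^\]]+)\]\(docs/([^)]+)\)')
RST_LINK = re.compile(r'`([^<`]+)<docs/([^>]+)>`__')

def fix_relative_links(text):
    text = MD_LINK.sub(r'[\1](\2)', text)
    text = RST_LINK.sub(r'`\1<\2>`__', text)
    return text

print(fix_relative_links('See the [install guide](docs/installation.md).'))
# -> See the [install guide](installation.md).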

A note about loklak and susi server project

Loklak is a server application which is able to collect messages from various sources, including twitter.

SUSI AI is an intelligent Open Source personal assistant. It is capable of chat and voice interaction, and it uses APIs to perform actions such as music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic and other real-time information.

Using NodeBuilder to instantiate node based Elasticsearch client and Visualising data

As elastic.co mentions, Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. But in many setups, it is not possible to manually install an Elasticsearch node on a machine. To handle these types of scenarios, Elasticsearch provides the NodeBuilder module, which can be used to spawn an Elasticsearch node programmatically. Let's see how.

Getting Dependencies

In order to get the ES Java API, we need to add the following line to dependencies.

compile group: 'org.elasticsearch', name: 'securesm', version: '1.0'

The required packages will be fetched the next time we run gradle build.

Configuring Settings

In the Elasticsearch Java API, Settings are used to configure the node(s). To create a node, we first need to define its properties.

Settings.Builder settings = new Settings.Builder();

settings.put("cluster.name", "cluster_name");  // The name of the cluster

// Configuring HTTP details
settings.put("http.enabled", "true");
settings.put("http.cors.enabled", "true");
settings.put("http.cors.allow-origin", "https?:\/\/localhost(:[0-9]+)?/");  // Allow requests from localhost
settings.put("http.port", "9200");

// Configuring TCP and host
settings.put("transport.tcp.port", "9300");
settings.put("network.host", "localhost");

// Configuring node details
settings.put("node.data", "true");
settings.put("node.master", "true");

// Configuring index
settings.put("index.number_of_shards", "8");
settings.put("index.number_of_replicas", "2");
settings.put("index.refresh_interval", "10s");
settings.put("index.max_result_window", "10000");

// Defining paths
settings.put("path.conf", "/path/to/conf/");
settings.put("path.data", "/path/to/data/");
settings.put("path.home", "/path/to/data/");

settings.build();  // Build with the assigned configurations

There are many more settings that can be tuned in order to get desired node configuration.

Building the Node and Getting Clients

The Java API makes it very simple to launch an Elasticsearch node. This example will make use of settings that we just built.

Node elasticsearchNode = NodeBuilder.nodeBuilder().local(false).settings(settings).node();

A piece of cake. Isn’t it? Let’s get a client now, on which we can execute our queries.

Client elasticsearchClient = elasticsearchNode.client();

Shutting Down the Node

elasticsearchNode.close();

A nice implementation of using the module can be seen at ElasticsearchClient.java in the loklak project. It uses the settings from a configuration file and builds the node using it.


Visualisation using elasticsearch-head

So by now, we have an Elasticsearch client which is capable of doing all sorts of operations on the node. But how do we visualise the data that is being stored? Writing code and running it every time to check results is a lengthy thing to do and significantly slows down the development/debugging cycle.

To overcome this, we have a web frontend called elasticsearch-head which lets us execute Elasticsearch queries and monitor the cluster.

To run elasticsearch-head, we first need to have grunt-cli installed –

$ sudo npm install -g grunt-cli

Next, we will clone the repository using git and install dependencies –

$ git clone git://github.com/mobz/elasticsearch-head.git
$ cd elasticsearch-head
$ npm install

Next, we simply need to run the server and go to the indicated address in a web browser –

$ grunt server

At the top, enter the location at which elasticsearch-head can interact with the cluster and Connect.

Upon connecting, the dashboard appears, showing the status of the cluster –

The dashboard shown above is from the loklak project (will talk more about it).

There are 5 major sections in the UI –
1. Overview: The above screenshot gives details about the indices and shards of the cluster.
2. Index: Gives an overview of all the indices. Also allows adding new indices from the UI.
3. Browser: Gives a browser window for all the documents in the cluster. It looks something like this –

The left pane allows us to set the filter (index, type and field). The table listed is sortable. But we don’t always get what we are looking for manually. So, we have the following two sections.
4. Structured Query: Gives a dead simple UI that can be used to make a well structured request to Elasticsearch. This is what we need to search for to get Tweets from @gsoc that are indexed –

5. Any Request: Gives an advanced console that allows executing any query allowable by the Elasticsearch API.

A little about the loklak project and Elasticsearch

loklak is a server application which is able to collect messages from various sources, including twitter. The server contains a search index and a peer-to-peer index sharing interface. All messages are stored in an elasticsearch index.

Source: github/loklak/loklak_server

The project uses Elasticsearch to index all the data that it collects. It uses NodeBuilder to create Elasticsearch node and process the index. It is flexible enough to join an existing cluster instead of creating a new one, just by changing the configuration file.

Conclusion

This blog post tries to explain how NodeBuilder can be used to create Elasticsearch nodes and how they can be configured using Elasticsearch Settings.

It also demonstrates the installation and basic usage of elasticsearch-head, which is a great library to visualise and check queries against an Elasticsearch cluster.

The official Elasticsearch documentation is a good source of reference for its Java API and all other aspects.

Building the Scheduler UI

{ Repost from my personal blog @ https://blog.codezero.xyz/building-the-scheduler-ui }

If you hadn't already noticed, Open Event has got a shiny new feature: a graphical and interactive scheduler to organize sessions into their respective rooms and timings.

As you can see in the above screenshot, we have a timeline on the left and a lot of session boxes to its right. All the boxes are resizable and drag-and-droppable. The columns represent the different rooms (a.k.a. micro-locations). The sessions can be dropped into their respective rooms. Above the timeline is a toolbar that controls the date. The timeline can be changed for each date by clicking on the respective date button.

The Clear overlaps button automatically checks the timeline and removes any sessions that overlap each other. The removed sessions are moved to the unscheduled sessions pane on the left.

The Add new micro-location button can be used to instantly add a new room. A modal dialog would open and the micro-location will be instantly added to the timeline once saved.

The Export as iCal allows the organizer to export all the sessions of that event in the popular iCalendar format which can then be imported into various calendar applications.

The Export as PNG saves the entire timeline as a PNG image file, which can then be printed by the organizers or circulated via other means if necessary.

Core Dependencies

The scheduler makes use of some JavaScript libraries for the implementation of most of the core functionality:

  • Interact.js – For drag-and-drop and resizing
  • Lodash – For array/object manipulations and object cloning
  • jQuery – For DOM Manipulation
  • Moment.js – For date time parsing and calculation
  • Swagger JS – For communicating with our API that is documented according to the swagger specs.

Retrieving data via the API

The Swagger JS client is used to obtain the sessions data via the API. The client is asynchronously initialized on page load and can be accessed from anywhere using the JavaScript function initializeSwaggerClient.

The swagger initialization function accepts a callback which is called immediately if the client is already initialized; otherwise, the callback is called once initialization completes.

var swaggerConfigUrl = window.location.protocol + "//" + window.location.host + "/api/v2/swagger.json";  
window.swagger_loaded = false;  
function initializeSwaggerClient(callback) {  
    if (!window.swagger_loaded) {
        window.api = new SwaggerClient({
            url: swaggerConfigUrl,
            success: function () {
                window.swagger_loaded = true;
                if (callback) {
                    callback();
                }

            }
        });
    } else {
        if (callback) {
            callback();
        }
    }
}

To get all the sessions of an event, we can do:

initializeSwaggerClient(function () {  
    api.sessions.get_session_list({event_id: eventId}, function (sessionData) {
        var sessions = sessionData.obj;  // Here we have an array of session objects
    });
});

In a similar fashion, all the micro-locations of an event can also be loaded.

Processing the sessions and micro-locations

Each session object is looped through; its start time and end time are parsed into moment objects, its duration is calculated, and its distance from the top of the timeline is calculated in pixels. The new object, with the additional information, is stored in an in-memory data store, with the date of the event as the key, for use in the timeline.

The time configuration is specified in a separate time object.

var time = {  
    start: {
        hours: 0,
        minutes: 0
    },
    end: {
        hours: 23,
        minutes: 59
    },
    unit: {
        minutes: 15,
        pixels: 48,
        count: 0
    },
    format: "YYYY-MM-DD HH:mm:ss"
};

The smallest unit of measurement is 15 minutes and 48px === 15 minutes in the timeline.

Each day of the event is stored in a separate array in the form of Do MMMM YYYY (e.g. 2nd May 2013).

The array of micro-location objects is sorted alphabetically by the room name.

Displaying sessions and micro-locations on the timeline

According to the day selected, the sessions for that day are displayed on the timeline. Based on their time, the distance of the session div from the top of the timeline is calculated in pixels and the session box is positioned absolutely. The height of the session in pixels is calculated from its duration and set.

For pixels-minutes conversion, the following are used.

/**
 * Convert minutes to pixels based on the time unit configuration
 * @param {number} minutes The minutes that need to be converted to pixels
 * @returns {number} The pixels
 */
function minutesToPixels(minutes) {  
    minutes = Math.abs(minutes);
    return (minutes / time.unit.minutes) * time.unit.pixels;
}

/**
 * Convert pixels to minutes based on the time unit configuration
 * @param {number} pixels The pixels that need to be converted to minutes
 * @returns {number} The minutes
 */
function pixelsToMinutes(pixels) {  
    pixels = Math.abs(pixels);
    return (pixels / time.unit.pixels) * time.unit.minutes;
}

Adding interactivity to the session elements

Interact.js is used to provide interactive capabilities such as drag-drop and resizing.

To know how to use Interact.js, you can check out some previous blog posts on the same: Interact.js + drag-drop and Interact.js + resizing.

Updating the session information in database on every change

We have to update the session information in the database whenever a session is moved or resized. Every time a session is moved or resized, a jQuery event is triggered on $(document) with the session object as the payload.

We listen to this event, and make an API request with the new session object to update the session information in the database.


The scheduler UI is more complex than described in this blog post. To know more about it, you can check out the scheduler's JavaScript code at app/static/js/admin/event/scheduler.js.