Using Marshmallow Fields in Open Event API Server

The nextgen Open Event API Server provides API endpoints to fetch data, and to modify and update it. These endpoints have been written using flask-rest-jsonapi, which is a flask extension to build APIs around the specifications provided by JSONAPI 1.0. This extension helps you, quoting from their website:

flask-rest-jsonapi’s data abstraction layer lets us expose the resources in a flexible way. This is achieved by using Marshmallow fields from marshmallow-jsonapi. This blog post explains how we use marshmallow fields to build API endpoints in Open Event API Server.

The marshmallow library is used to serialize, deserialize and validate input data. Marshmallow uses classes to define output schemas, which makes the schemas easier to reuse, configure and extend. When we write the API Schema for any database model from the Open Event models, all the columns have to be added as schema fields in the API class.

This is the API Server’s event schema using marshmallow fields:

These are the Marshmallow Field classes for various types of data. You can pass the following parameters when creating a field object; the ones used in the API Server are described below. For the rest, you can read more in the marshmallow docs.

Let’s take a look at each of these fields. Each of the following snippets shows how a field is written for the API Schema.

identifier = fields.Str(dump_only=True)
  • This is a field of data-type String.
  • dump_only: This field is skipped during deserialization because it is set to True here. Setting it to True essentially marks `identifier` as read-only (for the HTTP API).
name = fields.Str(required=True)
  • This again is a field of data-type String.
  • This is a required field, so a ValidationError is raised if it is missing during deserialization. Taking a look at the database backend:

Since this field is set to non-nullable in the database model, it is made required in API Schema.

external_event_url = fields.Url(allow_none=True)
  • This is a field of datatype URL.
  • Since this is not a required field, NULL values are allowed for this column in the database model. To reflect the same in the API, we have to add allow_none=True. If allow_none is left unset, it defaults to False (unless missing=None is set, in which case it defaults to True).
ends_at = fields.DateTime(required=True, timezone=True)
  • Field of datatype DateTime
  • It is a required field for an event and the time used here is timezone aware.
latitude = fields.Float(validate=lambda n: -90 <= n <= 90, allow_none=True)
  • Field of datatype Float.
  • In marshmallow fields, we can run validators on the input value of a field using the validate parameter. Here the validator returns a boolean; if it returns False, a ValidationError is raised. Validators are called during deserialization.
is_map_shown = fields.Bool(default=False)
  • Field of datatype boolean.
  • A default value for a marshmallow field can be set with the default parameter. Here, the is_map_shown attribute defaults to False for an event.
privacy = fields.Str(default="public")
  • privacy is set to default “public”.

When the value for a field is missing during serialization, the default value is used. This parameter can be either a value or a callable.

As the examples above show, you write a field as fields.<DataType>(*parameters to the marshmallow.fields.Field constructor*).

The parameters passed to the class constructor must reflect the column definition in the database model, else you might run into unexpected errors.


An example from Open Event development: null values could not be posted even for nullable columns. This happened because allow_none defaults to False in the schema, and it has to be explicitly set to True in order to accept null values. (See the issue "Make non-required attributes nullable" and the pull request made to fix it.)
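To see these parameters in isolation, here is a small, hypothetical, standard-library-only model of how they behave. It does not use marshmallow and is not its implementation; FieldSpec, load and dump are illustrative names. The rules it encodes are the ones described in this post: dump_only fields are skipped on input, required fields must be present, None is rejected unless allow_none=True, validators run only on non-null values, and default fills in missing values on output.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

_MISSING = object()  # sentinel meaning "no default configured"


class ValidationError(Exception):
    """Raised with a dict of field name -> error message (stand-in, not marshmallow's class)."""


@dataclass
class FieldSpec:
    required: bool = False
    allow_none: bool = False
    dump_only: bool = False
    validate: Optional[Callable[[Any], bool]] = None
    default: Any = _MISSING  # used only when serializing


def load(specs: Dict[str, FieldSpec], payload: dict) -> dict:
    """Deserialize input data according to the field rules."""
    result, errors = {}, {}
    for name, spec in specs.items():
        if spec.dump_only:
            continue  # read-only: ignored on input
        if name not in payload:
            if spec.required:
                errors[name] = "Missing data for required field."
            continue
        value = payload[name]
        if value is None:
            if spec.allow_none:
                result[name] = None
            else:
                errors[name] = "Field may not be null."
            continue  # validators are not run on None
        if spec.validate is not None and not spec.validate(value):
            errors[name] = "Invalid value."
            continue
        result[name] = value
    if errors:
        raise ValidationError(errors)
    return result


def dump(specs: Dict[str, FieldSpec], obj: dict) -> dict:
    """Serialize, falling back to each field's default (a value or a callable) when missing."""
    out = {}
    for name, spec in specs.items():
        if name in obj:
            out[name] = obj[name]
        elif spec.default is not _MISSING:
            out[name] = spec.default() if callable(spec.default) else spec.default
    return out


# The event fields discussed above, expressed with the hypothetical FieldSpec
EVENT_FIELDS = {
    "identifier": FieldSpec(dump_only=True),
    "name": FieldSpec(required=True),
    "external_event_url": FieldSpec(allow_none=True),
    "latitude": FieldSpec(validate=lambda n: -90 <= n <= 90, allow_none=True),
    "is_map_shown": FieldSpec(default=False),
    "privacy": FieldSpec(default="public"),
}
```

For example, load(EVENT_FIELDS, {"latitude": 120}) fails on two counts: name is missing and 120 is outside the latitude range. Posting "external_event_url": None succeeds only because that field has allow_none=True, which is exactly the Open Event bug described above.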

Fields represent a database model column and are serialized and deserialized, so that these can be used in any format, like JSON objects which we use in API server. Each field corresponds to an attribute of the object type like location, starts-at, ends-at, event-url for an event. marshmallow allows us to define data-types for the fields, validate input data and enforce column-level constraints from the database model.

This list is not exhaustive of all the parameters available for marshmallow fields. To read further about them and marshmallow, check out their documentation.

Additional Resources

Code involved in API Server:

Running ngrok To Use Local Open Event API Server Endpoints From Public Access URL

How to set up and run ngrok?
To run ngrok, you need to download it from the ngrok website. The download page can be found here.

Once you have downloaded the zip file, you’ll need to unzip it. On Linux or macOS, run this in the terminal:

$ unzip /path/to/ngrok.zip

To expose the web server running on your local machine, run the following from inside the directory where you have unzipped ngrok:

./ngrok http 80

This syntax breaks down as follows:
ngrok :: terminal command

http :: protocol of the server that is to be tunneled

(ngrok also lets you open and run TCP and TLS tunnels.)

80 :: local port that the tunnel forwards traffic to

(If you are not sure which port your server is running on, it is probably 80, the default for HTTP.)

The Open Event API server runs on port 5000 and provides an HTTP API, so the command we’ll use here is

./ngrok http 5000

Once you run this command, ngrok opens its UI in the terminal itself. This contains the public URL of your tunnel along with other stats related to the requests being made and the traffic on localhost.

Starting ngrok:

Public URL updated:

ngrok also offers a web interface where you can see the requests and other data shown in the terminal. To use it, go to http://localhost:4040/inspect/http. This web interface inspects and records each request, so that you can replay requests for debugging or for cross-checking metrics. Recording can be turned off by passing an argument to ngrok, so that requests are no longer recorded. When running a production server, turning it off can help both to keep the requests secure and to reduce request handling times when scaling. To read more about advanced options, please read the ngrok documentation.

Running Open Event API server on the public URL:
Since now we have localhost:5000 tunnelled over a public url, we’ll use that to make requests to the API server.

A GET request for /v1/events:

The request made to the public URL, which in this case is http://9a5ac170.ngrok.io, is equivalent to a request to http://localhost:5000 on my local setup of the Open Event API Server. When the request is made, it is routed to the EventList class, and the GET method of ResourceList registered for the URL endpoint ‘event_list’ is called. This returns a list of events from the database my server is currently using, i.e. my local database.

A DELETE request for /v1/events/1

In a similar fashion, when this request is made, the event id is parsed from the URL into view_kwargs and the equivalent request DELETE http://localhost:5000/v1/events/1 is made, which deletes the event with id = 1 and returns a success object as shown in the screenshot above.

An ngrok tunnel is initiated from the client side first, after which it negotiates a secure channel with the ngrok server; this is a very slick way to work around standard firewall configurations. One thing to keep in mind is that as soon as you quit the terminal UI, the tunnel is closed, and re-running ngrok creates a new public URL, so you might have to share it again. To prevent this, you can specify a custom URL with a CNAME configuration. You can also run ngrok with a custom URL on your own domain, which can be used to run development servers for sub-projects under a particular domain. Adding auth improves security and can, for example, restrict usage to people from the same organization.

You can also access the documentation page directly with the public url.

Adding auth to protect Open Event localhost:
Anyone with the public URL can access your localhost and make changes. To prevent this, we can use the -auth option of the ngrok command, which enforces HTTP Basic Auth on each request.

ngrok http -auth="username:password" 5000
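With -auth in place, every client has to send Basic Auth credentials. A minimal Python sketch using only the standard library (urllib.request and base64), assuming the username:password pair from the command above and the example tunnel URL from this post, shows what the earlier GET and DELETE requests look like with the Authorization header attached. It only builds the requests and does not send them.

```python
import base64
import urllib.request

# Example tunnel URL from this post; ngrok assigns a new one on every run.
PUBLIC_URL = "http://9a5ac170.ngrok.io"


def build_request(path, username, password, method="GET", base_url=PUBLIC_URL):
    """Build (but do not send) a request to the tunnelled API with HTTP Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method=method,
        headers={
            "Authorization": f"Basic {token}",
            # JSON:API media type used by flask-rest-jsonapi endpoints
            "Accept": "application/vnd.api+json",
        },
    )


list_events = build_request("/v1/events", "username", "password")
delete_event = build_request("/v1/events/1", "username", "password", method="DELETE")
# To actually send one: urllib.request.urlopen(list_events)
```

Without the header (or with wrong credentials) ngrok rejects the request before it reaches localhost:5000.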

Apart from these, ngrok tunnels can be used for file system sharing, mobile device testing and also to build webhooks integrations.

Additional Resources

Working With Inter-related Resource Endpoints In Open Event API Server

For each resource object, we have the following endpoints related to it:

– GET, POST for List

– GET, PATCH, DELETE for Detail

– GET, POST, PATCH, DELETE for Relationship

In this blog post I will discuss how the inter-related resource endpoints work. These are endpoints that involve two resource objects which are both related to a common third resource object.

This post discusses the endpoints related to the Session model.

In the API server, there is a relationship between events and sessions. Apart from this, a session also has relationships with microlocations, tracks, speakers and session types. Let’s take a look at the endpoints related to these.

`/v1/tracks/<int:track_id>/sessions` is a list endpoint which can be used to list and create the sessions related to a particular track of an event. To get the list we define the query() method in ResourceList class as such:

The query method is executed for GET requests, so this if clause looks for track_id in the view_kwargs dict. When the request is made to `/v1/tracks/<int:track_id>/sessions`, the track id is present as ‘track_id’ in view_kwargs. Tracks are filtered on the id passed here, and the result is then joined with the query over all session objects from the database.
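The idea of filtering on track_id from view_kwargs and joining with sessions can be sketched in plain SQL. The snippet below uses Python’s built-in sqlite3 rather than SQLAlchemy, and the tables, columns and sample rows are hypothetical simplifications of the Open Event models; it only shows which rows query() would return for each URL.

```python
import sqlite3


def make_demo_db():
    """Hypothetical, trimmed-down tracks/sessions tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tracks (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE sessions (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            track_id INTEGER REFERENCES tracks (id)
        );
        INSERT INTO tracks VALUES (1, 'Web'), (2, 'AI');
        INSERT INTO sessions VALUES
            (1, 'Flask APIs', 1), (2, 'Ember.js', 1), (3, 'Deep Learning', 2);
        """
    )
    return conn


def query_sessions(conn, view_kwargs):
    """Sessions for /v1/tracks/<track_id>/sessions, or all sessions otherwise."""
    if view_kwargs.get("track_id") is not None:
        return conn.execute(
            "SELECT sessions.id, sessions.title FROM sessions "
            "JOIN tracks ON sessions.track_id = tracks.id "
            "WHERE tracks.id = ? ORDER BY sessions.id",
            (view_kwargs["track_id"],),
        ).fetchall()
    return conn.execute("SELECT id, title FROM sessions ORDER BY id").fetchall()


conn = make_demo_db()
print(query_sessions(conn, {"track_id": 1}))
```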

For the POST method, we need to add the track_id from view_kwargs to pass into the track_id field of database model. This is achieved by using flask-rest-jsonapi’s before_create_object() method. The implementation for track_id is the following:

When a POST request is made to `/v1/tracks/<int:track_id>/sessions`, the view_kwargs dict has ‘track_id’ in it. So if track_id is present in the URL params, we first ensure that a track with the passed id exists, and only then proceed to create a session object under the given track. The safe_query() method is a generic custom helper written for such checks: the model is passed along with the filter attribute and value, and a parameter name to include in the error message. It throws an ObjectNotFound exception if no object exists for the given id.
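A minimal stdlib-only sketch of this check, assuming plain lists of dicts in place of SQLAlchemy models: safe_query, ObjectNotFound and before_create_object below are simplified, hypothetical versions written for illustration (the real helper queries the database), and the error message format is assumed.

```python
class ObjectNotFound(Exception):
    """Hypothetical stand-in for the API's 404 error."""

    def __init__(self, parameter, detail):
        super().__init__(detail)
        self.parameter = parameter  # which URL parameter pointed at a missing object
        self.detail = detail


def safe_query(rows, column_name, value, parameter_name):
    """Return the first row whose column matches value, or raise ObjectNotFound."""
    for row in rows:
        if row.get(column_name) == value:
            return row
    raise ObjectNotFound(parameter_name, f"{parameter_name}: {value} not found")


def before_create_object(data, view_kwargs, tracks):
    """Copy track_id from the URL into the new session's data after checking it exists."""
    track_id = view_kwargs.get("track_id")
    if track_id is not None:
        track = safe_query(tracks, "id", track_id, "track_id")
        data["track_id"] = track["id"]
    return data


TRACKS = [{"id": 1, "name": "Web"}, {"id": 2, "name": "AI"}]  # sample data
```

The design point is that the existence check happens before anything is written, so a POST to a non-existent track fails with a clear 404-style error instead of creating an orphaned session.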

We also need to take care of the permissions for these endpoints. As the decorators are called even before schema validation, it was difficult to get the event_id for permissions without adding highly endpoint-specific code to the permission manager core, which would cost generality. So the leave_if parameter of the permission check was used to overcome this issue. Since the permission manager isn’t fully developed yet, this is to be changed in its improved implementation.

Similar implementations were done for microlocations and session types; they are not explained in this blog post. For the extended code, take a look at the source code of this schema.

For speaker relation, a few things were different. This is because speakers-sessions is a many-to-many relationship. Let’s take a look at this:

As it is a many-to-many relationship, an association table was used with flask-sqlalchemy. So the query() method queries this association table after extracting the speaker_id from the view_kwargs dict.

For POST requests on `/v1/speakers/<int:speaker_id>/sessions`, flask-rest-jsonapi’s after_create_object() method is used to insert a row into the association table. The method takes the following parameters: self, obj, data, view_kwargs.

Now view_kwargs contains the URL parameters, so we check for speaker_id in view_kwargs. If it is present, then before inserting any data, we ensure that a speaker exists with that id using the safe_query() method described above. After that, the obj argument of the method is used: it contains the object that was created in the previous step. So once the session object has been created and we are sure that a speaker exists with the given speaker_id, the speaker just needs to be appended to obj.speakers so that this relationship tuple is inserted into the association table.

This updates the association table ‘speakers_session’ in this case. The other such endpoints are being worked upon in a similar fashion and will be consolidated as part of a set of robust APIs, along with the improved permissions manager for the Open Event API Server.
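Both halves of the speaker case can be sketched with Python’s built-in sqlite3, standing in for flask-sqlalchemy. The table layout, the sample rows, and the functions sessions_for_speaker and after_create_session are hypothetical simplifications; in the real code, appending to obj.speakers makes SQLAlchemy issue the INSERT into the association table, which this sketch does by hand.

```python
import sqlite3


class ObjectNotFound(Exception):
    """Hypothetical stand-in for the API's 404 error."""


SCHEMA = """
CREATE TABLE speakers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sessions (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- association table: one row per (speaker, session) pair
CREATE TABLE speakers_session (
    speaker_id INTEGER NOT NULL REFERENCES speakers (id),
    session_id INTEGER NOT NULL REFERENCES sessions (id),
    PRIMARY KEY (speaker_id, session_id)
);
"""


def sessions_for_speaker(conn, speaker_id):
    """What query() returns for /v1/speakers/<speaker_id>/sessions."""
    return conn.execute(
        "SELECT sessions.id, sessions.title FROM sessions "
        "JOIN speakers_session ON speakers_session.session_id = sessions.id "
        "WHERE speakers_session.speaker_id = ? ORDER BY sessions.id",
        (speaker_id,),
    ).fetchall()


def after_create_session(conn, session_id, view_kwargs):
    """Mimic after_create_object: link a newly created session to the speaker in the URL."""
    speaker_id = view_kwargs.get("speaker_id")
    if speaker_id is None:
        return  # not created under /v1/speakers/<speaker_id>/sessions
    found = conn.execute("SELECT 1 FROM speakers WHERE id = ?", (speaker_id,)).fetchone()
    if found is None:
        raise ObjectNotFound(f"speaker_id: {speaker_id} not found")
    # Equivalent of obj.speakers.append(speaker): insert the relationship tuple
    conn.execute(
        "INSERT INTO speakers_session (speaker_id, session_id) VALUES (?, ?)",
        (speaker_id, session_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO speakers VALUES (1, 'Ada')")
conn.execute("INSERT INTO sessions VALUES (10, 'Keynote')")
after_create_session(conn, 10, {"speaker_id": 1})
```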

Additional Resources

Code, Issues and Pull Request involved

Using Flask-REST-JSONAPI’s Resource Manager In Open Event API Server

For the nextgen Open Event API Server, we are using flask-rest-jsonapi to write all the API endpoints. The flask-rest-jsonapi is based on JSON API 1.0 Specifications for JSON object responses.

In this blog post, I describe how I wrote API schema and endpoints for an already existing database model in the Open Event API Server. Following this blog post, you can learn how to write similar classes for your database models.

Let’s dive into how the API Schema is defined for any Resource in the Open Event API Server. Resource, here, is an object based on a database model. It provides a link between the data layer and your logical data abstraction. This ResourceManager has three classes.

  1. Resource List
  2. Resource Detail
  3. Resource Relationship

(We’ll take a look at the Speakers API.)

First, here is the already implemented Speaker model:

from app.models import db  # Open Event's Flask-SQLAlchemy instance (import path assumed)


class Speaker(db.Model):
    """Speaker model class"""

    __tablename__ = 'speaker'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String, nullable=False)  # non-nullable, so the schema marks it required
    photo = db.Column(db.String)
    website = db.Column(db.String)
    organisation = db.Column(db.String)
    is_featured = db.Column(db.Boolean, default=False)
    sponsorship_required = db.Column(db.Text)

    def __init__(self,
                 name=None,
                 photo_url=None,
                 website=None,
                 organisation=None,
                 is_featured=False,
                 sponsorship_required=None):
        self.name = name
        self.photo = photo_url  # the photo_url argument is stored in the `photo` column
        self.website = website
        self.organisation = organisation
        self.is_featured = is_featured
        self.sponsorship_required = sponsorship_required

Here’s the Speaker API Schema:

from flask_rest_jsonapi import ResourceDetail, ResourceList, ResourceRelationship
from marshmallow_jsonapi import fields
from marshmallow_jsonapi.flask import Schema

from app.models import db  # import paths for db and Speaker are assumed
from app.models.speaker import Speaker


class SpeakerSchema(Schema):
    class Meta:
        type_ = 'speaker'
        self_view = 'v1.speaker_detail'
        self_view_kwargs = {'id': '<id>'}

    id = fields.Str(dump_only=True)
    name = fields.Str(required=True)
    photo_url = fields.Url(allow_none=True)
    website = fields.Url(allow_none=True)
    organisation = fields.Str(allow_none=True)
    is_featured = fields.Boolean(default=False)
    sponsorship_required = fields.Str(allow_none=True)


class SpeakerList(ResourceList):
    schema = SpeakerSchema
    data_layer = {'session': db.session,
                  'model': Speaker}


class SpeakerDetail(ResourceDetail):
    schema = SpeakerSchema
    data_layer = {'session': db.session,
                  'model': Speaker}


class SpeakerRelationship(ResourceRelationship):
    schema = SpeakerSchema
    data_layer = {'session': db.session,
                  'model': Speaker}

 

The last piece of code lists the actual endpoints in the __init__ file for flask-rest-jsonapi:

api.route(SpeakerList, 'speaker_list', '/events/<int:event_id>/speakers', '/sessions/<int:session_id>/speakers', '/users/<int:user_id>/speakers')
api.route(SpeakerDetail, 'speaker_detail', '/speakers/<int:id>')
api.route(SpeakerRelationship, 'speaker_event', '/speakers/<int:id>/relationships/event')
api.route(SpeakerRelationship, 'speaker_user', '/speakers/<int:id>/relationships/user')
api.route(SpeakerRelationship, 'speaker_session', '/speakers/<int:id>/relationships/sessions')

 

How to write API schema from database model?

Each column of the database model is a field in the API schema. These are marshmallow fields and can be of several data types – String, Integer, Float, DateTime, Url.

Three class definitions follow the Schema class.

  • List:
    The SpeakerList class is the basis of these endpoints:
api.route(SpeakerList, 'speaker_list', '/events/<int:event_id>/speakers',
          '/sessions/<int:session_id>/speakers',
          '/users/<int:user_id>/speakers')

This class contains methods that generate a list of speakers based on the id passed in view_kwargs. Let’s say that ‘/sessions/<int:session_id>/speakers’ is requested. As view_kwargs here contains session_id, the query method in the SpeakerList class fetches the list of speaker profiles related to the session identified by session_id.
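The dispatch on view_kwargs can be illustrated with a stdlib-only sketch. The in-memory SPEAKERS and SPEAKERS_SESSIONS data, the event_id and user_id keys on each speaker, and the name query_speakers are all hypothetical; the real query() method builds a SQLAlchemy query instead of filtering Python lists.

```python
# Hypothetical sample data; real speakers live in the database.
SPEAKERS = [
    {"id": 1, "name": "Ada", "event_id": 1, "user_id": 10},
    {"id": 2, "name": "Linus", "event_id": 1, "user_id": 11},
    {"id": 3, "name": "Grace", "event_id": 2, "user_id": 10},
]
# (speaker_id, session_id) pairs, as stored in an association table
SPEAKERS_SESSIONS = [(1, 100), (3, 100), (2, 101)]


def query_speakers(view_kwargs):
    """Return the speakers for whichever id the URL supplied in view_kwargs."""
    speakers = SPEAKERS
    if view_kwargs.get("event_id") is not None:  # /events/<event_id>/speakers
        speakers = [s for s in speakers if s["event_id"] == view_kwargs["event_id"]]
    if view_kwargs.get("session_id") is not None:  # /sessions/<session_id>/speakers
        linked = {sp for sp, se in SPEAKERS_SESSIONS if se == view_kwargs["session_id"]}
        speakers = [s for s in speakers if s["id"] in linked]
    if view_kwargs.get("user_id") is not None:  # /users/<user_id>/speakers
        speakers = [s for s in speakers if s["user_id"] == view_kwargs["user_id"]]
    return speakers
```

Since each route puts exactly one id into view_kwargs, only one branch fires per request; with no id (for example a plain list endpoint) every speaker is returned.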

flask-rest-jsonapi allows GET and POST methods for ResourceList. When using these endpoints for POST, the before_create_object and before_post methods can be written. When they are defined in the SpeakerList class, these methods override those of the base ResourceList class in flask-rest-jsonapi/resource.py.

  • Detail: 

The SpeakerDetail class provides this endpoint:

api.route(SpeakerDetail, 'speaker_detail', '/speakers/<int:id>')

ResourceDetail provides methods to handle GET, PATCH and DELETE requests on this endpoint. Methods like before_get_object, before_update_object and after_update_object are derived from the ResourceDetail class. The endpoint returns an object of the resource, selected based on view_kwargs, in a JSON response.

  • Relationship:

The SpeakerRelationship class, as you might have guessed, provides:

api.route(SpeakerRelationship, 'speaker_event', '/speakers/<int:id>/relationships/event')

api.route(SpeakerRelationship, 'speaker_user', '/speakers/<int:id>/relationships/user')

api.route(SpeakerRelationship, 'speaker_session', '/speakers/<int:id>/relationships/sessions')

SpeakerRelationship class provides methods to create, update and delete relationships between the speaker and related resources – events, users and sessions in this case.

The above is a bare bone API schema example. The actual implementation in Open Event Server has lots of helper methods too to cater to our specific needs.

Additional Resources:

Using wrapper div around HTML buttons to add extra functionality in Open Event Server

Open Event server had a bug wherein clicking on the notification of an invitation caused a server error. When invitations for a role in an event were sent, they showed up in the notifications header. Clicking on a notification there took the user to the notifications page, where there were options to Accept or Decline the invitation. The bug was that when the user clicked either the Accept or the Decline button, the notification was not marked as read, which semantically it should have been. Since the invite link expires once the invitation is accepted or declined, but the invitation persisted on the notifications page, clicking Accept/Decline again ran into a 404 error.
Each of the Accept/Decline buttons already had an href that triggered functions of the invitation manager class. The aim here was to make one more thing happen when either button was clicked. This bug was resolved by adding a wrapper around these buttons and giving it the same functionality as the ‘Mark as Read’ button.

Adding a class to both the buttons

<a href='{accept_link}' class='btn btn-success btn-sm invite'>Accept</a>
<a href='{decline_link}' class='btn btn-danger btn-sm invite'>Decline</a>


Adding JavaScript to the invite button

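// Inside the notifications click handler, where `e` is the click event
if ($(e.target).is('.invite')) {
    // find the "Mark as Read" button of the notification that was clicked
    var read_button = $(e.target).parents('.notification').find('a.read-btn');
    // call the same endpoint the "Mark as Read" button uses
    $.getJSON(read_button.attr('href'), function (data) {
        read_button.parents('.notification').removeClass('info'); // show notification as read
        read_button.remove(); // delete mark as read button
    });
}
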
Using parseInt() with Radix

Another error in the same issue was that the notification count sometimes went negative. This was resolved by adding a simple check that keeps the count from going below 0:

notif_count = ((notif_count - 1) > 0 ) ? (notif_count - 1) : 0;

 

To set count as the innerHTML of a div, which in this case was the notification count bubble, one uses parseInt();

div.innerHTML = parseInt(notif_count);

This might work, but Codacy reports an error because no radix is passed to the parseInt() function.

What is a radix?
The radix is the base of a positional numeral system: the number of distinct digits it uses, which is also the factor by which the value of each digit position grows from right to left.

For example, numbers written in binary notation have radix 2 while those written in octal notation have radix 8.
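To make the definition concrete: with radix r, the string "0101" means 0*r^3 + 1*r^2 + 0*r + 1. The sketch below evaluates a digit string that way. It is written in Python rather than JavaScript to keep the examples in one language; parse_with_radix is a hypothetical helper, and Python’s built-in int(text, base) plays the same role as parseInt(text, radix).

```python
def parse_with_radix(text, radix):
    """Evaluate an unsigned digit string in the given radix (2 to 36)."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text.lower():
        digit = int(ch, 36)  # '0'-'9' -> 0-9, 'a'-'z' -> 10-35
        if digit >= radix:
            raise ValueError(f"digit {ch!r} is not valid in radix {radix}")
        value = value * radix + digit  # shift one position left, then add the digit
    return value


for radix in (2, 8, 10, 16):
    print(radix, parse_with_radix("0101", radix))
```

The same four characters give four different numbers, which is why leaving the radix implicit is risky.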

Passing a radix to parseInt() specifies the number system in which the input string is parsed. Although the radix can also be inferred from the string’s prefix, it is always good practice to pass it explicitly.

// leading '0': older (pre-ES5) engines could parse this as octal (65); ES5+ engines parse it as decimal (101)
var number = parseInt('0101');
// leading '0x' => radix 16 (hexadecimal), giving 257
var number = parseInt('0x0101');
// strings starting with anything else assume a radix of 10, giving 101
var number = parseInt('101');
// specifying the radix explicitly, here radix 2 (binary), giving 5
var number = parseInt('0101', 2);


If you omit this argument, parseInt() tries to choose the most appropriate numeral system, but this can backfire due to browser inconsistencies. For example:

parseInt("023");  // 23 in one browser (radix = 10)
parseInt("023");  // 19 in other browser (radix = 8)


Providing the radix is vital if you want to guarantee correct results with variable input (decimal, binary, etc.). A JavaScript linter can enforce this by flagging parseInt() calls that have no radix, as Codacy did here.

Issues :
Exception object does not have code attribute
Internal server error on attempt to import the data

Pull Request :
Fix internal server error on importing zip

Additional Resources:

CSS Trick: How to Object-fit an Image inside its Container Element in the Open Event Front-end

I came across this piece of CSS when the aspect ratio of the Nextgen Conference logo on the eventyay home page was not being maintained. As you can see in this picture, the image is stretched to fill its parent container’s size.

The CSS behind this image was:

.event-holder img {
    width: 100%;
    height: 165px;
    border: none;
}

 
Let’s see how object fit helped me to fix this problem.

What is object-fit ?

The object-fit property of an element describes how its content is fitted or placed inside its container box. This box is the element’s own content box, whose boundaries are defined by its width and height (and constrained by max-width and max-height).
In the CSS for the above logo we had:

img {
    width: 100%;
    height: 165px; 
}

 
The object in this example is an image ( img ) which is to be fitted inside a box of height 165px with 100% width.

The object-fit property can also be applied to other replaced elements such as video, but it is mostly used with images.

object-fit gives us fine-grained control over how the object is resized to fill its container. Essentially, object-fit lets the image (in this context, though it can be applied to other objects) fill the box while maintaining its aspect ratio and/or filling up the entire area established by the height and width.

Here’s a short example for different values of this property:
(The image used here is 4096px by 2660px, placed inside a box 300px wide and 100px high.)

object-fit: fill
<img src="download.png" class="fill"/>
object-fit: contain
<img src="download.png" class="contain"/>
object-fit: cover
<img src="download.png" class="cover"/>
object-fit: none
<img src="download.png" class="none"/>
object-fit: scale-down
<img src="download.png" class="scale-down"/>

 

img {
  width: 300px;
  height: 100px;
  border: 1px solid yellow;
  background: blue;
}
.fill {
  object-fit: fill;
}
.contain {
  object-fit: contain;
}
.cover {
  object-fit: cover;
}
.none {
  object-fit: none;
}
.scale-down {
  object-fit: scale-down;
}

From the illustration above, it is evident that to fix the aspect ratio on the home page I needed object-fit: cover. We got this result by adding just one line of code. Here’s the final CSS:

.event-holder img {
    width: 100%;
    height: 165px;
    object-fit: cover;
    border: none;
}

 

And the final image, which is pleasing and aesthetic:


Quick cheat-sheet for object-fit values

fill

  • stretches the image to fit the content box
  • aspect-ratio disregarded

contain

  • scales the image up or down so that it fits entirely inside the box
  • aspect-ratio preserved

cover

  • fills the height and width of the box
  • aspect ratio preserved
  • the image often gets cropped

none

  • height and width of the container box ignored
  • image retains its original size

scale-down

  • the image is displayed at whichever of none or contain gives the smaller concrete object size
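The cheat-sheet values can be made concrete with a little arithmetic. The Python sketch below is a simplified model written for illustration, not browser code: contain scales by the smaller of the width and height ratios, cover by the larger, none keeps the intrinsic size, and scale-down takes the smaller of none and contain. It ignores object-position and borders.

```python
def object_fit_size(mode, img_w, img_h, box_w, box_h):
    """Rendered (width, height) of an img_w x img_h image in a box_w x box_h content box."""
    if mode == "fill":
        return (box_w, box_h)  # stretched to the box, aspect ratio ignored
    contain = min(box_w / img_w, box_h / img_h)  # largest scale that still fits entirely
    cover = max(box_w / img_w, box_h / img_h)  # smallest scale that covers the whole box
    scales = {
        "contain": contain,
        "cover": cover,
        "none": 1.0,  # intrinsic size
        "scale-down": min(1.0, contain),  # smaller of none and contain
    }
    if mode not in scales:
        raise ValueError(f"unknown object-fit value: {mode}")
    s = scales[mode]
    return (img_w * s, img_h * s)


# The 4096px x 2660px demo image inside the 300px x 100px boxes used above
for mode in ("fill", "contain", "cover", "none", "scale-down"):
    w, h = object_fit_size(mode, 4096, 2660, 300, 100)
    print(f"{mode:>10}: {w:.2f} x {h:.2f}")
```

For cover, the image comes out about 300 x 194.82, so it overflows the 100px height by roughly 94.82px; with the default object-position of 50% 50% that overflow is cropped evenly from the top and bottom.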

Additional Resources

Deploy Static Web Pages In Six Keystrokes

I added two fairly young projects – Query Server and YayDoc to the projects list on http://labs.fossasia.org/. I pulled the code from GitHub, made the changes and it worked fine. Now to get it reviewed from a co-developer, I needed to host my changes somewhere on the web.

The fossasia-labs repository is served with GitHub Pages (gh-pages). Hence, one way of hosting my changes was to use gh-pages on my fork, but I tried surge instead to deploy my site in six keystrokes.

This is what it took to deploy the static webpage right from my command line. Let’s dive into how this tool is as easy as it gets.

What is surge?
surge is a web-publishing tool aimed at front-end developers to help them get their static web pages up and running easily. It can be used to deploy HTML, CSS and JS with the ease of a single command.

How to use surge?
surge is quite an easy tool to use. It is distributed as an npm package. For folks who don’t know what npm is: npm is the JavaScript package manager.

To have surge running, you need to have Node.js installed. Run these in the terminal:

sudo apt-get update 
sudo apt-get install nodejs
sudo apt-get install npm 

Now you have Node.js as well as npm installed. Let’s move on to the main course: installing surge.

npm install --global surge

You have installed surge!
(You may need to preface this command with sudo.)

So let’s go to the directory where we have our files to deploy. Here I have the labs.fossasia.org repository which we’ll try to deploy.

To clone this repo, run this command:

git clone git@github.com:fossasia/labs.fossasia.org.git

After cd-ing into the directory named labs.fossasia.org, type

surge

and hit enter.

You’ll be prompted to sign up with your email. Choose a password. After that you’ll see something similar to this.  

The properties of the directory (path and size) are listed here. Also, as you can see in the picture, a domain is listed. This domain is randomly generated by surge. You can stick with it, or just delete it and type whatever domain you like. surge will deploy your directory to that domain, provided that it is available.

In this example, I decided to skip the generated elfin-education domain and go with my-labs.surge.sh.

Press enter after typing in the desired domain name and you’ll see surge uploading files to the domain. After it successfully deploys, you’ll get a message :


That’s it. Finally it’s time to check my-labs.surge.sh .

Saving your Domain with CNAME

Next up we take a look at making surge remember the domain.

You’ll be prompted for a domain name, every time you run surge inside the same directory (this is the default behavior). This can be avoided by simply adding a CNAME file to your directory root. Let’s say that you want to stick with ‘my-labs.surge.sh’ in the above example. You can add it to the CNAME file by running this in the terminal.

  echo my-labs.surge.sh > CNAME  

surge also offers adding your own custom domain for deployments. To know about this and read further about surge, visit surge.sh .


Additional Resources

Automatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

One goal for the next version of the Open Event project is to allow an automatic import of events from various event listing sites. We will implement this using Open Event Import APIs and two additional modules: Query Server and Event Collect. The idea is to run the modules as micro-services or as stand-alone solutions.

Query Server
The query server is, as the name suggests, a query processor. As we are moving towards an API-centric approach for the server, query-server also has API endpoints (v1). Using this API you can get the data from the server in the mentioned format. The API itself is quite intuitive.

API to get data from query-server

GET /api/v1/search/<search-engine>?query=query&format=format

Sample Response Header

 Cache-Control: no-cache
 Connection: keep-alive
 Content-Length: 1395
 Content-Type: application/xml; charset=utf-8
 Date: Wed, 24 May 2017 08:33:42 GMT
 Server: Werkzeug/0.12.1 Python/2.7.13
 Via: 1.1 vegur
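A client only needs to assemble the endpoint URL. The sketch below assumes that query and format are passed as an ordinary URL query string and that the engine names are lower-case slugs; the base URL is a placeholder, not the real deployment, and search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Engine slugs are assumed; the post names Google, Bing, DuckDuckGo and Yahoo.
ENGINES = {"google", "bing", "duckduckgo", "yahoo"}
FORMATS = {"json", "xml"}


def search_url(base_url, engine, query, fmt="json"):
    """Build a query-server search URL, passing query and format as a query string."""
    engine = engine.lower()
    if engine not in ENGINES:
        raise ValueError(f"unsupported search engine: {engine}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = urlencode({"query": query, "format": fmt})  # spaces become '+'
    return f"{base_url.rstrip('/')}/api/v1/search/{engine}?{params}"


print(search_url("https://query-server.example", "Bing", "FOSSASIA Summit"))
```

Using urlencode rather than string concatenation keeps multi-word searches and special characters correctly escaped.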

The server is built in Flask. The GitHub repository of the server contains a simple Bootstrap front-end, which is used as a testing ground for results. The query string is passed to the search engine result scraper scraper.py, which is based on the scraper at searss. This scraper takes the search engine (presently Google, Bing, DuckDuckGo or Yahoo) as additional input and searches on that engine. The output from the scraper, which can be XML or JSON depending on the API parameters, is returned, while the search query is stored in a MongoDB database, indexed by query string. This is done keeping in mind capabilities to be added later for using Kibana analysis tools.

The frontend prettifies results with the help of PrismJS. The query-server will be used for initial listing of events from different search engines. This will be accessed through the following API.

The query server app can be accessed on heroku.

➢ api/list: To provide an initial list of events (titles and links) to be displayed in Open Event search results.

When an event is searched on Open Event, the query is passed on to query-server, where a search is made by calling scraper.py with some details appended for better event hunting. Recent developments at Google include their event search feature: in the Google search app, event search takes over when Google detects that a user is looking for an event.

The feed from the scraper is parsed for events inside query-server to generate a list containing event titles and links. Each event in this list is then searched for in the database to check whether it already exists. We will be using Elasticsearch to achieve fuzzy searching for events in the Open Event database, as Elasticsearch is already planned for use in the API.

One example of what we wish to achieve by implementing this type of search in the database follows. The user may search for

-Google Cloud Event Delhi
-Google Event, Delhi
-Google Cloud, Delhi
-google cloud delhi
-Google Cloud Onboard Delhi
-Google Delhi Cloud event

All these searches should match “Google Cloud Onboard Event, Delhi” with good accuracy. After duplicates and events that already exist in the database have been removed from this list, each remaining event is rendered on the Open Event search frontend as a separate event. The user can click on any of these events, which makes a call to Event Collect.
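Elasticsearch is the planned tool for this matching. As a much cruder stand-in, the sketch below scores a search by the fraction of its words that also appear in an existing event title, which is enough to match all six example searches to “Google Cloud Onboard Event, Delhi”. Unlike Elasticsearch’s fuzzy queries, which use edit distance, it does not tolerate typos. The function names and the 0.75 threshold are hypothetical.

```python
import re


def tokens(text):
    """Lower-case words and numbers, ignoring punctuation and word order."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(query, title):
    """Fraction of the query's words that also appear in the title (0.0 to 1.0)."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(title)) / len(query_tokens)


def find_existing(query, titles, threshold=0.75):
    """Return the best-matching existing title, or None if nothing is close enough."""
    best_score, best_title = max(((match_score(query, t), t) for t in titles),
                                 default=(0.0, None))
    return best_title if best_score >= threshold else None


EXISTING = ["Google Cloud Onboard Event, Delhi", "FOSSASIA Summit, Singapore"]
SEARCHES = [
    "Google Cloud Event Delhi", "Google Event, Delhi", "Google Cloud, Delhi",
    "google cloud delhi", "Google Cloud Onboard Delhi", "Google Delhi Cloud event",
]
```

A search that returns a title is treated as already present and dropped from the list; only searches that return None are shown as new events.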

Event Collect

The event collect project is developed as a separate module which has two parts:

● Site specific scrapers
In its present state, event collect has scrapers for eventbrite and ticket-leap which, given a query, scrape eventbrite (and ticket-leap respectively) search results and download JSON files of each event using Loklak’s API.
The scrapers can be developed in any form, and any number of scrapers/scraping tools can be added, as long as they align with the Open Event Import API’s data format. Writing tests for these against the current API formats will take care of this. This part will be covered by using a json-validator to check against a pre-generated schema.

● REST APIs
The scrapers are exposed through a set of APIs, which will include, but not be limited to:
➢ api/fetch-event: to scrape any event given its link and compose the data in a predefined JSON format, which will be generated based on the Open Event Import API. When this function is called on an event link, scrapers are invoked which collect event data such as event, meta, forms etc. This data will be validated against the generated JSON schema. The scraped JSON and directory structure for media files:
➢ api/export : to export all the JSON data containing event information into Open Event Server. As and when the scraping is complete, the data will be added into Open Event’s database as a new event.

How the Import works

The following graphic shows how the import works.




Let’s dive into the workflow. As the diagram illustrates, the ‘search’ functionality makes a call to the api/list endpoint provided by query-server, which returns the events’ ‘Title’ and ‘Event Link’ from the parsed XML/JSON feed. This list is displayed as Open Event’s search results. Once the results are displayed, the user can click on any of the events. When the user clicks on an event, the event is searched for in Open Event’s database. Two things can happen now:

  • The event page loads if the event is found.
  • If the event does not already exist in the database, clicking on any event will

➢ Insert this event’s title and link in the database and get the event_id

➢ Make a call to api/fetch-event in event-collect which then invokes a site-specific scraper to fetch data about the event the user has chosen

➢ When the data is scraped, it is imported into the Open Event database using the previously generated event_id. The page will be loaded using jQuery AJAX as soon as the scraping is done. When the imports are done, the search page refreshes with the new results. The Open Event Orga Server exposes a well documented REST API that can be used by external services to access the data.
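The click-handling steps can be summarised in a short Python sketch. EventStore, on_result_click and the fetch_event callback are hypothetical stand-ins (an in-memory dict instead of the Open Event database, and a plain function instead of an HTTP call to api/fetch-event), and looking events up by link is an assumption. It only shows the ordering: reuse an existing event; otherwise insert the title and link first, then scrape, then import under the same event_id.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class EventStore:
    """Hypothetical in-memory stand-in for the Open Event database."""
    events: Dict[int, dict] = field(default_factory=dict)
    next_id: int = 1

    def find_by_link(self, link: str) -> Optional[int]:
        for event_id, event in self.events.items():
            if event["link"] == link:
                return event_id
        return None

    def insert_stub(self, title: str, link: str) -> int:
        """Insert only the title and link and hand back the new event_id."""
        event_id = self.next_id
        self.next_id += 1
        self.events[event_id] = {"title": title, "link": link}
        return event_id


def on_result_click(store: EventStore, title: str, link: str,
                    fetch_event: Callable[[str], dict]) -> int:
    """Return the event_id whose page should load, importing the event first if needed."""
    event_id = store.find_by_link(link)
    if event_id is not None:
        return event_id  # already in the database: just load its page
    event_id = store.insert_stub(title, link)  # insert title and link, get event_id
    scraped = fetch_event(link)  # stands in for calling api/fetch-event in event-collect
    store.events[event_id].update(scraped)  # import the scraped data under the same event_id
    return event_id
```

Reserving the event_id before scraping means the page and the later import both refer to the same record, even while scraping is still in progress.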