Automatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

One goal for the next version of the Open Event project is to allow an automatic import of events from various event listing sites. We will implement this using Open Event Import APIs and two additional modules: Query Server and Event Collect. The idea is to run the modules as micro-services or as stand-alone solutions.

Query Server
The query server is, as the name suggests, a query processor. As we are moving towards an API-centric approach for the server, query-server also has API endpoints (v1). Using this API you can get the data from the server in the mentioned format. The API itself is quite intuitive.

API to get data from query-server

GET /api/v1/search/<search-engine>/query=query&format=format

Sample Response Header

 Cache-Control: no-cache
 Connection: keep-alive
 Content-Length: 1395
 Content-Type: application/xml; charset=utf-8
 Date: Wed, 24 May 2017 08:33:42 GMT
 Server: Werkzeug/0.12.1 Python/2.7.13
 Via: 1.1 vegur

The server is built in Flask. The GitHub repository of the server contains a simple Bootstrap front-end, which is used as a testing ground for results. The query string calls the search engine result scraper scraper.py that is based on the scraper at searss. This scraper takes search engine, presently Google, Bing, DuckDuckGo and Yahoo as additional input and searches on that search engine. The output from the scraper, which can be in XML or in JSON depending on the API parameters is returned, while the search query is stored into MongoDB database with the query string indexing. This is done keeping in mind the capabilities to be added in order to use Kibana analyzing tools.

The frontend prettifies results with the help of PrismJS. The query-server will be used for initial listing of events from different search engines. This will be accessed through the following API.

The query server app can be accessed on heroku.

➢ api/list​: To provide with an initial list of events (titles and links) to be displayed on Open Event search results.

When an event is searched on Open Event, the query is passed on to query-server where a search is made by calling scraper.py with appending some details for better event hunting. Recent developments with Google include their event search feature. In the Google search app, event searches take over when Google detects that a user is looking for an event.

The feed from the scraper is parsed for events inside query server to generate a list containing Event Titles and Links. Each event in this list is then searched for in the database to check if it exists already. We will be using elastic search to achieve fuzzy searching for events in Open Event database as elastic search is planned for the API to be used.

One example of what we wish to achieve by implementing this type of search in the database follows. The user may search for

-Google Cloud Event Delhi
-Google Event, Delhi
-Google Cloud, Delhi
-google cloud delhi
-Google Cloud Onboard Delhi
-Google Delhi Cloud event

All these searches should match with “Google Cloud Onboard Event, Delhi” with good accuracy. After removing duplicates and events which already exist in the database from this list have been deleted, each event is rendered on search frontend of Open Event as a separate event. The user can click on any of these event, which will make a call to event collect.

Event Collect

The event collect project is developed as a separate module which has two parts

● Site specific scrapers
In its present state, event collect has scrapers for eventbrite and ticket-leap which, given a query, scrape eventbrite (and ticket-leap respectively) search results and downloads JSON files of each event using Loklak‘s API.
The scrapers can be developed in any form or any number of scrapers/scraping tools can be added as long as they are in alignment with the Open Event Import API’s data format. Writing tests for these against the concurrent API formats will take care of this. This part will be covered by using a json-validator​ to check against a pre-generated schema.

● REST APIs
The scrapers are exposed through a set of APIs, which will include, but not limited to,
➢ api/fetch-event : ​to scrape any event given the link and compose the data in a predefined JSON format which will be generated based on Open Event Import API. When this function is called on an event link, scrapers are invoked which collect event data such as event, meta, forms etc. This data will be validated against the generated JSON schema. The scraped JSON and directory structure for media files:
➢ api/export : to export all the JSON data containing event information into Open Event Server. As and when the scraping is complete, the data will be added into Open Event’s database as a new event.

How the Import works

The following graphic shows how the import works.




Let’s dive into the workflow. So as the diagram illustrates, the ‘search​’ functionality makes a call to api/list API endpoint provided by query-server which returns with events’ ‘Title’ and ‘Event Link’ from the parsed XML/JSON feed. This list is displayed as Open Event’s search results. Now the results having been displayed, the user can click on any of the events. When the user clicks on any event, the event is searched for in Open Event’s database. Two things happen now:

  • The event page loads if the event is found.
  • If the event does not already exist in the database, clicking on any event will

➢ Insert this event’s title and link in the database and get the event_id

➢ Make a call to api/fetch-event in event-collect which then invokes a site-specific scraper to fetch data about the event the user has chosen

➢ When the data is scraped, it is imported into Open Event database using the previously generated event_id. The page will be loaded using jquery ajax ​as and when the scraping is done.​When the imports are done, the search page refreshes with the new results. The Open Event Orga Server exposes a well documented REST API that can be used by external services to access the data.