Automatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

One goal for the next version of the Open Event project is to allow an automatic import of events from various event listing sites. We will implement this using Open Event Import APIs and two additional modules: Query Server and Event Collect. The idea is to run the modules as micro-services or as stand-alone solutions.

Query Server
The query server is, as the name suggests, a query processor. As we are moving towards an API-centric approach for the server, query-server also has API endpoints (v1). Using this API you can get the data from the server in the mentioned format. The API itself is quite intuitive.

API to get data from query-server

GET /api/v1/search/<search-engine>?query=query&format=format

Sample Response Header

 Cache-Control: no-cache
 Connection: keep-alive
 Content-Length: 1395
 Content-Type: application/xml; charset=utf-8
 Date: Wed, 24 May 2017 08:33:42 GMT
 Server: Werkzeug/0.12.1 Python/2.7.13
 Via: 1.1 vegur
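
A request URL for this endpoint could be assembled as in the sketch below. It assumes that query and format are passed as an ordinary query string; the base URL and the function name build_search_url are placeholders for illustration and are not part of query-server.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of your own deployment.
BASE_URL = "http://localhost:7001"


def build_search_url(engine, query, fmt="json", base_url=BASE_URL):
    """Return the query-server search URL for one engine and one query."""
    # urlencode escapes spaces and special characters in the query
    params = urlencode({"query": query, "format": fmt})
    return "%s/api/v1/search/%s?%s" % (base_url, engine, params)


url = build_search_url("google", "fossasia summit", "xml")
print(url)
```

The resulting URL can be passed to any HTTP client.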

The server is built in Flask. The GitHub repository of the server contains a simple Bootstrap front-end, which is used as a testing ground for results. The query string calls the search engine result scraper scraper.py, which is based on the scraper at searss. The scraper takes the search engine (presently Google, Bing, DuckDuckGo or Yahoo) as an additional input and searches on that engine. The output of the scraper, which can be XML or JSON depending on the API parameters, is returned, while the search query is stored in a MongoDB database indexed by the query string. This was done with a view to adding Kibana analysis tools later.

The frontend prettifies results with the help of PrismJS. The query server app can be accessed on Heroku.

The query-server will be used for the initial listing of events from different search engines. This will be accessed through the following API:

➢ api/list: To provide an initial list of events (titles and links) to be displayed in Open Event search results.

When an event is searched on Open Event, the query is passed on to query-server, where a search is made by calling scraper.py with some details appended to the query for better event hunting. A recent development at Google is its event search feature: in the Google search app, event results take over when Google detects that a user is looking for an event.

The feed from the scraper is parsed for events inside query server to generate a list containing event titles and links. Each event in this list is then searched for in the database to check whether it already exists. We will be using Elasticsearch to achieve fuzzy searching for events in the Open Event database, as Elasticsearch is already planned for the API.

One example of what we wish to achieve by implementing this type of search in the database follows. The user may search for

-Google Cloud Event Delhi
-Google Event, Delhi
-Google Cloud, Delhi
-google cloud delhi
-Google Cloud Onboard Delhi
-Google Delhi Cloud event

All these searches should match “Google Cloud Onboard Event, Delhi” with good accuracy. After duplicates and events that already exist in the database have been removed from this list, each remaining event is rendered on the search frontend of Open Event as a separate event. The user can click on any of these events, which will make a call to event collect.
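
To make the matching goal concrete, here is a rough stand-in written with the standard library only. It is not Elasticsearch and it does not tolerate typos; it merely scores a query by the share of its words that occur in the event title. The helper names tokens and match_score are made up for this illustration.

```python
import re


def tokens(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(query, title):
    """Return the fraction of the query's words that also occur in the title."""
    query_words = tokens(query)
    if not query_words:
        return 0.0
    return len(query_words & tokens(title)) / len(query_words)


TITLE = "Google Cloud Onboard Event, Delhi"
QUERIES = [
    "Google Cloud Event Delhi",
    "Google Event, Delhi",
    "Google Cloud, Delhi",
    "google cloud delhi",
    "Google Cloud Onboard Delhi",
    "Google Delhi Cloud event",
]

for query in QUERIES:
    print(query, match_score(query, TITLE))
```

Word order, case and punctuation are ignored, which is why all six searches from the list score the same against the title. A real fuzzy search would additionally handle misspellings and partial words.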

Event Collect

The event collect project is developed as a separate module which has two parts

● Site specific scrapers
In its present state, event collect has scrapers for eventbrite and ticket-leap which, given a query, scrape the eventbrite (or ticket-leap, respectively) search results and download JSON files of each event using Loklak’s API.
Scrapers can be developed in any form, and any number of scrapers or scraping tools can be added, as long as they conform to the Open Event Import API’s data format. Writing tests for these against the current API formats will take care of this. This part will be covered by using a json-validator to check against a pre-generated schema.

● REST APIs
The scrapers are exposed through a set of APIs, which will include, but are not limited to:
➢ api/fetch-event : ​to scrape any event given the link and compose the data in a predefined JSON format which will be generated based on Open Event Import API. When this function is called on an event link, scrapers are invoked which collect event data such as event, meta, forms etc. This data will be validated against the generated JSON schema. The scraped JSON and directory structure for media files:
➢ api/export : to export all the JSON data containing event information into Open Event Server. As and when the scraping is complete, the data will be added into Open Event’s database as a new event.

How the Import works

The following graphic shows how the import works.




Let’s dive into the workflow. As the diagram illustrates, the ‘search’ functionality makes a call to the api/list endpoint provided by query-server, which returns each event’s ‘Title’ and ‘Event Link’ from the parsed XML/JSON feed. This list is displayed as Open Event’s search results. Once the results are displayed, the user can click on any of the events. When the user clicks on an event, it is searched for in Open Event’s database. One of two things happens:

  • The event page loads if the event is found.
  • If the event does not already exist in the database, clicking on it will

➢ Insert this event’s title and link in the database and get the event_id

➢ Make a call to api/fetch-event in event-collect which then invokes a site-specific scraper to fetch data about the event the user has chosen

➢ When the data is scraped, it is imported into the Open Event database using the previously generated event_id. The page will be loaded using jQuery AJAX as and when the scraping is done. When the imports are done, the search page refreshes with the new results. The Open Event Orga Server exposes a well documented REST API that can be used by external services to access the data.
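
The click-to-import flow can be sketched as follows. Everything here is a placeholder: EventStore is an in-memory stand-in for the Open Event database, and fetch_event stands in for the call to api/fetch-event in event-collect. The real services work over HTTP and a real database.

```python
class EventStore:
    """In-memory stand-in for the Open Event database (illustration only)."""

    def __init__(self):
        self._events = {}
        self._next_id = 1

    def find_by_link(self, link):
        for event in self._events.values():
            if event["link"] == link:
                return event
        return None

    def insert_stub(self, title, link):
        """Store title and link only, and return the new event_id."""
        event_id = self._next_id
        self._next_id += 1
        self._events[event_id] = {"id": event_id, "title": title,
                                  "link": link, "data": None}
        return event_id

    def import_data(self, event_id, data):
        self._events[event_id]["data"] = data
        return self._events[event_id]


def open_event(store, title, link, fetch_event):
    """Return the stored event, importing it first if it is not known yet."""
    existing = store.find_by_link(link)
    if existing is not None:
        return existing                       # event page loads directly
    event_id = store.insert_stub(title, link)  # step 1: get an event_id
    data = fetch_event(link)                   # step 2: site-specific scrape
    return store.import_data(event_id, data)   # step 3: import under that id


fetch_calls = []


def fake_fetch(link):
    # Hypothetical scraper result; a real scraper would return event JSON.
    fetch_calls.append(link)
    return {"source": link}


store = EventStore()
first = open_event(store, "Sample Event", "http://example.com/e/1", fake_fetch)
second = open_event(store, "Sample Event", "http://example.com/e/1", fake_fetch)
```

The second click finds the event in the store, so the scraper is invoked only once.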


Dynamically marshalling output in Flask Restplus

Do you use Flask-Restplus? Have you felt the need to dynamically modify API output according to some condition? If yes, then this post is for you.

In this post, I will show how to use decorators to restrict GET API output. So let’s start.

This is the basic code to create an API. Here we have created a get_speaker API to get a single item from Speaker model.

from flask_restplus import Resource, Model, fields, Namespace
# alias the database model so it does not clash with the Speaker resource below
from models import Speaker as SpeakerModel

api = Namespace('speakers', description='Speakers', path='/')

SPEAKER = Model('Name', {
    'id': fields.Integer(),
    'name': fields.String(),
    'phone': fields.String()
})

class DAO:
    @staticmethod
    def get(speaker_id):
        return SpeakerModel.query.get(speaker_id)

@api.route('/speakers/<int:speaker_id>')
class Speaker(Resource):
    @api.doc('get_speaker')
    @api.marshal_with(SPEAKER)
    def get(self, speaker_id):
        """Fetch a speaker given its id"""
        return DAO.get(speaker_id)

Now we need to change the returned API data according to some condition; for example, return the phone field of the SPEAKER model only if the user is authenticated. One way to do this is to add conditional statements in the get method that marshal the output according to the situation. But if many methods require this, that approach does not scale.

So let’s create a decorator which can change the marshal decorator at runtime. It accepts two parameters: the model to marshal with when the user is authenticated and the model to use when the user is not.

from functools import wraps

from flask_login import current_user
from flask_restplus import marshal_with

def selective_marshal_with(fields, fields_private):
    """
    Selective response marshalling. Doesn't update apidoc.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if current_user.is_authenticated:
                model = fields
            else:
                model = fields_private
            func2 = marshal_with(model)(func)
            return func2(*args, **kwargs)
        return wrapper
    return decorator

The above code adds a wrapper over the API function which checks if the user is authenticated. If the user is authenticated, fields model is used for marshalling else fields_private is used for marshalling.
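
The same idea can be tried without Flask. In the sketch below, marshalling is reduced to picking keys from a dict, and the authentication check is passed in as a plain callable. The names selective_keys, session and get_speaker are invented for this illustration and are not part of flask-restplus.

```python
from functools import wraps


def selective_keys(full_keys, restricted_keys, is_authenticated):
    """Choose the set of output keys at call time, not at decoration time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The check runs on every call, so the output can change per request
            keys = full_keys if is_authenticated() else restricted_keys
            result = func(*args, **kwargs)
            return {key: result[key] for key in keys}
        return wrapper
    return decorator


# Stand-in for the login state of the current request
session = {"logged_in": False}


@selective_keys(("id", "name", "phone"), ("id", "name"),
                lambda: session["logged_in"])
def get_speaker(speaker_id):
    return {"id": speaker_id, "name": "Ada", "phone": "555-0100",
            "internal_note": "never exposed"}


anonymous_view = get_speaker(1)
session["logged_in"] = True
member_view = get_speaker(1)
print(anonymous_view, member_view)
```

Because the decision is made inside the wrapper, the same decorated function serves both kinds of users.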

So let’s create the private model for SPEAKER. We will call it SPEAKER_PRIVATE.

from flask_restplus import Model, fields

SPEAKER_PRIVATE = Model('NamePrivate', {
	'id': fields.Integer(),
	'name': fields.String()
})

The final step is attaching the selective_marshal_with decorator to the get() method.

@api.route('/speakers/<int:speaker_id>')
class Speaker(Resource):
    @api.doc('get_speaker', model=SPEAKER)
    @selective_marshal_with(SPEAKER, SPEAKER_PRIVATE)
    def get(self, speaker_id):
        """Fetch a speaker given its id"""
        return DAO.get(speaker_id)

You will notice that I removed @api.marshal_with(SPEAKER). This was to disable automatic marshalling of output by flask-restplus. To compensate for this, I have added model=SPEAKER in api.doc. It will not auto-marshal the output but will still show the swagger documentation.

That concludes this. The get method will now switch the marshal fields according to the authentication level of the user. As you may notice, the selective_marshal_with function is generic and can be used with other models and APIs too.

 

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/dynamic-marshal-restplus.html }}


ETag based caching for GET APIs

Many client applications require caching of data to work with low bandwidth connections. Many of them do it to provide faster loading time to the client user. The Webapp and Android app had similar requirements. Previously they provided caching using a versions API that would keep track of any modifications made to Events or Services. The response of the API would be something like this:

[{
  "event_id": 6,
  "event_ver": 1,
  "id": 27,
  "microlocations_ver": 0,
  "session_ver": 4,
  "speakers_ver": 3,
  "sponsors_ver": 2,
  "tracks_ver": 3
}]

The number corresponding to "*_ver" tells the number of modifications done for that resource list. For instance, "tracks_ver": 3 means there were three revisions for tracks inside the event (/events/:event_id/tracks). So when the client user starts the app, the app would make a request to the versions API, check whether it corresponds to the local cache and update accordingly. This approach had some shortcomings: it could not track modifications to an individual resource, and if a particular service (microlocation, track, etc.) resource list inside an event needed to be checked for updates, a separate call to the versions API was still required.

ETag based caching for GET APIs

The concept of ETag (Entity Tag) based caching is simple. When a client requests (GET) a resource or a resource list, a hash of the resource/resource list is calculated at the server. This hash, called the ETag, is sent with the response to the client, preferably as a header. The client then caches the response data together with the ETag. The next time the client requests the same endpoint, it sets an If-None-Match header in the request, containing the ETag value it saved before. The server fetches the requested resource, calculates its hash and checks whether it equals the value of If-None-Match. If the hashes are equal, the resource has not changed, so a response with resource data is not needed. If they differ, the server returns the response with the resource data and a new ETag associated with that resource.
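
The exchange can be simulated without a web server. The sketch below is only an illustration of the handshake: conditional_get and make_etag are made-up names, and sorting the keys before hashing is an extra assumption that keeps the hash independent of dict ordering.

```python
import hashlib
import json


def make_etag(resource):
    """Hash a JSON-serialisable resource into an ETag string."""
    data = json.dumps(resource, sort_keys=True).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def conditional_get(resource, if_none_match=None):
    """Return (status, body, headers) for a GET with optional If-None-Match."""
    etag = make_etag(resource)
    if if_none_match == etag:
        return 304, None, {}           # client copy is still current
    return 200, resource, {"ETag": etag}


track = {"id": 1, "name": "Hardware"}

# First request: no cached copy, full response with an ETag
status1, body1, headers1 = conditional_get(track)

# Second request: client sends the saved ETag, nothing has changed
status2, body2, _ = conditional_get(track, headers1["ETag"])

# The resource changes on the server; the old ETag no longer matches
track["name"] = "Software"
status3, body3, headers3 = conditional_get(track, headers1["ETag"])
print(status1, status2, status3)
```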

Little modifications were needed to deal with ETags for GET requests. Flask-Restplus includes a Resource class that defines a resource. It is a pluggable view. Pluggable views need to define a dispatch_request method that returns the response.

import json
from hashlib import md5

from flask import request
from flask_restplus import Resource as RestplusResource

# Custom Resource Class
class Resource(RestplusResource):
    def dispatch_request(self, *args, **kwargs):
        resp = super(Resource, self).dispatch_request(*args, **kwargs)

        # ETag checking.
        # Check only for GET requests, for now.
        if request.method == 'GET':
            old_etag = request.headers.get('If-None-Match', '')
            # Generate hash
            data = json.dumps(resp)
            # md5 expects bytes, so encode the JSON string first
            new_etag = md5(data.encode('utf-8')).hexdigest()

            if new_etag == old_etag:
                # Resource has not changed
                return '', 304
            else:
                # Resource has changed, send new ETag value
                return resp, 200, {'ETag': new_etag}

        return resp

To add support for ETags, I sub-classed the Resource class to extend the dispatch_request method. First, I grabbed the response for the arguments provided to RestplusResource‘s dispatch_request method. old_etag contains the value of ETag set in the If-None-Match header. Then hash for the resp response is calculated. If both ETags are equal then an empty response is returned with 304 HTTP status (Not Modified). If they are not equal, then a normal response is sent with the new value of ETag.

[smg:~] $ curl -i http://127.0.0.1:8001/api/v2/events/1/tracks/1 
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 1061
ETag: ada4d057f76c54ce027aaf95a3dd436b
Server: Werkzeug/0.11.9 Python/2.7.6
Date: Thu, 21 Jul 2016 09:01:01 GMT

{"description": "string", "sessions": [{"id": 1, "title": "Fantastische Hardware Bauen & L\u00f6ten Lernen mit Mitch (TV-B-Gone) // Learn to Solder with Cool Kits!"}, {"id": 2, "title": "Postapokalyptischer Schmuck / Postapocalyptic Jewellery"}, {"id": 3, "title": "Query Service Wikidata "}, {"id": 4, "title": "Unabh\u00e4ngige eigene Internet-Suche in wenigen Schritten auf dem PC installieren"}, {"id": 5, "title": "Nitrokey Sicherheits-USB-Stick"}, {"id": 6, "title": "Heart Of Code - a hackspace for women* in Berlin"}, {"id": 7, "title": "Free Software Foundation Europe e.V."}, {"id": 8, "title": "TinyBoy Project - a 3D printer for education"}, {"id": 9, "title": "LED Matrix Display"}, {"id": 11, "title": "Schnittmuster am PC erstellen mit Valentina / Valentina Digital Pattern Design"}, {"id": 12, "title": "PC mit Gedanken steuern - Brain-Computer Interfaces"}, {"id": 14, "title": "Functional package management with GNU Guix"}], "color": "GREEN", "track_image_url": "http://website.com/item.ext", "location": "string", "id": 1, "name": "string"}

[smg:~] $ curl -i --header 'If-None-Match: ada4d057f76c54ce027aaf95a3dd436b' http://127.0.0.1:8001/api/v2/events/1/tracks/1 
HTTP/1.0 304 NOT MODIFIED
Connection: close
Server: Werkzeug/0.11.9 Python/2.7.6
Date: Thu, 21 Jul 2016 09:01:27 GMT

ETag based caching has a drawback. Since the hash is calculated for every GET request, it increases the load on servers. So if four clients request the same resource, the server calculates the hash four times. This can be solved by calculating and saving the ETag when a resource is created or modified, and then reading and sending this stored ETag directly.
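
A minimal sketch of that improvement, assuming an in-memory store: the ETag is computed once in save() and merely looked up on reads. CachedResourceStore and its hash_count counter are hypothetical names used only to show where the hashing happens.

```python
import hashlib
import json


class CachedResourceStore:
    """Keeps each resource next to an ETag computed at write time."""

    def __init__(self):
        self._items = {}
        self.hash_count = 0  # counts how often a hash was actually computed

    def save(self, key, resource):
        data = json.dumps(resource, sort_keys=True).encode("utf-8")
        etag = hashlib.md5(data).hexdigest()
        self.hash_count += 1
        self._items[key] = (resource, etag)

    def get(self, key):
        """Return (resource, etag) without hashing anything."""
        return self._items[key]


store = CachedResourceStore()
store.save("tracks/1", {"id": 1, "name": "Hardware"})

# Four clients read the same resource; the stored ETag is reused each time
etags = [store.get("tracks/1")[1] for _ in range(4)]
print(store.hash_count)
```

The cost moves from every read to every write, which pays off when reads are far more frequent than modifications.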


Permission Decorators

A follow-up to one of my previous posts: Organizer Server Permissions System.

I recently had a requirement to create permission decorators for use in our REST APIs. There had to be separate decorators for Event and Services.

Event Permission Decorators

Understanding Event permissions is simple: Any user can create an event. But access to an event is restricted to users that have Event specific Roles (e.g. Organizer, Co-organizer, etc) for that event. The creator of an event is its Organizer, so he immediately gets access to that event. You can read about these roles in the aforementioned post.

So for Events, the create operation does not require any permissions, but the read/update/delete operations need a decorator. This decorator restricts access to users with event roles.

from functools import wraps

def can_access(func):
    """Check if User can Read/Update/Delete an Event.
    This is done by checking if the User has a Role in an Event.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        user = UserModel.query.get(login.current_user.id)
        event_id = kwargs.get('event_id')
        if not event_id:
            raise ServerError()
        # Check if event exists
        get_object_or_404(EventModel, event_id)
        if user.has_role(event_id):
            return func(*args, **kwargs)
        else:
            raise PermissionDeniedError()
    return wrapper

The has_role(event_id) method of the User class determines if the user has a Role in an event.

# User Model class

    def has_role(self, event_id):
        """Checks if user has any of the Roles at an Event.
        """
        uer = UsersEventsRoles.query.filter_by(user=self, event_id=event_id).first()
        return uer is not None

Reading one particular event (/events/:id [GET]) can be restricted to users, but a GET request to fetch all the events (/events [GET]) should only be available to staff (Admin and Super Admin). So a separate decorator to restrict access to Staff members was needed.

def staff_only(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        user = UserModel.query.get(login.current_user.id)
        if user.is_staff:
            return func(*args, **kwargs)
        else:
            raise PermissionDeniedError()
    return wrapper

Service Permission Decorators

Service Permissions for a user are defined using Event Roles. What Role a user has in an Event determines what Services he has access to in that Event. Access here means permission to Create, Read, Update and Delete services. The User model class has four methods to determine the permissions for a Service in an event.

user.can_create(service, event_id)
user.can_read(service, event_id)
user.can_update(service, event_id)
user.can_delete(service, event_id)

So four decorators were needed to put alongside the POST, GET, PUT and DELETE method handlers. Here is the snippet for the can_update decorator. The rest are similar but use their respective permission methods on the User class object.

def can_update(DAO):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            user = UserModel.query.get(login.current_user.id)
            event_id = kwargs.get('event_id')
            if not event_id:
                raise ServerError()
            # Check if event exists
            get_object_or_404(EventModel, event_id)
            service_class = DAO.model
            if user.can_update(service_class, event_id):
                return func(*args, **kwargs)
            else:
                raise PermissionDeniedError()
        return wrapper
    return decorator

This decorator differs from the can_access event decorator in that it takes an argument, DAO. DAO stands for Data Access Object. A DAO includes a database model and methods to create, read, update and delete objects of that model. The db model for a DAO is the Service class for the object. As you can see, the model class is taken from the DAO and used as the service class.

The can_create, can_read and can_delete decorators look exactly the same except they use their (obvious) permission methods on the User class object.
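
Since the four decorators differ only in the permission method they call, they could also be generated from one factory. The sketch below is a framework-free illustration, not the project's code: service_permission, PermissionDenied, ReadOnlyUser and the get_current_user callable are all stand-ins, and the service class is passed directly instead of being read from a DAO.

```python
from functools import wraps


class PermissionDenied(Exception):
    """Stand-in for the project's PermissionDeniedError."""


def service_permission(method_name, get_current_user):
    """Build a decorator factory that calls user.<method_name>(service, event_id)."""
    def factory(service_class):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                user = get_current_user()
                check = getattr(user, method_name)
                # event_id is expected as a keyword argument, as in a Flask route
                if not check(service_class, kwargs["event_id"]):
                    raise PermissionDenied(method_name)
                return func(*args, **kwargs)
            return wrapper
        return decorator
    return factory


class Track:
    """Placeholder service class."""


class ReadOnlyUser:
    def can_read(self, service, event_id):
        return True

    def can_update(self, service, event_id):
        return False


current_user = ReadOnlyUser()
can_read = service_permission("can_read", lambda: current_user)
can_update = service_permission("can_update", lambda: current_user)


@can_read(Track)
def get_track(event_id, track_id):
    return "track %d of event %d" % (track_id, event_id)


@can_update(Track)
def update_track(event_id, track_id):
    return "updated"
```

Calling get_track(event_id=1, track_id=2) succeeds for this user, while update_track(event_id=1, track_id=2) raises PermissionDenied.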


Paginated APIs in Flask

In week 2 of GSoC, I had the task of implementing paginated APIs in the Open Event project. I was aware that DRF provides such a feature in Django, so I searched the Internet for a similar library for Flask. Luckily, I didn’t find any, so I decided to make my own.

A paginated API is a page-based API. This approach is used because the API data can sometimes be very large, and pagination breaks it into small chunks. The Paginated API built in the Open Event project looks like this –

{
    "start": 41,
    "limit": 20,
    "count": 128,
    "next": "/api/v2/events/page?start=61&limit=20",
    "previous": "/api/v2/events/page?start=21&limit=20",
    "results": [
    	{
    		"data": "data"
    	},
    	{
    		"data": "data"
    	}
    ]
}

Let me explain what the keys in this JSON mean –

  1. start – It is the position from which we want the data to be returned.
  2. limit – It is the max number of items to return from that position.
  3. next – It is the url for the next page of the query assuming current value of limit
  4. previous – It is the url for the previous page of the query assuming current value of limit
  5. count – It is the total count of results available in the dataset. Here as the ‘count’ is 128, that means you can go maximum till start=121 keeping limit as 20. Also when you get the page with start=121 and limit=20, 8 items will be returned.
  6. results – This is the list of results whose position lies within the bounds specified by the request.

Now let’s see how to implement it. I have simplified the code to make it easier to understand.

from flask import Flask, abort, request, jsonify
from models import Event

app = Flask(__name__)

@app.route('/api/v2/events/page')
def view():
    return jsonify(get_paginated_list(
        Event,
        '/api/v2/events/page',
        # type=int converts the query-string values, which arrive as strings
        start=request.args.get('start', 1, type=int),
        limit=request.args.get('limit', 20, type=int)
    ))

def get_paginated_list(klass, url, start, limit):
    # check if page exists
    results = klass.query.all()
    count = len(results)
    if (count < start):
        abort(404)
    # make response
    obj = {}
    obj['start'] = start
    obj['limit'] = limit
    obj['count'] = count
    # make URLs
    # make previous url
    if start == 1:
        obj['previous'] = ''
    else:
        start_copy = max(1, start - limit)
        # keep the page size, shrinking it only when fewer items precede start
        limit_copy = min(limit, start - 1)
        obj['previous'] = url + '?start=%d&limit=%d' % (start_copy, limit_copy)
    # make next url
    if start + limit > count:
        obj['next'] = ''
    else:
        start_copy = start + limit
        obj['next'] = url + '?start=%d&limit=%d' % (start_copy, limit)
    # finally extract result according to bounds
    obj['results'] = results[(start - 1):(start - 1 + limit)]
    return obj

Just to be clear, here I am assuming you are using SQLAlchemy for the database. The klass parameter in the above code is the SQLAlchemy db.Model class that you want to query for the results. The url is the base url of the request, here ‘/api/v2/events/page’, and it is used in setting the previous and next urls. Other things should be clear from the code.
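
To check the numbers from the sample response without a database, here is a variant that paginates a plain Python list. It is a sketch under two assumptions: the function name paginate is made up, and the previous link keeps the page size at limit (shrinking only near the start), as in the sample response.

```python
def paginate(items, url, start, limit):
    """Return one page of items plus next/previous links (1-based start)."""
    count = len(items)
    if start < 1 or limit < 1 or start > count:
        raise ValueError("page out of range")
    page = {"start": start, "limit": limit, "count": count}
    if start == 1:
        page["previous"] = ""
    else:
        prev_start = max(1, start - limit)
        prev_limit = min(limit, start - 1)  # fewer items may precede start
        page["previous"] = "%s?start=%d&limit=%d" % (url, prev_start, prev_limit)
    if start + limit > count:
        page["next"] = ""
    else:
        page["next"] = "%s?start=%d&limit=%d" % (url, start + limit, limit)
    page["results"] = items[start - 1:start - 1 + limit]
    return page


events = list(range(1, 129))  # 128 fake events, numbered 1..128
page = paginate(events, "/api/v2/events/page", 41, 20)
last = paginate(events, "/api/v2/events/page", 121, 20)
print(page["next"], page["previous"], len(last["results"]))
```

With 128 items, the page starting at 41 links to start=61 and start=21, and the last page starting at 121 holds the remaining 8 items.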

So this was how to implement your very own Paginated API framework in Flask (should say Python). I hope you found this post interesting.

Until next time.

Ciao

 

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/paginated-apis-flask.html }}


Errors and Error handlers for REST API

Errors are an essential part of a REST API system. Error instances must follow a particular structure so the client developer can correctly handle them at the client side. We had set a proper error structure at the beginning of creating REST APIs. It’s as follows:

{
    "error": {
        "message": "Error description",
        "code": 400
    }
}

Any error occurring during client server communication would follow the above format. Code is the returned status code and message is a brief description of the error. To raise an error we used an _error_abort() function which was an abstraction over Flask-RESTplus abort(). We defined an error structure inside _error_abort() and passed it to abort().

def _error_abort(code, message):
    error = {
        'code': code,
        'message': message,
    }
    abort(code, error=error)

This method had its limitations. Since the error handlers were not being overridden, only errors raised through _error_abort() had the defined structure. So if an Internal Server error occurred, the returned error response wouldn’t follow the format.

To overcome this, we wrote our own exceptions for errors and created error handlers to handle them. We first made the response structure more detailed, so the client developer can understand what kind of error is being returned.

{
    "error": {
        "code": 400,
        "message": "'name' is a required parameter",
        "status": "INVALID_FIELD",
        "field": "name"
    }
}

The above is an example of a validation error.

The “status” key helps make the error more specific. For example, we use the 400 status code for both validation errors and invalid service errors (“Service not belonging to said event”), but the two have different statuses: “INVALID_FIELD” and “INVALID_SERVICE”.

The “field” key is only useful in the case of validation errors, where it names the field that did not pass the validation. For other errors it remains null.

I first documented what kind of errors we would need.

Code | Status            | Field        | Description
-----|-------------------|--------------|-----------------------------------------------------------
401  | NOT_AUTHORIZED    | null         | Invalid token or user not authenticated
400  | INVALID_FIELD     | “field_name” | Missing or invalid field
400  | INVALID_SERVICE   | null         | Service ID mentioned in path does not belong to said Event
404  | NOT_FOUND         | null         | Event or Service not found
403  | PERMISSION_DENIED | null         | User role not allowed to perform such action
500  | SERVER_ERROR      | null         | Internal server error

The next part was creating exception classes for each one of these. I created a base error class that extended the Python Exception class.

class BaseError(Exception):
    """Base Error Class"""

    def __init__(self, code=400, message='', status='', field=None):
        Exception.__init__(self)
        self.code = code
        self.message = message
        self.status = status
        self.field = field

    def to_dict(self):
        return {'code': self.code,
                'message': self.message,
                'status': self.status,
                'field': self.field, }

The to_dict() method would help when returning the response in error handlers.

I then extended this base class to other error classes. Here are three of them:

class NotFoundError(BaseError):
    def __init__(self, message='Not found'):
        BaseError.__init__(self)
        self.code = 404
        self.message = message
        self.status = 'NOT_FOUND'


class NotAuthorizedError(BaseError):
    def __init__(self, message='Unauthorized'):
        BaseError.__init__(self)
        self.code = 401
        self.message = message
        self.status = 'NOT_AUTHORIZED'


class ValidationError(BaseError):
    def __init__(self, field, message='Invalid field'):
        BaseError.__init__(self)
        self.code = 400
        self.message = message
        self.status = 'INVALID_FIELD'
        self.field = field

I then defined the error handlers for the api:

@api.errorhandler(NotFoundError)
@api.errorhandler(NotAuthorizedError)
@api.errorhandler(ValidationError)
@api.errorhandler(InvalidServiceError)
def handle_error(error):
    return error.to_dict(), getattr(error, 'code')

For overriding the default error handler, Flask-RESTplus lets you create one with the same decorator, but without passing an argument to it.

@api.errorhandler
def default_error_handler(error):
    """Returns Internal server error"""
    error = ServerError()
    return error.to_dict(), getattr(error, 'code', 500)

I had set the default error to be the internal server error.

class ServerError(BaseError):
    def __init__(self, message='Internal server error'):
        BaseError.__init__(self)
        self.code = 500
        self.message = message
        self.status = 'SERVER_ERROR'

Now raising any of these error classes would activate the error handlers and a proper response would be sent to the client.
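
The mapping from exception to response can be exercised without Flask. The sketch below is a compact variant, not the project's code: it stores code and status as class attributes, and the helper to_response (a made-up name) plays the role of the error handlers. Wrapping the dict under an "error" key is an assumption made here to match the documented response format.

```python
class BaseError(Exception):
    """Base class: subclasses override code and status."""
    code = 400
    status = ''

    def __init__(self, message='', field=None):
        super().__init__(message)
        self.message = message
        self.field = field

    def to_dict(self):
        return {'code': self.code, 'message': self.message,
                'status': self.status, 'field': self.field}


class ValidationError(BaseError):
    code = 400
    status = 'INVALID_FIELD'

    def __init__(self, field, message='Invalid field'):
        super().__init__(message, field)


class ServerError(BaseError):
    code = 500
    status = 'SERVER_ERROR'

    def __init__(self, message='Internal server error'):
        super().__init__(message)


def to_response(exc):
    """Turn any exception into a (body, status code) pair."""
    if not isinstance(exc, BaseError):
        exc = ServerError()  # unknown errors fall back to the default handler
    return {'error': exc.to_dict()}, exc.code


body, code = to_response(ValidationError('name', "'name' is a required parameter"))
fallback_body, fallback_code = to_response(KeyError('oops'))
print(code, fallback_code)
```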


Better fields and validation in Flask Restplus

We at the Open Event Server project are using flask-restplus for the API. Apart from auto-generating the Swagger specification, another great plus point of restplus is how easily we can set input and output models, which are then automatically shown in Swagger UI. We can also auto-validate the input in POST/PUT requests to make sure that we get what we want.

@api.expect(EVENT_POST, validate=True)
def put(self, id):
    """Modify object at id"""
    pass

As can be seen above, the validate param for namespace.expect decorator allows us to auto-validate the input payloads. This used to work well until one day I realized there were a few problems.

  1. When a field is defined as, for example, fields.Integer, it accepts only integer values, not even null.
  2. If a string field has its required param set to True, it is still possible to set an empty string as its value, and the built-in validator won’t catch it.
  3. Even if I somehow managed to hack my way to supporting null in a field, null would then be accepted even if required=True.
  4. We had no control over what error message was returned.

EVENT = api.model('Event', {
    'id': fields.Integer,
    'name': fields.String(required=True)
})

Problem #1 especially was a huge one, as it called the whole foundation of the API into question. So we realized it would be better not to use namespace.expect and to use a custom validator instead. For the custom validator, we first had to create custom fields that the validator could benefit from. Luckily, flask-restplus comes with a great API for creating custom fields. So we quickly created custom fields for all common fields (Integer, String) and for more specific fields like Email, Uri and Color. Creating these specific fields was a huge advantage, as we can now show a proper example for each field type in the Swagger UI.

class Email(fields.String):
    """
    Email field
    """
    __schema_type__ = 'string'
    __schema_format__ = 'email'
    __schema_example__ = 'email@domain.com'

Consider the above code; now when we use Email as a field for a value, then the example shown for it in Swagger UI will be ‘email@domain.com’. Quite cool, right?

Now we needed a way to validate these fields. For that, what we did was to create a validate method in each of the field-classes. This validate method would get the value and check if it was valid. Consider the following code –

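import re
from flask_restplus import fields

# the backslashes matter: \S is "non-whitespace", \. is a literal dot
EMAIL_REGEX = re.compile(r'\S+@\S+\.\S+')

class Email(fields.String):
    def validate(self, value):
        # an empty value is acceptable only when the field is optional
        if not value:
            return not self.required
        return bool(EMAIL_REGEX.match(value))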

Once each of the fields had its validate method, we created a validate_payload() function that compares the payload with the API model. It first checks that all required keys are present in the payload. When that is true, it validates each field’s value using the validate method of that field’s class.

from flask import abort
from flask_restplus import fields
from custom_fields import CustomField

def validate_payload(payload, api_model):
    # check if any reqd fields are missing in payload
    for key in api_model:
        if api_model[key].required and key not in payload:
            abort(400, "Required field '%s' missing" % key)
    # check payload
    for key in payload:
        field = api_model[key]
        if isinstance(field, fields.List):
            field = field.container
            data = payload[key]
        else:
            data = [payload[key]]
        if isinstance(field, CustomField) and hasattr(field, 'validate'):
            for i in data:
                if not field.validate(i):
                    abort(400, "Validation of '%s' field failed" % key)

The CustomField is the base class that each of the custom fields mentioned above inherits from. So checking whether a field is an instance of CustomField is enough to know if it is a custom field or not. The other thing that may look weird in the above code is the use of fields.List. If you look closely, I have added this to support custom fields inside lists. So if you have used a custom field in a list, it will work too. But obviously, this only supports single-level lists for now. We didn’t need more than that, so I let it go.
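
The validation flow can be tried without flask-restplus. The sketch below uses tiny stand-in classes (CustomField, Email, ListOf) in place of the real field types, and a function find_errors that collects messages instead of calling abort(); all of these names are illustrative. Unlike the function above, it silently skips payload keys that are not in the model.

```python
import re

EMAIL_REGEX = re.compile(r'\S+@\S+\.\S+')


class CustomField:
    """Minimal stand-in for the project's custom field base class."""

    def __init__(self, required=False):
        self.required = required

    def validate(self, value):
        return True


class Email(CustomField):
    def validate(self, value):
        if not value:
            return not self.required
        return bool(EMAIL_REGEX.match(value))


class ListOf:
    """Stand-in for fields.List: wraps a single container field."""

    def __init__(self, container, required=False):
        self.container = container
        self.required = required


def find_errors(payload, model):
    """Return a list of error messages; an empty list means a valid payload."""
    errors = []
    for key, field in model.items():
        if field.required and key not in payload:
            errors.append("Required field '%s' missing" % key)
    for key, value in payload.items():
        field = model.get(key)
        if field is None:
            continue  # unknown keys are ignored in this sketch
        if isinstance(field, ListOf):
            field, values = field.container, value
        else:
            values = [value]
        if isinstance(field, CustomField):
            if not all(field.validate(v) for v in values):
                errors.append("Validation of '%s' field failed" % key)
    return errors


MODEL = {'email': Email(required=True), 'cc': ListOf(Email())}

good = find_errors({'email': 'a@b.com', 'cc': ['x@y.org']}, MODEL)
bad = find_errors({'cc': ['nope']}, MODEL)
print(good, bad)
```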

This basically sums up how we are validating input payloads at Open Event. Of course this is very basic, but we will keep improving it as the project progresses. Stay tuned to the opev blog if you want to follow the progress of the project.

Links to full code at the time of writing this post are –

  1. Custom Fields
  2. Validate Payload

I hope you found this post useful. Thanks for reading.

 

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/restplus-validation-custom-fields.html }}


Swagger

Swagger is a specification for describing REST APIs. The main aim of Swagger is to provide a REST API definition format that is readable by both machines and humans.

You can think of two entities in a REST API: one is the provider of the API, and the other is the client using it. Swagger essentially bridges the gap between them by providing a format that is easy for the client to use and easy for the provider to define.

Without Swagger, one would most likely create the API first and write the (human-readable) documentation alongside it. The client developer would then read the documentation and use the APIs as required. With Swagger, the specification itself can be treated as the documentation, helping both the client developer and the provider.

Here’s an example spec I wrote for our GET APIs at Organizer Server: https://gist.github.com/shivamMg/dacada0b45585bcd9cd0fbe4a722eddf

The format, although readable, doesn’t really look like what a client developer would be asking for.

Remember that the format is machine readable? What does it mean exactly?

Since the Swagger spec is a defined format, the provider can document it and people can write programs that understand the Swagger specification. Swagger itself comes with a set of tools (http://swagger.io/tools/) that use Swagger definitions created by the API provider to create SDKs for the clients to use.
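As a rough illustration of what "machine readable" means here, the sketch below parses a tiny hand-written Swagger 2.0 document and lists its operations. The spec content (paths, summaries, operationId values) is made up for this example and is not taken from the Organizer Server.

```python
import json

# A made-up, minimal Swagger 2.0 document (assumption, for illustration only)
SPEC_JSON = """
{
  "swagger": "2.0",
  "info": {"title": "Example events API", "version": "1.0"},
  "basePath": "/api/v2",
  "paths": {
    "/events": {
      "get": {"operationId": "getEvents", "summary": "List all events"}
    },
    "/events/{event_id}": {
      "get": {"operationId": "getEvent", "summary": "Fetch one event"}
    }
  }
}
"""

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def list_operations(spec):
    """Return (METHOD, full path, summary) for every operation in the spec."""
    operations = []
    base = spec.get("basePath", "")
    for path, item in sorted(spec["paths"].items()):
        for method, op in sorted(item.items()):
            if method in HTTP_METHODS:  # skip non-operation keys like "parameters"
                operations.append((method.upper(), base + path, op.get("summary", "")))
    return operations


spec = json.loads(SPEC_JSON)
for method, path, summary in list_operations(spec):
    print(method, path, "-", summary)
```

Tools such as Swagger UI and the SDK generators do essentially this kind of traversal, only far more thoroughly.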

Swagger-UI

One of our most used tools at our server is the Swagger UI (http://swagger.io/swagger-ui/).

It reads an API spec written in Swagger and generates a corresponding UI that people can use to explore the APIs. Every API endpoint can have responses and parameters associated with it. Take, for instance, our Event endpoint at the Server (“/events/:event_id”).

You can see how the UI displays the Model Schema, required parameters and possible response message.

[Screenshot: Swagger UI for the Event endpoint, 2016-06-07]

Apart from documentation, Swagger UI provides the “Try it out!” tool that lets you make requests to the server for the corresponding API. This feature is incredibly useful for POST requests. No need for long curl commands in the terminal.

Here’s the Swagger config from our demo application: https://open-event.herokuapp.com/api/v2/swagger.json

The Swagger UI for this config can be found at https://open-event.herokuapp.com/api/v2

The UI for the example spec (gist) I linked before can be browsed here.

Swagger-js

https://github.com/swagger-api/swagger-js

Swagger-js is a JS library that reads an API spec written in Swagger and provides an interface for the client developer to interact with the API. We will be using Swagger-js for the Open Event Webapp.

Here’s an example to show you how it works. Let’s say the following endpoint returns a list of events as JSON.

/events

The client developer can create a GET request and render the list with HTML.

$.getJSON("http://example.com/api/v2/events", function(data) {

  var events = "";
  $.each(data, function(i, event) {
    events += "<li id='event_" + i + "'>" + event.name + "</li>";
  });

  $("ul#events").html(events);
});

What if the provider moves the endpoint to “/event/all”? The client developer would need to change every instance of the URL to http://example.com/event/all.

Let’s now take the case of Swagger-js. With Swagger-js the client developer would essentially be writing programs to interact with the API Swagger spec defined by the provider, instead of directly consuming the API.

window.client = new SwaggerClient({
    url: "http://example.com/api/v2/swagger.json",
    success: function() {
      client.event.getEvents({
        responseContentType: 'application/json'
      }, function(data) {

        /* Create `events` string same as before */

        $("ul#events").html(events);
      });
    }
  });
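The benefit can be sketched outside JavaScript too. The Python snippet below is only an illustration of the idea, not how Swagger-js is implemented: the client looks an operation up by its operationId, so when the provider moves the endpoint only the spec changes. Both specs and the build_index helper are hypothetical.

```python
def build_index(spec):
    """Map each operationId to its (METHOD, full path) as declared in the spec."""
    index = {}
    base = spec.get("basePath", "")
    for path, item in spec["paths"].items():
        for method, op in item.items():
            if isinstance(op, dict) and "operationId" in op:
                index[op["operationId"]] = (method.upper(), base + path)
    return index


# Hypothetical specs: the provider moves the endpoint between two releases
old_spec = {"basePath": "/api/v2",
            "paths": {"/events": {"get": {"operationId": "getEvents"}}}}
new_spec = {"basePath": "",
            "paths": {"/event/all": {"get": {"operationId": "getEvents"}}}}

# Client code asks for "getEvents" in both cases and never hard-codes the URL
print(build_index(old_spec)["getEvents"])
print(build_index(new_spec)["getEvents"])
```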

 

The APIs at the Organizer server are getting more complex. Swagger helps in keeping the spec well defined.

That’s all for now. Hope you enjoyed the read.


REST API Authentication in Flask

Recently I had the challenge of restricting unauthorized users from accessing some views in Flask. The naive way would be to ask for the username and password in the JSON itself and check them against the records in the database. The request would look something like this:

{
	"username": "open_event_user",
	"password": "password"
}

But I wanted to do something better. So I looked around the Internet and found that it is possible to accept Basic authorization credentials in Flask (sadly it isn’t documented). For those who don’t know, Basic authorization is a way to send a plain username:password combination as a header in a request after obscuring it with base64 encoding. So for the above username and password, the corresponding header will be:

{
	"Authorization": "Basic b3Blbl9ldmVudF91c2VyOnBhc3N3b3Jk"
}

where the encoded string is the base64-encoded form of the string “open_event_user:password”. Note that this is an encoding, not a hash, so it can be trivially reversed.
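To see where that header value comes from, here is a small sketch using only Python's standard library. The helper names basic_auth_header and parse_basic_auth are made up for this example; Flask does the parsing for you via request.authorization.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic auth."""
    raw = "%s:%s" % (username, password)
    token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


def parse_basic_auth(header_value):
    """Reverse the encoding; returns (username, password) or None."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("open_event_user", "password")
print(header["Authorization"])  # Basic b3Blbl9ldmVudF91c2VyOnBhc3N3b3Jk
print(parse_basic_auth(header["Authorization"]))
```

Because the decoding step is this easy, Basic auth only obscures the credentials; it should be used over HTTPS.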

Now back to the topic. The next job is to protect the views by checking the Basic auth credentials in the header and calling abort() if the credentials are missing or wrong. For this, we can easily create a helper function that aborts a view if there is something wrong with the credentials.

from flask import request, Flask, abort
from models import UserModel

app = Flask(__name__)

def validate_auth():
    # request.authorization is None when no Basic auth header was sent
    auth = request.authorization
    if not auth:  # no header set
        abort(401)
    user = UserModel.query.filter_by(username=auth.username).first()
    if user is None or user.password != auth.password:
        abort(401)

@app.route('/view')
def my_view():
    validate_auth()
    # stuff on success
    # more stuff
    return 'success'  # placeholder response

This works, but wouldn’t it be nice if we could use the validate_auth function as a decorator? That gives us the advantage of only having to set it once in a model view with all auth-required methods. Right? So here we go:

from functools import wraps

def requires_auth(f):
    @wraps(f)  # keep the name and docstring of the wrapped view
    def decorated(*args, **kwargs):
        auth = request.authorization
        if not auth:  # no header set
            abort(401)
        user = UserModel.query.filter_by(username=auth.username).first()
        if user is None or user.password != auth.password:
            abort(401)
        return f(*args, **kwargs)
    return decorated

@app.route('/view')
@requires_auth
def my_view():
    # stuff on success
    # more stuff
    return 'success'  # placeholder response

I renamed the function from validate_auth to requires_auth because it suits the context better.

At this point, the above code may look perfect but it doesn’t work when you are accessing the API through Swagger web UI. This is because it is not possible to set base64 encoded authorization header from the swagger UI. For those who are wondering “what the hell is swagger”, I will define Swagger as a tool for API based projects which creates a nice web UI to live-test the API and also exports a schema of the API that can be used to understand API definitions.

Now how do we get requires_auth to work when a request is sent through Swagger UI? It was a little tricky and took me a couple of hours, but I finally got it. The trick is to check for an active session when no authorization header is set (as in the case of Swagger UI). If an active session is found, it means that the user is authenticated. Here I would like to suggest the Flask-Login extension, which makes session and login management child’s play. Always use it if your Flask project deals with login, user accounts and so on.

Now back to the task at hand. Here is how we can make the requires_auth function check for existing sessions.

from functools import wraps

from flask import request, abort, g
import flask_login as login  # the old flask.ext namespace is gone in newer Flask
from models import UserModel

def requires_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.authorization
        if not auth:  # no header set
            if login.current_user.is_authenticated:  # check active session
                g.user = login.current_user
                return f(*args, **kwargs)
            else:
                abort(401)
        user = UserModel.query.filter_by(username=auth.username).first()
        if user is None or user.password != auth.password:
            abort(401)
        g.user = user  # make the authenticated user available to the view
        return f(*args, **kwargs)
    return decorated

Pretty easy, right? Also notice that I am saving the currently authenticated user in Flask’s global variable g. Now the authenticated user can be accessed from views as g.user. Cool, isn’t it? If there is a need to add a more secure form of authorization, such as token-based, you can easily update the requires_auth decorator to get the same results.
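For the curious, one possible shape of such a token scheme is sketched below with only the standard library. This is an assumption-laden illustration, not what Open Event uses: SECRET_KEY, make_token and verify_token are hypothetical names, and the token format (username, expiry and an HMAC-SHA256 signature joined by colons) is chosen purely for brevity.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(username, ttl_seconds=3600, now=None):
    """Issue 'username:expiry:signature'; `now` is injectable for testing."""
    issued = int(now if now is not None else time.time())
    payload = "%s:%d" % (username, issued + ttl_seconds)
    return payload + ":" + _sign(payload)


def verify_token(token, now=None):
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expires, signature = token.rsplit(":", 2)
        expires = int(expires)
    except ValueError:
        return None  # malformed token
    expected = _sign("%s:%d" % (username, expires))
    # constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    current = now if now is not None else time.time()
    if current > expires:
        return None
    return username


token = make_token("open_event_user", ttl_seconds=60, now=1000)
print(verify_token(token, now=1030))
print(verify_token(token, now=2000))
```

Inside requires_auth, a check like verify_token could replace the username/password lookup, so that the password is only sent once when the token is issued.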

I hope this article provided valuable insight into managing REST API authorizations in Flask. I will keep posting more awesome things I learn in my GSoC journey.

That’s it. Sayonara.

 

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/auth-flask-done-right.html }}


Organizer Server and REST APIs

The Open Event Organizer Server is a server application written in Flask. It provides an admin interface for the organizers of events to manage events and related services like Sessions, Tracks, etc. Additionally, it provides GET APIs for developers to read data from the server. These APIs are consumed by the Open Event Webapp and Android Client to display details to the users. The existing APIs could only fetch data. My task is to create REST APIs that write data to the server.

Developers are divided into groups, with every group handling one aspect of the project. Avi Aryan and I will be working on REST APIs. Currently I’m working on porting the existing GET APIs to the newer spec. Justin decided on our stack and the technologies we would be working with (link). We are using the Flask-Restplus extension for building APIs.

The write API is not going to be the only big change to the server. The authorization system is also going to change a lot. The new one will include more user roles, each with a different set of permissions. Apart from the Administrator and Organizer, a user can be:

  • Attendee
  • Moderator
  • Speaker
  • Track Organizer

Other changes to User Management are mentioned in the docs.

Besides these, adding support for OAuth 2.0 is also on the list. This would let users sign up through Social Media platforms.

I had thought of a possible work flow for my group.

  1. Port existing GET APIs with newer spec on Flask-Restplus.
  2. Add the user roles mentioned above to the authorization system. A user role defines what type of services a user has access to. For example, an Attendee has write access to the feedback and rating system for Sessions but does not have permission to create or modify Tracks. This needs to be done before creating write (POST/PUT/DELETE) APIs, so that access to services can be defined according to the user roles. The GET APIs are public and do not require any such permissions.
  3. Create write APIs.
  4. Set up user authentication system to register and sign in users.
  5. Add support for OAuth 2.0.
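Step 2 could be sketched as a simple role-to-permission table. Only the Attendee entry follows the example given above; the Track Organizer entry, the service names and the can() helper are assumptions made for illustration, since the actual permission model had not been defined at the time of writing.

```python
# Each role maps to the set of (service, action) pairs it is allowed to perform.
ROLE_PERMISSIONS = {
    # from the example in the text: Attendees can write feedback and ratings
    "attendee": {("feedback", "write"), ("rating", "write")},
    # assumption: a Track Organizer can create or modify Tracks
    "track_organizer": {("track", "write")},
}


def can(role, service, action):
    """Return True if `role` may perform `action` on `service`."""
    if action == "read":
        return True  # GET APIs are public and need no permission
    return (service, action) in ROLE_PERMISSIONS.get(role, set())


print(can("attendee", "feedback", "write"))   # True
print(can("attendee", "track", "write"))      # False
print(can("anonymous", "track", "read"))      # True
```

A write API view would call a check like this after authentication and abort when it returns False.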

Many of these changes require other changes to the server. For example, the feedback and rating system for Attendees has not been implemented yet and would first require definitions for its database models. Existing models would also require changes. There is a lot of work to do.

I have previously worked with Python and Django. This project brings a lot of new stuff for me: Python 2, Flask, Swagger and RESTful APIs. Duke, Justin and Mario are going to be mentors for the Organizer Server. All in all, this summer is going to be an exciting one 😀.

About Me:

I’m Shivam, a 3rd year Computer Engineering student from College of Technology, Pantnagar. This is my first attempt at GSOC and I’m grateful to FOSSASIA for accepting my proposal. I started out with FOSSASIA by contributing to their Open Source projects.
