Performing Database Migrations using DbFlow

In Open Event Organizer Android App we decided to add database migrations for every change in database while development. Two of the reasons behind this –

  1. The users have some version of the app installed and then during development the developers had to modify the database, let’s say, add a table or modify existing ones. This makes the existing database incompatible with the new app. On adding database migration the upgradation of database takes place automatically and prevent the situation of reinstalling the app.
  2. Migrations makes it possible to rollback or upgrade to some particular database state. Thus, help in debugging certain changes in isolation.

Let’s discuss the implementation details. Consider the scenario when a user adds a new table named SpeakersCall. For creating migration for this change we need to add migration class inside OrgaDatabase class annotated with @Database. We will break it down to look closely at each step.

  1. Use @Migration annotation in DbFlow library and specify the new database version (by incrementing the existing version) and the database class.
  2. Extend BaseMigration and override migrate method.
  3. The logic used inside the migrate method can be different for different tasks. In the present case we first need to delete any existing table (if exists) with the name SpeakersCall and then recreate that table in database.
  4. Create an array of java classes which are created/modified.
  5. We wrap the SQL query inside a Database Wrapper class which prevents it from running recursively.
  6. FlowManager uses reflection to look up and construct the generated database holder class used in defining the structure for all databases used in this application. So we will getModelAdapter for the class to be recreated and use creation query returned by Model Adapter and execute it.
@Migration(version = 15, database = OrgaDatabase.class)
public static class MigrationTo15 extends BaseMigration {

@Override
public void migrate(@NonNull DatabaseWrapper databaseWrapper) {
Timber.d(“Running migration for DB version 14”);

Class<?>[] recreated = new Class[] {SpeakersCall.class};

for (Class<?> recreate: recreated) {
ModelAdapter modelAdapter = FlowManager.getModelAdapter(recreate);
databaseWrapper.execSQL(DROP_TABLE + modelAdapter.getTableName());
databaseWrapper.execSQL(modelAdapter.getCreationQuery());
}
}
}

Similarly, we can write migration for changing a column of table(s).

Adding device names’ support for check-ins to Open Event Server

The Open Event Server provides backend support to Open Event Organizer Android App which is used to check-in attendees in an event. When checking in attendees, it is important for any event organizer to keep track of the device that was used to check someone in. For this, we provide an option in the Organizer App settings to set the device name. But this device name should have support in the server as well.

The problem is to be able to add device name data corresponding to each check-in time. Currently attendees model has an attribute called `checkin-times`, which is a csv of time strings. For each value in the csv, there has to be a corresponding device name value. This could be achieved by providing a similar csv key-value pair for “device-name-checkin”.

The constraints that we need to check for while handling device names are as follows:

  • If there’s `device_name_checkin` in the request, there must be `is_checked_in` and `checkin_times` in the data as well.
  • Number of items in checkin_times csv in data should be equal to the length of the device_name_checkin csv.
  • If there’s checkin_times in data, and device-name-checkin is absent, it must be set to `-` indicating no set device name.
if ‘device_name_checkin’ in data and data[‘device_name_checkin’] is not None:
  if ‘is_checked_in’ not in data or not data[‘is_checked_in’]:
       raise UnprocessableEntity(
           {‘pointer’: ‘/data/attributes/device_name_checkin’},
           “Attendee needs to be checked in first”
       )
  elif ‘checkin_times’ not in data or data[‘checkin_times’] is None:
      raise UnprocessableEntity(
          {‘pointer’: ‘/data/attributes/device_name_checkin’},
           “Check in Times missing”
      )
  elif len(data[‘checkin_times’].split(“,”)) != len(data[‘device_name_checkin’].split(“,”)):
     raise UnprocessableEntity(
           {‘pointer’: ‘/data/attributes/device_name_checkin’},
           “Check in Times missing for the corresponding device name”
         )
 if ‘checkin_times’ in data:
   if ‘device_name_checkin’ not in data or data[‘device_name_checkin’] is None:
       data[‘device_name_checkin’] = ‘-‘

The case is a little different for a PATCH request since we need to check for the number of items differently like this:

if ‘device_name_checkin’ in data and data[‘device_name_checkin’] is not None:
            if obj.device_name_checkin is not None:
               data[‘device_name_checkin’] = ‘{},{}’.format(obj.device_name_checkin, data[‘device_name_checkin’])                                                   
            if len(data[‘checkin_times’].split(“,”)) != len(data[‘device_name_checkin’].split(“,”)):
               raise UnprocessableEntity(
                   {‘pointer’: ‘/data/attributes/device_name_checkin’},
                   “Check in Time missing for the corresponding device name”)

Since we expect only the latest value to be present in a PATCH request, we first add it to the object by formatting using:

'{},{}'.format(obj.device_name_checkin, data['device_name_checkin'])

and then compare the length of the obtained CSVs for check in times and device names, so that corresponding to each check in time, we have either a device name or the default fill in value ‘-’.

That’s all. Read the full code here.

Requests and Responses:

Resources

  1. SQLAlchemy Docs
    https://docs.sqlalchemy.org/en/latest/
  2. Alembic Docs
    http://alembic.zzzcomputing.com/en/latest/
  3. Flask REST JSON API Classical CRUD operation
    https://flask-rest-jsonapi.readthedocs.io/en/latest/quickstart.html#classical-crud-operations

Implementing Checkout Times for Attendees on Open Event Server

As of this writing, Open Event Server did not have the functionality to add, manipulate and delete checkout times of attendees. Event organizers should have access to log and update attendee checkout times. So it was decided to implement this functionality in the server. This boiled down to having an additional attribute checkout_times in the ticket holder model of the server.

So the first step was to add a string column named checkout_times in the ticket holder database model, since this was going to be a place for comma-separated values (CSV) of attendee checkout times. An additional boolean attribute named is_checked_out was also added to convey whether an attendee has checked out or not. After the addition of these attributes in the model, we saved the file and performed the required database migration:

To create the migration file for the above changes:

$ python manage.py db migrate

To upgrade the database instance:

$ python manage.py db upgrade

Once the migration was done, the API schema file was modified accordingly:

class AttendeeSchemaPublic(SoftDeletionSchema):
    """
    Api schema for Ticket Holder Model
    """
    
    checkout_times = fields.Str(allow_none=True)  # ←
    is_checked_out = fields.Boolean()  # ←
    

After the schema change, the attendees API file had to have code to incorporate these new fields. The way it works is that when we receive an update request on the server, we add the current time in the checkout times CSV to indicate a checkout time, so the checkout times field is essentially read-only:

from datetime import datetime
...
class AttendeeDetail(ResourceDetail):
    def before_update_object(self, obj, data, kwargs):
        
        if 'is_checked_out' in data and data['is_checked_out']:
        ...
        else:
            if obj.checkout_times and data['checkout_times'] not in \
obj.checkout_times.split(","):
                data['checkout_times'] = '{},{},{}'.format(
                    obj.checkout_times,
                    data['checkout_times'],
                    datetime.utcnow())

 

This completes the implementation of checkout times, so now organizers can process attendee checkouts on the server with ease.

Resources

Migrating Event Ratings of Open Event with Stored Procedures

Many developers know about procedural languages and have used them in some form or another, but this is really an unpopular tool, despite its power. There are many advantages (and few disadvantages) of these languages, which we will learn about soon. Having a right amount of database-stored procedure code with the help of these languages can really enhance the speed and responsiveness of an application. This article will teach us how procedural languages can be utilized in database management and how they were used recently for a bug fix in Open Event Server.

PostgreSQL, like any other powerful, relational database management system (RDBMS), provides the functionality to create and use stored procedures. Essentially, a stored procedure is database logic code which is saved on the database server. This code can be executed directly in the database, and can (and is!) often used to shift business logic from the application layer of a software to the database layer. This simple shift often has many advantages – including faster execution (as code executes at a lower stack level) and better security. When firing database queries from the application layer (i.e., the code that programmers write for storing programmable objects, performing business logic and so on), it often happens that parameters from the programming language itself are passed in to SQL, which then generates a complete SQL query. For example, here’s how a novice query might look like:

import psycopg2
conn = psycopg2.connect(dbname="oevent", user="john", password="start")
cur = conn.cursor()
name = "Sam"
cur.execute("SELECT * FROM users WHERE name='%s'" % name)  # DANGEROUS!

This is an extremely “exposed” code that can be exploited for malicious access, with a technique called SQL injection. This technique essentially “injects” malicious code via these passed parameters, like the variable name mentioned in the above code. With having stored procedures for business logic, there is no room for SQL injection. They solve this problem by writing the query beforehand, and having the parameterized data as a different entity. The pre-processed query within the corresponding stored procedure now looks like

SELECT * FROM users WHERE name=?

 

The database driver sends the name of this stored procedure (or, in standard parameterised queries, just the query text itself) and a list of parameters, as distinct separate entities in the protocol. More details on how stored procedures enhance security can be found here.

After learning so much about the advantages of stored procedures (which are enabled by procedural languages), let’s write one! Postgres supports multiple languages for writing stored procedures; here we will use PL/pgSQL, which is the most popular choice for Postgres. This procedural language, inspired (heavily) by Oracle’s PL/SQL language, looks very similar to SQL. To use this procedural language, we have to first install it. In Postgres, procedural languages are installed per-database, not server-wide. We can use the popular Postgres client psql for this purpose, or simply the createlang command on the command line:

$ createlang plpgsql yourdb

 

Now let’s create a simple procedure that prints the corresponding grades for the event ratings. In the psql shell, type the following:

CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS
$$
BEGIN
IF rating < 1 THEN
RETURN 'D';
ELSIF rating < 2 THEN
RETURN 'C';
ELSIF rating < 3 THEN
RETURN 'B';
ELSIF rating < 4 THEN
RETURN 'A';
ELSE
RETURN 'A+';
END IF;
END;
$$ STRICT LANGUAGE plpgsql IMMUTABLE;

 

The first line defines the function signature – its name, parameters’ type and return type. The line specifies which procedural language to use. The IMMUTABLE keyword specifies that the stored procedure – this function, cannot modify the database by itself. The STRICT part says that the function should always return null when any of the arguments are null. So, the function is not executed when there are null arguments, rather a null result is assumed automatically. Finally, everything within the $$ quotes is the logic of our function, in PL/pgSQL syntax. As we can see, the syntax is almost identical to that of plain SQL.

Let us now create some dummy data to test the above stored procedure. In the same psql shell, type the following:

CREATE TABLE ratings(id INTEGER, rating NUMERIC);
INSERT INTO ratings VALUES (1, 2.3), (2, 0.5), (3, 1.7), (4, 4.5), (5, 3.3);

 

The ratings table now looks as follows:

yourdb=# SELECT * FROM ratings;
id | rating
----+--------
1 |    2.3
2 |    0.5
3 |    1.7
4 |    4.5
5 |    3.3
(5 rows)

To test our stored procedure, we fire the following query:

SELECT id, grade(rating) FROM ratings;

The result is:

id | grade
----+-------
1 | B
2 | D
3 | C
4 | A+
5 | A
(5 rows)

Thus, our stored procedure works perfectly! This shows how you can write code for your application in the database itself, the code which would otherwise be present in the application layer. For writing more complex stored procedures, you can always consult the excellent PostgreSQL documentation on this topic!

PL/pgSQL turned out to be especially useful for a recent bug patch in Open Event Server. The rating attribute of event feedbacks was initially (and incorrectly) of String type. When converting it to Float type, I had migrated the schema, but the migration of existing data was nontrivial. Since the earlier ratings were of type String, we weren’t really sure whether the values in the rating column will all contain numbers. Furthermore, we also wanted to round those numbers to the “nearest 0.5”, so that ratings like 4.4 and 2.2 get converted to more conventional values like 4.5 and 2.0 respectively. To accomplish this, I had to add a stored procedure in the related migration file. Here’s how it looked:

class ReplaceableObject(object):
    def __init__(self, name, sqltext):
        self.name = name
        self.sqltext = sqltext

update_rating_func = ReplaceableObject(
    "update_rating(rating text)",
    """
    RETURNS text AS $$
    DECLARE
        r NUMERIC;
    BEGIN
        r = cast(rating as NUMERIC);
        IF r < 0 OR r > 5 THEN
            RETURN '0';
        END IF;
        r = round(r*2) / 2;
        RETURN cast(r as VARCHAR);
    EXCEPTION WHEN invalid_text_representation THEN
        RETURN '0';
    END;
    $$
    STRICT
    LANGUAGE plpgsql IMMUTABLE;
    """)

We created a simple class ReplaceableObject so that the name of the function is, sort of, a separate attribute from its definition, as required by the migration library alembic. Once the stored procedure was defined as above, the upgrade() function of the migration file was modified accordingly:

def upgrade():
    op.create_or_replace_sp(update_rating_func)
    op.execute("UPDATE feedback SET rating=update_rating(rating)")
    op.execute("DROP FUNCTION update_rating(text)")

    op.alter_column('feedback', 'rating',
        existing_type=sa.VARCHAR(),
        type_=sa.Float(),
        existing_nullable=False,
        postgresql_using='rating::double precision')

Notice how the UPDATE query calls our stored procedure update_rating() to make existing ratings of the form we desire. Once that is done, the procedure is no longer needed, so it is DROP-ed and then, finally, the type is changed to Float. So indeed, procedural languages are used for real!

Also, what’s more exciting is that one can write Postgres stored procedures in Python itself! Python is among the four procedural languages supported by PostgreSQL, the other two being Perl and Tcl. Here’s a simple Python example of the grading code we wrote earlier:

CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS
$$
    if rating < 1:
        return 'D'
    elif rating < 2:
        return 'C'
    elif rating < 3:
        return 'B'
    elif rating < 4:
        return 'A'
    else
        return 'A+'
$$ STRICT LANGUAGE plpython3u IMMUTABLE;

This should work identical to the PL/pgSQL equivalent code above. But don’t forget to change the command for installing the procedural language, for Python it looks like:

$ createlang plpython3u yourdb

Needless to say, the 3 here stands for Python3.

Even with all this flexibility and power, one must note that procedural languages are not designed to replace programming languages. Procedural code can easily become difficult to maintain, and one often needs superuser privileges to create stored procedures. Moreover, portability is a mess with them. That is especially true when moving from one database system (like Postgres) to another (like Microsoft SQL Server). Therefore, for optimum application performance and code maintainability, ensure that there is a healthy balance between your business logic residing in stored procedures and the application layer.

This post covered a lot of information about code residing in the database and its use cases. If you want to know more about best practices involving separation of business logic, please check out this thread on StackExchange’s software engineering website.

Resources

Adding Statistics of Event-Type on Open Event Server

The Open Event Server enables organizers to manage events from concerts to conferences and meet-ups. It offers features for events with several tracks and venues. In this blog, we will talk about how to add statistics of event-type on Open Event Server. The focus is on to get number of events of a specific event type.

Number of events of a specific event type

Now, let’s try to understand this API. Here, we are using flask Blueprints to add this API to the API index.

  1. First of all, we are using the decorator of event_statistics which will append this API route with that of mentioned in the Blueprint event_statistics.
  2. We will just allow logged in user to access this API using JWT (JSON Web Token)
  3. To return the response having all the event types alongwith number of events of it, requires a lot of queries if tried to fulfilled by SQLALchemy ORM. So instead of using ORM we will query using SQL command so that we query the number of all the events of different event types in just one query, which will eventually reduces the time of server to return the response.
  4. In function event_types_count we are using db.engine.execute to run the SQL command of getting the statistics of events respective to event types.
  5. The response will include id of event_type, name of event_type and count of events of corresponding event_type.
  6. Finally, we jsonify the list having objects of statistics of events respective to event types.

Similarly, event topics statistics can be implemented to return the number of events of all the event topics.

Resources

DetachedInstanceError: Dealing with Celery, Flask’s app context and SQLAlchemy in the Open Event Server

In the open event server project, we had chosen to go with celery for async background tasks. From the official website,

What is celery?

Celery is an asynchronous task queue/job queue based on distributed message passing.

What are tasks?

The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing.

After the tasks had been set up, an error constantly came up whenever a task was called

The error was:

DetachedInstanceError: Instance <User at 0x7f358a4e9550> is not bound to a Session; attribute refresh operation cannot proceed

The above error usually occurs when you try to access the session object after it has been closed. It may have been closed by an explicit session.close() call or after committing the session with session.commit().

The celery tasks in question were performing some database operations. So the first thought was that maybe these operations might be causing the error. To test this theory, the celery task was changed to :

@celery.task(name='lorem.ipsum')
def lorem_ipsum():
    pass

But sadly, the error still remained. This proves that the celery task was just fine and the session was being closed whenever the celery task was called. The method in which the celery task was being called was of the following form:

def restore_session(session_id):
    session = DataGetter.get_session(session_id)
    session.deleted_at = None
    lorem_ipsum.delay()
    save_to_db(session, "Session restored from Trash")
    update_version(session.event_id, False, 'sessions_ver')


In our app, the app_context was not being passed whenever a celery task was initiated. Thus, the celery task, whenever called, closed the previous app_context eventually closing the session along with it. The solution to this error would be to follow the pattern as suggested on http://flask.pocoo.org/docs/0.12/patterns/celery/.

def make_celery(app):
    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    task_base = celery.Task

    class ContextTask(task_base):
        abstract = True

        def __call__(self, *args, **kwargs):
            if current_app.config['TESTING']:
                with app.test_request_context():
                    return task_base.__call__(self, *args, **kwargs)
            with app.app_context():
                return task_base.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery

celery = make_celery(current_app)


The __call__ method ensures that celery task is provided with proper app context to work with.

 

Migrating FOSSASIA blog from Drupal to WordPress

Last week I migrated FOSSASIA’s blog from Drupal to WordPress and it was an amazing learning experience.

The steps one can use for migration are as follows:

Create a WordPress website:

In order to convert your drupal website to wordpress, you need to have a wordpress site where the data will be imported. By WordPress site, I mean a local installation where you can test whether the migration worked or not.

Truncate default posts/pages/comments:

Once you have your WP installation ready, truncate the default pages,comments etc from your wordpress database.

TRUNCATE TABLE wordpress.wp_comments;
TRUNCATE TABLE wordpress.wp_links;
TRUNCATE TABLE wordpress.wp_postmeta;
TRUNCATE TABLE wordpress.wp_posts;
TRUNCATE TABLE wordpress.wp_term_relationships;
TRUNCATE TABLE wordpress.wp_term_taxonomy;
TRUNCATE TABLE wordpress.wp_terms;
Selection_068
WordPress Database

Get hold of the Drupal mysql DB:

Import your Drupal DB to your local mysql installation where you have your WP database. Why? because you need to do a lot of “data transfer”!

Selection_069
Drupal Database

Execute a lot of scripts (Just kidding!):

There are some pretty useful online references which provide the required mysql scripts to migrate the data from Drupal to WordPress DB with proper formatting. Look here and here.

Depending on the kind of data you have you might need to do some modifications. e.g. depending on whether you have tags or categories/sub-categories in your data, you might have to modify the following command to suite your needs.

INSERT INTO wordpress.wp_terms (term_id, name, slug, 
term_group)
SELECT
	d.tid,
	d.name,
	REPLACE(LOWER(d.name), ' ', '-'),
	0
FROM drupal.taxonomy_term_data d
INNER JOIN drupal.taxonomy_term_hierarchy h USING(tid);

Recheck if entire data has been imported correctly:

Once you execute the scripts error free. Check if you imported the DB data (users/taxonomies/posts) correctly. Since WP and Drupal store passwords differently, you would have to ask your users/authors/admins to change their passwords on the migrated blog. We are almost there!! (not quite).

Transfer media files to WP and map them to Media:

You would have to transfer your media (pics, videos, attachments etc) to your WordPress installation from Drupal site. Selection_066

Put them under wp-content/uploads/old or any other suitable directory name under wp-content/uploads/.

Once you are done with it! In order to add the files to “Media” under Admin Panel, you can use plugins like Add from Server which map your files to folder “Media” expects your files to be in.

Change the permalinks (optional):

Depending on default permalinks of your Drupal blog, you might have to change the permalink format.

To do that, go to <Your_WP_Site>/wp-admin/options-permalink.php

You can change the permalink structure from one of the many options you are provided. Selection_067

Add themes as you may. Upload your WordPress site online. And we are done!!

The new face of blog.fossasia.org looks like this! Selection_070