Patching an Attribute Type Across a Flask App

Recently, it was discovered by a contributor that the rating attribute for event feedbacks in Open Event was of type String. The type was incorrect, indeed. After a discussion, developers came concluded that it should be of type Float. In this post, I explain how to perform this simple migration task of changing a data type across a typical Flask app’s stack.

To begin this change, we first, we modify the database model. The model file for feedbacks (feedback.py) looks like the following:

from app.models import db


class Feedback(db.Model):
"""Feedback model class"""
__tablename__ = 'feedback'
id = db.Column(db.Integer, primary_key=True)
rating = db.Column(db.String, nullable=False)  # ←-- should be float
comment = db.Column(db.String, nullable=True)
user_id = db.Column(db.Integer,
db.ForeignKey('users.id', ondelete='CASCADE'))
event_id = db.Column(db.Integer,
db.ForeignKey('events.id', ondelete='CASCADE'))

def __init__(self, rating=None, comment=None, event_id=None,                                        user_id=None):
self.rating = rating  # ←-- cast here for safety
self.comment = comment
self.event_id = event_id
self.user_id = user_id





The change here is quite straightforward, and spans just 2 lines:

rating = db.Column(db.Float, nullable=False)

and

self.rating = float(rating)

We now perform the database migration using a couple of manage.py commands on the terminal. This file is different for different projects, but the migration commands essentially look the same. For Open Event Server, the manage.py file is at the root of the project directory (as is conventional). After cd’ing to the root, we execute the following commands:

$ python manage.py db migrate

and then

$ python manage.py db upgrade

These commands update our Open Event database so that the rating is now stored as a Float. However, if we execute these commands one after the other, we note that an exception is thrown:

sqlalchemy.exc.ProgrammingError: column "rating" cannot be cast automatically to type float
HINT:  Specify a USING expression to perform the conversion.
'ALTER TABLE feedback ALTER COLUMN rating TYPE FLOAT USING rating::double precision'

This happens because the migration code is ambiguous about what precision to use after converting to type float. It hints us to utilize the USING clause of PostgreSQL to do that. We accomplish this manually by using the psql client to connect to our database and command it the type change:

$ psql oevent
psql (10.1)
Type "help" for help.

oevent=# ALTER TABLE feedback ALTER COLUMN rating TYPE FLOAT USING rating::double precision

We now exit the psql shell and run the above migration commands again. We see that the migration commands pass successfully this time, and a migration file is generated. For our migration, the file looks like the following:

from alembic import op
import sqlalchemy as sa


# These values would be different for your migrations.
revision = '194a5a2a44ef'

down_revision = '4cac94c86047'


def upgrade():
op.alter_column('feedback', 'rating',
existing_type=sa.VARCHAR(),
type_=sa.Float(),
existing_nullable=False)


def downgrade():
op.alter_column('feedback', 'rating',
existing_type=sa.Float(),
type_=sa.VARCHAR(),
existing_nullable=False)

This is an auto-generated file (built by the database migration tool Alembic) and we need to specify the extra commands we used while migrating our database. Since we did use an extra command to specify the precision, we need to add it here. PostgreSQL USING clause can be added to alembic migration files via the postgresql_using keyword argument. Thus, the edited version of the upgrade function looks like the following:

def upgrade():
op.alter_column('feedback', 'rating',
existing_type=sa.VARCHAR(),
type_=sa.Float(),
existing_nullable=False,
postgresql_using='rating::double precision')

This completes our work on database migration. Migration files are useful for a variety of purposes – they allow us to easily get to a previous database state, or a new database state as suggested by a project collaborator. Just like git, they allow for easy version control and collaboration.

We didn’t finish this work after database migration. We also decided to impose limits on the rating value. We concluded that 0-5 would be a good range for rating. Furthermore, we also decided to round off the rating value to the “nearest 0.5”, so if the input rating is 2.3, it becomes 2.5. Also, if it is 4.1, it becomes 4.0. This was decided because such values are conventional for ratings across numerous web and mobile apps. So this will hopefully enable easier adoption for new users.

For the validation part, marshmallow came to rescue. It is a simple object serialization and deserialization tool for Python. So it basically allows to convert complex Python objects to JSON data for communicating over HTTP and vice-versa, among other things. It also facilitates pre-processing input data and therefore, allows clean validation of payloads. In our case, marshmallow was specifically used to validate the range of the rating attribute of feedbacks. The original feedbacks schema file looked like the following:

from marshmallow_jsonapi import fields
from marshmallow_jsonapi.flask import Schema, Relationship

from app.api.helpers.utilities import dasherize


class FeedbackSchema(Schema):
"""
Api schema for Feedback Model
"""
class Meta:
"""
Meta class for Feedback Api Schema
"""
type_ = 'feedback'
self_view = 'v1.feedback_detail'
self_view_kwargs = {'id': '<id>'}
inflect = dasherize

id = fields.Str(dump_only=True)
rating = fields.Str(required=True)  # ← need to validate this
comment = fields.Str(required=False)
event = Relationship(attribute='event',
self_view='v1.feedback_event',
self_view_kwargs={'id': '<id>'},
related_view='v1.event_detail',
related_view_kwargs={'feedback_id': '<id>'},
schema='EventSchemaPublic',
type_='event')




To validate the rating attribute, we use marshmallow’s Range class:

from marshmallow.validate import Range

Now we change the line

rating = fields.Str(required=True)

to

rating = fields.Float(required=True, validate=Range(min=0, max=5))

So with marshmallow, just about 2 lines of work implements rating validation for us!

After the validation part, what’s remaining is the rounding-off business logic. This is simple mathematics, and for getting to the “nearest 0.5” number, the formula goes as follows:

rating * 2 --> round off --> divide the result by 2

We will use Python’s built-in function (BIF) to accomplish this. To implement the business logic, we go back to the feedback model class and modify its constructor. Before this type change, the constructor looked like the following:

def __init__(self, rating=None, comment=None, event_id=None, user_id=None):
self.rating = rating
self.comment = comment
self.event_id = event_id
self.user_id = user_id

We change this by first converting the input rating to float, rounding it off and then finally assigning the result to feedback’s rating attribute. The new constructor is shown below:

def __init__(self, rating=None, comment=None, event_id=None, user_id=None):
rating = float(rating)
self.rating = round(rating*2, 0) / 2  # Rounds to nearest 0.5

self.comment = comment
self.event_id = event_id
self.user_id = user_id

This completes the rounding-off part and ultimately concludes rating’s type change from String to Float. We saw how a simple high-level type change requires editing code across multiple files and the use of different tools in between. In doing so, we thus also learned the utility of alembic and marshmallow in database migration and data validation, respectively.


Resources

Continue ReadingPatching an Attribute Type Across a Flask App

Open Event Server – Change a Column from NULL to NOT NULL

FOSSASIA‘s Open Event Server uses alembic migration files to handle all database operations and updating. Whenever the database is changed a corresponding migration python script is made so that the database will migrate accordingly for other developers as well. But often we forget that the automatically generated script usually just add/deletes columns or alters the column properties. It does not handle the migration of existing data in that column. This can lead to huge data loss or error in migration as well.

For example :

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=False)
    # ### end Alembic commands ###

Here, the goal was to change the column “ticket_holders” from nullable to not nullable. The script that alembic autogenerated just uses op.alter_column().

It does not count for the already existing data. So, if the column has any entries which are null, this migration will lead to an error saying that the column contains null entries and hence cannot be “NOT NULL”.

How to Handle This?

Before altering the column definition we can follow the following steps :

  1. Look for all the null entries in the column
  2. Give some arbitrary default value to those
  3. Now we can safely alter the column definition

Let’s see how we can achieve this. For connecting with the database we will use SQLAlchemy. First, we get a reference to the table and the corresponding column that we wish to alter.

ticket_holders_table = sa.sql.table('ticket_holders',
                                        sa.Column('lastname', sa.VARCHAR()))

 

Since we need the “last_name” column from the table “ticket_holders”, we specify it in the method argument.

Now, we will give an arbitrary default value to all the originally null entries in the column. In this case, I chose to use a space character.

op.execute(ticket_holders_table.update()
               .where(ticket_holders_table.c.lastname.is_(None))
               .values({'lastname': op.inline_literal(' ')}))

op.execute() can execute direct SQL commands as well but we chose to go with SQLAlchemy which builds an optimal SQL command from our modular input. One such example of a complex SQL command being directly executed is :

op.execute('INSERT INTO event_types(name, slug) SELECT DISTINCT event_type_id, lower(replace(regexp_replace(event_type_id, \'& |,\', \'\', \'g\'), \' \', \'-\')) FROM events where not exists (SELECT 1 FROM event_types where event_types.name=events.event_type_id) and event_type_id is not null;'))

Now that we have handled all the null data, it is safe to alter the column definition. So we proceed to execute the final statement –

op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=False)

Now the entire migration script will run without any error. The final outcome would be –

  1. All the null “last_name” entries would be replaced by a space character
  2. The “last_name” column would now be a NOT NULL column.

References

Continue ReadingOpen Event Server – Change a Column from NULL to NOT NULL

Open Event Server: Working with Migration Files

FOSSASIA‘s Open Event Server uses alembic migration files to handle all database operations and updations.  From creating tables to updating tables and database, all works with help of the migration files.
However, many a times we tend to miss out that automatically generated migration files mainly drops and adds columns rather than just changing them. One example of this would be:

def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.add_column('session', sa.Column('submission_date', sa.DateTime(), nullable=True))
    op.drop_column('session', 'date_of_submission')

Here, the idea was to change the has_session_speakers(string) to is_session_speakers_enabled (boolean), which resulted in the whole dropping of the column and creation of a new boolean column. We realize that, on doing so we have the whole data under  has_session_speakers lost.

How to solve that? Here are two ways we can follow up:

  • op.alter_column:
    ———————————-

When update is as simple as changing the column names, then we can use this. As discussed above, usually if we migrate directly after changing a column in our model, then the automatic migration created would drop the old column and create a new column with the changes. But on doing this in the production will cause huge loss of data which we don’t want. Suppose we want to just change the name of the column of start_time to starts_at. We don’t want the entire column to be dropped. So an alternative to this is using op.alter_column. The two main necessary parameters of the op.alter_column is the table name and the column which you are willing to alter. The other parameters include the new changes. Some of the commonly used parameters are:

  1. nullable Optional: specify True or False to alter the column’s nullability.
  2. new_column_name – Optional; specify a string name here to indicate the new name within a column rename operation.
  3. type_Optional: a TypeEngine type object to specify a change to the column’s type. For SQLAlchemy types that also indicate a constraint (i.e. Boolean, Enum), the constraint is also generated.
  4. autoincrement –  Optional: set the AUTO_INCREMENT flag of the column; currently understood by the MySQL dialect.
  5. existing_typeOptional: a TypeEngine type object to specify the previous type. This is required for all column alter operations that don’t otherwise specify a new type, as well as for when nullability is being changed on a column.

    So, for example, if you want to change a column name from “start_time” to “starts_at” in events table you would write:
    op.alter_column(‘events’, ‘start_time’, new_column_name=’starts_at’)
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.alter_column('sessions_version', 'end_time', new_column_name='ends_at')
    op.alter_column('sessions_version', 'start_time', new_column_name='starts_at')
    op.alter_column('events_version', 'end_time', new_column_name='ends_at')
    op.alter_column('events_version', 'start_time', new_column_name='starts_at')


Here,
session_version and events_version are the tables name altering columns start_time to starts_at and end_time to ends_at with the op_alter_column parameter new_column_name.

  • op.execute:
    ——————–

Now with alter_column, most of the alteration in the column name or constraints or types is achievable. But there can be a separate scenario for changing the column properties. Suppose I change a table with column “aspect_ratio” which was a string column and had values “on” and “off” and want to convert the type to Boolean True/False. Just changing the column type using alte_column() function won’t work since we need to also modify the whole data. So, sometimes we need to execute raw SQL commands. To do that, we can use the op.execute() function.
The way it is done:

def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.execute("ALTER TABLE image_sizes ALTER full_aspect TYPE boolean USING CASE 
            full_aspect WHEN 'on' THEN TRUE ELSE FALSE END", execution_options=None)

    op.execute("ALTER TABLE image_sizes ALTER icon_aspect TYPE boolean USING CASE 
            icon_aspect WHEN 'on' THEN TRUE ELSE FALSE END", execution_options=None)

    op.execute("ALTER TABLE image_sizes ALTER thumbnail_aspect TYPE boolean USING CASE 
            thumbnail_aspect WHEN 'on' THEN TRUE ELSE FALSE END"execution_options=None)

For a little more advanced use of op.execute() command will be:

op.alter_column('events', 'type', new_column_name='event_type_id')
    op.alter_column('events_version', 'type', new_column_name='event_type_id')
    op.execute('INSERT INTO event_types(name, slug) SELECT DISTINCT event_type_id, 
                lower(replace(regexp_replace(event_type_id, \'& |,\', \'\', \'g\'),
                \' \', \'-\')) FROM events where not exists (SELECT 1 FROM event_types 
                where event_types.name=events.event_type_id) and event_type_id is not
                null;')
    op.execute('UPDATE events SET event_type_id = (SELECT id FROM event_types WHERE 
                event_types.name=events.event_type_id)')
    op.execute('ALTER TABLE events ALTER COLUMN event_type_id TYPE integer USING 
                event_type_id::integer')

In this example:

  • op.alter_column() renames the column type to event_type_id of events table
  • op.execute() does the following:
  • Inserts into column name of event_types table the value of event_type_idN (which previously contained the name of the event_type) from events table, and
  • Inserts into slug column of event_types table the value of event_type_id where all letters are changed to lowercase; “& ” and “,” to “”; and spaces to “-”.
    1. Checks whether a type with that name already exists so as to disallow any duplicate entries in the event_types table.
    2. Checks whether the event_type_id is null because name of event_types table cannot be null.

You can learn more on Alembic migrations here: http://alembic.zzzcomputing.com/en/latest/ops.html

Continue ReadingOpen Event Server: Working with Migration Files

Challenges with Migrations

So I am sitting in front of my laptop one day working on some issue. I remember it to be adding a database table. Since this is the first time I am actually creating a table I didn’t know how much this issue is going to haunt me! So I created a table SessionType just like other previously created tables and I make a PR happy to solve an issue. But just when I thought everything was going smooth this error comes up…

“Table session_type does not exist”

I understand that you have to again create the database so I run python create_db.py. But still I get the error. I search about it and find that every time I make changes to the database tables I have to update and generate a new migration script. Thus I run python manage.py db migrate and I get THE error:

“Cant locate revision identified by *some random revision number* ”

I literally went mad trying to solve this issue. Moreover whenever I would switch branches and the database tables were different I would get this error. Finally I found out a way to solve all the migration related problems (most of them :p). I added a script to drop the entire database.Since right now we are in the development stage it doesnt matter if the data in the database is deleted. So you run :

  1. python drop_db.py
  2. python create_db.py
  3. Use super_admin password to create the database
  4. But No! Migration is a little mean bastard!It wont leave you like that. You have to make the current revision and the head equal. Thus if your current is referring to some other revision ID then stamp the database table with the head.
  5. Run python manage.py db stamp head
  6. Now run python manage.py db migrate

 

Finally! The issue is solved. 🙂

 

 

Continue ReadingChallenges with Migrations