FOSSASIA’s Open Event Server uses Alembic migration files to handle all database schema changes. Whenever the database schema changes, a corresponding Python migration script is generated so that other developers’ databases migrate accordingly as well. But we often forget that the automatically generated script usually just adds/deletes columns or alters column properties. It does not handle the migration of existing data in those columns, which can lead to data loss or errors during migration.
For example :
def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=False)
    # ### end Alembic commands ###
Here, the goal was to change the “lastname” column of the “ticket_holders” table from nullable to NOT NULL. The script that Alembic autogenerated just uses op.alter_column().
It does not account for already existing data. So, if the column has any entries which are null, this migration will fail with an error saying that the column contains null values and hence cannot be made NOT NULL.
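This failure mode is easy to reproduce outside Alembic. Below is a minimal sketch using Python’s stdlib sqlite3 module with a hypothetical in-memory table mirroring the example (PostgreSQL, which Open Event Server uses, performs the same scan for NULLs when enforcing the constraint):

```python
import sqlite3

# Hypothetical in-memory table mirroring ticket_holders.lastname
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_holders (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany("INSERT INTO ticket_holders (lastname) VALUES (?)",
                 [("Doe",), (None,)])

# Before enforcing NOT NULL, the database checks existing rows for NULLs;
# any hit aborts the ALTER with an error.
null_rows = conn.execute(
    "SELECT COUNT(*) FROM ticket_holders WHERE lastname IS NULL"
).fetchone()[0]
print(null_rows)  # 1 -> ALTER COLUMN ... SET NOT NULL would fail here
```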
How to Handle This?
Before altering the column definition, we can follow these steps:
- Look for all the null entries in the column
- Give some arbitrary default value to those
- Now we can safely alter the column definition
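The steps above can be sketched end to end with the stdlib sqlite3 module (SQLite cannot add a NOT NULL constraint in place, so the last step recreates the table here; on PostgreSQL a single ALTER COLUMN … SET NOT NULL suffices, as shown later with Alembic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_holders (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany("INSERT INTO ticket_holders (lastname) VALUES (?)",
                 [("Doe",), (None,), (None,)])

# Steps 1 and 2: find the NULL entries and give them a default value.
conn.execute("UPDATE ticket_holders SET lastname = ' ' WHERE lastname IS NULL")

# Step 3: now the NOT NULL constraint can be applied safely
# (table rebuild is SQLite's workaround for altering a column).
conn.executescript("""
    CREATE TABLE ticket_holders_new (id INTEGER PRIMARY KEY, lastname TEXT NOT NULL);
    INSERT INTO ticket_holders_new SELECT id, lastname FROM ticket_holders;
    DROP TABLE ticket_holders;
    ALTER TABLE ticket_holders_new RENAME TO ticket_holders;
""")

nulls = conn.execute(
    "SELECT COUNT(*) FROM ticket_holders WHERE lastname IS NULL").fetchone()[0]
print(nulls)  # 0 -> the constraint now holds for every row
```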
Let’s see how we can achieve this. For connecting with the database we will use SQLAlchemy. First, we get a reference to the table and the corresponding column that we wish to alter.
ticket_holders_table = sa.sql.table('ticket_holders', sa.Column('lastname', sa.VARCHAR()))
Since we need the “lastname” column from the table “ticket_holders”, we specify both in the method arguments.
Now, we will give an arbitrary default value to all the originally null entries in the column. In this case, I chose to use a space character.
op.execute(ticket_holders_table
           .update()
           .where(ticket_holders_table.c.lastname.is_(None))
           .values({'lastname': op.inline_literal(' ')}))
op.execute() can also run raw SQL commands directly, but here we chose to go with SQLAlchemy, which builds the SQL statement from our modular input. One example of a complex SQL command being executed directly is:
op.execute('INSERT INTO event_types(name, slug) '
           'SELECT DISTINCT event_type_id, '
           'lower(replace(regexp_replace(event_type_id, \'& |,\', \'\', \'g\'), \' \', \'-\')) '
           'FROM events '
           'WHERE NOT EXISTS (SELECT 1 FROM event_types WHERE event_types.name = events.event_type_id) '
           'AND event_type_id IS NOT NULL;')
Now that we have handled all the null data, it is safe to alter the column definition. So we proceed to execute the final statement –
op.alter_column('ticket_holders', 'lastname', existing_type=sa.VARCHAR(), nullable=False)
Now the entire migration script will run without any error. The final outcome would be –
- All the null “lastname” entries would be replaced by a space character
- The “lastname” column would now be a NOT NULL column.
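Putting the pieces together, the full migration script would look roughly like the sketch below. The downgrade() is an assumption on my part (the original script is not shown here); it relaxes the constraint again but cannot restore the original NULLs. This is an Alembic migration fragment, not a standalone program:

```python
import sqlalchemy as sa
from alembic import op


def upgrade():
    ticket_holders_table = sa.sql.table(
        'ticket_holders', sa.Column('lastname', sa.VARCHAR()))

    # Steps 1 and 2: replace NULL entries with a default value (a space).
    op.execute(ticket_holders_table
               .update()
               .where(ticket_holders_table.c.lastname.is_(None))
               .values({'lastname': op.inline_literal(' ')}))

    # Step 3: now the column can safely be made NOT NULL.
    op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=False)


def downgrade():
    # Hypothetical reverse migration: relax the constraint again
    # (the spaces inserted during upgrade are not reverted).
    op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=True)
```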