Migrating Event Ratings of Open Event with Stored Procedures
Many developers know about procedural languages and have used them in some form or another, but they remain a rather unpopular tool, despite their power. These languages have many advantages (and a few disadvantages), which we will learn about soon. Having the right amount of database-stored procedure code, written in these languages, can really enhance the speed and responsiveness of an application. This article will teach us how procedural languages can be utilized in database management and how they were used recently for a bug fix in Open Event Server.
PostgreSQL, like any other powerful relational database management system (RDBMS), provides the functionality to create and use stored procedures. Essentially, a stored procedure is database logic code which is saved on the database server. This code can be executed directly in the database, and can be (and often is!) used to shift business logic from the application layer of a piece of software to the database layer. This simple shift often has many advantages, including faster execution (as code executes at a lower stack level) and better security. When firing database queries from the application layer (i.e., the code that programmers write for storing programmable objects, performing business logic and so on), it often happens that parameters from the programming language itself are interpolated into SQL text, which then forms the complete SQL query. For example, here’s what a novice query might look like:
    import psycopg2

    conn = psycopg2.connect(dbname="oevent", user="john", password="start")
    cur = conn.cursor()
    name = "Sam"
    cur.execute("SELECT * FROM users WHERE name='%s'" % name)  # DANGEROUS!
This is extremely “exposed” code that can be exploited for malicious access with a technique called SQL injection. This technique essentially “injects” malicious code via the passed parameters, like the variable name in the above code. With stored procedures handling the business logic, there is far less room for SQL injection (as long as the procedure does not itself build SQL by concatenating strings). They solve the problem by having the query written beforehand and treating the parameterized data as a separate entity. The pre-processed query within the corresponding stored procedure now looks like this:
SELECT * FROM users WHERE name=?
The database driver sends the name of this stored procedure (or, in standard parameterised queries, just the query text itself) and a list of parameters as separate entities in the protocol, so parameter values are never parsed as SQL. More details on how stored procedures enhance security can be found here.
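To see the difference in practice, here is a minimal sketch that runs the same lookup both ways. It assumes an in-memory SQLite database from Python's standard library as a stand-in for Postgres, so that it runs anywhere; the users table contents and the attacker-style input are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Sam"), (2, "Alice")])

# Hypothetical attacker-controlled input.
name = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes the query's logic.
unsafe_rows = cur.execute(
    "SELECT * FROM users WHERE name='%s'" % name
).fetchall()

# Safe: the SQL text is fixed and the value travels separately as a parameter.
safe_rows = cur.execute(
    "SELECT * FROM users WHERE name=?", (name,)
).fetchall()

print("rows returned by the interpolated query:", len(unsafe_rows))
print("rows returned by the parameterized query:", len(safe_rows))
```

The interpolated query matches every row in the table, while the parameterized one matches none, because no user literally has that name. With psycopg2 the idea is the same, except that the placeholder is written %s and the values are passed as a second argument to cur.execute().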
After learning so much about the advantages of stored procedures (which are enabled by procedural languages), let’s write one! Postgres supports multiple languages for writing stored procedures; here we will use PL/pgSQL, which is the most popular choice for Postgres. This procedural language, inspired (heavily) by Oracle’s PL/SQL language, looks very similar to SQL. To use a procedural language, we have to first install it. In Postgres, procedural languages are installed per-database, not server-wide. PL/pgSQL has been installed by default since PostgreSQL 9.0, so on a modern server this step can usually be skipped. On older releases, we can use the popular Postgres client psql for this purpose, or simply the createlang command on the command line (createlang was removed in PostgreSQL 10, where CREATE EXTENSION plpgsql does the same job):
$ createlang plpgsql yourdb
Now let’s create a simple procedure that prints the corresponding grades for the event ratings. In the psql shell, type the following:
    CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS $$
    BEGIN
        IF rating < 1 THEN
            RETURN 'D';
        ELSIF rating < 2 THEN
            RETURN 'C';
        ELSIF rating < 3 THEN
            RETURN 'B';
        ELSIF rating < 4 THEN
            RETURN 'A';
        ELSE
            RETURN 'A+';
        END IF;
    END;
    $$ STRICT LANGUAGE plpgsql IMMUTABLE;
The opening clause defines the function signature: its name, its parameter’s type and its return type. The LANGUAGE plpgsql clause at the end specifies which procedural language to use. The IMMUTABLE keyword declares that the function cannot modify the database and always returns the same result for the same arguments. The STRICT part says that the function always returns null when any of its arguments is null; in that case the function body is not executed at all, and a null result is assumed automatically. Finally, everything within the $$ quotes is the logic of our function, in PL/pgSQL syntax. As we can see, the syntax is very close to that of plain SQL.
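To make the STRICT behaviour concrete, here is a rough Python analogy. It is only a sketch of the semantics, not how Postgres implements them: the strict decorator is a hypothetical helper that skips the function body and returns None whenever an argument is None, and grade repeats the same thresholds as the PL/pgSQL function.

```python
from functools import wraps


def strict(func):
    """Hypothetical helper: return None without calling func if any argument is None."""
    @wraps(func)
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None  # the body is never executed, as with STRICT
        return func(*args)
    return wrapper


@strict
def grade(rating):
    # Same thresholds as the PL/pgSQL function.
    if rating < 1:
        return 'D'
    elif rating < 2:
        return 'C'
    elif rating < 3:
        return 'B'
    elif rating < 4:
        return 'A'
    else:
        return 'A+'


for value in (2.3, 0.5, None):
    print(value, "->", grade(value))
```

Without the decorator, grade(None) would raise a TypeError on the first comparison; the wrapper short-circuits before that point, which is what STRICT buys us on the database side.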
Let us now create some dummy data to test the above stored procedure. In the same psql shell, type the following:
    CREATE TABLE ratings(id INTEGER, rating NUMERIC);
    INSERT INTO ratings VALUES (1, 2.3), (2, 0.5), (3, 1.7), (4, 4.5), (5, 3.3);
The ratings table now looks as follows:
    yourdb=# SELECT * FROM ratings;
     id | rating
    ----+--------
      1 |    2.3
      2 |    0.5
      3 |    1.7
      4 |    4.5
      5 |    3.3
    (5 rows)
To test our stored procedure, we fire the following query:
SELECT id, grade(rating) FROM ratings;
The result is:
     id | grade
    ----+-------
      1 | B
      2 | D
      3 | C
      4 | A+
      5 | A
    (5 rows)
Thus, our stored procedure works perfectly! This shows how you can write code for your application in the database itself, the code which would otherwise be present in the application layer. For writing more complex stored procedures, you can always consult the excellent PostgreSQL documentation on this topic!
PL/pgSQL turned out to be especially useful for a recent bug patch in Open Event Server. The rating attribute of event feedback was initially (and incorrectly) of String type. When converting it to Float type, I had migrated the schema, but the migration of existing data was nontrivial. Since the earlier ratings were of type String, we weren’t really sure whether the values in the rating column would all contain numbers. Furthermore, we also wanted to round those numbers to the “nearest 0.5”, so that ratings like 4.4 and 2.2 get converted to more conventional values like 4.5 and 2.0 respectively. To accomplish this, I had to add a stored procedure in the related migration file. Here’s how it looked:
    class ReplaceableObject(object):
        def __init__(self, name, sqltext):
            self.name = name
            self.sqltext = sqltext


    update_rating_func = ReplaceableObject(
        "update_rating(rating text)",
        """
        RETURNS text AS $$
        DECLARE
            r NUMERIC;
        BEGIN
            r = cast(rating as NUMERIC);
            IF r < 0 OR r > 5 THEN
                RETURN '0';
            END IF;
            r = round(r*2) / 2;
            RETURN cast(r as VARCHAR);
        EXCEPTION WHEN invalid_text_representation THEN
            RETURN '0';
        END;
        $$
        STRICT
        LANGUAGE plpgsql IMMUTABLE;
        """)
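We created a simple class ReplaceableObject so that the name of the function is, sort of, a separate attribute from its definition, as required by the migration library Alembic. Note that op.create_or_replace_sp is not one of Alembic’s built-in operations; it is a custom operation registered by the project, along the lines of the replaceable objects recipe in the Alembic cookbook. Once the stored procedure was defined as above, the upgrade() function of the migration file was modified accordingly: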
    def upgrade():
        op.create_or_replace_sp(update_rating_func)
        op.execute("UPDATE feedback SET rating=update_rating(rating)")
        op.execute("DROP FUNCTION update_rating(text)")
        op.alter_column('feedback', 'rating',
                        existing_type=sa.VARCHAR(),
                        type_=sa.Float(),
                        existing_nullable=False,
                        postgresql_using='rating::double precision')
Notice how the UPDATE query calls our stored procedure update_rating() to bring the existing ratings into the form we want. Once that is done, the procedure is no longer needed, so it is dropped and then, finally, the column type is changed to Float. So indeed, procedural languages are used for real!
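Before running such a migration on real data, it can help to check the rounding rule against a few sample values. The following is a sketch of the same logic in plain Python, not the code used in Open Event Server: normalize_rating is a hypothetical name, it returns a Decimal rather than text, and Decimal parsing may accept or reject a few unusual strings differently from a Postgres numeric cast.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def normalize_rating(rating):
    """Hypothetical mirror of update_rating(): clamp bad values to 0, round to nearest 0.5."""
    if rating is None:
        return None  # STRICT: a null input gives a null result
    try:
        r = Decimal(rating)
    except InvalidOperation:
        return Decimal(0)  # not a number, like invalid_text_representation
    if not r.is_finite() or r < 0 or r > 5:
        return Decimal(0)  # out-of-range ratings are reset to 0
    # Double, round to the nearest integer (ties away from zero), then halve.
    return (r * 2).quantize(Decimal("1"), rounding=ROUND_HALF_UP) / 2


for raw in ["4.4", "2.2", "abc", "7", None]:
    print(repr(raw), "->", normalize_rating(raw))
```

Doubling before rounding is what turns an ordinary round-to-integer into a round-to-nearest-0.5: 4.4 becomes 8.8, rounds to 9 and halves to 4.5, while 2.2 becomes 4.4, rounds to 4 and halves to 2.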
What’s even more exciting is that one can write Postgres stored procedures in Python itself! Python is among the four procedural languages shipped with the standard PostgreSQL distribution, the other three being PL/pgSQL, Perl and Tcl. Here’s a simple Python version of the grading code we wrote earlier:
    CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS $$
    if rating < 1:
        return 'D'
    elif rating < 2:
        return 'C'
    elif rating < 3:
        return 'B'
    elif rating < 4:
        return 'A'
    else:
        return 'A+'
    $$ STRICT LANGUAGE plpython3u IMMUTABLE;
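This should work identically to the equivalent PL/pgSQL code above. But don’t forget to change the command for installing the procedural language. On PostgreSQL 10 and later that means running CREATE EXTENSION plpython3u in the database; with the older createlang command, it looks like: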
$ createlang plpython3u yourdb
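Needless to say, the 3 here stands for Python 3. The trailing u marks the language as untrusted, which means that only superusers can create functions in it.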
Even with all this flexibility and power, one must note that procedural languages are not designed to replace programming languages. Procedural code can easily become difficult to maintain, and one often needs superuser privileges to create stored procedures. Moreover, portability is a mess with them. That is especially true when moving from one database system (like Postgres) to another (like Microsoft SQL Server). Therefore, for optimum application performance and code maintainability, ensure that there is a healthy balance between your business logic residing in stored procedures and the application layer.
This post covered a lot of information about code residing in the database and its use cases. If you want to know more about best practices involving separation of business logic, please check out this thread on StackExchange’s software engineering website.