postgres – blog.fossasia.org

Adding Tickets Relationship with Discount Codes in Open Event Server

Post author:schedutron
Post published:August 21, 2018
Post category:API Event FOSSASIA Open Event
Post comments:0 Comments

Recently (as of this writing), it was discovered that the relationship between discount codes and tickets was not implemented yet in Open Event Server. It turns out that the server has two types of discount codes – discount codes for entire events and discount codes for individual tickets of a specific event. More information on how discount code themselves are implemented in the server can be found in this blog post from 2017 – Discount Codes in Open Event Server.

So, for implementing the relationship of discount codes with tickets, it was decided to be present only for discount codes that have the DiscountCodeSchemaTicket schema, since those are the discount codes that are used for individual tickets. As a first step, the `tickets` attribute of the discount code model was removed, as it was redundant. The already implemented used_for attribute did the same job, and with better validation. At the same time, discount code was added as an attribute.

In the ticket model file:

discount_code_id = db.Column(db.Integer, db.ForeignKey('discount_codes.id', ondelete='CASCADE'))
discount_code = db.relationship('DiscountCode', backref="tickets")

Also, in the __init__ constructor:

def __init__(self, ..., discount_code_id=None, ...):
    ...
    ...
    self.discount_code_id = discount_code_id

After that, we added a discount_code_id field in the ticket schema file:

discount_code_id = fields.Integer(allow_none=True)

In this file, we also removed the redundant tickets field.

Now, we migrated the Open Event Server database via the following commands:

$ python manage.py db migrate

then

$ python manage.py db upgrade

Next, in the discount code schema file, we added the tickets relationship. Note that this is a one-to-many relationship. One discount code (for tickets) can be mapped to many tickets. Here is the code for that relationship, in the discount code schema file, under the DiscountCodeSchemaTicket class:

tickets = Relationship(attribute='tickets',
self_view='v1.discount_code_tickets',
self_view_kwargs={'id': '<id>'},
related_view='v1.ticket_list',
related_view_kwargs={'discount_code_id': '<id>'},
schema='TicketSchemaPublic',
many=True,
type_='ticket')

For this, we, of course, imported the TicketSchemaPublic in this file first. After that, we created a DiscountCodeTicketRelationship class in the discount codes API file:

class DiscountCodeTicketRelationship(ResourceRelationship):
    """
    DiscountCode Ticket Relationship
    """
    decorators = (jwt_required,)
    methods = ['GET', 'PATCH']
    schema = DiscountCodeSchemaTicket
    data_layer = {'session': db.session, 'model': DiscountCode}

The next step was to add the query code to fetch the tickets related to a particular discount code from the database. For this, we added the following snippet to the query() method of the TicketList class in the tickets API file:

if view_kwargs.get('discount_code_id'):
    discount_code = safe_query(self, DiscountCode, 'id', view_kwargs['discount_code_id'], 'discount_code_id')
    # discount_code - ticket :: one-to-many relationship
    query_ = self.session.query(Ticket).filter_by(discount_code_id=discount_code.id)

The only thing that remains now is adding the API routes for this relationship. We do that in the project’s __init__.py file:

api.route(TicketList, 'ticket_list', '/events/<int:event_id>/tickets',
'/events/<event_identifier>/tickets', '/ticket-tags/<int:ticket_tag_id>/tickets',
'/access-codes/<int:access_code_id>/tickets', '/orders/<order_identifier>/tickets',
'/discount-codes/<int:discount_code_id>/tickets')

…

api.route(DiscountCodeTicketRelationship, 'discount_code_tickets',
'/discount-codes/<int:id>/relationships/tickets')

Many routes already map to TicketList, we added one for that comes from discount codes API. Now we can use Postman to check this relationship, and it indeed works as expected, as seen below!

Here’s the end:

References:

Migrating Event Ratings of Open Event with Stored Procedures

Post author:schedutron
Post published:August 21, 2018
Post category:API Event Management FOSSASIA GSoC Open Event
Post comments:0 Comments

Many developers know about procedural languages and have used them in some form or another, but this is really an unpopular tool, despite its power. There are many advantages (and few disadvantages) of these languages, which we will learn about soon. Having a right amount of database-stored procedure code with the help of these languages can really enhance the speed and responsiveness of an application. This article will teach us how procedural languages can be utilized in database management and how they were used recently for a bug fix in Open Event Server.

PostgreSQL, like any other powerful, relational database management system (RDBMS), provides the functionality to create and use stored procedures. Essentially, a stored procedure is database logic code which is saved on the database server. This code can be executed directly in the database, and can (and is!) often used to shift business logic from the application layer of a software to the database layer. This simple shift often has many advantages – including faster execution (as code executes at a lower stack level) and better security. When firing database queries from the application layer (i.e., the code that programmers write for storing programmable objects, performing business logic and so on), it often happens that parameters from the programming language itself are passed in to SQL, which then generates a complete SQL query. For example, here’s how a novice query might look like:

import psycopg2
conn = psycopg2.connect(dbname="oevent", user="john", password="start")
cur = conn.cursor()
name = "Sam"
cur.execute("SELECT * FROM users WHERE name='%s'" % name)  # DANGEROUS!

This is an extremely “exposed” code that can be exploited for malicious access, with a technique called SQL injection. This technique essentially “injects” malicious code via these passed parameters, like the variable name mentioned in the above code. With having stored procedures for business logic, there is no room for SQL injection. They solve this problem by writing the query beforehand, and having the parameterized data as a different entity. The pre-processed query within the corresponding stored procedure now looks like

SELECT * FROM users WHERE name=?

The database driver sends the name of this stored procedure (or, in standard parameterised queries, just the query text itself) and a list of parameters, as distinct separate entities in the protocol. More details on how stored procedures enhance security can be found here.

After learning so much about the advantages of stored procedures (which are enabled by procedural languages), let’s write one! Postgres supports multiple languages for writing stored procedures; here we will use PL/pgSQL, which is the most popular choice for Postgres. This procedural language, inspired (heavily) by Oracle’s PL/SQL language, looks very similar to SQL. To use this procedural language, we have to first install it. In Postgres, procedural languages are installed per-database, not server-wide. We can use the popular Postgres client psql for this purpose, or simply the createlang command on the command line:

$ createlang plpgsql yourdb

Now let’s create a simple procedure that prints the corresponding grades for the event ratings. In the psql shell, type the following:

CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS
$$
BEGIN
IF rating < 1 THEN
RETURN 'D';
ELSIF rating < 2 THEN
RETURN 'C';
ELSIF rating < 3 THEN
RETURN 'B';
ELSIF rating < 4 THEN
RETURN 'A';
ELSE
RETURN 'A+';
END IF;
END;
$$ STRICT LANGUAGE plpgsql IMMUTABLE;

The first line defines the function signature – its name, parameters’ type and return type. The line specifies which procedural language to use. The IMMUTABLE keyword specifies that the stored procedure – this function, cannot modify the database by itself. The STRICT part says that the function should always return null when any of the arguments are null. So, the function is not executed when there are null arguments, rather a null result is assumed automatically. Finally, everything within the $$ quotes is the logic of our function, in PL/pgSQL syntax. As we can see, the syntax is almost identical to that of plain SQL.

Let us now create some dummy data to test the above stored procedure. In the same psql shell, type the following:

CREATE TABLE ratings(id INTEGER, rating NUMERIC);
INSERT INTO ratings VALUES (1, 2.3), (2, 0.5), (3, 1.7), (4, 4.5), (5, 3.3);

The ratings table now looks as follows:

yourdb=# SELECT * FROM ratings;
id | rating
----+--------
1 |    2.3
2 |    0.5
3 |    1.7
4 |    4.5
5 |    3.3
(5 rows)

To test our stored procedure, we fire the following query:

SELECT id, grade(rating) FROM ratings;

The result is:

id | grade
----+-------
1 | B
2 | D
3 | C
4 | A+
5 | A
(5 rows)

Thus, our stored procedure works perfectly! This shows how you can write code for your application in the database itself, the code which would otherwise be present in the application layer. For writing more complex stored procedures, you can always consult the excellent PostgreSQL documentation on this topic!

PL/pgSQL turned out to be especially useful for a recent bug patch in Open Event Server. The rating attribute of event feedbacks was initially (and incorrectly) of String type. When converting it to Float type, I had migrated the schema, but the migration of existing data was nontrivial. Since the earlier ratings were of type String, we weren’t really sure whether the values in the rating column will all contain numbers. Furthermore, we also wanted to round those numbers to the “nearest 0.5”, so that ratings like 4.4 and 2.2 get converted to more conventional values like 4.5 and 2.0 respectively. To accomplish this, I had to add a stored procedure in the related migration file. Here’s how it looked:

class ReplaceableObject(object):
    def __init__(self, name, sqltext):
        self.name = name
        self.sqltext = sqltext

update_rating_func = ReplaceableObject(
    "update_rating(rating text)",
    """
    RETURNS text AS $$
    DECLARE
        r NUMERIC;
    BEGIN
        r = cast(rating as NUMERIC);
        IF r < 0 OR r > 5 THEN
            RETURN '0';
        END IF;
        r = round(r*2) / 2;
        RETURN cast(r as VARCHAR);
    EXCEPTION WHEN invalid_text_representation THEN
        RETURN '0';
    END;
    $$
    STRICT
    LANGUAGE plpgsql IMMUTABLE;
    """)

We created a simple class ReplaceableObject so that the name of the function is, sort of, a separate attribute from its definition, as required by the migration library alembic. Once the stored procedure was defined as above, the upgrade() function of the migration file was modified accordingly:

def upgrade():
    op.create_or_replace_sp(update_rating_func)
    op.execute("UPDATE feedback SET rating=update_rating(rating)")
    op.execute("DROP FUNCTION update_rating(text)")

    op.alter_column('feedback', 'rating',
        existing_type=sa.VARCHAR(),
        type_=sa.Float(),
        existing_nullable=False,
        postgresql_using='rating::double precision')

Notice how the UPDATE query calls our stored procedure update_rating() to make existing ratings of the form we desire. Once that is done, the procedure is no longer needed, so it is DROP-ed and then, finally, the type is changed to Float. So indeed, procedural languages are used for real!

Also, what’s more exciting is that one can write Postgres stored procedures in Python itself! Python is among the four procedural languages supported by PostgreSQL, the other two being Perl and Tcl. Here’s a simple Python example of the grading code we wrote earlier:

CREATE OR REPLACE FUNCTION grade(rating NUMERIC) RETURNS TEXT AS
$$
    if rating < 1:
        return 'D'
    elif rating < 2:
        return 'C'
    elif rating < 3:
        return 'B'
    elif rating < 4:
        return 'A'
    else
        return 'A+'
$$ STRICT LANGUAGE plpython3u IMMUTABLE;

This should work identical to the PL/pgSQL equivalent code above. But don’t forget to change the command for installing the procedural language, for Python it looks like:

$ createlang plpython3u yourdb

Needless to say, the 3 here stands for Python3.

Even with all this flexibility and power, one must note that procedural languages are not designed to replace programming languages. Procedural code can easily become difficult to maintain, and one often needs superuser privileges to create stored procedures. Moreover, portability is a mess with them. That is especially true when moving from one database system (like Postgres) to another (like Microsoft SQL Server). Therefore, for optimum application performance and code maintainability, ensure that there is a healthy balance between your business logic residing in stored procedures and the application layer.

This post covered a lot of information about code residing in the database and its use cases. If you want to know more about best practices involving separation of business logic, please check out this thread on StackExchange’s software engineering website.

Resources

Enforcing Constraints Throughout a Flask Back-End

Post author:schedutron
Post published:August 21, 2018
Post category:API Event FOSSASIA GSoC Open Event
Post comments:0 Comments

Recently it was discovered that Open Event Server does not validate attendees’ tickets. Specifically, it was possible to create an arbitrary number of attendees who’d be attending an event on the same ticket! To fix this, a constraint had to be set up across different layers of Open Event Server, which is based on Flask and Postgres. This post will demonstrate how the constraint was added in the server, and these steps should apply in general to any Flask-based server with a relational back-end.

First of all, the immediate idea that comes after investigating such an issue, is to add a UNIQUE constraint to the database. For this specific case, the problem was in ticket_holders table of the Open Event database. There was originally no check imposed on the ticket_id and event_id columns.

As can be seen in the ticket_holders schema (using the \d+ ticket_holders command), there is no mention of uniqueness on either column. The initial guess was that the combination of ticket_id and event_id should be unique throughout the table to avoid multiple holders attending on the same ticket. However,imposing uniqueness on just the ticket_id column would’ve also worked. So, to be on the safer side, I moved ahead by adding uniqueness on both the columns.

To fix this, we need to make changes to the ticket_holder model. So, in the ticket_holder model file, we add a __table_args__ attribute to the TicketHolder class. This attribute represents the various constraints imposed on the ticket_holders table:

class TicketHolder(db.Model):
    __tablename__ = "ticket_holders"
    __table_args__ = (
db.UniqueConstraint('ticket_id', 'event_id', name='ticket_event'),
) # this is the constraint we add

    id = db.Column(db.Integer, primary_key=True)
    firstname = db.Column(db.String, nullable=False)
    lastname = db.Column(db.String, nullable=False)

…

…

…

The TicketHolder class has attributes named ticket_id and event_id, so to add a unique constraint over them, we pass their names to the UniqueConstraint constructor. Also, any suitable name can be given to the constraint, I chose ‘ticket_event’ to simply emphasize the relationship. Now that we’ve edited the database model file, we have to perform a database migration.

Before we command the migration, we have to remove the entries that potentially violate the constraint we just imposed. As a temporary fix, I connected to the database and deleted all non-unique rows via plain SQL. For a more consistent fix, I will implement this simple deletion code in the database migration file, if need be. So, once the non-unique rows are gone, we perform the database migration as follows:

$ python manage.py db migrate

And then,

$ python manage.py db upgrade

These commands may be different for different projects, but their purpose is the same – to update the database. The upgrade command generates a migration file which looks as follows:

from alembic import op
import sqlalchemy as sa
import sqlalchemy_utils


# revision identifiers, used by Alembic.
revision = '9d21de792967'
down_revision = '194a5a2a44ef'


def upgrade():
op.create_unique_constraint('ticket_event', 'ticket_holders', ['ticket_id', 'event_id'])

def downgrade():
op.drop_constraint('ticket_event', 'ticket_holders', type_='unique')

We can see that the upgrade() function has the command for adding our constraint. Once the database has been upgraded, we can revisit the schema of ticket_holders table (using the \d+ ticket_holders command again). Now we can see that our constraint is added very well in the table schema.

Now, if one tries to create multiple attendees that attend on the same ticket, s/he gets a 500 server error. Here are the related server logs:

2018-06-05 22:04:03.824 IST [46705] ERROR:  duplicate key value violates unique constraint "ticket_event"
2018-06-05 22:04:03.824 IST [46705] DETAIL:  Key (ticket_id, event_id)=(2, 6) already exists.
2018-06-05 22:04:03.824 IST [46705] STATEMENT:  UPDATE ticket_holders SET event_id=6 WHERE ticket_holders.id = 16
127.0.0.1 - - [05/Jun/2018 22:04:03] "POST /v1/attendees HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [05/Jun/2018 22:04:03] "POST /v1/attendees HTTP/1.1" 500 -

To get a more graceful error, we also need to make changes in the API schema. This will also allow to validate the data before it gets to the database. So, in the attendees.py file, we need to add a check. This check should extract the ticket and event ids from the data posted and see whether there is already an attendee in the database attending that event on the same ticket. If such an attendee is discovered, the check should raise an error and report it back to the API caller. The suitable place for this check is the before_post() method of the AttendeeListPost class. In any Flask app serving a REST API, such a method (perhaps of a different name) should exist in the API file corresponding to a model. Our check looks like the following within the before_post() method:

from flask_rest_jsonapi import ResourceList
from app.api.helpers.exceptions import ConflictException
from app.models import db
from app.models.ticket_holder import TicketHolder

…

…

…
class AttendeeListPost(ResourceList):
"""
List and create Attendees through direct URL
"""

def before_post(self, args, kwargs, data):
"""
Before post method to check for required relationship and proper permissions
:param args:
:param kwargs:
:param data:
:return:
"""
require_relationship(['ticket', 'event'], data)

…

…

…

if db.session.query(TicketHolder.id).filter_by(
ticket_id=int(data['ticket']), event_id=int(data['event'])
).scalar() is not None:
raise ConflictException(
{'pointer': '/data/attributes/ticket_id'},
"Attendee with this ticket already exists for the same event"
)

Once this check is implemented, we’re all good to go. Now, if an attendee is created that maps to a ticket belonging to an already existing attendee, the following error is sent back to the API caller:

{
"errors": [
{
"status": 409,
"source": {
"pointer": "/data/attributes/ticket_id"
},
"title": "Conflict",
"detail": "Attendee with this ticket already exists for the same event"
}
],
"jsonapi": {
"version": "1.0"
}
}

This completes our work of enforcing this constraint throughout our Flask server. This leads to a more consistent database and potentially avoids confusion at actual events!

Resources:

Open Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

Post author:shubham-padia
Post published:November 1, 2017
Post category:FOSSASIA Open Event
Post comments:0 Comments

The Elasticsearch instance in the current Open Event Server deployment is currently just used to store the events and search through it due to limited resources.

The project uses a PostgreSQL database, this blog will focus on setting up a job to create the events index if it does not exist. If the indices exists, the job will delete all the previous the data and rebuild the events index.

Although the project uses Flask framework, the job will be in pure python so that it can run in background properly while the application continues its work. Celery is used for queueing up the aforementioned jobs. For building the job the first step would be to connect to our database:

from config import Config
import psycopg2
conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
cur = conn.cursor()

The next step would be to fetch all the events from the database. We will only be indexing certain attributes of the event which will be useful in search. Rest of them are not stored in the index. The code given below will fetch us a collection of tuples containing the attributes mentioned in the code:

cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()

We will be using the the bulk API, which is significantly fast as compared to adding an event one by one via the API. Elasticsearch-py, the official python client for elasticsearch provides the necessary functionality to work with the bulk API of elasticsearch. The helpers present in the client enable us to use generator expressions to insert the data via the bulk API. The generator expression for events will be as follows:

event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)

We will now delete the events index if it exists. The the event index will be recreated. The generator expression obtained above will be passed to the bulk API helper and the event index will repopulated. The complete code for the function will now be as follows:

@celery.task(name='rebuild.events.elasticsearch')
def cron_rebuild_events_elasticsearch():
   """
   Re-inserts all eligible events into elasticsearch
   :return:
   """
   conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
   cur = conn.cursor()
   cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()
   event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)
   es_store.indices.delete('events')
   es_store.indices.create('events')
   abc = helpers.bulk(es_store, event_data)

Currently we run this job on each week and also on each new deployment. Rebuilding the index is very important as some records may not be indexed when the continuous sync is taking place.

To know more about it please visit https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/