Open Event Server – Change a Column from NULL to NOT NULL

FOSSASIA's Open Event Server uses Alembic migration files to handle all database schema changes. Whenever the database is changed, a corresponding Python migration script is created so that the database migrates accordingly for other developers as well. But we often forget that the automatically generated script usually just adds or deletes columns or alters column properties. It does not handle the migration of existing data in that column, which can lead to data loss or a failed migration. For example:

    def upgrade():
        # ### commands auto generated by Alembic - please adjust! ###
        op.alter_column('ticket_holders', 'lastname',
                        existing_type=sa.VARCHAR(),
                        nullable=False)
        # ### end Alembic commands ###

Here, the goal was to change the column “lastname” of the table “ticket_holders” from nullable to not nullable. The script that Alembic autogenerated just uses op.alter_column(). It does not account for the already existing data. So, if the column has any entries which are null, this migration fails with an error saying that the column contains null values and hence cannot be made “NOT NULL”.

How to Handle This?

Before altering the column definition, we can follow these steps:

1. Look for all the null entries in the column.
2. Give those entries some arbitrary default value.
3. Safely alter the column definition.

Let's see how we can achieve this. To connect to the database we will use SQLAlchemy. First, we get a reference to the table and the column that we wish to alter:

    ticket_holders_table = sa.sql.table('ticket_holders',
                                        sa.Column('lastname', sa.VARCHAR()))

Since we need the “lastname” column from the table “ticket_holders”, we specify it in the method arguments. Now we give an arbitrary default value to all the entries in the column that were originally null. In this case, I chose to use a space character.
    op.execute(ticket_holders_table.update()
               .where(ticket_holders_table.c.lastname.is_(None))
               .values({'lastname': op.inline_literal(' ')}))

op.execute() can also run raw SQL strings, but we chose SQLAlchemy's expression language, which builds the SQL statement for us from composable Python expressions. An example of a complex SQL command executed directly is:

    op.execute('INSERT INTO event_types(name, slug) SELECT DISTINCT event_type_id, lower(replace(regexp_replace(event_type_id, \'& |,\', \'\', \'g\'), \' \', \'-\')) FROM events where not exists (SELECT 1 FROM event_types where event_types.name=events.event_type_id) and event_type_id is not null;')

Now that we have handled all the null data, it is safe to alter the column definition. So we proceed to execute the final statement:

    op.alter_column('ticket_holders', 'lastname',
                    existing_type=sa.VARCHAR(),
                    nullable=False)

Now the entire migration script runs without any error. The final outcome is:

- All the null “lastname” entries are replaced by a space character.
- The “lastname” column is now a NOT NULL column.

References: Alembic, SQLAlchemy, Open Event Server
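To try the backfill-then-constrain order end to end without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in, not the Alembic migration itself: SQLite cannot ALTER a column to NOT NULL, so the sketch rebuilds the table with the constraint (SQLite's usual workaround) instead of calling op.alter_column(). The table and column names mirror the post; the helper make_lastname_not_null is hypothetical.

```python
import sqlite3


def make_lastname_not_null(conn, placeholder=" "):
    """Backfill NULL lastnames, then enforce NOT NULL (hypothetical helper)."""
    cur = conn.cursor()
    # Steps 1 and 2: give every NULL entry an arbitrary default value.
    cur.execute(
        "UPDATE ticket_holders SET lastname = ? WHERE lastname IS NULL",
        (placeholder,),
    )
    # Step 3: SQLite has no ALTER COLUMN ... SET NOT NULL, so rebuild the
    # table with the constraint and copy the (now NULL-free) rows across.
    cur.execute(
        "CREATE TABLE ticket_holders_new ("
        "id INTEGER PRIMARY KEY, lastname VARCHAR NOT NULL)"
    )
    cur.execute(
        "INSERT INTO ticket_holders_new (id, lastname) "
        "SELECT id, lastname FROM ticket_holders"
    )
    cur.execute("DROP TABLE ticket_holders")
    cur.execute("ALTER TABLE ticket_holders_new RENAME TO ticket_holders")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_holders (id INTEGER PRIMARY KEY, lastname VARCHAR)")
conn.executemany(
    "INSERT INTO ticket_holders (id, lastname) VALUES (?, ?)",
    [(1, "Doe"), (2, None), (3, None)],
)
make_lastname_not_null(conn)
print(conn.execute("SELECT id, lastname FROM ticket_holders ORDER BY id").fetchall())
# [(1, 'Doe'), (2, ' '), (3, ' ')]
```

If the UPDATE is skipped, the INSERT ... SELECT fails with sqlite3.IntegrityError, which is the SQLite analogue of the failing autogenerated migration.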


Implementing Database Migrations to Badgeyay

The Badgeyay project is divided into two parts: an Ember.js front end and a back end with a REST API written in Python. We have integrated PostgreSQL as the object-relational database in Badgeyay, and we use the SQLAlchemy SQL Toolkit and Object Relational Mapper to work with the database from Python. Since we use the Flask microframework, we use Flask-SQLAlchemy, an extension for Flask that adds support for SQLAlchemy. One of the challenging jobs is to manage the changes we make to the models and propagate these changes to the database. For this purpose, I added migrations to Flask-SQLAlchemy for handling database changes using the Flask-Migrate extension. In this blog, I will discuss how I did this in my Pull Request. First, let's understand database models, migrations, and the Flask-Migrate extension. Then we will move on to adding migrations using Flask-Migrate. Let's get started and understand it step by step.

What are Database Models?

A database model defines the logical design and structure of a database, including the relationships and constraints that determine how data can be stored and accessed. Presently, we have User and File models in the project.

What are Migrations?

Database migration is a process which usually includes assessment and database schema conversion. Migrations let us track the modifications we make to the models and propagate these changes to the database. For example, if we later change a field in one of the models, all we need to do is generate and apply a migration, and the database will reflect the change.

What is Flask-Migrate?

Flask-Migrate is an extension that handles SQLAlchemy database migrations for Flask applications using Alembic.
The database operations are made available through the Flask command-line interface or through the Flask-Script extension. Now let's add support for migrations in Badgeyay.

Step 1: Install Flask-Migrate.

    pip install flask-migrate

Step 2: Edit run.py so that it looks like this:

    import os
    from flask import Flask
    from flask_migrate import Migrate  # Import Flask-Migrate
    from api.db import db
    from api.config import config
    ......
    db.init_app(app)
    migrate = Migrate(app, db)  # Registers the `flask db` migration commands
    ......
    @app.before_first_request
    def create_tables():
        db.create_all()

    if __name__ == '__main__':
        app.run()

Step 3: Create the migration directory.

    export FLASK_APP=run.py
    flask db init

This creates the migrations directory in the backend API folder:

    └── migrations
        ├── README
        ├── alembic.ini
        ├── env.py
        ├── script.py.mako
        └── versions

Step 4: Generate the first migration with the following command:

    flask db migrate

Step 5: Apply the migration with the following command:

    flask db upgrade

Now we are done setting up migrations in Flask-SQLAlchemy for handling database changes in the badgeyay repository. We can verify the migration by checking the tables in the database. This is how I have added…
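Under the hood, Flask-Migrate delegates the bookkeeping to Alembic, which records the current revision in an alembic_version table and applies the scripts under migrations/versions in order. The sketch below is a toy, stdlib-only illustration of that idea using sqlite3. It is not Flask-Migrate's or Alembic's API; the MIGRATIONS list, the schema_version table and the upgrade function are hypothetical, loosely modelled on the User and File models mentioned above.

```python
import sqlite3

# Each migration is (revision, list of SQL statements). Hypothetical examples.
MIGRATIONS = [
    (1, ["CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR(80))"]),
    (2, ["CREATE TABLE files (id INTEGER PRIMARY KEY, filename VARCHAR(100))"]),
    (3, ["ALTER TABLE users ADD COLUMN email VARCHAR(120)"]),
]


def current_revision(conn):
    """Return the last applied revision, or 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (revision INTEGER)")
    row = conn.execute("SELECT MAX(revision) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn):
    """Apply every migration newer than the recorded revision, in order."""
    applied = []
    start = current_revision(conn)
    for revision, statements in MIGRATIONS:
        if revision <= start:
            continue  # already applied on this database
        for sql in statements:
            conn.execute(sql)
        conn.execute("INSERT INTO schema_version (revision) VALUES (?)", (revision,))
        applied.append(revision)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
print(upgrade(conn))  # [1, 2, 3]
print(upgrade(conn))  # [] - running it again is a no-op
```

Recording the revision in the database itself is what lets every developer run the same upgrade command safely, whatever state their local database is in.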


Auto Deployment of Badgeyay Backend by Heroku Pipeline

The Badgeyay project is now divided into two parts: an Ember.js front end and a back end with a REST API written in Python. One of the challenges is that the project should support this decoupled architecture. We wanted to integrate the Heroku-deployed API with GitHub so that every Pull Request made to the development branch is deployed automatically, which eases the Pull Request review process. In this blog, I'll discuss how I configured a Heroku Pipeline in Badgeyay, in my Pull Request, to auto-deploy every Pull Request made to the development branch. First, let's understand Heroku Pipelines and their features. Then we will move on to configuring the pipeline to auto-deploy PRs. Let's get started and understand it step by step.

What is a Heroku Pipeline?

A pipeline is a group of Heroku apps that share the same codebase. Each app in a pipeline represents one of the following stages in a continuous delivery workflow:

- Review
- Development
- Staging
- Production

A common Heroku continuous delivery workflow has the following steps:

1. A developer creates a pull request to make a change to the codebase.
2. Heroku automatically creates a review app for the pull request, allowing developers to test the change.
3. When the change is ready, it's merged into the codebase's default branch.
4. The default branch is automatically deployed to staging for further testing.
5. When it's ready, the staging app is promoted to production, where the change is available to end users of the app.

In Badgeyay, I have used the Review App and Development App stages for auto-deployment of Pull Requests.

Prerequisites:

- You should have admin rights to the GitHub repository.
- You should be the owner of the Heroku-deployed app.

To create a Review App, the following files must be in the root of the project repository to trigger the Heroku build.

1. app.json

    {
      "name": "BadgeYay-API",
      "description": "A fully functional REST API for badges generator using flask",
      "repository": "https://github.com/fossasia/badgeyay/backend/",
      "keywords": [
        "badgeyay",
        "fossasia",
        "flask"
      ],
      "buildpacks": [
        {
          "url": "heroku/python"
        }
      ]
    }

2. Procfile

    web: gunicorn --pythonpath backend/app/ main:app

Now I have fulfilled all the prerequisites needed for integrating the GitHub repository with the Heroku-deployed Badgeyay API. Let's move to the Heroku dashboard of the Badgeyay API and implement auto-deployment of every Pull Request.

Step 1: Open the Heroku-deployed app on the dashboard. You will see the following tabs at the top of the dashboard.

Step 2: Click on Deploy, create a new pipeline by giving it a name, and choose a stage for the pipeline.

Step 3: Choose a deployment method. For the Badgeyay project, I integrated GitHub for auto-deployment of PRs. Select the repository and connect to it. You will receive a pop-up confirming that the repository is connected to Heroku.

Step 4: Enable automatic deploys for the GitHub repository.

Step 5: Now after adding the pipeline, present app get…
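Because a review-app build only triggers when app.json and Procfile sit at the repository root, a quick local check before pushing can save a failed build. The sketch below is a hypothetical helper (check_heroku_files is not part of Heroku's tooling). It checks that both files exist, that app.json is valid JSON containing the name and buildpacks keys used in the Badgeyay file, and that each Procfile line matches an assumed approximation of the `<process type>: <command>` format.

```python
import json
import os
import re

# Assumed approximation of a Procfile line: "<process type>: <command>".
PROCFILE_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.+)$")


def check_heroku_files(root):
    """Return a list of problems found in app.json and Procfile under root."""
    problems = []
    app_json = os.path.join(root, "app.json")
    procfile = os.path.join(root, "Procfile")

    if not os.path.isfile(app_json):
        problems.append("app.json is missing")
    else:
        try:
            with open(app_json) as f:
                manifest = json.load(f)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            problems.append("app.json is not valid JSON: %s" % exc)
        else:
            for key in ("name", "buildpacks"):
                if key not in manifest:
                    problems.append("app.json has no %r key" % key)

    if not os.path.isfile(procfile):
        problems.append("Procfile is missing")
    else:
        with open(procfile) as f:
            for number, line in enumerate(f, start=1):
                line = line.strip()
                if line and not PROCFILE_LINE.match(line):
                    problems.append("Procfile line %d is malformed" % number)
    return problems


if __name__ == "__main__":
    for problem in check_heroku_files("."):
        print(problem)
```

Running it from the repository root prints nothing when both files look usable.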


Unit Tests for REST-API in Python Web Application

The Badgeyay back end has now shifted to a REST API, and to test the functions used in the REST API we need a testing tool that exercises each function used in the API. For our purposes, we chose Python's popular unittest framework. In this blog, I'll discuss how I wrote unit tests for the Badgeyay REST API. First, let's understand what unittest is and why we chose it. Then we will move on to writing API tests for Badgeyay. These tests have a generic structure, so the code I mention should work in other REST API testing scenarios, often with little or no modification. Let's get started and understand API testing step by step.

What is unittest?

unittest is a Python unit testing framework which supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework. The unittest module provides classes that make it easy to support these qualities for a set of tests.

Why unittest?

We get two primary benefits from unit testing, with the majority of the value going to the first:

1. It guides your design to be loosely coupled and well fleshed out. If you are doing test-driven development, it limits the code you write to only what is needed and helps you evolve that code in small steps.
2. It provides fast automated regression testing for refactors and small changes to the code.

Unit testing also gives you living documentation about how small pieces of the system work. We should always strive to write comprehensive tests that cover the working code well. Now, here is a glimpse of how I wrote unit tests for the code in the REST API back end of Badgeyay. Using Python's unittest package and the requests module, we can test a REST API with test automation. Below is the code snippet for which I wrote unit tests in one of my pull requests.
    from flask import jsonify

    def output(response_type, message, download_link):
        if download_link == '':
            response = [
                {
                    'type': response_type,
                    'message': message
                }
            ]
        else:
            response = [
                {
                    'type': response_type,
                    'message': message,
                    'download_link': download_link
                }
            ]
        return jsonify({'response': response})

To test this function, I created a mock object which simulates the behavior of real objects in a controlled way. In this case, the mock object simulates the behavior of the output function and returns a JSON response without hitting the real REST API. The next challenge is to parse the JSON response and feed specific values from it to the Python automation script. Python reads JSON as a dictionary object, which greatly simplifies parsing and using it. Here's the content of the backend/tests/test_basic.py file:

    #!/usr/bin/env python3
    """Tests for Basic Functions"""
    import sys
    import json
    import unittest

    sys.path.append("../..")
    from app.main import *


    class TestFunctions(unittest.TestCase):
        """Test case for the client methods."""

        def setUp(self):
            # unittest only calls the camel-case setUp() before each test
            app.app.config['TESTING'] = True
            self.app = app.app.test_client()

        # Test of Output function
        def test_output(self):…
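Since test_output is cut off above, here is a self-contained sketch of how both branches of output() could be covered with unittest alone. To keep it runnable without Flask, build_response is a hypothetical stand-in that returns the dictionary output() passes to jsonify(); with the real app you would call the Flask test client and parse the body with json.loads(response.data) instead.

```python
import unittest


def build_response(response_type, message, download_link):
    """Same branching as output(), minus jsonify(): returns a plain dict."""
    entry = {'type': response_type, 'message': message}
    if download_link != '':
        entry['download_link'] = download_link
    return {'response': [entry]}


class TestOutput(unittest.TestCase):
    """One test per branch of output()."""

    def test_without_download_link(self):
        result = build_response('success', 'Badges generated', '')
        self.assertEqual(result, {'response': [
            {'type': 'success', 'message': 'Badges generated'}]})
        self.assertNotIn('download_link', result['response'][0])

    def test_with_download_link(self):
        result = build_response('success', 'Badges generated', '/static/badges.zip')
        self.assertEqual(result['response'][0]['download_link'], '/static/badges.zip')
```

Saved in a test module, this runs with `python -m unittest` like any other unittest suite.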


Parallelizing Builds In Travis CI

The Badgeyay project is now divided into two parts: an Ember.js front end and a back end with a REST API written in Python. One of the challenges is that the project should support this decoupled architecture. It should therefore run the tests for the front end and the back end, which are written in two different languages, on isolated instances by making use of parallel builds. In this blog, I'll discuss how I configured Travis CI to run the tests in isolated parallel builds in Badgeyay in my Pull Request. First, let's understand what a parallel Travis CI build is and why we need it. Then we will move on to configuring the .travis.yml file to run tests in parallel. Let's get started and understand it step by step.

Why Parallel Travis CI Builds?

Integration test suites tend to test more complex situations through the whole stack, which includes both the front end and the back end. They also tend to be the slowest part, requiring several minutes to run, sometimes even up to 30 minutes. To speed up a test suite like that, we can split it into a few sections using the Travis build matrix feature. Travis determines the build matrix (for example from environment variables or explicit matrix entries) and schedules the jobs to run; here there are two. Our objective is now clear: configure .travis.yml to build in parallel. Our project needs two language environments, Python and Node.js, and running the build jobs for both of them in parallel speeds things up considerably. It is possible to run several languages in one .travis.yml file using the matrix: include feature. Below is the code snippet of the .travis.yml file for the Badgeyay project that runs the build jobs in parallel.

    sudo: required
    dist: trusty

    # Check different combinations of build flags, which divides the build into "jobs".
    matrix:
      # Helps to run different languages in one .travis.yml file
      include:
        # First job in Python.
        - language: python
          addons:
            apt:
              packages:
                - python-dev
          python:
            - 3.5
          cache:
            directories:
              - $HOME/backend/.pip-cache/
          before_install:
            - sudo apt-get -qq update
            - sudo apt-get -y install python3-pip
            - sudo apt-get install python-virtualenv
          install:
            - virtualenv -p python3 ../flask_env
            - source ../flask_env/bin/activate
            # --cache-dir needs a path; point it at the cached directory above
            - pip3 install -r backend/requirements/test.txt --cache-dir $HOME/backend/.pip-cache/
          before_script:
            - export DISPLAY=:99.0
            - sh -e /etc/init.d/xvfb start
            - sleep 3
          script:
            - python backend/app/main.py >> log.txt 2>&1 &
            - python backend/app/main.py > /dev/null &
            - py.test --cov ../ ./backend/app/tests/test_api.py
          after_success:
            - bash <(curl -s https://codecov.io/bash)

        # Second job in Node.js.
        - language: node_js
          node_js:
            - "6"
          addons:
            chrome: stable
          cache:
            directories:
              - $HOME/frontend/.npm
          env:
            global:
              # See https://git.io/vdao3 for details.
              - JOBS=1
          before_install:
            - cd frontend
            - npm install
            - npm install -g ember-cli
            - npm i eslint-plugin-ember@latest --save-dev
            - npm config set spin false
          script:
            - npm run lint:js
            - npm test

Once .travis.yml was added and pushed to the project repository, the parallel build jobs ran and Travis CI passed. The related PR for this work is https://github.com/fossasia/badgeyay/pull/512

Resources: Travis CI documentation -…
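Travis runs each entry under matrix: include as a separate job on its own worker, and the build passes only if every job passes. The sketch below imitates that behaviour locally with the standard library so the two suites can be tried side by side. run_jobs and the placeholder commands are hypothetical; the real jobs would be the backend and frontend script: steps shown above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs):
    """Run each job's command in parallel; return (all_passed, exit codes)."""
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        # Each command runs in its own process, like an isolated Travis job.
        futures = {name: pool.submit(subprocess.call, cmd) for name, cmd in jobs.items()}
        codes = {name: future.result() for name, future in futures.items()}
    # Like a Travis build, the whole run passes only if every job exits with 0.
    return all(code == 0 for code in codes.values()), codes


jobs = {
    # Placeholders standing in for "py.test ..." and "npm test".
    "backend": [sys.executable, "-c", "print('backend tests')"],
    "frontend": [sys.executable, "-c", "print('frontend tests')"],
}
passed, codes = run_jobs(jobs)
print(passed, codes)
```

The wall-clock time of such a run is roughly that of the slowest job rather than the sum of both, which is the speed-up the build matrix is after.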


Deploying BadgeYaY with Docker on Docker Cloud

We already had a Dockerfile in the repository, but many of its lines had problems. I studied Docker and learned how applications are deployed with it, and I am now going to explain how I deployed BadgeYaY on Docker Cloud. To make deploying Badgeyay easier, we now support a Docker-based installation. Before we start to deploy, let's have a quick overview of what Docker is and how it works.

What is Docker?

Docker is an open-source technology that allows you to create, deploy, and run applications using containers. Docker lets you deploy technologies with many underlying components that must be installed and configured in a single, containerized instance. Docker makes it easier to create and deploy applications in an isolated environment. Now, let's start with how to deploy on Docker Cloud.

Step 1 - Installing Docker

Get the latest version of Docker. See the official site for installation information for your platform.

Step 2 - Create Dockerfile

With Docker, we can just grab a portable Python runtime as an image, with no installation necessary. Our build can then include the base Python image right alongside our app code, ensuring that our app, its dependencies, and the runtime all travel together. These portable images are defined by something called a Dockerfile. A Dockerfile contains all the commands a user could call on the command line to assemble an image. Here is the Dockerfile of BadgeYaY:

    # The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions.
    FROM python:3.6
    # We copy just the requirements.txt first to leverage Docker cache
    COPY ./app/requirements.txt /app/
    # The WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile.
    WORKDIR /app
    # The RUN instruction will execute any commands in a new layer on top of the current image and commit the results.
    RUN pip install -r requirements.txt
    # The COPY instruction copies new files.
    COPY . /app
    # An ENTRYPOINT allows you to configure a container that will run as an executable.
    ENTRYPOINT [ "python" ]
    # The main purpose of a CMD is to provide defaults for an executing container.
    CMD [ "main.py" ]

Step 3 - Build New Docker Image

    sudo docker build -t badgeyay:latest .

When the command completes successfully, we can check the new image with the command below:

    sudo docker images

Step 4 - Run the app

Let's run the app in the background, in detached mode:

    sudo docker run -d -p 5000:5000 badgeyay

Docker prints the long container ID for our app and returns us to the terminal, while the container keeps running in the background. To end the process, use docker container stop with the CONTAINER ID, like so:

    docker container stop 1fa4ab2cf395

Step 5 - Publish the app.

Log in to the Docker public registry on your local machine.

    docker login

Upload your tagged image to the repository:…
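Because docker run -d returns as soon as the container starts, the app inside may not be accepting connections yet. The stdlib sketch below polls the published port before you call the API. wait_for_port is a hypothetical helper, and using localhost:5000 assumes the -p 5000:5000 mapping from Step 4.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Usage after `sudo docker run -d -p 5000:5000 badgeyay`:
# if not wait_for_port("localhost", 5000):
#     raise SystemExit("BadgeYaY container did not come up on port 5000")
```

A TCP check only proves something is listening; a follow-up HTTP request to a known endpoint is a stronger signal that the app itself is ready.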


Setting up Codecov in Badgeyay

BadgeYaY already has Travis CI and Codacy to test code quality and Pull Requests, but there was no support for measuring code coverage against every Pull Request. So I decided to set up Codecov to track code coverage. In this blog post, I'll discuss how I set up Codecov in BadgeYaY in my Pull Request. First, let's understand what Codecov is and why we need it. For that, we first have to understand what code coverage is; then we will move on to adding Codecov with the help of Travis CI. Let's get started and understand it step by step.

What is Code Coverage?

Code coverage is a measurement used to express which lines of code were executed by a test suite. We use three primary terms to describe each line:

- hit indicates that the source code was executed by the test suite.
- partial indicates that the source code was not fully executed by the test suite; there are remaining branches that were not executed.
- miss indicates that the source code was not executed by the test suite.

Coverage is the ratio hits / (hits + partials + misses). A code base that has 5 lines executed by tests out of 12 total lines receives a coverage ratio of 41% (5/12 ≈ 0.417, rounded down to a whole percent). In BadgeYaY, code coverage is 100%.

How does Codecov help with Code Coverage?

Codecov focuses on integration and promoting healthy pull requests. Codecov delivers (or "injects") coverage metrics directly into the modern workflow to promote more code coverage, especially in pull requests where new features and bug fixes commonly occur. Here are the top 5 Codecov features:

1. Browser Extension
2. Pull Request Comments
3. Commit Status
4. Merging Reports
5. Flags, e.g. #unittests vs #functional

We can change the configuration of how Codecov processes reports and expresses coverage information. Let's see how we configure it for BadgeYaY by integrating it with Travis CI. Codecov works well with Travis CI.
With the one line

    bash <(curl -s https://codecov.io/bash)

the code coverage can easily be reported. Add a test command that collects coverage:

    nosetests app/tests/test.py -v --with-coverage

Here is the relevant part of .travis.yml from the project repository of BadgeYaY:

    script:
      - python app/main.py >> log.txt 2>&1 &
      - nosetests app/tests/test.py -v --with-coverage
      # pass a path: with no arguments pyflakes reads from stdin
      - python3 -m pyflakes app
    after_success:
      - bash <(curl -s https://codecov.io/bash)

Let's have a look at codecov.yml to check the exact configuration that I have used for BadgeYaY.

    codecov:
      # yes: will delay sending notifications until all ci is finished
      notify:
        require_ci_to_pass: yes

    coverage:
      # how many decimal places to display in the UI: 0 <= value <= 4
      precision: 2
      # how coverage is rounded: down/up/nearest
      round: down
      # custom range of coverage colors from red -> yellow -> green
      range: "70...100"

      status:
        # measuring the overall project coverage
        project: yes
        # pull requests only: this commit status will measure the entire pull requests Coverage Diff. Checking…
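The coverage ratio defined earlier and the precision/round settings in codecov.yml can be checked by hand. This sketch is our own arithmetic, not Codecov's code: it counts partials as not covered, as in the ratio, and truncates to a given number of decimals to mimic round: down with precision: 2.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits, partials, misses, precision=2):
    """hits / (hits + partials + misses) as a percentage, rounded down."""
    total = hits + partials + misses
    if total == 0:
        return Decimal(0)
    ratio = Decimal(hits) * 100 / Decimal(total)
    step = Decimal(1).scaleb(-precision)  # e.g. Decimal('0.01') for precision=2
    return ratio.quantize(step, rounding=ROUND_DOWN)


print(coverage_percent(5, 0, 7))               # 41.66
print(coverage_percent(5, 0, 7, precision=0))  # 41
```

Decimal is used instead of float so that rounding down never trips over binary representation error at the last displayed digit.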


Resolving Internal Error on Badgeyay

Badgeyay is in the development stage and frequently encounters bugs. One such bug was the Internal Server Error in Badgeyay.

What was the bug?

The bug was in the badge generator's back-end code. The generator was trying to serve a zip file that was not present. After going through the log, I noticed that this was because a folder was missing from Badgeyay's directory. I immediately filed issue #58, which described the bug and how it could be resolved. After being assigned to the issue, I did my work and created a Pull Request that was merged soon after. The Pull Request can be found here.

Resolving the bug

With the help of careful error handling and code and log analysis, I was able to figure out a fix for this bug. It was in fact due to a missing folder that was deleted by subsequent code during zip file/PDF generation. It was supposed to be recreated every time it was deleted. I quickly designed a function that fixes this error for future usage of Badgeyay.

How was it resolved?

First, I checked whether "BADGES_FOLDER" was present. If it was not present, the folder was created using the commands below:

    import os

    if not os.path.exists(BADGES_FOLDER):
        os.mkdir(BADGES_FOLDER)

Then I added the remaining part of the code, documented with a docstring, which empties all the files and folders inside "BADGES_FOLDER". There are two kinds of entries we may have to delete, a folder or a file, so separate instructions handle file deletion and folder deletion:

    import shutil
    import traceback

    for file in os.listdir(BADGES_FOLDER):
        file_path = os.path.join(BADGES_FOLDER, file)
        try:
            if os.path.isfile(file_path):
                os.unlink(file_path)      # delete a single file
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)  # delete a folder and everything in it
        except Exception:
            traceback.print_exc()

Here "os.unlink" is a function that deletes a file, and "shutil.rmtree" is a function that deletes a whole folder at once. It is similar to running "rm -rf" on a directory.
Proper error handling is done as well to ensure the stability of the program.

Challenges

There were many problems that I had to face while fixing this bug. It was my first time solving a bug, so I was nervous. I had no knowledge of the "shutil" library, and I was a newcomer. But I took these problems as challenges and was able to fix the bug that caused the INTERNAL SERVER ERROR: 500.

Resources

- BadgeYay Repository: https://github.com/fossasia/badgeyay
- Pull Request for the same: https://github.com/fossasia/badgeyay/pull/59
- Issue for the same: https://github.com/fossasia/badgeyay/issues/58
- Learn about the os module: https://docs.python.org/2/library/os.html
- Learn about the shutil module: https://docs.python.org/2/library/shutil.html
- Read about error handling: https://docs.python.org/3/tutorial/errors.html
- Learn how to delete a folder and file in Python: https://stackoverflow.com/questions/6996603/how-to-delete-a-file-or-folder
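The folder-creation check and the cleanup loop from the fix can be combined into one reusable function. This is a sketch rather than the code merged in the PR: reset_folder is a hypothetical name, and it uses os.makedirs(..., exist_ok=True) (Python 3.2+) so the folder is recreated even if its parent folders are missing too.

```python
import os
import shutil
import traceback


def reset_folder(folder):
    """Make sure `folder` exists and is empty, recreating it if it was deleted."""
    os.makedirs(folder, exist_ok=True)
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        try:
            if os.path.isfile(path) or os.path.islink(path):
                os.unlink(path)          # remove a file (or a symlink)
            elif os.path.isdir(path):
                shutil.rmtree(path)      # remove a sub-folder and its contents
        except OSError:
            # Log and continue so one stubborn entry doesn't cause a 500 error.
            traceback.print_exc()
```

Calling reset_folder(BADGES_FOLDER) before each generation run keeps the later zip/PDF step from looking for a folder that no longer exists.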


UI automated testing using Selenium in Badgeyay

With all the major functionalities packed into the Badgeyay web application, it was time to add automated testing to speed up the review process for known errors and to check that code contributions do not break anything. We decided to go with Selenium for our testing requirements.

What is Selenium?

Selenium is a portable software-testing framework for web applications. Selenium provides a playback (formerly also recording) tool for authoring tests without the need to learn a test scripting language. In other words, Selenium does browser automation: Selenium tells a browser to click some element, populate and submit a form, navigate to a page, and perform any other form of user interaction. Selenium supports multiple languages including C#, Groovy, Java, Perl, PHP, Python, Ruby and Scala. Here, we are going to use Python (specifically Python 2.7).

First things first:

To install these packages, run the following on the command line:

    pip install selenium==2.40
    pip install nose

Don't forget to add them to the requirements.txt file.

Web browser: Firefox must also be installed on your machine.

Writing the Test

An automated test automates what you'd do via manual testing, but it is done by the computer. This frees up time, allows you to do other things, and lets you repeat your testing. The test code runs a series of instructions to interact with a web browser, mimicking how an actual end user would interact with the application. The script navigates the browser, clicks a button, enters some text input, clicks a radio button, selects a drop-down, drags and drops, etc. In short, the code tests the functionality of the web application.
A test for the web page title:

    import unittest
    from selenium import webdriver

    class SampleTest(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            cls.driver = webdriver.Firefox()
            cls.driver.get('http://badgeyay-dev.herokuapp.com/')

        def test_title(self):
            self.assertEqual(self.driver.title, 'Badgeyay')

        @classmethod
        def tearDownClass(cls):
            cls.driver.quit()

Run the test using:

    nosetests test.py

Clicking the element

For our next test, we click the menu button and check whether the menu becomes visible.

    elem = self.driver.find_element_by_css_selector(".custom-menu-content")
    self.driver.find_element_by_css_selector(".glyphicon-th").click()
    self.assertTrue(elem.is_displayed())

Uploading a CSV file

For our next test, we upload a CSV file and check whether a success message pops up.

    # Needs "import os" and "import time" at the top of the test module.
    def test_upload(self):
        Imagepath = os.path.abspath(os.path.join(os.getcwd(), 'badges/badge_1.png'))
        CSVpath = os.path.abspath(os.path.join(os.getcwd(), 'sample/vip.png.csv'))
        self.driver.find_element_by_name("file").send_keys(CSVpath)
        self.driver.find_element_by_name("image").send_keys(Imagepath)
        self.driver.find_element_by_css_selector("form .btn-primary").click()
        time.sleep(3)
        success = self.driver.find_element_by_css_selector(".flash-success")
        self.assertIn(u'Your badges has been successfully generated!', success.text)

The entire code can be found at: https://github.com/fossasia/badgeyay/tree/development/app/tests

We can also use PhantomJS along with Selenium for UI testing without opening a visible browser window. We use this in Badgeyay to run the tests for every commit on Travis CI, where no browser window can be opened.
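The upload test waits a fixed three seconds with time.sleep(3), which is either too long or too short depending on the server. Selenium's own answer is WebDriverWait with expected conditions; the stdlib sketch below shows the same polling idea so it can be tried without a browser. wait_until and WaitTimeout are hypothetical names, not part of Selenium, and the code avoids Python 3-only APIs so it also runs on Python 2.7.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still false after the timeout."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns a truthy value or `timeout` expires.

    Returns the truthy value, or raises WaitTimeout on expiry.
    """
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# In test_upload, instead of time.sleep(3); find_elements_* returns [] until
# the element appears, so the lambda stays falsy until then:
# success = wait_until(
#     lambda: self.driver.find_elements_by_css_selector(".flash-success"))[0]
```

The test then proceeds as soon as the message appears and fails with a clear error if it never does.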
Resources

- Selenium with Python by Baiju Muthukadan: http://selenium-python.readthedocs.io
- Getting started with UI automated tests using (Selenium + Python) by Daniel Anggrianto: https://engineering.aweber.com/getting-started-with-ui-automated-tests-using-selenium-python/
- Selenium Webdriver Python Tutorial For Web Automation by Meenakshi Agarwal: http://www.techbeamers.com/selenium-webdriver-python-tutorial/
- How to Use Selenium with Python by Guru99: https://www.guru99.com/selenium-python.html


Open Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

Due to limited resources, the Elasticsearch instance in the current Open Event Server deployment is used only to store and search events. The project uses a PostgreSQL database, and this blog focuses on setting up a job that creates the events index if it does not exist. If the index exists, the job deletes all the previous data and rebuilds the events index. Although the project uses the Flask framework, the job is written in pure Python so that it can run properly in the background while the application continues its work. Celery is used to queue up these jobs. The first step in building the job is to connect to our database:

    from config import Config
    import psycopg2

    conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
    cur = conn.cursor()

The next step is to fetch all the events from the database. We only index the attributes of an event that are useful in search; the rest are not stored in the index. The code below fetches a list of tuples containing the attributes named in the query:

    cur.execute(
        "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
    events = cur.fetchall()

We will be using the bulk API, which is significantly faster than adding events one by one via the API. elasticsearch-py, the official Python client for Elasticsearch, provides the necessary functionality to work with the bulk API of Elasticsearch. The helpers in the client let us use generator expressions to insert the data via the bulk API.
The generator expression for events is as follows:

    event_data = ({'_type': 'event',
                   '_index': 'events',
                   '_id': event_[0],
                   'name': event_[1],
                   'description': event_[2] or None,
                   'searchable_location_name': event_[3] or None,
                   'organizer_name': event_[4] or None,
                   'organizer_description': event_[5] or None}
                  for event_ in events)

We now delete the events index if it exists. Then the events index is recreated. The generator expression obtained above is passed to the bulk API helper, and the events index is repopulated. The complete code for the function is as follows:

    from elasticsearch import helpers

    @celery.task(name='rebuild.events.elasticsearch')
    def cron_rebuild_events_elasticsearch():
        """
        Re-inserts all eligible events into elasticsearch
        :return:
        """
        conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
        cur = conn.cursor()
        cur.execute(
            "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
        events = cur.fetchall()
        # Release the database connection once the rows are in memory
        cur.close()
        conn.close()
        event_data = ({'_type': 'event',
                       '_index': 'events',
                       '_id': event_[0],
                       'name': event_[1],
                       'description': event_[2] or None,
                       'searchable_location_name': event_[3] or None,
                       'organizer_name': event_[4] or None,
                       'organizer_description': event_[5] or None}
                      for event_ in events)
        # Only delete the index if it exists; deleting a missing index raises an error
        if es_store.indices.exists(index='events'):
            es_store.indices.delete(index='events')
        es_store.indices.create(index='events')
        helpers.bulk(es_store, event_data)

Currently we run this job every week and also on each new deployment. Rebuilding the index is very important, as some records may not be indexed while the continuous sync is taking place.
To learn more about this, see "Syncing Postgres to Elasticsearch, lessons learned" below.

Related links:

- Syncing Postgres to Elasticsearch, lessons learned: https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/
- Elasticsearch Python Client: https://github.com/elastic/elasticsearch-py
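The mapping from database rows to bulk actions is plain Python, so it can be tested without Elasticsearch or PostgreSQL. The sketch below wraps the generator expression from the post in a function (event_actions and OPTIONAL_FIELDS are hypothetical names) and keeps its behaviour: name is copied as-is, while the other text fields turn empty strings into None. '_type' is kept as in the post, although newer Elasticsearch versions have dropped mapping types.

```python
# Columns after (id, name) in the SELECT, in query order.
OPTIONAL_FIELDS = ('description', 'searchable_location_name',
                   'organizer_name', 'organizer_description')


def event_actions(rows, index='events'):
    """Yield one bulk-index action per (id, name, description, ...) row."""
    for row in rows:
        action = {'_type': 'event', '_index': index,
                  '_id': row[0], 'name': row[1]}
        for field, value in zip(OPTIONAL_FIELDS, row[2:]):
            # `or None` turns empty strings into None, as in the original code.
            action[field] = value or None
        yield action


# Inside the task this would replace the inline generator expression:
# helpers.bulk(es_store, event_actions(events))
```

Because it is a generator, actions are produced lazily as helpers.bulk consumes them, so the list of actions is never built in memory all at once.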
