Getting started with Docker Compose

In this post, I will talk about running multiple containers at once using Docker Compose.

The problem?

Suppose you have a complex app with database containers, Redis and whatnot. How are you going to start the app? One way is to write a shell script that starts the containers one by one.

    docker run -d --name mydb postgres:latest
    docker run -d --name myredis redis:3-alpine
    docker run -d myapp

Now suppose these containers need lots of configuration (links, volumes, ports, environment variables) to function. You will have to write all of those parameters in the shell script.

    docker network create myapp_default
    docker run -d --name db -p 5432:5432 --net myapp_default postgres:latest
    docker run -d --name redis -p 6379:6379 \
      --net myapp_default -v redis:/var/lib/redis/data redis:3-alpine
    docker run -d -p 5000:5000 --net myapp_default -e SOMEVAR=value \
      --link db:db --link redis:redis -v storage:/myapp/static myapp

Won’t it get unmanageable? Wouldn’t it be great if we had a cleaner way to run multiple containers? Here comes docker-compose to the rescue.

Docker Compose

Docker Compose is a Python package that handles multiple containers for an application very elegantly. Its main file is docker-compose.yml, a YAML file with the settings/components required to run your app. Once you define that file, you can just run docker-compose up to start your app with all the components and settings. Pretty cool, right?

So let’s see the docker-compose.yml for the fictional app we considered above.

    version: '2'
    services:
      db:
        image: postgres:latest
        ports:
          - '5432:5432'
      redis:
        image: 'redis:3-alpine'
        command: redis-server
        volumes:
          - 'redis:/var/lib/redis/data'
        ports:
          - '6379:6379'
      web:
        build: .
        environment:
          SOMEVAR: value
        links:
          - db:db
          - redis:redis
        volumes:
          - 'storage:/myapp/static'
        ports:
          - '5000:5000'
    volumes:
      redis:
      storage:

Once this file is in the project’s root directory, you can use docker-compose up to start the application. Compose starts the services in dependency order, so here db and redis, which web links to, are started before web. The Compose file has a lot of options that generally correspond to the parameters that docker run accepts. You can see the full list in the official docker-compose reference.

Conclusion

There is no doubt that docker-compose is a boon when you have to run complex applications. I personally use Compose in every dockerized application that I write. In GSoC 16, I dockerized Open Event. Here is the docker-compose.yml file if you are interested.

PS - If you liked this post, you might find my other posts on Docker interesting. Do take a look and let me know your views.

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/docker-compose-starting.html }}
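To make the correspondence between Compose options and docker run parameters concrete, here is a minimal Python sketch that turns a Compose-style service definition into a docker run argument list. It is only an illustration and not how Compose works internally: docker_run_args is a hypothetical helper, it handles only services that have an image key (so not the web service, which uses build), and it covers just the options used in this post.

```python
def docker_run_args(name, service, network=None):
    """Build a `docker run` argument list from a Compose-style service dict."""
    args = ['docker', 'run', '-d', '--name', name]
    if network:
        args += ['--net', network]
    for port in service.get('ports', []):
        args += ['-p', port]
    for volume in service.get('volumes', []):
        args += ['-v', volume]
    for key, value in sorted(service.get('environment', {}).items()):
        args += ['-e', '{}={}'.format(key, value)]
    for link in service.get('links', []):
        args += ['--link', link]
    # The image comes after all options; anything after the image is
    # treated as the command to run inside the container.
    args.append(service['image'])
    if 'command' in service:
        args += service['command'].split()
    return args


# The redis service from the docker-compose.yml in this post.
redis_service = {
    'image': 'redis:3-alpine',
    'command': 'redis-server',
    'volumes': ['redis:/var/lib/redis/data'],
    'ports': ['6379:6379'],
}
print(' '.join(docker_run_args('redis', redis_service, network='myapp_default')))
```

Each key of the service maps to one flag, which is pretty much why the YAML file reads like a tidier version of the shell script.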


Small Docker images using Alpine Linux

Everyone likes optimization, small file sizes and such. Wouldn’t it be great if you could reduce your Docker image sizes by a factor of 2 or more? Say hello to Alpine Linux. It is a minimal Linux distro weighing just 5 MB. It also has the basic Linux tools and a nice package manager, APK. APK is quite stable and has a considerable number of packages.

    apk add python gcc

In this post, my main aim is to squeeze the best out of Alpine Linux to create the smallest possible Docker image. So let’s start.

Step 1: Use Alpine Linux based images

OK, I know that’s obvious, but just for the sake of completeness I will state it: prefer Alpine-based images wherever possible. Python and Redis have official Alpine-based images, whereas NodeJS has good unofficial Alpine-based images. The same goes for Postgres, Ruby and other popular environments.

Step 2: Install only needed dependencies

Prefer installing select dependencies over installing a package that contains lots of them. For example, prefer installing gcc and the development libraries you need over buildpacks. You can find a listing of Alpine packages on their website.

Pro Tip - A great list of Debian v/s Alpine development packages is on the alpine-buildpack-deps Docker Hub page (scroll down to Packages). It is a very complete list and you will almost always find the dependency you are looking for.

Step 3: Delete build dependencies after use

Build dependencies are required by components/libraries to build native extensions for the platform. Once the build is done, they are not needed. So you should delete the build dependencies after their job is complete. Have a look at the following snippet.

    RUN apk add --virtual build-dependencies gcc python-dev linux-headers musl-dev postgresql-dev \
        && pip install -r requirements.txt \
        && apk del build-dependencies

I am using --virtual to give a label to the packages installed in that step, and once pip install is done, I delete them by that label.

Step 4: Remove cache

Cache can take up lots of unneeded space. So always run apk add with the --no-cache parameter.

    RUN apk add --no-cache package1 package2

If you are using npm for managing project dependencies and bower for managing frontend dependencies, it is recommended to clear their caches too.

    RUN npm cache clean && bower cache clean

Step 5: Learn from the experts

The official images on Docker Hub are open source, meaning that their Dockerfiles are freely available. Since the official images are made as efficient as possible, it’s easy to find great tricks in them on how to achieve optimum performance and compact size. So when viewing an image on Docker Hub, don’t forget to peek into its Dockerfile. It helps more than you can imagine.

Conclusion

That’s all I have for now. I will keep you updated on new tips if I find any. In my personal experience, I found Alpine Linux to be worth using. I tried deploying Open Event Server on Alpine but faced some issues so ended up creating a…


Writing your first Dockerfile

In this tutorial, I will show you how to write your first Dockerfile. I got to learn Docker because I had to implement a Docker deployment for our GSoC project Open Event Server.

First up, what is Docker? Simply put, Docker is an open platform for people to build, ship and run applications anytime and anywhere. Using Docker, your app will be able to run on any platform that supports Docker. And the best part is, it will run in the same way on different platforms, i.e. no cross-platform issues. So you build your app for the platform you are most comfortable with and then deploy it anywhere. This is the fundamental advantage of Docker and why it was created.

So let’s start our dive into Docker. Docker works using a Dockerfile (example), a file which specifies how Docker is supposed to build your application. It contains the steps Docker is supposed to follow to package your app. Once that is done, you can send this packaged app to anyone and they can run it on their system with no problems.

Let’s start with the project structure. You will have to keep the Dockerfile at the root of your project. A basic project will look as follows -

    - app.py
    - Dockerfile
    - requirements.txt
    - some_app_folder/
      - some_file
      - some_file

A Dockerfile starts with a base image, which decides what image your app is built upon. Basically, “images” are nothing but packaged apps. So, for example, if you want to run your application on Ubuntu 14.04, you use ubuntu:14.04 as the base image.

    FROM ubuntu:14.04
    MAINTAINER Your Name <your@email.com>

These are usually the first two lines of a Dockerfile and they specify the base image and the Dockerfile maintainer respectively. You can look into Docker Hub for more base images.

Now that we have started our Dockerfile, it’s time to do something. Now think, if you were trying to run your app on a fresh Ubuntu system, what would be your first step? You would update the package lists.

    RUN apt-get update

You may possibly want to upgrade the packages too.

    RUN apt-get update
    RUN apt-get upgrade -y

Let’s explain what’s happening. RUN is a Dockerfile instruction that runs a command in the shell. Here we are running apt-get update followed by apt-get upgrade -y. There is no need for sudo as Docker already runs commands with root user privileges.

The next thing you will want to do is to put your application inside the container (your Ubuntu environment). The COPY instruction is just for that.

    RUN mkdir -p /myapp
    WORKDIR /myapp
    COPY . .

Until now we were at the root of the Ubuntu instance, i.e. alongside /var, /home, /root etc. You surely don’t want to copy your files there. So we create a ‘myapp’ directory and set it as the WORKDIR (the project’s directory). From now on, all commands will run inside it.

Now that copying the app has been done, you may want…


Dynamically marshalling output in Flask Restplus

Do you use Flask-Restplus? Have you felt the need to dynamically modify API output according to some condition? If yes, then this post is for you. In this post, I will show how to use decorators to restrict GET API output. So let’s start.

This is the basic code to create an API. Here we have created a get_speaker API to get a single item from the Speaker model.

    from flask_restplus import Resource, Model, fields, Namespace
    # aliased so that the Speaker resource class below does not shadow the model
    from models import Speaker as SpeakerModel

    api = Namespace('speakers', description='Speakers', path='/')

    SPEAKER = Model('Name', {
        'id': fields.Integer(),
        'name': fields.String(),
        'phone': fields.String()
    })

    class DAO:
        @staticmethod
        def get(speaker_id):
            return SpeakerModel.query.get(speaker_id)

    @api.route('/speakers/<int:speaker_id>')
    class Speaker(Resource):
        @api.doc('get_speaker')
        @api.marshal_with(SPEAKER)
        def get(self, speaker_id):
            """Fetch a speaker given its id"""
            return DAO.get(speaker_id)

Now our need is to change the returned API data according to some condition. For example, return the phone field of the SPEAKER model only if the user is authenticated. One way to do this is to write conditional statements in the get method that marshal the output according to the situation. But if there are lots of methods which require this, then this is not a good way.

So let’s create a decorator which can change the marshal decorator at runtime. It accepts two parameters: the model to marshal with when the user is authenticated and the model to use otherwise.

    from functools import wraps
    from flask_login import current_user
    from flask_restplus import marshal_with

    def selective_marshal_with(fields, fields_private):
        """
        Selective response marshalling. Doesn't update apidoc.
        """
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # choose the model on every request, not at import time
                if current_user.is_authenticated:
                    model = fields
                else:
                    model = fields_private
                func2 = marshal_with(model)(func)
                return func2(*args, **kwargs)
            return wrapper
        return decorator

The above code adds a wrapper over the API function which checks if the user is authenticated. If the user is authenticated, the fields model is used for marshalling, else fields_private is used.

So let’s create the private model for SPEAKER. We will call it SPEAKER_PRIVATE.

    from flask_restplus import Model, fields

    SPEAKER_PRIVATE = Model('NamePrivate', {
        'id': fields.Integer(),
        'name': fields.String()
    })

The final step is attaching the selective_marshal_with decorator to the get() method.

    @api.route('/speakers/<int:speaker_id>')
    class Speaker(Resource):
        @api.doc('get_speaker', model=SPEAKER)
        @selective_marshal_with(SPEAKER, SPEAKER_PRIVATE)
        def get(self, speaker_id):
            """Fetch a speaker given its id"""
            return DAO.get(speaker_id)

You will notice that I removed @api.marshal_with(SPEAKER). This was to disable automatic marshalling of output by flask-restplus. To compensate for this, I have added model=SPEAKER in api.doc. It will not auto-marshal the output but will still show the swagger documentation.

That concludes this. The get method will now switch the marshal fields according to the authentication level of the user. As you may notice, the selective_marshal_with function is generic and can be used with other models and APIs too.

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/dynamic-marshal-restplus.html }}
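If you want to play with the idea without setting up Flask, the same pattern can be sketched in plain Python. This is only an illustration under a few assumptions: a “model” here is just a list of allowed keys, selective_fields and the session dict are hypothetical stand-ins for selective_marshal_with and flask_login’s current_user, and the speaker data is made up.

```python
from functools import wraps

# Hypothetical stand-ins for the SPEAKER and SPEAKER_PRIVATE models:
# here a "model" is just the list of keys allowed in the output.
SPEAKER_FIELDS = ['id', 'name', 'phone']
SPEAKER_PRIVATE_FIELDS = ['id', 'name']


def selective_fields(full, restricted, is_authenticated):
    """Filter the returned dict with `full` or `restricted`, chosen per call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The check runs on every call, not once at decoration time.
            allowed = full if is_authenticated() else restricted
            result = func(*args, **kwargs)
            return {key: result[key] for key in allowed if key in result}
        return wrapper
    return decorator


session = {'logged_in': False}  # hypothetical login state


@selective_fields(SPEAKER_FIELDS, SPEAKER_PRIVATE_FIELDS,
                  is_authenticated=lambda: session['logged_in'])
def get_speaker(speaker_id):
    """Fetch a speaker given its id"""
    return {'id': speaker_id, 'name': 'Jay Sean', 'phone': '555-0100'}


print(get_speaker(1))   # {'id': 1, 'name': 'Jay Sean'}
session['logged_in'] = True
print(get_speaker(1))   # {'id': 1, 'name': 'Jay Sean', 'phone': '555-0100'}
```

The important bit is that the choice of fields happens inside the wrapper, so the same decorated function can give different output from one request to the next.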


Testing Docker Deployment using Travis

Hello. This post is about how to set up automated tests to check if your application’s Docker deployment is working or not. I used it extensively while working on the Docker deployment of the Open Event Server.

In this tutorial, we will use Travis CI as the testing service. To start testing your GitHub project for Docker deployment, first add the repo to Travis. Then create a .travis.yml in the project’s root directory. In that file, add docker to services.

    services:
      - docker

The above will enable Docker in the testing environment. It will also include docker-compose by default.

The next step is to build your app and run it. Since this is a pre-testing step, we will add it in the install directive.

    install:
      - docker build -t myapp .
      - docker run -d -p 127.0.0.1:80:4000 --name myapp myapp

The 4000 above assumes that your app runs on port 4000 inside the container. It is also assumed that the Dockerfile is in the root of the repo.

So now that the Docker app is running, it’s time to test it.

    script:
      - docker ps | grep -i myapp

The above will test if our app is among the running Docker processes. It is a basic test to see if the app is running or not.

We can go ahead and test the app’s functionality with some sample requests. Create a file test.py with the following contents.

    import requests

    r = requests.get('http://127.0.0.1/')
    assert 'HomePage' in r.text, 'No homepage loaded'

Then run it as a test.

    script:
      - docker ps | grep -i myapp
      - python test.py

You can make use of the unittest module in Python to bundle and create more organized tests. The sky is the limit here.

In the end, the .travis.yml will look something like the following

    language: python
    python:
      - "2.7"
    install:
      - docker build -t myapp .
      - docker run -d -p 127.0.0.1:80:4000 --name myapp myapp
    script:
      - docker ps | grep -i myapp
      - python test.py

So this is it. A basic tutorial on testing Docker deployments using the awesome Travis CI service. Feel free to share it and comment your views.

{{ Repost from my personal blog http://aviaryan.in/blog/gsoc/docker-test.html }}


Downloading Files from URLs in Python

This post is about how to efficiently/correctly download files from URLs using Python. I will be using the godsend library requests for it. I will write about methods to correctly download binaries from URLs and set their filenames.

Let’s start with baby steps on how to download a file using requests –

    import requests

    url = 'http://google.com/favicon.ico'
    r = requests.get(url, allow_redirects=True)
    open('google.ico', 'wb').write(r.content)

The above code will download the media at http://google.com/favicon.ico and save it as google.ico.

Now let’s take another example where the url is https://www.youtube.com/watch?v=9bZkp7q19f0. What do you think will happen if the above code is used to download it? If you said that an HTML page will be downloaded, you are spot on. This was one of the problems I faced in the Import module of Open Event where I had to download media from certain links. When the URL linked to a webpage rather than a binary, I had to not download that file and just keep the link as is.

To solve this, I inspected the headers of the URL. Headers usually contain a Content-Type parameter which tells us about the type of data the url is linking to. A naive way to do it will be -

    r = requests.get(url, allow_redirects=True)
    print(r.headers.get('content-type'))

It works but is not the optimum way to do so, as it involves downloading the whole file just to check the header. So if the file is large, this will do nothing but waste bandwidth. I looked into the requests documentation and found a better way to do it. That way involves fetching just the headers of a url before actually downloading it. This allows us to skip downloading files which weren’t meant to be downloaded.

    import requests

    def is_downloadable(url):
        """
        Does the url contain a downloadable resource
        """
        h = requests.head(url, allow_redirects=True)
        header = h.headers
        # default to '' so that a missing header does not break .lower()
        content_type = header.get('content-type', '')
        if 'text' in content_type.lower():
            return False
        if 'html' in content_type.lower():
            return False
        return True

    print(is_downloadable('https://www.youtube.com/watch?v=9bZkp7q19f0'))
    # >> False
    print(is_downloadable('http://google.com/favicon.ico'))
    # >> True

To restrict downloads by file size, we can get the file size from the Content-Length header and then do suitable comparisons. The header value is a string, so it has to be converted to a number first.

    content_length = header.get('content-length', None)
    if content_length and int(content_length) > 2e8:  # 200 mb approx
        return False

So using the above function, we can skip downloading urls which don’t link to media.

Getting filename from URL

We can parse the url to get the filename. Example - http://aviaryan.in/images/profile.png. To extract the filename from the above URL, we can write a routine which fetches the last string after the slash (/).

    url = 'http://aviaryan.in/images/profile.png'
    if '/' in url:
        print(url.rsplit('/', 1)[1])

This will give the correct filename in some cases. However, there are times when the filename information is not present in the url, for example something like http://url.com/download. In that case, the Content-Disposition header will contain the filename information. Here is how to fetch it.

    import requests
    import re

    def get_filename_from_cd(cd):
        """
        Get filename from content-disposition
        """
        if not cd:
            return None
        fname = re.findall('filename=(.+)', cd)
        if len(fname) == 0:…
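The header checks and the filename-from-URL step described above do not really depend on requests, so they can be tried out on plain dictionaries and strings. The following is a minimal sketch with only the standard library; should_download and filename_from_url are hypothetical helper names, and MAX_BYTES simply mirrors the 200 MB limit used in this post.

```python
import posixpath
from urllib.parse import unquote, urlparse

MAX_BYTES = 200 * 1000 * 1000  # roughly 200 MB, the limit used in this post


def should_download(headers, max_bytes=MAX_BYTES):
    """Decide from response headers alone whether a URL is worth downloading."""
    # Header names are case-insensitive, so normalise them first.
    headers = {key.lower(): value for key, value in headers.items()}
    content_type = headers.get('content-type', '').lower()
    if 'text' in content_type or 'html' in content_type:
        return False
    content_length = headers.get('content-length')
    # Header values are strings, so convert before comparing.
    if content_length and content_length.isdigit() and int(content_length) > max_bytes:
        return False
    return True


def filename_from_url(url):
    """Return the last path segment of a URL, or None if there is none."""
    path = urlparse(url).path  # leaves out the query string and fragment
    name = posixpath.basename(unquote(path))
    return name or None


print(should_download({'Content-Type': 'text/html; charset=utf-8'}))      # False
print(should_download({'Content-Type': 'image/x-icon'}))                   # True
print(filename_from_url('http://aviaryan.in/images/profile.png'))          # profile.png
print(filename_from_url('http://example.com/a/b.png?size=large'))          # b.png
```

Parsing the URL first means a query string does not end up in the filename, which the plain rsplit approach does not handle. For a URL like http://url.com/download it still returns download, so the Content-Disposition header remains the better source when it is present.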


Import/Export feature of Open Event – Challenges

We have developed a nice import/export feature as a part of our GSoC project Open Event. It allows users to export an event and then import it back. An event contains data like tracks, sessions, microlocations etc. When I was developing the basic part of this feature, the challenge was how to export and then import the same data. I needed a format that stores the data completely and is recognized by the current system. This is when I decided to use the APIs.

The API documentation of the Open Event project is at http://open-event.herokuapp.com/api/v2. We have a considerably rich API covering most aspects of the system. For the export, I adopted this very simple technique.

- Call the corresponding GET APIs (tracks, sessions etc) for a database model internally.
- Save the data in separate json files.
- Zip them all and done.

This was very simple and convenient. Now came the real challenge of importing the event from the exported data.

As the exported data was nothing but json, we could have created the event back by sending the data back as POST requests. But this was not that easy because the data formats are not exactly the same for GET and POST requests. Example -

Sessions GET –

    {
      "speakers": [
        { "id": 1, "name": "Jay Sean" }
      ],
      "track": { "id": 1, "name": "Warmups" }
    }

Sessions POST –

    {
      "speaker_ids": [1],
      "track_id": 1
    }

So the exported data can only be imported once it has been converted to the POST form. Luckily, the only difference between the POST and GET APIs was in the related attributes, where a dictionary in GET was replaced with just the ID in POST/PUT. So when importing, I had to make sure that the dicts are converted to their POST counterparts. For this, all that I had to do was to list all dict-type keys and extract the id key from them. I defined the following global variable listing all dict keys and then wrote a function to extract the ids and convert the keys.

    RELATED_FIELDS = {
        'sessions': [
            ('track', 'track_id', 'tracks'),
            ('speakers', 'speaker_ids', 'speakers'),
        ]
    }

Second challenge

Now I realized that there was an even tougher problem, and that was how to re-create the relations. In the above json, you must have noticed that a session can be related to speaker(s) and a track. These relations are managed using the IDs of the items. When an event is imported, the IDs are bound to change and so the old IDs will become outdated, i.e. a track which was at ID 62 when exported can be at ID 92 when it is imported. This will cause the relationships to break. So to counter this problem, I did the following -

- Import items in a specific order, independent first.
- Store a map of old IDs v/s new IDs.
- When dependent items are to be created, get the new ID from the map and relate with it.

Let me explain the above…
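To give a feel for the two ideas above (converting dict-type keys to their POST form and swapping old IDs for new ones), here is a minimal sketch. It is not the actual Open Event code: to_post_form and the shape of id_maps are assumptions made for illustration, the sample session (including its id and title keys) is made up, and dropping the exported id is also an assumption about how the import would behave.

```python
RELATED_FIELDS = {
    'sessions': [
        # (key in GET data, key in POST data, collection the IDs refer to)
        ('track', 'track_id', 'tracks'),
        ('speakers', 'speaker_ids', 'speakers'),
    ]
}


def to_post_form(item_type, data, id_maps):
    """Convert one exported (GET-style) item into its POST-style dict.

    `id_maps` is a hypothetical structure, {collection: {old_id: new_id}},
    filled in as the independent items are imported.
    """
    result = dict(data)
    result.pop('id', None)  # assumption: a fresh ID is assigned on import
    for get_key, post_key, collection in RELATED_FIELDS.get(item_type, []):
        if get_key not in result:
            continue
        related = result.pop(get_key)
        id_map = id_maps.get(collection, {})
        if isinstance(related, list):
            # list of dicts -> list of new IDs
            result[post_key] = [id_map[item['id']] for item in related]
        elif related is not None:
            # single dict -> new ID
            result[post_key] = id_map[related['id']]
        else:
            result[post_key] = None
    return result


exported_session = {
    'id': 7,
    'title': 'Opening',
    'speakers': [{'id': 1, 'name': 'Jay Sean'}],
    'track': {'id': 62, 'name': 'Warmups'},
}
# Tracks and speakers were imported first, so their new IDs are known.
id_maps = {'tracks': {62: 92}, 'speakers': {1: 4}}
print(to_post_form('sessions', exported_session, id_maps))
```

Here the track that was at ID 62 in the export ends up as track_id 92, and the speaker list becomes speaker_ids with the new speaker ID. This only works if tracks and speakers are imported before sessions, which is why the import order matters.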


Setting up Celery with Flask

In this article, I will explain how to use Celery with a Flask application. Celery requires a broker to run. One of the most popular brokers is Redis. So to start using Celery with Flask, first we will have to set up the Redis broker.

Redis can be downloaded from their site http://redis.io. I wrote a script that simplifies downloading, building and running the redis server.

    #!/bin/bash
    # This script downloads and runs redis-server.
    # If redis has been already downloaded, it just runs it
    if [ ! -d redis-3.2.1/src ]; then
        wget http://download.redis.io/releases/redis-3.2.1.tar.gz
        tar xzf redis-3.2.1.tar.gz
        rm redis-3.2.1.tar.gz
        cd redis-3.2.1
        make
    else
        cd redis-3.2.1
    fi
    src/redis-server

When the above script is run for the first time, the redis folder doesn't exist, so it downloads redis, builds it and then runs it. In subsequent runs, it will skip the downloading and building part and just run the server.

Now that the redis server is running, we will have to install its Python counterpart.

    pip install redis

After the redis broker is set, it’s time to set up the celery extension. First install celery by using pip install celery. Then we need to set up celery in the flask app definition.

    # in app.py
    from os import environ
    from celery import Celery

    def make_celery(app):
        # set redis url vars
        app.config['CELERY_BROKER_URL'] = environ.get('REDIS_URL', 'redis://localhost:6379/0')
        app.config['CELERY_RESULT_BACKEND'] = app.config['CELERY_BROKER_URL']
        # create context tasks in celery
        celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
        celery.conf.update(app.config)
        TaskBase = celery.Task

        class ContextTask(TaskBase):
            abstract = True

            def __call__(self, *args, **kwargs):
                # run every task inside the Flask application context
                with app.app_context():
                    return TaskBase.__call__(self, *args, **kwargs)

        celery.Task = ContextTask
        return celery

    celery = make_celery(app)  # app is the Flask application object

Now that Celery is set up in our project, let’s define a sample task.

    @app.route('/task')
    def view():
        # args and kwargs stand for whatever the task needs
        background_task.delay(*args, **kwargs)
        return 'OK'

    @celery.task
    def background_task(*args, **kwargs):
        # code
        # more code
        pass

Now to run the celery workers, execute

    celery -A app.celery worker

That should be all. Now to run our little project, we can execute the following script.

    bash run_redis.sh &  # to run redis
    celery -A app.celery worker &  # to run celery workers
    python app.py

If you are wondering how to run the same on Heroku, just use the free heroku-redis extension. It will start the redis server on Heroku. Then to run the workers and the app, set the Procfile as -

    web: sh heroku.sh

Then set the heroku.sh as -

    #!/bin/bash
    celery -A app.celery worker &
    gunicorn app:app

That’s a basic guide on how to run a Flask app with Celery and Redis. If you want more information on this topic, please see my post Ideas on Using Celery in Flask for background tasks.


Ideas on using Celery with Flask for background tasks

Simply put, Celery is a background task runner. It can run time-intensive tasks in the background so that your application can focus on the stuff that matters the most. In the context of a Flask application, the stuff that matters the most is listening to HTTP requests and returning responses.

By default, Flask runs on a single thread. Now if a request comes in that takes several seconds to run, it will block all other incoming requests, since there is only one thread to serve them. This will be a very bad experience for the user who is using the product. So here we can use Celery to move the time-hogging part of that request to the background.

Note that by “background”, Celery means another process. Celery starts worker processes for the running application and these workers receive work from the main application.

Celery requires a broker to be used. The broker is a message store (Redis, for example) that provides a shared interface between the main process and the worker processes: the main application puts tasks on it and the workers pick them up. The output of the work done by the workers is stored in a result backend, which can be the same Redis instance as the broker. The main application can then access these results from there.

Using Celery to set background tasks in your application is as simple as follows -

    @celery.task
    def background_task(*args, **kwargs):
        # do stuff
        # more stuff
        pass

Now the function background_task becomes usable as a background task. To execute it as a background task, run -

    task = background_task.delay(*args, **kwargs)
    print(task.state)  # task current state (PENDING, SUCCESS, FAILURE)

Till now this may look nice and easy, but it can cause lots of problems. This is because the background tasks run in different processes than the main application. So the state of the worker process differs from that of the real application.

One common problem because of this is the lack of request context. Since a celery task runs in a different process, the request context is not available. Therefore the request headers, cookies and everything else are not available when the task actually runs. I too faced this problem and solved it using an excellent snippet I found on the Internet.

    """
    Celery task wrapper to set request context vars and global
    vars when a task is executed
    Based on http://xion.io/post/code/celery-include-flask-request-context.html
    """
    from celery import Task
    from flask import has_request_context, make_response, request, g
    from app import app  # the flask app

    __all__ = ['RequestContextTask']


    class RequestContextTask(Task):
        """Base class for tasks that originate from Flask request handlers
        and carry over most of the request context data.
        This has an advantage of being able to access all the usual information
        that the HTTP request has and use them within the task. Potential
        use cases include e.g. formatting URLs for external use in emails sent
        by tasks.
        """
        abstract = True

        #: Name of the additional parameter passed to tasks
        #: that contains information about the original Flask request context.
        CONTEXT_ARG_NAME = '_flask_request_context'
        GLOBALS_ARG_NAME = '_flask_global_proxy'
        GLOBAL_KEYS = ['user']

        def __call__(self, *args, **kwargs):
            """Execute task code with given…


Introduction to JWT

In this post, I will try to explain what JWT is, what its advantages are and why you should be using it.

JWT stands for JSON Web Token. Let me explain what each word means.

- Token - In tech terms, a token is a piece of data (claim) which gives access to a certain piece of information and allows certain actions.
- Web - Web here means that it was designed to be used on the web, i.e. in web projects.
- JSON - JSON means that the token can contain json data. In JWT, the json is first serialized and then Base64 encoded (using the URL-safe variant of Base64).

A JWT looks like three random-looking strings separated by 2 dots. The yyyyy part which you see below has the Base64 encoded form of the json data mentioned earlier.

    xxxxx.yyyyy.zzzzz

The 3 parts in order are -

- Header - The header is the base64 encoded json which contains the hashing algorithm used to secure the token.
- Payload - The payload is the base64 encoded json data which needs to be shared through the token. The json can include some default keys like iss (issuer), exp (expiration time), sub (subject) etc. Particularly exp here is the interesting one as it allows specifying the expiry time of the token.

At this point you might be wondering how JWT is secure if all we are doing is base64 encoding the payload. After all, base64 is trivial to decode. This is where the 3rd part (zzzzz) is used.

- Signature - The signature is a hash computed from the first two parts of the token (header and payload) and a secret. The secret must be known only to the server that issues and verifies the tokens. This is how the signature is created (assuming HMACSHA256 as the algorithm).

    HMACSHA256(xxxxx + "." + yyyyy, secret)

How to use JWT for authentication

Once you get it, the idea of JWT is quite simple. To use JWT for authentication, you make the client POST their username and password to a certain url. If the combination is correct, you return a JWT that includes the username in the “Payload”. So the payload looks like -

    { "username": "john.doe" }

Once the client has this JWT, they can send it in a request header when accessing protected routes. The server can read the JWT from the header and verify its correctness by matching the signature (zzzzz part) with the hash it generates itself from header+payload and the secret. If the strings match, it means that the JWT is valid and therefore the request can be given access to the routes. BTW, you won’t have to go through all of this yourself to use JWT for authentication, there are already a handful of libraries that can do it for you.

Why use JWT over auth tokens?

As you might have noticed in the previous section, JWT has a payload field that can contain any type of information. If you include username in it, you will be able to identify the user just…
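To see how the three parts and the signature check described above fit together, here is a minimal sketch using only the Python standard library. It assumes HMACSHA256 (HS256) as the algorithm and my-secret as a made-up secret; create_token and verify_token are hypothetical helper names, and the sketch does not check exp or any other claim. For real projects, one of the existing JWT libraries is the safer choice.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64url-encode bytes and drop the '=' padding, as JWT does."""
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def sign(header_b64, payload_b64, secret):
    """HMACSHA256(xxxxx + "." + yyyyy, secret), base64url-encoded."""
    message = (header_b64 + '.' + payload_b64).encode('utf-8')
    digest = hmac.new(secret.encode('utf-8'), message, hashlib.sha256).digest()
    return b64url(digest)


def create_token(payload, secret):
    """Build xxxxx.yyyyy.zzzzz for the given payload dict."""
    header = {'alg': 'HS256', 'typ': 'JWT'}
    header_b64 = b64url(json.dumps(header, separators=(',', ':')).encode('utf-8'))
    payload_b64 = b64url(json.dumps(payload, separators=(',', ':')).encode('utf-8'))
    return '.'.join([header_b64, payload_b64, sign(header_b64, payload_b64, secret)])


def verify_token(token, secret):
    """Return the payload if the signature matches, else None."""
    try:
        header_b64, payload_b64, signature = token.split('.')
    except ValueError:
        return None  # not exactly three parts
    expected = sign(header_b64, payload_b64, secret)
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(signature.encode('utf-8'), expected.encode('utf-8')):
        return None
    padded = payload_b64 + '=' * (-len(payload_b64) % 4)  # restore the padding
    return json.loads(base64.urlsafe_b64decode(padded).decode('utf-8'))


token = create_token({'username': 'john.doe'}, 'my-secret')
print(token.count('.'))                      # 2
print(verify_token(token, 'my-secret'))      # {'username': 'john.doe'}
print(verify_token(token, 'wrong-secret'))   # None
```

Anyone can decode the payload, but without the secret nobody can produce a signature that matches a modified payload, and that is what the server relies on.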
