In this post, I will talk about running multiple containers at once using Docker Compose.
The problem?
Suppose you have a complex app with database containers, Redis and what not. How are you going to start the app? One way is to write a shell script that starts the containers one by one.
Now suppose these containers have lots of configurations (links, volumes, ports, environment variables) that they need to function. You will have to write those parameters in the shell script.
Won't it get unmanageable? Wouldn't it be great if we had a cleaner way of running multiple containers? Here comes docker-compose to the rescue.
Docker Compose is a Python package which handles multiple containers for an application very elegantly. The heart of docker-compose is docker-compose.yml, a YAML file which defines the settings/components required to run your app. Once you define that file, you can just do docker-compose up to start your app with all the components and settings. Pretty cool, right?
So let’s see the docker-compose.yml for the fictional app we have considered above.
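A minimal compose file for such an app might look like the following. Everything here is illustrative: the image tags, ports and environment variables are assumptions, not taken from a real project.

    version: '2'
    services:
      postgres:
        image: postgres:9.5
        environment:
          POSTGRES_PASSWORD: example        # assumed credential, for illustration only
        volumes:
          - pg-data:/var/lib/postgresql/data
      redis:
        image: redis:3.2
      web:
        build: .                            # Dockerfile assumed to live in the project root
        ports:
          - "8000:8000"
        environment:
          DATABASE_URL: postgresql://postgres:example@postgres:5432/postgres
          REDIS_URL: redis://redis:6379/0
        depends_on:
          - postgres
          - redis
    volumes:
      pg-data: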
Once this file is in the project’s root directory, you can use docker-compose up to start the application. It will run the services in the order in which they have been defined in the YAML file.
Docker Compose has a lot of configuration options that generally correspond to the parameters that docker run accepts. You can see a full list in the official docker-compose reference.
There's no doubt that docker-compose is a boon when you have to run complex applications. I personally use Compose in every dockerized application that I write. In GSoC 16, I dockerized Open Event. Here is the docker-compose.yml file if you are interested.
PS – If you liked this post, you might find my other posts on Docker interesting. Do take a look and let me know your views.
Everyone likes optimization and small file sizes. Wouldn't it be great if you could reduce your Docker image sizes by a factor of 2 or more? Say hello to Alpine Linux. It is a minimal Linux distro weighing just 5 MB. It also has basic Linux tools and a nice package manager, apk. apk is quite stable and has a considerable number of packages.
In this post, my main aim is to show how to squeeze the best out of Alpine Linux to create the smallest possible Docker image. So let's start.
Step 1: Use Alpine Linux-based images
Ok, I know that's obvious, but just for the sake of completeness of this article, I will state it: prefer using Alpine-based images wherever possible. Python and Redis have official Alpine-based images, whereas NodeJS has good unofficial Alpine-based images. The same goes for Postgres, Ruby and other popular environments.
Step 2: Install only needed dependencies
Prefer installing select dependencies over installing a package that contains lots of them. For example, prefer installing gcc and the specific development libraries you need over a full buildpack. You can find a listing of Alpine packages on their website.
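For example, a sketch of installing a compiler and a couple of development headers directly (package names are from the Alpine repositories; pick only the ones your build actually needs):

    RUN apk add --update gcc musl-dev python-dev postgresql-dev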
Pro Tip – A great list of Debian v/s Alpine development packages is at alpine-buildpack-deps Docker Hub page (scroll down to Packages). It is a very complete list and you will always find the dependency you are looking for.
Step 3: Delete build dependencies after use
Build dependencies are required by components/libraries to build native extensions for the platform. Once the build is done, they are not needed. So you should delete the build-dependencies after their job is complete. Have a look at the following snippet.
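A sketch of that pattern (the package selection and the requirements.txt path are illustrative):

    RUN apk add --update --virtual .build-deps gcc musl-dev python-dev \
        && pip install -r requirements.txt \
        && apk del .build-deps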
I am using --virtual to give a label to the packages installed in that command, and once pip install is done, I delete them by that label.
Step 4: Remove cache
Cache can take up lots of unneeded space. So always run apk add with the --no-cache parameter.
If you are using npm for managing project dependencies and bower for managing frontend dependencies, it is recommended to clear their caches too.
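For instance (assuming apk, npm and bower are all in play; drop whatever does not apply, and note that newer npm versions need npm cache clean --force):

    RUN apk add --no-cache nodejs
    RUN npm install \
        && npm cache clean \
        && bower install --allow-root \
        && bower cache clean --allow-root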
Step 5: Learn from the experts
Each and every image on Docker Hub is open source, meaning that its Dockerfile is freely available. Since the official images are made as efficient as possible, it's easy to find great tricks on how to achieve optimum performance and compact size in them. So when viewing an image on Docker Hub, don't forget to peek into its Dockerfile; it helps more than you can imagine.
That's all I have for now. I will keep you updated on new tips if I find any. In my personal experience, I found Alpine Linux to be worth using. I tried deploying Open Event Server on Alpine but faced some issues, so I ended up creating a Dockerfile using debian:jessie. But for small projects, I would recommend Alpine. On large and complex projects, however, you may face issues with Alpine at times. That may be due to lack of packages, lack of library support or something else. But it's not impossible to overcome those issues, so if you try hard enough, you can get your app running on Alpine.
In this tutorial, I will show you how to write your first Dockerfile. I got to learn Docker because I had to implement a Docker deployment for our GSoC project Open Event Server.
First up, what is Docker? Simply put, Docker is an open platform for people to build, ship and run applications anytime and anywhere. Using Docker, your app will be able to run on any platform that supports Docker. And the best part is, it will run in the same way on different platforms, i.e. no cross-platform issues. So you build your app for the platform you are most comfortable with and then deploy it anywhere. This is the fundamental advantage of Docker and why it was created.
So let’s start our dive into Docker.
Docker works using Dockerfile (example), a file which specifies how Docker is supposed to build your application. It contains the steps Docker is supposed to follow to package your app. Once that is done, you can send this packaged app to anyone and they can run it on their system with no problems.
Let’s start with the project structure. You will have to keep Dockerfile at the root of your project. A basic project will look as follows –
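Something like this (the file names besides Dockerfile are placeholders):

    my-project/
        Dockerfile
        requirements.txt
        app.py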
A Dockerfile starts with a base image that decides which image your app should be built upon. Basically, "images" are nothing but apps. So, for example, if you want to run your application in an Ubuntu 14.04 VM, you use ubuntu:14.04 as the base image.
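In the Dockerfile that looks like this (put your own name and email in the MAINTAINER line):

    FROM ubuntu:14.04
    MAINTAINER Your Name <you@example.com>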
These are usually the first two lines of a Dockerfile and they specify the base image and Dockerfile maintainer respectively. You can look into Docker Hub for more base images.
Now that we have started our Dockerfile, it's time to do something. Think about it: if you were setting up your app on a fresh Ubuntu system, what is the first thing you would do? You update the package lists.
You may possibly want to update the packages too.
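In the Dockerfile those two steps become:

    RUN apt-get update
    RUN apt-get upgrade -y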
Let's explain what's happening. RUN is a Dockerfile instruction which tells Docker to run something in the shell. Here we are running apt-get update followed by apt-get upgrade -y in the shell. There is no need for sudo as Docker already runs commands with root user privileges.
The next thing you will want to do is put your application inside the container (your Ubuntu VM). The COPY command is just for that.
Right now we are at the root of the Ubuntu instance, i.e. alongside /var, /home, /root etc. You surely don't want to copy your files there. So we create a 'myapp' directory and set it as WORKDIR (the project's directory). From now on, all commands will run inside it.
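A sketch of those two instructions (the directory name myapp follows the text above; use whatever suits your project):

    COPY . /myapp
    WORKDIR /myapp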
Now that copying the app is done, you may want to install its requirements.
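For a Python app, that step might look like this (it assumes a requirements.txt at the project root, which we already copied in):

    RUN apt-get install -y python python-pip
    RUN pip install -r requirements.txt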
You might be thinking, why am I installing Python here? Isn't it present by default? Well, let me tell you that the base image 'ubuntu' is not the Ubuntu you are used to. It just contains the bare essentials, not stuff like Python, gcc, Ruby etc. So you will have to install it on your own.
Similarly, if you are installing some Python package that requires gcc, it will not work until you install gcc too. When you are stuck in an issue like that, try googling the error message and most likely you will find an answer.
The last thing remaining now is to run your app. With this, your Dockerfile is complete.
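For example, assuming the app is started from app.py (the actual command depends entirely on your project):

    CMD ["python", "app.py"]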
Do you use Flask-Restplus? Have you felt the need to dynamically modify API output according to some condition? If yes, then this post is for you.
In this post, I will show how to use decorators to restrict GET API output. So let’s start.
This is the basic code to create an API. Here we have created a get_speaker API to get a single item from Speaker model.
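A minimal sketch of such an API. The field names and the in-memory SPEAKERS dictionary are placeholders standing in for the real Speaker database model.

    from flask import Flask
    from flask_restplus import Api, Resource, fields

    app = Flask(__name__)
    api = Api(app)

    # Output model used to marshal a speaker (fields are illustrative)
    SPEAKER = api.model('Speaker', {
        'id': fields.Integer,
        'name': fields.String,
        'phone': fields.String,
    })

    # Stand-in for the real Speaker database model
    SPEAKERS = {1: {'id': 1, 'name': 'Alice', 'phone': '12345'}}

    @api.route('/speakers/<int:speaker_id>')
    class Speaker(Resource):
        @api.doc('get_speaker')
        @api.marshal_with(SPEAKER)
        def get(self, speaker_id):
            """Fetch a single speaker"""
            return SPEAKERS[speaker_id]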
Now our need is to change the returned API data according to some condition. For example, return the phone field of the SPEAKER model only if the user is authenticated. One way to do this is to write conditional statements in the get method that marshal the output according to the situation. But if there are lots of methods which require this, then this is not a good way.
So let's create a decorator which can change the marshalling model at runtime. It will accept as parameters the models to marshal with in the authenticated and non-authenticated cases.
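A sketch of such a decorator. It assumes Flask-Login provides the authentication check (current_user.is_authenticated) and uses flask_restplus's marshal helper; swap in whatever auth mechanism your app actually uses.

    from functools import wraps
    from flask_login import current_user      # assumed auth mechanism
    from flask_restplus import marshal

    def selective_marshal_with(fields, fields_private):
        """Marshal with `fields` for authenticated users, else `fields_private`."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                if current_user.is_authenticated:
                    model = fields
                else:
                    model = fields_private
                return marshal(func(*args, **kwargs), model)
            return wrapper
        return decorator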
The above code adds a wrapper over the API function which checks if the user is authenticated. If the user is authenticated, fields model is used for marshalling else fields_private is used for marshalling.
So let’s create the private model for SPEAKER. We will call it SPEAKER_PRIVATE.
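A sketch of the restricted model, mirroring the public one but leaving out phone:

    SPEAKER_PRIVATE = api.model('SpeakerPrivate', {
        'id': fields.Integer,
        'name': fields.String,
        # no 'phone' field here, it stays hidden from unauthenticated users
    })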
The final step is attaching the selective_marshal_with decorator to the get() method.
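The get() method from the earlier sketch now becomes:

    @api.route('/speakers/<int:speaker_id>')
    class Speaker(Resource):
        @api.doc('get_speaker', model=SPEAKER)
        @selective_marshal_with(SPEAKER, SPEAKER_PRIVATE)
        def get(self, speaker_id):
            """Fetch a single speaker"""
            return SPEAKERS[speaker_id]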
You will notice that I removed @api.marshal_with(SPEAKER). This was to disable automatic marshalling of the output by flask-restplus. To compensate for this, I have added model=SPEAKER in api.doc. It will not auto-marshal the output but will still show the swagger documentation.
That concludes this. The get method will now switch the marshalling fields with respect to the authentication level of the user. As you may notice, the selective_marshal_with function is generic and can be used with other models and APIs too.
Hello. This post is about how to setup automated tests to check if your application’s docker deployment is working or not. I used it extensively while working on the Docker deployment of the Open Event Server. In this tutorial, we will use Travis CI as the testing service.
To start testing your GitHub project for Docker deployment, first add the repo to Travis. Then create a .travis.yml in the project's root directory.
In that file, add docker to services.
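That is just:

    services:
      - docker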
The above will enable docker in the testing environment. It will also include docker-compose by default.
Next step is to build your app and run it. Since this is a pre-testing step, we will add it in the install directive.
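Something along these lines; the image/container name myapp is a placeholder and the port mapping follows the assumptions stated just below.

    install:
      - docker build -t myapp .
      - docker run -d -p 4000:4000 --name myapp myapp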
The 4000 in the above snippet assumes your app runs on port 4000 inside the container. It is also assumed that the Dockerfile is in the root of the repo.
So now that the docker app is running, it’s time to test it.
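A first check, reusing the container name from the install step above:

    script:
      - docker ps | grep -q myapp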
The above will test if our app is in one of the running docker processes. It is a basic test to see if the app is running or not.
We can go ahead and test the app’s functionality with some sample requests. Create a file test.py with the following contents.
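A sketch of such a file; the endpoint and expected response are placeholders for whatever your app actually serves.

    import requests

    # The container was started with -p 4000:4000, so the app is
    # reachable on localhost:4000 from the Travis host.
    response = requests.get('http://localhost:4000/')
    assert response.status_code == 200, 'App did not respond with 200 OK'
    print('Basic smoke test passed')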
Then run it as a test.
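That is, appended to the script section we already have:

    script:
      - docker ps | grep -q myapp
      - python test.py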
You can make use of the unittest module in Python to bundle and create more organized tests. The sky is the limit here.
In the end, the .travis.yml will look something like the following
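Putting the pieces together (still using the placeholder names and port from above, and installing requests for the test script):

    language: python
    python:
      - "2.7"

    services:
      - docker

    install:
      - docker build -t myapp .
      - docker run -d -p 4000:4000 --name myapp myapp
      - pip install requests

    script:
      - docker ps | grep -q myapp
      - python test.py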
So this is it. A basic tutorial on testing Docker deployments using the awesome Travis CI service.
This post is about how to efficiently and correctly download files from URLs using Python. I will be using the godsend requests library for it. I will write about methods to correctly download binaries from URLs and set their filenames.
Let’s start with baby steps on how to download a file using requests –
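A bare-bones version (the URL and output filename are placeholders):

    import requests

    url = 'http://google.com/favicon.ico'
    r = requests.get(url)
    with open('favicon.ico', 'wb') as f:
        f.write(r.content)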
Now let's take another example where the url is https://www.youtube.com/watch?v=9bZkp7q19f0. What do you think will happen if the above code is used to download it? If you said that an HTML page will be downloaded, you are spot on. This was one of the problems I faced in the Import module of Open Event, where I had to download media from certain links. When the URL linked to a webpage rather than a binary, I had to skip downloading that file and just keep the link as is. To solve this, I inspected the headers of the URL. Headers usually contain a Content-Type parameter which tells us about the type of data the url is linking to. A naive way to do it would be –
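Something like the following sketch, which fetches the entire response just to read one header:

    import requests

    def is_downloadable_naive(url):
        """Check the Content-Type, but only after downloading the full response."""
        r = requests.get(url, allow_redirects=True)
        content_type = r.headers.get('content-type', '')
        if 'text' in content_type.lower() or 'html' in content_type.lower():
            return False
        return True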
It works but is not the optimum way to do so as it involves downloading the file for checking the header. So if the file is large, this will do nothing but waste bandwidth. I looked into the requests documentation and found a better way to do it. That way involved just fetching the headers of a url before actually downloading it. This allows us to skip downloading files which weren’t meant to be downloaded.
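A sketch of the improved check, using requests.head so only the headers travel over the wire:

    import requests

    def is_downloadable(url):
        """Does the url point to a downloadable binary?"""
        h = requests.head(url, allow_redirects=True)
        content_type = h.headers.get('content-type', '')
        if 'text' in content_type.lower() or 'html' in content_type.lower():
            return False
        return True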
To restrict download by file size, we can get the filesize from the Content-Length header and then do suitable comparisons.
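For example, to skip anything above an arbitrary limit of 200 MB:

    import requests

    def is_small_enough(url, max_bytes=2e8):
        """Reject files whose Content-Length exceeds max_bytes (200 MB by default)."""
        h = requests.head(url, allow_redirects=True)
        content_length = h.headers.get('content-length', None)
        if content_length and int(content_length) > max_bytes:
            return False
        return True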
So using the above function, we can skip downloading urls which don’t link to media.
To extract the filename from the above URL we can write a routine which fetches the last string after the slash (/).
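One way to do it (a plain string split; query strings and fragments are not handled here):

    url = 'http://google.com/favicon.ico'
    filename = url.rsplit('/', 1)[-1]   # -> 'favicon.ico'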
This will give the filename correctly in some cases. However, there are times when the filename information is not present in the url, for example something like http://url.com/download. In that case, the Content-Disposition header will contain the filename information. Here is how to fetch it.
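A sketch using a regular expression on the header value (naive parsing; a robust version would handle quoting and encodings):

    import re
    import requests

    def get_filename_from_cd(cd):
        """Get the filename from a Content-Disposition header value."""
        if not cd:
            return None
        fname = re.findall('filename=(.+)', cd)
        if not fname:
            return None
        return fname[0].strip('"')

    r = requests.get('http://url.com/download', allow_redirects=True)
    filename = get_filename_from_cd(r.headers.get('content-disposition'))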
The url-parsing code in conjunction with the above method to get the filename from the Content-Disposition header will work for most cases. Use them and test the results.
These are my 2 cents on downloading files using requests in Python. Let me know of other tricks I might have overlooked.
We have developed a nice import/export feature as a part of our GSoC project Open Event. It allows a user to export an event and then import it back later.
An event contains data like tracks, sessions, microlocations etc. When I was developing the basic part of this feature, the challenge was how to export and then import back the same data. I needed a format that completely stores the data and is recognized by the current system. This is when I decided to use the APIs.
API documentation of Open Event project is at http://open-event.herokuapp.com/api/v2. We have a considerably rich API covering most aspects of the system. For the export, I adopted this very simple technique.
Call the corresponding GET APIs (tracks, sessions etc) for a database model internally.
Save the data in separate json files.
Zip them all and done.
This was very simple and convenient. Now the real challenge was importing the event from the exported data. As the exported data was nothing but json, we could have re-created the event by sending the data back as POST requests. But it was not that easy, because the data formats are not exactly the same for GET and POST requests.
Sessions GET –
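(The two snippets below show only the shape of the data; the field names are simplified, not the exact Open Event schema.)

    {
        "id": 1,
        "title": "Keynote",
        "track": {
            "id": 62,
            "name": "Main Track"
        },
        "speakers": [
            {
                "id": 10,
                "name": "Alice"
            }
        ]
    }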
Sessions POST –
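(Same illustrative fields, with the nested objects replaced by plain IDs.)

    {
        "title": "Keynote",
        "track": 62,
        "speakers": [10]
    }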
So the exported data can only be imported once it has been converted to the POST form. Luckily, the only change between the POST and GET APIs was in the related attributes, where the dictionary in GET is replaced with just the ID in POST/PUT. So when importing I had to make it such that the dicts are converted to their POST counterparts. For this, all I had to do was list all dict-type keys and extract the id key from them. I defined a global variable listing all dict keys and then wrote a function to extract the ids and convert the keys.
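A sketch of that idea (the field listing here is abbreviated; the real one would cover every relational key exposed by the API):

    # Keys whose value is a nested dict (or list of dicts) in GET responses
    RELATED_FIELDS = ['track', 'speakers', 'microlocation', 'session_type']

    def trim_related_fields(data):
        """Convert GET-style nested objects into the plain IDs that POST expects."""
        for field in RELATED_FIELDS:
            if field not in data or data[field] is None:
                continue
            if isinstance(data[field], list):
                data[field] = [item['id'] for item in data[field]]
            else:
                data[field] = data[field]['id']
        return data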
Now I realized that there was even a tougher problem, and that was how to re-create the relations. In the above json, you must have realized that a session can be related to speaker(s) and track. These relations are managed using the IDs of the items. When an event is imported, the IDs are bound to change and so the old IDs will become outdated i.e. a track which was at ID 62 when exported can be at ID 92 when it is imported. This will cause the relationships to break. So to counter this problem, I did the following –
Import items in a specific order, independent first
Store a map of old IDs v/s new IDs.
When dependent items are to be created, get new ID from the map and relate with it.
Let me explain the above –
The first step is to import/re-create the independent items first. Here the independent items are tracks and speakers, and the dependent item is the session. While creating the independent items, store their new IDs after creation. Create a map of old IDs vs new IDs and store it. This map holds the clue to what became what after being recreated from the json. Now the key final step: when dependent items are to be created, find the related independent keys in their json using the RELATED_FIELDS listing defined above. Once they are found, extract their IDs and find the new ID corresponding to each old ID. Link the new ID with the dependent item and that would be all.
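In code, the bookkeeping can be as simple as a dictionary per model. This is only a sketch: exported_tracks/exported_sessions stand for the data loaded from the json files, and create_track/create_session stand for wrappers around the POST APIs.

    # old track id -> new track id, filled while importing the independent items
    track_id_map = {}

    for old_track in exported_tracks:           # loaded from tracks.json
        old_id = old_track.pop('id')
        new_track = create_track(old_track)     # assumed wrapper around the POST API
        track_id_map[old_id] = new_track['id']

    # dependent items can now be re-linked to the new IDs
    for session in exported_sessions:           # loaded from sessions.json
        session = trim_related_fields(session)
        session['track'] = track_id_map[session['track']]
        create_session(session)                 # assumed wrapper around the POST API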
This post covers the main challenges I faced when developing the import/export feature and how I overcame them. I hope it will provide some help when you are dealing with similar problems.
In this article, I will explain how to use Celery with a Flask application. Celery requires a broker to run. The most famous of the brokers is Redis. So to start using Celery with Flask, we will first have to set up the Redis broker.
Redis can be downloaded from their site http://redis.io. I wrote a script that simplifies downloading, building and running the redis server.
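A sketch of such a script (the Redis version is pinned arbitrarily here; adjust as needed):

    #!/bin/bash
    # Download and build Redis only if it hasn't been done already,
    # then start the server.
    if [ ! -x redis-3.2.1/src/redis-server ]; then
        wget http://download.redis.io/releases/redis-3.2.1.tar.gz
        tar xzf redis-3.2.1.tar.gz
        rm redis-3.2.1.tar.gz
        (cd redis-3.2.1 && make)
    fi
    redis-3.2.1/src/redis-server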
When the above script is run for the first time, the redis folder doesn't exist, so it downloads the source, builds it and then runs the server. In subsequent runs, it will skip the downloading and building part and just run the server.
Now that the redis server is running, we will have to install its Python counterpart.
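That is just the redis package from PyPI:

    pip install redis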
After the redis broker is set up, it's time to set up the celery extension. First install celery using pip install celery. Then we need to set up celery in the flask app definition.
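A minimal way to wire the two together, following the commonly used Flask/Celery pattern (the broker URL assumes the local Redis server started above):

    from flask import Flask
    from celery import Celery

    app = Flask(__name__)
    app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
    app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'

    celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)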
Now that Celery is set up in our project, let's define a sample task.
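For example, a trivial task that adds two numbers in the background:

    @celery.task
    def add(a, b):
        return a + b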
Now to run the celery workers, execute
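Assuming the code above lives in app.py and the Celery instance is named celery:

    celery -A app.celery worker --loglevel=info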
That should be all. Now to run our little project, we can execute the following script.
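A sketch of such a script, reusing the names assumed above (the Redis script, app.py and its celery instance):

    #!/bin/bash
    # Start the broker, a worker and the Flask app (development setup).
    bash run_redis.sh &                              # the Redis script from earlier (name assumed)
    celery -A app.celery worker --loglevel=info &
    python app.py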
If you are wondering how to run the same on Heroku, just use the free heroku-redis extension. It will start the redis server on heroku. Then to run the workers and app, set the Procfile as –
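One way to lay out the Procfile (gunicorn is assumed as the web server; the module and instance names follow the sketch above):

    web: gunicorn app:app
    worker: celery -A app.celery worker --loglevel=info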
Simply put, Celery is a background task runner. It can run time-intensive tasks in the background so that your application can focus on the stuff that matters the most. In context of a Flask application, the stuff that matters the most is listening to HTTP requests and returning response.
By default, Flask runs on a single thread. Now if a request that takes several seconds to complete is executed, it will block all other incoming requests as it is single-threaded. This will be a very bad experience for the user who is using the product. So here we can use Celery to move the time-hogging part of that request to the background.
I would like to let you know that by "background", Celery means another process. Celery starts worker processes for the running application and these workers receive work from the main application. Celery requires a broker to be used. The broker is essentially a shared store through which the main process and the worker processes communicate; the output of the work done by the workers is also stored there, and the main application can then access these results from the broker.
Using Celery to set background tasks in your application is as simple as follows –
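A sketch, assuming a configured Celery instance named celery (as set up in the previous section):

    @celery.task
    def background_task(arg1, arg2):
        # some long-running work happens here
        pass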
Now the function background_task can be run as a background task. To execute it in the background, run –
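Instead of calling the function directly, you call its delay method:

    background_task.delay('some', 'value')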
Till now this may look nice and easy but it can cause lots of problems. This is because the background tasks run in different processes than the main application. So the state of the worker application differs from the real application.
One common problem because of this is the lack of request context. Since a celery task runs in a different process, the request context is not available. Therefore the request headers, cookies and everything else are not available when the task actually runs. I too faced this problem and solved it using an excellent snippet I found on the Internet.
To run a task in Request context mode, do –
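Assuming RequestContextTask is the task base class from the snippet mentioned above, you point the task decorator at it via the standard base argument:

    @celery.task(base=RequestContextTask)   # RequestContextTask comes from the snippet above
    def background_task(arg1, arg2):
        # flask.request and flask.g now look the same as they did
        # in the request that queued this task
        pass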
If you are wondering what the RequestContextTask class does, it simply stores all request context vars and global vars when a background task is called (task.delay()) and then unpacks those values to their proper places when the task is about to be run. The above snippet can be easily extended to store any value.
Another challenge that some people may face is the occasional Parsing/Serialization error. This happens because the data being sent to/from a function that is to be background executed is too complex.
Serialization is the process of converting complex data structures and objects into a plain string. Serialization of data is necessary because the background tasks and the main thread run in different processes. Now think about how the main process tells the celery worker to do some task: it is done by serializing the concerned data. So to avoid serialization errors, it is recommended that you design background tasks so that they require only simple arguments and return only simple data.
So basically keeping small and simple tasks is recommended when using Celery. Follow this golden rule and you will not run into any problems.
In this post, I will try to explain what is JWT, what are its advantages and why you should be using it.
JWT stands for JSON Web Tokens. Let me explain what each word means.
Tokens – A token, in tech terms, is a piece of data (a claim) which gives access to a certain piece of information and allows certain actions.
Web – Web here means that it was designed to be used on the web i.e. web projects.
JSON – JSON means that the token can contain json data. In JWT, the json is first serialized and then Base64 encoded.
A JWT looks like a random sequence of strings separated by 2 dots. The yyyyy part which you see below has the Base64 encoded form of json data mentioned earlier.
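Schematically, a token has this shape:

    xxxxx.yyyyy.zzzzz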
The 3 parts in order are –
Header – The header is the base64 encoded json which contains the hashing algorithm with which the token is secured.
Payload – Payload is the base64 encoded json data which needs to be shared through the token. The json can include some default keys like iss (issuer), exp (expiration time), sub (subject) etc. Particularly exp here is the interesting one as it allows specifying expiry time of the token.
At this point you might be wondering how JWT is secure if all we are doing is base64 encoding the payload. After all, there are easy ways to decode base64. This is where the third part (zzzzz) comes in.
Signature – The signature is a hashed string made up of the first two parts of the token (header and payload) and a secret. The secret should be kept confidential to the owner who is authenticating using JWT. This is how the signature is created (assuming HMACSHA256 as the algorithm) –
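In pseudocode, this is the standard construction:

    HMACSHA256(
        base64UrlEncode(header) + "." + base64UrlEncode(payload),
        secret
    )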
How to use JWT for authentication
Once you get it, the idea of JWT is quite simple. To use JWT for authentication, you make the client POST their username and password to a certain url. If the combination is correct, you return a JWT including the username in the payload. So the payload looks like –
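For example (the username is a placeholder; in practice you would usually add an exp claim too):

    {
        "username": "john.doe"
    }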
Once the client has this JWT, they can send the same in Header when accessing protected routes. The server can read the JWT from the header and verify its correctness by matching the signature (zzzzz part) with the encoded hash created using header+payload and secret (generated signature). If the strings match, it means that the JWT is valid and therefore the request can be given access to the routes. BTW, you won’t have to go through such a deal for using JWT for authentication, there are already a handful of libraries that can do these for you.
Why use JWT over auth tokens?
As you might have noticed in the previous section, JWT has a payload field that can contain any type of information. If you include the username in it, you will be able to identify the user just by validating the JWT, and there will be no need to read from the database, unlike typical tokens which require a database read cycle to get the claimed user. Now if you go ahead and include permission information in the JWT too (like 'isAdmin': True), then even more database reads can be prevented. And this optimization comes at no cost at all. So this is why you should be using JWT.
We at Open Event use JWT for our primary means of authentication. Apart from that, we support basic authentication too. Read this post for some points about that.