Continuous Integration and Deployment of Yacy Grid

We recently deployed Yacy Grid on Google Cloud, using Kubernetes for orchestration and Travis CI for automatic deployment.

How we have deployed it:

Firstly, it is advisable to run each service your application requires in its own container, i.e. to follow a multi-container architecture. With a multi-container architecture you can allocate a fixed amount of resources to each application and replicate individual services whenever required. Presently, Yacy Grid has two main applications that need to be deployed in separate containers – Yacy_grid_mcp and Elasticsearch.

We took the official Kubernetes YAML files for Elasticsearch and followed the instructions at https://github.com/kubernetes/examples/blob/master/staging/elasticsearch/README.md to deploy Elasticsearch on Google Cloud.

With this we are able to run the pods and volumes required for Elasticsearch, as well as the services that connect Yacy to Elasticsearch.

The pull request regarding deployment of separate elasticsearch component is at https://github.com/yacy/yacy_grid_mcp/pull/27/files

The figure below shows the different services, external endpoints and pods used for Elasticsearch.

Now Elasticsearch can be accessed at 35.202.154.219:9300 and http://35.193.124.253:9200/

Continuous deployment of Yacy_grid_mcp:

Please make sure that you have created a cluster on Google Container Engine to deploy the containers on. For details on starting a project and a cluster, please read https://cloud.google.com/container-engine/docs/
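
If you prefer the command line, the cluster can also be created from the Cloud Shell. A minimal sketch (the cluster name and zone here are placeholders, pick your own):

gcloud config set compute/zone us-central1-a
gcloud container clusters create yacy-cluster
gcloud container clusters get-credentials yacy-cluster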

1. Initially, .travis.yml sets up the environment required for the Yacy deployment by installing the Google Cloud CLI and the kubectl component.

The source code for the Travis setup can be found at https://github.com/yacy/yacy_grid_mcp/blob/master/.travis.yml

2. Later, Travis runs the deploy_staging.sh script, which builds the Docker image of Yacy for the present build and pushes it to hub.docker.com:

if [ "$TRAVIS_PULL_REQUEST" != "false" -o "$TRAVIS_BRANCH" != "$SOURCE_BRANCH" ]; then
    echo "Skipping deploy; The request or commit is not on master"
    exit 0
fi

set -e

# Build the image from the docker/ directory, tag it with the commit hash and push it
docker build -t nikhilrayaprolu/yacygridmcp:$TRAVIS_COMMIT ./docker
docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD"
docker tag nikhilrayaprolu/yacygridmcp:$TRAVIS_COMMIT nikhilrayaprolu/yacygridmcp:latest
docker push nikhilrayaprolu/yacygridmcp

Later, using the service key, we authenticate with Google Cloud and set the required configuration and variables:

echo $GCLOUD_SERVICE | base64 --decode -i > ${HOME}/gcloud-service-key.json
gcloud auth activate-service-account --key-file ${HOME}/gcloud-service-key.json

gcloud --quiet config set project $PROJECT_NAME_STG
gcloud --quiet config set container/cluster $CLUSTER_NAME_STG
gcloud --quiet config set compute/zone ${CLOUDSDK_COMPUTE_ZONE}
gcloud --quiet container clusters get-credentials $CLUSTER_NAME_STG

And later we point the Kubernetes deployment on Google Cloud to the Docker image we just pushed:

kubectl config view
kubectl config current-context

kubectl set image deployment/${KUBE_DEPLOYMENT_NAME} ${KUBE_DEPLOYMENT_CONTAINER_NAME}=nikhilrayaprolu/yacygridmcp:$TRAVIS_COMMIT

Presently, Yacy runs on 5 vCPUs, with the following pods and services:

One can also use the kubectl CLI to get information about the cluster and its pods, as shown below.
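
For example, the following standard kubectl commands list the nodes, pods and services (the exact pod names will differ in your cluster):

kubectl get nodes
kubectl get pods
kubectl get services
kubectl describe pod <pod-name>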

The pull request regarding deployment of Yacy on Google Cloud is available at: https://github.com/yacy/yacy_grid_mcp/pull/16/files

References:

1. A Medium blog on CD to Google Container Engine: https://medium.com/google-cloud/continuous-delivery-in-a-microservice-infrastructure-with-google-container-engine-docker-and-fb9772e81da7

2. Another blog on CD to Google Container Engine: https://engineering.hexacta.com/automatic-deployment-of-multiple-docker-containers-to-google-container-engine-using-travis-e5d9e191d5ad

3. Deploying Elasticsearch to the cloud using Kubernetes: https://github.com/kubernetes/examples/blob/master/staging/elasticsearch/README.md

 

How to Get Secure Webhook for SUSI Bots in Kubernetes Deployment

A webhook is a user-defined callback that gets triggered by an event in the code; receiving a message from a user in a SUSI bot is such an event. Some bots need a webhook URI for callbacks: in the SUSI Viber bot, for example, we need to define a webhook URI in the code to receive callbacks and make the bot work. In this blog, we will learn how to get an SSL-enabled webhook while deploying our bot to Google Container Engine using Kubernetes. We will generate the SSL certificate using the kube-lego service, which we will deploy into the cluster and define in the yaml files below. We could also obtain an SSL certificate using a third-party service like CloudFlare, but that would make us dependent on CloudFlare, so we will use kube-lego.

We will start off by registering a domain on which we will activate the SSL certificate and which we will use as the webhook. Go to Freenom and register an account. After logging in, register a free domain with any name and check out that order. Next, you have to set the IP for the DNS of this domain. To do so, we will reserve an IP address in our Google Cloud project with this command:

gcloud compute addresses create IPname --region us-central1

You will get a “created” message. To see your IP, go to VPC Network -> External IP addresses in the Cloud Console. Add this IP to the DNS zone of your domain and save it for later use in the yaml files that we will use for deployment. Now we will deploy our bot using yaml files, but before deployment we will create a cluster:

gcloud container clusters create clusterName

After creating the cluster, add these yaml files to your bot repository and set the IP address you saved above as the “loadBalancerIP” parameter in yamls/nginx/service.yaml. Replace the domain name in yamls/application/ingress-notls.yaml and yamls/application/ingress-tls.yaml with the domain you registered. Add your email ID to yamls/lego/configmap.yaml as the “lego.email” parameter. Replace the “image” and “env” parameters in yamls/application/deployment.yaml with your Docker image and the environment variables used by your code. After changing the yaml files, we will use this deploy script to create a deployment. Adjust the paths in the script according to where your yaml files live.

In the gcloud shell, run the following command to deploy the application using the given configurations:

bash ./path-to-deploy-script/deploy.sh create all

This will create the deployment as we have defined in the script. The Kubernetes master creates the load balancer and related Compute Engine forwarding rules, target pools, and firewall rules to make the service fully accessible from outside of Google Cloud Platform. Wait for a few minutes for all the containers to be created and the SSL Certificates to be generated and loaded.
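
While waiting, the progress can be followed from the Cloud Shell. A small sketch using standard kubectl commands (the namespaces depend on the yaml files used above):

kubectl get pods --all-namespaces
kubectl get ingress --all-namespaces
kubectl get services --all-namespaces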

You have successfully created a secure webhook. Test it by opening the domain that you have registered at the start.

Resources

Enabling SSL using CloudFlare: https://jonnyjordan.com/blog/how-to-setup-cloudflare-flexible-ssl-for-wordpress/
https://www.youtube.com/watch?v=qFvwEVkl5gk

Optimising Docker Images for loklak Server

The loklak server is in the process of moving to Kubernetes. In order to do so, we needed different Docker images that suit these deployments. In this blog post, I will be discussing the process through which I optimised the size of the Docker image for these deployments.

Initial Image

The image that I started with used Ubuntu as the base. It installed all the components needed and then modified the configurations as required –

FROM ubuntu:latest

# Env Vars
ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
ENV DEBIAN_FRONTEND noninteractive

WORKDIR /loklak_server

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y git openjdk-8-jdk
RUN git clone https://github.com/loklak/loklak_server.git /loklak_server
RUN git checkout development
RUN ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport
RUN sed -i.bak 's/^\(port.http=\).*/\180/' conf/config.properties
... # More configurations
RUN echo "while true; do sleep 10;done" >> bin/start.sh

# Start
CMD ["bin/start.sh", "-Idn"]

The size of images built using this Dockerfile was quite huge –

REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE

loklak_server       latest              a92f506b360d        About a minute ago   1.114 GB

ubuntu              latest              ccc7a11d65b1        3 days ago           120.1 MB

Since this size is not acceptable, we needed to reduce it.

Moving to Alpine

Alpine Linux is an extremely lightweight Linux distro, built mainly for the container environment. Its size is so tiny that it hardly puts any impact on the overall size of images. So, I replaced Ubuntu with Alpine –

FROM alpine:latest

...
RUN apk update
RUN apk add git openjdk8 bash
...

And now we had much smaller images –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      668.8 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

As we can see, due to the absence of cached packages and the small size of Alpine, the image size is reduced to almost half of the original.

Reducing Content Size

There are many things in a project which are no longer needed while running it, like the .git folder (which is huge in the case of loklak) –

$ du -sh loklak_server/.git
236M loklak_server/.git

We can remove such files from the Docker image and save a lot of space –

# Remove all hidden files and directories (including .git) from the working directory
rm -rf .[^.] .??*

Optimizing Number of Layers

The number of layers also affects the size of the image: the more layers there are, the larger the image becomes. In the Dockerfile, we can club the RUN commands together to reduce the number of layers.

RUN apk update && apk add openjdk8 git bash && \
  git clone https://github.com/loklak/loklak_server.git /loklak_server && \
  ...

After this, the effective size is again reduced by a major factor –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      422.3 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

Conclusion

In this blog post, I discussed the process of optimising the size of the Docker image for Kubernetes deployments of the loklak server. The size was reduced to 426 MB from 1.234 GB, which provided much faster push/pull times for the Docker images and, therefore, faster updates for Kubernetes deployments.

How to Deploy Node js App to Google Container Engine Using Kubernetes

There are many ways to host Node.js apps. A popular way is to host your app on Heroku, but we can also deploy it to Google Cloud Platform on Container Engine using Kubernetes. In this blog, we will learn how to deploy a Node.js app on Container Engine with Kubernetes. There are many resources on the web, but we will use YAML files to create the deployment, and for building the Docker image we will not use Google Container Registry (GCR), as it would cost us more for the deployment. To deploy, we will start by creating an account on Google Cloud; you can get a free tier of Google Cloud Platform worth $300 for 12 months. After creating the account, create a project with any name of your choice and enable the Google Cloud Shell from the icon at the top right.

We will also need a Docker image of our app for the deployment. To create a Docker image, first create an account on https://www.docker.com and create a repository with any name on it. Now we will add a Dockerfile to our repository so that we can build the Docker image. The Dockerfile will contain this code:

FROM node:boron
# Create app directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
# Install app dependencies
COPY package.json /usr/src/app
RUN npm install
# Bundle app source
COPY . /usr/src/app
EXPOSE 8080
CMD [ "npm", "start" ]

You can see an example of a Dockerfile in the SUSI Telegram repository. After pushing the Dockerfile to your repository, we will now build the Docker image of the app. In the Google Cloud Shell, clone your repository with git clone {your-repository-link here } and change your current directory to the cloned app. We will use the --no-cache and -t arguments: --no-cache forces a clean build, and -t tags the image with a name we can push. Run these two commands to build the Docker image and push it to your Docker Hub:

docker build --no-cache -t {your-docker-username-here}/{repository-name-on-docker here} .

docker push {your-docker-username-here}/{repository-name-on-docker here}


We have successfully created the Docker image for our app. Now we will deploy the app to Container Engine using this image. To deploy it, we will use configuration files. Add a yamls folder to your repository; we will add four files to it now. In the first file, we will specify the namespace. Name it 00-namespace.yml; it will contain the following code:

apiVersion: v1
kind: Namespace
metadata:
 name: web

In the second file, we will define a ConfigMap in the namespace we specified. Name it configmap.yml; it will contain the following code:

apiVersion: v1
metadata:
 name: {name-of-your-deployment-here}
 namespace: web
kind: ConfigMap

In the third file, we will define our deployment. Name it deployment.yml; it will contain the following code:

kind: Deployment
apiVersion: apps/v1beta1
metadata:
 name: {name-of-your-deployment-here}
 namespace: web
spec:
 replicas: 1
 template:
   metadata:
     labels:
       app: {name-of-your-deployment-here}
   spec:
     containers:
     - name: {name-of-your-deployment-here}
       image: {your-docker-username-here}/{repository-name-on-docker here}:latest
       ports:
       - containerPort: 8080
         protocol: TCP
       envFrom:
       - configMapRef:
           name: {name-of-your-deployment-here}
     restartPolicy: Always

In the fourth file, we will define a service for our deployment. Name it service.yml; it will contain the following code:

kind: Service
apiVersion: v1
metadata:
 name: {name-of-your-deployment-here}
 namespace: web
spec:
 ports:
 - port: 8080
   protocol: TCP
   targetPort: 8080
 selector:
   app: {name-of-your-deployment-here}
 type: LoadBalancer

After adding these files to the repository, we will deploy our app to a cluster on Container Engine. Go to the Google Cloud Shell and run the following commands:

gcloud config set compute/zone us-central1-b

This will set zone for our cluster.

gcloud container clusters create {name-your-cluster-here}

Now update the cloned repository with git pull, as we have added new files to it. Run the following command to create your deployment:

kubectl create -R -f ./yamls

This will create our deployment.

kubectl get services --namespace=web


The above command will show the external IP for your app; open that IP in your browser with the port and you will see your app.

kubectl get deployments --namespace=web

Run this command; if the AVAILABLE column shows 1, it means your deployment is up and running.

You have successfully deployed your app to container engine using kubernetes.

Resources

Tutorial on Google cloud: https://cloud.google.com/container-engine/docs/tutorials/hello-node
Tutorial by Jatin Shridhar: https://www.sitepoint.com/kubernetes-deploy-node-js-docker-app/

Auto Deploying loklak Server on Google Cloud Using Travis

This is a setup for the loklak server in which only the source files are checked in, while the development branch of the Kubernetes deployment is automatically updated with the compiled output on every push, using details from the Travis build.

How to achieve it?

Unix commands and shell scripts are one of the best options to automate all deployment and build activities. I explored Kubernetes on GCloud, which can be driven entirely through such commands.

1. Checking the Travis build details before deployment:

Firstly, check whether the repository is loklak_server, whether the build is for a pull request, and whether the branch is either master or development; only then decide whether to update the Docker image. The code for these checks is as follows:

if [ "$TRAVIS_REPO_SLUG" != "loklak/loklak_server" ]; then
    echo "Skipping image update for repo $TRAVIS_REPO_SLUG"
    exit 0
fi

if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then
    echo "Skipping image update for pull request"
    exit 0
fi

if [ "$TRAVIS_BRANCH" != "master" ] && [ "$TRAVIS_BRANCH" != "development" ]; then
    echo "Skipping image update for branch $TRAVIS_BRANCH"
    exit 0
fi

2. Setting up Tag and Decrypting the credentials:

For the Kubernetes deployment, each time the Travis build is successful, the commit details from Travis are appended to the tag used for the deployment, and the gcloud credentials are decrypted from the encrypted JSON file:

openssl aes-256-cbc -K $encrypted_48d01dc243a6_key -iv $encrypted_48d01dc243a6_iv  -in kubernetes/gcloud-credentials.json.enc -out kubernetes/gcloud-credentials.json -d
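
The tag itself is put together from the Travis commit details. A minimal sketch of what such a line could look like (the image name here is illustrative, not the script's literal value):

# Illustrative only: compose an image tag from the current Travis commit
TAG=loklak/loklak_server:kubernetes-$TRAVIS_COMMIT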

3. Install, Authenticate and Configure GCloud details with Kubernetes:

In this step, the Google Cloud SDK is installed first, along with the kubectl component –

curl https://sdk.cloud.google.com | bash > /dev/null
source ~/google-cloud-sdk/path.bash.inc
gcloud components install kubectl

 

Then, authenticate to Google Cloud using the decrypted credentials mentioned above, and finally configure gcloud with details like the zone, project name and cluster.
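
In shell terms this boils down to a handful of gcloud calls. A sketch, assuming the credentials decrypted in step 2 and placeholder project, zone and cluster names:

gcloud auth activate-service-account --key-file kubernetes/gcloud-credentials.json
gcloud config set project YOUR_PROJECT_ID
gcloud config set compute/zone us-central1-a
gcloud container clusters get-credentials YOUR_CLUSTER_NAME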

4. Update the Kubernetes deployment:

Since this issue is specific to the development branch of loklak_server, the script checks whether the branch is development and then updates the deployment using the following command:

if [ "$TRAVIS_BRANCH" == "development" ]; then
    kubectl set image deployment/server --namespace=web server=$TAG
fi

 

Conclusion:

In this post, we saw how to write a script that updates the Kubernetes deployment on GCloud after every successful Travis build for a push to the watched branches.

Persistently Storing loklak Server Dumps on Kubernetes

In an earlier blog post, I discussed loklak setup on Kubernetes. The deployment mentioned in the post was to test the development branch. Next, we needed to have a deployment where all the messages are collected and dumped in text files that can be reused.

In this blog post, I will be discussing the challenges with such deployment and the approach to tackle them.

Volatile Disk in Kubernetes

The pods that hold deployments in Kubernetes have their own disk storage. Any data written by the application stays there only as long as the same version of the deployment is running. As soon as the deployment is updated or relocated, the data stored by the application is cleaned up.


Due to this, dumps are written while loklak is running, but they get wiped out when the deployment image is updated. In other words, all dumps are lost when the image updates. We needed to find a solution to this, as we needed permanent storage for collecting dumps.

Persistent Disk

In order to have storage which can hold data permanently, we can mount persistent disk(s) on a pod at the appropriate location. This ensures that the data that is important to us stays with us, even
when the deployment goes down.


In order to add persistent disks, we first need to create a persistent disk. On Google Cloud Platform, we can use the gcloud CLI to create disks in a given region –

gcloud compute disks create --size=<required size> --zone=<same as cluster zone> <unique disk name>

After this, we can mount it on a Docker volume defined in Kubernetes configurations –

      ...
      volumeMounts:
        - mountPath: /path/to/mount
          name: volume-name
  volumes:
    - name: volume-name
      gcePersistentDisk:
        pdName: disk-name
        fsType: fileSystemType

But this setup can’t be used for storing loklak dumps. Let’s see “why” in the next section.

Rolling Updates and Persistent Disk

The Kubernetes deployment needs to be updated when the master branch of loklak server is updated. This update of master deployment would create a new pod and try to start loklak server on it. During all this, the older deployment would also be running and serving the requests.


The control will not be transferred to the newer pod until it is ready and all the probes are passing. The newer deployment will now try to mount the disk which is mentioned in the configuration, but it would fail to do so. This would happen because the older pod has already mounted the disk.


Therefore, all new deployments would simply fail to start due to insufficient resources. To overcome such issues, Kubernetes allows persistent volume claims. Let’s see how we used them for loklak deployment.

Persistent Volume Claims

Kubernetes provides Persistent Volume Claims which claim resources (storage) from a Persistent Volume (just like a pod does from a node). The higher level APIs are provided by Kubernetes (configurations and kubectl command line). In the loklak deployment, the persistent volume is a Google Compute Engine disk –

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dump
  namespace: web
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  gcePersistentDisk:
    pdName: "data-dump-disk"
    fsType: "ext4"

[SOURCE]

It must be noted here that a persistent disk by the name of data-dump-disk is already created in the same zone.
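
Such a disk can be created with gcloud before applying this configuration. For example (the 100GB size mirrors the 100Gi capacity declared above, and the zone matches the storage class defined below):

gcloud compute disks create data-dump-disk --size=100GB --zone=us-central1-a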


The storage class defines the way in which the PV should be handled, along with the provisioner for the service –

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
  namespace: web
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

[SOURCE]

After having the StorageClass and PersistentVolume, we can create a claim for the volume by using appropriate configurations –

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
  namespace: web
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: slow

[SOURCE]

After this, we can mount this claim on our Deployment –

  ...
  volumeMounts:
    - name: dump
      mountPath: /loklak_server/data
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump

[SOURCE]

Verifying persistence of Dumps

To verify this, we can redeploy the cluster using the same persistent disk and check if the earlier dumps are still present there –

$ http http://link.to.deployment/dump/
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html;charset=utf-8
...


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<h1>Index of /dump</h1>
<pre>      Name 
[gz ] <a href="messages_20170802_71562040.txt.gz">messages_20170802_71562040.txt.gz</a>   Thu Aug 03 00:07:21 GMT 2017   132M
[gz ] <a href="messages_20170803_69925009.txt.gz">messages_20170803_69925009.txt.gz</a>   Mon Aug 07 15:40:04 GMT 2017   532M
[gz ] <a href="messages_20170807_36357603.txt.gz">messages_20170807_36357603.txt.gz</a>   Wed Aug 09 10:26:24 GMT 2017   377M
[txt] <a href="messages_20170809_27974404.txt">messages_20170809_27974404.txt</a>      Thu Aug 10 08:51:49 GMT 2017  1564M
<hr></pre>
...

Conclusion

In this blog post, I discussed the process of deployment of loklak with persistent dumps on Kubernetes. This deployment is intended to work as root.loklak.org in near future. The changes were proposed in loklak/loklak_server#1377 by @singhpratyush (me).

Using Mosquitto as a Message Broker for MQTT in loklak Server

In loklak server, messages are collected from various sources and indexed using Elasticsearch. To know when a message of interest arrives, users can poll the search endpoint. But this method would require a lot of HTTP requests, most of them being redundant. Also, if a user would like to collect messages for a particular topic, he would need to make a lot of requests over a period of time to get enough data.

For GSoC 2017, my proposal was to introduce stream API in the loklak server so that we could save ourselves from making too many requests and also add many use cases.

Mosquitto is an Eclipse project which acts as a message broker for the popular MQTT protocol. MQTT, based on the pub-sub model, is a lightweight and IoT-friendly protocol. In this blog post, I will discuss the basic setup of Mosquitto in the loklak server.

Installation and Dependency for Mosquitto

The installation process of Mosquitto is very simple. On Ubuntu, it is available from the default package repositories –

sudo apt-get install mosquitto
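
To quickly verify that the broker accepts connections, the Mosquitto command line clients can be used. A small check, assuming the mosquitto-clients package is also installed:

# Subscribe to a test channel on the local broker (keep this running in one terminal)
mosquitto_sub -h 127.0.0.1 -p 1883 -t loklak/test
# Publish a message to the same channel from another terminal
mosquitto_pub -h 127.0.0.1 -p 1883 -t loklak/test -m "hello"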

Once the message broker is up and running, clients can connect to it and publish/subscribe to channels. To add an MQTT client as a project dependency, we can introduce the following line in the Gradle dependencies file –

compile group: 'net.sf.xenqtt', name: 'xenqtt', version: '0.9.5'

[SOURCE]

After this, we can use the client libraries in the server code base.

The MQTTPublisher Class

The MQTTPublisher class in loklak would provide an interface to perform basic operations in MQTT. The implementation uses AsyncClientListener to connect to Mosquitto broker –

AsyncClientListener listener = new AsyncClientListener() {
    // Override methods according to needs
};

[SOURCE]

The publish method for the class can be used by other components of the project to publish messages on the desired channel –

public void publish(String channel, String message) {
    this.mqttClient.publish(new PublishMessage(channel, QoS.AT_LEAST_ONCE, message));
}

[SOURCE]

We also have methods which allow publishing of multiple messages to multiple channels in order to increase the functionality of the class.

Starting Publisher with Server

The flags which signal the use of the streaming service in loklak are located in conf/config.properties. These configurations are read while initializing the Data Access Object, and an MQTTPublisher is created if needed –

String mqttAddress = getConfig("stream.mqtt.address", "tcp://127.0.0.1:1883");
streamEnabled = getConfig("stream.enabled", false);
if (streamEnabled) {
    mqttPublisher = new MQTTPublisher(mqttAddress);
}

[SOURCE]

The mqttPublisher can now be used by other components of loklak to publish messages to the channel they want.

Adding Mosquitto to Kubernetes

Since loklak also has a nice Kubernetes setup, it was very simple to introduce a new deployment for Mosquitto.

Changes in Dockerfile

The Dockerfile for the master deployment has to be modified so that loklak can discover the Mosquitto broker in the Kubernetes cluster. For this purpose, the corresponding flags in config.properties have to be changed to ensure that things work fine –

sed -i.bak 's/^\(stream.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(stream.mqtt.address\).*/\1=mosquitto.mqtt:1883/' conf/config.properties && \

[SOURCE]

The Mosquitto broker would be available at mosquitto.mqtt:1883 because of the service that is created for it (explained in later section).

Mosquitto Deployment

The Docker image used in the Kubernetes deployment of Mosquitto is taken from toke/docker-kubernetes. Two ports are exposed to the cluster, but no volumes are needed –

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mosquitto
  namespace: mqtt
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: mosquitto
        image: toke/mosquitto
        ports:
        - containerPort: 9001
        - containerPort: 8883

[SOURCE]

Exposing Mosquitto to the Cluster

Now that we have the deployment running, we need to expose the required ports to the cluster so that other components may use it. The port 9001 appears as port 80 for the service and 1883 is also exposed –

apiVersion: v1
kind: Service
metadata:
  name: mosquitto
  namespace: mqtt
  ...
spec:
  ...
  ports:
  - name: mosquitto
    port: 1883
  - name: mosquitto-web
    port: 80
    targetPort: 9001

[SOURCE]

After creating the service using this configuration, we will be able to connect our clients to Mosquitto at address mosquitto.mqtt:1883.
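
Creating the namespace and the two objects follows the usual kubectl workflow. A sketch with hypothetical file names (the actual paths in the repository may differ):

kubectl create namespace mqtt
kubectl create -f mosquitto-deployment.yml
kubectl create -f mosquitto-service.yml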

Conclusion

In this blog post, I discussed the process of adding Mosquitto to the loklak server project. This is the first step towards introducing the stream API for messages collected in loklak.

These changes were introduced in pull requests loklak/loklak_server#1393 and loklak/loklak_server#1398 by @singhpratyush (me).

Deploying loklak Server on Kubernetes with External Elasticsearch

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

kubernetes.io

Kubernetes is an awesome cloud platform, which ensures that cloud applications run reliably. It runs automated tests, flawless updates, smart roll out and rollbacks, simple scaling and a lot more.

So as a part of GSoC, I worked on taking the loklak server to Kubernetes on Google Cloud Platform. In this blog post, I will be discussing the approach followed to deploy development branch of loklak on Kubernetes.

New Docker Image

Since Kubernetes deployments work on Docker images, we needed one for the loklak project. The existing image would not be up to the mark for Kubernetes, as it contained the declaration of volumes and the exposing of ports. So I wrote a new Dockerfile for an image that could be used in Kubernetes.

The image simply clones the loklak server repository, builds the project and starts the server via CMD –

FROM alpine:latest

ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

WORKDIR /loklak_server

RUN apk update && apk add openjdk8 git bash && \
    git clone https://github.com/loklak/loklak_server.git /loklak_server && \
    git checkout development && \
    ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport && \
    # Some Configurations and Cleanups

CMD ["bin/start.sh", "-Idn"]

[SOURCE]

This image wouldn’t have any volumes or exposed ports and we are now free to configure them in the configuration files (discussed in a later section).

Building and Pushing Docker Image using Travis

To automatically build and push the image on a commit, the Travis build is used. In the after_success section, a call to the script that pushes the Docker image is made.

Travis environment variables hold the username and password for Docker hub and are used for logging in –

docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD

[SOURCE]

We needed checks there to ensure that we are on the right branch for the push and we are not handling a pull request –

# Build and push Kubernetes Docker image
KUBERNETES_BRANCH=loklak/loklak_server:latest-kubernetes-$TRAVIS_BRANCH
KUBERNETES_COMMIT=loklak/loklak_server:kubernetes-$TRAVIS_COMMIT
  
if [ "$TRAVIS_BRANCH" == "development" ]; then
    docker build -t loklak_server_kubernetes kubernetes/images/development
    docker tag loklak_server_kubernetes $KUBERNETES_BRANCH
    docker push $KUBERNETES_BRANCH
    docker tag $KUBERNETES_BRANCH $KUBERNETES_COMMIT
    docker push $KUBERNETES_COMMIT
elif [ "$TRAVIS_BRANCH" == "master" ]; then
    # Build and push master
else
    echo "Skipping Kubernetes image push for branch $TRAVIS_BRANCH"
fi

[SOURCE]

Kubernetes Configurations for loklak

A Kubernetes cluster can be completely configured using configuration files written in YAML. The deployment of loklak uses the previously built image. Initially, the image tagged as latest-kubernetes-development is used –

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: server
  namespace: web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: server
        image: loklak/loklak_server:latest-kubernetes-development
        ...

[SOURCE]

Readiness and Liveness Probes

Probes act as the top-level test for the health of a deployment in Kubernetes. They are performed periodically to ensure that things are working fine, and appropriate steps are taken if they fail.

When an image is updated, the older pod still runs and serves the requests. It is replaced by the new one only when the probes on the new pod are successful; otherwise, the update is rolled back.

In loklak, the /api/status.json endpoint gives information about status of deployment and hence is a good target for probes –

livenessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3
readinessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3

[SOURCE]

These probes are performed periodically and the server is restarted if they fail (non-success HTTP status code or takes more than 3 seconds).

Ports and Volumes

In the configurations, port 80 is exposed as this is where Jetty serves inside loklak –

ports:
- containerPort: 80
  protocol: TCP

[SOURCE]

Notice that this is the same port we used for running the probes. Since the development branch deployment holds no dumps, we didn’t need to specify any explicit volumes for persistence.

Load Balancer Service

While creating the service, a new public IP is assigned to the deployment using Google Cloud Platform’s load balancer, which starts listening on port 80 –

type: LoadBalancer
ports:
- port: 80
  protocol: TCP
  targetPort: 80

[SOURCE]

Since this service creates a new public IP, it is recommended not to replace or recreate this service, as doing so would result in the creation of a new public IP. Other components can be updated individually.

Kubernetes Configurations for Elasticsearch

To maintain a persistent index, this deployment would require an external Elasticsearch cluster. loklak is able to connect itself to external Elasticsearch cluster by changing a few configurations.

Docker Image and Environment Variables

The image used for Elasticsearch is taken from pires/docker-elasticsearch-kubernetes. It allows easy configuration of properties from environment variables in configurations. Here is a list of configurable variables, but we needed just a few of them to do our task –

image: quay.io/pires/docker-elasticsearch-kubernetes:2.0.0
env:
- name: KUBERNETES_CA_CERTIFICATE_FILE
  value: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- name: NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: "CLUSTER_NAME"
  value: "loklakcluster"
- name: "DISCOVERY_SERVICE"
  value: "elasticsearch"
- name: NODE_MASTER
  value: "true"
- name: NODE_DATA
  value: "true"
- name: HTTP_ENABLE
  value: "true"

[SOURCE]

Persistent Index using Persistent Cloud Disk

To make the index last even after the deployment is stopped, we needed a stable place where we could store all that data. Here, Google Compute Engine’s standard persistent disk was used. The disk can be created using the GCP web portal or the gcloud CLI.
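
For example, a disk matching the pdName used below could be created like this (the size is only an assumption, pick one that fits your index):

gcloud compute disks create data-index-disk --size=100GB --zone=<same zone as the cluster>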

Before attaching the disk, we need to declare a volume where we could mount it –

volumeMounts:
- mountPath: /data
  name: storage

[SOURCE]

Now that we have a volume, we can simply mount the persistent disk on it –

volumes:
- name: storage
  gcePersistentDisk:
    pdName: data-index-disk
    fsType: ext4

[SOURCE]

Now, whenever we deploy these configurations, we can reuse the previous index.

Exposing Elasticsearch to the Cluster

The HTTP and transport interfaces are enabled on ports 9200 and 9300 respectively. They can be exposed to the rest of the cluster using the following service –

apiVersion: v1
kind: Service
...
spec:
  ...
  ports:
  - name: http
    port: 9200
    protocol: TCP
  - name: transport
    port: 9300
    protocol: TCP

[SOURCE]

Once deployed, other deployments can access the cluster API from ports 9200 and 9300.

Connecting loklak to Kubernetes

To connect loklak to the external Elasticsearch cluster, the TransportClient Java API is used. In order to enable these settings, we simply need to make some changes in the configuration.

Since we enable the service named “elasticsearch” in namespace “elasticsearch”, we can access the cluster at address elasticsearch.elasticsearch:9200 (web) and elasticsearch.elasticsearch:9300 (transport).
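
From a pod inside the cluster, this can be verified with a plain HTTP request. A quick check (the address assumes the service and namespace described above):

curl "http://elasticsearch.elasticsearch:9200/_cluster/health?pretty"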

To confine these changes to the Kubernetes deployment only, we can use the sed command while building the image (in the Dockerfile) –

sed -i.bak 's/^\(elasticsearch_transport.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(elasticsearch_transport.addresses\).*/\1=elasticsearch.elasticsearch:9300/' conf/config.properties && \

[SOURCE]

Now when we create the deployments in the Kubernetes cluster, loklak automatically connects to the external Elasticsearch cluster and creates indices if needed.

Verifying persistence of the Elasticsearch Index

In order to see that the data persists, we can completely delete the deployment or even the cluster if we want. Later, when we recreate the deployment, we can see all the messages already present in the index.

I  [2017-07-29 09:42:51,804][INFO ][node                     ] [Hellion] initializing ...
 
I  [2017-07-29 09:42:52,024][INFO ][plugins                  ] [Hellion] loaded [cloud-kubernetes], sites []
 
I  [2017-07-29 09:42:52,055][INFO ][env                      ] [Hellion] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [84.9gb], net total_space [97.9gb], spins? [possibly], types [ext4]
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] initialized
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] starting ...
 
I  [2017-07-29 09:42:53,620][INFO ][transport                ] [Hellion] publish_address {10.8.1.13:9300}, bound_addresses {10.8.1.13:9300}
 
I  [2017-07-29 09:42:53,633][INFO ][discovery                ] [Hellion] loklakcluster/cJtXERHETKutq7nujluJvA
 
I  [2017-07-29 09:42:57,866][INFO ][cluster.service          ] [Hellion] new_master {Hellion}{cJtXERHETKutq7nujluJvA}{10.8.1.13}{10.8.1.13:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
 
I  [2017-07-29 09:42:57,955][INFO ][http                     ] [Hellion] publish_address {10.8.1.13:9200}, bound_addresses {10.8.1.13:9200}
 
I  [2017-07-29 09:42:57,955][INFO ][node                     ] [Hellion] started
 
I  [2017-07-29 09:42:58,082][INFO ][gateway                  ] [Hellion] recovered [8] indices into cluster_state

In the last line from the logs, we can see that indices already present on the disk were recovered. Now if we head to the public IP assigned to the cluster, we can see that the message count is restored.

Conclusion

In this blog post, I discussed how we utilised the Kubernetes setup to shift loklak to Google Cloud Platform. The deployment is active and can be accessed from the link provided under wiki section of loklak/loklak_server repo.

I introduced these changes in pull request loklak/loklak_server#1349 with the help of @niranjan94, @uday96 and @chiragw15.

Auto Deployment of SUSI Server using Kubernetes on Google Cloud Platform

Recently, we set up auto deployment of the SUSI Server on Google Cloud Platform using Kubernetes and Docker images after each commit to the GitHub repo, with the help of Travis continuous integration. Basically, whenever a new commit is added to the repo, the Travis build builds the Docker image of the server and then uses it to deploy the server on Google Cloud Platform. We use Kubernetes for the deployment since it makes it easy to scale up the project when traffic on the server increases, and Docker because with it we can easily build images which can then be used to update the deployment. The schematic below should make the procedure clearer.

Prerequisites

  1. You must be signed in to your Google Cloud Console and have enabled billing and must have credits left in your account.
  2. You must have a docker account and a repo in it. If you don’t have one, make it now.
  3. You should have enabled Travis on your repo and have a .travis.yml file in it.
  4. You must already have a project in the Google Cloud Console. Make a new one if you don’t have one.

Pre Deployment Steps

You will need to do some work on the Google Cloud Platform before actually starting the auto deployment process:

  1. Creating a new Cluster.
  2. Adding and formatting a Persistent Disk
  3. Adding a Persistent Volume Claim (PVC)
  4. Labeling a node as primary.

Check out this documentation on how to do that. It may help.
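
A rough sketch of some of these steps on the command line (the cluster name, disk name, size and node label are placeholders, adjust them to your project):

gcloud container clusters create susi-cluster --num-nodes=3
gcloud compute disks create susi-server-disk --size=100GB --zone=us-central1-a
kubectl label nodes <node-name> role=primary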

Implementation

Img src: https://cloud.google.com/solutions/continuous-delivery-with-travis-ci

1. The first step is simply to add these lines to the .travis.yml file and create the empty deploy.sh file mentioned below.

after_success:
- bash kubernetes/travis/deploy.sh

Now we’ll be moving line by line and adding commands in the empty deploy.sh file that we created in the previous step.

2. The next step is to remove obsolete Google Cloud files and install the Google Cloud SDK and the kubectl command. Use the following lines to do that:

echo ">>> Removing obsolete gcoud files"
sudo rm -f /usr/bin/git-credential-gcloud.sh
sudo rm -f /usr/bin/bq
sudo rm -f /usr/bin/gsutil
sudo rm -f /usr/bin/gcloud

echo ">>> Installing new files"
curl https://sdk.cloud.google.com | bash;
source ~/.bashrc
gcloud components install kubectl

3. In this step, you will need to download a JSON file which contains your Google Cloud credentials, copy that file to your repo and encrypt it using the Travis encryption keys. Follow this video, https://youtu.be/7U4jjRw_AJk, to see how to do that.
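
The encryption itself is usually done with the Travis command line client. A minimal sketch, run from the repository root after travis login (the target path follows the script in step 4):

# Produces Credentials.json.enc and prints the matching openssl decrypt command
travis encrypt-file Credentials.json
mv Credentials.json.enc kubernetes/travis/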

4. Now that you have added your encrypted Credentials.json file to your repo, you need to use those credentials to log in to your Google Cloud account. Use the lines below to do that:

echo ">>> Decrypting credentials and authenticating gcloud account"
# Decrypt the credentials we added to the repo using the key we added with the Travis command line tool
openssl aes-256-cbc -K $encrypted_YOUR_key -iv $encrypted_YOUR_iv -in ./kubernetes/travis/Credentials.json.enc -out Credentials.json -d
gcloud auth activate-service-account --key-file Credentials.json
export GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/Credentials.json
#add gcoud project id
gcloud config set project YOUR_PROJECT_ID
gcloud container clusters get-credentials YOUR_CONTAINER

The above lines first decrypt your credentials, then log in to your account and select the project you created earlier.

5. Now that we have logged in to Google Cloud, we need to build the Docker image from a Dockerfile. Follow the official Docker docs to see how to write a Dockerfile. Here is an example of a Dockerfile. You will need to add “$DOCKER_USERNAME” and “$DOCKER_PASSWORD” as environment variables in the Travis settings of your repo.

echo ">>> Building Docker image"
cd kubernetes/images

docker build --no-cache -t YOUR_DOCKER_USERNAME/YOUR_DOCKER_REPO:$TRAVIS_COMMIT .
docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD"
docker tag YOUR_DOCKER_USERNAME/YOUR_DOCKER_REPO:$TRAVIS_COMMIT YOUR_DOCKER_USERNAME/YOUR_DOCKER_REPO:latest

6. Now, just push the Docker image created in the previous step and update the deployment:

echo ">>> Pushing docker image"
docker push YOUR_DOCKER_USERNAME/YOUR_DOCKER_REPO

echo ">>> Updating deployment"
kubectl set image deployment/YOUR_CONTAINER_NAME --namespace=default YOUR_CONTAINER_NAME=YOUR_DOCKER_USERNAME/YOUR_DOCKER_REPO:$TRAVIS_COMMIT

Summary

This blog was about how we configured the Travis build and auto deployed the SUSI Server on Google Cloud Platform using Kubernetes and Docker. You can do the same with your server too, or if you are looking to contribute to SUSI Server, this may help you understand the deployment code in the repo.

Resources

  1. The documentation for setting up your project on the Google Cloud Console before starting auto deployment https://github.com/fossasia/susi_server/blob/afb00cd9c421876f5d640ce87941e502aa52e004/docs/installation/installation_kubernetes_gcloud.md
  2. The documentation for encrypting your google cloud credentials and adding them to your repo https://cloud.google.com/solutions/continuous-delivery-with-travis-ci
  3. Docs for Docker to get you started with Docker https://docs.docker.com/
  4. Travis Documentation on how to secure your credentials https://docs.travis-ci.com/user/encryption-keys/
  5. Travis Documentation on how to add environment variables in your repo settings https://docs.travis-ci.com/user/environment-variables/

Utilizing Readiness Probes for loklak Dependencies in Kubernetes

When we use an application and fail to connect to it, we do not give up; we retry connecting to it again and again. But in reality we often face obstacles like applications that break instantly, or an app that gets upset and refuses to continue working when it connects to an API or database that is not ready yet. Something similar was happening with api.loklak.org.
In such cases, we can’t really rewrite the whole application every time the problem occurs, so we need to define dependencies of some kind that can handle the situation rather than disappointing the users of the loklak app.

Solution:

We simply wait until a dependent API or backend of loklak is ready and only then start the loklak app. To do this, we used Kubernetes health checks.

Kubernetes health checks are divided into liveness and readiness probes.
The purpose of a liveness probe is to indicate that your application is running.
Readiness probes are meant to check whether your application is ready to serve traffic.
The right combination of liveness and readiness probes used with Kubernetes deployments can:
  1. Enable zero-downtime deploys
  2. Prevent deployment of broken images
  3. Ensure that failed containers are automatically restarted

Pod Ready to be Used?

A pod with a defined readiness probe won’t receive any traffic until the defined request can be successfully fulfilled. These health checks are defined through Kubernetes, so no changes need to be made in our services (APIs). We just need to set up a readiness probe for the APIs that the loklak server depends on. Here you can see the relevant part of the container spec you need to add (in this example we want to know when loklak is ready):

        readinessProbe:
          httpGet:
            path: /api/status.json
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 3

 

Readiness Probes

Updating a loklak deployment without readiness probes when something is pushed to development can result in downtime, as old pods are replaced by new pods. If the new pods are misconfigured or somehow broken, that downtime lasts until you detect the problem and roll back.
With readiness probes, Kubernetes will not send traffic to a pod until its readiness probe is successful. When updating a loklak deployment, it will also leave the old pods running until the probes have succeeded on the new copy. That means that if the new loklak server pods are broken in some way, they will never see traffic; instead, the old pods will continue to serve all traffic for the deployment.
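
The rollout of a new image can also be followed from the command line. A sketch using the deployment name from the earlier posts:

# Watch the rollout; it only completes once the readiness probes on the new pods succeed
kubectl rollout status deployment/server --namespace=web
# If the new pods never become ready, the update can be rolled back
kubectl rollout undo deployment/server --namespace=web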

Conclusion

Readiness probes are a simple solution to ensure that pods with dependencies do not start receiving traffic before their dependencies are ready (in this case, for the loklak server). This also works with more than one dependency.
