What is Open Source and why you should do it?

Since Codeheat is going on and Google Code-in has started, I would like to share some knowledge with the new contributors with the help of this blog.

What is an Open Source software?

When googled, you will see:

“Open-source software is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose.”

To put it in layman terms, “A software whose source code is made available to everyone to let them change/improve provided that the contributor who changes the code cannot claim the software to be his own.”

Thus, you don’t own the software thoroughly. All you can do is change the code of the software to make it better. Now, you may be thinking what’s there in for you? There are all pros according to me and I have explained them in the latter half of this article.

Why am I writing this?

I was just in the freshman’s year of my college when I came to know about the web and how it works. I started my journey as a developer, building things, started doing some projects and keeping it with myself. Those days,  exploring more, I first came to know about the Open Source software.

Curiously, wanting to know more about the same, I got to know that anyone can make his/her software Open so as to make it available to others for use and development. Thus, learning more about the same led me to explore other’s projects on GitHub and I went through the codebases of the softwares and started contributing. I remember my first contribution was to correct a “typo” i.e correcting a spelling mistake in the README of the project. That said, I went on exploring more and more and got my hands on Open Source which made me share some of my thoughts with you.

What’s there in for you doing Open Source Contribution?

1) Teaches you how to structure code:

Now a days, nearly many of the software projects are Open Sourced and the community of developer works on the projects to constantly improve them. Thus, big projects have big codebases too which are really hard to understand at first but after giving some time to understand and contribute, you will be fine with those. The thing with such projects is they have a structured code, by “structured”, I mean to say there are strict guidelines for the project i.e they have good tests written which make you write the code as they want, i.e clean and readable. Thus, by writing such code, you will learn how to structure it which ultimately is a great habit that every developer should practice.

2) Team Work:

Creating and maintaining a large project requires team work. When you contribute to a project, you have to work in a team where you have to take others opinions, give your opinions, ask teammates for improvisations or ask anything whichever you are stuck with. Thus, working in team increases productivity, community interaction, your own network, etc.

3) Improves the developer you:

Okay, so I think, one of the most important part of your developer journey is and should be “LEARNING ALWAYS”. Thus, when you contribute, your code is reviewed by others (experts or maintainers of project) who eventually point out the mistakes or the improvisations to be done in the code so that the code can be written much cleaner than you had written. Also, you start to think a problem widely. While solving the problem, you ensure that the code you have written makes the app scalable for a large number of users, also prolonging the life of code.

4) Increases your Network:

One advantage of Open Source contribution is that it also increases your network in the developer community. Thus, you get to know about the things that you have never heard of, you get to explore them, you get to meet people, you get to know what is going in what parts of the world, etc. Having connections with other developers sitting in different countries is always a bonus.

5) Earn some bucks too:

At the end of the day, money matters. Earlier days, people used to think that contributing to Open Source projects won’t earn you money, etc. But if you are a maintainer or a continuous contributor of a great project, you get donations to get continuing the project and making it available to people.

For students in college, doing Open Source is a bonus. There are programmes like:

These programmes offer high incentives and stipends to the fellow students. FOSSASIA participates in GSoC so you can go ahead and try getting in GSoC under FOSSASIA.

6) Plus point for job seekers:

When it comes to applying for job, if you have a good Open Source profile, the recruiter finds a reason to take you out and offer you an interview since you already know how to “manage a project”, “work in team”, “get work done”, “solve a problem efficiently”, etc. Now a days, many companies mention on their job application page as “Open Source would be a bonus”.

7) Where can you start:

We have many projects at FOSSASIA to start with. There are no restrictions on the language since we have projects available for most of the languages.

Currently, we are having a couple of programs open at FOSSASIA. They are:

Feel free to check out the programs and the projects under FOSSASIA at https://github.com/fossasia.

Conclusion

So, yeah. This was it. Hope you understood what Open Source is and how would it benefit you. Keep contributing to FOSSASIA and you will see the effects in no time.

Continue ReadingWhat is Open Source and why you should do it?

Open Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

The Elasticsearch instance in the current Open Event Server deployment is currently just used to store the events and search through it due to limited resources.

The project uses a PostgreSQL database, this blog will focus on setting up a job to create the events index if it does not exist. If the indices exists, the job will delete all the previous the data and rebuild the events index.

Although the project uses Flask framework, the job will be in pure python so that it can run in background properly while the application continues its work. Celery is used for queueing up the aforementioned jobs. For building the job the first step would be to connect to our database:

from config import Config
import psycopg2
conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
cur = conn.cursor()

 

The next step would be to fetch all the events from the database. We will only be indexing certain attributes of the event which will be useful in search. Rest of them are not stored in the index. The code given below will fetch us a collection of tuples containing the attributes mentioned in the code:

cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()

 

We will be using the the bulk API, which is significantly fast as compared to adding an event one by one via the API. Elasticsearch-py, the official python client for elasticsearch provides the necessary functionality to work with the bulk API of elasticsearch. The helpers present in the client enable us to use generator expressions to insert the data via the bulk API. The generator expression for events will be as follows:

event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)

 

We will now delete the events index if it exists. The the event index will be recreated. The generator expression obtained above will be passed to the bulk API helper and the event index will repopulated. The complete code for the function will now be as follows:

 

@celery.task(name='rebuild.events.elasticsearch')
def cron_rebuild_events_elasticsearch():
   """
   Re-inserts all eligible events into elasticsearch
   :return:
   """
   conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
   cur = conn.cursor()
   cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()
   event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)
   es_store.indices.delete('events')
   es_store.indices.create('events')
   abc = helpers.bulk(es_store, event_data)

 

Currently we run this job on each week and also on each new deployment. Rebuilding the index is very important as some records may not be indexed when the continuous sync is taking place.

To know more about it please visit https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/

Related links:

Continue ReadingOpen Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

Open Event API Server: Implementing FAQ Types

In the Open Event Server, there was a long standing request of the users to enable the event organisers to create a FAQ section.

The API of the FAQ section was implemented subsequently. The FAQ API allowed the user to specify the following request schema

{
 "data": {
   "type": "faq",
   "relationships": {
     "event": {
       "data": {
         "type": "event",
         "id": "1"
       }
     }
   },
   "attributes": {
     "question": "Sample Question",
     "answer": "Sample Answer"
   }
 }
}

 

But, what if the user wanted to group certain questions under a specific category. There was no solution in the FAQ API for that. So a new API, FAQ-Types was created.

Why make a separate API for it?

Another question that arose while designing the FAQ-Types API was whether it was necessary to add a separate API for it or not. Consider that a type attribute was simply added to the FAQ API itself. It would mean the client would have to specify the type of the FAQ record every time a new record is being created for the same. This would mean trusting that the user will always enter the same spelling for questions falling under the same type. The user cannot be trusted on this front. Thus the separate API made sure that the types remain controlled and multiple entries for the same type are not there.

Helps in handling large number of records:

Another concern was what if there were a large number of FAQ records under the same FAQ-Type. Entering the type for each of those questions would be cumbersome for the user. The FAQ-Type would also overcome this problem

Following is the request schema for the FAQ-Types API

{
 "data": {
   "attributes": {
     "name": "abc"
   },
   "type": "faq-type",
   "relationships": {
     "event": {
       "data": {
         "id": "1",
         "type": "event"
       }
     }
   }
 }
}

 

Additionally:

  • FAQ to FAQ-type is a many to one relation.
  • A single FAQ can only belong to one Type
  • The FAQ-type relationship will be optional, if the user wants different sections, he/she can add it ,if not, it’s the user’s choice.

Related links

Continue ReadingOpen Event API Server: Implementing FAQ Types

Discount Codes in Open Event Server

The Open Event System allows usage of discount codes with tickets and events. This blogpost describes what types of discount codes are present and what endpoints can be used to fetch and update details.

In Open Event API Server, each event can have two types of discount codes. One is ‘event’ discount code, while the other is ‘ticket’ discount code. As the name suggests, the event discount code is an event level discount code and the ticket discount code is ticket level.

Now each event can have only one ‘event’ discount code and is accessible only to the server admin. The Open Event server admin can create, view and update the ‘event’ discount code for an event. The event discount code followsDiscountCodeEvent Schema. This schema is inherited from the parent class DiscountCodeSchemaPublic. To save the unique discount code associated with an event, the event model’s discount_code_id field is used.

The ‘ticket’ discount is accessible by the event organizer and co-organizer. Each event can have any number of ‘ticket’ discount codes. This follows the DiscountCodeTicket schema, which is also inherited from the same base class ofDiscountCodeSchemaPublic. The use of the schema is decided based on the value of the field ‘used_for’ which can have the value either ‘events’ or ‘tickets’. Both the schemas have different relationships with events and marketer respectively.

We have the following endpoints for Discount Code events and tickets:
‘/events/<int:event_id>/discount-code’
‘/events/<int:event_id>/discount-codes’

The first endpoint is based on the DiscountCodeDetail class. It returns the detail of one discount code which in this case is the event discount code associated with the event.

The second endpoint is based on the DiscountCodeList class which returns a list of discount codes associated with an event. Note that this list also includes the ‘event’ discount code, apart from all the ticket discount codes.

class DiscountCodeFactory(factory.alchemy.SQLAlchemyModelFactory):
   class Meta:
       model = DiscountCode
       sqlalchemy_session = db.session
event_id = None
user = factory.RelatedFactory(UserFactory)
user_id = 1


Since each discount code belongs to an event(either directly or through the ticket), the factory for this has event as related factory, but to check for 
/events/<int:event_id>/discount-code endpoint we first need the event and then pass the discount code id to be 1 for dredd to check this. Hence, event is not included as a related factory, but added as a different object every time a discount code object is to be used.

@hooks.before("Discount Codes > Get Discount Code Detail of an Event > Get Discount Code Detail of an Event")
def event_discount_code_get_detail(transaction):
   """
   GET /events/1/discount-code
   :param transaction:
   :return:
   """
   with stash['app'].app_context():
       discount_code = DiscountCodeFactory()
       db.session.add(discount_code)
       db.session.commit()
       event = EventFactoryBasic(discount_code_id=1)
       db.session.add(event)
       db.session.commit()


The other tests and extended documentation can be found 
here.

References:

Continue ReadingDiscount Codes in Open Event Server

Open Event Server: Getting The Identity From The Expired JWT Token In Flask-JWT

The Open Event Server uses JWT based authentication, where JWT stands for JSON Web Token. JSON Web Tokens are an open industry standard RFC 7519 method for representing claims securely between two parties. [source: https://jwt.io/]

Flask-JWT is being used for the JWT-based authentication in the project. Flask-JWT makes it easy to use JWT based authentication in flask, while on its core it still used PyJWT.

To get the identity when a JWT token is present in the request’s Authentication header , the current_identity proxy of Flask-JWT can be used as follows:

@app.route('/example')
@jwt_required()
def example():
   return '%s' % current_identity

 

Note that it will only be set in the context of function decorated by jwt_required(). The problem with the current_identity proxy when using jwt_required is that the token has to be active, the identity of an expired token cannot be fetched by this function.

So why not write a function on our own to do the same. A JWT token is divided into three segments. JSON Web Tokens consist of three parts separated by dots (.), which are:

  • Header
  • Payload
  • Signature

The first step would be to get the payload, that can be done as follows:

token_second_segment = _default_request_handler().split('.')[1]

 

The payload obtained above would still be in form of JSON, it can be converted into a dict as follows:

payload = json.loads(token_second_segment.decode('base64'))

 

The identity can now be found in the payload as payload[‘identity’]. We can get the actual user from the paylaod as follows:

def jwt_identity(payload):
   """
   Jwt helper function
   :param payload:
   :return:
   """
   return User.query.get(payload['identity'])

 

Our final function will now be something like:

def get_identity():
   """
   To be used only if identity for expired tokens is required, otherwise use current_identity from flask_jwt
   :return:
   """
   token_second_segment = _default_request_handler().split('.')[1]
   missing_padding = len(token_second_segment) % 4
   payload = json.loads(token_second_segment.decode('base64'))
   user = jwt_identity(payload)
   return user

 

But after using this function for sometime, you will notice that for certain tokens, the system will raise an error saying that the JWT token is missing padding. The JWT payload is base64 encoded, and it requires the payload string to be a multiple of four. If the string is not a multiple of four, the remaining spaces can pe padded with extra =(equal to) signs. And since Python 2.7’s .decode doesn’t do that by default, we can accomplish that as follows:

missing_padding = len(token_second_segment) % 4

# ensures the string is correctly padded to be a multiple of 4
if missing_padding != 0:
   token_second_segment += b'=' * (4 - missing_padding)

 

Related links:

Continue ReadingOpen Event Server: Getting The Identity From The Expired JWT Token In Flask-JWT

Introducing Stream Servlet in loklak Server

A major part of my GSoC proposal was adding stream API to loklak server. In a previous blog post, I discussed the addition of Mosquitto as a message broker for MQTT streaming. After testing this service for a few days and some minor improvements, I was in a position to expose the stream to outside users using a simple API.

In this blog post, I will be discussing the addition of /api/stream.json endpoint to loklak server.

HTTP Server-Sent Events

Server-sent events (SSE) is a technology where a browser receives automatic updates from a server via HTTP connection. The Server-Sent Events EventSource API is standardized as part of HTML5 by the W3C.

Wikipedia

This API is supported by all major browsers except Microsoft Edge. For loklak, the plan was to use this event system to send messages, as they arrive, to the connected users. Apart from browser support, EventSource API can also be used with many other technologies too.

Jetty Eventsource Plugin

For Java, we can use Jetty’s EventSource plugin to send events to clients. It is similar to other Jetty servlets when it comes to processing the arguments, handling requests, etc. But it provides a simple interface to send events as they occur to connected users.

Adding Dependency

To use this plugin, we can add the following line to Gradle dependencies –

compile group: 'org.eclipse.jetty', name: 'jetty-eventsource-servlet', version: '1.0.0'

[SOURCE]

The Event Source

An EventSource is the object which is required for EventSourceServlet to send events. All the logics for emitting events needs to be defined in the related class. To link a servlet with an EventSource, we need to override the newEventSource method –

public class StreamServlet extends EventSourceServlet {
    @Override
    protected EventSource newEventSource(HttpServletRequest request) {
        String channel = request.getParameter("channel");
        if (channel == null) {
            return null;
        }
        if (channel.isEmpty()) {
            return null;
        }
        return new MqttEventSource(channel);
    }
}

[SOURCE]

If no channel is provided, the EventSource object will be null and the request will be rejected. Here, the MqttEventSource would be used to handle the stream of Tweets as they arrive from the Mosquitto message broker.

Cross Site Requests

Since the requests to this endpoint can’t be of JSONP type, it is necessary to allow cross site requests on this endpoint. This can be done by overriding the doGet method of the servlet –

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
     response.setHeader("Access-Control-Allow-Origin", "*");
    super.doGet(request, response);
}

[SOURCE]

Adding MQTT Subscriber

When a request for events arrives, the constructor to MqttEventSource is called. At this stage, we need to connect to the stream from Mosquitto for the channel. To achieve this, we can set the class as MqttCallback using appropriate client configurations –

public class MqttEventSource implements MqttCallback {
    ...
    MqttEventSource(String channel) {
        this.channel = channel;
    }
    ...
    this.mqttClient = new MqttClient(address, "loklak_server_subscriber");
    this.mqttClient.connect();
    this.mqttClient.setCallback(this);
    this.mqttClient.subscribe(this.channel);
    ...
}

[SOURCE]

By setting the callback to this, we can override the messageArrived method to handle the arrival of a new message on the channel. Just to mention, the client library used here is Eclipse Paho.

Connecting MQTT Stream to SSE Stream

Now that we have subscribed to the channel we wish to send events from, we can use the Emitter to send events from our EventSource by implementing it –

public class MqttEventSource implements EventSource, MqttCallback {
    private Emitter emitter;


    @Override
    public void onOpen(Emitter emitter) throws IOException {
        this.emitter = emitter;
        ...
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) throws Exception {
        this.emitter.data(message.toString());
    }
}

[SOURCE]

Closing Stream on Disconnecting from User

When a client disconnects from the stream, it doesn’t makes sense to stay connected to the server. We can use the onClose method to disconnect the subscriber from the MQTT broker –

@Override
public void onClose() {
    try {
        this.mqttClient.close();
        this.mqttClient.disconnect();
    } catch (MqttException e) {
        // Log some warning 
    }
}

[SOURCE]

Conclusion

In this blog post, I discussed connecting the MQTT stream to SSE stream using Jetty’s EventSource plugin. Once in place, this event system would save us from making too many requests to collect and visualize data. The possibilities of applications of such feature are huge.

This feature can be seen in action at the World Mood Tracker app.

The changes were introduced in pull request loklak/loklak_server#1474 by @singhpratyush (me).

Resources

Continue ReadingIntroducing Stream Servlet in loklak Server

Optimising Docker Images for loklak Server

The loklak server is in a process of moving to Kubernetes. In order to do so, we needed to have different Docker images that suit these deployments. In this blog post, I will be discussing the process through which I optimised the size of Docker image for these deployments.

Initial Image

The image that I started with used Ubuntu as base. It installed all the components needed and then modified the configurations as required –

FROM ubuntu:latest

# Env Vars
ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
ENV DEBIAN_FRONTEND noninteractive

WORKDIR /loklak_server

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y git openjdk-8-jdk
RUN git clone https://github.com/loklak/loklak_server.git /loklak_server
RUN git checkout development
RUN ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport
RUN sed -i.bak 's/^\(port.http=\).*/\180/' conf/config.properties
... # More configurations
RUN echo "while true; do sleep 10;done" >> bin/start.sh

# Start
CMD ["bin/start.sh", "-Idn"]

The size of images built using this Dockerfile was quite huge –

REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE

loklak_server       latest              a92f506b360d        About a minute ago   1.114 GB

ubuntu              latest              ccc7a11d65b1        3 days ago           120.1 MB

But since this size is not acceptable, we needed to reduce it.

Moving to Apline

Alpine Linux is an extremely lightweight Linux distro, built mainly for the container environment. Its size is so tiny that it hardly puts any impact on the overall size of images. So, I replaced Ubuntu with Alpine –

FROM alpine:latest

...
RUN apk update
RUN apk add git openjdk8 bash
...

And now we had much smaller images –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      668.8 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

As we can see that due to no caching and small size of Alpine, the image size is reduced to almost half the original.

Reducing Content Size

There are many things in a project which are no longer needed while running the project, like the .git folder (which is huge in case of loklak) –

$ du -sh loklak_server/.git
236M loklak_server/.git

We can remove such files from the Docker image and save a lot of space –

rm -rf .[^.] .??*

Optimizing Number of Layers

The number of layers also affect the size of the image. More the number of layers, more will be the size of image. In the Dockerfile, we can club together the RUN commands for lower number of images.

RUN apk update && apk add openjdk8 git bash && \
  git clone https://github.com/loklak/loklak_server.git /loklak_server && \
  ...

After this, the effective size is again reduced by a major factor –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      422.3 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

Conclusion

In this blog post, I discussed the process of optimising the size of Dockerfile for Kubernetes deployments of loklak server. The size was reduced to 426 MB from 1.234 GB and this provided much faster push/pull time for Docker images, and therefore, faster updates for Kubernetes deployments.

Resources

Continue ReadingOptimising Docker Images for loklak Server

Persistently Storing loklak Server Dumps on Kubernetes

In an earlier blog post, I discussed loklak setup on Kubernetes. The deployment mentioned in the post was to test the development branch. Next, we needed to have a deployment where all the messages are collected and dumped in text files that can be reused.

In this blog post, I will be discussing the challenges with such deployment and the approach to tackle them.

Volatile Disk in Kubernetes

The pods that hold deployments in Kubernetes have disk storage. Any data that gets written by the application stays only until the same version of deployment is running. As soon as the deployment is updated/relocated, the data stored during the application is cleaned up.


Due to this, dumps are written when loklak is running but they get wiped out when the deployment image is updated. In other words, all dumps are lost when the image updates. We needed to find a solution to this as we needed a permanent storage when collecting dumps.

Persistent Disk

In order to have a storage which can hold data permanently, we can mount persistent disk(s) on a pod at the appropriate location. This ensures that the data that is important to us stays with us, even
when the deployment goes down.


In order to add persistent disks, we first need to create a persistent disk. On Google Cloud Platform, we can use the gcloud CLI to create disks in a given region –

gcloud compute disks create --size=<required size> --zone=<same as cluster zone> <unique disk name>

After this, we can mount it on a Docker volume defined in Kubernetes configurations –

      ...
      volumeMounts:
        - mountPath: /path/to/mount
          name: volume-name
  volumes:
    - name: volume-name
      gcePersistentDisk:
        pdName: disk-name
        fsType: fileSystemType

But this setup can’t be used for storing loklak dumps. Let’s see “why” in the next section.

Rolling Updates and Persistent Disk

The Kubernetes deployment needs to be updated when the master branch of loklak server is updated. This update of master deployment would create a new pod and try to start loklak server on it. During all this, the older deployment would also be running and serving the requests.


The control will not be transferred to the newer pod until it is ready and all the probes are passing. The newer deployment will now try to mount the disk which is mentioned in the configuration, but it would fail to do so. This would happen because the older pod has already mounted the disk.


Therefore, all new deployments would simply fail to start due to insufficient resources. To overcome such issues, Kubernetes allows persistent volume claims. Let’s see how we used them for loklak deployment.

Persistent Volume Claims

Kubernetes provides Persistent Volume Claims which claim resources (storage) from a Persistent Volume (just like a pod does from a node). The higher level APIs are provided by Kubernetes (configurations and kubectl command line). In the loklak deployment, the persistent volume is a Google Compute Engine disk –

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dump
  namespace: web
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  gcePersistentDisk:
    pdName: "data-dump-disk"
    fsType: "ext4"

[SOURCE]

It must be noted here that a persistent disk by the name of data-dump-index is already created in the same region.


The storage class defines the way in which the PV should be handled, along with the provisioner for the service –

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
  namespace: web
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

[SOURCE]

After having the StorageClass and PersistentVolume, we can create a claim for the volume by using appropriate configurations –

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
  namespace: web
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: slow

[SOURCE]

After this, we can mount this claim on our Deployment –

  ...
  volumeMounts:
    - name: dump
      mountPath: /loklak_server/data
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump

[SOURCE]

Verifying persistence of Dumps

To verify this, we can redeploy the cluster using the same persistent disk and check if the earlier dumps are still present there –

$ http http://link.to.deployment/dump/
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html;charset=utf-8
...


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<h1>Index of /dump</h1>
<pre>      Name 
[gz ] <a href="messages_20170802_71562040.txt.gz">messages_20170802_71562040.txt.gz</a>   Thu Aug 03 00:07:21 GMT 2017   132M
[gz ] <a href="messages_20170803_69925009.txt.gz">messages_20170803_69925009.txt.gz</a>   Mon Aug 07 15:40:04 GMT 2017   532M
[gz ] <a href="messages_20170807_36357603.txt.gz">messages_20170807_36357603.txt.gz</a>   Wed Aug 09 10:26:24 GMT 2017   377M
[txt] <a href="messages_20170809_27974404.txt">messages_20170809_27974404.txt</a>      Thu Aug 10 08:51:49 GMT 2017  1564M
<hr></pre>
...

Conclusion

In this blog post, I discussed the process of deployment of loklak with persistent dumps on Kubernetes. This deployment is intended to work as root.loklak.org in near future. The changes were proposed in loklak/loklak_server#1377 by @singhpratyush (me).

Resources

Continue ReadingPersistently Storing loklak Server Dumps on Kubernetes

Using Mosquitto as a Message Broker for MQTT in loklak Server

In loklak server, messages are collected from various sources and indexed using Elasticsearch. To know when a message of interest arrives, users can poll the search endpoint. But this method would require a lot of HTTP requests, most of them being redundant. Also, if a user would like to collect messages for a particular topic, he would need to make a lot of requests over a period of time to get enough data.

For GSoC 2017, my proposal was to introduce stream API in the loklak server so that we could save ourselves from making too many requests and also add many use cases.

Mosquitto is Eclipse’s project which acts as a message broker for the popular MQTT protocol. MQTT, based on the pub-sub model, is a lightweight and IOT friendly protocol. In this blog post, I will discuss the basic setup of Mosquitto in the loklak server.

Installation and Dependency for Mosquitto

The installation process of Mosquitto is very simple. For Ubuntu, it is available from the pre installed PPAs –

sudo apt-get install mosquitto

Once the message broker is up and running, we can use the clients to connect to it and publish/subscribe to channels. To add MQTT client as a project dependency, we can introduce following line in Gradle dependencies file –

compile group: 'net.sf.xenqtt', name: 'xenqtt', version: '0.9.5'

[SOURCE]

After this, we can use the client libraries in the server code base.

The MQTTPublisher Class

The MQTTPublisher class in loklak would provide an interface to perform basic operations in MQTT. The implementation uses AsyncClientListener to connect to Mosquitto broker –

AsyncClientListener listener = new AsyncClientListener() {
    // Override methods according to needs
};

[SOURCE]

The publish method for the class can be used by other components of the project to publish messages on the desired channel –

public void publish(String channel, String message) {
    this.mqttClient.publish(new PublishMessage(channel, QoS.AT_LEAST_ONCE, message));
}

[SOURCE]

We also have methods which allow publishing of multiple messages to multiple channels in order to increase the functionality of the class.

Starting Publisher with Server

The flags which signal using of streaming service in loklak are located in conf/config.properties. These configurations are referred while initializing the Data Access Object and an MQTTPublisher is created if needed –

String mqttAddress = getConfig("stream.mqtt.address", "tcp://127.0.0.1:1883");
streamEnabled = getConfig("stream.enabled", false);
if (streamEnabled) {
    mqttPublisher = new MQTTPublisher(mqttAddress);
}

[SOURCE]

The mqttPublisher can now be used by other components of loklak to publish messages to the channel they want.

Adding Mosquitto to Kubernetes

Since loklak has also a nice Kubernetes setup, it was very simple to introduce a new deployment for Mosquitto to it.

Changes in Dockerfile

The Dockerfile for master deployment has to be modified to discover Mosquitto broker in the Kubernetes cluster. For this purpose, corresponding flags in config.properties have to be changed to ensure that things work fine –

sed -i.bak 's/^\(stream.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(stream.mqtt.address\).*/\1=mosquitto.mqtt:1883/' conf/config.properties && \

[SOURCE]

The Mosquitto broker would be available at mosquitto.mqtt:1883 because of the service that is created for it (explained in later section).

Mosquitto Deployment

The Docker image used in Kubernetes deployment of Mosquitto is taken from toke/docker-kubernetes. Two ports are exposed for the cluster but no volumes are needed –

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mosquitto
  namespace: mqtt
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: mosquitto
        image: toke/mosquitto
        ports:
        - containerPort: 9001
        - containerPort: 8883

[SOURCE]

Exposing Mosquitto to the Cluster

Now that we have the deployment running, we need to expose the required ports to the cluster so that other components may use it. The port 9001 appears as port 80 for the service and 1883 is also exposed –

apiVersion: v1
kind: Service
metadata:
  name: mosquitto
  namespace: mqtt
  ...
spec:
  ...
  ports:
  - name: mosquitto
    port: 1883
  - name: mosquitto-web
    port: 80
    targetPort: 9001

[SOURCE]

After creating the service using this configuration, we will be able to connect our clients to Mosquitto at address mosquitto.mqtt:1883.

Conclusion

In this blog post, I discussed the process of adding Mosquitto to the loklak server project. This is the first step towards introducing the stream API for messages collected in loklak.

These changes were introduced in pull requests loklak/loklak_server#1393 and loklak/loklak_server#1398 by @singhpratyush (me).

Resources

Continue ReadingUsing Mosquitto as a Message Broker for MQTT in loklak Server

Checking Whether Migrations Are Up To Date With The Sqlalchemy Models In The Open Event Server

In the Open Event Server, in the pull requests, if there is some change in the sqlalchemy model, sometimes proper migrations for the same are missed in the PR.

The first approach to check whether the migrations were up to date in the database was with the following health check function:

from subprocess import check_output
def health_check_migrations():
   """
   Checks whether database is up to date with migrations, assumes there is a single migration head
   :return:
   """
   head = check_output(["python", "manage.py", "db", "heads"]).split(" ")[0]
   
   if head == version_num:
       return True, 'database up to date with migrations'
   return False, 'database out of date with migrations'

 

In the above function, we get the head according to the migration files as following:

head = check_output(["python", "manage.py", "db", "heads"]).split(" ")[0]


The table alembic_version contains the latest alembic revision to which the database was actually upgraded. We can get this revision from the following line:

version_num = (db.session.execute('SELECT version_num from alembic_version').fetchone())['version_num']

 

Then we compare both of the given heads and return a proper tuple based on the comparison output.While this method was pretty fast, there was a drawback in this approach. If the user forgets to generate the migration files for the the changes done in the sqlalchemy model, this approach will fail to raise a failure status in the health check.

To overcome this drawback, all the sqlalchemy models were fetched automatically and simple sqlalchemy select queries were made to check whether the migrations were up to date.

Remember that a raw SQL query will not serve our purpose in this case as you’d have to specify the columns explicitly in the query. But in the case of a sqlalchemy query, it generates a SQL query based on the fields defined in the db model, so if migrations are missing to incorporate the said change proper error will be raised.

We can accomplish this from the following function:

def health_check_migrations():
   """
   Checks whether database is up to date with migrations by performing a select query on each model
   :return:
   """
   # Get all the models in the db, all models should have a explicit __tablename__
   classes, models, table_names = [], [], []
   # noinspection PyProtectedMember
   for class_ in db.Model._decl_class_registry.values():
       try:
           table_names.append(class_.__tablename__)
           classes.append(class_)
       except:
           pass
   for table in db.metadata.tables.items():
       if table[0] in table_names:
           models.append(classes[table_names.index(table[0])])

   for model in models:
       try:
           db.session.query(model).first()
       except:
           return False, '{} model out of date with migrations'.format(model)
   return True, 'database up to date with migrations'

 

In the above code, we automatically get all the models and tables present in the database. Then for each model we try a simple SELECT query which returns the first row found. If there is any error in doing so, False, ‘{} model out of date with migrations’.format(model) is returned, so as to ensure a failure status in health checks.

Related:

Continue ReadingChecking Whether Migrations Are Up To Date With The Sqlalchemy Models In The Open Event Server