What is Open Source and why should you do it?

Since Codeheat is going on and Google Code-in has started, I would like to share some knowledge with new contributors through this blog post.

What is Open Source software?

If you google it, you will see:

“Open-source software is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose.”

To put it in layman’s terms: “software whose source code is made available to everyone so that they can change and improve it, provided that a contributor who changes the code cannot claim the software as their own.”

Thus, you do not own the software outright; what you can do is change its code to make it better. Now, you may be wondering what is in it for you. In my view it is all upside, and I have explained the benefits in the latter half of this article.

Why am I writing this?

I was in my freshman year of college when I learned about the web and how it works. I started my journey as a developer, building things and working on small projects that I kept to myself. While exploring further in those days, I first came across Open Source software.

Curious to know more, I learned that anyone can make their software open so that it is available to others for use and development. That led me to explore other people’s projects on GitHub; I went through their codebases and started contributing. I remember my first contribution was fixing a “typo”, i.e. correcting a spelling mistake in a project’s README. From there I kept exploring and got more hands-on with Open Source, which is why I want to share some of my thoughts with you.

What is in it for you when you contribute to Open Source?

1) Teaches you how to structure code:

Nowadays, a large share of software projects are Open Source, and a community of developers works on them to constantly improve them. Big projects therefore have big codebases, which are hard to understand at first, but after spending some time understanding and contributing you will be fine with them. The thing with such projects is that they have structured code. By “structured” I mean there are strict guidelines for the project: good tests are in place that push you to write code the way the maintainers want it, i.e. clean and readable. By writing such code, you learn how to structure it, which is ultimately a great habit that every developer should practice.

2) Team Work:

Creating and maintaining a large project requires team work. When you contribute to a project, you work in a team where you have to consider others’ opinions, give your own, and ask teammates for improvements or for help with whatever you are stuck on. Working in a team increases productivity, community interaction, your own network, and more.

3) Improves you as a developer:

One of the most important parts of your journey as a developer is, and should be, to always keep learning. When you contribute, your code is reviewed by others (experts or maintainers of the project) who point out mistakes and improvements so that the code can be written much more cleanly than before. You also start to think about a problem more broadly: while solving it, you make sure the code you write keeps the app scalable for a large number of users and prolongs the life of the code.

4) Increases your Network:

One advantage of Open Source contribution is that it also grows your network in the developer community. You get to know about things you have never heard of, explore them, meet people, and learn what is going on in other parts of the world. Having connections with developers in different countries is always a bonus.

5) Earn some bucks too:

At the end of the day, money matters. People used to think that contributing to Open Source projects would not earn you any money. But if you are a maintainer or a continuous contributor to a great project, you can receive donations to keep the project going and keep it available to people.

For students in college, doing Open Source is a bonus. There are programmes like:

  • Google Summer of Code (GSoC)
  • Google Code-in
  • Codeheat

These programmes offer attractive incentives and stipends to student participants. FOSSASIA participates in GSoC, so you can go ahead and try to get into GSoC under FOSSASIA.

6) Plus point for job seekers:

When it comes to applying for a job, a good Open Source profile gives the recruiter a reason to shortlist you and offer you an interview, since you already know how to “manage a project”, “work in a team”, “get work done”, “solve a problem efficiently”, and so on. Nowadays, many companies mention on their job application pages that “Open Source would be a bonus”.

7) Where can you start?

We have many projects at FOSSASIA to start with. There are no restrictions on the language, since we have projects in most popular languages.

Currently, a couple of programs are open at FOSSASIA: Codeheat and Google Code-in.

Feel free to check out the programs and the projects under FOSSASIA at https://github.com/fossasia.

Conclusion

So that was it. I hope you now understand what Open Source is and how it can benefit you. Keep contributing to FOSSASIA and you will see the results in no time.

Open Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

Due to limited resources, the Elasticsearch instance in the current Open Event Server deployment is used only to store events and search through them.

The project uses a PostgreSQL database. This blog post focuses on setting up a job that creates the events index if it does not exist; if the index already exists, the job deletes all the previous data and rebuilds the events index.

Although the project uses the Flask framework, the job is written in pure Python so that it can run properly in the background while the application continues its work. Celery is used for queueing up the aforementioned jobs. To build the job, the first step is to connect to our database:

from config import Config
import psycopg2

# the database URI comes from the application's configuration
conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
cur = conn.cursor()

 

The next step is to fetch all the events from the database. We will only index certain attributes of each event that are useful for search; the rest are not stored in the index. The code given below fetches a collection of tuples containing the attributes mentioned in the query:

cur.execute(
    "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
events = cur.fetchall()

 

We will be using the bulk API, which is significantly faster than adding events one by one via the API. elasticsearch-py, the official Python client for Elasticsearch, provides the necessary functionality to work with the bulk API. The helpers present in the client enable us to use generator expressions to insert the data via the bulk API. The generator expression for events is as follows:

event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)

 

We will now delete the events index if it exists and then recreate it. The generator expression obtained above is passed to the bulk API helper, and the events index is repopulated. The complete code for the function is as follows:

 

# `celery`, `es_store` (the Elasticsearch client) and `helpers`
# (elasticsearch.helpers) come from the application code
@celery.task(name='rebuild.events.elasticsearch')
def cron_rebuild_events_elasticsearch():
    """
    Re-inserts all eligible events into elasticsearch
    :return:
    """
    conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
    cur = conn.cursor()
    cur.execute(
        "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
    events = cur.fetchall()
    event_data = ({'_type': 'event',
                   '_index': 'events',
                   '_id': event_[0],
                   'name': event_[1],
                   'description': event_[2] or None,
                   'searchable_location_name': event_[3] or None,
                   'organizer_name': event_[4] or None,
                   'organizer_description': event_[5] or None}
                  for event_ in events)
    # drop and recreate the index, then repopulate it via the bulk helper
    es_store.indices.delete('events')
    es_store.indices.create('events')
    helpers.bulk(es_store, event_data)

 

Currently we run this job every week and also on each new deployment. Rebuilding the index is very important, as some records may not get indexed while the continuous sync is taking place.
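
For scheduling, a weekly Celery beat entry along the following lines could be used (a minimal sketch assuming a standard Celery 4 setup; the entry name and timing are illustrative and not taken from the actual Open Event Server configuration):

from celery.schedules import crontab

celery.conf.beat_schedule = {
    'rebuild-events-elasticsearch-weekly': {
        # must match the name given in the @celery.task decorator above
        'task': 'rebuild.events.elasticsearch',
        # run once a week, early on Monday morning
        'schedule': crontab(minute=0, hour=0, day_of_week='monday'),
    },
}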

To know more about it please visit https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/


FOSSASIA Summit 2018 Singapore – Call for Speakers

The FOSSASIA Open Tech Summit is Asia’s leading Open Technology conference for developers, companies, and IT professionals. The event will take place from Thursday, 22nd – Sunday, 25th March at the Lifelong Learning Institute in Singapore.

During the four days, developers, technologists, scientists, and entrepreneurs convene to collaborate, share information and learn about the latest in open technologies, including Artificial Intelligence software, DevOps, Cloud Computing, Linux, Science, Hardware and more. The theme of this year’s event is “Towards the Open Conversational Web”.

For our featured event we are looking for speaker submissions about Open Source in the following areas:

  • Artificial Intelligence, Algorithms, Search Engines, Cognitive Experts
  • Open Design, Hardware, Imaging
  • Science, Tech and Education
  • Kernel and Platform
  • Database
  • Cloud, Container, DevOps
  • Internet Society and Community
  • Open Event Solutions
  • Security and Privacy
  • Open Source in Business
  • Blockchain

There will be special events celebrating the 20th anniversary of the Open Source Initiative and its impact on Open Source business. An exhibition space is available for company and project stands.

Submission Guidelines

Please propose your session as early as possible and include a description of your session proposal that is as complete as possible; the description is of particular importance for the selection. Once accepted, speakers will receive a code for a free speaker ticket, plus two standard tickets for their partner or friends. Sessions are accepted on an ongoing basis.

Submission Link: 2018.fossasia.org/speaker-registration

Dates & Deadlines

Please send us your proposal as soon as possible via the FOSSASIA Summit speaker registration.

Deadline for submissions: December 27th, 2017

Late submissions: Later submissions are possible, but early submissions have priority

Notification of acceptance: On an ongoing basis

Schedule Announced: January 20, 2018

FOSSASIA Open Tech Summit: March 22nd – 25th, 2018

Sessions and Tracks

Talks and Workshops

Talk slots are 20 minutes long, plus 5-10 minutes for questions and answers. The idea is that participants use the sessions to get an idea of the work of others and can then follow up in more detail in break-out areas, where they discuss further and start to work together. Speakers can also sign up for either a 1-hour or a 2-hour workshop session. Longer sessions are possible in principle; please tell us the proposed length of your session at the time of submission.

Lightning talks

You have some interesting ideas but do not want to submit a full talk? We suggest you go for a lightning talk, which is a 5-minute slot to present your idea or project. You are welcome to continue the discussion in the breakout areas; there are tables and chairs to serve your get-togethers.

Stands and assemblies

We offer spaces in our exhibition area for companies, projects, installations, team gatherings and other fun activities. We are curious to know what you would like to make, bring or show. Please add details in the submission form.

Developer Rooms/Track Hosts

Get in touch early if you plan to organize a developer room at the event. FOSSASIA is also looking for team members who are interested in co-hosting and moderating tracks. Please sign up to become a host here.

Publication

Audio and video recordings of the lectures will be published in various formats under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows commercial use by media institutions as part of their reporting. If you do not wish for material from your lecture to be published or streamed, please let us know in your submission.

Sponsorship & Contact

If you would like to sponsor FOSSASIA or have any questions, please contact us via [email protected].

Suggested Topics

  • Artificial Intelligence (SUSI.AI, Algorithms, Cognitive Expert Systems, AI on a Chip)
  • Hardware (Architectures, Maker Culture, Small Devices)
  • 20 years Impact of Open Source in Business
  • DevOps (Continuous Delivery, Lean IT, Moving at Cloud-speed)
  • Networking (Software Defined Networking, OpenFlow, Satellite Communication)
  • Security (Coding, Configuration, Testing, Malware)
  • Cloud & Microservices (Containers – Libraries, Runtimes, Composition; Kubernetes; Docker, Distributed Services)
  • Databases (Location-aware and Mapping, Replication and Clustering, Data Warehousing, NoSQL)
  • Science and Applications (Pocket Science Lab, Neurotech, Biohacking, Science Education)
  • Business Development (Open Source Business Models, Startups, Kickstarter Campaigns)
  • Internet of Everything (Smart Home, Medical Systems, Environmental Systems)
  • Internet Society and Culture (Collaborative Development, Community, Advocacy, Government, Governance, Legal)​
  • Kernel Development and Linux On The Desktop (Meilix, Light Linux systems, Custom Linux Generator)
  • Open Design and Libre Art (Open Source Design)
  • Open Event (Event Management systems, Ticketing solutions, Scheduling, Event File Formats)

Links

Speaker Registration and Proposal Submission:
2018.fossasia.org/speaker-registration

FOSSASIA Summit: 2018.fossasia.org

FOSSASIA Summit 2017: Event Wrap-Up

FOSSASIA Photos: flickr.com/photos/fossasia/

FOSSASIA Videos: Youtube FOSSASIA

FOSSASIA on Twitter: twitter.com/fossasia

Open Event API Server: Implementing FAQ Types

In the Open Event Server, there was a long-standing request from users to enable event organisers to create an FAQ section.

The API for the FAQ section was implemented subsequently. The FAQ API allowed the user to specify the following request schema:

{
 "data": {
   "type": "faq",
   "relationships": {
     "event": {
       "data": {
         "type": "event",
         "id": "1"
       }
     }
   },
   "attributes": {
     "question": "Sample Question",
     "answer": "Sample Answer"
   }
 }
}

 

But what if the user wants to group certain questions under a specific category? There was no way to do that in the FAQ API, so a new API, FAQ-Types, was created.

Why make a separate API for it?

Another question that arose while designing the FAQ-Types API was whether a separate API was necessary at all, or whether a type attribute could simply be added to the FAQ API itself. With the latter approach, the client would have to specify the type of the FAQ record every time a new record is created, and we would have to trust that users always enter exactly the same spelling for questions falling under the same type. The user cannot be trusted on this front, so the separate API ensures that the types remain controlled and there are no duplicate entries for the same type.

Helps in handling a large number of records:

Another concern was what would happen if there were a large number of FAQ records under the same FAQ-Type: entering the type for each of those questions would be cumbersome for the user. The FAQ-Type API also overcomes this problem.

Following is the request schema for the FAQ-Types API:

{
 "data": {
   "attributes": {
     "name": "abc"
   },
   "type": "faq-type",
   "relationships": {
     "event": {
       "data": {
         "id": "1",
         "type": "event"
       }
     }
   }
 }
}

 

Additionally:

  • FAQ to FAQ-type is a many-to-one relation (see the sketch after this list).
  • A single FAQ can only belong to one Type.
  • The FAQ-type relationship is optional: if users want different sections they can add types; if not, that is their choice.
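
A rough sketch of how this many-to-one relationship could be modelled with Flask-SQLAlchemy is given below (the table, field and class names are illustrative, not the exact Open Event Server definitions):

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()


class FaqType(db.Model):
    __tablename__ = 'faq_types'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String, nullable=False)


class Faq(db.Model):
    __tablename__ = 'faqs'
    id = db.Column(db.Integer, primary_key=True)
    question = db.Column(db.String, nullable=False)
    answer = db.Column(db.String, nullable=False)
    # optional many-to-one link: a FAQ may or may not belong to a type
    faq_type_id = db.Column(db.Integer, db.ForeignKey('faq_types.id'), nullable=True)
    faq_type = db.relationship('FaqType', backref='faqs')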


Discount Codes in Open Event Server

The Open Event System allows the usage of discount codes with tickets and events. This blog post describes which types of discount codes exist and which endpoints can be used to fetch and update their details.

In the Open Event API Server, each event can have two types of discount codes: an ‘event’ discount code and a ‘ticket’ discount code. As the names suggest, the event discount code is an event-level discount code, while the ticket discount code is ticket-level.

Each event can have only one ‘event’ discount code, and it is accessible only to the server admin. The Open Event server admin can create, view and update the ‘event’ discount code for an event. The event discount code follows the DiscountCodeEvent schema, which is inherited from the parent class DiscountCodeSchemaPublic. To store the unique discount code associated with an event, the event model’s discount_code_id field is used.

The ‘ticket’ discount code is accessible by the event organizer and co-organizer, and each event can have any number of ‘ticket’ discount codes. These follow the DiscountCodeTicket schema, which is also inherited from the same base class, DiscountCodeSchemaPublic. Which schema is used is decided by the value of the field ‘used_for’, which can be either ‘events’ or ‘tickets’. The two schemas have different relationships with the event and the marketer respectively.

We have the following endpoints for Discount Code events and tickets:
‘/events/<int:event_id>/discount-code’
‘/events/<int:event_id>/discount-codes’

The first endpoint is based on the DiscountCodeDetail class. It returns the detail of one discount code which in this case is the event discount code associated with the event.

The second endpoint is based on the DiscountCodeList class which returns a list of discount codes associated with an event. Note that this list also includes the ‘event’ discount code, apart from all the ticket discount codes.

class DiscountCodeFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = DiscountCode
        sqlalchemy_session = db.session

    event_id = None
    user = factory.RelatedFactory(UserFactory)
    user_id = 1


Since each discount code belongs to an event (either directly or through a ticket), the factory would normally have the event as a related factory. However, to test the /events/<int:event_id>/discount-code endpoint we first need the event and then have to set its discount code id to 1 for dredd to check it. Hence, the event is not included as a related factory, but is added as a separate object every time a discount code object is to be used.

@hooks.before("Discount Codes > Get Discount Code Detail of an Event > Get Discount Code Detail of an Event")
def event_discount_code_get_detail(transaction):
   """
   GET /events/1/discount-code
   :param transaction:
   :return:
   """
   with stash['app'].app_context():
       discount_code = DiscountCodeFactory()
       db.session.add(discount_code)
       db.session.commit()
       event = EventFactoryBasic(discount_code_id=1)
       db.session.add(event)
       db.session.commit()


The other tests and extended documentation can be found here.


Open Event Server: Getting The Identity From The Expired JWT Token In Flask-JWT

The Open Event Server uses JWT based authentication, where JWT stands for JSON Web Token. JSON Web Tokens are an open industry standard RFC 7519 method for representing claims securely between two parties. [source: https://jwt.io/]

Flask-JWT is used for the JWT-based authentication in the project. Flask-JWT makes it easy to use JWT-based authentication in Flask, while at its core it still uses PyJWT.

To get the identity when a JWT token is present in the request’s Authorization header, the current_identity proxy of Flask-JWT can be used as follows:

from flask_jwt import current_identity, jwt_required

@app.route('/example')
@jwt_required()
def example():
    return '%s' % current_identity

 

Note that it will only be set in the context of a function decorated with jwt_required(). The problem with the current_identity proxy when using jwt_required is that the token has to be active; the identity of an expired token cannot be fetched by this function.

So why not write a function of our own to do the same? A JWT token is divided into three segments, separated by dots (.):

  • Header
  • Payload
  • Signature

The first step is to get the payload segment, which can be done as follows:

# Flask-JWT's _default_request_handler returns the raw token sent in the request
token_second_segment = _default_request_handler().split('.')[1]

 

The payload obtained above is still base64-encoded JSON; it can be decoded and converted into a dict as follows:

payload = json.loads(token_second_segment.decode('base64'))

 

The identity can now be found in the payload as payload[‘identity’]. We can get the actual user from the payload as follows:

def jwt_identity(payload):
    """
    JWT helper function
    :param payload:
    :return:
    """
    return User.query.get(payload['identity'])

 

Our final function will now be something like:

def get_identity():
    """
    To be used only if identity for expired tokens is required, otherwise use current_identity from flask_jwt
    :return:
    """
    token_second_segment = _default_request_handler().split('.')[1]
    payload = json.loads(token_second_segment.decode('base64'))
    user = jwt_identity(payload)
    return user

 

But after using this function for some time, you will notice that for certain tokens the system raises an error saying that the JWT token is missing padding. The JWT payload is base64-encoded, which requires the length of the payload string to be a multiple of four. If it is not, the remaining places can be padded with extra = (equals) signs. Since Python 2.7’s .decode does not do that by default, we can accomplish it as follows:

missing_padding = len(token_second_segment) % 4

# ensures the string is correctly padded to be a multiple of 4
if missing_padding != 0:
    token_second_segment += b'=' * (4 - missing_padding)
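
Putting the two pieces together, the padded version of the helper looks roughly like this:

def get_identity():
    """
    To be used only if identity for expired tokens is required, otherwise use current_identity from flask_jwt
    :return:
    """
    token_second_segment = _default_request_handler().split('.')[1]

    # pad the base64 string to a multiple of 4 before decoding
    missing_padding = len(token_second_segment) % 4
    if missing_padding != 0:
        token_second_segment += b'=' * (4 - missing_padding)

    payload = json.loads(token_second_segment.decode('base64'))
    return jwt_identity(payload)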

 


Introducing Stream Servlet in loklak Server

A major part of my GSoC proposal was adding a stream API to the loklak server. In a previous blog post, I discussed the addition of Mosquitto as a message broker for MQTT streaming. After testing this service for a few days and making some minor improvements, I was in a position to expose the stream to outside users through a simple API.

In this blog post, I will be discussing the addition of /api/stream.json endpoint to loklak server.

HTTP Server-Sent Events

Server-sent events (SSE) is a technology where a browser receives automatic updates from a server via HTTP connection. The Server-Sent Events EventSource API is standardized as part of HTML5 by the W3C.

Wikipedia

This API is supported by all major browsers except Microsoft Edge. For loklak, the plan was to use this event system to send messages to connected users as they arrive. Apart from browsers, the EventSource API can also be consumed from many other technologies.
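
As an illustration, a minimal Python consumer for such a stream could look like the sketch below (it uses the requests library; the URL and channel name are placeholders rather than the exact production values):

import requests

# connect to the stream endpoint and read Server-Sent Events line by line
url = "http://localhost:9000/api/stream.json"
with requests.get(url, params={"channel": "all"}, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: <payload>" lines separated by blank lines
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())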

Jetty Eventsource Plugin

For Java, we can use Jetty’s EventSource plugin to send events to clients. It is similar to other Jetty servlets when it comes to processing arguments, handling requests, etc., but it provides a simple interface to send events to connected users as they occur.

Adding Dependency

To use this plugin, we can add the following line to Gradle dependencies –

compile group: 'org.eclipse.jetty', name: 'jetty-eventsource-servlet', version: '1.0.0'

[SOURCE]

The Event Source

An EventSource is the object required by EventSourceServlet to send events. All the logic for emitting events needs to be defined in the related class. To link a servlet with an EventSource, we need to override the newEventSource method –

public class StreamServlet extends EventSourceServlet {
    @Override
    protected EventSource newEventSource(HttpServletRequest request) {
        String channel = request.getParameter("channel");
        if (channel == null) {
            return null;
        }
        if (channel.isEmpty()) {
            return null;
        }
        return new MqttEventSource(channel);
    }
}

[SOURCE]

If no channel is provided, the EventSource object will be null and the request will be rejected. Here, the MqttEventSource would be used to handle the stream of Tweets as they arrive from the Mosquitto message broker.

Cross Site Requests

Since requests to this endpoint can’t be of JSONP type, it is necessary to allow cross-site requests on it. This can be done by overriding the doGet method of the servlet –

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setHeader("Access-Control-Allow-Origin", "*");
    super.doGet(request, response);
}

[SOURCE]

Adding MQTT Subscriber

When a request for events arrives, the constructor of MqttEventSource is called. At this stage, we need to connect to the Mosquitto stream for the channel. To achieve this, we can set the class as an MqttCallback using the appropriate client configuration –

public class MqttEventSource implements MqttCallback {
    ...
    MqttEventSource(String channel) {
        this.channel = channel;
    }
    ...
    this.mqttClient = new MqttClient(address, "loklak_server_subscriber");
    this.mqttClient.connect();
    this.mqttClient.setCallback(this);
    this.mqttClient.subscribe(this.channel);
    ...
}

[SOURCE]

By setting the callback to this, we can override the messageArrived method to handle the arrival of a new message on the channel. Just to mention, the client library used here is Eclipse Paho.

Connecting MQTT Stream to SSE Stream

Now that we have subscribed to the channel we wish to send events from, we can use the Emitter to send events from our EventSource by implementing it –

public class MqttEventSource implements EventSource, MqttCallback {
    private Emitter emitter;


    @Override
    public void onOpen(Emitter emitter) throws IOException {
        this.emitter = emitter;
        ...
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) throws Exception {
        this.emitter.data(message.toString());
    }
}

[SOURCE]

Closing Stream on Disconnecting from User

When a client disconnects from the stream, it doesn’t make sense to stay connected to the MQTT broker. We can use the onClose method to disconnect the subscriber from it –

@Override
public void onClose() {
    try {
        // disconnect before closing so the client releases its resources cleanly
        this.mqttClient.disconnect();
        this.mqttClient.close();
    } catch (MqttException e) {
        // Log some warning
    }
}

[SOURCE]

Conclusion

In this blog post, I discussed connecting the MQTT stream to the SSE stream using Jetty’s EventSource plugin. Once in place, this event system saves us from making too many requests to collect and visualize data. The possible applications of such a feature are huge.

This feature can be seen in action at the World Mood Tracker app.

The changes were introduced in pull request loklak/loklak_server#1474 by @singhpratyush (me).


Optimising Docker Images for loklak Server

The loklak server is in the process of moving to Kubernetes. In order to do so, we needed different Docker images that suit these deployments. In this blog post, I will discuss the process through which I optimised the size of the Docker image for these deployments.

Initial Image

The image that I started with used Ubuntu as the base. It installed all the required components and then modified the configuration as needed –

FROM ubuntu:latest

# Env Vars
ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
ENV DEBIAN_FRONTEND noninteractive

WORKDIR /loklak_server

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y git openjdk-8-jdk
RUN git clone https://github.com/loklak/loklak_server.git /loklak_server
RUN git checkout development
RUN ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport
RUN sed -i.bak 's/^\(port.http=\).*/\180/' conf/config.properties
... # More configurations
RUN echo "while true; do sleep 10;done" >> bin/start.sh

# Start
CMD ["bin/start.sh", "-Idn"]

The size of images built using this Dockerfile was quite huge –

REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE

loklak_server       latest              a92f506b360d        About a minute ago   1.114 GB

ubuntu              latest              ccc7a11d65b1        3 days ago           120.1 MB

But since this size is not acceptable, we needed to reduce it.

Moving to Alpine

Alpine Linux is an extremely lightweight Linux distro, built mainly for the container environment. Its size is so tiny that it hardly has any impact on the overall size of images. So I replaced Ubuntu with Alpine –

FROM alpine:latest

...
RUN apk update
RUN apk add git openjdk8 bash
...

And now we had much smaller images –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      668.8 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

As we can see, due to no caching and the small size of Alpine, the image size is reduced to almost half of the original.

Reducing Content Size

There are many things in a project which are no longer needed when running it, like the .git folder (which is huge in the case of loklak) –

$ du -sh loklak_server/.git
236M loklak_server/.git

We can remove such files from the Docker image and save a lot of space –

rm -rf .[^.] .??*

Optimizing Number of Layers

The number of layers also affects the size of the image: the more layers there are, the larger the image becomes. In the Dockerfile, we can club the RUN commands together to produce a smaller number of layers.

RUN apk update && apk add openjdk8 git bash && \
  git clone https://github.com/loklak/loklak_server.git /loklak_server && \
  ...

After this, the effective size is again reduced by a major factor –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      422.3 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

Conclusion

In this blog post, I discussed the process of optimising the size of the Docker image for Kubernetes deployments of the loklak server. The size was reduced to 426 MB from 1.234 GB, which provides much faster push/pull times for Docker images and, therefore, faster updates for Kubernetes deployments.


Persistently Storing loklak Server Dumps on Kubernetes

In an earlier blog post, I discussed the loklak setup on Kubernetes. The deployment mentioned in that post was for testing the development branch. Next, we needed a deployment where all the messages are collected and dumped into text files that can be reused.

In this blog post, I will discuss the challenges with such a deployment and the approach taken to tackle them.

Volatile Disk in Kubernetes

The pods that hold deployments in Kubernetes have disk storage, but any data written by the application persists only as long as the same version of the deployment is running. As soon as the deployment is updated or relocated, the data stored while the application was running is cleaned up.


Due to this, dumps are written while loklak is running, but they get wiped out when the deployment image is updated. In other words, all dumps are lost when the image updates. We needed to find a solution to this, as we needed permanent storage for collecting dumps.

Persistent Disk

In order to have storage which can hold data permanently, we can mount persistent disk(s) on a pod at the appropriate location. This ensures that the data that is important to us stays with us even when the deployment goes down.


In order to add persistent disks, we first need to create one. On Google Cloud Platform, we can use the gcloud CLI to create disks in a given zone –

gcloud compute disks create --size=<required size> --zone=<same as cluster zone> <unique disk name>

After this, we can mount it as a volume defined in the Kubernetes configuration –

      ...
      volumeMounts:
        - mountPath: /path/to/mount
          name: volume-name
  volumes:
    - name: volume-name
      gcePersistentDisk:
        pdName: disk-name
        fsType: fileSystemType

But this setup can’t be used for storing loklak dumps. Let’s see “why” in the next section.

Rolling Updates and Persistent Disk

The Kubernetes deployment needs to be updated when the master branch of loklak server is updated. This update of master deployment would create a new pod and try to start loklak server on it. During all this, the older deployment would also be running and serving the requests.


The control will not be transferred to the newer pod until it is ready and all the probes are passing. The newer deployment will now try to mount the disk which is mentioned in the configuration, but it would fail to do so. This would happen because the older pod has already mounted the disk.


Therefore, all new deployments would simply fail to start due to insufficient resources. To overcome such issues, Kubernetes allows persistent volume claims. Let’s see how we used them for loklak deployment.

Persistent Volume Claims

Kubernetes provides Persistent Volume Claims which claim resources (storage) from a Persistent Volume (just like a pod does from a node). The higher level APIs are provided by Kubernetes (configurations and kubectl command line). In the loklak deployment, the persistent volume is a Google Compute Engine disk –

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dump
  namespace: web
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  gcePersistentDisk:
    pdName: "data-dump-disk"
    fsType: "ext4"

[SOURCE]

It must be noted here that a persistent disk with the name data-dump-disk has already been created in the same zone.


The storage class defines the way in which the PV should be handled, along with the provisioner for the service –

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
  namespace: web
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

[SOURCE]

After creating the StorageClass and PersistentVolume, we can create a claim for the volume using the appropriate configuration –

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
  namespace: web
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: slow

[SOURCE]

After this, we can mount this claim on our Deployment –

  ...
  volumeMounts:
    - name: dump
      mountPath: /loklak_server/data
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump

[SOURCE]

Verifying persistence of Dumps

To verify this, we can redeploy the cluster using the same persistent disk and check if the earlier dumps are still present there –

$ http http://link.to.deployment/dump/
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html;charset=utf-8
...


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<h1>Index of /dump</h1>
<pre>      Name 
[gz ] <a href="messages_20170802_71562040.txt.gz">messages_20170802_71562040.txt.gz</a>   Thu Aug 03 00:07:21 GMT 2017   132M
[gz ] <a href="messages_20170803_69925009.txt.gz">messages_20170803_69925009.txt.gz</a>   Mon Aug 07 15:40:04 GMT 2017   532M
[gz ] <a href="messages_20170807_36357603.txt.gz">messages_20170807_36357603.txt.gz</a>   Wed Aug 09 10:26:24 GMT 2017   377M
[txt] <a href="messages_20170809_27974404.txt">messages_20170809_27974404.txt</a>      Thu Aug 10 08:51:49 GMT 2017  1564M
<hr></pre>
...

Conclusion

In this blog post, I discussed the process of deploying loklak with persistent dumps on Kubernetes. This deployment is intended to work as root.loklak.org in the near future. The changes were proposed in loklak/loklak_server#1377 by @singhpratyush (me).


Using Mosquitto as a Message Broker for MQTT in loklak Server

In the loklak server, messages are collected from various sources and indexed using Elasticsearch. To know when a message of interest arrives, users can poll the search endpoint. But this method would require a lot of HTTP requests, most of them redundant. Also, if a user would like to collect messages for a particular topic, they would need to make a lot of requests over a period of time to get enough data.

For GSoC 2017, my proposal was to introduce a stream API in the loklak server so that we could save ourselves from making too many requests and also enable many new use cases.

Mosquitto is an Eclipse project which acts as a message broker for the popular MQTT protocol. MQTT, based on the pub-sub model, is a lightweight and IoT-friendly protocol. In this blog post, I will discuss the basic setup of Mosquitto in the loklak server.

Installation and Dependency for Mosquitto

The installation process of Mosquitto is very simple. For Ubuntu, it is available from the default package repositories –

sudo apt-get install mosquitto

Once the message broker is up and running, we can use clients to connect to it and publish/subscribe to channels. To add an MQTT client as a project dependency, we can add the following line to the Gradle dependencies file –

compile group: 'net.sf.xenqtt', name: 'xenqtt', version: '0.9.5'

[SOURCE]

After this, we can use the client libraries in the server code base.
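
As a quick, language-agnostic sanity check of the broker itself, a small Python client using the paho-mqtt library can subscribe and publish on a test channel (a minimal sketch; the channel name and broker address are examples, and paho-mqtt is not part of the loklak code base):

import paho.mqtt.client as mqtt


def on_message(client, userdata, message):
    # print every message delivered on the subscribed channel
    print(message.topic, message.payload.decode())


client = mqtt.Client()
client.on_message = on_message
client.connect("127.0.0.1", 1883)
client.subscribe("test/channel")
client.publish("test/channel", "hello from python")
client.loop_forever()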

The MQTTPublisher Class

The MQTTPublisher class in loklak provides an interface to perform basic MQTT operations. The implementation uses an AsyncClientListener to connect to the Mosquitto broker –

AsyncClientListener listener = new AsyncClientListener() {
    // Override methods according to needs
};

[SOURCE]

The publish method for the class can be used by other components of the project to publish messages on the desired channel –

public void publish(String channel, String message) {
    this.mqttClient.publish(new PublishMessage(channel, QoS.AT_LEAST_ONCE, message));
}

[SOURCE]

We also have methods which allow publishing of multiple messages to multiple channels in order to increase the functionality of the class.

Starting Publisher with Server

The flags which enable the streaming service in loklak are located in conf/config.properties. These configurations are read while initializing the Data Access Object, and an MQTTPublisher is created if needed –

String mqttAddress = getConfig("stream.mqtt.address", "tcp://127.0.0.1:1883");
streamEnabled = getConfig("stream.enabled", false);
if (streamEnabled) {
    mqttPublisher = new MQTTPublisher(mqttAddress);
}

[SOURCE]

The mqttPublisher can now be used by other components of loklak to publish messages to the channel they want.

Adding Mosquitto to Kubernetes

Since loklak also has a nice Kubernetes setup, it was very simple to introduce a new deployment for Mosquitto.

Changes in Dockerfile

The Dockerfile for the master deployment has to be modified to discover the Mosquitto broker in the Kubernetes cluster. For this purpose, the corresponding flags in config.properties have to be changed to ensure that things work fine –

sed -i.bak 's/^\(stream.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(stream.mqtt.address\).*/\1=mosquitto.mqtt:1883/' conf/config.properties && \

[SOURCE]

The Mosquitto broker will be available at mosquitto.mqtt:1883 because of the service that is created for it (explained in a later section).

Mosquitto Deployment

The Docker image used in the Kubernetes deployment of Mosquitto is taken from toke/docker-kubernetes. Two ports are exposed to the cluster, but no volumes are needed –

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mosquitto
  namespace: mqtt
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: mosquitto
        image: toke/mosquitto
        ports:
        - containerPort: 9001
        - containerPort: 8883

[SOURCE]

Exposing Mosquitto to the Cluster

Now that we have the deployment running, we need to expose the required ports to the cluster so that other components may use it. Port 9001 appears as port 80 on the service, and port 1883 is also exposed –

apiVersion: v1
kind: Service
metadata:
  name: mosquitto
  namespace: mqtt
  ...
spec:
  ...
  ports:
  - name: mosquitto
    port: 1883
  - name: mosquitto-web
    port: 80
    targetPort: 9001

[SOURCE]

After creating the service using this configuration, we will be able to connect our clients to Mosquitto at address mosquitto.mqtt:1883.

Conclusion

In this blog post, I discussed the process of adding Mosquitto to the loklak server project. This is the first step towards introducing the stream API for messages collected in loklak.

These changes were introduced in pull requests loklak/loklak_server#1393 and loklak/loklak_server#1398 by @singhpratyush (me).
