One Click Deployment Button for loklak Using Heroku with Gradle Build

The one click deploy button makes it easy for the users of loklak to get their own cloud instance created and deployed in their heroku account and can be used according to their flexibility. Heroku uses an app.json manifest in the code repo to figure out what add-ons, config and other deployment steps are required to make the code run. This is used to configure and deploy the app.

Once you have provide the app name and then click on deploy button, Heroku will start deploying the loklak server to a new app on your account:

When setup is complete, you can open the deployed app in your browser or inspect it in Dashboard.

All these steps and requirements can now be encoded in an app.json file and placed in a repo alongside a button that kicks off the setup with a single click.

App.json is a manifest format for describing apps and specifying what their config requirements are. Heroku uses this file to figure out how code in a particular repo should be deployed on the platform. Here is the loklak’s app.json file which used gradle build pack:

{
	"name": "Loklak Server",
	"description": "Distributed Tweet Search Server",
	"logo": "https://raw.githubusercontent.com/loklak/loklak_server/master/html/images/loklak_anonymous.png",
	"website": "http://api.loklak.org",
	"repository": "https://github.com/loklak/loklak_server.git",
	"image": "loklak/loklak_server:latest-master",
	"env": {
		"BUILDPACK_URL": "https://github.com/heroku/heroku-buildpack-gradle.git"
	}
}

 

If you are interested you can try deploying the peer from here itself. Checkout how simple it can be to deploy.

Deploy button:

Deploy

Resources:

Continue Reading

Auto Deploying loklak Server on Google Cloud Using Travis

This is a setup for loklak server which want to check in only the source files, but have the development branch in Kubernetes deployment automatically updated with some compiled output every time the push using details from Travis build.

How to achieve it?

Unix commands and shell script is one of the best option to automate all deployment and build activities. I explored Kubernetes Gcloud which can be accessed through unix command.

1.Checking for Travis build details before deployment:

Firstly check whether the repository is loklak_server, pull request is available and branches are either master or development, and then decide to update the docker image or not. The code of the aforementioned things is as follows:

if [ "$TRAVIS_REPO_SLUG" != "loklak/loklak_server" ]; then
    echo "Skipping image update for repo $TRAVIS_REPO_SLUG"
    exit 0
fi

if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then
    echo "Skipping image update for pull request"
    exit 0
fi

if [ "$TRAVIS_BRANCH" != "master" ] && [ "$TRAVIS_BRANCH" != "development" ]; then
    echo "Skipping image update for branch $TRAVIS_BRANCH"
    exit 0
fi

2. Setting up Tag and Decrypting the credentials:

For the Kubernetes deployment, each time the travis build is successful, it takes the commit details from travis and appended into tag details for deployment and gcloud credentials is decrypted from the json file.

openssl aes-256-cbc -K $encrypted_48d01dc243a6_key -iv $encrypted_48d01dc243a6_iv  -in kubernetes/gcloud-credentials.json.enc -out kubernetes/gcloud-credentials.json -d

3. Install, Authenticate and Configure GCloud details with Kubernetes:

In this step, initially Google Cloud SDK should be installed with Kubernetes-

curl https://sdk.cloud.google.com | bash > /dev/null
source ~/google-cloud-sdk/path.bash.inc
gcloud components install kubectl

 

Then, Authenticate Google Cloud using the above mentioned decrypted credentials and finally configure the Google Cloud with the details like zone, project name, cluster details, number of nodes etc.

4. Update the Kubernetes deployment:

Since, in this issue it is specific to the loklak_server/development branch, so in here it checks if the branch is development or not and then updates the deployment using following command:

if [ $TRAVIS_BRANCH == "development" ]; then
    kubectl set image deployment/server --namespace=web server=$TAG
fi

 

Conclusion:

In this post, how to write a script in such a way that with each successful push after travis build how to update the deployment on Kubernetes GCloud.

Resources:

Continue Reading

Developing LoklakWordCloud app for Loklak apps site

LoklakWordCloud app is an app to visualise data returned by loklak in form of a word cloud.

The app is presently hosted on Loklak apps site.

Word clouds provide a very simple, easy, yet interesting and effective way to analyse and visualise data. This app will allow users to create word cloud out of twitter data via Loklak API.

Presently the app is at its very early stage of development and more work is left to be done. The app consists of a input field where user can enter a query word and on pressing search button a word cloud will be generated using the words related to the query word entered.

Loklak API is used to fetch all the tweets which contain the query word entered by the user.

These tweets are processed to generate the word cloud.

Related issue: https://github.com/fossasia/apps.loklak.org/pull/279

Live app: http://apps.loklak.org/LoklakWordCloud/

Developing the app

The main challenge in developing this app is implementing its prime feature, that is, generating the word cloud. How do we get a dynamic word cloud which can be easily generated by the user based on the word he has entered? Well, here comes in Jqcloud. An awesome lightweight Jquery plugin for generating word clouds. All we need to do is provide list of words along with their weights.

Let us see step by step how this app (first version) works. First we require all the tweets which contain the entered word. For this we use Loklak search service. Once we get all the tweets, then we can parse the tweet body to create a list of words along with their frequency.

var url = "http://35.184.151.104/api/search.json?callback=JSON_CALLBACK&count=100&q=" + query;
        $http.jsonp(url)
            .then(function (response) {
                $scope.createWordCloudData(response.data.statuses);
                $scope.tweet = null;
            });

Once we have all the tweets, we need to extract the tweet texts and create a list of valid words. What are valid words? Well words like ‘the’, ‘is’, ‘a’, ‘for’, ‘of’, ‘then’, does not provide us with any important information and will not help us in doing any kind of analysis. So there is no use of including them in our word cloud. Such words are called stop words and we need to get rid of them. For this we are using a list of commonly used stop words. Such lists can be very easily found over the internet. Here is the list which we are using. Once we are able to extract the text from the tweets, we need to filter stop words and insert the valid words into a list.

 tweet = data[i];
            tweetWords = tweet.text.replace(", ", " ").split(" ");

            for (var j = 0; j < tweetWords.length; j++) {
                word = tweetWords[j];
                word = word.trim();
                if (word.startsWith("'") || word.startsWith('"') || word.startsWith("(") || word.startsWith("[")) {
                    word = word.substring(1);
                }
                if (word.endsWith("'") || word.endsWith('"') || word.endsWith(")") || word.endsWith("]") ||
                    word.endsWith("?") || word.endsWith(".")) {
                    word = word.substring(0, word.length - 1);
                }
                if (stopwords.indexOf(word.toLowerCase()) !== -1) {
                    continue;
                }
                if (word.startsWith("#") || word.startsWith("@")) {
                    continue;
                }
                if (word.startsWith("http") || word.startsWith("https")) {
                    continue;
                }
                $scope.filteredWords.push(word);
            }

What are we actually doing in the above snippet? We are simply iterating over each of the statuses returned by Loklak API. For each tweet, first we are splitting the text into words and then we are iterating over those words. For a given word we do a number of checks. First we check if the word begins or ends with a special character, for example quotation marks or brackets. If so we remove those character as it will cause trouble in calculating frequencies. Next we also check if the word is beginning with ‘#’ or ‘@’. If it is true, then we discard such words as we are handling hashtags and mentions separately. Finally we check whether the word is a stop word or not. If it is a stop word then we discard it. If a word passes all the checks, we add it to our list of valid words.

Once we are done with the tweet bodies, next we need to handle hashtags and mentions.

tweet.hashtags.forEach(function (hashtag) {
                $scope.filteredWords.push("#" + hashtag);
            });

            tweet.mentions.forEach(function (mention) {
                $scope.filteredWords.push("@" + mention);
            });

The above code simply iterates over the hashtags and mentions and inserts them into the filteredWords list. We have handled hashtags and mentions separately so that we can apply filters in future.

Once we are done with generating list of valid words, we need to calculate weight for each of the word. Here weight is nothing but the number of times a particular word is present in the list. We calculate this using JavaScript object. We iterate over the list of valid words. If word is not present in the object (or dictionary as you wish to call it), we create a new key by the name of that word and set its value to one. If a word is already present as a key, then we simply increment its value by one.

for (var word in $scope.wordFreq) {
            $scope.wordCloudData.push({
                text: word,
                weight: $scope.wordFreq[word]
            });
        }

The above code snippet calculates the frequency of each word by the process mentioned above.

Now we are all set to generate our word cloud. We simply use Jqcloud’s interface to configure it with the words and their respective frequencies, provide a list of color codes for a color gradient, and set autoResize to true so that our word cloud resizes itself when the screen size changes.

$scope.generateWordCloud = function() {
        if ($scope.wordCloud === null) {
            $scope.wordCloud = $('.wordcloud').jQCloud($scope.wordCloudData, {
                colors: ["#D50000", "#FF5722", "#FF9800", "#4CAF50", "#8BC34A", "#4DB6AC", "#7986CB", "#5C6BC0", "#64B5F6"],
                fontSize: {
                    from: 0.06,
                    to: 0.01
                },
                autoResize: true
            });
        } else {
            $scope.wordCloud = $(".wordcloud").jQCloud('update', $scope.wordCloudData);
        }
    }

Whenever the user searches for a new word, we simply update the existing word cloud with the cloud of the new word.

Future roadmap

  • Make the words in the cloud clickable. On clicking a word, the cloud should get replaced by the selected word’s cloud.
  • Add filters for hashtags, mentions, date.
  • Add option for exporting the cloud to an image, so that user’s can also use this app as a tool to generate word clouds as images and save them.
  • Add a loader and error notification for invalid or empty input.

Important resources

  • View the app source code here.
  • Learn more about Loklak API here.
  • Learn more about Jqcloud here.
  • Learn more about AngularJS here.
Continue Reading

Persistently Storing loklak Server Dumps on Kubernetes

In an earlier blog post, I discussed loklak setup on Kubernetes. The deployment mentioned in the post was to test the development branch. Next, we needed to have a deployment where all the messages are collected and dumped in text files that can be reused.

In this blog post, I will be discussing the challenges with such deployment and the approach to tackle them.

Volatile Disk in Kubernetes

The pods that hold deployments in Kubernetes have disk storage. Any data that gets written by the application stays only until the same version of deployment is running. As soon as the deployment is updated/relocated, the data stored during the application is cleaned up.


Due to this, dumps are written when loklak is running but they get wiped out when the deployment image is updated. In other words, all dumps are lost when the image updates. We needed to find a solution to this as we needed a permanent storage when collecting dumps.

Persistent Disk

In order to have a storage which can hold data permanently, we can mount persistent disk(s) on a pod at the appropriate location. This ensures that the data that is important to us stays with us, even
when the deployment goes down.


In order to add persistent disks, we first need to create a persistent disk. On Google Cloud Platform, we can use the gcloud CLI to create disks in a given region –

gcloud compute disks create --size=<required size> --zone=<same as cluster zone> <unique disk name>

After this, we can mount it on a Docker volume defined in Kubernetes configurations –

      ...
      volumeMounts:
        - mountPath: /path/to/mount
          name: volume-name
  volumes:
    - name: volume-name
      gcePersistentDisk:
        pdName: disk-name
        fsType: fileSystemType

But this setup can’t be used for storing loklak dumps. Let’s see “why” in the next section.

Rolling Updates and Persistent Disk

The Kubernetes deployment needs to be updated when the master branch of loklak server is updated. This update of master deployment would create a new pod and try to start loklak server on it. During all this, the older deployment would also be running and serving the requests.


The control will not be transferred to the newer pod until it is ready and all the probes are passing. The newer deployment will now try to mount the disk which is mentioned in the configuration, but it would fail to do so. This would happen because the older pod has already mounted the disk.


Therefore, all new deployments would simply fail to start due to insufficient resources. To overcome such issues, Kubernetes allows persistent volume claims. Let’s see how we used them for loklak deployment.

Persistent Volume Claims

Kubernetes provides Persistent Volume Claims which claim resources (storage) from a Persistent Volume (just like a pod does from a node). The higher level APIs are provided by Kubernetes (configurations and kubectl command line). In the loklak deployment, the persistent volume is a Google Compute Engine disk –

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dump
  namespace: web
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  gcePersistentDisk:
    pdName: "data-dump-disk"
    fsType: "ext4"

[SOURCE]

It must be noted here that a persistent disk by the name of data-dump-index is already created in the same region.


The storage class defines the way in which the PV should be handled, along with the provisioner for the service –

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
  namespace: web
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

[SOURCE]

After having the StorageClass and PersistentVolume, we can create a claim for the volume by using appropriate configurations –

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
  namespace: web
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: slow

[SOURCE]

After this, we can mount this claim on our Deployment –

  ...
  volumeMounts:
    - name: dump
      mountPath: /loklak_server/data
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump

[SOURCE]

Verifying persistence of Dumps

To verify this, we can redeploy the cluster using the same persistent disk and check if the earlier dumps are still present there –

$ http http://link.to.deployment/dump/
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html;charset=utf-8
...


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<h1>Index of /dump</h1>
<pre>      Name 
[gz ] <a href="messages_20170802_71562040.txt.gz">messages_20170802_71562040.txt.gz</a>   Thu Aug 03 00:07:21 GMT 2017   132M
[gz ] <a href="messages_20170803_69925009.txt.gz">messages_20170803_69925009.txt.gz</a>   Mon Aug 07 15:40:04 GMT 2017   532M
[gz ] <a href="messages_20170807_36357603.txt.gz">messages_20170807_36357603.txt.gz</a>   Wed Aug 09 10:26:24 GMT 2017   377M
[txt] <a href="messages_20170809_27974404.txt">messages_20170809_27974404.txt</a>      Thu Aug 10 08:51:49 GMT 2017  1564M
<hr></pre>
...

Conclusion

In this blog post, I discussed the process of deployment of loklak with persistent dumps on Kubernetes. This deployment is intended to work as root.loklak.org in near future. The changes were proposed in loklak/loklak_server#1377 by @singhpratyush (me).

Resources

Continue Reading

Using Mosquitto as a Message Broker for MQTT in loklak Server

In loklak server, messages are collected from various sources and indexed using Elasticsearch. To know when a message of interest arrives, users can poll the search endpoint. But this method would require a lot of HTTP requests, most of them being redundant. Also, if a user would like to collect messages for a particular topic, he would need to make a lot of requests over a period of time to get enough data.

For GSoC 2017, my proposal was to introduce stream API in the loklak server so that we could save ourselves from making too many requests and also add many use cases.

Mosquitto is Eclipse’s project which acts as a message broker for the popular MQTT protocol. MQTT, based on the pub-sub model, is a lightweight and IOT friendly protocol. In this blog post, I will discuss the basic setup of Mosquitto in the loklak server.

Installation and Dependency for Mosquitto

The installation process of Mosquitto is very simple. For Ubuntu, it is available from the pre installed PPAs –

sudo apt-get install mosquitto

Once the message broker is up and running, we can use the clients to connect to it and publish/subscribe to channels. To add MQTT client as a project dependency, we can introduce following line in Gradle dependencies file –

compile group: 'net.sf.xenqtt', name: 'xenqtt', version: '0.9.5'

[SOURCE]

After this, we can use the client libraries in the server code base.

The MQTTPublisher Class

The MQTTPublisher class in loklak would provide an interface to perform basic operations in MQTT. The implementation uses AsyncClientListener to connect to Mosquitto broker –

AsyncClientListener listener = new AsyncClientListener() {
    // Override methods according to needs
};

[SOURCE]

The publish method for the class can be used by other components of the project to publish messages on the desired channel –

public void publish(String channel, String message) {
    this.mqttClient.publish(new PublishMessage(channel, QoS.AT_LEAST_ONCE, message));
}

[SOURCE]

We also have methods which allow publishing of multiple messages to multiple channels in order to increase the functionality of the class.

Starting Publisher with Server

The flags which signal using of streaming service in loklak are located in conf/config.properties. These configurations are referred while initializing the Data Access Object and an MQTTPublisher is created if needed –

String mqttAddress = getConfig("stream.mqtt.address", "tcp://127.0.0.1:1883");
streamEnabled = getConfig("stream.enabled", false);
if (streamEnabled) {
    mqttPublisher = new MQTTPublisher(mqttAddress);
}

[SOURCE]

The mqttPublisher can now be used by other components of loklak to publish messages to the channel they want.

Adding Mosquitto to Kubernetes

Since loklak has also a nice Kubernetes setup, it was very simple to introduce a new deployment for Mosquitto to it.

Changes in Dockerfile

The Dockerfile for master deployment has to be modified to discover Mosquitto broker in the Kubernetes cluster. For this purpose, corresponding flags in config.properties have to be changed to ensure that things work fine –

sed -i.bak 's/^\(stream.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(stream.mqtt.address\).*/\1=mosquitto.mqtt:1883/' conf/config.properties && \

[SOURCE]

The Mosquitto broker would be available at mosquitto.mqtt:1883 because of the service that is created for it (explained in later section).

Mosquitto Deployment

The Docker image used in Kubernetes deployment of Mosquitto is taken from toke/docker-kubernetes. Two ports are exposed for the cluster but no volumes are needed –

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mosquitto
  namespace: mqtt
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: mosquitto
        image: toke/mosquitto
        ports:
        - containerPort: 9001
        - containerPort: 8883

[SOURCE]

Exposing Mosquitto to the Cluster

Now that we have the deployment running, we need to expose the required ports to the cluster so that other components may use it. The port 9001 appears as port 80 for the service and 1883 is also exposed –

apiVersion: v1
kind: Service
metadata:
  name: mosquitto
  namespace: mqtt
  ...
spec:
  ...
  ports:
  - name: mosquitto
    port: 1883
  - name: mosquitto-web
    port: 80
    targetPort: 9001

[SOURCE]

After creating the service using this configuration, we will be able to connect our clients to Mosquitto at address mosquitto.mqtt:1883.

Conclusion

In this blog post, I discussed the process of adding Mosquitto to the loklak server project. This is the first step towards introducing the stream API for messages collected in loklak.

These changes were introduced in pull requests loklak/loklak_server#1393 and loklak/loklak_server#1398 by @singhpratyush (me).

Resources

Continue Reading

Deploying loklak Server on Kubernetes with External Elasticsearch

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

kubernetes.io

Kubernetes is an awesome cloud platform, which ensures that cloud applications run reliably. It runs automated tests, flawless updates, smart roll out and rollbacks, simple scaling and a lot more.

So as a part of GSoC, I worked on taking the loklak server to Kubernetes on Google Cloud Platform. In this blog post, I will be discussing the approach followed to deploy development branch of loklak on Kubernetes.

New Docker Image

Since Kubernetes deployments work on Docker images, we needed one for the loklak project. The existing image would not be up to the mark for Kubernetes as it contained the declaration of volumes and exposing of ports. So I wrote a new Docker image which could be used in Kubernetes.

The image would simply clone loklak server, build the project and trigger the server as CMD

FROM alpine:latest

ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

WORKDIR /loklak_server

RUN apk update && apk add openjdk8 git bash && \
    git clone https://github.com/loklak/loklak_server.git /loklak_server && \
    git checkout development && \
    ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport && \
    # Some Configurations and Cleanups

CMD ["bin/start.sh", "-Idn"]

[SOURCE]

This image wouldn’t have any volumes or exposed ports and we are now free to configure them in the configuration files (discussed in a later section).

Building and Pushing Docker Image using Travis

To automatically build and push on a commit to the master branch, Travis build is used. In the after_success section, a call to push Docker image is made.

Travis environment variables hold the username and password for Docker hub and are used for logging in –

docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD

[SOURCE]

We needed checks there to ensure that we are on the right branch for the push and we are not handling a pull request –

# Build and push Kubernetes Docker image
KUBERNETES_BRANCH=loklak/loklak_server:latest-kubernetes-$TRAVIS_BRANCH
KUBERNETES_COMMIT=loklak/loklak_server:kubernetes-$TRAVIS_COMMIT
  
if [ "$TRAVIS_BRANCH" == "development" ]; then
    docker build -t loklak_server_kubernetes kubernetes/images/development
    docker tag loklak_server_kubernetes $KUBERNETES_BRANCH
    docker push $KUBERNETES_BRANCH
    docker tag $KUBERNETES_BRANCH $KUBERNETES_COMMIT
    docker push $KUBERNETES_COMMIT
elif [ "$TRAVIS_BRANCH" == "master" ]; then
    # Build and push master
else
    echo "Skipping Kubernetes image push for branch $TRAVIS_BRANCH"
fi

[SOURCE]

Kubernetes Configurations for loklak

Kubernetes cluster can completely be configured using configurations written in YAML format. The deployment of loklak uses the previously built image. Initially, the image tagged as latest-kubernetes-development is used –

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: server
  namespace: web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: server
        image: loklak/loklak_server:latest-kubernetes-development
        ...

[SOURCE]

Readiness and Liveness Probes

Probes act as the top level tester for the health of a deployment in Kubernetes. The probes are performed periodically to ensure that things are working fine and appropriate steps are taken if they fail.

When a new image is updated, the older pod still runs and servers the requests. It is replaced by the new ones only when the probes are successful, otherwise, the update is rolled back.

In loklak, the /api/status.json endpoint gives information about status of deployment and hence is a good target for probes –

livenessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3
readinessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3

[SOURCE]

These probes are performed periodically and the server is restarted if they fail (non-success HTTP status code or takes more than 3 seconds).

Ports and Volumes

In the configurations, port 80 is exposed as this is where Jetty serves inside loklak –

ports:
- containerPort: 80
  protocol: TCP

[SOURCE]

If we notice, this is the port that we used for running the probes. Since the development branch deployment holds no dumps, we didn’t need to specify any explicit volumes for persistence.

Load Balancer Service

While creating the configurations, a new public IP is assigned to the deployment using Google Cloud Platform’s load balancer. It starts listening on port 80 –

ports:
- containerPort: 80
  protocol: TCP

[SOURCE]

Since this service creates a new public IP, it is recommended not to replace/recreate this services as this would result in the creation of new public IP. Other components can be updated individually.

Kubernetes Configurations for Elasticsearch

To maintain a persistent index, this deployment would require an external Elasticsearch cluster. loklak is able to connect itself to external Elasticsearch cluster by changing a few configurations.

Docker Image and Environment Variables

The image used for Elasticsearch is taken from pires/docker-elasticsearch-kubernetes. It allows easy configuration of properties from environment variables in configurations. Here is a list of configurable variables, but we needed just a few of them to do our task –

image: quay.io/pires/docker-elasticsearch-kubernetes:2.0.0
env:
- name: KUBERNETES_CA_CERTIFICATE_FILE
  value: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- name: NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: "CLUSTER_NAME"
  value: "loklakcluster"
- name: "DISCOVERY_SERVICE"
  value: "elasticsearch"
- name: NODE_MASTER
  value: "true"
- name: NODE_DATA
  value: "true"
- name: HTTP_ENABLE
  value: "true"

[SOURCE]

Persistent Index using Persistent Cloud Disk

To make the index last even after the deployment is stopped, we needed a stable place where we could store all that data. Here, Google Compute Engine’s standard persistent disk was used. The disk can be created using GCP web portal or the gcloud CLI.

Before attaching the disk, we need to declare a volume where we could mount it –

volumeMounts:
- mountPath: /data
  name: storage

[SOURCE]

Now that we have a volume, we can simply mount the persistent disk on it –

volumes:
- name: storage
  gcePersistentDisk:
    pdName: data-index-disk
    fsType: ext4

[SOURCE]

Now, whenever we deploy these configurations, we can reuse the previous index.

Exposing Kubernetes to Cluster

The HTTP and transport clients are enabled on port 9200 and 9300 respectively. They can be exposed to the rest of the cluster using the following service –

apiVersion: v1
kind: Service
...
Spec:
  ...
  ports:
  - name: http
    port: 9200
    protocol: TCP
  - name: transport
    port: 9300
    protocol: TCP

[SOURCE]

Once deployed, other deployments can access the cluster API from ports 9200 and 9300.

Connecting loklak to Kubernetes

To connect loklak to external Elasticsearch cluster, TransportClient Java API is used. In order to enable these settings, we simply need to make some changes in configurations.

Since we enable the service named “elasticsearch” in namespace “elasticsearch”, we can access the cluster at address elasticsearch.elasticsearch:9200 (web) and elasticsearch.elasticsearch:9300 (transport).

To confine these changes only to Kubernetes deployment, we can use sed command while building the image (in Dockerfile) –

sed -i.bak 's/^\(elasticsearch_transport.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(elasticsearch_transport.addresses\).*/\1=elasticsearch.elasticsearch:9300/' conf/config.properties && \

[SOURCE]

Now when we create the deployments in Kubernetes cluster, loklak auto connects to the external elasticsearch index and creates indices if needed.

Verifying persistence of the Elasticsearch Index

In order to see that the data persists, we can completely delete the deployment or even the cluster if we want. Later, when we recreate the deployment, we can see all the messages already present in the index.

I  [2017-07-29 09:42:51,804][INFO ][node                     ] [Hellion] initializing ...
 
I  [2017-07-29 09:42:52,024][INFO ][plugins                  ] [Hellion] loaded [cloud-kubernetes], sites []
 
I  [2017-07-29 09:42:52,055][INFO ][env                      ] [Hellion] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [84.9gb], net total_space [97.9gb], spins? [possibly], types [ext4]
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] initialized
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] starting ...
 
I  [2017-07-29 09:42:53,620][INFO ][transport                ] [Hellion] publish_address {10.8.1.13:9300}, bound_addresses {10.8.1.13:9300}
 
I  [2017-07-29 09:42:53,633][INFO ][discovery                ] [Hellion] loklakcluster/cJtXERHETKutq7nujluJvA
 
I  [2017-07-29 09:42:57,866][INFO ][cluster.service          ] [Hellion] new_master {Hellion}{cJtXERHETKutq7nujluJvA}{10.8.1.13}{10.8.1.13:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
 
I  [2017-07-29 09:42:57,955][INFO ][http                     ] [Hellion] publish_address {10.8.1.13:9200}, bound_addresses {10.8.1.13:9200}
 
I  [2017-07-29 09:42:57,955][INFO ][node                     ] [Hellion] started
 
I  [2017-07-29 09:42:58,082][INFO ][gateway                  ] [Hellion] recovered [8] indices into cluster_state

In the last line from the logs, we can see that indices already present on the disk were recovered. Now if we head to the public IP assigned to the cluster, we can see that the message count is restored.

Conclusion

In this blog post, I discussed how we utilised the Kubernetes setup to shift loklak to Google Cloud Platform. The deployment is active and can be accessed from the link provided under wiki section of loklak/loklak_server repo.

I introduced these changes in pull request loklak/loklak_server#1349 with the help of @niranjan94, @uday96 and @chiragw15.

Resources

Continue Reading

Scraping Concurrently with Loklak Server

At Present, SearchScraper in Loklak Server uses numerous threads to scrape Twitter website. The data fetched is cleaned and more data is extracted from it. But just scraping Twitter is under-performance.

Concurrent scraping of other websites like Quora, Youtube, Github, etc can be added to diversify the application. In this way, single endpoint search.json can serve multiple services.

As this Feature is under-refinement, We will discuss only the basic structure of the system with new changes. I tried to implement more abstract way of Scraping by:-

1) Fetching the input data in SearchServlet

Instead of selecting the input get-parameters and referencing them to be used, Now complete Map object is referenced, helping to be able to add more functionality based on input get-parameters. The dataArray object (as JSONArray) is fetched from DAO.scrapeLoklak method and is embedded in output with key results

    // start a scraper
    inputMap.put("query", query);
    DAO.log(request.getServletPath() + " scraping with query: "
           + query + " scraper: " + scraper);
    dataArray = DAO.scrapeLoklak(inputMap, true, true);

 

2) Scraping the selected Scrapers concurrently

In DAO.java, the useful get parameters of inputMap are fetched and cleaned. They are used to choose the scrapers that shall be scraped, using getScraperObjects() method.

Timeline2.Order order= getOrder(inputMap.get("order"));
Timeline2 dataSet = new Timeline2(order);
List<String> scraperList = Arrays.asList(inputMap.get("scraper").trim().split("\\s*,\\s*"));

 

Threads are created to fetch data from different scrapers according to size of list of scraper objects fetched. input map is passed as argument to the scrapers for further get parameters related to them and output data according to them.

List<BaseScraper> scraperObjList = getScraperObjects(scraperList, inputMap);
ExecutorService scraperRunner = Executors.newFixedThreadPool(scraperObjList.size());

try{
    for (BaseScraper scraper : scraperObjList)
    {
        scraperRunner.execute(() -> {
            dataSet.mergePost(scraper.getData());
        });

    }

} finally {
    scraperRunner.shutdown();

    try {
        scraperRunner.awaitTermination(24L, TimeUnit.HOURS);
    } catch (InterruptedException e) { }
}

 

3) Fetching the selected Scraper Objects in DAO.java

Here the variable of abstract class BaseScraper (SuperClass of all search scrapers) is used to create List of scrapers to be scraped. All the scrapers’ constructors are fed with input map to be scraped accordingly.

List<BaseScraper> scraperObjList = new ArrayList<BaseScraper>();
BaseScraper scraperObj = null;

if (scraperList.contains("github") || scraperList.contains("all")) {
    scraperObj = new GithubProfileScraper(inputMap);
    scraperObjList.add(scraperObj);
}
.
.
.

 

References:

Continue Reading

Caching Elasticsearch Aggregation Results in loklak Server

To provide aggregated data for various classifiers, loklak uses Elasticsearch aggregations. Aggregated data speaks a lot more than a few instances from it can say. But performing aggregations on each request can be very resource consuming. So we needed to come up with a way to reduce this load.

In this post, I will be discussing how I came up with a caching model for the aggregated data from the Elasticsearch index.

Fields to Consider while Caching

At the classifier endpoint, aggregations can be requested based on the following fields –

  • Classifier Name
  • Classifier Classes
  • Countries
  • Start Date
  • End Date

But to cache results, we can ignore cases where we just require a few classes or countries and store aggregations for all of them instead. So the fields that will define the cache to look for will be –

  • Classifier Name
  • Start Date
  • End Date

Type of Cache

The data structure used for caching was Java’s HashMap. It would be used to map a special string key to a special object discussed in a later section.

Key

The key is built using the fields mentioned previously –

private static String getKey(String index, String classifier, String sinceDate, String untilDate) {
    return index + "::::"
        + classifier + "::::"
        + (sinceDate == null ? "" : sinceDate) + "::::"
        + (untilDate == null ? "" : untilDate);
}

[SOURCE]

In this way, we can handle requests where a user makes a request for every class there is without running the expensive aggregation job every time. This is because the key for such requests will be same as we are not considering country and class for this purpose.

Value

The object used as key in the HashMap is a wrapper containing the following –

  1. json – It is a JSONObject containing the actual data.
  2. expiry – It is the expiry of the object in milliseconds.

class JSONObjectWrapper {
    private JSONObject json; 
    private long expiry;
    ... 
}

Timeout

The timeout associated with a cache is defined in the configuration file of the project as “classifierservlet.cache.timeout”. It defaults to 5 minutes and is used to set the eexpiryof a cached JSONObject –

class JSONObjectWrapper {
    ...
    private static long timeout = DAO.getConfig("classifierservlet.cache.timeout", 300000);

    JSONObjectWrapper(JSONObject json) {
        this.json = json;
        this.expiry = System.currentTimeMillis() + timeout;
    }
    ...
}

 

Cache Hit

For searching in the cache, the previously mentioned string is composed from the parameters requested by the user. Checking for a cache hit can be done in the following manner –

String key = getKey(index, classifier, sinceDate, untilDate);
if (cacheMap.keySet().contains(key)) {
    JSONObjectWrapper jw = cacheMap.get(key);
    if (!jw.isExpired()) {
        // Do something with jw
    }
}
// Calculate the aggregations
...

But since jw here would contain all the data, we would need to filter out the classes and countries which are not needed.

Filtering results

For filtering out the parts which do not contain the information requested by the user, we can perform a simple pass and exclude the results that are not needed.

Since the number of fields to filter out, i.e. classes and countries, would not be that high, this process would not be that resource intensive. And at the same time, would save us from requesting heavy aggregation tasks from the user.

Since the data about classes is nested inside the respective country field, we need to perform two level of filtering –

JSONObject retJson = new JSONObject(true);
for (String key : json.keySet()) {
    JSONArray value = filterInnerClasses(json.getJSONArray(key), classes);
    if ("GLOBAL".equals(key) || countries.contains(key)) {
        retJson.put(key, value);
    }
}

Cache Miss

In the case of a cache miss, the helper functions are called from ElasticsearchClient.java to get results. These results are then parsed from HashMap to JSONObject and stored in the cache for future usages.

JSONObject freshCache = getFromElasticsearch(index, classifier, sinceDate, untilDate);
cacheMap.put(key, new JSONObjectWrapper(freshCache));

The getFromElasticsearch method finds all the possible classes and makes a request to the appropriate method in ElasticsearchClient, getting data for all classifiers and all countries.

Conclusion

In this blog post, I discussed the need for caching of aggregations and the way it is achieved in the loklak server. This feature was introduced in pull request loklak/loklak_server#1333 by @singhpratyush (me).

Resources

Continue Reading

Implementing Tweet Search feature in Loklak Wok Android

Loklak Wok Android is a peer harvester that posts collected tweets to the Loklak Server. Along with that tweets can be searched using the app. This post describes how search API endpoint and TabLayout is used to implement the tweet searching feature.

Adding Dependencies to the project

This feature uses Retrofit2, Reactive extensions(RxJava2, RxAndroid and Retrofit RxJava adapter) and RetroLambda (for Java lambda support in Android).

In app/build.gradle:

apply plugin: 'com.android.application'
apply plugin: 'me.tatarka.retrolambda'

android {
   ...
   packagingOptions {
       exclude 'META-INF/rxjava.properties'
   }
}

dependencies {
   ...
   compile 'com.google.code.gson:gson:2.8.1'

   compile 'com.squareup.retrofit2:retrofit:2.3.0'
   compile 'com.squareup.retrofit2:converter-gson:2.3.0'
   compile 'com.squareup.retrofit2:adapter-rxjava2:2.3.0'

   compile 'io.reactivex.rxjava2:rxjava:2.0.5'
   compile 'io.reactivex.rxjava2:rxandroid:2.0.1'
}

 

In build.gradle project level:

dependencies {
   classpath 'com.android.tools.build:gradle:2.3.3'
   classpath 'me.tatarka:gradle-retrolambda:3.2.0'
}

 

Implementation

The search API endpoint is defined in LoklakApi interface which would provide the tweet search result.

public interface LoklakApi {

   @GET("api/search.json")
   Observable<Search> getSearchedTweets(
           @Query("q") String query,
           @Query("filter") String filter,
           @Query("count") int count);
}

 

The POJOs (Plain Old Java Objects) for the result of search API endpoint are obtained using jsonschema2pojo, Gson uses POJOs to convert JSON to Java objects and vice-versa.

The REST client is created by Retrofit2 and is implemented in RestClient class. The Gson converter and RxJava adapter for retrofit is added in the retrofit builder. create method is called to generate the API methods(retrofit implements LoklakApi Interface).

public class RestClient {

   private RestClient() {
   }

   private static void createRestClient() {
       sRetrofit = new Retrofit.Builder()
               .baseUrl(BASE_URL)
               // gson converter
               .addConverterFactory(GsonConverterFactory.create(gson))
               // retrofit adapter for rxjava
               .addCallAdapterFactory(RxJava2CallAdapterFactory.create())
               .build();
   }

   private static Retrofit getRetrofitInstance() {
       if (sRetrofit == null) {
           createRestClient();
       }
       return sRetrofit;
   }

   public static <T> T createApi(Class<T> apiInterface) {
       // create method to generate API methods
       return getRetrofitInstance().create(apiInterface);
   }

}

 

As search API endpoint provides filter parameter which can be used to filter out tweets containing images and videos. So, the tweets are displayed in three categories i.e. latest, images and videos.

The tweets of different category are displayed using a ViewPager. The fragments in ViewPager are inflated by a class that extends FragmentPagerAdapter. SearchFragmentPagerAdapter extends FragmentPagerAdapter, at least two methods getItem and getCount needs to be overridden. Going by the name of methods, getItem provides ith fragment to the  ViewPager and based on the value returned by getCount number of tabs are inflated in TabLayout, a ViewGroup to display fragments in ViewPager in an elegant way. For better UI, the names (here the category of tweets) are displayed, for which we override getPageTitle method.

public class SearchFragmentPagerAdapter extends FragmentPagerAdapter {

   private List<Fragment>  mFragmentList = new ArrayList<>();
   private List<String> mFragmentNameList = new ArrayList<>();

   public SearchFragmentPagerAdapter(FragmentManager fm) {
       super(fm);
   }

   @Override
   public Fragment getItem(int position) {
       return mFragmentList.get(position);
   }

   @Override
   public int getCount() {
       return mFragmentList.size();
   }

   @Override
   public CharSequence getPageTitle(int position) {
       return mFragmentNameList.get(position);
   }

   public void addFragment(Fragment fragment, String pageTitle) {
       mFragmentList.add(fragment);
       mFragmentNameList.add(pageTitle);
   }
}

 

For easy understanding an analogy with RecyclerView can be made. The TabLayout here functions as a RecyclerView, ViewPager does the work of LayoutManager and FragmentPagerAdapter is analogous to RecyclerView.Adapter.

Now, the fragments which contain the categorical tweets are inflated in the parent fragment. Firstly, the ViewPager of TabLayout is set. Then fragments and their names are added to the FragmentPagerAdapter using the addFragment method implemented in SearchFragmentAdapter class above and finally the created adapter is set as the adapter of ViewPager, which is implemented in setupWithViewPager method.

@Override
public View onCreateView(LayoutInflater inflater, ViewGroup container,
                        Bundle savedInstanceState) {
   // Inflate the layout for this fragment
   View rootView = inflater.inflate(R.layout.fragment_search, container, false);
   ButterKnife.bind(this, rootView);
   ...
   tabLayout.setupWithViewPager(viewPager);
   setupViewPager(viewPager);

   return rootView;
}

private void setupViewPager(ViewPager viewPager) {
   FragmentManager fragmentManager = getActivity().getSupportFragmentManager();
   SearchFragmentPagerAdapter pagerAdapter = new SearchFragmentPagerAdapter(fragmentManager);
   pagerAdapter.addFragment(SearchCategoryFragment.newInstance("", mQuery), "LATEST");
   pagerAdapter.addFragment(SearchCategoryFragment.newInstance("image", mQuery), "PHOTOS");
   pagerAdapter.addFragment(SearchCategoryFragment.newInstance("video", mQuery), "VIDEOS");
   viewPager.setAdapter(pagerAdapter);
}

 

SearchCategoryFragment are child fragments displayed as tabs in TabLayout. These child fragments are created using newInstance method which takes two parameters, category of tweets and the tweet search query respectively, the reason a constructor with these parameters are not used is that during a orientation change only the default constructor i.e. with no parameters is restored by Android system. So, these parameters are stored in a data structure called Bundle, once the fragment object is created using the default parameter the arguments present in the bundle are passed to fragment using setArguments method. These parameter are retrieved in onCreate lifecycle callback method of fragment which are used to fetch search results.

public static SearchCategoryFragment newInstance(String category, String query) {
   Bundle args = new Bundle();
   // query and category stored in bundle
   args.putString(Constants.TWEET_SEARCH_SUGGESTION_QUERY_KEY, query);
   args.putString(TWEET_SEARCH_CATEGORY_KEY, category);
   // fragment with default constructor created
   SearchCategoryFragment fragment = new SearchCategoryFragment();
   // arguments in bundle are passed to fragment
   fragment.setArguments(args);
   return fragment;
}

@Override
public void onCreate(@Nullable Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
   Bundle bundle = getArguments();
   if (bundle != null) {
       // arguments retrieved
       mTweetSearchCategory = bundle.getString(TWEET_SEARCH_CATEGORY_KEY);
       mSearchQuery = bundle.getString(Constants.TWEET_SEARCH_SUGGESTION_QUERY_KEY);
   }
}

 

As we have search query and category we can now obtain the search result and pass the obtained result – a List of type Status – to the adapter of RecyclerView which shows the tweets beautifully inside a CardView. The adapter and LayoutManager of RecyclerView are instantiated and set in onCreateView lifecycle callback method. Finally, network request is sent by calling fetchSearchedTweets method to obtain the search results.

@Override
public View onCreateView(LayoutInflater inflater, ViewGroup container,
                        Bundle savedInstanceState) {
   // Inflate the layout for this fragment
   View view = inflater.inflate(R.layout.fragment_search_category, container, false);
   ButterKnife.bind(this, view);

   mSearchCategoryAdapter = new SearchCategoryAdapter(getActivity(), new ArrayList<>());
   recyclerView.setLayoutManager(new LinearLayoutManager(getActivity()));
   recyclerView.setAdapter(mSearchCategoryAdapter);

   // request sent to obtain search result.
   fetchSearchedTweets();

   return view;
}

 

The LoklakApi interface is implemented using the created Rest client and then getSearchedTweets method is invoked which takes in search query, category of tweets and maximum number of results in the mentioned order. If the network request is successful then setSearchResultView is invoked else setNetworkErrorView.

private void fetchSearchedTweets() {
   LoklakApi loklakApi = RestClient.createApi(LoklakApi.class);
   loklakApi.getSearchedTweets(mSearchQuery, mTweetSearchCategory, 30)
           .subscribeOn(Schedulers.io()) // network request sent in a background thread
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(this::setSearchResultView, this::setNetworkErrorView);
}

 

setSearchResultView displays the obtained result if any in RecyclerView else shows a message that there is no result for the search query.

private void setSearchResultView(Search search) {
   List<Status> statusList = search.getStatuses();
   networkErrorTextView.setVisibility(View.GONE);
   if (statusList.size() == 0) { // request successful but no results
       recyclerView.setVisibility(View.GONE);

       Resources res = getResources();
       String noSearchResultMessage = res.getString(R.string.no_search_match, mSearchQuery); // no result matched message
       // no result message displayed
       noSearchResultFoundTextView.setVisibility(View.VISIBLE);
       noSearchResultFoundTextView.setText(noSearchResultMessage);
   } else { // there are some results, so display them in RecyclerView
       recyclerView.setVisibility(View.VISIBLE);
       mSearchCategoryAdapter.setStatuses(statusList);
   }
}

 

In case of a failed network request, a TextView is displayed asking the user to check the network connections and click on it to retry.

private void setNetworkErrorView(Throwable throwable) {
   Log.e(LOG_TAG, throwable.toString());
   recyclerView.setVisibility(View.GONE);
   networkErrorTextView.setVisibility(View.VISIBLE);
}

 

When the TextView, networkErrorTextView, is clicked a network request is sent again and the visibility of networkErrorTextView is changed to GONE, as implemented in setOnClickNetworkErrorTextViewListner

@OnClick(R.id.network_error)
public void setOnClickNetworkErrorTextViewListener() {
   networkErrorTextView.setVisibility(View.GONE);
   fetchSearchedTweets();
}

 

References:

Resources

Continue Reading

Creating CountryTweetMap app for Loklak apps site

The CountryTweetMap app is a web application which uses Loklak api and visualises loklak data on a map. The app is presently a part of Loklak apps site and can be used here. In this blog I have discussed in details about how I have developed CountryTweetMap. Before delving right into the development process let us know what the app does?

Related issue: https://github.com/fossasia/apps.loklak.org/issues/244

Overview

Loklak CoutryTweetMap uses a map to plot data returned by the Loklak api. The user needs to enter a query word in the input field and press search. The app uses Loklak API to determine all the tweets which contain the query words. Finally the various places from where tweets have been made containing the given query word are plotted on the map with markers of  specific color. Color can be red (highest number of tweets), blue (medium number of tweets) and green (less number of tweets). The three categories of markers are added as layers. That is, users can filter the markers. For example, the users can view only the green markers and hide the other markers if they want to know from which countries the number of tweets are low. Once the data is plotted there is also an option to plot distribution. It simply lists the returned data in a tabular form showing country v/s number of tweets.

Developing CountryTweetMap app

Now we know what CountryTweetMap does. Let us find out how it works. Like other loklak apps present on the app site, this app also uses the loklak API to fetch the data. Next it uses a framework called leaflet.js to plot the acquired data on map. ‘leaflet.js’ provides various APIs to deal with maps and corresponding layers. For date input, the app uses jquery UI, a library based on jquery which provides various ready to use components.

Let us iterate through each step involved in creating the app. At first we take input from user and get the data from Loklak server. This is done by a simple ajax call. However before making the ajax call and actually getting the data, we need to ensure certain things. Firstly, we need to make sure that the user has actually entered something in the query field, dates are in order, that is start date is less than end date and count is a number. All these checks are done in the search function itself.

$scope.error = "";
        if ($scope.tweet === undefined || $scope.tweet === "" || $scope.isLoading === true) {
            if ($scope.tweet === undefined || $scope.tweet === "") {
                $scope.error = "Please enter a valid query word."
            }
            if ($scope.isLoading === true) {
                $scope.error = "Previous search not completed. Please wait...";
            }
            $scope.showSnackbar();
            return;
        }

        var count = $(".count").val();
        if (count.length !== 0) {
            if (/^\d+$/.test(count) === false) {
                $scope.error = "Count should be a valid number.";
                $scope.showSnackbar();
                return;
            }
        }

        var sinceDate = $(".start-date").val();
        var endDate = $(".end-date").val();
        if ((sinceDate !== undefined && endDate !== "") && (endDate !== undefined && endDate !== "")) {
            var date1 = new Date(sinceDate);
            var date2 = new Date(endDate);
            if (endDate < sinceDate) {
                $scope.error = "End date should be larger than start date";
                $scope.showSnackbar();
                return;
            }
        }

We first check that the query field is not empty and whether the app is already processing some query or not. In either case we show a suitable message in a snackbar and return from the function. Next we validate the dates. For date comparison we use the inbuilt Date class already present in javascript. For validating the count field we use regex for identifying only unsigned numbers. Finally if everything is alright we fetch the the data making an ajax call to the loklak search service.

$scope.isLoading = true;
        var query = "q=" + $scope.tweet;

        if (sinceDate !== undefined && sinceDate !== "" ) {
            query += "%20since:" + sinceDate;
        }
        if (endDate !== undefined && endDate !== "") {
            query += "%20until:" + endDate;
        }

        // Change base url to api.loklak.org later
        var url = "http://35.184.151.104/api/search.json?callback=JSON_CALLBACK&" + query;
        var count = $(".count").val();
        if (count !== undefined && count !== "") {
            url  += "&count=" + count;
        }
        $http.jsonp(url)
            .then(function (response) {
                $scope.reset();
                $scope.prepareFreq(response.data.statuses);
                $scope.displayMap();
            });

Once we get the desired data from the ajax call, we prepare a country v/s number of tweets frequency and store it in a javascript object and then plot the map. For calculating the frequency we simply iterate over each status returned, insert each country code as key in the object, and corresponding statuses as value in a list. In this way we create a country v/s tweets distribution where against each country code we have a list of statuses.

data.forEach(function(status) {
            if (status.place_country_code !== undefined) {
                if ($scope.tweetFreq[status.place_country_code] === undefined) {
                    $scope.tweetFreq[status.place_country_code] = [];
                }
                $scope.tweetFreq[status.place_country_code].push(status);
            }
        });

Next we need a suitable point which will act as the center of the world map. We need to chose the center in such a way so that maximum number of markers are visible without any need to scroll. For this we calculate the average latitude and longitude and feed it to leaflet as the map center. Once we have all the data in place, we are ready to plot the map. But before that we need to create our markers. For each country we require a marker and the color of the marker is determined by the number of tweets associated with the corresponding country.
Next, we need to group all the markers of the same color into a single layer so that the user can add and remove all the markers at once.

for (var key in $scope.tweetFreq) {
            var country = $scope.tweetFreq[key][0];
            var size = $scope.tweetFreq[key].length;
            var markerIcon = null;
            var marker = null;
            if (size === $scope.tweetMax) {
                markerIcon = L.marker([country.place_country_center[1], country.place_country_center[0]], {icon: redIcon});
                marker = $scope.getMarker(markerIcon, country, key);
                countriesHigh.push(marker);
            } else if (size > rangeMid) {
                markerIcon = L.marker([country.place_country_center[1], country.place_country_center[0]], {icon: blueIcon});
                marker = $scope.getMarker(markerIcon, country, key);
                countriesMedium.push(marker)
            } else {
                markerIcon = L.marker([country.place_country_center[1], country.place_country_center[0]], {icon: greenIcon});
                marker = $scope.getMarker(markerIcon, country, key);
                countriesLow.push(marker);
            }
        }

The above code snippet classifies the markers into three different classes (identified by three colors) and groups them into layers. Finally we need to plot them on our map. This is extremely easy with the simple interface provided by leaflet.

var backgroundLight = L.tileLayer(
            'http://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}.png',
            {
                maxZoom: 18,
                minZoom: 1,
                noWrap: true
            });
        $scope.map = L.map('map', {
            center: [mapCenterLat, mapCenterLon],
            zoom: 2,
            'maxBounds': [
                [-90,-180],
                [90, 180]
            ],
            layers: [backgroundLight, countryHighGroup, countryMediumGroup, countryLowGroup]
        });
       var overlayMaps = {
            "Maximum tweets": countryHighGroup,
            "Medium tweets": countryMediumGroup,
            "Low tweets": countryLowGroup,
            "Heatmap": heat
        };
        var baseMaps = {
            "basemap": backgroundLight
        };
        L.control.layers(baseMaps, overlayMaps).addTo($scope.map);

What is happening in the above code snippet? First we create a title layer for our map. The title consists of the map tiles which we actually see. Here we are using basemaps from cartocdn.com. There are several other map layouts like openlayer and openstreets. Next we specify the zoom values for this layer and set nowrap to false. This means that the map tiles will not get repeated and we will not get multiple copies of the world.
Next we create our map, we set the map’s center, zoom and bounds. Bounds restrict the users from scrolling beyond a particular portion of the map. Finally we set our layers. We have four layers in our map, the background layer containing our map tiles and the three marker layers signifying high, medium and low number of tweets. We separate the layers into two categories, baseMaps and overlayMaps. After this we add these layers to our map and we are done.

Future roadmap

  • Adding a heatmap overlay.
  • Visualising country v/s tweet frequency using a suitable graph.

Important resources

Continue Reading
Close Menu