Deploy to Azure Button for loklak

In this blog post, am going to tell you about yet a new deployment method for loklak which is easy and quick with just one click. Deploying to Azure Websites from a Git repository just got a little easier with the Deploy to Azure Button. Simply place the button in README.md with a link to the loklak, and users who click on it will be directed to a streamlined deployment process. If we want to do something more advanced and customize this behavior, then add an ARM template called “azuredeploy.json” at the root of the repository which will cause users to be presented with different inputs and configure your services as specified.

I’m going to walk you through a workflow that I used to test them before checking them in to my repo, as well as describe some of the special behaviors that the “Deploy to Azure” site does

Adding a button

To add a deployment button, insert the following markdown to your README.md file:

[![Deploy to Azure](https://azuredeploy.net/deploybutton.svg)](https://deploy.azure.com/?repository=https://github.com/loklak/loklak_server)

How it works

When a user clicks on the button, a “referrer” header is sent to azuredeploy.net which contains the location of the Git repository of loklak_server to deploy from.

An Example Template

This is a blank template which shows, how the azure divides its inputs.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
  },
  "variables": {
  },
  "resources": [
  ],
  "outputs": {
  }
}

By following the above template, in the case of loklak server, the parameters used are name, image i.e., docker image, port , number of CPUs to be utilized and space i.e., memory required.

In the resources section we use container, the type of the container will be

"type": "Microsoft.ContainerInstance/containerGroups",

 

And as output, we expect a public IP address to access the azure cloud instance created by us.

Everything under the root “parameters” property will be inputs into our template. Then these parameter values feed into the resources defined later in the template with the “[parameters(‘paramName’)]” syntax.

Try the “Deploy to Azure” Button here:



Resources

Working of One Click Deployment Buttons in loklak

Today’s topic is deployment. It’s called one-click deployment for a reason: Developers are lazy. It’s hard to do less than clicking on one button, so that’s our goal to make use of one click button in loklak.

For one click buttons we only need a central build server, which is our loklak_server. Everything written here was based on Apache ant, but later on ant build was deprecated and loklak server started to use gradle build. We wanted to make the process of provisioning and setting up a complete infrastructure of your own, from server to continuous integration tasks, as easy as possible. These button allows you to do all of that in one click.

How does it work?

You can see the one click buttons in the README page of loklak_server repository.

These repositories may include a different files like scalingo.json for scalingo, docker-compose.yml and docker-cloud.yml for docker cloud etc files at their root, allowing them to define a few things like a name, description, logo and build environment (Gradle build in the case of loklak server). Once you’ve clicked on any of the buttons, you will be redirected to respective apps and prompted with this information for you to review before confirming the fork.

This will effectively fork the repository in your account. Once the repo is ready, you can click on it. You will then be asked to “activate” or “deploy” your branch, allowing it to provision actual servers and run tasks. At the same time, you will be asked to review and potentially modify a few variables that were defined in the predefined files (for eg: app.json for heroku) of the apps. These are usually things like the Git URL of the repo for loklak, or some of the details related to the cloud provider you want to use (eg: Digital Ocean).

Once you confirmed this last step, your branch i.e., most probably master branch of loklak server repo is activated and the button will start provisioning and configuring your servers, along with the tasks which may allow you to build and deploy your app. In most of the cases, you can go to the tasks/setup section and run the build task that will fetch loklak server’s code, build it and deploy it on your server, all configurations included and will get a public IP.

What’s next

In loklak we are also introducing new one click “AZURE” button, then the users can also start deploying loklak in azure platform.

Resources

Auto Deploying loklak Server on Google Cloud Using Travis

This is a setup for loklak server which want to check in only the source files, but have the development branch in Kubernetes deployment automatically updated with some compiled output every time the push using details from Travis build.

How to achieve it?

Unix commands and shell script is one of the best option to automate all deployment and build activities. I explored Kubernetes Gcloud which can be accessed through unix command.

1.Checking for Travis build details before deployment:

Firstly check whether the repository is loklak_server, pull request is available and branches are either master or development, and then decide to update the docker image or not. The code of the aforementioned things is as follows:

if [ "$TRAVIS_REPO_SLUG" != "loklak/loklak_server" ]; then
    echo "Skipping image update for repo $TRAVIS_REPO_SLUG"
    exit 0
fi

if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then
    echo "Skipping image update for pull request"
    exit 0
fi

if [ "$TRAVIS_BRANCH" != "master" ] && [ "$TRAVIS_BRANCH" != "development" ]; then
    echo "Skipping image update for branch $TRAVIS_BRANCH"
    exit 0
fi

2. Setting up Tag and Decrypting the credentials:

For the Kubernetes deployment, each time the travis build is successful, it takes the commit details from travis and appended into tag details for deployment and gcloud credentials is decrypted from the json file.

openssl aes-256-cbc -K $encrypted_48d01dc243a6_key -iv $encrypted_48d01dc243a6_iv  -in kubernetes/gcloud-credentials.json.enc -out kubernetes/gcloud-credentials.json -d

3. Install, Authenticate and Configure GCloud details with Kubernetes:

In this step, initially Google Cloud SDK should be installed with Kubernetes-

curl https://sdk.cloud.google.com | bash > /dev/null
source ~/google-cloud-sdk/path.bash.inc
gcloud components install kubectl

 

Then, Authenticate Google Cloud using the above mentioned decrypted credentials and finally configure the Google Cloud with the details like zone, project name, cluster details, number of nodes etc.

4. Update the Kubernetes deployment:

Since, in this issue it is specific to the loklak_server/development branch, so in here it checks if the branch is development or not and then updates the deployment using following command:

if [ $TRAVIS_BRANCH == "development" ]; then
    kubectl set image deployment/server --namespace=web server=$TAG
fi

 

Conclusion:

In this post, how to write a script in such a way that with each successful push after travis build how to update the deployment on Kubernetes GCloud.

Resources:

Deploying loklak Server on Kubernetes with External Elasticsearch

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

kubernetes.io

Kubernetes is an awesome cloud platform, which ensures that cloud applications run reliably. It runs automated tests, flawless updates, smart roll out and rollbacks, simple scaling and a lot more.

So as a part of GSoC, I worked on taking the loklak server to Kubernetes on Google Cloud Platform. In this blog post, I will be discussing the approach followed to deploy development branch of loklak on Kubernetes.

New Docker Image

Since Kubernetes deployments work on Docker images, we needed one for the loklak project. The existing image would not be up to the mark for Kubernetes as it contained the declaration of volumes and exposing of ports. So I wrote a new Docker image which could be used in Kubernetes.

The image would simply clone loklak server, build the project and trigger the server as CMD

FROM alpine:latest

ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

WORKDIR /loklak_server

RUN apk update && apk add openjdk8 git bash && \
    git clone https://github.com/loklak/loklak_server.git /loklak_server && \
    git checkout development && \
    ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport && \
    # Some Configurations and Cleanups

CMD ["bin/start.sh", "-Idn"]

[SOURCE]

This image wouldn’t have any volumes or exposed ports and we are now free to configure them in the configuration files (discussed in a later section).

Building and Pushing Docker Image using Travis

To automatically build and push on a commit to the master branch, Travis build is used. In the after_success section, a call to push Docker image is made.

Travis environment variables hold the username and password for Docker hub and are used for logging in –

docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD

[SOURCE]

We needed checks there to ensure that we are on the right branch for the push and we are not handling a pull request –

# Build and push Kubernetes Docker image
KUBERNETES_BRANCH=loklak/loklak_server:latest-kubernetes-$TRAVIS_BRANCH
KUBERNETES_COMMIT=loklak/loklak_server:kubernetes-$TRAVIS_COMMIT
  
if [ "$TRAVIS_BRANCH" == "development" ]; then
    docker build -t loklak_server_kubernetes kubernetes/images/development
    docker tag loklak_server_kubernetes $KUBERNETES_BRANCH
    docker push $KUBERNETES_BRANCH
    docker tag $KUBERNETES_BRANCH $KUBERNETES_COMMIT
    docker push $KUBERNETES_COMMIT
elif [ "$TRAVIS_BRANCH" == "master" ]; then
    # Build and push master
else
    echo "Skipping Kubernetes image push for branch $TRAVIS_BRANCH"
fi

[SOURCE]

Kubernetes Configurations for loklak

Kubernetes cluster can completely be configured using configurations written in YAML format. The deployment of loklak uses the previously built image. Initially, the image tagged as latest-kubernetes-development is used –

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: server
  namespace: web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: server
        image: loklak/loklak_server:latest-kubernetes-development
        ...

[SOURCE]

Readiness and Liveness Probes

Probes act as the top level tester for the health of a deployment in Kubernetes. The probes are performed periodically to ensure that things are working fine and appropriate steps are taken if they fail.

When a new image is updated, the older pod still runs and servers the requests. It is replaced by the new ones only when the probes are successful, otherwise, the update is rolled back.

In loklak, the /api/status.json endpoint gives information about status of deployment and hence is a good target for probes –

livenessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3
readinessProbe:
  httpGet:
    path: /api/status.json
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 3

[SOURCE]

These probes are performed periodically and the server is restarted if they fail (non-success HTTP status code or takes more than 3 seconds).

Ports and Volumes

In the configurations, port 80 is exposed as this is where Jetty serves inside loklak –

ports:
- containerPort: 80
  protocol: TCP

[SOURCE]

If we notice, this is the port that we used for running the probes. Since the development branch deployment holds no dumps, we didn’t need to specify any explicit volumes for persistence.

Load Balancer Service

While creating the configurations, a new public IP is assigned to the deployment using Google Cloud Platform’s load balancer. It starts listening on port 80 –

ports:
- containerPort: 80
  protocol: TCP

[SOURCE]

Since this service creates a new public IP, it is recommended not to replace/recreate this services as this would result in the creation of new public IP. Other components can be updated individually.

Kubernetes Configurations for Elasticsearch

To maintain a persistent index, this deployment would require an external Elasticsearch cluster. loklak is able to connect itself to external Elasticsearch cluster by changing a few configurations.

Docker Image and Environment Variables

The image used for Elasticsearch is taken from pires/docker-elasticsearch-kubernetes. It allows easy configuration of properties from environment variables in configurations. Here is a list of configurable variables, but we needed just a few of them to do our task –

image: quay.io/pires/docker-elasticsearch-kubernetes:2.0.0
env:
- name: KUBERNETES_CA_CERTIFICATE_FILE
  value: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- name: NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: "CLUSTER_NAME"
  value: "loklakcluster"
- name: "DISCOVERY_SERVICE"
  value: "elasticsearch"
- name: NODE_MASTER
  value: "true"
- name: NODE_DATA
  value: "true"
- name: HTTP_ENABLE
  value: "true"

[SOURCE]

Persistent Index using Persistent Cloud Disk

To make the index last even after the deployment is stopped, we needed a stable place where we could store all that data. Here, Google Compute Engine’s standard persistent disk was used. The disk can be created using GCP web portal or the gcloud CLI.

Before attaching the disk, we need to declare a volume where we could mount it –

volumeMounts:
- mountPath: /data
  name: storage

[SOURCE]

Now that we have a volume, we can simply mount the persistent disk on it –

volumes:
- name: storage
  gcePersistentDisk:
    pdName: data-index-disk
    fsType: ext4

[SOURCE]

Now, whenever we deploy these configurations, we can reuse the previous index.

Exposing Kubernetes to Cluster

The HTTP and transport clients are enabled on port 9200 and 9300 respectively. They can be exposed to the rest of the cluster using the following service –

apiVersion: v1
kind: Service
...
Spec:
  ...
  ports:
  - name: http
    port: 9200
    protocol: TCP
  - name: transport
    port: 9300
    protocol: TCP

[SOURCE]

Once deployed, other deployments can access the cluster API from ports 9200 and 9300.

Connecting loklak to Kubernetes

To connect loklak to external Elasticsearch cluster, TransportClient Java API is used. In order to enable these settings, we simply need to make some changes in configurations.

Since we enable the service named “elasticsearch” in namespace “elasticsearch”, we can access the cluster at address elasticsearch.elasticsearch:9200 (web) and elasticsearch.elasticsearch:9300 (transport).

To confine these changes only to Kubernetes deployment, we can use sed command while building the image (in Dockerfile) –

sed -i.bak 's/^\(elasticsearch_transport.enabled\).*/\1=true/' conf/config.properties && \
sed -i.bak 's/^\(elasticsearch_transport.addresses\).*/\1=elasticsearch.elasticsearch:9300/' conf/config.properties && \

[SOURCE]

Now when we create the deployments in Kubernetes cluster, loklak auto connects to the external elasticsearch index and creates indices if needed.

Verifying persistence of the Elasticsearch Index

In order to see that the data persists, we can completely delete the deployment or even the cluster if we want. Later, when we recreate the deployment, we can see all the messages already present in the index.

I  [2017-07-29 09:42:51,804][INFO ][node                     ] [Hellion] initializing ...
 
I  [2017-07-29 09:42:52,024][INFO ][plugins                  ] [Hellion] loaded [cloud-kubernetes], sites []
 
I  [2017-07-29 09:42:52,055][INFO ][env                      ] [Hellion] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [84.9gb], net total_space [97.9gb], spins? [possibly], types [ext4]
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] initialized
 
I  [2017-07-29 09:42:53,543][INFO ][node                     ] [Hellion] starting ...
 
I  [2017-07-29 09:42:53,620][INFO ][transport                ] [Hellion] publish_address {10.8.1.13:9300}, bound_addresses {10.8.1.13:9300}
 
I  [2017-07-29 09:42:53,633][INFO ][discovery                ] [Hellion] loklakcluster/cJtXERHETKutq7nujluJvA
 
I  [2017-07-29 09:42:57,866][INFO ][cluster.service          ] [Hellion] new_master {Hellion}{cJtXERHETKutq7nujluJvA}{10.8.1.13}{10.8.1.13:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
 
I  [2017-07-29 09:42:57,955][INFO ][http                     ] [Hellion] publish_address {10.8.1.13:9200}, bound_addresses {10.8.1.13:9200}
 
I  [2017-07-29 09:42:57,955][INFO ][node                     ] [Hellion] started
 
I  [2017-07-29 09:42:58,082][INFO ][gateway                  ] [Hellion] recovered [8] indices into cluster_state

In the last line from the logs, we can see that indices already present on the disk were recovered. Now if we head to the public IP assigned to the cluster, we can see that the message count is restored.

Conclusion

In this blog post, I discussed how we utilised the Kubernetes setup to shift loklak to Google Cloud Platform. The deployment is active and can be accessed from the link provided under wiki section of loklak/loklak_server repo.

I introduced these changes in pull request loklak/loklak_server#1349 with the help of @niranjan94, @uday96 and @chiragw15.

Resources

Best Practices when writing Tests for loklak Server

Why do we write unit-tests? We write them to ensure that developers’ implementation doesn’t change the behaviour of parts of the project. If there is a change in the behaviour, unit-tests throw errors. This keep developers in ease during integration of the software and ensure lower chances of unexpected bugs.

After setting up the tests in Loklak Server, we were able to check whether there is any error or not in the test. Test failures didn’t mention the error and the exact test case at which they failed. It was YoutubeScraperTest that brought some of the best practices in the project. We modified the tests according to it.

The following are some of the best practices in 5 points that we shall follow while writing unit tests:

  1. Assert the assertions

There are many assert methods which we can use like assertNull, assertEquals etc. But we should use one which describes the error well (being more descriptive) so that developer’s effort is reduced while debugging.

Using these assertions related preferences help in getting to the exact errors on test fails, thus helping in easier debugging of the code.

Some examples can be:-

  • Using assertThat() over assertTrue

assertThat() give more descriptive errors over assertTrue(). Like:-

When assertTrue() is used:

java.lang.AssertionError: Expected: is <true> but: was <false> at org.loklak.harvester.TwitterScraperTest.testSimpleSearch(TwitterScraperTest.java:142) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at org.hamcr.......... 

 

When assertThat() is used:

java.lang.AssertionError:
Expected: is <true>
     but: was <false>
at org.loklak.harvester.TwitterScraperTest.testSimpleSearch(TwitterScraperTest.java:142)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at org.hamcr...........

 

NOTE:- In many cases, assertThat() is preferred over other assert method (read this), but in some cases other methods are used to give better descriptive output (like in next examples)

  • Using assertEquals() over assertThat()

For assertThat()

java.lang.AssertionError:

Expected: is "ar photo #test #car https://pic.twitter.com/vd1itvy8Mx"

but: was "car photo #test #car https://pic.twitter.com/vd1itvy8Mx"

at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

at org.junit.Assert.assertThat(Ass........

 

For assertEquals()

org.junit.ComparisonFailure: expected:<[c]ar photo #test #car ...> but was:<[]ar photo #test #car ...>

at org.junit.Assert.assertEquals(Assert.java:115)

at org.junit.Assert.assertEquals(Assert.java:144)

at org.loklak.harvester.Twitter.........

 

We can clearly see that second example gives better error description than the first one.(An SO link)

  1. One Test per Behaviour

Each test shall be independent of other with none having mutual dependencies. It shall test only a specific behaviour of the module that is tested.

Have a look of this snippet. This test checks the method that creates the twitter url by comparing the output url method with the expected output url.

@Test

public void testPrepareSearchURL() {

    String url;

    String[] query = {

        "fossasia", "from:loklak_test",

        "spacex since:2017-04-03 until:2017-04-05"

    };

    String[] filter = {"video", "image", "video,image", "abc,video"};

    String[] out_url = {

        "https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd",

        "https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd",

    };

    // checking simple urls

    for (int i = 0; i < query.length; i++) {

        url = TwitterScraper.prepareSearchURL(query[i], "");


        //compare urls with urls created

        assertThat(out_url[i], is(url));

    }

}

 

This unit-test tests whether the method-under-test is able to create twitter link according to query or not.

  1. Selecting test cases for the test

We shall remember that testing is a very costly task in terms of processing. It takes time to execute. That is why, we need to keep the test cases precise and limited. In loklak server, most of the tests are based on connection to the respective websites and this step is very costly. That is why, in implementation, we must use least number of test cases so that all possible corner cases are covered.

  1. Test names

Descriptive test names that are short but give hint about their task which are very helpful. A comment describing what it does is a plus point. The following example is from YoutubeScraperTest. I added this point to my ‘best practices queue’ after reviewing the code (when this module was in review process).

/**

* When try parse video from input stream should check that video parsed.

* @throws IOException if some problem with open stream for reading data.

*/

@Test

public void whenTryParseVideoFromInputStreamShouldCheckThatJSONObjectGood() throws IOException {

    //Some tests related to method

}

 

AND the last one, accessing methods

This point shall be kept in mind. In loklak server, there are some tests that use Reflection API to access private and protected methods. This is the best example for reflection API.

In general, such changes to access specifiers are not allowed, that is why we shall resolve this issue with the help of:-

  •  Setters and Getters (if available, use it or else create them)
  •  Else use Reflection

If the getter methods are not available, using Reflection API will be the last resort to access the private and protected members of the class. Hereunder is a simple example of how a private method can be accessed using Reflection:

void getPrivateMethod() throws Exception {

    A ret = new A();

    Class<?> clazz = ret.getClass();

    Method method = clazz.getDeclaredMethod("changeValue", Integer.TYPE);

    method.setAccessible(true);

    System.out.println(method.invoke(ret, 2)); 
    //set null if method is static

}

 

I should end here. Try applying these practices, go through the links and get sync with these ‘Best Practices’ 🙂

Resources:

Editing files and piped data with “sed” in Loklak server

What is sed ?

“sed” is used in “Loklak Server” one of the most popular projects of FOSSASIA. “sed” is acronym for “Stream Editor” used for filtering and transforming text, as mentioned in the manual page of sed. Stream can be a file or input from a pipeline or even standard input. Regular expressions are used to filter the text and transformation are carried out using sed commands, either inline or from a file. So, most of the time writing a single line does the work of text substitution, removal or to obtaining a value from a text file.

Basic Syntax of “sed”

$sed [options]... {inline commands or file having sed commands} [input_file]...

Loklak Server uses a config.properties file – contains key-value pairs – which is the basis of the server as it contains configuration values, used by the server during runtime for various operations.

Let’s go through a simple sed example that prints line containing word “https” at the beginning in the config.properties file.

$sed -n '/^https/p' config.properties

Here “-n” option suppresses automatically printing of pattern space (pattern space is where each line is put that is to be processed by sed). Without “-n” option sed will print the whole file.

Now, the regular expression part,  “/^https” matches all the lines that has “https” at the start of line and “/p” is print command to print the output in console. Finally we provide the filename i.e. config.properties. If filename is not provided then sed waits for input from standard input.

Use cases of “sed” in Loklak Server

  • Displaying proper port number in message while starting or installing Loklak Server

The default port of loklak server is port number 9000, but it can be started in any non-occupied port by using “-p” flag with bin/start.sh and bin/installation.sh like

$ bin/installation.sh -p 8888

starts installation of Loklak Server in port 8888. To display the proper localhost address so that user can open it in a browser the port number in shortlink.urlstub parameter in config.properties needs to be changed. This is carried out by the function change_shortlink_urlstub in bin/utility.sh. The function is defined as

Now let’s try to understand what the sed command is doing.

“-i” option is used for in-place editing of the specified file i.e. config.properties in conf directory.

s” is substitute command of sed. The regular expression can be divided into two parts, between “/”:

  1. \(shortlink\.urlstub=http:.*:\)\(.*\) this is used to find the match in a line.
  2. \1′”$1″ is used to substitute the matched string in part 1.

The regular expressions can be split into groups so that operations can be performed on each group separately. A group is enclosed between “\(“ and “\)”. In our 1st part of regular expressions, there are two groups.

Dissecting the first group i.e. \(shortlink\.urlstub=http:.*:\):

  • shortlink\.urlstub=http:” will match the expression “shortlink.urlstub=http:”, here “\” is used as an escape sequence as “.” in regex represents any character.
  • “.*:”, “.” represents any character and “*” represents 0 or more characters of the previous character. So, it will basically match any expression. But, it ends with a “:”, which means any expression that ends with a “:”. Thus it matches the expression “//localhost:”.

So, the first group collectively matches the expression “shortlink.urlstub=http://localhost:”.

As described above second group i.e. \(.*\) will match any expression, here matches “9000”.

Now coming to the 2nd part of regular expression i.e. \1′”$1″:

  • “\1” represents the match of the first group in 1st part i.e. “shortlink.urlstub=http://localhost:”.
  • “$1” is the value of the first parameter provided to our function “change_shortlink_urlstub” which is an unused port where we want to run Loklak Server.

So 2nd part picks up the match from the first group and concatenates with the first parameter of the function. Assuming the first parameter to the function is “8888”, the expression for 2nd part becomes “shortlink.urlstub=http://localhost:8888” which replaces “shortlink.urlstub=http://localhost:9000”.

So a correct localhost address is displayed in the console while starting or installing Loklak Server.

  • Extracting value of a key from config.properties

“grep” and “sed” are used to extract the values of key from config.properties in bash scripts e.g. extracting default port, value of “port.http” in bin/start.sh, value of “shortlink.urlstub” in bin/start.sh and bin/installation.sh.

$ grep -iw 'port.http' conf/config.properties | sed 's/^[^=]*=//'

Here grep is used to filter a line and pass the filtered line to sed by piping the output. “i” flag is for ignoring case sensitivity and “w” flag is used for matching of word only. The output of

$ grep -iw 'port.http' conf/config.properties
port.http=9000

The aim is to get “9000”, the value of “port.http”. The approach used here is to substitute “port.http=” in the output of grep command above with a string of zero characters, that way only “9000” remains.

Let’s deconstruct the “sed” part, s/^[^=]*=//:

s” command of sed is used for substitution.  Here 1st part is “^[^=]*=” and 2nd part is nothing, as no characters are enclosed within “//”.

  • “^” means to match at the start.
  • [^characters] represents not to consider a set of characters. For matching a set of characters “^” is not used inside square brackets, [characters]. Here [^=] means not to include equal to – “=” symbol – and “*” after that makes it to match characters that are not “=”.

So “^[^=]*=” matches a sequence of characters that doesn’t start with “=” and followed by “=”. Thus the matched expression in this case is “http.port=” (“h” is not “=” and it ends with “=”), which is substituted by a string with zero characters which leaves “9000”.  Finally, “9000” is assigned to the required variable is bash script.

Writing Simple Unit-Tests with JUnit

In the Loklak Server project, we use a number of automation tools like the build testing tool ‘TravisCI’, automated code reviewing tool ‘Codacy’, and ‘Gemnasium’. We are also using JUnit, a java-based unit-testing framework for writing automated Unit-Tests for java projects. It can be used to test methods to check their behaviour whenever there is any change in implementation. These unit-tests are handy and are coded specifically for the project. In the Loklak Server project it is used to test the web-scrapers. Generally JUnit is used to check if there is no change in behaviour of the methods, but in this project, it also helps in keeping check if the website code has been modified, affecting the data that is scraped.

Let’s start with basics, first by setting up, writing a simple Unit-Tests and then Test-Runners. Here we will refer how unit tests have been implemented in Loklak Server to familiarize with the JUnit Framework.

Setting-UP

Setting up JUnit with gradle is easy, You have to do just 2 things:-

1) Add JUnit dependency in build.gradle

Dependencies {

. . .

. . .<other compile groups>. . .

compile group: 'com.twitter', name: 'jsr166e', version: '1.1.0'

compile group: 'com.vividsolutions', name: 'jts', version: '1.13'

compile group: 'junit', name: 'junit', version: '4.12'

compile group: 'org.apache.logging.log4j', name: 'log4j-1.2-api', version: '2.6.2'

compile group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.6.2'

. . .

. . .

}

 

2) Add source for ‘test’ task from where tests are built (like here).

Save all tests in test directory and keep its internal directory structure identical to src directory structure. Now set the path in build.gradle so that they can be compiled.

sourceSets.test.java.srcDirs = ['test']

 

Writing Unit-Tests

In JUnit FrameWork a Unit-Test is a method that tests a particular behaviour of a section of code. Test methods are identified by annotation @Test.

Unit-Test implements methods of source files to test their behaviour. This can be done by fetching the output and comparing it with expected outputs.

The following test tests if twitter url that is created is valid or not that is to be scraped.

/**

 * This unit-test tests twitter url creation

 */

@Test

public void testPrepareSearchURL() {

String url;

String[] query = {"fossasia", "from:loklak_test",

"spacex since:2017-04-03 until:2017-04-05"};

String[] filter = {"video", "image", "video,image", "abc,video"};

String[] out_url = {

"https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd",

"https://twitter.com/search?f=tweets&vertical=default&q=from%3Aloklak_test&src=typd",

"and other output url strings to be matched…..."

};


// checking simple urls

for (int i = 0; i < query.length; i++) {

url = TwitterScraper.prepareSearchURL(query[i], "");


//compare urls with urls created

assertThat(out_url[i], is(url));

}


// checking urls having filters

for (int i = 0; i < filter.length; i++) {

url = TwitterScraper.prepareSearchURL(query[0], filter[i]);


//compare urls with urls created

assertThat(out_url[i+3], is(url));

}

}

 

Testing the implementation of code is useless as it will either make code more difficult to change or tests useless  . So be cautious while writing tests and keep difference between Implementation and Behaviour in mind.

This is the perfect example for a simple Unit-Test. As we see there are some points, which needs to be observed like:-

1) There is a annotation @Test .

2) Input array of query which is fed to the method TwitterScraper.prepareSearchURL() .

3) Array of urls out_url[], which are the expected urls to output.

4) asserThat() to compare the expected url (in array out_url[]) and the output url (in variable ‘url’).

NOTE: assertEquals() could also be used here, but we prefer to use assert methods to get error message that is readable (We will discuss about this some time later)

And the TestRunner

When we are working on a project, It is not feasible to run tests using gradle as they are first built  (else verified whether tests are build-ready) and then executed. gradle test shall be used only for building and testing the tests. For testing the project, one shall set-up TestRunner. It allows to run specific set of tests, one wants to run.

TestRunners are built once using gradle (with other tests) and can be run whenever you want. Also it is easy to stack up the test classes you want to run in SuiteClasses and @RunWith to run SuiteClasses with the TestRunner.

In loklak server, TestRunner runs the web-scraper tests. They are used by developers to test the changes they have made.

This is a sample TestRunner, code link here .

package org.loklak;


// Library classes imported

import org.junit.runner.RunWith;

import org.junit.runners.Suite;

// Source files to be tested

import org.loklak.harvester.TwitterScraperTest;

import org.loklak.harvester.YoutubeScraperTest;


/*

* TestRunner for harvesters

*/

@RunWith(Suite.class)

@Suite.SuiteClasses({

TwitterScraperTest.class,

YoutubeScraperTest.class

})

public class TestRunner {

}

 

You can also add TestRunners for different sections of the project. Like here it is initialized only to test harvesters.

To run the TestRunner

Add classpath of the jar file of the project and run ‘JUnitCore’ with TestRunner to get output on terminal.

java -classpath .:build/libs/<yourProject>.jar:build/classes/test org.junit.runner.JUnitCore org.loklak.TestRunner

In the project we have set up a shell script to run the tests.

Few points

1) Build the project and tests separately. Build tests only when changed as they take time to be built and executed.

2) Whenever you are done with the coding part, run the tests using TestRunner.

3) Write unit-tests whenever you add a new feature to the project to keep it up-to-date.

Now lets end up here.

So for now, Code it, Test it and Repeat.

Resources: