Creating a command line tool to initiate loklak app development

There are various apps presently hosted on, which shows several interesting and awesome stuff that can be done using the loklak API. Now the question is how to create a loklak app? Well previously there were two ways to create a loklak app, you could either create an app from scratch creating all the necessary files including index.html (app entry point), app.json (required for maintaining app metadata and store listing),, and other necessary files, or you could use the boilerplate app which provides you with a ready to go app template which you have to edit to create your app.

Recently there has been a new addition to the repository, a command line tool called loklakinit, built with python which will automate the process of initiating a loklak app on the go. The tool creates the entire loklak app directory structure including app.json with all the necessary details (as provided by the user) and defaults, index.html, a css and js subdirectory with a style.css and script.js in the respective folders, file, file and file with proper references in app.json.

All that the developer needs to do is, execute the following from the root of the repository.


This will start the process and initiate the app.

Creating loklakinit tool

Now let us delve into the code. So what actually the script does? How it works? Well the tool acts very much like the popular npm init command. At first the script creates a dictionary which stores the defaults for app.json. Next the script takes a number of inputs from the user and predicts some default values, if the user refuses to provide any input for a particular parameter then the default value for that parameter is used.

app_json = collections.OrderedDict()

app_json["@context"] = ""
app_json["@type"] = "SoftwareApplication"
app_json["permissions"] = "/api/search.json"
app_json["name"] = "myloklakapp"
app_json["headline"] = "My first loklak app"
app_json["alternativeHeadline"] = app_json["headline"]
app_json["applicationCategory"] = "Misc"
app_json["applicationSubCategory"] = ""
app_json["operatingSystem"] = ""
app_json["promoImage"] = "promo.png"
app_json["appImages"] = ""
app_json["oneLineDescription"] = ""
app_json["getStarted"] = ""
app_json["appUse"] = ""
app_json["others"] = ""

author = collections.OrderedDict()
author["@type"] = "Person"
author["name"] = ""
author["email"] = ""
author["url"] = ""
author["sameAs"] = "" 

app_json["author"] = author

The first part of the script inserts some default values into the app_json dictionary. Now what is an OrderedDict? Well, it is nothing but a python dictionary in which the order in which the keys are inserted is maintained. A ordinary python dictionary does not maintain the order of the keys.

while True :

  app_name = raw_input("name (" + app_json["name"] + ") : ")
  if app_name :
    app_json["name"] = app_name

  app_context = raw_input("@context (" + app_json["@context"] + ") : ")
  if app_context :
    app_json["@context"] = app_context

  app_type = raw_input("@type (" + app_json["@type"] + ") : ")
  if app_type :
    app_json["@type"] = app_type

  app_permissions = raw_input("permissions (" + app_json["permissions"] + ") : ")
  if app_permissions :
    app_json["permissions"] = app_permissions.split(",")

  app_headline = raw_input("headline (" + app_json["headline"] + ") : ")
  if app_headline :
    app_json["headline"] = app_headline
    app_json["alternativeHeadline"] = app_headline

  app_alternative_headline = raw_input("alternative headline (" + app_json["alternativeHeadline"] + ") : ")
  if app_alternative_headline :
    app_json["alternativeHeadline"] = app_alternative_headline

  app_category = raw_input("application category (" + app_json["applicationCategory"] + ") : ")
  if app_category :
    app_json["applicationCategory"] = app_category

  app_sub_category = raw_input("sub category : ")
  if app_sub_category :
    app_json["applicationSubCategory"] = app_sub_category

  app_os = raw_input("application operating system (" + app_json["operatingSystem"] + ") : ")
  if app_os :
    app_json["operatingSystem"] = app_os

  app_promo_image = raw_input("promo image (" + app_json["promoImage"] + ") : ")
  if app_promo_image :
    app_json["promoImage"] = app_promo_image

  app_images = raw_input("app preview images (values can be separted by comma) : ")
  if app_images :
    app_json["appImages"] = app_images.split(",")

  app_description = raw_input("one line description : ")
  if app_description :
    app_json["oneLineDescription"] = app_description

  author["name"] = raw_input("author name : ")

  author_type = raw_input("author type (" + author["@type"] + ") : ")
  if author_type :
    author["@type"] = author_type

  author["email"] = raw_input("email : ")

  author["url"] = raw_input("url : ")
  author["sameAs"] = author["url"]

  author["sameAs"] = raw_input("same as : ")

  app_json["author"] = author

  print json.dumps(app_json, indent=2)

  confirm = raw_input("confirm (yes) : ")

  if confirm.lower() != "no" :

The next part of the script asks for various inputs from the developer which he or she can either provide or skip, if an input skipped then the default value is used. However for some parameters like author name, url etc, no default value is used, the developer must provide those values later by editing the app.json file. Once all the inputs are taken, the script outputs the json object created, if the output is alright then the developer may press enter and the process will continue, otherwise the developer may enter ‘no’ and once again the script will start taking inputs.

Once the final app_json is created, next part is creating the actual directory structure with the default files. This is done as shown below.

index_html = open("index.html", 'w')
index_html.write("<!-- entry point of the app, on launching the appliccation" +
                  "this page will be displayed -->")

app_json_file = open("app.json", 'w')
app_json_file.write(json.dumps(app_json, indent=2))

get_started_md = open("", 'w')
get_started_md.write("<!-- getting started with the app -->")

app_use_md = open("", 'w')
app_use_md.write("<!-- use of this app -->")

others_md = open("", 'w')
others_md.write("<!-- other relevant information about this app -->")

style_css = open("css/style.css", 'w')
style_css.write("/* css to style the app */")

script_js = open("js/script.js", 'w')
script_js.write("// javascript logic for the app")

print "done"

A new directory is created in the root of the repository, the name of the directory is same as the app name. Python’s os module is used to create the directory and subdirectories. Inside the directory, app.json file is created in which app_json dictionary is dumped as a json object.
After this the other default files are created. These include index.html, js/script.js, css/style.css, and After everything is done script exits and the developer is all set to start with his development.

Finally is used to call

#!/usr/bin/env bash

python bin/

This allows users to initiate the script by just executing bin/ from root of the repository. The entire source code for this script can be found here (python code) and here (bash script)

Continue ReadingCreating a command line tool to initiate loklak app development

Enhancing Github profile scraper service in loklak

The Github profile scraper is one of the several scrapers present in the loklak project, some of the other scrapers being Quora profile scraper, WordPress profile scraper, Instagram profile scraper, etc.Github profile scraper scrapes the profile of a given github user and provides the data in json format. The scraped data contains user_id, followers, users following the given user, starred repositories of an user, basic information like user’s name, bio and much more. It uses the popular Java Jsoup library for scraping.

The entire source code for the scraper can be found here.

Changes made

The scraper previously provided very limited information like, name of the user, description, starred repository url, followers url, following url, user_id, etc. One of the major problem was the api accepted only the profile name as a parameter and it returned the entire data, that is, the scraper scraped all the data even if the user did not ask for it. Moreover the service provided certain urls as data for example starred repository url, followers url, following url instead of providing the actual data present at those urls.

The scraper contained only one big function where all the scraping was performed. The code was not modular.

The scraper has been enhanced in the ways mentioned below.

  • The entire code has been refactored.Code scraping user specific data has been separated from code scraping organization specific data. Also separate method has been created for accessing the Github API.
  • Apart from profile parameter, the api now accepts another parameter called terms. It is a list of fields the user wants information on. In this way the scraper scrapes only that much data which is required by the user. This allows better response time and prevents unnecessary scraping.
  • The scraper now provides more information like user gists, subscriptions, events, received_events and repositories.

Code refactoring

The code has been refactored to smaller methods. ‘getDataFromApi’ method has been designed to access Github API. All the other methods which want to make request to now calls ‘getDatFromApi’ method with required parameter. The method is shown below.

private static JSONArray getDataFromApi(String url) {
       URI uri = null;
       try {
           uri = new URI(url);
       } catch (URISyntaxException e1) {

       JSONTokener tokener = null;
       try {
           tokener = new JSONTokener(uri.toURL().openStream());
       } catch (Exception e1) {
       JSONArray arr = new JSONArray(tokener);
       return arr;


For example if we want to make a request to the endpoint then we can use the above method and we will get a JSONArray in return.

All the code which scrapes user related data has been moved to ‘scrapeGithubUser’ method. This method scrapes basic user information like full name of the user, bio, user name, atom feed link, and location.

String fullName = html.getElementsByAttributeValueContaining("class", "vcard-fullname").text();
githubProfile.put("full_name", fullName);
String userName = html.getElementsByAttributeValueContaining("class", "vcard-username").text();
githubProfile.put("user_name", userName);
String bio = html.getElementsByAttributeValueContaining("class", "user-profile-bio").text();
githubProfile.put("bio", bio);
String atomFeedLink = html.getElementsByAttributeValueContaining("type", "application/atom+xml").attr("href");
githubProfile.put("atom_feed_link", "" + atomFeedLink);
String worksFor = html.getElementsByAttributeValueContaining("itemprop", "worksFor").text();
githubProfile.put("works_for", worksFor);
String homeLocation = html.getElementsByAttributeValueContaining("itemprop", "homeLocation").attr("title");
githubProfile.put("home_location", homeLocation);


Next it returns other informations like starred repositories information, followers information, following information and information about the organizations the user belongs to but before fetching such data it checks whether such data is really needed. In this way unnecessary api calls are prevented.

if (terms.contains("starred") || terms.contains("all")) {
            String starredUrl = GITHUB_API_BASE + profile + STARRED_ENDPOINT;
            JSONArray starredData = getDataFromApi(starredUrl);
            githubProfile.put("starred_data", starredData);
            int starred = Integer.parseInt(html.getElementsByAttributeValue("class", "Counter").get(1).text());
            githubProfile.put("starred", starred);
        if (terms.contains("follows") || terms.contains("all")) {
            String followersUrl = GITHUB_API_BASE + profile + FOLLOWERS_ENDPOINT;
            JSONArray followersData = getDataFromApi(followersUrl);
            githubProfile.put("followers_data", followersData);
            int followers = Integer.parseInt(html.getElementsByAttributeValue("class", "Counter").get(2).text());
            githubProfile.put("followers", followers);
        if (terms.contains("following") || terms.contains("all")) {
            String followingUrl = GITHUB_API_BASE + profile + FOLLOWING_ENDPOINT;
            JSONArray followingData = getDataFromApi(followingUrl);
            githubProfile.put("following_data", followingData);
            int following = Integer.parseInt(html.getElementsByAttributeValue("class", "Counter").get(3).text());
            githubProfile.put("following", following);
        if (terms.contains("organizations") || terms.contains("all")) {
            JSONArray organizations = new JSONArray();
            Elements orgs = html.getElementsByAttributeValue("itemprop", "follows");
            for (Element e : orgs) {
                JSONObject obj = new JSONObject();
                String label = e.attr("aria-label");
                obj.put("label", label);
                String link = e.attr("href");
                obj.put("link", "" + link);
                String imgLink = e.children().attr("src");
                obj.put("img_link", imgLink);
                String imgAlt = e.children().attr("alt");
                obj.put("img_Alt", imgAlt);
            githubProfile.put("organizations", organizations);


Similarly ‘scrapeGithubOrg’ is used to scrape information related to Github organization.

API usage

The API can be used in the ways shown below.

  • api/githubprofilescraper.json?profile=<profile_name> : This is equivalent to api/githubprofilescraper.json?profile=<profile_name>&terms=all :This will return all the data the scraper is currently capable of collecting.
  • api/githubprofilescraper.json?profile=<profile_name>&terms=<list_containing_required_fields> : This will return the basic information about a profile and data on selective fields as mentioned by the user.

For example – api/githubprofilescraper.json?profile=djmgit&terms=followers,following,organizations

The above request will return data about followers, following, and organizations apart from basic profile information. Thus the scraper will scrape only those data that is required by the user.

Future scope

‘scrapeGithubUser’ is still lengthy and it can be further broken down into smaller methods. The part of the method which scrapes basic user information like name, bio, etc can be moved to a separate method. This will further increase code readability and maintainability.

Now we have a much more functional and enhanced Github profile scraper which provides more data than the previous one.

Continue ReadingEnhancing Github profile scraper service in loklak

Link one repository’s subdirectory to another

Loklak, a distributed social media message search server, generates its documentation automatically using Continuous Integration. It generates the documentation in the gh-pages branch of its repository. From there, the documentation is linked and updated in the gh-pages branch of repository.

The method with which Loklak links the documentations from their respective subprojects to the repository is by using `subtrees`. Stating the git documentation,

“Subtrees allow subprojects to be included within a subdirectory of the main project, optionally including the subproject’s entire history. A subtree is just a subdirectory that can be committed to, branched, and merged along with your project in any way you want.”

What this means is that with the help of subtrees, we can link any project to our project keeping it locally in a subdirectory of our main repository. This is done in such a way that the local content of that project can easily be synced with the original project through a simple git command.


A representation of how Loklak currently uses subtrees to link the repositories is as follows.

git clone --quiet --branch=gh-pages central-docs
cd central docs

git subtree pull --prefix=server gh-pages --squash -m "Update server subtree"

git subtree split:

The problem with the way documentation is generated currently is that the method is very bloated. The markup of the documentation is first compiled in each sub-projects’s repository and then aggregated in the repository. What we intend to do now is to keep all the markup documentation in the docs folder of the respective repositories and then create subtrees from these folders to a master repository. From the master repository, we then intend to compile the markups to generate the HTML pages.

Intended Implementation:

To implement this we will be making use of subtree split command.

    1. Run the following command in the subprojects repository. It creates a subtree from the docs directory and moves it to a new branch documentation.
git subtree --prefix=docs/ split -b documentation
    1. Clone the master repository. (This is the repository we intend to link the subdirectory with.)
git clone --quiet --branch=master loklak_docs
cd loklak_docs
    1. Retrieve the latest content from the subproject.
      • During the first run.
git subtree add --prefix=raw/server ../loklak_server documentation --squash -m "Update server subtree"
      • During following runs.
git subtree pull --prefix=raw/server ../loklak_server documentation --squash -m "Update server subtree"
    1. Push changes.
git push -fq origin master > /dev/null 2>&1
Continue ReadingLink one repository’s subdirectory to another

Continuous Deployment Implementation in Loklak Search

In current pace of web technology, the quick response time and low downtime are the core goals of any project. To achieve a continuous deployment scheme the most important factor is how efficiently contributors and maintainers are able to test and deploy the code with every PR. We faced this question when we started building loklak search.

As Loklak Search is a data driven client side web app, GitHub pages is the simplest way to set it up. At FOSSASIA apps are developed by many developers working together on different features. This makes it more important to have a unified flow of control and simple integration with GitHub pages as continuous deployment pipeline.

So the broad concept of continuous deployment boils down to three basic requirements

  1. Automatic unit testing.
  2. The automatic build of the applications on the successful merge of PR and deployment on the gh-pages branch.
  3. Easy provision of demo links for the developers to test and share the features they are working on before the PR is actually merged.

Automatic Unit Testing

At Loklak Search we use karma unit tests. For loklak search, we get the major help from angular/cli which helps in running of unit tests. The main part of the unit testing is TravisCI which is used as the CI solution. All these things are pretty easy to set up and use.

Travis CI has a particular advantage which is the ability to run custom shell scripts at different stages of the build process, and we use this capability for our Continuous Deployment.

Automatic Builds of PR’s and Deploy on Merge

This is the main requirement of the our CD scheme, and we do so by setting up a shell script. This file is in the project repository root.

There are few critical sections of the deploy script. The script starts with the initialisation instructions which set up the appropriate variables and also decrypts the ssh key which travis uses for pushing the repo on gh-pages branch (we will set up this key later).

  • Here we also check that we run our deploy script only when the build is for Master Branch and we do this by early exiting from the script if it is not so.


# Pull requests and commits to other branches shouldn't try to deploy.
if [ "$TRAVIS_PULL_REQUEST" != "false" -o "$TRAVIS_BRANCH" != "$SOURCE_BRANCH" ]; then
echo "Skipping deploy; The request or commit is not on master"
exit 0


  • We also store important information regarding the deploy keys which are generated manually and are encrypted using travis.
# Save some useful information
REPO=`git config remote.origin.url`
SHA=`git rev-parse --verify HEAD`

# Decryption of the deploy_key.enc
openssl aes-256-cbc -K $ENCRYPTED_KEY -iv $ENCRYPTED_IV -in deploy_key.enc -out deploy_key -d

chmod 600 deploy_key
eval `ssh-agent -s`
ssh-add deploy_key


  • We clone our repo from GitHub and then go to the Target Branch which is gh-pages in our case.
# Cloning the repository to repo/ directory,
# Creating gh-pages branch if it doesn't exists else moving to that branch
git clone $REPO repo
cd repo
git checkout $TARGET_BRANCH || git checkout --orphan $TARGET_BRANCH
cd ..

# Setting up the username and email.
git config "Travis CI"


  • Now we do a clean up of our directory here, we do this so that fresh build is done every time, here we protect our files which are static and are not generated by the build process. These are CNAME and 404.html
# Cleaning up the old repo's gh-pages branch except CNAME file and 404.html
find repo/* ! -name "CNAME" ! -name "404.html" -maxdepth 1  -exec rm -rf {} \; 2> /dev/null
cd repo

git add --all
git commit -m "Travis CI Clean Deploy : ${SHA}"


  • After checking out to our Master Branch we do an npm install to install all our dependencies here and then do our project build. Then we move our files generated by the ng build to our gh-pages branch, and then we make a commit, to this branch.
git checkout $SOURCE_BRANCH
# Actual building and setup of current push or PR.
npm install
ng build --prod --aot
git checkout $TARGET_BRANCH
mv dist/* .
# Staging the new build for commit; and then committing the latest build
git add .
git commit --amend --no-edit --allow-empty


  • Now the final step is to push our build files to gh-pages branch and as we only want to put the build there if the code has actually changed, we make sure by adding that check.
# Deploying only if the build has changed
if [ -z `git diff --name-only HEAD HEAD~1` ]; then

echo "No Changes in the Build; exiting"
exit 0

# There are changes in the Build; push the changes to gh-pages
echo "There are changes in the Build; pushing the changes to gh-pages"

# Actual push to gh-pages branch via Travis
git push --force $SSH_REPO $TARGET_BRANCH


Now this 70 lines of code handle all our heavy lifting and automates a large part of our CD. This makes sure that no incorrect builds are entering the gh-pages branch and also enabling smoother experience for both developers and maintainers.

The important aspect of this script is ability to make sure Travis is able to push to gh-pages. This requires the proper setup of Keys, and it definitely is the trickiest part the whole setup.

  • The first step is to generate the SSH key. This is done easily using terminal and ssh-keygen.
$ ssh-keygen -t rsa -b 4096 -C "”


  • I would recommend not using any passphrase as it will then be required by Travis and thus will be tricky to setup.
  • Now, this generates the RSA public/private key pair.
  • We now add this public deploy key to the settings of the repository.
  • After setting up the public key on GitHub we give the private key to Travis so that Travis is able to push on GitHub.
  • For doing this we use the Travis Client, this helps to encrypt the key properly and send the key and iv to the travis. Which then using these values is able to decrypt the private key.
$ travis encrypt-file deploy_key
encrypting deploy_key for domenic/travis-encrypt-file-example
storing result as deploy_key.enc
storing secure env variables for decryption

Please add the following to your build script (before_install stage in your .travis.yml, for instance):

    openssl aes-256-cbc -K $encrypted_0a6446eb3ae3_key -iv $encrypted_0a6446eb3ae3_key -in super_secret.txt.enc -out super_secret.txt -d

Pro Tip: You can add it automatically by running with --add.

Make sure to add deploy_key.enc to the git repository.
Make sure not to add deploy_key to the git repository.
Commit all changes to your .travis.yml.


  • Make sure to add deploy_key.enc to git repository and not to add deploy_key to git.

And after all these steps everything is done our client-side web application will deploy on every push on the master branch.

These steps are required only one time in project life cycle. At loklak search, we haven’t touched the since it was written, it’s a simple script but it does all the work of Continuous Deployment we want to achieve.

Generation of Demo Links and Test Deployments

This is also an essential part of the continuous agile development that developers are able to share what they have built and the maintainers to review those features and fixes. This becomes difficult in a web application as the fixes and features are more often than not visual and attaching screenshots with every PR become the hassle. If the developers are able to deploy their changes on their gh-pages and share the demo links with the PR then it’s a big win for development at a faster pace.

Now, this step is highly specific for Angular projects while there are are similar approaches for React and other frameworks as well, if not we can build the page easily and push our changes to gh-pages of our fork.

We use @angular/cli for building project then use angular-cli-ghpages npm package to actually push to gh-pages branch of the fork. These commands are combined and are provided as node command npm run deploy. And this makes our CD scheme complete.


Clearly, the continuous deployment scheme has a lot of advantages over the other methods especially in the client side web apps where there are a lot of PR’s. This essentially eliminates all the deployment hassles in a simple way that any deployment doesn’t require any manual interventions. The developers can simply concentrate on coding the application and maintainers can just simply review the PR’s by seeing the demo links and then merge when they feel like the PR is in good shape and the deployment is done all by the Shell Script without requiring the commands from a developer or a maintainer.


Loklak Search GitHub Repository:

Loklak Search Application:

Loklak Search TravisCI:

Deploy Script:

Further Resources

Continue ReadingContinuous Deployment Implementation in Loklak Search

Build a simple Angular 4 message publisher


The differences between Angular 1, Angular 2 and Angular 4

AngularJS/Angular 1 is a very popular framework which is based on MVC model, and it was released in October, 2010.
Angular 2, also called Angular, is based on component and is completely different from Angular 1 , which was released in September 2016.
Angular 4 is simply an update of Angular 2, which is released in March, 2017.

Note that Angular 3 was skipped due to the version number conflicts.



1. Install TypeScript:

npm install -g typescript

2. Install code editor: vscode, sublime, intellij.

3. Use Google chrome and development tool to debug.

Install Angular CLI by command line

Angular CLI is a very useful tool which contains a brunch of commands to work with Angular 4. In order to install Angular CLI, the version of Node should be higher or equal to 6.9.0, and NPM need to higher or equal to 3.

npm install -g @angular/cli

Create a new Project

ng new loklak-message-publisher
cd loklak-message-publisher
ng serve

The command will launch web browser automatically in localhost:4200 to run the application. It will detect any change in files. Of course you can change the default port to others.
The basic project structure is like this:

package.json: standard node configuration file, which including the name, version and dependency of the configuration.

tsconfig.json: configuration file for TypeScript compiler.

typings.json: another configuration file for typescript, mainly used for type checking.

app folder contains the main ts file, boot.ts and app.component.ts.


Next we will show how to build a simple message publisher. The final application looks like this, we can post new tweets in the timeline, and remove any tweet by clicking ‘x’ button:

  • The first step in to create new elements in app.component.html file.

The HTML file look quite simple, we save the file and switch to browser, the application is like this:

A very simple interface! In next step we need to add some style for the elements. Edit the app.component.css file:

.block {
   width: 800px;
   border: 1px solid #E8F5FD;
   border-bottom: 0px;
   text-align: center;
   margin: 0 auto;
   padding: 50px;
   font-family: monospace;
   background-color: #F5F8FA;
   border-radius: 5px;

h1 {
   text-align: center;
   color: #0084B4;
   margin-top: 0px;

.post {
   width: 600px;
   height: 180px;
   text-align: center;
   margin: 0 auto;
   background-color: #E8F5FD;
   border-radius: 5px;

.addstory {
   width: 80%;
   height: 100px;
   font-size: 15px;
   margin-top: 20px;
   resize: none;
   -webkit-border-radius: 5px;
   -moz-border-radius: 5px;
   border-radius: 5px;
   border-color: #fff;
   outline: none;

.post button {
   height: 30px;
   width: 75px;
   margin-top: 5px;
   position: relative;
   left: 200px;
   background-color: #008CBA;
   border: none;
   color: white;
   display: inline-block;
   border-radius: 6px;
   outline: none;

.post button:active {
   border: 0.2em solid #fff;
   opacity: 0.6;

.stream-container {
   width: 600px;
   border-bottom: 1px solid #E8F5FD;
   background-color: white;
   padding: 10px 0px;
   margin: 0 auto;
   border-radius: 5px;

.tweet {
   border: 1px solid;
   border-color: #E8F5FD;
   border-radius: 3px;

p {
   text-align: left;
   text-indent: 25px;
   display: inline-block;
   width: 500px;

span {
   display: inline-block;
   width: 30px;
   height: 30px;
   font-size: 30px;
   cursor: pointer;
   -webkit-text-fill-color: #0084B4;

The app is almost finished, so excited! Now we want to add some default tweets that will be shown up in our application.
Let’s open the app.component.ts file, add an array in class AppComponent.

Then in the app.component.html file, we add ngFor to instantiates a tweets in the page.

After we finished this step, all the tweets are displayed in our application. Now we have to implement ‘tweet’ button and delete button. We define addTweet and removeTweet functions in app.component.ts file.

For the addTweet function, we need to modify html to define the behavior of textarea and tweet button:

For removeTweet function, we add id for each tweet we posted, and combine delete span with removeTweet function, which pass the id of tweet.


There are two things about Angular 4 we can learn from this tutorial:

  1. How to bind functions or variables to DOM elements, e.g. click function, ngModel.
  2. How to use the directives such as *ngFor to instantiate template.
Continue ReadingBuild a simple Angular 4 message publisher

Using LokLak to Scrape Profiles from Quora, GitHub, Weibo and Instagram

Most of us are really so curious to know about one’s social life. So taking this as a key point, LokLak has many profile scrapers in it. Profile scraper which are now available in LokLak  helps us to know about the posts, followers one has. Few of the profile scrapers available in LokLak are Quora Profile, GitHub Profile, Weibo Profile and Instagram Profile.

How do the scrapers work?

In loklak now we are using java to get the json objects of the scraped profile from different websites as mentioned above. So here is a simple explanation how one of the scraper works. In this post I am going to give you a gist about how Github Profile scraper API works:

In the github profile scraper one can search for a profile without logging in and know the contents like the followers, repositories, gists of that profile and many more.

The simple query which can be used is:

To scrape individual profiles:

To scrape organization profiles:

Jsoup is an API and it is a easiest way used by java developers for scraping the web i.e.,web scraping. This API is used for manipulating and extracting data using DOM, CSS like methods. So in here, the Jsoup API is helping us to extract the html data and with the help of the tags used in the html extracted data we are trying to get the relevant data which is needed.

How do we get the matching elements?

We here are using special methods like getElementsByAttributeValueContaining() of the org.jsoup.nodes.Element class to get the data. For instance, to get the email from the extracted data the code is written as:

String email = html.getElementsByAttributeValueContaining("itemprop", "email").text();
            if (!email.contains("@"))
                  email = "";
            githubProfile.put("email", email);


Here is the java code which imports and extracts data:

Imports the html file:

html = Jsoup.connect("" + profile).get();

Extracts the html file for individual user:

/*If Individual*/
           if (html.getElementsByAttributeValueContaining("class", "user-profile-nav").size() != 0) {
               scrapeGithubUser(githubProfile, terms, profile, html);
           if (terms.contains("gists") || terms.contains("all")) {
               String gistsUrl = GITHUB_API_BASE + profile + GISTS_ENDPOINT;
               JSONArray gists = getDataFromApi(gistsUrl);
               githubProfile.put("gists", gists);
           if (terms.contains("subscriptions") || terms.contains("all")) {
               String subscriptionsUrl = GITHUB_API_BASE + profile + SUBSCRIPTIONS_ENDPOINT;
               JSONArray subscriptions = getDataFromApi(subscriptionsUrl);
               githubProfile.put("subscriptions", subscriptions);
           if (terms.contains("repos") || terms.contains("all")) {
               String reposUrl = GITHUB_API_BASE + profile + REPOS_ENDPOINT;
               JSONArray repos = getDataFromApi(reposUrl);
               githubProfile.put("repos", repos);
           if (terms.contains("events") || terms.contains("all")) {
               String eventsUrl = GITHUB_API_BASE + profile + EVENTS_ENDPOINT;
               JSONArray events = getDataFromApi(eventsUrl);
               githubProfile.put("events", events);
          if (terms.contains("received_events") || terms.contains("all")) {
              String receivedEventsUrl = GITHUB_API_BASE + profile + RECEIVED_EVENTS_ENDPOINT;
              JSONArray receivedEvents = getDataFromApi(receivedEventsUrl);
              githubProfile.put("received_events", receivedEvents);

Extracts the html file for organization:

/*If organization*/
if (html.getElementsByAttributeValue("class", "orgnav").size() != 0) {
    scrapeGithubOrg(profile, githubProfile, html);

And this is the sample output:

For query:
  "data": [{
    "joining_date": "2016-04-12",
    "gists_url": "",
    "repos_url": "",
    "user_name": "kavithaenair",
    "bio": "GSoC'17 @loklak @fossasia ; Developer @fossasia ; Intern @amazon",
    "subscriptions_url": "",
    "received_events_url": "",
    "full_name": "Kavitha E Nair",
    "avatar_url": "",
    "user_id": "18421291",
    "events_url": "",
    "organizations": [
        "img_link": "",
        "link": "",
        "label": "fossasia",
        "img_Alt": "@fossasia"
        "img_link": "",
        "link": "",
        "label": "coala",
        "img_Alt": "@coala"
        "img_link": "",
        "link": "",
        "label": "loklak",
        "img_Alt": "@loklak"
        "img_link": "",
        "link": "",
        "label": "bvrit-wise-django-team",
        "img_Alt": "@bvrit-wise-django-team"
    "home_location": "\n    Hyderabad, India\n",
    "works_for": "",
    "special_link": "",
    "email": "",
    "atom_feed_link": ""
  "metadata": {"count": 1},
  "session": {"identity": {
    "type": "host",
    "name": "",
    "anonymous": true
Continue ReadingUsing LokLak to Scrape Profiles from Quora, GitHub, Weibo and Instagram

Automatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

One goal for the next version of the Open Event project is to allow an automatic import of events from various event listing sites. We will implement this using Open Event Import APIs and two additional modules: Query Server and Event Collect. The idea is to run the modules as micro-services or as stand-alone solutions.

Query Server
The query server is, as the name suggests, a query processor. As we are moving towards an API-centric approach for the server, query-server also has API endpoints (v1). Using this API you can get the data from the server in the mentioned format. The API itself is quite intuitive.

API to get data from query-server

GET /api/v1/search/<search-engine>/query=query&format=format

Sample Response Header

 Cache-Control: no-cache
 Connection: keep-alive
 Content-Length: 1395
 Content-Type: application/xml; charset=utf-8
 Date: Wed, 24 May 2017 08:33:42 GMT
 Server: Werkzeug/0.12.1 Python/2.7.13
 Via: 1.1 vegur

The server is built in Flask. The GitHub repository of the server contains a simple Bootstrap front-end, which is used as a testing ground for results. The query string calls the search engine result scraper that is based on the scraper at searss. This scraper takes search engine, presently Google, Bing, DuckDuckGo and Yahoo as additional input and searches on that search engine. The output from the scraper, which can be in XML or in JSON depending on the API parameters is returned, while the search query is stored into MongoDB database with the query string indexing. This is done keeping in mind the capabilities to be added in order to use Kibana analyzing tools.

The frontend prettifies results with the help of PrismJS. The query-server will be used for initial listing of events from different search engines. This will be accessed through the following API.

The query server app can be accessed on heroku.

➢ api/list​: To provide with an initial list of events (titles and links) to be displayed on Open Event search results.

When an event is searched on Open Event, the query is passed on to query-server where a search is made by calling with appending some details for better event hunting. Recent developments with Google include their event search feature. In the Google search app, event searches take over when Google detects that a user is looking for an event.

The feed from the scraper is parsed for events inside query server to generate a list containing Event Titles and Links. Each event in this list is then searched for in the database to check if it exists already. We will be using elastic search to achieve fuzzy searching for events in Open Event database as elastic search is planned for the API to be used.

One example of what we wish to achieve by implementing this type of search in the database follows. The user may search for

-Google Cloud Event Delhi
-Google Event, Delhi
-Google Cloud, Delhi
-google cloud delhi
-Google Cloud Onboard Delhi
-Google Delhi Cloud event

All these searches should match with “Google Cloud Onboard Event, Delhi” with good accuracy. After removing duplicates and events which already exist in the database from this list have been deleted, each event is rendered on search frontend of Open Event as a separate event. The user can click on any of these event, which will make a call to event collect.

Event Collect

The event collect project is developed as a separate module which has two parts

● Site specific scrapers
In its present state, event collect has scrapers for eventbrite and ticket-leap which, given a query, scrape eventbrite (and ticket-leap respectively) search results and downloads JSON files of each event using Loklak‘s API.
The scrapers can be developed in any form or any number of scrapers/scraping tools can be added as long as they are in alignment with the Open Event Import API’s data format. Writing tests for these against the concurrent API formats will take care of this. This part will be covered by using a json-validator​ to check against a pre-generated schema.

The scrapers are exposed through a set of APIs, which will include, but not limited to,
➢ api/fetch-event : ​to scrape any event given the link and compose the data in a predefined JSON format which will be generated based on Open Event Import API. When this function is called on an event link, scrapers are invoked which collect event data such as event, meta, forms etc. This data will be validated against the generated JSON schema. The scraped JSON and directory structure for media files:
➢ api/export : to export all the JSON data containing event information into Open Event Server. As and when the scraping is complete, the data will be added into Open Event’s database as a new event.

How the Import works

The following graphic shows how the import works.

Let’s dive into the workflow. So as the diagram illustrates, the ‘search​’ functionality makes a call to api/list API endpoint provided by query-server which returns with events’ ‘Title’ and ‘Event Link’ from the parsed XML/JSON feed. This list is displayed as Open Event’s search results. Now the results having been displayed, the user can click on any of the events. When the user clicks on any event, the event is searched for in Open Event’s database. Two things happen now:

  • The event page loads if the event is found.
  • If the event does not already exist in the database, clicking on any event will

➢ Insert this event’s title and link in the database and get the event_id

➢ Make a call to api/fetch-event in event-collect which then invokes a site-specific scraper to fetch data about the event the user has chosen

➢ When the data is scraped, it is imported into Open Event database using the previously generated event_id. The page will be loaded using jquery ajax ​as and when the scraping is done.​When the imports are done, the search page refreshes with the new results. The Open Event Orga Server exposes a well documented REST API that can be used by external services to access the data.

Continue ReadingAutomatic Imports of Events to Open Event from online event sites with Query Server and Event Collect

Writing Simple Unit-Tests with JUnit

In the Loklak Server project, we use a number of automation tools like the build testing tool ‘TravisCI’, automated code reviewing tool ‘Codacy’, and ‘Gemnasium’. We are also using JUnit, a java-based unit-testing framework for writing automated Unit-Tests for java projects. It can be used to test methods to check their behaviour whenever there is any change in implementation. These unit-tests are handy and are coded specifically for the project. In the Loklak Server project it is used to test the web-scrapers. Generally JUnit is used to check if there is no change in behaviour of the methods, but in this project, it also helps in keeping check if the website code has been modified, affecting the data that is scraped.

Let’s start with basics, first by setting up, writing a simple Unit-Tests and then Test-Runners. Here we will refer how unit tests have been implemented in Loklak Server to familiarize with the JUnit Framework.


Setting up JUnit with gradle is easy, You have to do just 2 things:-

1) Add JUnit dependency in build.gradle

Dependencies {

. . .

. . .<other compile groups>. . .

compile group: 'com.twitter', name: 'jsr166e', version: '1.1.0'

compile group: 'com.vividsolutions', name: 'jts', version: '1.13'

compile group: 'junit', name: 'junit', version: '4.12'

compile group: 'org.apache.logging.log4j', name: 'log4j-1.2-api', version: '2.6.2'

compile group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.6.2'

. . .

. . .



2) Add source for ‘test’ task from where tests are built (like here).

Save all tests in test directory and keep its internal directory structure identical to src directory structure. Now set the path in build.gradle so that they can be compiled. = ['test']


Writing Unit-Tests

In JUnit FrameWork a Unit-Test is a method that tests a particular behaviour of a section of code. Test methods are identified by annotation @Test.

Unit-Test implements methods of source files to test their behaviour. This can be done by fetching the output and comparing it with expected outputs.

The following test tests if twitter url that is created is valid or not that is to be scraped.


 * This unit-test tests twitter url creation



public void testPrepareSearchURL() {

String url;

String[] query = {"fossasia", "from:loklak_test",

"spacex since:2017-04-03 until:2017-04-05"};

String[] filter = {"video", "image", "video,image", "abc,video"};

String[] out_url = {



"and other output url strings to be matched…..."


// checking simple urls

for (int i = 0; i < query.length; i++) {

url = TwitterScraper.prepareSearchURL(query[i], "");

//compare urls with urls created

assertThat(out_url[i], is(url));


// checking urls having filters

for (int i = 0; i < filter.length; i++) {

url = TwitterScraper.prepareSearchURL(query[0], filter[i]);

//compare urls with urls created

assertThat(out_url[i+3], is(url));




Testing the implementation of code is useless as it will either make code more difficult to change or tests useless  . So be cautious while writing tests and keep difference between Implementation and Behaviour in mind.

This is the perfect example for a simple Unit-Test. As we see there are some points, which needs to be observed like:-

1) There is a annotation @Test .

2) Input array of query which is fed to the method TwitterScraper.prepareSearchURL() .

3) Array of urls out_url[], which are the expected urls to output.

4) asserThat() to compare the expected url (in array out_url[]) and the output url (in variable ‘url’).

NOTE: assertEquals() could also be used here, but we prefer to use assert methods to get error message that is readable (We will discuss about this some time later)

And the TestRunner

When we are working on a project, It is not feasible to run tests using gradle as they are first built  (else verified whether tests are build-ready) and then executed. gradle test shall be used only for building and testing the tests. For testing the project, one shall set-up TestRunner. It allows to run specific set of tests, one wants to run.

TestRunners are built once using gradle (with other tests) and can be run whenever you want. Also it is easy to stack up the test classes you want to run in SuiteClasses and @RunWith to run SuiteClasses with the TestRunner.

In loklak server, TestRunner runs the web-scraper tests. They are used by developers to test the changes they have made.

This is a sample TestRunner, code link here .

package org.loklak;

// Library classes imported

import org.junit.runner.RunWith;

import org.junit.runners.Suite;

// Source files to be tested

import org.loklak.harvester.TwitterScraperTest;

import org.loklak.harvester.YoutubeScraperTest;


* TestRunner for harvesters







public class TestRunner {



You can also add TestRunners for different sections of the project. Like here it is initialized only to test harvesters.

To run the TestRunner

Add classpath of the jar file of the project and run ‘JUnitCore’ with TestRunner to get output on terminal.

java -classpath .:build/libs/<yourProject>.jar:build/classes/test org.junit.runner.JUnitCore org.loklak.TestRunner

In the project we have set up a shell script to run the tests.

Few points

1) Build the project and tests separately. Build tests only when changed as they take time to be built and executed.

2) Whenever you are done with the coding part, run the tests using TestRunner.

3) Write unit-tests whenever you add a new feature to the project to keep it up-to-date.

Now lets end up here.

So for now, Code it, Test it and Repeat.


Continue ReadingWriting Simple Unit-Tests with JUnit

Displaying error notifications in whatsTrending? app

The issue I am solving in the whatsTrending app is to display error notifications when the date fields and the count field are not validated and when a user enters invalid data. Specifically we want to display error notifications for junk values and dates with formats other than YYYY-MM-DD and any other invalid data in the whatsTrending app’s filter option.

The whatsTrending app is a web app that shows the top trending hashtags of twitter messages in a given date range using tweets collected by the loklak search engine. Users can also limit the number of top hash tags they want to see and use filters with start and end dates.

App to know trending hashtags on twitter

What is the problem? The date fields and the count field are not validated which means junk values and date with formats other than YYYY-MM-DD do not show any error.

So how can the problem be solved? Well the format (pattern) of the date can be verified by regular expression. A regular expression describes a pattern in a given text.So the format checking problem can be described as finding the pattern YYYY-MM-DD in the input date where Y, M and D are numbers.The Regex should specify that the pattern should be present at the beginning of the text.

More detailed information about regex can be found here.

The regex for this pattern is :


The pattern says there should be 4 numbers followed by ‘-’ then two numbers then again ‘-’ and then again two numbers.

This can be implemented the following way :

$scope.isValidDate = function(dateString) {
        var regEx = /^\d{4}-\d{2}-\d{2}$/;
        if (dateString.match(regEx) === null) {
            return false;

        dateComp = dateString.split('-');
        var i=0;
        for (i=0; i<dateComp.length; i++) { dateComp[i] = parseInt(dateComp[i]); } if (dateComp.length > 3) {
            return false;

        if (dateComp[1] > 12 || dateComp[1] <= 0) { return false; } if (dateComp[2] > 31 || dateComp[2] <= 0) { return false; } if (((dateComp[1] === 4) || (dateComp[1] === 6) || (dateComp[1] === 9) || (dateComp[1] === 11)) && (dateComp[2] > 30)) {
            return false;

        if (dateComp[1] ===2) {
            if (((dateComp[0] % 4 === 0) && (dateComp[0] % 100 !== 0)) || (dateComp[0] % 400 === 0)) {
                if (dateComp[2] > 29) {
                    return false;
            } else {
                if (dateComp[2] > 28) {
                    return false;

        return true;

So the first part of the code checks for the above mentioned pattern in the input. If not found it returns false.If found then we split the entire date into a list containing year, month and day and the remaining part if any is removed.Each component is converted to integer.Then further validation is done on the month and day as can be seen from the code above.The range of the month and date is checked.Also leap year checking is done.

In the same way the count field is also validated. The regex for this field is much simpler. We just need to check that the input consists only of numbers and nothing else.
So the regex for this is :


This means repetition of digits in the range 0-9.We search for this pattern in the text. If found we return true else false.The function for this is as follows:

$scope.isNumber = function(numString) {
        var regEx = /^[0-9]+$/;
        return String(numString).match(regEx) != null;

Next we need to call these function and see if their is any error. If there is an error we need to display it.This can be done using a modal. Bootstrap has got an inbuilt modal which can be invoked using javascript.

Showing error using modal

So at first we need to define the modal and its content (empty if necessary as in this case)using HTML.The HTML code for this can be found here.

A small yet nice tutorial on Bootstrap modal can be found here
Next we need to set the content of the modal and invoke it from our JS file on encountering an error.

$scope.displayErrorModal = function(val, type) {
        if (type === 0) {
            if (!$scope.isValidDate(val)) {
                $scope.loading = false;
                $('.modal-body').html('Please enter valid date in YYYY-MM-DD format'); 
                return false; 
         } else { 
             if (!$scope.isNumber(val)) { 
                 $scope.loading = false; 
                 $('.modal-body').html('Please enter a valid number'); 
                 return false; 
         return true; 

The above function accepts a parameter val and another parameter type.The parameter type tells what validation needs to be performed, date validation or number validation and calls previous two methods accordingly and passes val which is the value to validated.If any of the validation fails then it sets the content of the modal using : $(‘.modal-body’).html(“your content”) and then invokes it using : $(‘#modalID’).modal(‘show’). This displays a nice modal on the page and the user is notified about the error.

So this is it for this post.Thanks for reading it.My next post will be on fixing the design of the boilerplate app.

Continue ReadingDisplaying error notifications in whatsTrending? app

Generating a documentation site from markup documents with Sphinx and Pandoc

Generating a fully fledged website from a set of markup documents is no easy feat. But thanks to the wonderful tool sphinx, it certainly makes the task easier. Sphinx does the heavy lifting of generating a website with built in javascript based search. But sometimes it’s not enough.

This week we were faced with two issues related to documentation generation on loklak_server and susi_server. First let me give you some context. Now sphinx requires an index.rst file within /docs/  which it uses to generate the first page of the site. A very obvious way to fill it which helps us avoid unnecessary duplication is to use the include directive of reStructuredText to include the README file from the root of the repository.

This leads to the following two problems:

  • Include directive can only properly include a reStructuredText, not a markdown document. Given a markdown document, it tries to parse the markdown as  reStructuredText which leads to errors.
  • Any relative links in README break when it is included in another folder.

To fix the first issue, I used pypandoc, a thin wrapper around Pandoc. Pandoc is a wonderful command line tool which allows us to convert documents from one markup format to another. From the official Pandoc website itself,

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

pypandoc requires a working installation of Pandoc, which can be downloaded and installed automatically using a single line of code.


This gives us a cross-platform way to download pandoc without worrying about the current platform. Now, pypandoc leaves the installer in the current working directory after download, which is fine locally, but creates a problem when run on remote systems like Travis. The installer could get committed accidently to the repository. To solve this, I had to take a look at source code for pypandoc and call an internal method, which pypandoc basically uses to set the name of the installer. I use that method to find out the name of the file and then delete it after installation is over. This is one of many benefits of open-source projects. Had pypandoc not been open source, I would not have been able to do that.

url = pypandoc.pandoc_download._get_pandoc_urls()[0][pf]
filename = url.split(‘/’)[-1]

Here pf is the current platform which can be one of ‘win32’, ‘linux’, or ‘darwin’.

Now let’s take a look at our second issue. To solve that, I used regular expressions to capture any relative links. Capturing links were easy. All links in reStructuredText are in the same following format.

`Title <url>`__

Similarly links in markdown are in the following format


Regular expressions were the perfect candidate to solve this. To detect which links was relative and need to be fixed, I checked which links start with the \docs\ directory and then all I had to do was remove the \docs prefix from those links.

A note about loklak and susi server project

Loklak is a server application which is able to collect messages from various sources, including twitter.

SUSI AI is an intelligent Open Source personal assistant. It is capable of chat and voice interaction and by using APIs to perform actions such as music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, and other real time information

Continue ReadingGenerating a documentation site from markup documents with Sphinx and Pandoc