Scraping in JavaScript using Cheerio in Loklak

FOSSASIA recently started a new project loklak_scraper_js. The objective of the project is to develop a single library for web-scraping that can be used easily in most of the platforms, as maintaining the same logic of scraping in different programming languages and project is a headache and waste of time. An obvious solution to this was writing scrapers in JavaScript, reason JS is lightweight, fast, and its functions and classes can be easily used in many programming languages e.g. Nashorn in Java.

Cheerio is a library that is used to parse HTML. Let’s look at the youtube scraper.

Parsing HTML

Steps involved in web-scraping:

  1. HTML source of the webpage is obtained.
  2. HTML source is parsed and
  3. The parsed HTML is traversed to extract the required data.

For 2nd and 3rd step we use cheerio.

Obtaining the HTML source of a webpage is a piece of cake, and is done by function getHtml, sync-request library is used to send the “GET” request.

Parsing of HTML can be done using the load method by passing the obtained HTML source of the webpage, as in getSearchMatchVideos function.

var $ = cheerio.load(htmlSourceOfWebpage);

 

Since, the API of cheerio is similar to that of jquery, as a convention the variable to reference cheerio object which has parsed HTML is named “$”.

Sometimes, the requirement may be to extract data from a particular HTML tag (the tag contains a large number of nested children tags) rather than the whole HTML that is parsed. In that case, again load method can be used, as used in getVideoDetails function to obtain only the head tag.

var head = cheerio.load($("head").html());

html” method provides the html content of the selected tag i.e. <head> tag. If a parameter is passed to the html method then the content of selected tag (here <head>) will be replaced by the html of new parameter.

Extracting data from parsed HTML

Some of the contents that we see in the webpage are dynamic, they are not static HTML. When a “GET” request is sent the static HTML of webpage is obtained. When Inspect element is done it can be seen that the class attribute has different value in the webpage we are using than the static HTML we obtain from “GET” request using getHtml function. For example, inspecting the link of one of suggested videos, see the different values of class attribute :

 

  • In website (for better view):

  • In static HTML, obtained from “GET” request using getHtml function (for better view):

So, it is recommended to do a check first, whether attributes have same values or not, and then proceed accordingly.

Now, let’s dive into the actual scraping stuff.

As most of the required data are available inside head tag in meta tag. extractMetaAttribute function extracts the value of content attribute based on another provided attribute and its value.

function extractMetaAttribute(cheerioObject, metaAttribute, metaAttributeValue) {
	var selector = 'meta[' + metaAttribute + '="' + metaAttributeValue + '"]';
	return cheerioFunction(selector).attr("content");
}

cheerioObject” here will be the “head” object created above.

For example, our final JSONObject contains a og_url key-value pair, to get that we need to obtain the following html element.

<meta property="og:url" content="https://www.youtube.com/watch?v=KVGRN7Z7T1A">

 

This can be obtained by:

  1. Writing a selector for property attribute of meta. The selector would be ‘meta[property=”og:url”]’.
  2. The selector is passed to cheerioObject.
  3. Then attr method is used to obtain the value of content attribute.
  4. Finally, we set the obtained value of content attribute as the value of JSONObject’s key.

Similarly og:site_name, og:url and other values can be extracted, which in the final JSONObject would be the value of keys og_site_name, og_url and similarly. Since, a lot of data needs to be extracted this way, the extractMetaAttribute function generalizes it, where metaAttribute is “property” and metaAttributeValue is “og:url” in the above example.

If one parameter is provided in attr method, then it is used as a getter method, the value of that attribute is returned. If two parameters are provided then first parameter is the name of attribute and second parameter is the value of attribute, in this case it is used as a setter method.

Now, what if the provided selector matches more than one html element and we need to extract data or perform some operations on all of them. The answer is using each method on the cheerio Object, it iterates over the matched elements and executes the passed function – as a parameter – on them. The passed function has two parameters, the index of matched element and the matched element itself. To break out of the loop early, false is returned.

One of the use case of each method in youtube scraper is to extract related “tags” of the video.

Selector for this would be ‘meta[property=”og:video:tag”]’ and as it is inside a head tag, we can use the already created head tag. Applying the each method, it becomes:

head('meta[property="og:video:tag"]').each(function(i, element) {
    // the logic goes here
});

 

Here for the first iteration the value of “i” will be “0” and “element” will be

<meta property="og:video:tag" content="Iggy">

 

and so on. We need to obtain the value of content attribute, so we can use attr method as used above. Finally all the values are pushed to an array. Hence, the final code snippet with logic.

var ary = [];
head('meta[property="og:video:tag"]').each(function(i, element) {
    ary.push(head(element).attr("content"));
});

 

The same functionality is implemented in extractMetaProperties method.

function extractMetaProperties(cheerioObj, metaProperty) {
	var properties = [];
	var selector = 'meta[property="' + metaProperty + '"]';
	cheerioObj(selector).each(function(i, element) {
		properties.push(cheerioObj(element).attr("content"));
	});
	return properties;}
Continue Reading

Refactor of Dropdown Menu in Susper

The first version of the Susper top menu was providing links to resources and tutorials. In the next version of the menu, we were looking for a menu with more colorful icons, a cleaner UI design and a menu that should appear on the homepage as well. In this blog, I will discuss about refactoring the dropdown menu. This is how earlier dropdown of Susper looks like:

We decided to create a separate component for the menu DropdownComponent.

At first, I created a drop down menu with matching dimensions similar to what Google follows. Then, I gave padding: 28px to create similar UI to market leader. This will make a dropdown menu with clean UI design. I replaced the old icons with colorful icons. In the dropdown we have:

  • Added more projects of FOSSASIA like eventyay, loklak, susi and main website of FOSSASIA. Here how it looks now :

The main problem I faced was aligning the content inside the dropdown and they should not get disturbed when the screen size changes.
I kept the each icon dimensions as 48 x 48 inside drop down menu. I also arranged them in a row. It was easy to use div element to create rows rather than using ul and li tags which were implemented earlier.

To create a horizontal grey line effect, I used the hr element. I made sure, padding remained the same above and below the horizontal line.

At the end of drop down menu, @mariobehling suggested instead of writing ‘more’, it should redirect to projects page of FOSSASIA.

This is how I worked on refactoring drop down menu and added it on the homepage as well.

Resources

Continue Reading

URL Unshortening in Java for loklak server

There are many URL shortening services on the internet. They are useful in converting really long URLs to shorter ones. But apart from redirecting to a longer URL, they are often used to track the people visiting those links.

One of the components of loklak server is its URL unshortening and redirect resolution service, which ensures that websites can’t track the users using those links and enhances the protection of privacy. How this service works in loklak.

Redirect Codes in HTTP

Various standards define 3XX status codes as an indication that the client must perform additional actions to complete the request. These response codes range from 300 to 308, based on the type of redirection.

To check the redirect code of a request, we must first make a request to some URL –

String urlstring = "http://tinyurl.com/8kmfp";
HttpRequestBase req = new HttpGet(urlstring);

Next, we will configure this request to disable redirect and add a nice Use-Agent so that websites do not block us as a robot –

req.setConfig(RequestConfig.custom().setRedirectsEnabled(false).build());
req.setHeader("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36");

Now we need a HTTP client to execute this request. Here, we will use Apache’s CloseableHttpClient

CloseableHttpClient httpClient = HttpClients.custom()
                                   .setConnectionManager(getConnctionManager(true))
                                   .setDefaultRequestConfig(defaultRequestConfig)
                                   .build();

The getConnctionManager returns a pooling connection manager that can reuse the existing TCP connections, making the requests very fast. It is defined in org.loklak.http.ClientConnection.

Now we have a client and a request. Let’s make our client execute the request and we shall get an HTTP entity on which we can work.

HttpResponse httpResponse = httpClient.execute(req);
HttpEntity httpEntity = httpResponse.getEntity();

Now that we have executed the request, we can check the status code of the response by calling the corresponding method –

if (httpEntity != null) {
   int httpStatusCode = httpResponse.getStatusLine().getStatusCode();
   System.out.println("Status code - " + httpStatusCode);
} else {
   System.out.println("Request failed");
}

Hence, we have the HTTP code for the requests we make.

Getting the Redirect URL

We can simply check for the value of the status code and decide whether we have a redirect or not. In the case of a redirect, we can check for the “Location” header to know where it redirects.

if (300 <= httpStatusCode && httpStatusCode <= 308) {
   for (Header header: httpResponse.getAllHeaders()) {
       if (header.getName().equalsIgnoreCase("location")) {
           redirectURL = header.getValue();
       }
   }
}

Handling Multiple Redirects

We now know how to get the redirect for a URL. But in many cases, the URLs redirect multiple times before reaching a final, stable location. To handle these situations, we can repeatedly fetch redirect URL for intermediate links until we saturate. But we also need to take care of cyclic redirects so we set a threshold on the number of redirects that we have undergone –

String urlstring = "http://tinyurl.com/8kmfp";
int termination = 10;
while (termination-- > 0) {
   String unshortened = getRedirect(urlstring);
   if (unshortened.equals(urlstring)) {
       return urlstring;
   }
   urlstring = unshortened;
}

Here, getRedirect is the method which performs single redirect for a URL and returns the same URL in case of non-redirect status code.

Redirect with non-3XX HTTP status – meta refresh

In addition to performing redirects through 3XX codes, some websites also contain a <meta http-equiv=”refresh” … > which performs an unconditional redirect from the client side. To detect these types of redirects, we need to look into the HTML content of a response and parse the URL from it. Let us see how –

String getMetaRedirectURL(HttpEntity httpEntity) throws IOException {
   StringBuilder sb = new StringBuilder();
   BufferedReader reader = new BufferedReader(new InputStreamReader(httpEntity.getContent()));
   String content = null;
   while ((content = reader.readLine()) != null) {
       sb.append(content);
   }
   String html = sb.toString();
   html = html.replace("\n", "");
   if (html.length() == 0)
       return null;
   int indexHttpEquiv = html.toLowerCase().indexOf("http-equiv=\"refresh\"");
   if (indexHttpEquiv < 0) {
       return null;
   }
   html = html.substring(indexHttpEquiv);
   int indexContent = html.toLowerCase().indexOf("content=");
   if (indexContent < 0) {
       return null;
   }
   html = html.substring(indexContent);
   int indexURLStart = html.toLowerCase().indexOf(";url=");
   if (indexURLStart < 0) {
       return null;
   }
   html = html.substring(indexURLStart + 5);
   int indexURLEnd = html.toLowerCase().indexOf("\"");
   if (indexURLEnd < 0) {
       return null;
   }
   return html.substring(0, indexURLEnd);
}

This method tries to find the URL from meta tag and returns null if it is not found. This can be called in case of non-redirect status code as a last attempt to fetch the URL –

String getRedirect(String urlstring) throws IOException {
   ...
   if (300 <= httpStatusCode && httpStatusCode <= 308) {
       ...
   } else {
       String metaURL = getMetaRedirectURL(httpEntity);
       EntityUtils.consumeQuietly(httpEntity);
       if (metaURL != null) {
           if (!metaURL.startsWith("http")) {
               URL u = new URL(new URL(urlstring), metaURL);
               return u.toString();
           }
           return metaURL;
       }
   return urlstring;
   }
   ...
}

In this implementation, we can see that there is a check for metaURL starting with http because there may be relative URLs in the meta tag. The java.net.URL library is used to create a final URL string from the relative URL. It can handle all the possibilities of a valid relative URL.

Conclusion

This blog post explains about the resolution of shortened and redirected URLs in Java. It explains about defining requests, executing them using an HTTP client and processing the resultant response to get a redirect URL. It also explains about how to perform these operations repeatedly to process multiple shortenings/redirects and finally to fetch the redirect URL from meta tag.

Loklak uses an inbuilt URL shortener to resolve redirects for a URL. If you find this blog post interesting, please take a look at the URL shortening service of loklak.

Continue Reading

Improving Harvesting Decision for Kaizen Harvester in loklak server

About Kaizen Harvester

Kaizen is an alternative approach to do harvesting in loklak. It focuses on query and information collecting to generate more queries from collected timelines. It maintains a queue of query that is populated by extracting following information from timelines –

  1. Hashtags in Tweets
  2. User mentions in Tweets
  3. Tweets from areas near to each Tweet in timeline.
  4. Tweets older than oldest Tweet in timeline.

Further, it can also utilise Twitter API to get trending keywords from Twitter and get search suggestions from other loklak peers.

It was introduced by @yukiisbored in pull request loklak/loklak_server#960.

The Problem: Unbiased Harvesting Decision

The Kaizen harvester either searches for queries from the queue, or tries to grab trending queries (using Twitter API or from backend). In the previous version of KaizenHarvester, the decision of “harvesting vs. info-grabbing” was taken based on the value from a random boolean generator –

@Override
public int harvest() {
   if (!queries.isEmpty() && random.nextBoolean())
       return harvestMessages();

   grabSuggestions();

   return 0;
}

[SOURCE]

In sane situations, the Kaizen harvester is configured to use a fixed size queue and drops the queries which are requested to get added once the queue is full. And since the decision doesn’t take into account the amount to which queue is filled, it would often call the grabSuggestions() method.

But since the queue would be full, the grabbed suggestions would simply be lost. This would result in wastage of time and resources in fetching the suggestions (from backend or API). To overcome this, something better was to be done in this part.

The Solution: Making Decision Biased

To solve the problem of dumb harvesting decision, the harvester was triggered based on the following steps –

  1. Calculate the ratio of queue filled (q.size() / q.maxSize()).
  2. Generate a random floating point number between 0 and 1.
  3. If the number is less than the fraction, harvest. Otherwise get harvesting suggestions.

Why would this work?

Initially, when the queue is mostly empty, the ratio would be a small number. So, it would be highly probable that a random number generated between 0 and 1 would be greater than the ratio. And Kaizen would go for grabbing search suggestions.

If this ratio is large (i.e. the queue is almost full), it would be highly likely that the random number generated would be less than it, making it more likely to search for results instead of grabbing suggestions.

Graph?

The following graph shows how the harvester decision would change. It performs 10k iterations for a given queue ratio and plots the number of times harvesting decision was taken.

Change in code

The harvest() method was changed in loklak/loklak_server#1158 to take smart decision of harvesting vs. info-grabbing in following manner –

@Override
public int harvest() {
   float targetProb = random.nextFloat();
   float prob = 0.5F;
   if (QUERIES_LIMIT > 0) {
       prob = queries.size() / (float)QUERIES_LIMIT;
   }
   if (!queries.isEmpty() && targetProb < prob) {
       return harvestMessages();
   }

   grabSuggestions();

   return 0;
}

[SOURCE]

Conclusion

This change brought enhancement in the Kaizen harvester and made it more sensible to how fast its queue if filling. There are no more requests made to backend for suggestions whose queries are not added to the queue.

 

Resources

Continue Reading

Emoticon Map Markers in Emoji Heatmapper App

As I’ve been exploring and trying to learn what’s possible in the maps of OpenLayers 3 using LokLak API, I wondered about map markers. The markers which i used earlier seem to be so dull on the map, and as I am working on Emoji Heatmapper I couldn’t help but think about 🍩,  😻, 🍦 and 🔮. And sure, the more practical 🏡 , 🏢, ☕ and 🌆.

Emojis as map markers? I had to give it a try.

I didn’t know how one acquires the emoji trove, so I searched around Github. Sure enough, I found many solutions on GitHub. I sifted through all of them until Emoji-picker caught my attention. So i tried giving a dropdown using the emoji-picker as searching would be lot more easier for the user.

Emoji-picker will convert an emoji keyword to the image internally. That is why when you hover over an emoji in the drop-down menu, it shows the corresponding keyword. For instance, the image 🚀 when hovered on it, it displays :rocket: .

 

All the emojis are saved as data URIs, so I don’t need to worry about lugging around hundreds of images. All I need is emoji-picker.js, and few more *.js files  hooked up on my page, and a little JavaScript to get everything working accordingly.

Armed with hundreds of emojis, my next step was to swap markers with emoji keywords. After a few clicks around emoji-picker documentation, I landed on data-emoji-input=”unicode” . It allows you to replace the traditional marker with a unicode emojis so the search outputs data. You can add a class to that lead emoji-picker-container and data-emoji-input=”unicode” for the HTML option.

Style the Open Layers 3 map:

var style = new ol.style.Style({
    stroke: new ol.style.Stroke({
        color: [64, 200, 200, 0.5],
        width: 5
    }),
    text: new ol.style.Text({
        font: '30px sans-serif',
        text: document.getElementById('searchField').value !== '' ? document.getElementById('searchField').value : '',
        fill: new ol.style.Fill({
            color: [64, 64, 64, 0.75]
        })
    })
});

 

and 🎇 I have an emoji map marker.

Resources:

Continue Reading

OpenLayers 3 Map that Animates Emojis Using LokLak API

OpenLayers3 maps are fully functional maps which offer additional interactive features. In the Emoji Heatmapper app in Loklak Apps, I am using interactive OpenLayers3 maps to visualize the data. In this blog post, I am going to show you how to build an OpenLayers 3 map that animates emojis according to the query entered and location tracked from the LokLak Search API.

We start with a simple map using just one background layer in a clean style.

var map = new ol.Map({
target: 'map',  // The DOM element that will contains the map
renderer: 'canvas', // Force the renderer to be used
layers: [
// Add a new Tile layer getting tiles from OpenStreetMap source
new ol.layer.Tile({
    source: new ol.source.OSM()
}),
vectorLayer
],
// Create a view centered on the specified location and zoom level
view: new ol.View({
    center: ol.proj.transform([2.1833, 41.3833], 'EPSG:4326', 'EPSG:3857'),
    zoom: 2
})
});

 

Sample Output which displays map:

The data set for the locations of tweets containing emoji in them are tracked using search API of LokLak, which is in the form of simplified extract as JSON file. The file contains a list of coordinates named as location_point, the coordinate consists of lat and long values. With the coordinates, we will create a circle point i.e.,marker on the map showing where the emoji have been recently used from the tweets posted.

In the callback of the AJAX request we loop through the list of coordinates. The coordinate of the resulting line string are in EPSG:4326. Usually, when loading vector data with a different projection, OpenLayers will automatically re-project the geometries to the projection of the map. Because we are loading loading the data ourself, we manually have to transform the line to EPSG:3857. Then we could add the feature to the vector source.

for(var i = 0; i < tweets.statuses.length; i++) {
        if(tweets.statuses[i].location_point !== undefined){
            // Creation of the point with the tweet's coordinates
            //  Coords system swap is required: OpenLayers uses by default
            //  EPSG:3857, while loklak's output is EPSG:4326
            var point = new ol.geom.Point(ol.proj.transform(tweets.statuses[i].location_point, 'EPSG:4326', 'EPSG:3857'));
            vectorSource.addFeature(new ol.Feature({  // Add the point to the data vector
                geometry: point
            }));
        }
    }
});

 

Markers on the Map:

We can also style the markers which gets rendered onto the map using the feature ol.style.Style provided by OpenLayers.

var style = new ol.style.Style({
    stroke: new ol.style.Stroke({
        color: [64, 200, 200, 0.5],
        width: 5
    }),
    text: new ol.style.Text({
        font: '30px sans-serif',
        text: document.getElementById('searchField').value !== '' ? document.getElementById('searchField').value : '', //any text can be given here
        fill: new ol.style.Fill({
            color: [64, 64, 64, 0.75]
        })
    })
});

 

Styled Markers on the Map:

So these were a few tips and tricks to use the interactive OpenLayers3 Maps.

The full code of the example is available here.

Resources:

Continue Reading

How to Geolocate Tweets for Emoji Heatmapper App using LokLak API

Geolocating tweets i.e., getting the location of the tweets posted, is one of the major task in the app Emoji Heatmapper. As I have to plot a marker on the map according to the query searched and the location tracked from the tweet containing that query. This is easily done in the app using the LokLak Search API.
Social media, such as Twitter, are filled with millions of micro messages by people around the world. My geolocation task is to get the geographic location with a Twitter message (having emoji for Emoji Heatmapper) based on the information we have about the user and the message. Twitter provides various data APIs, and tweets are obtained as JSON objects that include the tweet text along with metadata, such as location coordinates (lat and long).

Solutions

So there are few ways to geolocate tweets:

  • Place object from tweets Tweets usually delivered by the Twitter API include a JSON “Place” object which has a location associated with the tweet. These can have fields such as the country and city related to the place, as well as lat and long coordinates. Twitter Users have the option to tag their twitter statuses i.e., tweets with a place; the tagging can also be done automatically based on matches to the user’s current GPS position, if the user allows this (if GPS is switched ON). For the tweets containing place objects, the geolocation has already been done by Twitter.
  • Coordinates from tweets Some tweets are tagged with the coordinates (lat and long) of the user when the message was written, based on the user’s current GPS position. Reverse geocoding using APIs from Google Maps can be done to know the detailed information about the place.
  • Location from user profile Many users provide a location in their profile, a field with values such as “IND”. These locations are mostly always the same(i.e.,static), corresponding to the user’s primary location rather than the location at the time of the message posting. We assume that the users are not avid travelers.
  • Content-based Geo-location Geo-location can also be done on a message or set of messages based on the textual content of the messages.A user’s primary location can be detected based on their dialect or the mention of regional issues like sports teams, for example, as well as the mention of landmarks.

Implementation

So, here comes about the usage of LokLak search API in Emoji Heatmapper app:
I have used the LokLak Search API to get the tweets, which contains the query(i.e., emoji) which is being searched for and the location of the tweets. In LokLak we perform geolocation using location information from tweet metadata and user profiles.
Try this simple query:

http://loklak.org/api/search.json?q=😄

Which shows the data related to search query in which we also have geolocation information. Below is the output of the query, in which a single Twitter status is displayed which has the search query in it and the highlighted part shows the information about the location:

 {
      "timestamp": "2017-06-01T17:46:41.874Z",
      "created_at": "2017-06-01T17:46:25.000Z",
      "screen_name": "CerdaEnterprise",
      "text": "\"These boots were made for wearing... & that's just what we'll do.\" 🎶😄 Coral + guest pins = 💟 http://Www.TfjByCei.com https://pic.twitter.com/469uVWumSN",
      "link": "https://twitter.com/CerdaEnterprise/status/870335741561151488",
      "id_str": "870335741561151488",
      "canonical_id": "",
      "parent": "",
      "source_type": "TWITTER",
      "provider_type": "SCRAPED",
      "retweet_count": 0,
      "favourites_count": 0,
      "place_name": "Made",
      "place_id": "",
      "text_length": 155,
      "place_context": "ABOUT",
      "place_country": "Netherlands",
      "place_country_code": "NL",
      "place_country_center": [
        -3.5756949584817903,
        26.74012558382941
      ],
      "location_point": [
        4.7930598188159195,
        51.67666995510302
      ],
      "location_radius": 0,
      "location_mark": [
        4.793226608645116,
        51.676940788684036
      ],
      "location_source": "ANNOTATION",
      "hosts": [
        "www.tfjbycei.com",
        "pic.twitter.com"
      ],
      "hosts_count": 2,
      "links": [
        "http://Www.TfjByCei.com",
        "https://pic.twitter.com/469uVWumSN"
      ],
      "links_count": 2,
      "unshorten": {},
      "images": [
        "https://pbs.twimg.com/media/DBQNqkDUQAAkntY.jpg",
        "https://pic.twitter.com/469uVWumSN"
      ],
      "images_count": 2,
      "audio": [],
      "audio_count": 0,
      "videos": [],
      "videos_count": 0,
      "mentions": [],
      "mentions_count": 0,
      "hashtags": [],
      "hashtags_count": 0,
      "classifier_language": "english",
      "classifier_language_probability": 3.246518732845919E-16,
      "without_l_len": 96,
      "without_lu_len": 96,
      "without_luh_len": 96,
      "user": {
        "appearance_first": "2017-06-01T17:46:41.874Z",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/769877281497944064/PgsRK9C7_bigger.jpg",
        "screen_name": "CerdaEnterprise",
        "user_id": "529052548",
        "name": "The Fashion Junkies",
        "appearance_latest": "2017-06-01T17:46:41.874Z"
      }
    }

As you can see the above JSON output from the “Search API of LokLak”, the location we get in the JSON output i.e., location_mark/location_point is used to plot the point on the map. If in case the location from where the tweet posted is missing, location from the user profile is used.

Location is pointed with a marker on the map as shown below:

You can find the information about Emoji Heatmapper app here and even can try it out:

http://apps.loklak.org/details.html?q=emojiHeatmapper.

Resources

Continue Reading

Implementing Loklak APIs in Java using Reflections

Loklak server provides a large API to play with the data scraped by it. Methods in java can be implemented to use these API endpoints. A common approach of implementing the methods for using API endpoints is to create the request URL by taking the values passed to the method, and then send GET/POST request. Creating the request URL in every method can be tiresome and in the long run maintaining the library if implemented this way will require a lot of effort. For example, assume a method is to be implemented for suggest API endpoint, which has many parameters, for creating request URL a lot of conditionals needs to be written – whether a parameter is provided or not.

Well, the methods to call API endpoints can be implemented with lesser and easy to maintain code using Reflection in Java. The post ahead elaborates the problem, the approach to solve the problem and finally solution which is implemented in loklak_jlib_api.

Let’s say, the status API endpoint needs to be implemented, a simple approach can be:

public class LoklakAPI {
    
    public static String status(String baseUrl) {
        String requestUrl = baseUrl   "/api/status.json";
        
        // GET request using requestUrl 
    }

    public static void main(String[] argv) {
        JSONObject result = status("https://api.loklak.org");
    }
}

This one is easy, isn’t it, as status API endpoint requires no parameters. But just imagine if a method implements an API endpoint that has a lot of parameters, and most of them are optional parameters. As a developer, you would like to provide methods that cover all the parameters of the API endpoint. For example, how a method would look like if it implements suggest API endpoint, the old SuggestClient implementation in loklak_jlib_api does that:

public static ResultList<QueryEntry> suggest(
           final String hostServerUrl,
           final String query,
           final String source,
           final int count,
           final String order,
           final String orderBy,
           final int timezoneOffset,
           final String since,
           final String until,
           final String selectBy,
           final int random) throws JSONException, IOException {
       ResultList<QueryEntry>  resultList = new ResultList<>();
       String suggestApiUrl = hostServerUrl
               + SUGGEST_API
               + URLEncoder.encode(query.replace(' ', '+'), ENCODING)
               + PARAM_TIMEZONE_OFFSET + timezoneOffset
               + PARAM_COUNT + count
               + PARAM_SOURCE + (source == null ? PARAM_SOURCE_VALUE : source)
               + (order == null ? "" : (PARAM_ORDER + order))
               + (orderBy == null ? "" : (PARAM_ORDER_BY + orderBy))
               + (since == null ? "" : (PARAM_SINCE + since))
               + (until == null ? "" : (PARAM_UNTIL + until))
               + (selectBy == null ? "" : (PARAM_SELECT_BY + selectBy))
               + (random < 0 ? "" : (PARAM_RANDOM + random))
               + PARAM_MINIFIED + PARAM_MINIFIED_VALUE;
 
                // GET request using suggestApiUrl
   }
}

A lot of conditionals!!! The targeted users may also get irritated if they need to provide all the parameters every time even if they don’t need them. The obvious solution to that is overloading the methods. But,  then again for each overloaded method, the same repetitive conditionals need to be written, a form of code duplication!! And what if you have to implement some 30 API endpoints and in future maintaining them, a single thought of it is scary and a nightmare for the developers.

Approach to the problem

To reduce the code size, one obvious thought that comes is to implement static methods that send GET/POST requests provided request URL and the required data. These methods are implemented in loklak_jlib_api by JsonIO and NetworkIO.

You might have noticed the method name i.e. status, suggest is also the part of the request URL. So, the common pattern that we get is:

base_url + "/api/" + methodName + ".json" + "?" + firstParameterName + "=" + firstParameterValue + "&" + secondParameterName + "=" + secondParameterValue ...

Reflection API to rescue

Using Reflection API inspection of classes, interfaces, methods, and variables can be done, yes you guessed it right if we can inspect we can get the method and parameter names. It is also possible to implement interfaces and create objects at runtime.

So, an interface is created and API endpoint methods are defined in it. Using reflection API the interface is implemented lazily at runtime. LoklakAPI interface defines all the API endpoint methods. Some of the defined methods are:

public interface LoklakAPI {
 
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface GET {}
 
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface POST {}
 
    @GET
    JSONObject search(String q);
 
    @GET
    JSONObject search(String q, int count);
 
    @GET
    JSONObject search(String q, int timezoneOffset, String since, String until);
 
    @GET
    JSONObject search(String q, int timezoneOffset, String since, String until, int count);
 
    @GET
    JSONObject peers();
 
    @GET
    JSONObject hello();
 
    @GET
    JSONObject status();
 
    @POST
    JSONObject push(JSONObject data);
}

Here GET and POST annotations are used to mark the methods which use GET and POST request respectively, so that appropriate method for GET and POST static method can be used JsonIOloadJson for GET request and pushJson for POST request.

A private static inner class ApiInvocationHandler is created in APIGenerator, which implements InvocationHandler – used for implementing interfaces at runtime.

private static class ApiInvocationHandler implements InvocationHandler {
 
   private String mBaseUrl;
 
   public ApiInvocationHandler(String baseUrl) {
       this.mBaseUrl = baseUrl;
   }
 
   @Override
   public Object invoke(Object o, Method method, Object[] values) throws Throwable {
       Parameter[] params = method.getParameters();
       Object[] paramValues = values;
 
       /*
       format of annotation name:
       @packageWhereAnnotationIsDeclared$AnnotationName()
       Example: @org.loklak.client.LoklakAPI$GET()
       */
       Annotation annotation = method.getAnnotations()[0];
       String annotationName = annotation.toString().toLowerCase();
 
       String apiUrl = createGetRequestUrl(mBaseUrl, method.getName(),
               params, paramValues);
       if (annotationName.contains("get")) { // GET REQUEST
           return loadJson(apiUrl);
       } else { // POST REQUEST
           JSONObject jsonObjectToPush = (JSONObject) paramValues[0];
           String postRequestUrl = createPostRequestUrl(mBaseUrl, method.getName());
           return pushJson(postRequestUrl, params[0].getName(), jsonObjectToPush);
       }
   }
}

The base URL is provided while creating the object, in the constructor. The invoke method is called whenever a defined method in the interface is called in actual code. This way an interface is implemented at runtime. The parameters of invoke method:

  • Object o – the object on which to call the method.
  • Method method – method which is called, parameters and annotations of the method are obtained using getParameters and getAnnotations methods respectively which are required to create our request url.
  • Object[] values – an array of parameter values which are passed to the method when calling it.

createGetRequestUrl is used to create the request URL for sending GET requests.

private static String createGetRequestUrl(
       String baseUrl, String methodName, Parameter[] params, Object[] paramValues) {
   String apiEndpointUrl = baseUrl.replaceAll("/+$", "") + "/api/" + methodName + ".json";
   StringBuilder url = new StringBuilder(apiEndpointUrl);
   if (params.length > 0) {
       String queryParamAndVal = "?" + params[0].getName() + "=" + paramValues[0];
       url.append(queryParamAndVal);
       for (int i = 1; i < params.length; i++) {
           String paramAndVal = "&" + params[i].getName()
                   + "=" + String.valueOf(paramValues[i]);
           url.append(paramAndVal);
       }
   }
   return url.toString();
}

Similarly, createPostRequestUrl is used to create request URL for sending POST requests.

private static String createPostRequestUrl(String baseUrl, String methodName) {
   baseUrl = baseUrl.replaceAll("/+$", "");
   return baseUrl + "/api/" + methodName + ".json";
}

Finally, Proxy.newProxyInstance is used to implement the interface at runtime. An instance of ApiInvocationHandler class is passed to the newProxyInstance method. This is all done by static method createApiMethods.

public static <T> T createApiMethods(Class<T> service, final String baseUrl) {
   ApiInvocationHandler apiInvocationHandler = new ApiInvocationHandler(baseUrl);
   return (T) Proxy.newProxyInstance(
           service.getClassLoader(), new Class<?>[]{service}, apiInvocationHandler);
}

Where:

  • Class<T> service – is the interface where API endpoint methods are defined, here LoklakAPI.class.
  • String baseUrl – web address of the server where Loklak server is hosted.

NOTE: For all this to work the library must be build using “-parameters” flag, else parameter names can’t be obtained. Since loklak_jlib_api uses maven, “-parameters” is provided in pom.xml file.

A small example:

String baseUrl = "https://api.loklak.org";
LoklakAPI loklakAPI = APIGenerator.createApiMethods(LoklakAPI.class, baseUrl);
 
JSONObject searchResult = loklakAPI.search("FOSSAsia");
Continue Reading

Setting up Codecov in Susper repository hosted on Github

In this blog post, I’ll be discussing how we setup codecov in Susper.

  • What is Codecov and in what projects it is being used in FOSSASIA?

Codecov is a famous code coverage tool. It can be easily integrated with the services like Travis CI. Codecov also provides more features with the services like Docker.

Projects in FOSSASIA like Open Event Orga Server, Loklak search, Open Event Web App uses Codecov. Recently, in the Susper project also the code coverage tool has been configured.

  • How we setup Codecov in our project repository hosted on Github?

The simplest way to setup Codecov in a project repository is by installing codecov.io using the terminal command:

npm install --save-dev codecov.io

Susper works on tech-stack Angular 2 (we have recently upgraded it to Angular v4.1.3) recently. Angular comes with Karma and Jasmine for testing purpose. There are many repositories of FOSSASIA in which Codecov has been configured like this. But with, Angular this case is a little bit tricky. So, using alone:

bash <(curl -s https://codecov.io/bash)

won’t generate code coverage because of the presence of Karma and Jasmine. It will require two packages: istanbul as coverage reporter and jasmine as html reporter. I have discussed them below.

Install these two packages:

  • Karma-coverage-istanbul-reporter
  • npm install karma-coverage-istanbul-reporter --save-dev
  • Karma-jasmine html reporter
  • npm install karma-jasmine-html-reporter --save-dev

    After installing the codecov.io, the package.json will be updated as follows:

  • "devDependencies": {
      "codecov": "^2.2.0",
      "karma-coverage-istanbul-reporter": "^1.3.0",
      "karma-jasmine-html-reporter": "^0.2.2",
    }

    Add a script for testing:

  • "scripts": {
       "test": "ng test --single-run --code-coverage --reporters=coverage-istanbul"
    }

    Now generally, the codecov works better with Travis CI. With the one line bash <(curl -s https://codecov.io/bash) the code coverage can now be easily reported.

Here is a particular example of travis.yml from the project repository of Susper:

script:
 - ng test --single-run --code-coverage --reporters=coverage-istanbul
 - ng lint
 
after_success:
 - bash <(curl -s https://codecov.io/bash)
 - bash ./deploy.sh

Update karma.config.js as well:

Module.exports = function (config) {
  config.set({
    plugins: [
      require('karma-jasmine-html-reporter'),
      require('karma-coverage-istanbul-reporter')
    ],
    preprocessors: {
      'src/app/**/*.js': ['coverage']
    },
    client {
      clearContext: false
    },
    coverageIstanbulReporter: {
      reports: ['html', 'lcovonly'],
      fixWebpackSourcePaths: true
    },
    reporters: config.angularCli && config.angularCli.codeCoverage
      ? ['progress', 'coverage-istanbul'],
      : ['progress', 'kjhtml'],
  })
}

This karma.config.js is an example from the Susper project. Find out more here: https://github.com/fossasia/susper.com/pull/420
This is how we setup codecov in Susper repository. And like this way, it can be set up in other repositories as well which supports Angular 2 or 4 as tech stack.

Continue Reading

Creating a Script to Review loklak Apps

The Loklak applications site now has a functional store listing page where developers can showcase their apps and users and other developers can get all sorts of information about an app. However, for an app to be showcased properly on store listing page, the app must contain a properly configured app.json file and some necessary assets. But while creating an app a developer might miss out some vital information from app.json or forget to provide some of the required assets, which he will later come to know from his co-developers or reviewers.

Now this will cause inconvenience, both for the developer and reviewer. So to overcome this apps.loklak.org has now got a new script to review a given app. It checks whether the necessary fields are present in app.json or not. If present then it checks whether the fields are empty or not. If any invalid or missing information problem is encountered then it is reported to the developer along with information on what should be the actual case. If the app passes all the checks then the developer is informed that his app is ready to be published.

In order to use this script all the developer needs to do is open his app directory in terminal and execute the following command

../bin/review.sh

This will present the developer with all the necessary informations.

How the script works?

Now let us delve into the working of the script. The initial call to the shell script review.sh calls a python script review.py. This python script performs all the checks on the app and displays necessary informations.

The first thing the script does is, it checks whether there is an app.json present in the app directory or not, if yes then the process continues otherwise it ends immediately with an error.

if "app.json" not in dir_contents:
    print_info("Please include a well configured app.json")
    print_problem("review failed")
    exit(0)

Next it performs one of the most important checks. It verifies whether the name of the app and app directory name is same or not. If it is same then there is no problem else it shows the corresponding error message.

app_dir_name = os.getcwd().split("/")[-1]
    if app_dir_name != app_json.get("name"):
      print_problem("app directory name and name mentioned in app.json are different")
      print_info("app directory name and name mentioned in app.json must be same")
      problems_no += 1

Next the script checks whether there is an index.html present or not. Index.html is a must as it serves as the entry point of the app.

if “index.html” not in dir_contents:
print_problem(“index.html is missing”)
print_info(“app must contain index.html”)
problems_no += 1

After this the script checks whether a number of key fields are present or not. These field includes applicationCategory, oneLineDescription, author. If they are present, it checks whether the fields are empty or not. If the fields are either absent or empty, error is shown to the developer.

if app_json.get("applicationCategory") == None:
    print_problem("key applicationCategory missing in app.json")
    print_info("app.json must contain key applicationCategory with category as value")
    problems_no += 1
  else:
    if app_json.get("applicationCategory") == "":
      print_problem("applicationCategory cannot be an empty string")
      problems_no += 1
  
  if app_json.get("oneLineDescription") == None:
    print_problem("key oneLineDescription missing in app.json")
    print_info("app.json must contain key oneLineDescription with a one line description of the app")
    problems_no += 1
  else:
    if app_json.get("oneLineDescription") == "":
      print_problem("oneLineDescription cannot be an empty string")
      problems_no += 1
  
  if app_json.get("author") == None:
    print_problem("author object missing in app.json")
    print_info("app.json must contain author object containing information about author")
    problems_no += 1
  else:
    author = app_json.get("author")
    if author.get("name") == None:
      print_problem("name of author is mssing in author object")
      print_info("name of author must be mentioned in author object")
      problems_no += 1
    else:
      if author.get("name") == "":
        print_problem("author name cannot be an empty string")
        problems_no += 1

After the above checks are performed, it checks whether the fields corresponding to the image assets are present or not and whether the assets being referred in app.json exist in the directory. If not, then the developer is informed about the same.

if app_json.get("promoImage") == None:
    print_problem("key promoImage missing in app.json")
    print_info("app.json must contain key promoImage with path of promo image as value")
    problems_no += 1
  else:
    if os.path.isfile(app_json.get("promoImage")) == False:
      if app_json.get("promoImage") == "":
        print_problem("promoImage cannot be an empty string")
      else:
        print_problem(app_json.get("promoImage") + " does not exists")
      problems_no += 1
  
  if app_json.get("appImages") == None:
    print_problem("key appImages missing in app.json")
    print_info("app.json must contain key appImages with paths of preview images as value")
    problems_no += 1
  else:
    for image in app_json.get("appImages"):
      if os.path.isfile(image) == False:
        if image == "":
          print_problem("appImages cannot contain empty strings")
        else:
          print_problem(image +" does not exists")
        problems_no += 1

Finally after all the checks are performed, the developer is informed whether his or her app is ready to be published or not. The number of errors are also presented along with all other details.

if problems_no > 0:
    print_problem("Number of problems detected : " + str(problems_no))
    print_problem("App cannot be published")
    print_info("Please check https://github.com/fossasia/apps.loklak.org/blob/master/docs/tutorial.md")
  else:
    print_success("App is ready for publishing")
    print_info("Check https://github.com/fossasia/apps.loklak.org/blob/master/docs/tutorial.md to publish your app")

Here is an sample output for a faulty app

Here is an sample output for app which is ready to go.

The script is presently functional and can be used to review your apps before sending it for being published on apps.loklak.org. However there are a number of things which are presently not checked like version, getStarted.md, appUse.md, etc. These checks will be included soon.

Continue Reading
Close Menu