Using NodeJS modules of Loklak Scraper in Android

Loklak Scraper JS implements scrapers of social media websites so that they can be used in other platforms, like Android or in a native Java project. This way there will be only a single source of scraper, as a result it will be easier to update the scrapers in response to the change in websites. This blog explains how Loklak Wok Android, a peer for Loklak Server on Android platform uses the Twitter JS scraper to scrape tweets.

LiquidCore is a library available for android that can be used to run standard NodeJS modules. But Twitter scraper can’t be used directly, due to the following problems:

  • 3rd party NodeJS libraries are used to implement the scraper, like cheerio and request-promise-native and LiquidCore doesn’t support 3rd party libraries.
  • The scrapers are written in ES6, as of now LiquidCore uses NodeJS 6.10.2, which doesn’t support ES6 completely.

So, if 3rd party NodeJS libraries can be included in our scraper code and ES6 can be converted to ES5, LiquidCore can easily execute Twitter scraper.

3rd party NodeJS libraries can be bundled into Twitter scraper using Webpack and ES6 can be transpiled to ES5 using Babel.

The required dependencies can be installed using:

$npm install --save-dev webpack
$npm install --save-dev babel-core babel-loader babel-preset-es2015

Bundling and Transpiling

Webpack does bundling based on the configurations provided in webpack.config.js, present in root directory of project.

var fs = require('fs');

function listScrapers() {
   var src = "./scrapers/"
   var files = {};
   fs.readdirSync(src).forEach(function(data) {
       var entryName = data.substr(0, data.indexOf("."));
       files[entryName] = src+data;
   });
   return files;
}

module.exports = {
 entry: listScrapers(),
 target: "node",
 module: {
     loaders: [
         {
             loader: "babel-loader",
             test: /\.js?$/,
             query: {
                 presets: ["es2015"],
             }
         },
     ]
 },
 output: {
   path: __dirname + '/build',
   filename: '[name].js',
   libraryTarget: 'var',
   library: '[name]',
 }
};

 

Now let’s break the config file, the function listScrapers returns a JSONObject with key as name of scraper and value as relative location of scraper, ex:

{
   twitter: "./scrapers/twitter.js",
   github: "./scrapers/github.js"
   // same goes for other scrapers
}

The parameters in module.exports as described in the documentation of webpack for multiple inputs and to use the generated output externally:

  • entry: Since a bundle file is required for each scraper we provide the  the JSONObject returned by listScrapers function. The multiple entry points provided generate multiple bundled files.
  • target: As the bundled files are to be used in NodeJS platform,  “node” is set here.
  • module: Using webpack the code can be directly transpiled while bundling, the end users don’t need to run separate commands for transpiling. module contains babel configurations for transpiling.
  • output: options here customize the compilation of webpack
    • path: Location where bundled files are kept after compilation, “__dirname” means the current directory i.e. root directory of the project.
    • filename: Name of bundled file, “[name]“ here refers to the key of JSONObject provided in entry i.e. key of JSONObect returned from listScrapers. Example for Twitter scraper, the filename of bundled file will be “twitter.js”.
    • libraryTarget: by default the functions or methods inside bundled files can’t be used externally – can’t be imported. By providing the “var” the functions in bundled module can be accessed.
    • library: the name of the library.

Now, time to do the compilation work:

$ ./node_modules/.bin/webpack

The bundled files can be found in build directory. But, the generated bundled files are large files – around 77,000 lines. Large files are not encouraged for production purposes. So, a “-p” flag is used to generate bundled files for production – around 400 lines.

$ ./node_modules/.bin/webpack -p

Using LiquidCore to execute bundled files

The generated bundled file can be copied to the raw directory in res (resources directory in Android). Now, events are emitted from Activity/Fragment and in response to those events the scraping function is invoked in the bundled JS file, present in raw directory, the vice-versa is also possible.

So, we handle some events in our JS file and send some events to the android Activity/Fragment. The event handling and event creating code in JS file:

var query = "";
LiquidCore.on("queryEvent", function(msg) {
  query = msg.query;
});

LiquidCore.on("fetchTweets", function() {
  var twitterScraper = new twitter();
  twitterScraper.getTweets(query, function(data) {
    LiquidCore.emit("getTweets", {"query": query, "statuses": data});
  });
});

LiquidCore.emit('start');

 

First a “start” event is emitted from JS file, which is consumed in TweetHarvestingFragment by getScrapedTweet method using startEventListener.

EventListener startEventListener = (service, event, payload) -> {
   JSONObject jsonObject = new JSONObject();
   try {
       jsonObject.put("query", query);
       service.emit(LC_QUERY_EVENT, jsonObject); // value of LC_QUERY_EMIT is  "queryEvent"
   } catch (JSONException e) {
       Log.e(LOG_TAG, e.toString());
   }
   service.emit(LC_FETCH_TWEETS_EVENT); //value of  LC_FETCH_TWEETS_EVENT is  "fetchTweets"
};

 

The startEventListener then emits “queryEvent” with a JSONObject that contains the query to search tweets for scraping. This event is consumed in JS file by:

var query = "";
LiquidCore.on("queryEvent", function(msg) {
  query = msg.query;
});

 

After “queryEvent”, “fetchTweets” event is emitted from fragment, which is handled in JS file by:

LiquidCore.on("fetchTweets", function() {
  var twitterScraper = new twitter(); // scraping object is created
  twitterScraper.getTweets(query, function(data) { // function that scrapes twitter
    LiquidCore.emit("getTweets", {"query": query, "statuses": data});
  });
});

 

Once the scraped data is obtained, it is sent back to fragment by emitting “getTweets” event from JS file, “{“query”: query, “statuses”: data}” contains scraped data. This event is consumed in android by getTweetsEventListener.

EventListener getTweetsEventListener = (service, event, payload) -> { // payload contains scraped data
   Push push = mGson.fromJson(payload.toString(), Push.class);
   emitter.onNext(push);
};

 

LiquidCore creates a NodeJS instance to execute the bundled JS file. The NodeJS instance is called MicroService in LiquidCore terminology. For all this event handling to work, the NodeJS instance is created inside the method with a ServiceStartListner where all EventListener are added.

MicroService.ServiceStartListener serviceStartListener = (service -> {
   service.addEventListener(LC_START_EVENT, startEventListener);
   service.addEventListener(LC_GET_TWEETS_EVENT, getTweetsEventListener);
});
URI uri = URI.create("android.resource://org.loklak.android.wok/raw/twitter"); // Note .js is not used
MicroService microService = new MicroService(getActivity(), uri, serviceStartListener);
microService.start();

Resources

Continue ReadingUsing NodeJS modules of Loklak Scraper in Android

Adding React based World Mood Tracker to loklak Apps

loklak apps is a website that hosts various apps that are built by using loklak API. It uses static pages and angular.js to make API calls and show results from users. As a part of my GSoC project, I had to introduce the World Mood Tracker app using loklak’s mood API. But since I had planned to work on React, I had to go off from the track of typical app development in loklak apps and integrate a React app in apps.loklak.org.

In this blog post, I will be discussing how I introduced a React based app to apps.loklak.org and solved the problem of country-wise visualisation of mood related data on a World map.

Setting up development environment inside apps.loklak.org

After following the steps to create a new app in apps.loklak.org, I needed to add proper tools and libraries for smooth development of the World Mood Tracker app. In this section, I’ll be explaining the basic configuration that made it possible for a React app to be functional in the angular environment.

Pre-requisites

The most obvious prerequisite for the project was Node.js. I used node v8.0.0 while development of the app. Instead of npm, I decided to go with yarn because of offline caching and Internet speed issues in India.

Webpack and Babel

To begin with, I initiated yarn in the app directory inside project and added basic dependencies –

$ yarn init
$ yarn add webpack webpack-dev-server path
$ yarn add babel-loader babel-core babel-preset-es2015 babel-preset-react --dev

 

Next, I configured webpack to set an entry point and output path for the node project in webpack.config.js

module.exports = {
    entry: './js/index.js',
    output: {
        path: path.resolve('.'),
        filename: 'index_bundle.js'
    },
    ...
};

This would signal to look for ./js/index.js as an entry point while bundling. Similarly, I configured babel for es2015 and React presets –

{
  "presets":[
    "es2015", "react"
  ]
}

 

After this, I was in a state to define loaders for module in webpack.config.js. The loaders would check for /\.js$/ and /\.jsx$/ and assign them to babel-loader (with an exclusion of node_modules).

React

After configuring the basic presets and loaders, I added React to dependencies of the project –

$ yarn add react react-dom

 

The React related files needed to be in ./js/ directory so that the webpack can bundle it. I used the file to create a simple React app –

import React from 'react';
import ReactDOM from 'react-dom';

ReactDOM.render(
    <div>World Mood Tracker</div>,
    document.getElementById('app')
);

 

After this, I was in a stage where it was possible to use this app as a part of apps.loklak.org app. But to do this, I first needed to compile these files and bundle them so that the external app can use it.

Configuring the build target for webpack

In apps.loklak.org, we need to have a file by the name of index.html in the app’s root directory. Here, we also needed to place the bundled js properly so it could be included in index.html at app’s root.

HTML Webpack Plugin

Using html-webpack-plugin, I enabled auto building of project in the app’s root directory by using the following configuration in webpack.config.js

...
const HtmlWebpackPlugin = require('html-webpack-plugin');
const HtmlWebpackPluginConfig = new HtmlWebpackPlugin({
    template: './js/index.html',
    filename: 'index.html',
    inject: 'body'
});
module.exports = {
    ...
    plugins: [HtmlWebpackPluginConfig]
};

 

This would build index.html at app’s root that would be discoverable externally.

To enable bundling of the project using simple yarn build command, the following lines were added to package.json

{
  ..
  "scripts": {
    ..
    "build": "webpack -p"
  }
}

After a simple yarn build command, we can see the bundled js and html being created at the app root.

Using datamaps for visualization

Datamaps is a JS library which allows plotting of data on map using D3 as backend. It provides a simple interface for creating visualizations and comes with a handy npm installation –

$ yarn add datamaps

Map declaration and usage as state

A map from datamaps was used a state for React component which allowed fast rendering of changes in the map as the state of React component changes –

export default class WorldMap extends React.Component {
    constructor() {
        super();
        this.state = {
            map: null
        };
    render() {
        return (<div className={styles.container}>
                <div id="map-container"></div></div>)
    }
    componentDidMount() {
        this.setState({map: new Datamap({...})});
    }
    ...
}

 

The declaration of map goes in componentDidMount method because it would not be possible to start the map until we have the div with id=”map-container” in the DOM. It was necessary to draw the map only after the component has mounted otherwise it would fail due to no id=”map-container” in the DOM.

Defining data for countries

Data for every country had two components –

data = {
    positiveScore: someValue,
    negativeScore: someValue
}

 

This data is used to generate popup for the counties –

this.setState({
    map: new Datamap({
        ...
        geographyConfig: {
            ...
            popupTemplate: function (geo, data) {
                // Configure variables pScore so that it gives “No Data” when data.positiveScore is not set (similar for negative)
                return [
                    // Use pScore and nScore to generate results here
                    // geo.properties.name would give current country name
                ].join('');
            }
        }
    })
});

The result for countries with unknown data values look something like this –

Conclusion

In this blog post, I explained about introducing a React based app in app.loklak.org’s angular based environment. I discussed the setup and bundling process of the project so it becomes available from the project’s external HTTP server.

I also discussed using datamaps as a visualisation tool for data about Tweets. The app was first introduced in pull request fossasia/apps.loklak.org#189 and was improved step by step in subsequent patches.

Resources

Continue ReadingAdding React based World Mood Tracker to loklak Apps