Using NodeJS modules of Loklak Scraper in Android
Loklak Scraper JS implements scrapers for social media websites so that they can be reused on other platforms, such as Android or a native Java project. Keeping a single source for each scraper makes it easier to update the scrapers when the websites change. This post explains how Loklak Wok Android, a peer for Loklak Server on the Android platform, uses the Twitter JS scraper to scrape tweets.

LiquidCore is a library for Android that can run standard NodeJS modules. However, the Twitter scraper can't be used with it directly, for two reasons:

- The scraper is implemented with 3rd party NodeJS libraries, such as cheerio and request-promise-native, and LiquidCore doesn't support 3rd party libraries.
- The scrapers are written in ES6, and as of now LiquidCore uses NodeJS 6.10.2, which doesn't support ES6 completely.

So, if the 3rd party NodeJS libraries are included in the scraper code and the ES6 code is converted to ES5, LiquidCore can execute the Twitter scraper. Webpack can bundle the 3rd party libraries into the Twitter scraper, and Babel can transpile ES6 to ES5. The required dependencies can be installed using:

    $ npm install --save-dev webpack
    $ npm install --save-dev babel-core babel-loader babel-preset-es2015

Bundling and Transpiling

Webpack bundles according to the configuration provided in webpack.config.js, located in the root directory of the project.
    var fs = require('fs');

    // Build one webpack entry per file in ./scrapers/, keyed by scraper name.
    function listScrapers() {
      var src = "./scrapers/";
      var files = {};
      fs.readdirSync(src).forEach(function(data) {
        // "twitter.js" -> "twitter"
        var entryName = data.substr(0, data.indexOf("."));
        files[entryName] = src + data;
      });
      return files;
    }

    module.exports = {
      entry: listScrapers(),
      target: "node",
      module: {
        loaders: [
          {
            loader: "babel-loader",
            test: /\.js$/,  // transpile files ending in .js
            query: {
              presets: ["es2015"],
            }
          },
        ]
      },
      output: {
        path: __dirname + '/build',
        filename: '[name].js',
        libraryTarget: 'var',
        library: '[name]',
      }
    };

Now let's break down the config file. The function listScrapers returns an object whose keys are the scraper names and whose values are the relative locations of the scrapers, e.g.:

    {
      twitter: "./scrapers/twitter.js",
      github: "./scrapers/github.js"
      // same goes for other scrapers
    }

The parameters in module.exports are set as described in the webpack documentation for multiple inputs and for using the generated output externally:

- entry: Since a bundle file is required for each scraper, we provide the object returned by the listScrapers function. The multiple entry points generate multiple bundled files.
- target: As the bundled files are to be used on the NodeJS platform, "node" is set here.
- module: Webpack can transpile the code while bundling, so end users don't need to run separate commands for transpiling. module contains the Babel configuration for transpiling.
- output: The options here customize the output of webpack's compilation.
  - path: Location where the bundled files are kept after compilation. "__dirname" is the directory containing webpack.config.js, i.e. the root directory of the project, so the bundles are written to the build directory under it.
  - filename: Name of the bundled file. "[name]" refers to the key of the object provided in entry, i.e. the key of the object returned by listScrapers. For the Twitter scraper, for example, the bundled file is named "twitter.js".
  - libraryTarget: by default the functions or methods inside bundled files…
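
To make the entry map that listScrapers builds more concrete, here is a minimal sketch of the same idea. It is not code from Loklak Scraper: the helper name buildEntryMap and the throwaway demo directory are made up for illustration. It also makes two choices the original function does not: it skips files that don't end in .js, and it returns absolute paths instead of "./scrapers/..." strings.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of Loklak Scraper): maps every .js file
// in `dir` to an entry keyed by the file name without its extension.
function buildEntryMap(dir) {
  const entries = {};
  fs.readdirSync(dir)
    .filter((file) => path.extname(file) === '.js') // ignore non-JS files
    .sort()                                         // stable key order
    .forEach((file) => {
      // "twitter.js" -> key "twitter", value: absolute path to the file
      entries[path.parse(file).name] = path.resolve(dir, file);
    });
  return entries;
}

// Demo on a temporary directory so the sketch runs anywhere.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scrapers-'));
['twitter.js', 'github.js', 'README.md'].forEach((name) => {
  fs.writeFileSync(path.join(demoDir, name), '');
});

const entries = buildEntryMap(demoDir);
console.log(Object.keys(entries)); // [ 'github', 'twitter' ]
```

In a config like the one above, such a helper would be called as entry: buildEntryMap(path.join(__dirname, 'scrapers')). Resolving against the config's own directory means the result does not depend on where webpack is launched from.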
