Implementing Tweet Search Suggestions in Loklak Wok Android

Loklak Wok Android is not only a peer harvester for Loklak Server; it also allows users to search tweets using Loklak’s API endpoints. To provide a better tweet search experience, the app shows search suggestions obtained from the suggest API endpoint. This blog describes how “Search Suggestions” is implemented.

Third Party Libraries used to Implement the Suggestion Feature

Retrofit2: Used for sending network requests.
Gson: Used for deserialization, i.e. converting JSON to POJOs (Plain Old Java Objects).
RxJava and RxAndroid: Used to implement a clean asynchronous workflow.
Retrolambda: Provides support for lambdas in Android.

These libraries can be installed by adding the following dependencies in app/build.gradle:

    android {
        …
        // removes rxjava file repetitions
        packagingOptions {
            exclude 'META-INF/rxjava.properties'
        }
    }

    dependencies {
        // gson and retrofit2
        compile 'com.google.code.gson:gson:2.8.1'
        compile 'com.squareup.retrofit2:retrofit:2.3.0'
        compile 'com.squareup.retrofit2:converter-gson:2.3.0'
        compile 'com.squareup.retrofit2:adapter-rxjava2:2.3.0'
        // rxjava and rxandroid
        compile 'io.reactivex.rxjava2:rxjava:2.0.5'
        compile 'io.reactivex.rxjava2:rxandroid:2.0.1'
        compile 'com.jakewharton.rxbinding2:rxbinding:2.0.0'
    }

To add Retrolambda:

    // in project's build.gradle
    dependencies {
        …
        classpath 'me.tatarka:gradle-retrolambda:3.2.0'
    }

    // at the top of the app level build.gradle
    apply plugin: 'me.tatarka.retrolambda'

Fetching Suggestions

Retrofit2 sends a GET request to the suggest API endpoint. The returned JSON response is deserialized into Java objects using the models defined in the models.suggest package. The models can easily be generated using JSONSchema2Pojo. The benefit of using Gson is that it handles the hard work of parsing JSON.
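To make the request concrete, here is a plain-Java sketch of the URL the suggest call resolves to. It assumes the base URL https://api.loklak.org/api/ and the path /api/suggest.json with the q and count query parameters used by the app. This is not Retrofit code, and the helper name suggestUrl is hypothetical. Retrofit treats an endpoint path that starts with “/” as absolute: only the host of the base URL is kept. java.net.URI.resolve follows the same rule. Retrofit's exact percent-encoding may differ in details.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SuggestUrlSketch {

    // Base URL taken from the comment in createRestClient.
    static final String BASE_URL = "https://api.loklak.org/api/";

    // URLEncoder does form encoding (space -> "+"); a literal "+" is already "%2B",
    // so rewriting "+" to "%20" afterwards only affects spaces.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Hypothetical helper: the URL for getSuggestions(query, count);
    // count <= 0 omits the parameter, like the one-argument overload.
    static String suggestUrl(String query, int count) {
        // "/api/suggest.json" starts with "/", so it replaces the base path "/api/".
        URI endpoint = URI.create(BASE_URL).resolve("/api/suggest.json");
        StringBuilder url = new StringBuilder(endpoint.toString())
                .append("?q=").append(encode(query));
        if (count > 0) {
            url.append("&count=").append(count);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(suggestUrl("fossasia", 0));
        System.out.println(suggestUrl("open source", 5));
    }
}
```

Running main prints https://api.loklak.org/api/suggest.json?q=fossasia and https://api.loklak.org/api/suggest.json?q=open%20source&count=5. The path is not duplicated as /api/api/ because the endpoint path is absolute.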
The static method createRestClient creates the Retrofit instance to be used for network calls:

    private static void createRestClient() {
        sRetrofit = new Retrofit.Builder()
                .baseUrl(BASE_URL) // base url : https://api.loklak.org/api/
                .addConverterFactory(GsonConverterFactory.create(gson))
                .addCallAdapterFactory(RxJava2CallAdapterFactory.create())
                .build();
    }

The suggest endpoint is defined in the LoklakApi interface:

    public interface LoklakApi {

        @GET("/api/suggest.json")
        Observable<SuggestData> getSuggestions(@Query("q") String query);

        @GET("/api/suggest.json")
        Observable<SuggestData> getSuggestions(@Query("q") String query, @Query("count") int count);

        …
    }

Now, the suggestions are obtained using the fetchSuggestion method. First, it creates the REST client for sending network requests using the createApi method, which internally calls createRestClient (implemented above). The suggestion query is read from the EditText. Then the RxJava Observable is subscribed on a separate thread meant specifically for IO operations. Finally, the obtained data is observed on the main UI thread, where the views are updated.

    private void fetchSuggestion() {
        LoklakApi loklakApi = RestClient.createApi(LoklakApi.class); // rest client created
        String query = tweetSearchEditText.getText().toString(); // suggestion query from EditText
        Observable<SuggestData> suggestionObservable = loklakApi.getSuggestions(query); // observable created
        Disposable disposable = suggestionObservable
                .subscribeOn(Schedulers.io()) // subscribed on IO thread
                .observeOn(AndroidSchedulers.mainThread()) // observed on MainUI thread
                .subscribe(this::onSuccessfulRequest, this::onFailedRequest); // views are manipulated accordingly
        mCompositeDisposable.add(disposable);
    }

If the network request is successful, the onSuccessfulRequest method is called, which updates the data in the RecyclerView.
    private void onSuccessfulRequest(SuggestData suggestData) {
        if (suggestData != null) {
            mSuggestAdapter.setQueries(suggestData.getQueries()); // data updated.
        }
        setAfterRefreshingState();
    }

If the network request fails, onFailedRequest is called, which displays a toast saying “Cannot fetch suggestions, Try Again!”. If several requests fail in quick succession, the previous toast is cancelled before the new one is shown.

    private void onFailedRequest(Throwable throwable) {
        Log.e(LOG_TAG, throwable.toString());
        if (mToast != null) { // checks if a previous toast is present
            mToast.cancel();…
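The threading contract of fetchSuggestion is: run the request on an IO thread, then handle the result on the UI thread. That contract can be sketched without Android or RxJava. The sketch below swaps in java.util.concurrent.CompletableFuture. supplyAsync on an “io” executor plays the role of subscribeOn(Schedulers.io()). handleAsync on a single-threaded “ui” executor stands in for observeOn(AndroidSchedulers.mainThread()). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class FetchSketch {

    // Runs the blocking request on ioExecutor, then calls exactly one callback on
    // uiExecutor, similar to subscribe(this::onSuccessfulRequest, this::onFailedRequest).
    static <T> CompletableFuture<Void> fetch(Supplier<T> request,
                                             ExecutorService ioExecutor,
                                             ExecutorService uiExecutor,
                                             Consumer<T> onSuccess,
                                             Consumer<Throwable> onFailure) {
        return CompletableFuture.supplyAsync(request, ioExecutor)
                .handleAsync((result, error) -> {
                    if (error == null) {
                        onSuccess.accept(result);
                    } else {
                        // supplyAsync wraps the request's exception in a CompletionException
                        Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                                ? error.getCause() : error;
                        onFailure.accept(cause);
                    }
                    return null;
                }, uiExecutor);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        ExecutorService ui = Executors.newSingleThreadExecutor(); // one thread, like the main thread
        try {
            fetch(() -> List.of("fossasia", "loklak"), io, ui,
                    queries -> System.out.println("update adapter with " + queries),
                    error -> System.out.println("Cannot fetch suggestions, Try Again! (" + error + ")"))
                    .join();
        } finally {
            io.shutdown();
            ui.shutdown();
        }
    }
}
```

Keeping all UI callbacks on one dedicated executor mirrors why the app observes on the main thread: Android views may only be touched from that thread.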

Using NodeJS modules of Loklak Scraper in Android

Loklak Scraper JS implements scrapers for social media websites so that they can be used on other platforms, such as Android or a native Java project. This way there is only a single source for each scraper, which makes it easier to update the scrapers when the websites change. This blog explains how Loklak Wok Android, a peer for Loklak Server on the Android platform, uses the Twitter JS scraper to scrape tweets.

LiquidCore is a library for Android that can run standard NodeJS modules. However, the Twitter scraper can’t be used with it directly, due to the following problems:

1. The scraper is implemented with 3rd party NodeJS libraries such as cheerio and request-promise-native, and LiquidCore doesn’t support 3rd party libraries.
2. The scrapers are written in ES6. At the time of writing, LiquidCore uses NodeJS 6.10.2, which doesn’t support ES6 completely.

So, if the 3rd party NodeJS libraries can be included in the scraper code and the ES6 code can be converted to ES5, LiquidCore can easily execute the Twitter scraper. The 3rd party NodeJS libraries can be bundled into the Twitter scraper using Webpack, and ES6 can be transpiled to ES5 using Babel. The required dependencies can be installed using:

    $ npm install --save-dev webpack
    $ npm install --save-dev babel-core babel-loader babel-preset-es2015

Bundling and Transpiling

Webpack does the bundling based on the configuration provided in webpack.config.js, which is present in the root directory of the project.
    var fs = require('fs');

    function listScrapers() {
        var src = "./scrapers/";
        var files = {};
        fs.readdirSync(src).forEach(function(data) {
            var entryName = data.substr(0, data.indexOf("."));
            files[entryName] = src + data;
        });
        return files;
    }

    module.exports = {
        entry: listScrapers(),
        target: "node",
        module: {
            loaders: [
                {
                    loader: "babel-loader",
                    test: /\.js?$/,
                    query: {
                        presets: ["es2015"],
                    }
                },
            ]
        },
        output: {
            path: __dirname + '/build',
            filename: '[name].js',
            libraryTarget: 'var',
            library: '[name]',
        }
    };

Now let’s break down the config file. The function listScrapers returns an object whose keys are the scraper names and whose values are the relative locations of the scrapers, e.g.:

    {
        twitter: "./scrapers/twitter.js",
        github: "./scrapers/github.js"
        // same goes for other scrapers
    }

The parameters in module.exports are set as described in the webpack documentation for multiple entry points and for using the generated output externally:

entry: Since a bundle file is required for each scraper, we provide the object returned by the listScrapers function. The multiple entry points generate multiple bundled files.
target: As the bundled files are to be used on the NodeJS platform, “node” is set here.
module: With webpack, the code can be transpiled while bundling, so end users don’t need to run separate commands for transpiling. module contains the Babel configuration for transpiling.
output: The options here customize the output of webpack’s compilation.
path: Location where the bundled files are kept after compilation. “__dirname” is the directory containing webpack.config.js, i.e. the root directory of the project.
filename: Name of the bundled file. “[name]” here refers to a key of the object provided in entry, i.e. a key of the object returned from listScrapers. For example, for the Twitter scraper the filename of the bundled file will be “twitter.js”.
libraryTarget: by default the functions or methods inside bundled files…
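The following Java sketch shows what listScrapers hands to entry, and which file names the output.filename template produces. It takes a list of file names instead of reading ./scrapers/ from disk. bundlePath is a hypothetical helper that only substitutes [name] as described above; it does not invoke webpack.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScraperEntries {

    static final String SRC = "./scrapers/";

    // Same mapping as listScrapers(): key = file name up to its first ".",
    // value = SRC + file name.
    static Map<String, String> listScrapers(List<String> fileNames) {
        Map<String, String> files = new LinkedHashMap<>();
        for (String data : fileNames) {
            int dot = data.indexOf('.');
            // assumption of this sketch: a name without "." is used whole
            String entryName = dot >= 0 ? data.substring(0, dot) : data;
            files.put(entryName, SRC + data);
        }
        return files;
    }

    // What output.path ("<root>/build") plus output.filename ("[name].js") expand to.
    static String bundlePath(String projectRoot, String entryName) {
        return projectRoot + "/build/" + "[name].js".replace("[name]", entryName);
    }

    public static void main(String[] args) {
        Map<String, String> entries = listScrapers(List.of("twitter.js", "github.js"));
        System.out.println(entries);
        entries.keySet().forEach(name -> System.out.println(bundlePath(".", name)));
    }
}
```

One entry per scraper yields one bundle per scraper. This is why the Android app can load just the Twitter bundle.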

Resource Injection Using ButterKnife in Loklak Wok Android

Loklak Wok Android, being a sophisticated Android app, uses a lot of views, and most of them are manipulated at runtime. In Android, working at runtime with a View or ViewGroup defined in XML requires developers to add the following line:

    (TypeOfView) parentView.findViewById(R.id.id_of_view);

This leads to lengthy code. Very often, more than one View also responds to a particular event. For example, you may hide Views if a network request fails and show the user a “Try Again!” message. Let’s say you have to hide 4 Views. Are you going to do the following?

    view1.setVisibility(View.GONE);
    view2.setVisibility(View.GONE);
    view3.setVisibility(View.GONE);
    view4.setVisibility(View.GONE);
    textView.setVisibility(View.VISIBLE); // has "Try Again!" message.
    // 5 more lines of code when hiding textView and displaying the 4 other Views

Surely not! And here is the old-fashioned way to get a string value defined as a resource in strings.xml:

    String appName = getActivity().getResources().getString(R.string.app_name);

All this works, but as a developer on a sophisticated app you would like to focus on the app’s logic. You don’t want to scratch your head over whether you called findViewById properly, cast the result to the right View type, or forgot to change a view’s visibility in response to an event. All of this can be handled by a library that injects the dependencies, here the resources, for you. You just declare your resources and the library provides them; you don’t need to initialize them using findViewById. So let’s dive in and see how ButterKnife is used in Loklak Wok Android to handle these issues.

Adding ButterKnife to Android Project

In the app/build.gradle:

    dependencies {
        compile 'com.jakewharton:butterknife:8.6.0'
        annotationProcessor 'com.jakewharton:butterknife-compiler:8.6.0'
        ...
    }

Dealing with Views in Fragments

When views are declared, the BindView annotation is used with the ID of the view as its parameter. For example, the views in TweetHarvestingFragment:

    @BindView(R.id.toolbar) Toolbar toolbar;
    @BindView(R.id.harvested_tweets_count) TextView harvestedTweetsCountTextView;
    @BindView(R.id.harvested_tweets_container) RecyclerView recyclerView;
    @BindView(R.id.network_error) TextView networkErrorTextView;

NOTE: Views declared this way can’t be private.

Once the Views are declared, they need to be injected using ButterKnife.bind(Object target, View source). In TweetHarvestingFragment, the target is the fragment itself. The source, i.e. the parent view, is rootView, obtained by inflating the fragment’s layout file. All this needs to be done in the onCreateView method:

    View rootView = inflater.inflate(R.layout.fragment_tweet_harvesting, container, false);
    ButterKnife.bind(this, rootView);

That’s it, we are done! The same paradigm can be used to bind views in a ViewHolder of a RecyclerView, as implemented in HarvestedTweetViewHolder:

    @BindView(R.id.user_fullname) TextView userFullname;
    @BindView(R.id.username) TextView username;
    @BindView(R.id.tweet_date) TextView tweetDate;
    @BindView(R.id.harvested_tweet_text) TextView harvestedTweetTextView;

    public HarvestedTweetViewHolder(View itemView) {
        super(itemView);
        ButterKnife.bind(this, itemView);
    }

Injecting resources like strings, dimensions, colors and drawables is even easier: only the related annotation and the ID need to be provided. For example, TweetHarvestingFragment uses the string app_name to display the app name, “Loklak Wok”, in the toolbar:

    @BindString(R.string.app_name) String appName;

    // directly used inside onCreateView to set the title in toolbar
    toolbar.setTitle(appName);
…
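ButterKnife does this wiring with an annotation processor (the butterknife-compiler dependency) that generates binding code at compile time. The generated code assigns the fields directly, which is why they can’t be private. The sketch below only illustrates what bind(target, source) accomplishes, and it is not how ButterKnife works internally. It swaps in runtime reflection, uses a hypothetical @Bind annotation, and lets a Map from integer ids to objects stand in for the view hierarchy.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

public class MiniBinder {

    // Hypothetical stand-in for @BindView; RUNTIME retention so reflection can read it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Bind {
        int value();
    }

    // Example target: Strings play the role of views, ints the role of R.id values.
    static class ToolbarHolder {
        @Bind(1) String toolbarTitle;
        @Bind(2) String networkError;
        String notBound = "untouched";
    }

    // Assigns each @Bind field from viewsById, the way rootView.findViewById(id)
    // would supply the view, and fails loudly on a missing id or a wrong type.
    static void bind(Object target, Map<Integer, ?> viewsById) {
        for (Field field : target.getClass().getDeclaredFields()) {
            Bind bind = field.getAnnotation(Bind.class);
            if (bind == null) {
                continue; // not annotated, leave as is
            }
            Object view = viewsById.get(bind.value());
            if (view == null) {
                throw new IllegalStateException("No view with id " + bind.value() + " for field " + field.getName());
            }
            if (!field.getType().isInstance(view)) {
                throw new IllegalStateException("Field " + field.getName() + " expects " + field.getType().getSimpleName());
            }
            try {
                field.set(target, view);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot assign field " + field.getName(), e);
            }
        }
    }
}
```

The payoff is the same as in the fragment: declarations carry the ids, and a single bind call replaces one findViewById and cast per field.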

Editing files and piped data with “sed” in Loklak server

What is sed?

“sed” is used in “Loklak Server”, one of the most popular projects of FOSSASIA. “sed” is short for “Stream Editor”. As its manual page puts it, sed is used for filtering and transforming text. The stream can be a file, input from a pipeline, or standard input. Regular expressions are used to filter the text, and transformations are carried out using sed commands, given either inline or from a file. So, most of the time a single line is enough to substitute or remove text, or to obtain a value from a text file.

Basic Syntax of “sed”

    $ sed [options]... {inline commands or file having sed commands} [input_file]...

Loklak Server uses a config.properties file containing key-value pairs. It is the basis of the server, since it holds the configuration values the server uses at runtime for various operations. Let’s go through a simple sed example that prints the lines of config.properties that start with the word “https”.

    $ sed -n '/^https/p' config.properties

Here the “-n” option suppresses the automatic printing of the pattern space (the pattern space holds each line while sed processes it). Without “-n”, sed would print every line of the file, and the matching lines twice. Now, the regular expression part: “/^https/” matches all lines that start with “https”, and “p” is the print command that prints them to the console. Finally, we provide the filename, config.properties. If no filename is provided, sed reads from standard input.

Use cases of “sed” in Loklak Server

Displaying the proper port number in the message while starting or installing Loklak Server

The default port of Loklak Server is 9000, but the server can be started on any unoccupied port by using the “-p” flag with bin/start.sh and bin/installation.sh. For example,

    $ bin/installation.sh -p 8888

starts the installation of Loklak Server on port 8888.
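The two behaviours of the sed -n '/^https/p' example, with and without -n, can be mimicked in Java to make the pattern-space idea concrete. This is only an analogy. java.util.regex is not POSIX BRE, although a simple anchored pattern like ^https means the same thing in both. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SedPrintSketch {

    // sed -n '/regex/p': only the "p" command produces output.
    static List<String> printMatchingOnly(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                out.add(line); // output of the "p" command
            }
        }
        return out;
    }

    // sed '/regex/p' (no -n): "p" prints a matching line, then the pattern space
    // is printed automatically at the end of every cycle.
    static List<String> printWithoutN(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                out.add(line); // "p" command
            }
            out.add(line);     // automatic printing
        }
        return out;
    }
}
```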
To display the proper localhost address, so that the user can open it in a browser, the port number in the shortlink.urlstub parameter in config.properties needs to be changed. This is carried out by the function change_shortlink_urlstub in bin/utility.sh.

Now let’s try to understand what the sed command in this function is doing. The “-i” option is used for in-place editing of the specified file, i.e. config.properties in the conf directory. “s” is the substitute command of sed. The regular expression can be divided into two parts, separated by “/”:

1. \(shortlink\.urlstub=http:.*:\)\(.*\) is used to find the match in a line.
2. \1'"$1" is used to substitute the string matched in part 1.

Regular expressions can be split into groups so that operations can be performed on each group separately. A group is enclosed between “\(” and “\)”. The 1st part of our regular expression contains two groups. Dissecting the first group, i.e. \(shortlink\.urlstub=http:.*:\): “shortlink\.urlstub=http:” matches the text “shortlink.urlstub=http:”. Here “\” is used as an escape character, since “.” in a regex matches any character. In “.*:”, “.” matches any character and “*” matches 0 or more occurrences of the preceding character. So, it…
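The substitution explained above can be cross-checked with Java’s regex engine. Java writes groups as ( ) rather than BRE’s \( \). The sketch uses a named group because, in Java’s replacement syntax, digits that follow $1 can be read as part of the group number. Like sed’s .*, Java’s .* is greedy, so the first group extends to the last colon on the line. The method name and the sample line are illustrative. In Loklak Server, sed in bin/utility.sh makes the actual change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlStubSketch {

    // Java version of \(shortlink\.urlstub=http:.*:\)\(.*\) with named groups.
    private static final Pattern URLSTUB =
            Pattern.compile("(?<prefix>shortlink\\.urlstub=http:.*:)(?<port>.*)");

    // Like s/.../\1<port>/ on one line: keep the prefix group, replace the rest.
    // A line without a match is returned unchanged, as sed would leave it.
    static String changeUrlStubPort(String line, String port) {
        return URLSTUB.matcher(line).replaceFirst("${prefix}" + Matcher.quoteReplacement(port));
    }

    public static void main(String[] args) {
        System.out.println(changeUrlStubPort("shortlink.urlstub=http://localhost:9000", "8888"));
    }
}
```

Running main prints shortlink.urlstub=http://localhost:8888. The greedy ".*:" consumes “//localhost:”, so only the port after the last colon is replaced.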

Scraping in JavaScript using Cheerio in Loklak

FOSSASIA recently started a new project, loklak_scraper_js. Its objective is a single web-scraping library that can easily be used on most platforms. Maintaining the same scraping logic in different programming languages and projects is a headache and a waste of time. An obvious solution was writing the scrapers in JavaScript, because JS is lightweight and fast, and its functions and classes can easily be used from many programming languages, e.g. via Nashorn in Java. Cheerio is a library used to parse HTML. Let’s look at the YouTube scraper.

Parsing HTML

Steps involved in web-scraping:

1. The HTML source of the webpage is obtained.
2. The HTML source is parsed.
3. The parsed HTML is traversed to extract the required data.

For the 2nd and 3rd steps we use cheerio. Obtaining the HTML source of a webpage is a piece of cake. It is done by the function getHtml, which uses the sync-request library to send the “GET” request. The HTML is parsed by passing the obtained source to cheerio’s load method, as in the getSearchMatchVideos function.

    var $ = cheerio.load(htmlSourceOfWebpage);

The API of cheerio is similar to that of jQuery, so by convention the variable referencing the cheerio object that holds the parsed HTML is named “$”. Sometimes the requirement may be to extract data from a particular HTML tag rather than from the whole parsed HTML, for example a tag containing a large number of nested child tags. In that case the load method can be used again, as in the getVideoDetails function to obtain only the head tag.

    var head = cheerio.load($("head").html());

The “html” method returns the HTML content of the selected tag, i.e. the <head> tag. If a parameter is passed to the html method, the content of the selected tag (here <head>) is replaced by the HTML passed as the parameter.

Extracting data from parsed HTML

Some of the content that we see on the webpage is dynamic rather than static HTML.
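The static HTML in question is simply the body of a plain GET response. For readers on the Java side, step 1 (what getHtml does with sync-request) can be sketched with the JDK’s java.net.http client (Java 11+). The URL is a placeholder, and the method names only mirror getHtml; this code is not part of loklak_scraper_js.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GetHtmlSketch {

    // The default HttpClient does not follow redirects, so enable it explicitly.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Builds a plain GET request for the page.
    static HttpRequest htmlRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Blocks until the response arrives, like sync-request, and returns the HTML as
    // served; content added later by the page's own JavaScript is not included.
    static String getHtml(String url) throws IOException, InterruptedException {
        HttpResponse<String> response = CLIENT.send(htmlRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String html = getHtml("https://example.com/"); // placeholder URL
        System.out.println(html.length() + " characters of static HTML");
    }
}
```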
A “GET” request returns only the static HTML of the webpage. Inspect Element shows that the class attribute in the rendered webpage differs from the one in the static HTML that getHtml obtains. For example, inspect the link of one of the suggested videos and compare the values of its class attribute:

[Figures: class attribute of a suggested-video link in the website, and in the static HTML obtained from the “GET” request using getHtml]

So, it is recommended to first check whether the attributes have the same values, and then proceed accordingly.

Now, let’s dive into the actual scraping stuff. Most of the required data is available in meta tags inside the head tag. The extractMetaAttribute function finds the meta tag that has a given attribute and value, and returns the value of its content attribute.

    function extractMetaAttribute(cheerioObject, metaAttribute, metaAttributeValue) {
        var selector = 'meta[' + metaAttribute + '="' + metaAttributeValue + '"]';
        return cheerioObject(selector).attr("content");
    }

“cheerioObject” here will be the “head”…
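The lookup that extractMetaAttribute performs can be mirrored in Java without an HTML parser, by modelling each <meta> tag inside <head> as a map of its attributes. Like Cheerio’s attr on a selection, the sketch reads the content attribute of the first matching tag only. The class name and the attribute-map representation are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MetaLookupSketch {

    // Same selector string that extractMetaAttribute builds for Cheerio.
    static String metaSelector(String metaAttribute, String metaAttributeValue) {
        return "meta[" + metaAttribute + "=\"" + metaAttributeValue + "\"]";
    }

    // First <meta> tag whose metaAttribute equals metaAttributeValue, then its
    // "content" attribute; empty if no tag matches or the tag has no content.
    static Optional<String> extractMetaAttribute(List<Map<String, String>> metaTags,
                                                 String metaAttribute,
                                                 String metaAttributeValue) {
        return metaTags.stream()
                .filter(tag -> metaAttributeValue.equals(tag.get(metaAttribute)))
                .findFirst()
                .map(tag -> tag.get("content"));
    }
}
```

For instance, metaSelector("property", "og:title") yields meta[property="og:title"], the selector string the JavaScript function would pass to Cheerio.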

Implementing Loklak APIs in Java using Reflections

Loklak server provides a large API to play with the data it scrapes, and Java methods can be implemented to use these API endpoints. A common approach is to build the request URL from the values passed to the method, and then send a GET/POST request. Building the request URL in every method is tiresome, and a library implemented this way takes a lot of effort to maintain in the long run. For example, the suggest API endpoint has many parameters. Building its request URL needs a lot of conditionals, one per parameter, to check whether that parameter was provided. The methods that call API endpoints can instead be implemented with less code that is easier to maintain, using Reflection in Java. This post elaborates the problem and the approach to solving it. Finally, it presents the solution implemented in loklak_jlib_api.

Let’s say the status API endpoint needs to be implemented. A simple approach can be:

    public class LoklakAPI {

        public static JSONObject status(String baseUrl) {
            String requestUrl = baseUrl + "/api/status.json";
            // GET request using requestUrl, returning the parsed JSON response
        }

        public static void main(String[] argv) {
            JSONObject result = status("https://api.loklak.org");
        }
    }

This one is easy, isn’t it? The status API endpoint requires no parameters. But imagine a method implementing an API endpoint with a lot of parameters, most of them optional. As a developer, you would like to provide methods that cover all the parameters of the API endpoint.
For example, here is how a method implementing the suggest API endpoint looks, taken from the old SuggestClient implementation in loklak_jlib_api:

    public static ResultList<QueryEntry> suggest(
            final String hostServerUrl,
            final String query,
            final String source,
            final int count,
            final String order,
            final String orderBy,
            final int timezoneOffset,
            final String since,
            final String until,
            final String selectBy,
            final int random) throws JSONException, IOException {
        ResultList<QueryEntry> resultList = new ResultList<>();
        String suggestApiUrl = hostServerUrl
                + SUGGEST_API + URLEncoder.encode(query.replace(' ', '+'), ENCODING)
                + PARAM_TIMEZONE_OFFSET + timezoneOffset
                + PARAM_COUNT + count
                + PARAM_SOURCE + (source == null ? PARAM_SOURCE_VALUE : source)
                + (order == null ? "" : (PARAM_ORDER + order))
                + (orderBy == null ? "" : (PARAM_ORDER_BY + orderBy))
                + (since == null ? "" : (PARAM_SINCE + since))
                + (until == null ? "" : (PARAM_UNTIL + until))
                + (selectBy == null ? "" : (PARAM_SELECT_BY + selectBy))
                + (random < 0 ? "" : (PARAM_RANDOM + random))
                + PARAM_MINIFIED + PARAM_MINIFIED_VALUE;
        // GET request using suggestApiUrl
    }

A lot of conditionals!!! The targeted users may also get irritated if they must provide all the parameters every time, even ones they don’t need. The obvious solution to that is overloading the methods. But then the same repetitive conditionals need to be written for each overloaded method, a form of code duplication!! And what if you have to implement some 30 API endpoints and in…
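The post’s own reflection-based solution is cut off here. The following is therefore only a sketch of the general idea, not the loklak_jlib_api code. Parameters live in the fields of a plain object, and a null field means “not provided”. Reflection turns every non-null field into a query parameter, so adding a parameter means adding a field rather than another conditional. SuggestParams, its field names and buildUrl are hypothetical. Fields are sorted by name because getDeclaredFields() does not guarantee any order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

public class ReflectiveQuery {

    // Hypothetical parameter object for the suggest endpoint; null = not provided.
    static class SuggestParams {
        String q;
        Integer count;
        String order;
        String since;
    }

    // Builds baseUrl + endpoint + "?name=value&..." from the non-null instance fields of params.
    static String buildUrl(String baseUrl, String endpoint, Object params) {
        Field[] fields = params.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // stable, predictable order
        StringJoiner query = new StringJoiner("&");
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // constants and compiler-generated fields are not parameters
            }
            try {
                field.setAccessible(true);
                Object value = field.get(params);
                if (value != null) {
                    query.add(field.getName() + "="
                            + URLEncoder.encode(value.toString(), StandardCharsets.UTF_8));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        String queryString = query.toString();
        return baseUrl + endpoint + (queryString.isEmpty() ? "" : "?" + queryString);
    }

    public static void main(String[] args) {
        SuggestParams params = new SuggestParams();
        params.q = "fossasia";
        params.count = 10;
        System.out.println(buildUrl("https://api.loklak.org", "/api/suggest.json", params));
    }
}
```

With only q and count set, main prints https://api.loklak.org/api/suggest.json?count=10&q=fossasia. A new optional parameter needs one new field and no new conditional, and every endpoint can share buildUrl.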
