Introducing Priority Kaizen Harvester for loklak server

In the previous blog post, I discussed the changes made to loklak’s Kaizen harvester so that it could be extended and other harvesting strategies could be introduced. Those changes made it possible to introduce a new harvesting strategy, the PriorityKaizen harvester, which uses a priority queue to store the queries that are to be processed. In this blog post, I will discuss the process through which this new harvesting strategy was introduced in loklak.

Background, motivation and approach

Before jumping into the changes, we first need to understand why we need this new harvesting strategy. Let us start by discussing the issue with the Kaizen harvester.

The producer-consumer imbalance in Kaizen harvester

Kaizen uses a simple hash queue to store queries. When the queue is full, new queries are dropped. But the number of queries produced after searching for one query is much higher than the consumption rate, i.e. the queue is bound to overflow and new queries that arrive get dropped. (See loklak/loklak_server#1156)

Learnings from attempt to add blocking queue for queries

As a solution to this problem, I first tried to use a blocking queue to store the queries. In this implementation, the producers would get blocked before putting the queries in the queue if it is full and would wait until there is space for more. This way, we would have a good balance between consumers and producers, as the producers would be waiting until the consumers free up space for them:

    public class BlockingKaizenHarvester extends KaizenHarvester {
        ...
        public BlockingKaizenHarvester() {
            super(new KaizenQueries() {
                ...
                private BlockingQueue<String> queries = new ArrayBlockingQueue<>(maxSize);

                @Override
                public boolean addQuery(String query) {
                    if (this.queries.contains(query)) {
                        return false;
                    }
                    try {
                        // Note: offer() returns false if the timeout elapses; the result is ignored here
                        this.queries.offer(query, this.blockingTimeout, TimeUnit.SECONDS);
                        return true;
                    } catch (InterruptedException e) {
                        DAO.severe("BlockingKaizen Couldn't add query: " + query, e);
                        return false;
                    }
                }

                @Override
                public String getQuery() {
                    try {
                        return this.queries.take();
                    } catch (InterruptedException e) {
                        DAO.severe("BlockingKaizen Couldn't get any query", e);
                        return null;
                    }
                }
                ...
            });
        }
    }

[SOURCE, loklak/loklak_server#1210]

But there is an issue here. The consumers themselves are producers, at an even higher rate. When a search is performed, queries are requested to be appended to the KaizenQueries instance for the object (which here would implement a blocking queue). Now let us consider the case where the queue is full and a thread requests a query from the queue and scrapes data. When the scraping is finished, many new queries are requested to be inserted, and most of them get blocked (because the queue is full again after one query is inserted). Therefore, using a blocking queue in KaizenQueries is not a good thing to do.

Other considerations

After the failure of introducing the Blocking Kaizen harvester, we looked for other alternatives for storing queries. We came across multilevel queues, persistent disk queues and priority queues. Multilevel queues sounded like a good idea at first, where we would have multiple queues for storing queries. But eventually, this would just boil down to how…
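To make the blocking-queue problem described above concrete, here is a minimal, self-contained sketch. It is not loklak code: the class name FullQueueSketch, the method rejectedAfterOneScrape and the 10 ms timeout are assumptions chosen for illustration. It fills a bounded queue, lets one consumer take a single query, and then has that consumer try to insert the new queries its scrape produced.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not part of loklak_server.
final class FullQueueSketch {

    /**
     * Starts from a full queue, consumes one query, then offers
     * newQueriesPerScrape fresh queries with a short timeout.
     * Returns how many of the fresh queries could not be inserted.
     */
    static int rejectedAfterOneScrape(int capacity, int newQueriesPerScrape) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.add("seed-" + i);            // the queue is now full
        }
        queue.poll();                          // a consumer takes one query to scrape

        int rejected = 0;
        for (int i = 0; i < newQueriesPerScrape; i++) {
            try {
                // Nobody else is consuming, so only the first offer finds a free slot
                if (!queue.offer("new-" + i, 10, TimeUnit.MILLISECONDS)) {
                    rejected++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("Rejected queries: " + rejectedAfterOneScrape(2, 4));
    }
}
```

With one freed slot and several new queries per scrape, all but one insertion time out, which is the behaviour the post describes for producers that are also consumers.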


Fetching URL for Embedded Twitter Videos in loklak server

The primary web service that loklak scrapes is Twitter. Being a news and social networking service, Twitter allows its users to post videos directly to Twitter, and they convey more than text can. But for an automated scraper, getting the links is not a simple task. Let us see what problems we faced with videos and how we solved them in the loklak server project.

Previous setup and embedded videos

In the previous version of loklak server, the TwitterScraper searched for videos in 2 ways:

1. Youtube links
2. HTML5 video links

To fetch the video URL from HTML5 video, the following snippet was used:

    if ((p = input.indexOf("<source video-src")) >= 0 && input.indexOf("type=\"video/") > p) {
        String video_url = new prop(input, p, "video-src").value;
        videos.add(video_url);
        continue;
    }

Here, input is the current line from the raw HTML that is being processed and prop is a class defined in loklak that is useful for parsing HTML attributes. In this way, the HTML5 videos were extracted.

The Problem - Embedded videos

Though the previous setup had no issues, it was useless, as Twitter embeds videos in an iFrame and therefore they can’t be fetched using simple HTML5 tag extraction. If we take the following Tweet for example, the requested HTML from the search page contains the video in the following format:

    <iframe src="https://twitter.com/i/videos/tweet/881946694413422593?embed_source=clientlib&player_id=0&rpc_init=1" allowfullscreen="" id="player_tweet_881946694413422593" style="width: 100%; height: 100%; position: absolute; top: 0; left: 0;">

So we needed to come up with a better technique to get those videos.

Parsing video URL from iFrame

The <div> which contains the video is marked with the AdaptiveMedia-videoContainer class. So if a Tweet has an iFrame containing a video, it will also have the mentioned class. Also, the source of the iFrame is of the form https://twitter.com/i/videos/tweet/{Tweet-ID}. So now we can programmatically go to any Tweet’s video and parse it to get results.
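The two observations above (the marker class and the iFrame URL pattern) can be sketched in a few lines. This is only an illustration under assumptions: the class TwitterVideoFrame and its method names are hypothetical and are not taken from loklak's TwitterScraper.

```java
// Hypothetical helper, not part of loklak_server.
final class TwitterVideoFrame {

    // CSS class that marks the <div> holding an embedded video (from the post)
    static final String VIDEO_CONTAINER_CLASS = "AdaptiveMedia-videoContainer";

    // Prefix of the iFrame source URL (from the post)
    static final String IFRAME_URL_PREFIX = "https://twitter.com/i/videos/tweet/";

    /** True if the Tweet's HTML contains the video container marker class. */
    static boolean hasEmbeddedVideo(String tweetHtml) {
        return tweetHtml.contains(VIDEO_CONTAINER_CLASS);
    }

    /** Builds the iFrame URL for a given Tweet ID. */
    static String iframeUrl(String tweetId) {
        return IFRAME_URL_PREFIX + tweetId;
    }

    public static void main(String[] args) {
        String html = "<div class=\"AdaptiveMedia-videoContainer\"></div>";
        if (hasEmbeddedVideo(html)) {
            System.out.println(iframeUrl("881946694413422593"));
        }
    }
}
```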
Extracting video URL from iFrame source

Now that we have the source of the iFrame, we can easily get the video source using the following flow:

    public final static Pattern videoURL = Pattern.compile("video_url\\\":\\\"(.*?)\\\"");

    private static String[] fetchTwitterIframeVideos(String iframeURL) {
        // Read from iframeURL line by line into a BufferedReader br
        while ((line = br.readLine()) != null ) {
            int index;
            if ((index = line.indexOf("data-config=")) >= 0) {
                String jsonEscHTML = (new prop(line, index, "data-config")).value;
                String jsonUnescHTML = HtmlEscape.unescapeHtml(jsonEscHTML);
                Matcher m = videoURL.matcher(jsonUnescHTML);
                if (!m.find()) {
                    return new String[]{};
                }
                String url = m.group(1);
                url = url.replace("\\/", "/");  // Clean URL
                /*
                 * Play with url and return results
                 */
            }
        }
    }

MP4 and M3U8 URLs

If we encounter mp4 URLs, we’re fine, as it is the direct link to the video. But if we encounter an m3u8 URL, we need to process it further before we can actually get to the videos. For Twitter, the hosted m3u8 videos contain links to further m3u8 videos of different resolutions. These m3u8 videos in turn contain links to various .ts files that contain the actual video in parts of 3 seconds each to support a better streaming experience on…
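The extraction and the mp4/m3u8 distinction described above can be tried out without a network connection. The following is a rough sketch under assumptions: the class VideoUrlSketch, its method names and the example.com sample data are all hypothetical, and the playlist handling relies only on the general m3u8 convention that lines starting with '#' are tags while the remaining non-empty lines are URIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the loklak implementation.
final class VideoUrlSketch {

    // Captures the value of "video_url" in the unescaped JSON config
    static final Pattern VIDEO_URL = Pattern.compile("\"video_url\":\"(.*?)\"");

    /** Returns the cleaned video URL, or null if the config has none. */
    static String extractVideoUrl(String unescapedJson) {
        Matcher m = VIDEO_URL.matcher(unescapedJson);
        if (!m.find()) {
            return null;
        }
        // JSON escapes '/' as '\/', so undo that
        return m.group(1).replace("\\/", "/");
    }

    /** Simplistic check: an m3u8 URL points to a playlist, not to a video file. */
    static boolean isPlaylist(String url) {
        return url.endsWith(".m3u8");
    }

    /** Collects the URI lines of a playlist, skipping '#' tag lines and blanks. */
    static List<String> playlistEntries(String m3u8Body) {
        List<String> entries = new ArrayList<>();
        for (String line : m3u8Body.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String json = "{\"video_url\":\"https:\\/\\/example.com\\/v\\/abc.m3u8\"}";
        String url = extractVideoUrl(json);
        System.out.println(url + " playlist=" + isPlaylist(url));

        String master = "#EXTM3U\n"
                + "#EXT-X-STREAM-INF:BANDWIDTH=256000,RESOLUTION=320x180\n"
                + "/v/320x180/abc.m3u8\n"
                + "#EXT-X-STREAM-INF:BANDWIDTH=832000,RESOLUTION=640x360\n"
                + "/v/640x360/abc.m3u8\n";
        System.out.println(playlistEntries(master));
    }
}
```

The same playlistEntries helper would apply to the second-level playlists, where the remaining lines are the .ts segment URIs rather than further playlists.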


Documenting APIs with Yaydoc

API documentation is a quick and concise way to tell a user how to use a library or work with a program. It details classes, functions, parameters, return types and more. Courtesy of Sphinx, Yaydoc has had built-in support for documenting APIs for Python based projects right from its inception. Sphinx has a built-in extension, autodoc, which provides directives such as autoclass, automodule, etc. which can be used to automatically extract docstrings from all specified Python packages and modules and use them to generate API documentation. As a user of Yaydoc, you could add ReST source files with appropriate directives provided by autodoc and we would handle the rest. As part of enhancing this feature we wanted to do three things:

1. Enhance support for Python
2. Extend API documentation to other languages apart from Python
3. Automate the process of generating ReST source files

To enhance support for Python projects, we implemented a few things. Since autodoc imports the modules it needs to document, there could be import errors if a dependency was not met. To fix this issue, a user can now specify certain modules to be mocked. This comes in handy with projects depending on packages with third party C extensions such as numpy, scipy, etc.

    {% if mock_modules %}
    mock_modules = [name.strip() for name in '{{ mock_modules }}'.split(',')]
    sys.modules.update((mod_name, mock.Mock()) for mod_name in mock_modules)
    {% endif %}

Apart from this, if we detect a setup.py or a requirements.txt in the repository, we automatically try to install from it to meet dependencies.

    # autodoc imports the module while building source files. To avoid
    # ImportError, install any packages in requirements.txt of the project
    # if available
    if [ -f $ROOT_DIR/setup.py ]; then
      pip install $ROOT_DIR/
    elif [ -f $ROOT_DIR/requirements.txt ]; then
      pip install -q -r $ROOT_DIR/requirements.txt
    fi

We also crawl the repository to detect any packages and add them to sys.path.
With these changes, a user can expect generated API docs without having to extend conf.py.

    {% if autoapi_python == 'true' %}
    for (dirpath, dirnames, filenames) in os.walk('{{ root_dir }}'):
        # Directory contains __init__.py. It should be a python package
        if '__init__.py' in filenames:
            # appending instead of inserting at front so that user
            # cannot overwrite some of our own modules.
            sys.path.append(os.path.abspath(os.path.dirname(dirpath)))
    {% endif %}

The second goal is a no-brainer. We would like to support as many languages as we can. With this week’s update, Java has been added to the officially supported list of languages for which Yaydoc can generate full API documentation without any manual intervention. To extract API documentation for Java source files, we used a Sphinx extension named javasphinx. From the official javasphinx docs: javasphinx is a Sphinx extension that provides a Sphinx domain for documenting Java projects and a javasphinx-apidoc command line tool for automatically generating API documentation from existing Java source code and Javadoc documentation.

    javasphinx-apidoc -o source/ $ROOT_DIR/$AUTOAPI_JAVA_PATH/
    sphinx-apidoc -o source/ $ROOT_DIR/$AUTOAPI_PYTHON_PATH/

For the third goal, we use the tools sphinx-apidoc and javasphinx-apidoc to generate…


JSON Deserialization Using Jackson in Open Event Android App

The Open Event project uses the JSON format for transferring event information like tracks, sessions, microlocations and more. The event exported in zip format from the Open Event server also contains the data in JSON format. The Open Event Android application uses this JSON data. Before we use this data in the app, we have to parse it to get Java objects that can be used for populating views. Deserialization is the process of converting JSON data to Java objects. In this post I explain how to deserialize JSON data using Jackson.

1. Add dependency

In order to use Jackson in your app, add the following dependencies to your app module’s build.gradle file.

    dependencies {
        compile 'com.fasterxml.jackson.core:jackson-core:2.8.9'
        compile 'com.fasterxml.jackson.core:jackson-annotations:2.8.9'
        compile 'com.fasterxml.jackson.core:jackson-databind:2.8.9'
    }

2. Define entity model of data

In Open Event Android we have many models like event, session, track, microlocation, speaker etc. Here I am only defining the track model because of its simplicity.

    public class Track {
        private int id;
        private String name;
        private String description;
        private String color;

        @JsonProperty("font-color")
        private String fontColor;

        // getters and setters
    }

Here, if the property name is the same as the JSON attribute key, there is no need to add the JsonProperty annotation, as is the case for the id, name and color properties. But if the property name is different from the JSON attribute key, then it is necessary to add the JsonProperty annotation.

3. Create sample JSON data

Let’s create the sample JSON data we want to deserialize.

    {
        "id": 273,
        "name": "Android",
        "description": "Sample track",
        "color": "#94868c",
        "font-color": "#000000"
    }

4. Deserialize using ObjectMapper

ObjectMapper is Jackson’s serializer/deserializer. ObjectMapper’s readValue() method is used for simple deserialization. It takes two parameters: the first is the JSON data we want to deserialize and the second is the model entity class.
Create an ObjectMapper object and initialize it.

    ObjectMapper objectMapper = new ObjectMapper();

Now create a model entity object and initialize it with the deserialized data from ObjectMapper’s readValue() method. Note that readValue() throws a checked IOException, so the call has to be wrapped in a try/catch block or the exception has to be declared.

    Track track = objectMapper.readValue(json, Track.class);

So we have converted JSON data into a Java object. Jackson is a very powerful library for JSON serialization and deserialization. To learn more about Jackson’s features, follow the links given below.

Jackson-databind: https://github.com/FasterXML/jackson-databind
Jackson wiki: http://wiki.fasterxml.com/JacksonInFiveMinutes
Tutorial: http://tutorials.jenkov.com/java-json/jackson-installation.html


Cache Thumbnails and Images Using Picasso in Open Event Android

In event based Android projects like Open Event Android, we have speakers and sponsors, and these projects need to display images of the speakers and sponsors because they matter a lot to the project. So instead of fetching images from the server every time, it is good to store small images (thumbnails) in the cache and load images even if the device is offline. It also reduces data usage.

Picasso is a widely used image loading library for Android. It automatically handles ImageView recycling and download cancellation in an adapter, complex image transformations with minimal memory use, and memory and disk caching. But one problem is that Picasso caches images for only one session by default. That is, if you close the app, all images cached by default will be removed. Because of this, Picasso will not load cached images if you are offline. It will make network calls every time you open the app.

In this post I explain how to manually cache images using Picasso so that images load even if the device is offline. It will make a network call only once for a particular image and will cache the image on disk. We will use the okhttp3 library for OkHttpClient.

1. Add dependency

In order to use Picasso in your app, add the following dependencies to your app module’s build.gradle file.

    dependencies {
        compile 'com.squareup.okhttp3:okhttp:3.8.1'
        compile 'com.squareup.picasso:picasso:2.5.2'
    }

2. Make static Picasso object

Make a static Picasso object in the Application class so that we can use it directly from other activities.

    public static Picasso picassoWithCache;

3. Initialize cache

Create a File object with the path of the app specific cache and use this object to create a Cache object.

    File httpCacheDirectory = new File(getCacheDir(), "picasso-cache");
    Cache cache = new Cache(httpCacheDirectory, 15 * 1024 * 1024);

Here it will create a Cache object with a size of 15MB. The getCacheDir() method returns the absolute path to the application specific cache directory on the filesystem.
Then pass the cache to an OkHttpClient builder.

    OkHttpClient.Builder okHttpClientBuilder = new OkHttpClient.Builder().cache(cache);

4. Initialize Picasso with cache

Now initialize the picassoWithCache object using Picasso.Builder(). Set the downloader for Picasso by adding a new OkHttp3Downloader object. OkHttp3Downloader is not part of Picasso 2.5.2 itself; it is typically provided by a separate artifact such as com.jakewharton.picasso:picasso2-okhttp3-downloader.

    picassoWithCache = new Picasso.Builder(this)
            .downloader(new OkHttp3Downloader(okHttpClientBuilder.build()))
            .build();

5. Use picassoWithCache object

As it is a static object, you can directly use it from any activity. All the images loaded using this picassoWithCache instance will be cached on disk.

    // imageUrl is the thumbnail/image URL, imageView is the target view
    Application.picassoWithCache.load(imageUrl).into(imageView);

To know more about how I solved this issue in the Open Event project, visit this link. To learn more about Picasso’s features, follow the links given below.

Documentation: http://square.github.io/picasso/2.x/picasso/
Tutorials: https://futurestud.io/tutorials/picasso-influencing-image-caching


Implementing Loklak APIs in Java using Reflections

Loklak server provides a large API to play with the data scraped by it. Methods in Java can be implemented to use these API endpoints. A common approach to implementing the methods for using API endpoints is to create the request URL from the values passed to the method, and then send a GET/POST request. Creating the request URL in every method can be tiresome, and in the long run maintaining a library implemented this way will require a lot of effort. For example, assume a method is to be implemented for the suggest API endpoint, which has many parameters: creating the request URL requires a lot of conditionals, one for whether each parameter is provided or not.

The methods to call API endpoints can be implemented with less and easier to maintain code using Reflection in Java. The post ahead elaborates the problem, the approach to solving it and finally the solution which is implemented in loklak_jlib_api.

Let's say the status API endpoint needs to be implemented. A simple approach can be:

    public class LoklakAPI {

        public static JSONObject status(String baseUrl) {
            String requestUrl = baseUrl + "/api/status.json";
            // GET request using requestUrl
        }

        public static void main(String[] argv) {
            JSONObject result = status("https://api.loklak.org");
        }
    }

This one is easy, isn’t it, as the status API endpoint requires no parameters. But just imagine a method that implements an API endpoint with a lot of parameters, most of them optional. As a developer, you would like to provide methods that cover all the parameters of the API endpoint.
For example, here is how a method looks if it implements the suggest API endpoint; the old SuggestClient implementation in loklak_jlib_api does that:

    public static ResultList<QueryEntry> suggest(
            final String hostServerUrl,
            final String query,
            final String source,
            final int count,
            final String order,
            final String orderBy,
            final int timezoneOffset,
            final String since,
            final String until,
            final String selectBy,
            final int random) throws JSONException, IOException {
        ResultList<QueryEntry> resultList = new ResultList<>();
        String suggestApiUrl = hostServerUrl + SUGGEST_API
                + URLEncoder.encode(query.replace(' ', '+'), ENCODING)
                + PARAM_TIMEZONE_OFFSET + timezoneOffset
                + PARAM_COUNT + count
                + PARAM_SOURCE + (source == null ? PARAM_SOURCE_VALUE : source)
                + (order == null ? "" : (PARAM_ORDER + order))
                + (orderBy == null ? "" : (PARAM_ORDER_BY + orderBy))
                + (since == null ? "" : (PARAM_SINCE + since))
                + (until == null ? "" : (PARAM_UNTIL + until))
                + (selectBy == null ? "" : (PARAM_SELECT_BY + selectBy))
                + (random < 0 ? "" : (PARAM_RANDOM + random))
                + PARAM_MINIFIED + PARAM_MINIFIED_VALUE;
        // GET request using suggestApiUrl
    }

A lot of conditionals! The targeted users may also get irritated if they need to provide all the parameters every time even if they don’t need them. The obvious solution to that is overloading the methods. But then again, for each overloaded method, the same repetitive conditionals need to be written, a form of code duplication! And what if you have to implement some 30 API endpoints and in…
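To give a feel for how reflection can remove these per-parameter conditionals, here is a minimal sketch built on java.lang.reflect.Proxy. It is an illustration under assumptions and not necessarily how loklak_jlib_api does it: the annotations @Endpoint and @Param, the interface SuggestApi, its parameter names and the class ReflectiveUrlBuilder are all hypothetical. The sketch only builds the request URL and returns it as a String; sending the request is left out.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.URLEncoder;

// Hypothetical annotation naming the endpoint path of a method
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Endpoint { String value(); }

// Hypothetical annotation naming the query parameter of an argument
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Param { String value(); }

// Hypothetical API description: one annotated method per endpoint
interface SuggestApi {
    @Endpoint("/api/suggest.json")
    String suggest(@Param("q") String query,
                   @Param("count") Integer count,
                   @Param("order") String order);
}

final class ReflectiveUrlBuilder implements InvocationHandler {
    private final String baseUrl;

    ReflectiveUrlBuilder(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Exception {
        Endpoint endpoint = method.getAnnotation(Endpoint.class);
        if (endpoint == null) {
            throw new UnsupportedOperationException(method.getName());
        }
        StringBuilder url = new StringBuilder(baseUrl).append(endpoint.value());
        Annotation[][] annotations = method.getParameterAnnotations();
        char separator = '?';
        for (int i = 0; i < annotations.length; i++) {
            if (args[i] == null) {
                continue;   // optional parameter not provided: skip it
            }
            for (Annotation a : annotations[i]) {
                if (a instanceof Param) {
                    url.append(separator)
                       .append(((Param) a).value())
                       .append('=')
                       .append(URLEncoder.encode(String.valueOf(args[i]), "UTF-8"));
                    separator = '&';
                }
            }
        }
        return url.toString();
    }

    /** Creates an implementation of the given API interface at runtime. */
    static <T> T create(Class<T> api, String baseUrl) {
        return api.cast(Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] {api}, new ReflectiveUrlBuilder(baseUrl)));
    }

    public static void main(String[] args) {
        SuggestApi api = create(SuggestApi.class, "https://api.loklak.org");
        System.out.println(api.suggest("fossasia", null, "desc"));
    }
}
```

In this style, adding another endpoint means adding one annotated method to an interface; the null checks live in a single place instead of being repeated in every method.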


Using FastAdapter in Open Event Organizer Android Project

RecyclerView is an important graphical UI component in any Android application. Android provides the RecyclerView.Adapter class, which manages all the functionality of RecyclerView. For whatever reason, this class has been kept in a very abstract form, with only basic functionalities implemented by default. On the plus side, this opens many doors for custom adapters with new functionalities, for example sticky headers, scroll indicators, drag and drop actions on items, multiple view type items etc. A developer should be able to make an adapter for his needs by extending RecyclerView.Adapter. There are many custom adapters shared by developers which come with built-in functionalities. FastAdapter is one of them; it comes with all the good functionalities built in and is also very easy to use. I recently got to use it in the Open Event Organizer Android App, of which the core feature is Attendees Check In. We have used the FastAdapter library to show the attendees list, which needs many features that are absent in the plain RecyclerView.Adapter.

FastAdapter is built in such a way that there are many different ways of using it, depending on the developer's need. I have found the simplest way, which I will be sharing here. The first part is extending the item model to inherit AbstractItem.

    public class Attendee extends AbstractItem<Attendee, AttendeeViewHolder> {
        @PrimaryKey
        private long id;
        ...
        ...

        @Override
        public long getIdentifier() {
            return id;
        }

        @Override
        public int getType() {
            return 0;
        }

        @Override
        public int getLayoutRes() {
            return R.layout.attendee_layout;
        }

        @Override
        public AttendeeViewHolder getViewHolder(View view) {
            return new AttendeeViewHolder(DataBindingUtil.bind(view));
        }

        @Override
        public void bindView(AttendeeViewHolder holder, List<Object> list) {
            super.bindView(holder, list);
            holder.bindAttendee(this);
        }

        @Override
        public void unbindView(AttendeeViewHolder holder) {
            super.unbindView(holder);
            holder.unbindAttendee();
        }
    }

The methods are pretty obvious by name. Implement these methods accordingly. You may notice that we have used Data Binding here to bind data to views, but it is not necessary. You will also have to create your ViewHolder for the adapter. You can either use RecyclerView.ViewHolder or create a custom one by inheriting from it as per your need. Once this part is over you are half done, as most of the things are taken care of in the model itself. Now we will write the code for the adapter and set it on the RecyclerView.

    FastItemAdapter<Attendee> fastItemAdapter = new FastItemAdapter<>();
    fastItemAdapter.setHasStableIds(true);
    ...
    // functionalities related code
    ...
    recyclerView.setAdapter(fastItemAdapter);

Initialize FastItemAdapter, which will be our main adapter handling all the direct functions related to the RecyclerView. Set up some boolean flags according to the project's needs. In our project we have the Attendee model, which has id as a primary field. FastItemAdapter can take advantage of a distinct field of the model, called the identifier. Hence setHasStableIds is set to true, as the Attendee model has an id field. But you should be careful about setting it to true, as you must then have implemented getIdentifier in the model to return the correct field, which will be used as an identifier by the adapter. The adapter is now ready to be set on the RecyclerView.

Now we have to decide…


Lambda expressions in Android

What are Lambda expressions

Lambda expressions are one of the most important features added in Java 8. Prior to lambda expressions, implementing functional interfaces, i.e. interfaces with only one abstract method, was done using syntax that has a lot of boilerplate code in it. In cases like this, what we are trying to do is pass functionality as an argument to a method, such as what happens when a button is clicked. Lambda expressions enable you to do just that, in a way that is much more compact and clear.

Syntax of Lambda Expressions

A lambda expression consists of the following:

1. A comma separated list of formal parameters enclosed in parentheses. The data types of the parameters in a lambda expression can be omitted. Also, the parentheses can be omitted if there is only one parameter. For example:

    TextView tView = (TextView) findViewById(R.id.tView);
    tView.setOnClickListener(v -> System.out.println("Testing Click"));

2. The arrow token ->

3. A body which contains a single expression or a statement block. If a single expression is specified, the Java runtime evaluates the expression and then returns its value. To specify a statement block, enclose the statements in curly braces "{}".

Lambda Expressions in Android

To use lambda expressions and other Java 8 features in Android, you need to use the Jack tool-chain. Open your module level build.gradle file and add the following:

    android {
        ...
        defaultConfig {
            ...
            jackOptions {
                enabled true
            }
        }
        compileOptions {
            sourceCompatibility JavaVersion.VERSION_1_8
            targetCompatibility JavaVersion.VERSION_1_8
        }
    }

Sync your build.gradle file, and if you are having any issue with build tools, you may need to update buildToolsVersion in your build.gradle file to "24rc4" or just download the latest Android SDK Build-tools from the SDK Manager, under Tools (Preview channel).
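Returning to the syntax rules listed earlier in this post, the different forms can be tried outside of Android with plain Java 8. The sketch below is only an illustration; the class LambdaSyntaxSketch and its field names are made up for this example.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical illustration of the lambda syntax variants
final class LambdaSyntaxSketch {

    // Explicit parameter types: parentheses are required
    static final BinaryOperator<Integer> ADD_TYPED = (Integer a, Integer b) -> a + b;

    // Parameter types omitted: the compiler infers them from the target type
    static final BinaryOperator<Integer> ADD = (a, b) -> a + b;

    // Single parameter: the parentheses can be omitted as well
    static final Function<String, Integer> LENGTH = s -> s.length();

    // Statement block as body: curly braces and explicit return statements
    static final Function<Integer, String> SIGN = n -> {
        if (n < 0) {
            return "negative";
        }
        return n == 0 ? "zero" : "positive";
    };

    public static void main(String[] args) {
        System.out.println(ADD_TYPED.apply(2, 3));
        System.out.println(ADD.apply(2, 3));
        System.out.println(LENGTH.apply("loklak"));
        System.out.println(SIGN.apply(-7));
    }
}
```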
Example

Adding a click listener to a button without a lambda expression:

    Button button = (Button) findViewById(R.id.button);
    button.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            // Inside an anonymous class, "this" is the listener, so take the Context from the view
            Toast.makeText(v.getContext(), "Button clicked", Toast.LENGTH_LONG).show();
        }
    });

With lambda expressions it is as simple as:

    Button button = (Button) findViewById(R.id.button);
    button.setOnClickListener(v -> Toast.makeText(this, "Button clicked", Toast.LENGTH_LONG).show());

As we can see above, using lambda expressions makes implementing a functional interface clearer and more compact. Standard functional interfaces can be found in the java.util.function package (included in Java 8). These interfaces can be used as target types for lambda expressions and method references.

Credits: https://mayojava.github.io/android/java/using-java8-lambda-expressions-in-android/

Another way to have Java 8 features in your Android app is using the RetroLambda plugin.
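As a small, non-Android illustration of the java.util.function interfaces mentioned above, the following sketch uses Predicate, Function and Consumer as target types for a lambda expression and two method references. The class FunctionalInterfacesSketch and the method shortNamesUpper are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration of standard functional interfaces
final class FunctionalInterfacesSketch {

    /** Returns the upper-cased names that are at most maxLength characters long. */
    static List<String> shortNamesUpper(List<String> names, int maxLength) {
        Predicate<String> isShort = name -> name.length() <= maxLength; // lambda expression
        Function<String, String> toUpper = String::toUpperCase;         // method reference
        List<String> result = new ArrayList<>();
        Consumer<String> collect = result::add;                         // bound method reference

        for (String name : names) {
            if (isShort.test(name)) {
                collect.accept(toUpper.apply(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shortNamesUpper(Arrays.asList("loklak", "susi", "yaydoc"), 4));
    }
}
```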
