Unifying Data from Different Scrapers of loklak server using Post

The Loklak Server project scrapes data from different websites through different endpoints. Exposing all of that data through a single endpoint is difficult: it needs a sound design that can work with multiple scrapers, which in turn requires several changes. One of the changes I introduced is the Post class, which acts as both a wrapper and an interface for the data objects of the search scrapers (the implementation in the scrapers is still in progress).

Post is a subclass of JSONObject, which helps in working with JSON data in Java. In other words, a Post is a JSONObject with an identity (we call it postId) and a timestamp of the data scraped. It is used to capture data fetched by the web scrapers. The benefit of having JSONObject as the superclass is that it provides methods to capture and access data efficiently.

Why Post?

At present there is a class MessageEntry which is the superclass of TwitterTweet (the data object of TwitterScraper). It has numerous methods that data objects can use to clean and analyse data. But it has a disadvantage: it is specialized for social websites like Twitter and becomes redundant for different types of websites like Quora, GitHub, etc. The Post object, on the other hand, is a small but powerful and flexible object thanks to its ability to deal with data like a JSONObject. It contains getter and setter methods and identity members that give each Post object a unique identity. It doesn't have any methods for analysis and cleaning of data, but the methods of the MessageEntry class can be used for this purpose.

Uses of the Post object

When I started working on the Post object, it could be used as a marker interface for data objects. These are the advantages I came up with:

1) Accessing the data object of any scraper using its variable. This is the primary reason it is an interface.

2) In addition to accessing the data objects, one can also directly use it to fetch, modify or use data without knowing which scraper it belongs to. This feature is useful in the Timeline iterator.

This is an example of how the Post interface is used to append two lists of Posts (possibly carrying different types of data) into one:

public void mergePost(PostTimeline list) {
    for (Post post: list) {
        this.add(post);
    }
}

Post as a wrapper object

While working on the Post object, I converted it into a class so that it can also be used as a wrapper. But why a wrapper? A wrapper can be used to wrap a list of Post objects into one object. It doesn't have any identity or timestamp; it is just a utility to dump a pack of data objects with homogeneous attributes. The following is an example of the Post object used as a wrapper. typeArray is a wrapper which stores two arrays of data objects; these arrays are timeline objects that are saved as JSONArray objects in the Post wrapper.

Post typeArray = new Post(true);
switch(type) {
    case "users":…
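To make the description above concrete, here is a minimal, hypothetical sketch of such a Post class. The field and method names (postId, timestamp, the boolean "wrapper" constructor) are assumptions for illustration and may differ from loklak's actual implementation:

import org.json.JSONObject;

// Minimal sketch of a Post-like data object; names are assumptions.
public class Post extends JSONObject {

    private String postId;          // unique identity of the scraped data object
    private long timestamp;         // time at which the data was scraped
    private boolean wrapper = false;

    // a normal Post gets a timestamp at creation time
    public Post() {
        this.timestamp = System.currentTimeMillis();
    }

    // a wrapper Post (compare "new Post(true)" above) carries no identity or
    // timestamp of its own; it only packs other data objects together
    public Post(boolean wrapper) {
        this.wrapper = wrapper;
    }

    public void setPostId(String postId) {
        this.postId = postId;
    }

    public String getPostId() {
        return this.postId;
    }

    public long getTimestamp() {
        return this.timestamp;
    }

    public boolean isWrapper() {
        return this.wrapper;
    }
}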

Continue Reading: Unifying Data from Different Scrapers of loklak server using Post

CSS Styling Tips Used for loklak Apps

Cascading Style Sheets (CSS) is one of the main ingredients of beautiful and dynamic websites, so we use CSS for styling our apps on apps.loklak.org. In this blog post I am going to share a few rules and tips for using CSS when you style your app:

1. Always try something new - The loklak apps website is very flexible for whoever creates an app. Users are always allowed to use any new CSS framework to create an app.

2. Strive for simplicity - As an app grows, we end up writing a lot more than we imagine: many CSS rules, elements and so on. Some of the rules may also override each other without us noticing it. It's good practice to always check before adding a new style rule; maybe an existing one already applies.

3. Keep the file properly structured - Maintain uniform spacing. Always use semantic or "familiar" class/id names. Follow the DRY (Don't Repeat Yourself) principle.

CSS file of the Compare Twitter Profiles app:

#searchBar {
    width: 500px;
}

table {
    border-collapse: collapse;
    width: 70%;
}

th, td {
    padding: 8px;
    text-align: center;
    border-bottom: 1px solid #ddd;
}

The output screen of the app is shown in the screenshot in the original post.

Do's and Don'ts while using CSS:

- Pages must continue to work when style sheets are disabled. Here this means that the apps written for apps.loklak.org should run in every case, for instance when a user uses an old browser, or when there are bugs or style conflicts.
- Do not use the !important attribute to override the user's settings. Using the !important declaration is often considered bad practice because it has side effects that mess with one of CSS's core mechanisms: specificity. In many cases, using it indicates poor CSS architecture.
- If you have multiple style sheets, make sure to use the same class names for the same concept in all of them.
- Do not use more than two fonts. Using a lot of fonts simply because you can will result in a messy look.
- A firm rule for home page design is "less is more": the more buttons and options you put on the home page, the less capable users are of quickly finding the information they need.

Resources:
See more apps at apps.loklak.org.
Check out the code of the apps at https://github.com/fossasia/apps.loklak.org.
More about CSS and styling at https://www.w3.org/Style/CSS/Overview.en.html.

Continue Reading: CSS Styling Tips Used for loklak Apps

How the Compare Twitter Profiles loklak App works

People usually have a tendency to compare their profiles with others, and that is exactly what this app is for: comparing Twitter profiles. loklak provides many APIs which serve different functionalities. The one I am using to implement this app is loklak's User Details API. This API helps in getting all the details of the user we search for, given the user name as the query. In this app I implement a comparison between two Twitter profiles, shown in the form of tables on the output screen.

Usage of loklak's User Profile API in the app:

The user enters the two user names into the search fields (shown in a screenshot in the original post). The queries entered into the search fields are used as the query of the User Profile API. In the code, the queries are built in the following form:

var userQueryCommand = 'http://api.loklak.org/api/user.json?' +
    'callback=JSON_CALLBACK&screen_name=' + $scope.query;

var userQueryCommand1 = 'http://api.loklak.org/api/user.json?' +
    'callback=JSON_CALLBACK&screen_name=' + $scope.query1;

The query returns a JSON output from which we fetch the details we need. A simple query and its JSON output:

http://api.loklak.org/api/user.json?screen_name=fossasia

Sample JSON output:

{
  "search_metadata": {"client": "162.158.50.42"},
  "user": {
    "$P": "I",
    "utc_offset": -25200,
    "friends_count": 282,
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1141238022/fossasia-cubelogo_normal.jpg",
    "listed_count": 185,
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/882420659/14d1d447527f8524c6aa0c568fb421d8.jpeg",
    "default_profile_image": false,
    "favourites_count": 1877,
    "description": "#FOSSASIA #OpenTechSummit 2017, March 17-19 in Singapore https://t.co/aKhIo2s1Ck #OpenTech community of developers & creators #Code #Hardware #OpenDesign",
    "created_at": "Sun Jun 20 16:13:15 +0000 2010",
    "is_translator": false,
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/882420659/14d1d447527f8524c6aa0c568fb421d8.jpeg",
    "protected": false,
    "screen_name": "fossasia",
    "id_str": "157702526",
    "profile_link_color": "DD2E44",
    "is_translation_enabled": false,
    "translator_type": "none",
    "id": 157702526,
    "geo_enabled": true,
    "profile_background_color": "F50000",
    "lang": "en",
    "has_extended_profile": false,
    "profile_sidebar_border_color": "000000",
    "profile_location": null,
    "profile_text_color": "333333",
    "verified": false,
    "profile_image_url": "http://pbs.twimg.com/profile_images/1141238022/fossasia-cubelogo_normal.jpg",
    "time_zone": "Pacific Time (US & Canada)",
    "url": "http://t.co/eLxWZtqTHh",
    "contributors_enabled": false,
    "profile_background_tile": true
  }
}

I take data from JSON outputs like the one shown above and use different fields from them, such as screen_name, favourites_count, etc.

Injecting data from the loklak API response using Angular:

Since loklak's User Profile API returns JSON, I am using AngularJS to arrange the data according to the needs of the app. I am using JSONP to retrieve the data from the API. JSONP, or "JSON with padding", is a JSON extension wherein a prefix is specified as an input argument of the call itself. This is how it is written in the code:

$http.jsonp(String(userQueryCommand)).success(function (response) {
    $scope.userData = response.user;
});

Here the response is stored into $scope, which is an application object. Using the $scope.userData variable, we access the data and display it on the screen using JavaScript, HTML and CSS.
<div id="contactCard" style="pull-right">
  <div class="panel panel-default">
    <div class="panel-heading clearfix">
      <h3 class="panel-title pull-left">User 1 Profile</h3>
    </div>
    <div class="list-group">
      <div class="list-group-item">
        <img src="{{userData.profile_image_url}}" alt="" style="pull-left">
        <h4 class="list-group-item-heading">{{userData.name}}</h4>
      </div>

In this app I also add a keyboard action and validation of the fields, which does not allow users to search with an empty query, using this simple line in the input field:

ng-keyup="$event.keyCode == 13 && query1 != '' && query != '' ? Search()…

Continue Reading: How the Compare Twitter Profiles loklak App works

Introducing Priority Kaizen Harvester for loklak server

In the previous blog post, I discussed the changes made in loklak's Kaizen harvester so that it could be extended and other harvesting strategies could be introduced. Those changes made it possible to introduce a new harvesting strategy, the PriorityKaizen harvester, which uses a priority queue to store the queries that are to be processed. In this blog post, I will discuss the process through which this new harvesting strategy was introduced in loklak.

Background, motivation and approach

Before jumping into the changes, we first need to understand why we need this new harvesting strategy. Let us start by discussing the issue with the Kaizen harvester.

The producer-consumer imbalance in the Kaizen harvester

Kaizen uses a simple hash queue to store queries. When the queue is full, new queries are dropped. But the number of queries produced after searching for one query is much higher than the consumption rate, i.e. the queries are bound to overflow and new queries that arrive get dropped. (See loklak/loklak_server#1156)

Learnings from the attempt to add a blocking queue for queries

As a solution to this problem, I first tried to use a blocking queue to store the queries. In this implementation, producers get blocked before putting queries into a full queue and wait until there is space for more. This way, we would have a good balance between consumers and producers, as the consumers would be waiting until producers can free up space for them:

public class BlockingKaizenHarvester extends KaizenHarvester {
    ...
    public BlockingKaizenHarvester() {
        super(new KaizenQueries() {
            ...
            private BlockingQueue<String> queries = new ArrayBlockingQueue<>(maxSize);

            @Override
            public boolean addQuery(String query) {
                if (this.queries.contains(query)) {
                    return false;
                }
                try {
                    this.queries.offer(query, this.blockingTimeout, TimeUnit.SECONDS);
                    return true;
                } catch (InterruptedException e) {
                    DAO.severe("BlockingKaizen Couldn't add query: " + query, e);
                    return false;
                }
            }

            @Override
            public String getQuery() {
                try {
                    return this.queries.take();
                } catch (InterruptedException e) {
                    DAO.severe("BlockingKaizen Couldn't get any query", e);
                    return null;
                }
            }
            ...
        });
    }
}

[SOURCE, loklak/loklak_server#1210]

But there is an issue here: the consumers themselves are producers, and at an even higher rate. When a search is performed, queries are requested to be appended to the KaizenQueries instance for the object (which, here, would implement a blocking queue). Now consider the case where the queue is full and a thread requests a query from the queue and scrapes data. When the scraping finishes, many new queries are requested to be inserted, and most of them get blocked (because the queue would be full again after one query gets inserted). Therefore, using a blocking queue in KaizenQueries is not a good thing to do.

Other considerations

After the failure of introducing the blocking Kaizen harvester, we looked for other alternatives for storing queries. We came across multilevel queues, persistent disk queues and priority queues.
Multilevel queues sounded like a good idea at first where we would have multiple queues for storing queries. But eventually, this would just boil down to how…
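Since the strategy that was eventually chosen is built around a priority queue, here is a small illustrative sketch of a priority-ordered query store exposing the same addQuery()/getQuery() shape as the KaizenQueries shown earlier. This is not loklak's implementation; the class names and the scoring scheme are assumptions made for the example:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Illustrative only: a query store that hands out the highest-scored query first.
class PriorityQuery {
    final String query;
    final double score;   // higher score means the query is processed earlier

    PriorityQuery(String query, double score) {
        this.query = query;
        this.score = score;
    }
}

class PriorityQueryStore {

    // orders entries so that the highest score sits at the head of the queue
    private final PriorityBlockingQueue<PriorityQuery> queries =
            new PriorityBlockingQueue<>(100,
                    Comparator.comparingDouble((PriorityQuery q) -> q.score).reversed());

    public boolean addQuery(String query, double score) {
        return this.queries.offer(new PriorityQuery(query, score));
    }

    public String getQuery() {
        PriorityQuery head = this.queries.poll();   // null if the queue is empty
        return head == null ? null : head.query;
    }
}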

Continue Reading: Introducing Priority Kaizen Harvester for loklak server

Create Scraper in Javascript for Loklak Scraper JS

Loklak Scraper JS is the latest repository in the Loklak project. It is one of the more interesting projects because of the expected benefits of JavaScript in web scraping. It runs on the Node.js JavaScript engine and is used in the Loklak Wok project as a bundled package. It has the potential to be used in other repositories and enhance them.

Scraping in Python is easy (at least for Pythonistas): one just imports the Requests library and the BeautifulSoup library (lxml is a better option), writes some lines using Requests to fetch the webpage, and some lines of bs4 to walk through the HTML and scrape data. This sums up to less than a hundred lines of code, whereas JavaScript code isn't as easily readable (at least to me) as Python. But it has an advantage: it can easily deal with JavaScript in the pages we are scraping. This is one of the motives for which the Loklak Scraper JS repository was created and we contributed to and worked on it.

I recently coded a JavaScript scraper in the loklak_scraper_js repository. While coding, I found its libraries similar to the libraries I use in Python. Therefore, this blog is for Pythonistas: how they can start scraping in JavaScript as soon as they finish reading, and also contribute to Loklak Scraper JS.

First, replace the Python interpreter, the Requests library and BeautifulSoup with the Node.js interpreter, the Request library and the Cheerio library.

1) Node.js interpreter: The Node.js interpreter is used to run JavaScript files. This is different from Python, as it deals with the whole project instead of a single module as in Python's case. The Node version most compatible with most of the libraries is 6.0.0, whereas the latest version available (as I checked) is 8.0.0. TIP: use `--save` with npm, as here, while installing a library.

2) Request library: This is used to load the webpage to be processed, similar to the one in Python. The request-promise library, a wrapper around Request implemented with the Bluebird library, improves readability and makes the code cleaner (how?).

3) Cheerio library: A Pythonista (a rookie one) can call it the twin of BeautifulSoup, but it is faster and it is JavaScript. Its selector implementation is nearly identical to jQuery's.

Let us code a basic JavaScript scraper. I will take the TimeAndDate scraper from loklak_scraper_js as the example here. It takes a place as input and outputs its local time.

Step 1: Fetch the HTML of the webpage with the help of the Request library. We pass the URL to the request function to fetch the webpage; the response body is saved in the `html` variable, and the scrapeTimeAndDate() function then scrapes data from it.

url = "http://www.timeanddate.com/worldclock/results.html?query=London";

request(url, function(error, response, body) {
    if(error) {
        console.log("Error: " + error);
        process.exit(-1);
    }
    html = body;
    scrapeTimeAndDate();
});

Step 2: Scrape the important data from the HTML using Cheerio. The list of dates and times of locations is embedded in a table tag, so we will iterate through the <td> elements and extract the text.

a) Load the HTML into Cheerio, as we do with BeautifulSoup in Python.

In Python:

soup = BeautifulSoup(html, 'html5lib')

In Cheerio:

$ = cheerio.load(html);

b) This line finds…

Continue Reading: Create Scraper in Javascript for Loklak Scraper JS

Best Practices when writing Tests for loklak Server

Why do we write unit tests? We write them to ensure that a developer's implementation doesn't change the behaviour of parts of the project. If there is a change in behaviour, the unit tests throw errors. This keeps developers at ease during integration of the software and ensures lower chances of unexpected bugs.

After setting up the tests in Loklak Server, we were able to check whether there was an error or not, but test failures didn't mention the exact error or the exact test case at which they failed. It was YoutubeScraperTest that brought some of these best practices into the project, and we modified the other tests accordingly. The following are some of the best practices, in five points, that we shall follow while writing unit tests.

Assert the assertions

There are many assert methods which we can use, like assertNull, assertEquals, etc. But we should use the one which describes the error well (i.e. is more descriptive) so that the developer's effort is reduced while debugging. Choosing the right assertion helps in getting to the exact error when a test fails, and thus in easier debugging of the code. Some examples:

Using assertThat() over assertTrue()

assertThat() gives more descriptive errors than assertTrue() (a small self-contained illustration of this difference appears at the end of this excerpt). For example:

When assertTrue() is used:

java.lang.AssertionError
at org.loklak.harvester.TwitterScraperTest.testSimpleSearch(TwitterScraperTest.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

When assertThat() is used:

java.lang.AssertionError:
Expected: is <true>
     but: was <false>
at org.loklak.harvester.TwitterScraperTest.testSimpleSearch(TwitterScraperTest.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at org.hamcr...

NOTE: In many cases assertThat() is preferred over other assert methods (read this), but in some cases other methods give a better descriptive output (as in the next example).

Using assertEquals() over assertThat()

For assertThat():

java.lang.AssertionError:
Expected: is "ar photo #test #car https://pic.twitter.com/vd1itvy8Mx"
     but: was "car photo #test #car https://pic.twitter.com/vd1itvy8Mx"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Ass...

For assertEquals():

org.junit.ComparisonFailure: expected:<[c]ar photo #test #car ...> but was:<[]ar photo #test #car ...>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.loklak.harvester.Twitter...

We can clearly see that the second example gives a better error description than the first one. (An SO link)

One test per behaviour

Each test shall be independent of the others, with no mutual dependencies, and shall test only a specific behaviour of the module under test. Have a look at the following snippet. This test checks the method that creates the Twitter URL, by comparing the URL the method produces with the expected output URL.
@Test
public void testPrepareSearchURL() {
    String url;
    String[] query = {
        "fossasia", "from:loklak_test",
        "spacex since:2017-04-03 until:2017-04-05"
    };
    String[] filter = {"video", "image", "video,image", "abc,video"};
    String[] out_url = {
        "https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd",
        "https://twitter.com/search?f=tweets&vertical=default&q=from%3Aloklak_test&src=typd",
        // (further expected URLs omitted in this excerpt)
    };

    // checking simple urls
    for (int i = 0; i < query.length; i++) {
        url = TwitterScraper.prepareSearchURL(query[i], "");

        //compare urls with urls created
        assertThat(out_url[i], is(url));
    }
}

This unit test checks whether the method under test is able to create a Twitter link according to the query or not.

Selecting test cases for the test

We shall remember that testing is a very costly task in terms of processing. It takes time to execute. That is why we need to keep the…
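As a side note, the difference in descriptiveness discussed above can be seen from a small self-contained example. This is illustrative only and not part of the loklak test suite; the URL and the substring being checked are made up for the example:

import static org.hamcrest.CoreMatchers.containsString;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class AssertionStyleExample {

    private final String url =
            "https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd";

    @Test
    public void booleanStyle() {
        // on failure this only reports a bare java.lang.AssertionError
        assertTrue(url.contains("q=loklak"));
    }

    @Test
    public void matcherStyle() {
        // on failure this reports both the expected substring and the actual string
        assertThat(url, containsString("q=loklak"));
    }
}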

Continue Reading: Best Practices when writing Tests for loklak Server

URL Unshortening in Java for loklak server

There are many URL shortening services on the internet. They are useful for converting really long URLs into shorter ones, but apart from redirecting to a longer URL, they are often used to track the people visiting those links. One of the components of loklak server is its URL unshortening and redirect resolution service, which ensures that websites can't track users through those links and enhances the protection of privacy. Let's see how this service works in loklak.

Redirect codes in HTTP

The HTTP standards define the 3XX status codes as an indication that the client must perform additional actions to complete the request. These response codes range from 300 to 308, based on the type of redirection.

To check the redirect code of a request, we must first create a request to some URL:

String urlstring = "http://tinyurl.com/8kmfp";
HttpRequestBase req = new HttpGet(urlstring);

Next, we configure this request to disable redirects and add a nice User-Agent so that websites do not block us as a robot:

req.setConfig(RequestConfig.custom().setRedirectsEnabled(false).build());
req.setHeader("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36");

Now we need an HTTP client to execute this request. Here, we will use Apache's CloseableHttpClient:

CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(getConnctionManager(true))
        .setDefaultRequestConfig(defaultRequestConfig)
        .build();

The getConnctionManager method returns a pooling connection manager that can reuse existing TCP connections, making the requests very fast. It is defined in org.loklak.http.ClientConnection.

Now we have a client and a request. Let's make our client execute the request, and we get an HTTP entity on which we can work:

HttpResponse httpResponse = httpClient.execute(req);
HttpEntity httpEntity = httpResponse.getEntity();

Now that we have executed the request, we can check the status code of the response by calling the corresponding method:

if (httpEntity != null) {
    int httpStatusCode = httpResponse.getStatusLine().getStatusCode();
    System.out.println("Status code - " + httpStatusCode);
} else {
    System.out.println("Request failed");
}

Hence, we have the HTTP code for the requests we make.

Getting the redirect URL

We can simply check the value of the status code and decide whether we have a redirect or not. In the case of a redirect, we can check the "Location" header to know where it redirects:

if (300 <= httpStatusCode && httpStatusCode <= 308) {
    for (Header header: httpResponse.getAllHeaders()) {
        if (header.getName().equalsIgnoreCase("location")) {
            redirectURL = header.getValue();
        }
    }
}

Handling multiple redirects

We now know how to get the redirect for a URL. But in many cases, URLs redirect multiple times before reaching the final, stable location. To handle these situations, we can repeatedly fetch the redirect URL for intermediate links until we saturate.
But we also need to take care of cyclic redirects, so we set a threshold on the number of redirects that we follow:

String urlstring = "http://tinyurl.com/8kmfp";
int termination = 10;
while (termination-- > 0) {
    String unshortened = getRedirect(urlstring);
    if (unshortened.equals(urlstring)) {
        return urlstring;
    }
    urlstring = unshortened;
}

Here, getRedirect is the method which performs a single redirect for a URL and returns the same URL in case…
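Putting the pieces together, a getRedirect-style helper could look roughly like the sketch below. This is an illustration assembled from the snippets in this post, not loklak's actual method; for simplicity it uses a default HttpClient instead of the pooled connection manager mentioned above:

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class RedirectResolver {

    // Returns the "Location" target for a 3xx response, otherwise the input URL
    public static String getRedirect(String urlstring) throws Exception {
        HttpGet req = new HttpGet(urlstring);
        req.setConfig(RequestConfig.custom().setRedirectsEnabled(false).build());

        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpResponse httpResponse = httpClient.execute(req);
            HttpEntity httpEntity = httpResponse.getEntity();
            if (httpEntity == null) {
                return urlstring;
            }
            String redirectURL = urlstring;
            int statusCode = httpResponse.getStatusLine().getStatusCode();
            if (300 <= statusCode && statusCode <= 308) {
                for (Header header : httpResponse.getAllHeaders()) {
                    if (header.getName().equalsIgnoreCase("location")) {
                        redirectURL = header.getValue();
                    }
                }
            }
            EntityUtils.consume(httpEntity);   // release the connection
            return redirectURL;
        }
    }
}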

Continue Reading: URL Unshortening in Java for loklak server

Improving Harvesting Decision for Kaizen Harvester in loklak server

About the Kaizen harvester

Kaizen is an alternative approach to harvesting in loklak. It focuses on collecting queries and information in order to generate more queries from the collected timelines. It maintains a queue of queries that is populated by extracting the following information from timelines:

- Hashtags in Tweets
- User mentions in Tweets
- Tweets from areas near to each Tweet in the timeline
- Tweets older than the oldest Tweet in the timeline

Further, it can also utilise the Twitter API to get trending keywords from Twitter and get search suggestions from other loklak peers. It was introduced by @yukiisbored in pull request loklak/loklak_server#960.

The problem: unbiased harvesting decision

The Kaizen harvester either searches for queries from the queue or tries to grab trending queries (using the Twitter API or from the backend). In the previous version of KaizenHarvester, the decision of "harvesting vs. info-grabbing" was taken based on the value from a random boolean generator:

@Override
public int harvest() {
    if (!queries.isEmpty() && random.nextBoolean())
        return harvestMessages();
    grabSuggestions();
    return 0;
}

[SOURCE]

In practical situations, the Kaizen harvester is configured to use a fixed-size queue, and queries which are requested to be added once the queue is full are dropped. Since the decision doesn't take into account how full the queue is, it would often call the grabSuggestions() method; but because the queue would be full, the grabbed suggestions would simply be lost. This results in a waste of time and resources spent fetching the suggestions (from the backend or the API). To overcome this, something better had to be done in this part.

The solution: making the decision biased

To solve the problem of the unbiased harvesting decision, the harvester was triggered based on the following steps:

1. Calculate the ratio of the queue filled (q.size() / q.maxSize()).
2. Generate a random floating point number between 0 and 1.
3. If the number is less than the ratio, harvest. Otherwise, get harvesting suggestions.

Why would this work?

Initially, when the queue is mostly empty, the ratio is a small number, so it is highly probable that a random number generated between 0 and 1 is greater than the ratio, and Kaizen goes for grabbing search suggestions. If the ratio is large (i.e. the queue is almost full), it is highly likely that the random number generated is less than it, making it more likely to search for results instead of grabbing suggestions.

Graph?

The graph in the original post shows how the harvester decision changes: it performs 10k iterations for a given queue ratio and plots the number of times the harvesting decision was taken. (A small sketch that reproduces this experiment follows at the end of this excerpt.)

Change in code

The harvest() method was changed in loklak/loklak_server#1158 to take the smart decision of harvesting vs. info-grabbing in the following manner:

@Override
public int harvest() {
    float targetProb = random.nextFloat();
    float prob = 0.5F;
    if (QUERIES_LIMIT > 0) {
        prob = queries.size() / (float)QUERIES_LIMIT;
    }
    if (!queries.isEmpty() && targetProb < prob) {
        return harvestMessages();
    }
    grabSuggestions();
    return 0;
}

[SOURCE]

Conclusion

This change brought an enhancement to the Kaizen harvester and made it…
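As referenced above, here is a small standalone sketch (not loklak code; class and variable names are made up for the example) that reproduces the described experiment: for each queue-fill ratio it runs 10,000 trials of the probabilistic rule and counts how often the harvesting decision would be taken.

import java.util.Random;

public class HarvestDecisionSimulation {

    public static void main(String[] args) {
        Random random = new Random();
        int iterations = 10000;

        for (float ratio = 0.0f; ratio <= 1.0f; ratio += 0.1f) {
            int harvestCount = 0;
            for (int i = 0; i < iterations; i++) {
                // harvest when the random number falls below the fill ratio
                if (random.nextFloat() < ratio) {
                    harvestCount++;
                }
            }
            System.out.printf("queue ratio %.1f -> harvested %d of %d times%n",
                    ratio, harvestCount, iterations);
        }
    }
}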

Continue Reading: Improving Harvesting Decision for Kaizen Harvester in loklak server

Writing Simple Unit-Tests with JUnit

In the Loklak Server project, we use a number of automation tools, like the build-testing tool TravisCI, the automated code-review tool Codacy, and Gemnasium. We also use JUnit, a Java-based unit-testing framework, to write automated unit tests for the project. It can be used to test methods and check their behaviour whenever there is a change in the implementation. These unit tests are handy and are coded specifically for the project; in Loklak Server they are used to test the web scrapers. Generally, JUnit is used to check that there is no change in the behaviour of methods, but in this project it also helps to check whether the website being scraped has been modified in a way that affects the data that is scraped.

Let's start with the basics: first the setup, then writing simple unit tests, and then test runners. Here we will refer to how unit tests have been implemented in Loklak Server to get familiar with the JUnit framework.

Setting up

Setting up JUnit with Gradle is easy; you have to do just two things:

1) Add the JUnit dependency in build.gradle:

dependencies {
    . . .
    . . .<other compile groups>. . .
    compile group: 'com.twitter', name: 'jsr166e', version: '1.1.0'
    compile group: 'com.vividsolutions', name: 'jts', version: '1.13'
    compile group: 'junit', name: 'junit', version: '4.12'
    compile group: 'org.apache.logging.log4j', name: 'log4j-1.2-api', version: '2.6.2'
    compile group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.6.2'
    . . .
}

2) Add the source for the 'test' task from which the tests are built (like here). Save all tests in the test directory and keep its internal directory structure identical to the src directory structure, then set the path in build.gradle so that they can be compiled:

sourceSets.test.java.srcDirs = ['test']

Writing unit tests

In the JUnit framework, a unit test is a method that tests a particular behaviour of a section of code. Test methods are identified by the @Test annotation. A unit test exercises methods of the source files and checks their behaviour by fetching the output and comparing it with the expected output. The following test checks whether the Twitter URL that is created, and which is then scraped, is valid:

/**
 * This unit-test tests twitter url creation
 */
@Test
public void testPrepareSearchURL() {
    String url;
    String[] query = {
        "fossasia", "from:loklak_test",
        "spacex since:2017-04-03 until:2017-04-05"
    };
    String[] filter = {"video", "image", "video,image", "abc,video"};
    String[] out_url = {
        "https://twitter.com/search?f=tweets&vertical=default&q=fossasia&src=typd",
        "https://twitter.com/search?f=tweets&vertical=default&q=from%3Aloklak_test&src=typd",
        "and other output url strings to be matched….."
    };

    // checking simple urls
    for (int i = 0; i < query.length; i++) {
        url = TwitterScraper.prepareSearchURL(query[i], "");

        //compare urls with urls created
        assertThat(out_url[i], is(url));
    }

    // checking urls having filters
    for (int i = 0; i < filter.length; i++) {
        url = TwitterScraper.prepareSearchURL(query[0], filter[i]);

        //compare urls with urls created
        assertThat(out_url[i+3], is(url));
    }
}

Testing the implementation of the code is useless, as it will either make the code more difficult to change or make the tests useless. So be cautious while writing tests, and keep the difference between implementation and behaviour in mind. The above is a good example of a simple unit test. As we can see, there are some points,…

Continue Reading: Writing Simple Unit-Tests with JUnit