Developing LoklakWordCloud app for Loklak apps site

LoklakWordCloud app is an app to visualise data returned by loklak in form of a word cloud.

The app is presently hosted on Loklak apps site.

Word clouds provide a very simple, easy, yet interesting and effective way to analyse and visualise data. This app will allow users to create word cloud out of twitter data via Loklak API.

Presently the app is at its very early stage of development and more work is left to be done. The app consists of a input field where user can enter a query word and on pressing search button a word cloud will be generated using the words related to the query word entered.

Loklak API is used to fetch all the tweets which contain the query word entered by the user.

These tweets are processed to generate the word cloud.

Related issue: https://github.com/fossasia/apps.loklak.org/pull/279

Live app: http://apps.loklak.org/LoklakWordCloud/

Developing the app

The main challenge in developing this app is implementing its prime feature, that is, generating the word cloud. How do we get a dynamic word cloud which can be easily generated by the user based on the word he has entered? Well, here comes in Jqcloud. An awesome lightweight Jquery plugin for generating word clouds. All we need to do is provide list of words along with their weights.

Let us see step by step how this app (first version) works. First we require all the tweets which contain the entered word. For this we use Loklak search service. Once we get all the tweets, then we can parse the tweet body to create a list of words along with their frequency.

var url = "http://35.184.151.104/api/search.json?callback=JSON_CALLBACK&count=100&q=" + query;
        $http.jsonp(url)
            .then(function (response) {
                $scope.createWordCloudData(response.data.statuses);
                $scope.tweet = null;
            });

Once we have all the tweets, we need to extract the tweet texts and create a list of valid words. What are valid words? Well words like ‘the’, ‘is’, ‘a’, ‘for’, ‘of’, ‘then’, does not provide us with any important information and will not help us in doing any kind of analysis. So there is no use of including them in our word cloud. Such words are called stop words and we need to get rid of them. For this we are using a list of commonly used stop words. Such lists can be very easily found over the internet. Here is the list which we are using. Once we are able to extract the text from the tweets, we need to filter stop words and insert the valid words into a list.

 tweet = data[i];
            tweetWords = tweet.text.replace(", ", " ").split(" ");

            for (var j = 0; j < tweetWords.length; j++) {
                word = tweetWords[j];
                word = word.trim();
                if (word.startsWith("'") || word.startsWith('"') || word.startsWith("(") || word.startsWith("[")) {
                    word = word.substring(1);
                }
                if (word.endsWith("'") || word.endsWith('"') || word.endsWith(")") || word.endsWith("]") ||
                    word.endsWith("?") || word.endsWith(".")) {
                    word = word.substring(0, word.length - 1);
                }
                if (stopwords.indexOf(word.toLowerCase()) !== -1) {
                    continue;
                }
                if (word.startsWith("#") || word.startsWith("@")) {
                    continue;
                }
                if (word.startsWith("http") || word.startsWith("https")) {
                    continue;
                }
                $scope.filteredWords.push(word);
            }

What are we actually doing in the above snippet? We are simply iterating over each of the statuses returned by Loklak API. For each tweet, first we are splitting the text into words and then we are iterating over those words. For a given word we do a number of checks. First we check if the word begins or ends with a special character, for example quotation marks or brackets. If so we remove those character as it will cause trouble in calculating frequencies. Next we also check if the word is beginning with ‘#’ or ‘@’. If it is true, then we discard such words as we are handling hashtags and mentions separately. Finally we check whether the word is a stop word or not. If it is a stop word then we discard it. If a word passes all the checks, we add it to our list of valid words.

Once we are done with the tweet bodies, next we need to handle hashtags and mentions.

tweet.hashtags.forEach(function (hashtag) {
                $scope.filteredWords.push("#" + hashtag);
            });

            tweet.mentions.forEach(function (mention) {
                $scope.filteredWords.push("@" + mention);
            });

The above code simply iterates over the hashtags and mentions and inserts them into the filteredWords list. We have handled hashtags and mentions separately so that we can apply filters in future.

Once we are done with generating list of valid words, we need to calculate weight for each of the word. Here weight is nothing but the number of times a particular word is present in the list. We calculate this using JavaScript object. We iterate over the list of valid words. If word is not present in the object (or dictionary as you wish to call it), we create a new key by the name of that word and set its value to one. If a word is already present as a key, then we simply increment its value by one.

for (var word in $scope.wordFreq) {
            $scope.wordCloudData.push({
                text: word,
                weight: $scope.wordFreq[word]
            });
        }

The above code snippet calculates the frequency of each word by the process mentioned above.

Now we are all set to generate our word cloud. We simply use Jqcloud’s interface to configure it with the words and their respective frequencies, provide a list of color codes for a color gradient, and set autoResize to true so that our word cloud resizes itself when the screen size changes.

$scope.generateWordCloud = function() {
        if ($scope.wordCloud === null) {
            $scope.wordCloud = $('.wordcloud').jQCloud($scope.wordCloudData, {
                colors: ["#D50000", "#FF5722", "#FF9800", "#4CAF50", "#8BC34A", "#4DB6AC", "#7986CB", "#5C6BC0", "#64B5F6"],
                fontSize: {
                    from: 0.06,
                    to: 0.01
                },
                autoResize: true
            });
        } else {
            $scope.wordCloud = $(".wordcloud").jQCloud('update', $scope.wordCloudData);
        }
    }

Whenever the user searches for a new word, we simply update the existing word cloud with the cloud of the new word.

Future roadmap

  • Make the words in the cloud clickable. On clicking a word, the cloud should get replaced by the selected word’s cloud.
  • Add filters for hashtags, mentions, date.
  • Add option for exporting the cloud to an image, so that user’s can also use this app as a tool to generate word clouds as images and save them.
  • Add a loader and error notification for invalid or empty input.

Important resources

  • View the app source code here.
  • Learn more about Loklak API here.
  • Learn more about Jqcloud here.
  • Learn more about AngularJS here.
Continue ReadingDeveloping LoklakWordCloud app for Loklak apps site

Advanced functionality in SUSI Tweetbot

SUSI AI is integrated to Twitter (blog). During the initial phase, SUSI Tweetbot had basic UI and functionalities like just “plain text” replies. Twitter provides with many more features like quick replies i.e. presenting to the user with some choices to choose from or visiting SUSI server repository by just clicking buttons during the chat etc.

All these features are provided to enhance the user experience with our chatbot on Twitter.

This blog post walks you through on adding these functionalities to the SUSI Tweetbot:

  1. Quick replies
  2. Buttons

    Quick replies:

    This feature provides options to the user to choose from.

    The user doesn’t need to type the next query but rather select a quick reply from the options available. This speeds up the process and makes it easy for the user. Also, it helps developers know all the possible queries which can come next, from the user. Hence, it helps in efficient coding on how to handle those queries.In SUSI Tweetbot this feature is used to welcome a new user to the SUSI A.I.’s chat window, as shown in the image above. The user can select any option among “Get started” and “Start chatting”.The “Get started” option is basically for introduction of SUSI A.I. to the user. While, “Start chatting” when clicked shows the user of what all queries the user can try.Let’s come to the code part on how to show these options and what events happen when a user selects one of the options.

    To show the Welcome message, we call SUSI API with the query as string “Welcome” and store the reply in message variable. The code snippet used:

var queryUrl = 'http://api.susi.ai/susi/chat.json?q=Welcome';
var message = '';
request({
    url: queryUrl,
    json: true
}, function (err, response, data) {
    if (!err && response.statusCode === 200) {
        message = data.answers[0].actions[0].expression;
    } 
    else {
        // handle error
    }
});

To show options with the message:

var msg =  {
        "welcome_message" : {
                    "message_data": {
                        "text": message,
                        "quick_reply": {
                              "type": "options",
                              "options": [
                                {
                                  "label": "Get started",
                                  "metadata": "external_id_1"
                                },
                                {
                                  "label": "Start chatting",
                                  "metadata": "external_id_2"
                                }
                              ]
                            }
                    }
                      }
       };
T.post('direct_messages/welcome_messages/new', msg, sent);

The line T.post() makes a POST request to the Twitter API, to register the welcome message with Twitter for our chatbot. The return value from this request includes a welcome message id in it corresponding to this welcome message.

We set up a welcome message rule for this welcome message using it’s id. By setting up the rule is to set this welcome message as the default welcome message shown to new users. Twitter also provides with custom welcome messages, information about which can be found in their official docs.

The welcome message rule is set up by sending the welcome message id as a key in the request body:

var welcomeId = data.welcome_message.id;
var welcomeRule = {
            "welcome_message_rule": {
                "welcome_message_id": welcomeId
            }
};
T.post('direct_messages/welcome_messages/rules/new', welcomeRule, sent);

Now, we are all set to show the new users with a welcome message.

Buttons:

Let’s go a bit further. If the user clicks on the option “Get started”, we want to show a basic introduction of SUSI A.I. to the user. Along with that we provide some buttons to visit the SUSI A.I. repository or experience chatting with SUSI A.I. on the web client.

The procedure followed to show buttons is almost the same as followed in case of options.This doc by Twitter proves to be helpful to get familiar with buttons.

As soon as a person clicks on “Get started” option, Twitter sends a message to our bot with the query as “Get started”.

For the message part, we call SUSI API with the query as string “Get started” and store the reply in a message variable. The code snippet used:

var queryUrl = 'http://api.susi.ai/susi/chat.json?q=Get+started';
var message = '';
request({
    url: queryUrl,
    json: true
}, function (err, response, data) {
    if (!err && response.statusCode === 200) {
        message = data.answers[0].actions[0].expression;
    } 
    else {
        // handle error
    }
});

Both the buttons to be shown with the message should have a corresponding url. So that after clicking the button a person is redirected to that url in a new browser tab.

To show buttons a simple message won’t help, we need to create an event. This event constitutes of our message and buttons. The buttons are referred to as call-to-action i.e. CTAs by Twitter dev’s. The maximum number of buttons in an event can not be more than three in number.

The code used to make an event in our case:

var msg = {
        "event": {
                "type": "message_create",
                "message_create": {
                              "target": {
                                "recipient_id": sender
                              },
                              "message_data": {
                                "text": message,
                                "ctas": [
                                  {
                                    "type": "web_url",
                                    "label": "View Repository",
                                    "url": "https://www.github.com/fossasia/susi_server"
                                  },
                                  {
                                    "type": "web_url",
                                    "label": "Chat on the web client",
                                    "url": "http://chat.susi.ai"
                                  }
                                ]
                              }
                            }
            }
        };

T.post('direct_messages/events/new', msg, sent);

The line T.post() makes a POST request to the Twitter API, to send this event to the concerned recipient guiding them on how to get started with the bot.

Resources:

  1. Speed up customer service with quick replies and welcome messages by Ian Cairns from Twitter blog.
  2. Drive discovery of bots and other customer experiences in direct messages by Travis Lull from Twitter blog.
Continue ReadingAdvanced functionality in SUSI Tweetbot

Scraping Concurrently with Loklak Server

At Present, SearchScraper in Loklak Server uses numerous threads to scrape Twitter website. The data fetched is cleaned and more data is extracted from it. But just scraping Twitter is under-performance.

Concurrent scraping of other websites like Quora, Youtube, Github, etc can be added to diversify the application. In this way, single endpoint search.json can serve multiple services.

As this Feature is under-refinement, We will discuss only the basic structure of the system with new changes. I tried to implement more abstract way of Scraping by:-

1) Fetching the input data in SearchServlet

Instead of selecting the input get-parameters and referencing them to be used, Now complete Map object is referenced, helping to be able to add more functionality based on input get-parameters. The dataArray object (as JSONArray) is fetched from DAO.scrapeLoklak method and is embedded in output with key results

    // start a scraper
    inputMap.put("query", query);
    DAO.log(request.getServletPath() + " scraping with query: "
           + query + " scraper: " + scraper);
    dataArray = DAO.scrapeLoklak(inputMap, true, true);

 

2) Scraping the selected Scrapers concurrently

In DAO.java, the useful get parameters of inputMap are fetched and cleaned. They are used to choose the scrapers that shall be scraped, using getScraperObjects() method.

Timeline2.Order order= getOrder(inputMap.get("order"));
Timeline2 dataSet = new Timeline2(order);
List<String> scraperList = Arrays.asList(inputMap.get("scraper").trim().split("\\s*,\\s*"));

 

Threads are created to fetch data from different scrapers according to size of list of scraper objects fetched. input map is passed as argument to the scrapers for further get parameters related to them and output data according to them.

List<BaseScraper> scraperObjList = getScraperObjects(scraperList, inputMap);
ExecutorService scraperRunner = Executors.newFixedThreadPool(scraperObjList.size());

try{
    for (BaseScraper scraper : scraperObjList)
    {
        scraperRunner.execute(() -> {
            dataSet.mergePost(scraper.getData());
        });

    }

} finally {
    scraperRunner.shutdown();

    try {
        scraperRunner.awaitTermination(24L, TimeUnit.HOURS);
    } catch (InterruptedException e) { }
}

 

3) Fetching the selected Scraper Objects in DAO.java

Here the variable of abstract class BaseScraper (SuperClass of all search scrapers) is used to create List of scrapers to be scraped. All the scrapers’ constructors are fed with input map to be scraped accordingly.

List<BaseScraper> scraperObjList = new ArrayList<BaseScraper>();
BaseScraper scraperObj = null;

if (scraperList.contains("github") || scraperList.contains("all")) {
    scraperObj = new GithubProfileScraper(inputMap);
    scraperObjList.add(scraperObj);
}
.
.
.

 

References:

Continue ReadingScraping Concurrently with Loklak Server

Sharing Images on Twitter from Phimpme Android App Using twitter4j

As sharing an image to the social media platform is an important feature in Phimpme android. In my previous blog, I have explained how to authenticate the Android application with Twitter. In this blog, I will discuss how to upload an image directly on Twitter from the application after successfully logging to Twitter.

To check if the application is authenticated to Twitter or not.

When the application is successfully authenticated Twitter issues a Token which tells the application if it is connected to Twitter or not. In LoginActivity.java the function isActive returns a boolean value. True if the Twitter token is successfully issued or else false.  

public static boolean isActive(Context ctx) {
        SharedPreferences sharedPrefs = ctx.getSharedPreferences(AppConstant.SHARED_PREF_NAME, Context.MODE_PRIVATE);
        return sharedPrefs.getString(AppConstant.SHARED_PREF_KEY_TOKEN, null) != null;
    }

We call isActive function from LoginActive class to check if the application is authenticated to Twitter or not. We call it before using the share function in sharingActivity:

if (LoginActivity.isActive(context)) {
                try {
                    // Send Image function
} catch (Exception ex) {
                    Toast.makeText(context, "ERROR", Toast.LENGTH_SHORT).show();
 }

We have saved the image in the internal storage of the device and use saveFilePath to use the path of the saved image. In Phimpme we used HelperMethod class where our share function resides, and while the image is being shared an alert dialog box with spinner pops on the screen.

Sending the image to HelperMethod class

First,

We need to get the image and convert it into Bitmaps. Since, the image captured by the phone camera is usually large to upload and it will take a lot of time we need to compress the Bitmap first. BitmapFactory.decodeFile(specify name of the file) is used to fetch the file and convert it into bitmap.

To send the data we used FileOutStream to the set the path of the file or image in this case. Bitmap.compress method is used to compress the image to desired value and format. In Phimpme we are converting it into PNG.  

Bitmap bmp = BitmapFactory.decodeFile(saveFilePath);
                    String filename = Environment.getExternalStorageDirectory().toString() + File.separator + "1.png";
                    Log.d("BITMAP", filename);
                    FileOutputStream out = new FileOutputStream(saveFilePath);
                    bmp.compress(Bitmap.CompressFormat.PNG, 90, out);

                    HelperMethods.postToTwitterWithImage(context, ((Activity) context), saveFilePath, caption, new HelperMethods.TwitterCallback() {

                        @Override
                        public void onFinsihed(Boolean response) {
                            mAlertBuilder.dismiss();
                            Snackbar.make(parent, R.string.tweet_posted_on_twitter, Snackbar.LENGTH_LONG).show();
                        }

Post image function

To post the image on Twitter we will use ConfigurationBuilder class. We will create a new object of the class and then attach Twitter consumer key, consumer secret key, Twitter access token, and twitter token secret.

  • setOAuthConsumerKey() function is used to set the consumer key which is generated by the Twitter when creating the application in the Twitter development environment.
  • Similarly, setOAuthConsumerSecret() function is used to set the consumer secret key.
  • Specify the token key which generated after successfully connecting to twitter in setOAuthAcessToken() fuction and Token secret in setOAuthAcessTokenSecret() function.  
ConfigurationBuilder configurationBuilder = new ConfigurationBuilder();       configurationBuilder.setOAuthConsumerKey(context.getResources().getString(R.string.twitter_consumer_key));
configurationBuilder.setOAuthConsumerSecret(context.getResources().getString(R.string.twitter_consumer_secret));
configurationBuilder.setOAuthAccessToken(LoginActivity.getAccessToken((context)));
configurationBuilder.setOAuthAccessTokenSecret(LoginActivity.getAccessTokenSecret(context));
        Configuration configuration = configurationBuilder.build();
final Twitter twitter = new TwitterFactory(configuration).getInstance();

Sending Image to twitter:

  • The image is uploaded to twitter using statusUpdate class specified in Twitter4j API.
  • Pass the image file name in status.setMedia(file).
  • Pass the caption in status.updateStatus(caption).
  • updateStatus is used to finally upload the image with the caption.
final File file = new File(imageUrl);

        new Thread(new Runnable() {
            @Override
            public void run() {
                boolean success = true;
                try {
                    if (file.exists()) {
                        StatusUpdate status = new StatusUpdate(message);
                        status.setMedia(file);
                        twitter.updateStatus(status);
                    }else{
                        Log.d(TAG, "----- Invalid File ----------");
                        success = false;
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    success = false;
                }

 Conclusion:                                                                                                                      Using Twitter4j API allows sharing image on Twitter without leaving the  application and opening any additional view.

Github

Resources

Continue ReadingSharing Images on Twitter from Phimpme Android App Using twitter4j

Integrating Twitter Authenticating using Twitter4j in Phimpme Android Application

We have used Twitter4j API to authenticate Twitter in Phimpme application. Below are the following steps in setting up the Twitter4j API in Phimpme and Login to Twitter from Phimpme android application.

Setting up the environment

Download the Twitter4j package from http://twitter4j.org/en/. For sharing images we will only need twitter4j-core-3.0.5.jar and twitter4j-media-support-3.0.5.jar files. Copy these files and save it in the libs folder of the application.

Go to build.gradle and add the following codes in dependencies:

dependencies {
compile files('libs/twitter4j-core-3.0.5.jar')
compile files('libs/twitter4j-media-support-3.0.5.jar')
}

Adding Phimpme application in Twitter development page

Go to https://dev.twitter.com/->My apps-> Create new apps. Create an application window opens where we have to fill all the necessary details about the application. It is mandatory to fill all the fields. In website field, if you are making an android application then anything can be filled in website field for example www.google.com. But it is necessary to fill this field also.

After filling all the details click on “Create your Twitter application” button.

Adding Twitter Consumer Key and Secret Key

This generates twitter consumer key and twitter secret key. We need to add this in our string.xml folder.

<string name="twitter_consumer_key">ry1PDPXM6rwFVC1KhQ585bJPy</string>
<string name="twitter_consumer_secret">O3qUqqBLinr8qrRvx3GXHWBB1AN10Ax26vXZdNlYlEBF3vzPFt</string> 

Twitter Authentication

Make a new JAVA class say LoginActivity. Where we have to first fetch the twitter consumer key and Twitter secret key.

private static Twitter twitter;
    private static RequestToken requestToken;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_twitter_login);
        twitterConsumerKey = getResources().getString(R.string.twitter_consumer_key);
        twitterConsumerSecret = getResources().getString(R.string.twitter_consumer_secret);  

We are using a web view to interact with the Twitter login page.

twitterLoginWebView = (WebView)findViewById(R.id.twitterLoginWebView);
        twitterLoginWebView.setBackgroundColor(Color.TRANSPARENT);
        twitterLoginWebView.setWebViewClient( new WebViewClient(){
            @Override
            public boolean shouldOverrideUrlLoading(WebView view, String url){

                if( url.contains(AppConstant.TWITTER_CALLBACK_URL)){
                    Uri uri = Uri.parse(url);
                    LoginActivity.this.saveAccessTokenAndFinish(uri);
                    return true;
                }
                return false;
            }             

If the access Token is already saved then the user is already signed in or else it sends the Twitter consumer key and the Twitter secret key to gain access Token. ConfigurationBuilder function is used to set the consumer key and consumer secret key.

ConfigurationBuilder configurationBuilder = new ConfigurationBuilder();
        configurationBuilder.setOAuthConsumerKey(twitterConsumerKey);     configurationBuilder.setOAuthConsumerSecret(twitterConsumerSecret);
        Configuration configuration = configurationBuilder.build();
        twitter = new TwitterFactory(configuration).getInstance();

It is followed by the following Runnable thread to check if the request token is received or not. If authentication fails, an error Toast message pops.

new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    requestToken = twitter.getOAuthRequestToken(AppConstant.TWITTER_CALLBACK_URL);
                } catch (Exception e) {
                    final String errorString = e.toString();
                    LoginActivity.this.runOnUiThread(new Runnable() {
                        @Override
                        public void run() {
                            mAlertBuilder.cancel();
                            Toast.makeText(LoginActivity.this, errorString, Toast.LENGTH_SHORT).show();
                            finish();
                        }
                    });
                    return;
                }

                LoginActivity.this.runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        twitterLoginWebView.loadUrl(requestToken.getAuthenticationURL());
                    }
                });
            }
        }).start();

Conclusion

It offers seamless integration of Twitter in any application. Without leaving actual application, easier to authenticate. Further, it is used to upload the photo to Twitter directly from Phimpme Android application, fetch profile picture and username.

Github

Resources

 

Continue ReadingIntegrating Twitter Authenticating using Twitter4j in Phimpme Android Application

Visualising Tweet Statistics in MultiLinePlotter App for Loklak Apps

MultiLinePlotter app is now a part of Loklak apps site. This app can be used to compare aggregations of tweets containing a particular query word and visualise the data for better comparison. Recently there has been a new addition to the app. A feature for showing tweet statistics like the maximum number of tweets (along with date) containing the given query word and the average number of tweets over a period of time. Such statistics is visualised for all the query words for better comparison.

Related issue: https://github.com/fossasia/apps.loklak.org/issues/236

Obtaining Maximum number of tweets and average number of tweets

Before visualising the statistics we need to obtain them. For this we simply need to process the aggregations returned by the Loklak API. Let us start with maximum number of tweets containing the given keyword. What we actually require is what is the maximum number of tweets that were posted and contained the user given keyword and on which date the number was maximum. For this we can use a function which will iterate over all the aggregations and return the largest along with date.

$scope.getMaxTweetNumAndDate = function(aggregations) {
        var maxTweetDate = null;
        var maxTweetNum = -1;

        for (date in aggregations) {
            if (aggregations[date] > maxTweetNum) {
                maxTweetNum = aggregations[date];
                maxTweetDate = date;
            }
        }

        return {date: maxTweetDate, count: maxTweetNum};
    }

The above function maintains two variables, one for maximum number of tweets and another for date. We iterate over all the aggregations and for each aggregation we compare the number of tweets with the value stored in the maxTweetNum variable. If the current value is more than the value stored in that variable then we simply update it and keep track of the date. Finally we return an object containing both maximum number of tweets and the corresponding date.Next we need to obtain average number of tweets. We can do this by summing up all the tweet frequencies and dividing it by number of aggregations.

$scope.getAverageTweetNum = function(aggregations) {
        var avg = 0;
        var sum = 0;

        for (date in aggregations) {
            sum += aggregations[date];
        }

        return parseInt(sum / Object.keys(aggregations).length);
    }

The above function calculates average number of tweets in the way mentioned before the snippet.

Next for every tweet we need to store these values in a format which can easily be understood by morris.js. For this we use a list and store the statistics values for individual query words as objects and later pass it as a parameter to morris.

var maxStat = $scope.getMaxTweetNumAndDate(aggregations);
        var avg = $scope.getAverageTweetNum(aggregations);

        $scope.tweetStat.push({
            tweet: $scope.tweet,
            maxTweetCount: maxStat.count,
            maxTweetOn: maxStat.date,
            averageTweetsPerDay: avg,
            aggregationsLength: Object.keys(aggregations).length
        });

We maintain a list called tweetStat and the list contains objects which stores the query word and the corresponding values.

Apart from plotting these statistics, the app also displays the statistics when user clicks on an individual treat present in the search record section. For this we filter tweetStat list mentioned above and get the required object corresponding to the query word the user selected bind it to angular scope. Next we display it using HTML.

<div class="tweet-stat max-tweet">
                  <div class="stat-label"> <h4>Maximum number of tweets containing '{{modalHeading}}' :</h4></div>
                  <div class="stat-value"> <strong>{{selectedTweetStat.maxTweetCount}}</strong> tweets on
                    <strong>{{selectedTweetStat.maxTweetOn}}</strong>
                  </div>
                </div>

Finally we need to plot the statistics. For this we use a function called plotStatGraph dedicated only for plotting statistics graph. We pass the tweetStat list as a parameter to morris and configure all the other parameters.

$scope.plotStatGraph = function() {
        $scope.plotStat = new Morris.Bar({
            element: 'graph',
            data: $scope.tweetStat,
            xkey: 'tweet',
            ykeys: ['maxTweetCount', 'averageTweetsPerDay'],
            labels: ['Maximum no. of tweets : ', 'Average no. of tweets/day'],
            parseTime: false,
            hideHover: 'auto',
            resize: true,
            stacked: true,
            barSizeRatio: 0.40
        });
        $scope.graphLoading = false;
    }

But now we have two graphs. One for showing variations in aggregation and the other for showing statistics. How do we manage them? Somehow we need to show them in the same page as this is a single page app. Also we need to avoid vertical scrolling as it would degrade both UI and UX. So we need to implement a switching mechanism. The user should be able to switch between the two graph views as per their wish. How to achieve that? Well, for this we maintain a global variable which will keep track of the current plot type. If the current graph type is aggregations then we call the function to plot aggregations otherwise we call the above mentioned function to plot statistics.

$scope.plotData = function() {
        $(".plot-data").html("");
        if ($scope.currentGraphType === "aggregations") {
            $scope.plotAggregationGraph();
        } else {
            $scope.plotStatGraph();
        }
    }

Lastly we integrate this state variable (currentGraphType) with the UI so that users can easily toggle between graph views with just a click.

<div class="switch" ng-click="toggle()">
                <span ng-if="queryRecords.length !== 0" class="glyphicon glyphicon-stats"></span>
              </div>

Important resources

Continue ReadingVisualising Tweet Statistics in MultiLinePlotter App for Loklak Apps

Developing MultiLinePlotter App for Loklak

MultiLinePlotter is a web application which uses Loklak API under the hood to plot multiple tweet aggregations related to different user provided query words in the same graph. The user can give several query words and multiple lines for different queries will be plotted in the same graph. In this way, users will be able to compare tweet distribution for various keywords and visualise the comparison. All the searched queries are shown under the search record section. Clicking on a record causes a dialogue box to pop up where the individual tweets related to the query word is displayed. Users can also remove a series from the plot dynamically by just pressing the Remove button beside the query word in record section. The app is presently hosted on Loklak apps site.

Related issue – https://github.com/fossasia/apps.loklak.org/issues/225

Getting started with the app

Let us delve into the working of the app. The app uses Loklak aggregation API to get the data.

A call to the API looks something like this:

http://api.loklak.org/api/search.json?q=fossasia&source=cache&count=0&fields=created_at

A small snippet of the aggregation returned by the above API request is shown below.

"aggregations": {"created_at": {
    "2017-07-03": 3,
    "2017-07-04": 9,
    "2017-07-05": 12,
    "2017-07-06": 8,
}}

The API provides a nice date v/s number of tweets aggregation. Now we need to plot this. For plotting Morris.js has been used. It is a lightweight javascript library for visualising data.

One of the main features of this app is addition and removal of multiple series from the graph dynamically. How do we achieve that? Well, this can be achieved by manipulating the morris.js data list whenever a new query is made. Let us understand this in steps.

At first, the data is fetched using angular HTTP service.

$http.jsonp('http://api.loklak.org/api/search.json?callback=JSON_CALLBACK',
            {params: {q: $scope.tweet, source: 'cache', count: '0', fields: 'created_at'}})
                .then(function (response) {
                    $scope.getData(response.data.aggregations.created_at);
                    $scope.plotData();
                    $scope.queryRecords.push($scope.tweet);
                });

Once we get the data, getData function is called and the aggregation data is passed to it. The query word is also stored in queryRecords list for future use.

In order to plot a line graph morris.js requires a data object which will contain the required values for a series. Given below is an example of such a data object.

data: [
    { x: '2006', a: 100, b: 90 },
    { x: '2007', a: 75,  b: 65 },
    { x: '2008', a: 50,  b: 40 },
    { x: '2009', a: 75,  b: 65 },
],

For every ‘x’, ‘a’ and ‘b’ will be plotted. Thus two lines will be drawn. Our app will also maintain a data list like the one shown above, however, in our case, the data objects will have a variable number of keys. One key will determine the ‘x’ value and other keys will determine the ordinates (number of tweets).

All the data objects present in the data list needs to be updated whenever a new search is done.

The getData function does this for us.

var value = $scope.tweet;
        for (date in aggregations) {
            var present = false;
            for (var i = 0; i < $scope.data.length; i++) {
                var item = $scope.data[i];
                if (item['day'] === date) {
                    item[[value]] = aggregations[date];
                    $scope.data[i] = item
                    present = true;
                    break;
                }
            }
            if (!present) {
                $scope.data.push({day: date, [value]: aggregations[date]});
            }
        }


The for loop in the above code snippet updates the global data list used by morris.js. It simply iterates over the dates in the aggregation, extracts the object corresponding to a particular date, adds the new query word as a key and, the number of tweets on that date as the value.If a date is not already present in the list, then it inserts a new object corresponding to the date and query word. Once our data list is updated, we are ready to redraw the graph with the updated data. This is done using plotData function. The plotData function simply checks the user selected graph type. If the selected type is aggregations then it calls plotAggregationGraph() to redraw the aggregations plot.

$scope.remove = function(record) {
        $scope.queryRecords = $scope.queryRecords.filter(function(e) {
            return e !== record });

        $scope.data.forEach(function(item) {
            delete item[record];
        });

        $scope.data = $scope.data.filter(function(item) {
            return Object.keys(item).length !== 1;
        });

        $scope.ykeys = $scope.ykeys.filter(function(item) {
            return item !== record;
        });

        $scope.labels = $scope.labels.filter(function(item) {
            return item !== record;
        });

        $scope.plotData();
}

The above function simply scans the data list, filters the objects which contains selected record as a key and removes them using filter method of javascript arrays. It also removes the corresponding labels and entries from labels and ykeys arrays. Finally, it once again calls plotData function to redraw the plot.

Given below is a sample plot generated by this app with the query words – google, android, microsoft, samsung.

 

Conclusion

This blog post explained how multiple series are plotted dynamically in the MultiLinePlotter app. Apart from aggregations plot it also plots tweet statistics like maximum tweets and average tweets containing a query word and visualises them using stacked bar chart. I will be discussing about them in my subsequent blogs.

Important resources

Continue ReadingDeveloping MultiLinePlotter App for Loklak

Integration of SUSI AI in Twitter

We will be making a Susi messenger bot on Twitter. The messenger bot will tweet back to your tweets and reply instantly when you chat with it. Feel free to tweet to the already made SUSI AI account (mentioning @SusiAI1 in it). Follow it, to have a personal chat.

Make a new account, which you want to use as the bot account. You can make one from sign up option from https://www.twitter.com.

Prerequisites

To create your account on -:
1. Twitter
2. Github
3. Heroku
4. Node js

Setup your own Messenger Bot

1. Make a new app here, to know the access token and other properties for our application. These properties will help us communicate with Twitter.

Click “modify the app permissions” link, as shown here:

Select the Read, Write and Access direct messages option:

Don’t forget to click the update settings button at the bottom.

Click the Generate My Access Token and Token Secret button.

3. Create a new heroku app here.

This app will accept the requests from Twitter and Susi api.

4. Create a config variable by switching to settings page of your app.
  
  The name of your first config variable should be HEROKU_URL and its value is the url address of the heroku app created by you.
 

The other config variables that need to be created will be these:
 

The corresponding names of these variables in the same order are:
  i) Access token
  ii) Access token secret
  iii) Consumer key
  iv) Consumer secret
  
We need to visit our app from here, the keys and access tokens tab will help us with the values of these variables.

  1. Let’s start with the code part of the integration of SUSI AI to Twitter. We will be using Node js to achieve this integration.

First we need to require some packages:

Now using the Twit module, we need to authenticate our requests, by using our environment variables as set up in step 4:

Now let’s make a user stream:

var stream = T.stream('user');

We will be using the capabilities of this stream, to catch events of getting tweeted or receiving a direct message by using:

stream.on('tweet', functionToBeCalledWhenTweeted);
stream.on('follow', functionToBeCalledWhenFollowed);
stream.on('direct_message', functionToBeCalledWhenDirectMessaged);

So, when a person tweets to our account like this:

We can catch it with ‘tweet’ event and execute a set of instructions:

stream.on('tweet', tweetEvent);

    function tweetEvent(eventMsg) {
        var replyto = eventMsg.in_reply_to_screen_name;     

       // to store the message tweeted excluding '@SusiAI1' substring
        var text = eventMsg.text.substring(9);

        // to store the name of the tweeter
        var from = eventMsg.user.screen_name;
        
        if (replyto === 'SusiAI1') {
            var queryUrl = 'http://api.asksusi.com/susi/chat.json?q=' + encodeURI(text);
            var message = '';
            request({
                url: queryUrl,
                json: true
            }, function (err, response, data) {
                if (!err && response.statusCode === 200) {
                    // fetching the answer from the data object returned
                                        message = data.answers[0].actions[0].expression + data;


                }
                else {
                    message = 'Oops, Looks like Susi is taking a break';    
                    console.log(err);
                }
                console.log(message);
                // If the message length is more than tweet limit
                if(message.length > 140){
                    tweetIt('@' + from + ' Sorry due to tweet word limit, I have sent you a personal message. Check inbox'+date);
                    sendMessage(from, message);
                }
                else{
                    tweetIt('@' + from + ' ' + message + date);
                }
            });
        }
    }
  • When we a person follows this SUSI AI account, we can thank him/her by making use of the “follow” event. Also, we need to follow him/her back, to enable personal chat between Susi and that person (according to the rules of twitter):
stream.on('follow',followed);

function followed(eventMsg) {
        console.log('Follow event !');
        var name = eventMsg.source.name;
        var screenName = eventMsg.source.screen_name;
        var user_id1 = eventMsg.source.id_str;

        // To follow back the person.
        T.post('friendships/create', {user_id : user_id1},  function(err, tweets, response){
            if (err) {
                console.log("Couldn't follow back!");
            } 
            else {    
tweetIt('@' + screenName + ' Thank you for following me! I followed you back, you can also direct message me now! ');
                console.log("Followed back!");
            } 
        }); 
    }

When we personally message this SUSI AI account

This can be handled through the ‘direct_message’ event:

stream.on('direct_message', reply);
    function reply(directMsg) {
        console.log('You receive a message!');
        // If its our own bot messaging, ignore it, as this can lead to an infinite loop when we answer a user.
        if (directMsg.direct_message.sender_screen_name === 'SusiAI1') {
            return;
        }

        // code to fetch the reply from SUSI API
        
        // reply the user with the SUSI API's message
        sendMessage(directMsg.direct_message.sender_screen_name, message);
        });
    }
  • The tweetIt and sendMessage function code can be seen from the repo code.

6. Connect the heroku app to the forked repository.

Connect the app to Github by selecting the name of this forked repository.

7. Deploy on development branch. If you intend to contribute, it is recommended to Enable Automatic Deploys.

Branch Deployment.

Successful Deployment.

  1. Visit your own personal account and tweet to this new bot account with your query and enjoy a tweet back from the bot account! Also, you can enjoy personal chatting with Susi.

    Feel free to play around with the already made SUSI AI account on twitter here. Follow it, to have a personal chat with it.

    Resources
Continue ReadingIntegration of SUSI AI in Twitter

Introducing Priority Kaizen Harvester for loklak server

In the previous blog post, I discussed the changes made in loklak’s Kaizen harvester so it could be extended and other harvesting strategies could be introduced. Those changes made it possible to introduce a new harvesting strategy as PriorityKaizen harvester which uses a priority queue to store the queries that are to be processed. In this blog post, I will be discussing the process through which this new harvesting strategy was introduced in loklak.

Background, motivation and approach

Before jumping into the changes, we first need to understand that why do we need this new harvesting strategy. Let us start by discussing the issue with the Kaizen harvester.

The produce consumer imbalance in Kaizen harvester

Kaizen uses a simple hash queue to store queries. When the queue is full, new queries are dropped. But numbers of queries produced after searching for one query is much higher than the consumption rate, i.e. the queries are bound to overflow and new queries that arrive would get dropped. (See loklak/loklak_server#1156)

Learnings from attempt to add blocking queue for queries

As a solution to this problem, I first tried to use a blocking queue to store the queries. In this implementation, the producers would get blocked before putting the queries in the queue if it is full and would wait until there is space for more. This way, we would have a good balance between consumers and producers as the consumers would be waiting until producers can free up space for them –

public class BlockingKaizenHarvester extends KaizenHarvester {
   ...
   public BlockingKaizenHarvester() {
       super(new KaizenQueries() {
           ...
           private BlockingQueue<String> queries = new ArrayBlockingQueue<>(maxSize);

           @Override
           public boolean addQuery(String query) {
               if (this.queries.contains(query)) {
                   return false;
               }
               try {
                   this.queries.offer(query, this.blockingTimeout, TimeUnit.SECONDS);
                   return true;
               } catch (InterruptedException e) {
                   DAO.severe("BlockingKaizen Couldn't add query: " + query, e);
                   return false;
               }
           }
           @Override
           public String getQuery() {
               try {
                   return this.queries.take();
               } catch (InterruptedException e) {
                   DAO.severe("BlockingKaizen Couldn't get any query", e);
                   return null;
               }
           }
           ...
       });
   }
}

[SOURCE, loklak/loklak_server#1210]

But there is an issue here. The consumers themselves are producers of even higher rate. When a search is performed, queries are requested to be appended to the KaizenQueries instance for the object (which here, would implement a blocking queue). Now let us consider the case where queue is full and a thread requests a query from the queue and scrapes data. Now when the scraping is finished, many new queries are requested to be inserted to most of them get blocked (because the queue would be full again after one query getting inserted).

Therefore, using a blocking queue in KaizenQueries is not a good thing to do.

Other considerations

After the failure of introducing the Blocking Kaizen harvester, we looked for other alternatives for storing queries. We came across multilevel queues, persistent disk queues and priority queues.

Multilevel queues sounded like a good idea at first where we would have multiple queues for storing queries. But eventually, this would just boil down to how much queue size are we allowing and the queries would eventually get dropped.

Persistent disk queues would allow us to store greater number of queries but the major disadvantage was lookup time. It would terribly slow to check if a query already exists in the disk queue when the queue is large. Also, since the queries would always increase practically, the disk queue would also go out of hand at some point in time.

So by now, we were clear that not dropping queries is not an alternative. So what we had to use the limited size queue smartly so that we do not drop queries that are important.

Solution: Priority Queue

So a good solution to our problem was a priority queue. We could assign a higher score to queries that come from more popular Tweets and they would go higher in the queue and do not drop off until we have even higher priority queried in the queue.

Assigning score to a Tweet

Score for a tweet was decided using the following formula –

α= 5* (retweet count)+(favourite count)

score=α/(α+10*exp(-0.01*α))

This equation generates a score between zero and one from the retweet and favourite count of a Tweet. This normalisation of score would ensure we do not assign an insanely large score to Tweets with a high retweet and favourite count. You can see the behaviour for the second mentioned equation here.

Graph?

Changes required in existing Kaizen harvester

To take a score into account, it became necessary to add an interface to also provide a score as a parameter to the addQuery() method in KaizenQueries. Also, not all queries can have a score associated with it, for example, if we add a query that would search for Tweets older than the oldest in the current timeline, giving it a score wouldn’t be possible as it would not be associated with a single Tweet. To tackle this, a default score of 0.5 was given to these queries –

public abstract class KaizenQueries {

   public boolean addQuery(String query) {
       return this.addQuery(query, 0.5);
   }

   public abstract boolean addQuery(String query, double score);
   ...
}

[SOURCE]

Defining appropriate KaizenQueries object

The KaizenQueries object for a priority queue had to define a wrapper class that would hold the query and its score together so that they could be inserted in a queue as a single object.

ScoreWrapper and comparator

The ScoreWrapper is a simple class that stores score and query object together –

private class ScoreWrapper {

   private double score;
   private String query;

   ScoreWrapper(String m, double score) {
       this.query = m;
       this.score = score;
   }

}

[SOURCE]

In order to define a way to sort the ScoreWrapper objects in the priority queue, we need to define a Comparator for it –

private Comparator<ScoreWrapper> scoreComparator = (scoreWrapper, t1) -> (int) (scoreWrapper.score - t1.score);

[SOURCE]

Putting things together

Now that we have all the ingredients to declare our priority queue, we can also declare the strategy to getQuery and putQuery in the corresponding KaizenQueries object –

public class PriorityKaizenHarvester extends KaizenHarvester {

   private static class PriorityKaizenQueries extends KaizenQueries {
       ...
       private Queue<ScoreWrapper> queue;
       private int maxSize;

       public PriorityKaizenQueries(int size) {
           this.maxSize = size;
           queue = new PriorityQueue<>(size, scoreComparator);
       }

       @Override
       public boolean addQuery(String query, double score) {
           ScoreWrapper sw = new ScoreWrapper(query, score);
           if (this.queue.contains(sw)) {
               return false;
           }
           try {
               this.queue.add(sw);
               return true;
           } catch (IllegalStateException e) {
               return false;
           }
       }

       @Override
       public String getQuery() {
           return this.queue.poll().query;
       }
       ...
}

[SOURCE]

Conclusion

In this blog post, I discussed the process in which PriorityKaizen harvester was introduced to loklak. This strategy is a flavour of Kaizen harvester which uses a priority queue to store queries that are to be processed. These changes were possible because of a previous patch which allowed extending of Kaizen harvester.

The changes were introduced in pull request loklak/loklak#1240 by @singhpratyush (me).

Resources

Continue ReadingIntroducing Priority Kaizen Harvester for loklak server

Fetching URL for Embedded Twitter Videos in loklak server

The primary web service that loklak scrapes is Twitter. Being a news and social networking service, Twitter allows its users to post videos directly to Twitter and they convey more thoughts than what text can. But for an automated scraper, getting the links is not a simple task.

Let us see that what were the problems we faced with videos and how we solved them in the loklak server project.

Previous setup and embedded videos

In the previous version of loklak server, the TwitterScraper searched for videos in 2 ways –

  1. Youtube links
  2. HTML5 video links

To fetch the video URL from HTML5 video, following snippet was used –

if ((p = input.indexOf("<source video-src")) >= 0 && input.indexOf("type=\"video/") > p) {
   String video_url = new prop(input, p, "video-src").value;
   videos.add
   continue;
}

Here, input is the current line from raw HTML that is being processed and prop is a class defined in loklak that is useful in parsing HTML attributes. So in this way, the HTML5 videos were extracted.

The Problem – Embedded videos

Though the previous setup had no issues, it was useless as Twitter embeds the videos in an iFrame and therefore, can’t be fetched using simple HTML5 tag extraction.

If we take the following Tweet for example,

the requested HTML from the search page contains video in following format –

<src="https://twitter.com/i/videos/tweet/881946694413422593?embed_source=clientlib&player_id=0&rpc_init=1" allowfullscreen="" id="player_tweet_881946694413422593" style="width: 100%; height: 100%; position: absolute; top: 0; left: 0;">

So we needed to come up with a better technique to get those videos.

Parsing video URL from iFrame

The <div> which contains video is marked with AdaptiveMedia-videoContainer class. So if a Tweet has an iFrame containing video, it will also have the mentioned class.

Also, the source of iFrame is of the form https://twitter.com/i/videos/tweet/{Tweet-ID}. So now we can programmatically go to any Tweet’s video and parse it to get results.

Extracting video URL from iFrame source

Now that we have the source of iFrame, we can easily get the video source using the following flow –

public final static Pattern videoURL = Pattern.compile("video_url\\\":\\\"(.*?)\\\"");

private static String[] fetchTwitterIframeVideos(String iframeURL) {
   // Read fron iframeURL line by line into BufferReader br
   while ((line = br.readLine()) != null ) {
       int index;
       if ((index = line.indexOf("data-config=")) >= 0) {
           String jsonEscHTML = (new prop(line, index, "data-config")).value;
           String jsonUnescHTML = HtmlEscape.unescapeHtml(jsonEscHTML);
           Matcher m = videoURL.matcher(jsonUnescHTML);
           if (!m.find()) {
               return new String[]{};
           }
           String url = m.group(1);
           url = url.replace("\\/", "/");  // Clean URL
           /*
            * Play with url and return results
            */
       }
   }
}

MP4 and M3U8 URLs

If we encounter mp4 URLs, we’re fine as it is the direct link to video. But if we encounter m3u8 URL, we need to process it further before we can actually get to the videos.

For Twitter, the hosted m3u8 videos contain link to further m3u8 videos which are of different resolution. These m3u8 videos again contain link to various .ts files that contain actual video in parts of 3 seconds length each to support better streaming experience on the web.

To resolve videos in such a setup, we need to recursively parse m3u8 files and collect all the .ts videos.

private static String[] extractM3u8(String url) {
   return extractM3u8(url, "https://video.twimg.com/");
}

private static String[] extractM3u8(String url, String baseURL) {
   // Read from baseURL + url line by line
   while ((line = br.readLine()) != null) {
       if (line.startsWith("#")) {  // Skip comments in m3u8
           continue;
       }
       String currentURL = (new URL(new URL(baseURL), line)).toString();
       if (currentURL.endsWith(".m3u8")) {
           String[] more = extractM3u8(currentURL, baseURL);  // Recursively add all
           Collections.addAll(links, more);
       } else {
           links.add(currentURL);
       }
   }
   return links.toArray(new String[links.size()]);
}

And then in fetchTwitterIframeVideos, we can return the all .ts URLs for the video –

if (url.endsWith(".mp4")) {
   return new String[]{url};
} else if (url.endsWith(".m3u8")) {
   return extractM3u8(url);
}

Putting things together

Finally, the TwitterScraper can discover the video links by tweaking a little –

if (input.indexOf("AdaptiveMedia-videoContainer") > 0) {
   // Fetch Tweet ID
   String tweetURL = props.get("tweetstatusurl").value;
   int slashIndex = tweetURL.lastIndexOf('/');
   if (slashIndex < 0) {
       continue;
   }
   String tweetID = tweetURL.substring(slashIndex + 1);
   String iframeURL = "https://twitter.com/i/videos/tweet/" + tweetID;
   String[] videoURLs = fetchTwitterIframeVideos(iframeURL);
   Collections.addAll(videos, videoURLs);
}

Conclusion

This blog post explained the process of extracting video URL from Twitter and the problem faced. The discussed change enabled loklak to extract and serve URLs to video for tweets. It was introduced in PR loklak/loklak_server#1193 by me (@singhpratyush).

The service was further enhanced to collect single mp4 link for videos (see PR loklak/loklak_server#1206), which is discussed in another blog post.

Resources

Continue ReadingFetching URL for Embedded Twitter Videos in loklak server