Developing LoklakWordCloud app for Loklak apps site

LoklakWordCloud app is an app to visualise data returned by loklak in form of a word cloud.

The app is presently hosted on Loklak apps site.

Word clouds provide a very simple, easy, yet interesting and effective way to analyse and visualise data. This app will allow users to create word cloud out of twitter data via Loklak API.

Presently the app is at its very early stage of development and more work is left to be done. The app consists of a input field where user can enter a query word and on pressing search button a word cloud will be generated using the words related to the query word entered.

Loklak API is used to fetch all the tweets which contain the query word entered by the user.

These tweets are processed to generate the word cloud.

Related issue: https://github.com/fossasia/apps.loklak.org/pull/279

Live app: http://apps.loklak.org/LoklakWordCloud/

Developing the app

The main challenge in developing this app is implementing its prime feature, that is, generating the word cloud. How do we get a dynamic word cloud which can be easily generated by the user based on the word he has entered? Well, here comes in Jqcloud. An awesome lightweight Jquery plugin for generating word clouds. All we need to do is provide list of words along with their weights.

Let us see step by step how this app (first version) works. First we require all the tweets which contain the entered word. For this we use Loklak search service. Once we get all the tweets, then we can parse the tweet body to create a list of words along with their frequency.

var url = "http://35.184.151.104/api/search.json?callback=JSON_CALLBACK&count=100&q=" + query;
        $http.jsonp(url)
            .then(function (response) {
                $scope.createWordCloudData(response.data.statuses);
                $scope.tweet = null;
            });

Once we have all the tweets, we need to extract the tweet texts and create a list of valid words. What are valid words? Well words like ‘the’, ‘is’, ‘a’, ‘for’, ‘of’, ‘then’, does not provide us with any important information and will not help us in doing any kind of analysis. So there is no use of including them in our word cloud. Such words are called stop words and we need to get rid of them. For this we are using a list of commonly used stop words. Such lists can be very easily found over the internet. Here is the list which we are using. Once we are able to extract the text from the tweets, we need to filter stop words and insert the valid words into a list.

 tweet = data[i];
            tweetWords = tweet.text.replace(", ", " ").split(" ");

            for (var j = 0; j < tweetWords.length; j++) {
                word = tweetWords[j];
                word = word.trim();
                if (word.startsWith("'") || word.startsWith('"') || word.startsWith("(") || word.startsWith("[")) {
                    word = word.substring(1);
                }
                if (word.endsWith("'") || word.endsWith('"') || word.endsWith(")") || word.endsWith("]") ||
                    word.endsWith("?") || word.endsWith(".")) {
                    word = word.substring(0, word.length - 1);
                }
                if (stopwords.indexOf(word.toLowerCase()) !== -1) {
                    continue;
                }
                if (word.startsWith("#") || word.startsWith("@")) {
                    continue;
                }
                if (word.startsWith("http") || word.startsWith("https")) {
                    continue;
                }
                $scope.filteredWords.push(word);
            }

What are we actually doing in the above snippet? We are simply iterating over each of the statuses returned by Loklak API. For each tweet, first we are splitting the text into words and then we are iterating over those words. For a given word we do a number of checks. First we check if the word begins or ends with a special character, for example quotation marks or brackets. If so we remove those character as it will cause trouble in calculating frequencies. Next we also check if the word is beginning with ‘#’ or [email protected] If it is true, then we discard such words as we are handling hashtags and mentions separately. Finally we check whether the word is a stop word or not. If it is a stop word then we discard it. If a word passes all the checks, we add it to our list of valid words.

Once we are done with the tweet bodies, next we need to handle hashtags and mentions.

tweet.hashtags.forEach(function (hashtag) {
                $scope.filteredWords.push("#" + hashtag);
            });

            tweet.mentions.forEach(function (mention) {
                $scope.filteredWords.push("@" + mention);
            });

The above code simply iterates over the hashtags and mentions and inserts them into the filteredWords list. We have handled hashtags and mentions separately so that we can apply filters in future.

Once we are done with generating list of valid words, we need to calculate weight for each of the word. Here weight is nothing but the number of times a particular word is present in the list. We calculate this using JavaScript object. We iterate over the list of valid words. If word is not present in the object (or dictionary as you wish to call it), we create a new key by the name of that word and set its value to one. If a word is already present as a key, then we simply increment its value by one.

for (var word in $scope.wordFreq) {
            $scope.wordCloudData.push({
                text: word,
                weight: $scope.wordFreq[word]
            });
        }

The above code snippet calculates the frequency of each word by the process mentioned above.

Now we are all set to generate our word cloud. We simply use Jqcloud’s interface to configure it with the words and their respective frequencies, provide a list of color codes for a color gradient, and set autoResize to true so that our word cloud resizes itself when the screen size changes.

$scope.generateWordCloud = function() {
        if ($scope.wordCloud === null) {
            $scope.wordCloud = $('.wordcloud').jQCloud($scope.wordCloudData, {
                colors: ["#D50000", "#FF5722", "#FF9800", "#4CAF50", "#8BC34A", "#4DB6AC", "#7986CB", "#5C6BC0", "#64B5F6"],
                fontSize: {
                    from: 0.06,
                    to: 0.01
                },
                autoResize: true
            });
        } else {
            $scope.wordCloud = $(".wordcloud").jQCloud('update', $scope.wordCloudData);
        }
    }

Whenever the user searches for a new word, we simply update the existing word cloud with the cloud of the new word.

Future roadmap

  • Make the words in the cloud clickable. On clicking a word, the cloud should get replaced by the selected word’s cloud.
  • Add filters for hashtags, mentions, date.
  • Add option for exporting the cloud to an image, so that user’s can also use this app as a tool to generate word clouds as images and save them.
  • Add a loader and error notification for invalid or empty input.

Important resources

  • View the app source code here.
  • Learn more about Loklak API here.
  • Learn more about Jqcloud here.
  • Learn more about AngularJS here.

Published by

Deepjyoti Mondal

A web and mobile application developer and an enthusiastic learner. I like trying out new technologies. JavaScript and Python are my favorites

Leave a Reply

Your email address will not be published. Required fields are marked *