Developing LoklakWordCloud app for Loklak apps site
LoklakWordCloud is an app that visualises data returned by Loklak in the form of a word cloud. The app is presently hosted on the Loklak apps site. Word clouds provide a simple yet effective way to analyse and visualise data. This app allows users to create a word cloud out of Twitter data via the Loklak API. The app is presently at a very early stage of development and a lot of work remains to be done. It consists of an input field where the user can enter a query word; on pressing the search button, a word cloud is generated from the words related to the entered query. The Loklak API is used to fetch all the tweets which contain the query word entered by the user, and these tweets are then processed to generate the word cloud.

Related issue: https://github.com/fossasia/apps.loklak.org/pull/279

Live app: http://apps.loklak.org/LoklakWordCloud/

Developing the app

The main challenge in developing this app is implementing its prime feature, that is, generating the word cloud. How do we get a dynamic word cloud that can be easily regenerated from whatever word the user enters? This is where jQCloud comes in: a lightweight jQuery plugin for generating word clouds. All we need to do is provide a list of words along with their weights.

Let us see, step by step, how this first version of the app works. First we need all the tweets which contain the entered word. For this we use the Loklak search service. Once we get all the tweets, we can parse the tweet bodies to create a list of words along with their frequencies.

```javascript
var url = "http://35.184.151.104/api/search.json?callback=JSON_CALLBACK&count=100&q=" + query;
$http.jsonp(url)
    .then(function (response) {
        $scope.createWordCloudData(response.data.statuses);
        $scope.tweet = null;
    });
```

Once we have all the tweets, we need to extract the tweet texts and create a list of valid words. What are valid words?
Well, words like ‘the’, ‘is’, ‘a’, ‘for’, ‘of’ and ‘then’ do not provide us with any important information and will not help us in doing any kind of analysis, so there is no use including them in our word cloud. Such words are called stop words, and we need to get rid of them. For this we are using a list of commonly used stop words; such lists can be found very easily over the internet. Here is the list which we are using. Once we are able to extract the text from the tweets, we filter out the stop words and insert the valid words into a list.

```javascript
tweet = data[i];
tweetWords = tweet.text.replace(", ", " ").split(" ");
for (var j = 0; j < tweetWords.length; j++) {
    word = tweetWords[j];
    word = word.trim();
    // Strip a leading quote or bracket.
    if (word.startsWith("'") || word.startsWith('"') || word.startsWith("(") || word.startsWith("[")) {
        word = word.substring(1);
    }
    // Strip a trailing quote, bracket or punctuation mark.
    if (word.endsWith("'") || word.endsWith('"') || word.endsWith(")") || word.endsWith("]") || word.endsWith("?") || word.endsWith(".")) {
        word = word.substring(0, word.length - 1);
    }
    // Skip stop words, hashtags, mentions and URLs.
    if (stopwords.indexOf(word.toLowerCase()) !== -1) {
        continue;
    }
    if (word.startsWith("#") || word.startsWith("@")) {
        continue;
    }
    if (word.startsWith("http") || word.startsWith("https")) {
        continue;
    }
    $scope.filteredWords.push(word);
}
```
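Once the valid words have been collected, they still need to be tallied into word/frequency pairs before anything can be weighted in the cloud. A minimal sketch of that counting step could look as follows (the `countFrequencies` helper below is illustrative, not code taken from the app):

```javascript
// Count how often each word occurs, case-insensitively,
// so that e.g. "Loklak" and "loklak" count as one word.
function countFrequencies(words) {
    var freq = {};
    for (var i = 0; i < words.length; i++) {
        var word = words[i].toLowerCase();
        if (word.length === 0) {
            continue; // skip empty tokens left over after trimming
        }
        freq[word] = (freq[word] || 0) + 1;
    }
    return freq;
}

// Example: countFrequencies(["Loklak", "loklak", "tweet"])
// yields an object mapping "loklak" to 2 and "tweet" to 1.
```

Running this over the filtered word list of every fetched tweet produces a single frequency map for the whole result set.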

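Finally, jQCloud consumes an array of objects with `text` and `weight` keys, so the word counts have to be converted into that shape. A possible sketch of that last step, keeping only the most frequent words (the `toJQCloudWords` helper and the `#wordcloud` container id are assumptions for illustration, not the app's actual code):

```javascript
// Convert a { word: count } map into the [{ text: ..., weight: ... }]
// array format that jQCloud consumes, keeping only the `limit` most
// frequent words.
function toJQCloudWords(freq, limit) {
    return Object.keys(freq)
        .sort(function (a, b) { return freq[b] - freq[a]; })
        .slice(0, limit)
        .map(function (word) {
            return { text: word, weight: freq[word] };
        });
}

// Rendering then reduces to a single plugin call on the container:
// $("#wordcloud").jQCloud(toJQCloudWords(freq, 50));
```

Capping the number of words keeps the cloud readable even when a popular query returns hundreds of distinct words.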