Avoiding Nested Callbacks using RxJS in Loklak Scraper JS

Loklak Scraper JS, as suggested by the name, is a set of scrapers for social media websites written in NodeJS. One of the most common requirement while scraping is, there is a parent webpage which provides links for related child webpages. And the required data that needs to be scraped is present in both parent webpage and child webpages. For example, let’s say we want to scrape quora user profiles matching search query “Siddhant”. The matching profiles webpage for this example will be https://www.quora.com/search?q=Siddhant&type=profile which is the parent webpage, and the child webpages are links of each matched profiles.

Now, a simplistic approach is to first obtain the HTML of parent webpage and then synchronously fetch the HTML of child webpages and parse them to get the desired data. The problem with this approach is that, it is slower as it is synchronous.

A different approach can be using request-promise-native to implement the logic in asynchronous way. But, there are limitations with this approach. The HTML of child webpages that needs to be fetched can only be obtained after HTML of parent webpage is obtained and number of child webpages are dynamic. So, there is a request dependency between parent and child i.e. if only we have data from parent webpage we can extract data from child webpages. The code would look like this

request(parent_url)
   .then(data => {
       ...
       request(child_url)
           .then(data => {
               // again nesting of child urls
           })
           .catch(error => {

           });
   })
   .catch(error => {

   });

 

Firstly, with this approach there is callback hell. Horrible, isn’t it? And then we don’t know how many nested callbacks to use as the number of child webpages are dynamic.

The saviour: RxJS

The solution to our problem is reactive extensions in JavaScript. Using rxjs we can obtain the required data without callback hell and asynchronously!

The promise-request object of the parent webpage is obtained. Using this promise-request object an observable is generated by using Rx.Observable.fromPromise. flatmap operator is used to parse the HTML of the parent webpage and obtain the links of child webpages. Then map method is used transform the links to promise-request objects which are again transformed into observables. The returned value – HTML – from the resulting observables is parsed and accumulated using zip operator. Finally, the accumulated data is subscribed. This is implemented in getScrapedData method of Quora JS scraper.

getScrapedData(query, callback) {
   // observable from parent webpage
   Rx.Observable.fromPromise(this.getSearchQueryPromise(query))
     .flatMap((t, i) => { // t is html of parent webpage
       // request-promise object of child webpages
       let profileLinkPromises = this.getProfileLinkPromises(t);
       // request-promise object to observable transformation
       let obs = profileLinkPromises.map(elem => Rx.Observable.fromPromise(elem));

       // each Quora profile is parsed
       return Rx.Observable.zip( // accumulation of data from child webpages
         ...obs,
         (...profileLinkObservables) => {
           let scrapedProfiles = [];
           for (let i = 0; i < profileLinkObservables.length; i++) {
             let $ = cheerio.load(profileLinkObservables[i]);
             scrapedProfiles.push(this.scrape($));
           }
           return scrapedProfiles; // accumulated data returned
         }
       )
     })
     .subscribe( // desired data is subscribed
       scrapedData => callback({profiles: scrapedData}),
       error => callback(error)
     );
 }

 

Resources:

Posting Tweet from Loklak Wok Android

Loklak Wok Android is a peer harvester that posts collected tweets to the Loklak Server. Not only it is a peer harvester, but also provides users to post their tweets from the app. Images and location of the user can also be attached in the tweet. This blog explains

Adding Dependencies to the project

In app/build.gradle:

apply plugin: 'com.android.application'
apply plugin: 'me.tatarka.retrolambda'

android {
   ...
   packagingOptions {
       exclude 'META-INF/rxjava.properties'
   }
}

dependencies {
   ...
   compile 'com.google.code.gson:gson:2.8.1'

   compile 'com.squareup.retrofit2:retrofit:2.3.0'
   compile 'com.squareup.retrofit2:converter-gson:2.3.0'
   compile 'com.squareup.retrofit2:adapter-rxjava2:2.3.0'

   compile 'io.reactivex.rxjava2:rxjava:2.0.5'
   compile 'io.reactivex.rxjava2:rxandroid:2.0.1'
}

 

In build.gradle project level:

dependencies {
   classpath 'com.android.tools.build:gradle:2.3.3'
   classpath 'me.tatarka:gradle-retrolambda:3.2.0'
}

 

Implementation

User first authorize the application, so that they are able to post tweet from the app. For posting tweet statuses/update API endpoint of twitter is used and for attaching images with tweet media/upload API endpoint is used.

As, photos and location can be attached in a tweet, for Android Marshmallow and above we need to ask runtime permissions for camera, gallery and location. The related permissions are mentioned in Manifest file first

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
// for location
<uses-feature android:name="android.hardware.location.gps"/>
<uses-feature android:name="android.hardware.location.network"/>

 

If, the device is using an OS below Android Marshmallow, there will be no runtime permissions, the user will be asked permissions at the time of installing the app.

Now, runtime permissions are asked, if the user had already granted the permission the related activity (camera, gallery or location) is started.

For camera permissions, onClickCameraButton is called

@OnClick(R.id.camera)
public void onClickCameraButton() {
   int permission = ContextCompat.checkSelfPermission(
           getActivity(), Manifest.permission.CAMERA);
   if (isAndroidMarshmallowAndAbove && permission != PackageManager.PERMISSION_GRANTED) {
       String[] permissions = {
               Manifest.permission.CAMERA,
               Manifest.permission.WRITE_EXTERNAL_STORAGE,
               Manifest.permission.READ_EXTERNAL_STORAGE
       };
       requestPermissions(permissions, CAMERA_PERMISSION);
   } else {
       startCameraActivity();
   }
}

 

To start the camera activity if the permission is already granted, startCameraActivity method is called

private void startCameraActivity() {
   Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
   File dir = getActivity().getExternalFilesDir(Environment.DIRECTORY_PICTURES);
   mCapturedPhotoFile = new File(dir, createFileName());
   Uri capturedPhotoUri = getImageFileUri(mCapturedPhotoFile);
   intent.putExtra(MediaStore.EXTRA_OUTPUT, capturedPhotoUri);
   startActivityForResult(intent, REQUEST_CAPTURE_PHOTO);
}

 

If the user decides to save the photo clicked from camera activity, the photo should be saved by creating a file and its uri is required to display the saved photo. The filename is created using createFileName method

private String createFileName() {
   String timeStamp = new SimpleDateFormat("ddMMyyyy_HHmmss").format(new Date());
   return "JPEG_" + timeStamp + ".jpg";
}

 

and uri is obtained using getImageFileUri

private Uri getImageFileUri(File file) {
   if (Build.VERSION.SDK_INT < Build.VERSION_CODES.N) {
       return Uri.fromFile(file);
   } else {
       return FileProvider.getUriForFile(getActivity(), "org.loklak.android.provider", file);
   }
}

 

Similarly, for the gallery, onClickGalleryButton method is implemented to ask runtime permissions and launch gallery activity if the permission is already granted.

@OnClick(R.id.gallery)
public void onClickGalleryButton() {
   int permission = ContextCompat.checkSelfPermission(
           getActivity(), Manifest.permission.WRITE_EXTERNAL_STORAGE);
   if (isAndroidMarshmallowAndAbove && permission != PackageManager.PERMISSION_GRANTED) {
       String[] permissions = {
               Manifest.permission.WRITE_EXTERNAL_STORAGE,
               Manifest.permission.READ_EXTERNAL_STORAGE
       };
       requestPermissions(permissions, GALLERY_PERMISSION);
   } else {
       startGalleryActivity();
   }
}

 

For starting the gallery activity, startGalleryActivity is used

private void startGalleryActivity() {
   Intent intent = new Intent();
   intent.setType("image/*");
   intent.setAction(Intent.ACTION_GET_CONTENT);
   intent.putExtra(Intent.EXTRA_ALLOW_MULTIPLE, true);
   startActivityForResult(
           Intent.createChooser(intent, "Select images"), REQUEST_GALLERY_MEDIA_SELECTION);
}

 

And finally for location onClickAddLocationButton is implemented

@OnClick(R.id.location)
public void onClickAddLocationButton() {
   int permission = ContextCompat.checkSelfPermission(
           getActivity(), Manifest.permission.ACCESS_FINE_LOCATION);
   if (isAndroidMarshmallowAndAbove && permission != PackageManager.PERMISSION_GRANTED) {
       String[] permissions = {Manifest.permission.ACCESS_FINE_LOCATION};
       requestPermissions(permissions, LOCATION_PERMISSION);
   } else {
       getLatitudeLongitude();
   }
}

 

If, the permission is already granted getLatitudeLongitude is called. Using LocationManager last known location is tried to obtain, if there is no last known location, current location is requested using a LocationListener.

private void getLatitudeLongitude() {
   mLocationManager =
           (LocationManager) getActivity().getSystemService(Context.LOCATION_SERVICE);

   // last known location from network provider
   Location location = mLocationManager.getLastKnownLocation(LocationManager.NETWORK_PROVIDER);
   if (location == null) { // last known location from gps
       location = mLocationManager.getLastKnownLocation(LocationManager.GPS_PROVIDER);
   }

   if (location != null) { // last known loaction available
       mLatitude = location.getLatitude();
       mLongitude = location.getLongitude();
       setLocation();
   } else { // last known location not available
       mLocationListener = new TweetLocationListener();
       // current location requested
       mLocationManager.requestLocationUpdates("gps", 1000, 1000, mLocationListener);
   }
}

 

TweetLocationListener implements a LocationListener that provides the current location. If GPS is disabled, settings is launched so that user can enable GPS. This is implemented in onProviderDisabled callback of the listener.

private class TweetLocationListener implements LocationListener {

   @Override
   public void onLocationChanged(Location location) {
       mLatitude = location.getLatitude();
       mLongitude = location.getLongitude();
       setLocation();
   }

   @Override
   public void onStatusChanged(String s, int i, Bundle bundle) {

   }

   @Override
   public void onProviderEnabled(String s) {

   }

   @Override
   public void onProviderDisabled(String s) {
       Intent intent = new Intent(Settings.ACTION_LOCATION_SOURCE_SETTINGS);
       startActivity(intent);
   }
}

 

If the user is asked for permissions, onRequestPermissionResult callback is invoked, if the permission is granted then the respective activities are opened or latitude and longitude are obtained.

@Override
public void onRequestPermissionsResult(
       int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
   boolean isResultGranted = grantResults[0] == PackageManager.PERMISSION_GRANTED;
   switch (requestCode) {
       case CAMERA_PERMISSION:
           if (grantResults.length > 0 && isResultGranted) {
               startCameraActivity();
           }
           break;
       case GALLERY_PERMISSION:
           if (grantResults.length > 0 && isResultGranted) {
               startGalleryActivity();
           }
           break;
       case LOCATION_PERMISSION:
           if (grantResults.length > 0 && isResultGranted) {
               getLatitudeLongitude();
           }
   }
}

 

Since, the camera and gallery activities are started to obtain a result i.e. photo(s). So, onActivityResult callback is called

@Override
public void onActivityResult(int requestCode, int resultCode, Intent data) {
   switch (requestCode) {
       case REQUEST_CAPTURE_PHOTO:
           if (resultCode == Activity.RESULT_OK) {
               onSuccessfulCameraActivityResult();
           }
           break;
       case REQUEST_GALLERY_MEDIA_SELECTION:
           if (resultCode == Activity.RESULT_OK) {
               onSuccessfulGalleryActivityResult(data);
           }
           break;
       default:
           super.onActivityResult(requestCode, resultCode, data);
   }
}

 

If the result of Camera activity is success i.e. the image is saved by the user. The saved image is displayed in a RecyclerView in TweetPostingFragment. This is implemented in onSuccessfulCameraActivityResult mehtod

private void onSuccessfulCameraActivityResult() {
   tweetMultimediaContainer.setVisibility(View.VISIBLE);
   Bitmap bitmap = BitmapFactory.decodeFile(mCapturedPhotoFile.getAbsolutePath());
   mTweetMediaAdapter.clearAdapter();
   mTweetMediaAdapter.addBitmap(bitmap);
}

 

For a gallery activity, if a single image is selected then the uri of image can be obtained using getData method of an Intent. If multiple images are selected, the uri of images are stored in ClipData. After uris of images are obtained, it is checked if more than 4 images are selected as Twitter allows at most 4 images in a tweet. If more than 4 images are selected than the uris of extra images are removed. Using the uris of the images, the file is obtained and then from file Bitmap is obtained which is displayed in RecyclerView. This is implemented in onSuccessfulGalleryActivityResult

private void onSuccessfulGalleryActivityResult(Intent intent) {
   tweetMultimediaContainer.setVisibility(View.VISIBLE);
   Context context = getActivity();

   // get uris of selected images
   ClipData clipData = intent.getClipData();
   List<Uri> uris = new ArrayList<>();
   if (clipData != null) {
       for (int i = 0; i < clipData.getItemCount(); i++) {
           ClipData.Item item = clipData.getItemAt(i);
           uris.add(item.getUri());
       }
   } else {
       uris.add(intent.getData());
   }

   // remove of more than 4 images
   int numberOfSelectedImages = uris.size();
   if (numberOfSelectedImages > 4) {
       while (numberOfSelectedImages-- > 4) {
           uris.remove(numberOfSelectedImages);
       }
       Utility.displayToast(mToast, context, moreImagesMessage);
   }

   // get bitmap from uris of images
   List<Bitmap> bitmaps = new ArrayList<>();
   for (Uri uri : uris) {
       String filePath = FileUtils.getPath(context, uri);
       Bitmap bitmap = BitmapFactory.decodeFile(filePath);
       bitmaps.add(bitmap);
   }

   // display images in RecyclerView
   mTweetMediaAdapter.setBitmapList(bitmaps);
}

 

Now, to post images with tweet, first the ID of the image needs to be obtained using media/upload API endpoint, a multipart post request and then the obtained ID(s) is passed as the value of “media_ids” in statuses/update API endpoint. Since, there can be more than one image, a single observable is created for each image. The bitmap is converted to raw bytes for the multipart post request. As the process includes a network request and converting bitmap to bytes – a heavy resource consuming task which shouldn’t be on the main thread -, so an observable is created for the same as a result of which the tasks are performed concurrently i.e. in a separate thread.

private Observable<String> getImageId(Bitmap bitmap) {
   return Observable
           .defer(() -> {
               // convert bitmap to bytes
               ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
               bitmap.compress(Bitmap.CompressFormat.JPEG, 100, byteArrayOutputStream);
               byte[] bytes = byteArrayOutputStream.toByteArray();
               RequestBody mediaBinary = RequestBody.create(MultipartBody.FORM, bytes);
               return Observable.just(mediaBinary);
           })
           .flatMap(mediaBinary -> mTwitterMediaApi.getMediaId(mediaBinary, null))
           .flatMap(mediaUpload -> Observable.just(mediaUpload.getMediaIdString()))
           .subscribeOn(Schedulers.newThread());
}

 

The tweet is posted when the “Tweet” button is clicked by invoking onClickTweetPostButton mehtod

@OnClick(R.id.tweet_post_button)
public void onClickTweetPostButton() {
   String status = tweetPostEditText.getText().toString();

   List<Bitmap> bitmaps = mTweetMediaAdapter.getBitmapList();
   List<Observable<String>> mediaIdObservables = new ArrayList<>();
   for (Bitmap bitmap : bitmaps) { // observables for images is created
       mediaIdObservables.add(getImageId(bitmap));
   }

   if (mediaIdObservables.size() > 0) {
       // Post tweet with image
       postImageAndTextTweet(mediaIdObservables, status);
   } else if (status.length() > 0) {
       // Post text only tweet
       postTextOnlyTweet(status);
   } else {
       Utility.displayToast(mToast, getActivity(), tweetEmptyMessage);
   }
}

 

Tweet containing images are posted by calling postImageAndTextTweet, once the tweet data is obtained, the data is cross posted to loklak server. The image IDs are obtained concurrently by using the zip operator.

private void postImageAndTextTweet(List<Observable<String>> imageIdObservables, String status) {
   mProgressDialog.show();
   ConnectableObservable<StatusUpdate> observable = Observable.zip(
           imageIdObservables,
           mediaIdArray -> {
               String mediaIds = "";
               for (Object mediaId : mediaIdArray) {
                   mediaIds = mediaIds + String.valueOf(mediaId) + ",";
               }
               return mediaIds.substring(0, mediaIds.length() - 1);
           })
           .flatMap(imageIds -> mTwitterApi.postTweet(status, imageIds, mLatitude, mLongitude))
           .subscribeOn(Schedulers.io())
           .publish();

   Disposable postingDisposable = observable
           .subscribeOn(Schedulers.io())
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(this::onSuccessfulTweetPosting, this::onErrorTweetPosting);
   mCompositeDisposable.add(postingDisposable);

   // cross posting to loklak server   
   Disposable crossPostingDisposable = observable
           .flatMap(this::pushTweetToLoklak)
           .subscribeOn(Schedulers.io())
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(
                   push -> {},
                   t -> Log.e(LOG_TAG, "Cross posting failed: " + t.toString())
           );
   mCompositeDisposable.add(crossPostingDisposable);

   Disposable publishDisposable = observable.connect();
   mCompositeDisposable.add(publishDisposable);
}

 

In case of only text tweets, the text is obtained from editText and mediaIds are passed as null. And once the tweet data is obtained it is cross posted to loklak_server. This is executed by calling postTextOnlyTweet

private void postTextOnlyTweet(String status) {
   mProgressDialog.show();
   ConnectableObservable<StatusUpdate> observable =
           mTwitterApi.postTweet(status, null, mLatitude, mLongitude)
           .subscribeOn(Schedulers.io())
           .publish();

   Disposable postingDisposable = observable
           .subscribeOn(Schedulers.io())
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(this::onSuccessfulTweetPosting, this::onErrorTweetPosting);
   mCompositeDisposable.add(postingDisposable);


   // cross posting to loklak server
   Disposable crossPostingDisposable = observable
           .flatMap(this::pushTweetToLoklak)
           .subscribeOn(Schedulers.io())
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(
                   push -> Log.e(LOG_TAG, push.getStatus()),
                   t -> Log.e(LOG_TAG, "Cross posting failed: " + t.toString())
           );
   mCompositeDisposable.add(crossPostingDisposable);

   Disposable publishDisposable = observable.connect();
   mCompositeDisposable.add(publishDisposable);
}

 

Resources

Optimising Docker Images for loklak Server

The loklak server is in a process of moving to Kubernetes. In order to do so, we needed to have different Docker images that suit these deployments. In this blog post, I will be discussing the process through which I optimised the size of Docker image for these deployments.

Initial Image

The image that I started with used Ubuntu as base. It installed all the components needed and then modified the configurations as required –

FROM ubuntu:latest

# Env Vars
ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
ENV DEBIAN_FRONTEND noninteractive

WORKDIR /loklak_server

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y git openjdk-8-jdk
RUN git clone https://github.com/loklak/loklak_server.git /loklak_server
RUN git checkout development
RUN ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport
RUN sed -i.bak 's/^\(port.http=\).*/\180/' conf/config.properties
... # More configurations
RUN echo "while true; do sleep 10;done" >> bin/start.sh

# Start
CMD ["bin/start.sh", "-Idn"]

The size of images built using this Dockerfile was quite huge –

REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE

loklak_server       latest              a92f506b360d        About a minute ago   1.114 GB

ubuntu              latest              ccc7a11d65b1        3 days ago           120.1 MB

But since this size is not acceptable, we needed to reduce it.

Moving to Apline

Alpine Linux is an extremely lightweight Linux distro, built mainly for the container environment. Its size is so tiny that it hardly puts any impact on the overall size of images. So, I replaced Ubuntu with Alpine –

FROM alpine:latest

...
RUN apk update
RUN apk add git openjdk8 bash
...

And now we had much smaller images –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      668.8 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

As we can see that due to no caching and small size of Alpine, the image size is reduced to almost half the original.

Reducing Content Size

There are many things in a project which are no longer needed while running the project, like the .git folder (which is huge in case of loklak) –

$ du -sh loklak_server/.git
236M loklak_server/.git

We can remove such files from the Docker image and save a lot of space –

rm -rf .[^.] .??*

Optimizing Number of Layers

The number of layers also affect the size of the image. More the number of layers, more will be the size of image. In the Dockerfile, we can club together the RUN commands for lower number of images.

RUN apk update && apk add openjdk8 git bash && \
  git clone https://github.com/loklak/loklak_server.git /loklak_server && \
  ...

After this, the effective size is again reduced by a major factor –

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

loklak_server       latest              54b507ee9187        17 seconds ago      422.3 MB

alpine              latest              7328f6f8b418        6 weeks ago         3.966 MB

Conclusion

In this blog post, I discussed the process of optimising the size of Dockerfile for Kubernetes deployments of loklak server. The size was reduced to 426 MB from 1.234 GB and this provided much faster push/pull time for Docker images, and therefore, faster updates for Kubernetes deployments.

Resources

Working of One Click Deployment Buttons in loklak

Today’s topic is deployment. It’s called one-click deployment for a reason: Developers are lazy. It’s hard to do less than clicking on one button, so that’s our goal to make use of one click button in loklak.

For one click buttons we only need a central build server, which is our loklak_server. Everything written here was based on Apache ant, but later on ant build was deprecated and loklak server started to use gradle build. We wanted to make the process of provisioning and setting up a complete infrastructure of your own, from server to continuous integration tasks, as easy as possible. These button allows you to do all of that in one click.

How does it work?

You can see the one click buttons in the README page of loklak_server repository.

These repositories may include a different files like scalingo.json for scalingo, docker-compose.yml and docker-cloud.yml for docker cloud etc files at their root, allowing them to define a few things like a name, description, logo and build environment (Gradle build in the case of loklak server). Once you’ve clicked on any of the buttons, you will be redirected to respective apps and prompted with this information for you to review before confirming the fork.

This will effectively fork the repository in your account. Once the repo is ready, you can click on it. You will then be asked to “activate” or “deploy” your branch, allowing it to provision actual servers and run tasks. At the same time, you will be asked to review and potentially modify a few variables that were defined in the predefined files (for eg: app.json for heroku) of the apps. These are usually things like the Git URL of the repo for loklak, or some of the details related to the cloud provider you want to use (eg: Digital Ocean).

Once you confirmed this last step, your branch i.e., most probably master branch of loklak server repo is activated and the button will start provisioning and configuring your servers, along with the tasks which may allow you to build and deploy your app. In most of the cases, you can go to the tasks/setup section and run the build task that will fetch loklak server’s code, build it and deploy it on your server, all configurations included and will get a public IP.

What’s next

In loklak we are also introducing new one click “AZURE” button, then the users can also start deploying loklak in azure platform.

Resources

Implementing 3 legged Authorization in Loklak Wok Android for Twitter

Loklak Wok Android is a peer harvester that posts collected tweets to the Loklak Server. Not only it is a peer harvester, but also provides users to post their tweets from the app. Posting tweets from the app requires users to authorize the Loklak Wok app, the client app created https://apps.twitter.com/ . This blog explains in detail about the authorization process.

Adding Dependencies to the project

In app/build.gradle:

apply plugin: 'com.android.application'
apply plugin: 'me.tatarka.retrolambda'

android {
   ...
   packagingOptions {
       exclude 'META-INF/rxjava.properties'
   }
}

dependencies {
   ...
   compile 'com.google.code.gson:gson:2.8.1'

   compile 'com.squareup.retrofit2:retrofit:2.3.0'
   compile 'com.squareup.retrofit2:converter-gson:2.3.0'
   compile 'com.squareup.retrofit2:adapter-rxjava2:2.3.0'

   compile 'io.reactivex.rxjava2:rxjava:2.0.5'
   compile 'io.reactivex.rxjava2:rxandroid:2.0.1'
}

 

In build.gradle project level:

dependencies {
   classpath 'com.android.tools.build:gradle:2.3.3'
   classpath 'me.tatarka:gradle-retrolambda:3.2.0'
}

 

Steps of Authorization

Step 1: Create client app in Twitter

Create a twitter client app at https://apps.twitter.com/. Provide the mandatory entries and also Callback url (would be used in next steps). Then go to “Keys and Access Token” and save your consumer key and consumer secret. In case you want to use Twitter API for yourself, click on “Create my access token”, which provides access token and access token secret.

Step 2: Obtaining a request token

Using the “consumer key” and “consumer secret” request token is obtained by sending a POST request to oauth/request_token. As Twitter API are Oauth1 based the sent request needs to be signed by generating oauth_signature. The oauth_signature is generated by intercepting the network request sent by retrofit rest API client, the oauth interceptor used in Loklak Wok Android is a modified version of this snippet. The retrofit TwitterAPI interface is defined

public interface TwitterAPI {

   String BASE_URL = "https://api.twitter.com/";

   @POST("/oauth/request_token")
   Observable<ResponseBody> getRequestToken();

   @FormUrlEncoded
   @POST("/oauth/access_token")
   Observable<ResponseBody> getAccessTokenAndSecret(@Field("oauth_verifier") String oauthVerifier);
}

 

And the retrofit REST client is implemented in TwitterRestClient. createTwitterAPIWithoutAccessToken method returns a twitter API client which can be called without providing access keys, this is used as we don’t have access tokens right now.

public static TwitterAPI createTwitterAPIWithoutAccessToken() {
   if (sWithoutAccessTokenRetrofit == null) {
       sLoggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY);
       // uncomment to debug network requests
       // sWithoutAccessTokenClient.addInterceptor(sLoggingInterceptor);
       sWithoutAccessTokenRetrofit = sRetrofitBuilder
               .client(sWithoutAccessTokenClient.build()).build();
   }
   return sWithoutAccessTokenRetrofit.create(TwitterAPI.class);
}

 

So, getRequestToken method is used to obtain the request token, if the request is successful oauth_token is returned.

@OnClick(R.id.twitter_authorize)
public void onClickTwitterAuthorizeButton(View view) {
   mTwitterApi.getRequestToken()
           .subscribeOn(Schedulers.io())
           .observeOn(AndroidSchedulers.mainThread())
           .subscribe(this::parseRequestTokenResponse, this::onFetchRequestTokenError);
}

 

Step 3: Redirecting the user

Using the oauth_token obtained in Step 2, the user is redirected to login page using WebView.

private void setAuthorizationView() {
   ...
   webView.setVisibility(View.VISIBLE);
   webView.loadUrl(mAuthorizationUrl);
}

 

A WebView client is created by extending WebViewClient, this is used to keep track of which webpage is opened by overriding shouldOverrideUrlLoading.

@Override
public boolean shouldOverrideUrlLoading(WebView view, String url) {
   if (url.contains("github")) {
       String[] tokenAndVerifier = url.split("&");
       mOAuthVerifier = tokenAndVerifier[1].substring(tokenAndVerifier[1].indexOf('=') + 1);
       getAccessTokenAndSecret();
       return true;
   }
   return false;
}

 

As the link provided in callback url while creating our twitter app is a github page. The WebViewClient checks if it is a github page or not. If yes, then it parses the oauth_verifier from the github url.

Step 4: Converting the request token to an access token

A new rest client is created using the access token obtained in step 2, as implemented in createTwitterAPIWithAccessToken method.

public static TwitterAPI createTwitterAPIWithAccessToken(String token) {
   TwitterOAuthInterceptor withAccessTokenInterceptor =
           sInterceptorBuilder.accessToken(token).accessSecret("").build();
   OkHttpClient withAccessTokenClient = new OkHttpClient.Builder()
           .addInterceptor(withAccessTokenInterceptor)
           //.addInterceptor(loggingInterceptor) // uncomment to debug network requests
           .build();
   Retrofit withAccessTokenRetrofit = sRetrofitBuilder.client(withAccessTokenClient).build();
   return withAccessTokenRetrofit.create(TwitterAPI.class);
}

 

Now, to obtain access token and access token secret oauth_verifier obtained in step 3 is passed as a parameter to getAccessTokenAndSecret method defined in TwitterAPI interface which calls oauth/access_token endpoint from the rest client created above. This is implemented in getAccessTokenAndSecret method of WebViewClient class

private void getAccessTokenAndSecret() {
   mTwitterApi = TwitterRestClient.createTwitterAPIWithAccessToken(mOauthToken);
   mTwitterApi.getAccessTokenAndSecret(mOAuthVerifier)
           .flatMap(this::saveAccessTokenAndSecret)
           ....
}

 

Finally the obtained access_token and access_token_secret is saved in SharedPreference so that it can be used to call other Twitter API endpoints as in saveAccessTokenAndSecret

private Observable<Integer> saveAccessTokenAndSecret(ResponseBody responseBody)
       throws IOException {
   String[] responseValues = responseBody.string().split("&");

   String token = responseValues[0].substring(responseValues[0].indexOf("=") + 1);
   SharedPrefUtil.setSharedPrefString(getActivity(), OAUTH_ACCESS_TOKEN_KEY, token);
   mOauthToken = token; // here access_token that would be used for API calls

   String tokenSecret = responseValues[1].substring(responseValues[1].indexOf("=") + 1);
   SharedPrefUtil.setSharedPrefString(
           getActivity(), OAUTH_ACCESS_TOKEN_SECRET_KEY, tokenSecret);
   mOauthTokenSecret = tokenSecret;
   return Observable.just(1);
}

 

Resources:

Route Based Chunking in Loklak Search

The loklak search application running at loklak.org is growing in size as the features are being added into the application, this growth is a linear one, and traditional SPA, tend to ship all the code is required to run the application in one pass, as a single monolithic JavaScript file, along with the index.html. This approach is suitable for the application with few pages which are frequently used, and have context switching between those logical pages at a high rate and almost simultaneously as the application loads.

But generally, only a fraction of code is what is accessed most frequently by most users, so as the application size grows it does not make sense to include all the code for the entire application at the first request, as there is always some code in the application, some views, are rarely accessed. The loading of such part of the application can be delayed until they are accessed in the application. The angular router provides an easy way to set up such system and is used in the latest version of loklak search.

The technique which is used here is to load the content according to the route. This makes sure only the route which is viewed is loaded on the initial load, and subsequent loading is done at the runtime as and when required.

Old setup for baseline

Here are the compiled file sizes, of the project without the chunking the application. Now as we can see that the file sizes are huge, especially the vendor bundle, which is of 5.4M and main bundle which is about 0.5M now, these are the files which are loaded on the first load and due to their huge sizes, the first paint of the application suffers, to a great extent. These numbers will act as a baseline upon which we will measure the impact of route based chunking.

Setup for route based chunking

The setup for route based chunking is fairly simple, this is our routing configuration, the part of the modules which we want to lazy load are to be passed as loadChildren attribute of the route, this attribute is a string which is a path of the feature module which, and part after the hash symbol is the actual class name of the module, in that file. This setup enables the router to load that module lazily when accessed by the user.

const routes: Routes = [
{
path: '',
pathMatch: 'full',
loadChildren: './home/home.module#HomeModule',
data: { preload: true }

},
{
path: 'about',
loadChildren: './about/about.module#AboutModule'
},

{
path: 'contact',
loadChildren: './contact/contact.module#ContactModule'
},

{
path: 'search',
loadChildren: './feed/feed.module#FeedModule',
data: { preload: true }
},
{
path: 'terms',

loadChildren: './terms/terms.module#TermsModule'
},
{
path: 'wall',
loadChildren: './media-wall/media-wall.module#MediaWallModule'
}
];

Preloading of some routes

As we can see that in two of the configurations above, there is a data attribute, on which preload: true attribute is specified. Sometimes we need to preload some part of theapplication, which we know we will access, soon enough. So angular also enables us to set up our own preloading strategy to preload some critical parts of the application, which we know are going to be accessed. In our case, Home and Feed module are the core parts of the application, and we can be sure that, if someone has to use our application, these two modules need to be loaded. Defining the preloading strategy is also really simple, it is a class which implements PreloadingStrategy interface, and have a preload method, this method receives the route and load function as an argument, and this preload method either returns the load() observable or null if preload is set to true.

export class CustomPreloadStrategy implements PreloadingStrategy {
preload(route: Route, load: Function): Observable<any> {
return route.data && route.data.preload ? load() : of(null);
}
}

Results of route based chunking

The results of route based chunking are the 50% reduction in the file size of vendor bundle and 70% reduction in the file size of the main bundle this provides the edge which every application needs to perform well at the load time, as unnecessary bytes are not at all loaded until required.

Resources

Lazy Loading Images in Loklak Search

In last blog post, I discussed the basic Web API’s which helps us to create the lazy image loader component. I also discussed the structure which is used in the application, to load the images lazily. The core idea is to wrap the <img> element in a wrapper, <app-lazy-img> element. This enables us the detection of the element in the viewport and corresponding loading only if the image is present in the viewport.

In this blog post, I will be discussing the implementation details about how this is achieved in Loklak search in an optimized manner.

The logic for lazy loading of images in the application is divided into a Component and a corresponding Service. The reason for this splitting of logic will be explained as we discuss the core parts of the code for this feature.

Detecting the Intersection with Viewport

The lazy image service is a service for the lazy image component which is registered at the by the modules which intend to use this app lazy image component. The task of this service is to register the elements with the intersection observer, and, then emit an event when the element comes in the viewport, which the element can react on and then use the other methods of services to actually fetch the image.

@Injectable()
export class LazyImgService {
private intersectionObserver: IntersectionObserver
= new IntersectionObserver(this.observerCallback.bind(this), { rootMargin: '50% 50%' });
private elementSubscriberMap: Map<Element, Subscriber<boolean>>
= new Map<Element, Subscriber<boolean>>();
}

The service has two member attributes, one is IntersectionObserver, and the other is a Map which stores the the reference of the subscribers of this intersection observer. This reference is then later used to emit the event when the element comes in viewport. The rootMargin of the intersection observer is set to 50% this makes sure that when the element is 50% away from the viewport.

The obvserve public method of the service, takes an element and pass it to intersection observer to observe, also put the element in the subscriber map.

public observe(element: Element): Observable<boolean> {
const observable: Observable<boolean> = new Observable<boolean>(subscriber => {
this.elementSubscriberMap.set(element, subscriber);
});
this.intersectionObserver.observe(element);
return observable;
}

Then there is the observer callback, this method, as an argument receives all the objects intersecting the root of the observer, when this callback is fired, we find all the intersecting elements and emit the intersection event. Indicating that the element is nearby the viewport and this is the time to load it.

private observerCallback(entries: IntersectionObserverEntry[], observer: IntersectionObserver) {
entries.forEach(entry => {
if (this.elementSubscriberMap.has(entry.target)) {
if (entry.intersectionRatio > 0) {
const subscriber = this.elementSubscriberMap.get(entry.target);
subscriber.next(true);
this.elementSubscriberMap.delete(entry.target);
}
}
});
}

Now, our LazyImgComponent enables us to uses this service to register its element, with the intersection observer and then reacting to it later, when the event is emitted. This method sets up the IO, to load the image, and subscribes to the event emittes by the service and eventually calls the loadImage method when the element intersects with the viewport.

private setupIntersectionObserver() {
this.lazyImgService
.observe(this.elementRef.nativeElement)
.subscribe(value => {
if (value) {
this.loadImage();
}
});
}

Loading and rendering the image

Our lazy image service has another public API method fetch to fetch the image resource, this method returns an observable, which on successful fetching of image emits a Base64 image string.

public fetch(resource: string): Observable<string> {
return new Observable<string>(subscriber => {
fetch(resource)
.then(this.processStatus)
.then(this.getBufferResponse)
.then(this.arrayBufferToBase64)
.then(strBuffer => {
subscriber.next(strBuffer);
subscriber.complete();
})
.catch((error) => {
subscriber.error(error);
subscriber.complete();
});
});
}

The intermediate promise then chain is for converting the raw response buffer to a Base64 string, this string is then emited as the observable emmision. The component then subscribes to this fetch Observable, when the load image method is called.

private loadImage() {
this.isLoading = true;
this.lazyImgService
.fetch(this.src)
.subscribe(this.handleResponse.bind(this), this.handleError.bind(this));
}

The handler methods for the response and errors then contain the code to handle the effects of loading of results, ie. rendering the image inside the img element. The intresting thing to note here is, if we give the Base64 string as the src attribute of an img tag, instead of resource path then also it renders the image properly.

private handleResponse(imageStr: string) {
const base64Flag = `data:image/${this.imageType};base64,`;
this.elementRef.nativeElement.querySelector('img').src = base64Flag + imageStr;
}

And this completes our workflow of the app-lazy-img and gives us, a robust lazy image loader, and also is compliant with accessibility guidelines, including all the necessary attributes like, title, width, height etc. for the generation of proper accessibility tree. This technique can be extended to any level, and is more or less platform and framework independent, as this relies solely on Web Standards API’s. This is an optimized solution, as at a time only one intersection observer is active on a page and is seeing all the images, rather than per component instance based intersection observers which can be a performane bottleneck in low memory devices.

Resources and Links

  • Intersection observer API
  • Intersection Observer polyfill for the browsers which don’t support Intersection Observer
  • Fetch API documentation
  • Fetch API polyfill for the browsers which don’t support fetch.
  • Loklak Search Repo

Enhancing LoklakWordCloud app present on Loklak apps site

LoklakWordCloud app is presently hosted on loklak apps site. Before moving into the content of this blog, let us get a brief overview of the app. What does the app do? The app generates a word cloud using twitter data returned by loklak based on the query word provided by the user. The user enters a word in the input field and presses the search button. After that a word cloud is created using the content (text body, hashtags and mentioned) of the various tweets which contains the user provided query word.

In my previous post I wrote about creating the basic functional app. In this post I will be describing the next steps that have been implemented in the app.

Making the word cloud clickable

This is one of the most important and interesting features added to the app. The words in the cloud are now clickable.Whenever an user clicks on a word present in the cloud, the cloud is replaced by the word cloud of that selected word. How do we achieve this behaviour? Well, for this we use Jqcloud’s handler feature. While creating the list of objects for each word and its frequency, we also specify a handler corresponding to each of the word. The handler is supposed to handle a click event. Whenever a click event occurs, we set the value of $scope.tweet to the selected word and invoke the search function, which calls the loklak API and regenerates the word cloud.

for (var word in $scope.wordFreq) {
            $scope.wordCloudData.push({
                text: word,
                weight: $scope.wordFreq[word],
                handlers: {
                    click: function(e) {
                        $scope.tweet = e.target.textContent;
                        $scope.search();
                    }
                }
            });
        }

As it can be seen in the above snippet, handlers is simply an JavaScript object, which takes a function for the click event. In the function we pass the word selected as value of the tweet variable and call search method.

Adding filters to the app

Previously the app generated word cloud using the entire tweet content, that is, hashtags, mentions and tweet body. Thus the app was not flexible. User was not able to decide on which field he wants his word cloud to be generated. User might want to generate his  word cloud using only the hashtags or the mentions or simply the tweet body. In order to make this possible, filters have been introduced. Now we have filters for hashtags, mentions, tweet body and date.

<div class="col-md-6 tweet-filters">
              <strong>Filters</strong>
              <hr>
              <div class="filters">
                <label class="checkbox-inline"><input type="checkbox" value="" ng-model="hashtags">Hashtags</label>
                <label class="checkbox-inline"><input type="checkbox" value="" ng-model="mentions">Mentions</label>
                <label class="checkbox-inline"><input type="checkbox" value="" ng-model="tweetbody">Tweet body</label>
              </div>
              <div class="filter-all">
                <span class="select-all" ng-click="selectAll()"> Select all </span>
              </div>
            </div>

We have used checkboxes for the individual filters and have kept an option to select all the filters at once. Next we require to hook this HTML to AngularJS code to make the filters functional.

if ($scope.hashtags) {
                tweet.hashtags.forEach(function (hashtag) {
                    $scope.filteredWords.push("#" + hashtag);
                });
            }

            if ($scope.mentions) {
                tweet.mentions.forEach(function (mention) {
                    $scope.filteredWords.push("@" + mention);
                });
            }

In the above snippet, before adding the hashtags to the list of filtered words, we first make sure that the checkbox for hashtags is selected. Once we find out the the variable bound to the hashtags checkbox is true, we proceed further and add the hashtags associated with a given tweet to the list of filteredWords. The same strategy is applied for both mentions (shown in the snippet) and tweet bodies.

Adding error notification

Next, we handle certain errors to notify the users that there is problem in their input. Such cases include empty input. If user provides empty input then we notify him or her and break the search. Next we check whether From date is before To date or not. If From date is after To date then we notify the user about the problem.

if ($scope.tweet === "" || $scope.tweet === undefined) {
            $scope.error = "Please enter a valid query word";
            $scope.showError();
            return;
}

In the above snippet we check for empty or undefined input and display snackbar along with error accordingly.

if ((sinceDate !== "" && sinceDate !== undefined) && (endDate !== "" && endDate !== undefined)) {
            var date1 = new Date(sinceDate);
            var date2 = new Date(endDate);
            if (date1 > date2) {
                $scope.error = "To date should be after From date";
                $scope.showError();
                return;
            }
        }

The above snippet compares date. For comparing dates, first we fetch the values entered (via jquery date widget) into the respective input fields and then create JavaScript Date objects out of them. Finally we compare those Date objects to find out if there is any error or not.

Now it might happen that a particular search is taking a long time (perhaps due to network problem), however the user becomes impatient and tries to search again. In that case we need to inform the user that the previous search is still going on. For this purpose we use a boolean variable  to keep track whether the previous search is completed or still going on. If the previous search is going on and user tries to make a new search then we provide a proper notification and prevent the user from making further searches.

Finally we need to make sure that the user is online and has an active internet connection before the search can take place and Loklak API can be called. For this we have used navigator. We have polled the onLine property of navigator to find out whether the user is online or not. If the user is offline then we inform him that we cannot initiate a search due to internet connectivity problem.

if ($scope.isLoading === true) {
            $scope.error = "Previous search not completed. Please wait...";
            $scope.showError();
            return;
        }
        if (!navigator.onLine) {
            $scope.error = "You are currently offline. Please check your internet connection!";
            $scope.showError();
            return;
        }

Important resources

  • View the app source here.
  • View loklak apps site source here.
  • View Loklak API documentation here
  • View Jqcloud documentation here.
  • Learn more about AngularJS here.

One Click Deployment Button for loklak Using Heroku with Gradle Build

The one click deploy button makes it easy for the users of loklak to get their own cloud instance created and deployed in their heroku account and can be used according to their flexibility. Heroku uses an app.json manifest in the code repo to figure out what add-ons, config and other deployment steps are required to make the code run. This is used to configure and deploy the app.

Once you have provide the app name and then click on deploy button, Heroku will start deploying the loklak server to a new app on your account:

When setup is complete, you can open the deployed app in your browser or inspect it in Dashboard.

All these steps and requirements can now be encoded in an app.json file and placed in a repo alongside a button that kicks off the setup with a single click.

App.json is a manifest format for describing apps and specifying what their config requirements are. Heroku uses this file to figure out how code in a particular repo should be deployed on the platform. Here is the loklak’s app.json file which used gradle build pack:

{
	"name": "Loklak Server",
	"description": "Distributed Tweet Search Server",
	"logo": "https://raw.githubusercontent.com/loklak/loklak_server/master/html/images/loklak_anonymous.png",
	"website": "http://api.loklak.org",
	"repository": "https://github.com/loklak/loklak_server.git",
	"image": "loklak/loklak_server:latest-master",
	"env": {
		"BUILDPACK_URL": "https://github.com/heroku/heroku-buildpack-gradle.git"
	}
}

 

If you are interested you can try deploying the peer from here itself. Checkout how simple it can be to deploy.

Deploy button:

Deploy

Resources:

Auto Deploying loklak Server on Google Cloud Using Travis

This is a setup for loklak server which want to check in only the source files, but have the development branch in Kubernetes deployment automatically updated with some compiled output every time the push using details from Travis build.

How to achieve it?

Unix commands and shell script is one of the best option to automate all deployment and build activities. I explored Kubernetes Gcloud which can be accessed through unix command.

1.Checking for Travis build details before deployment:

Firstly check whether the repository is loklak_server, pull request is available and branches are either master or development, and then decide to update the docker image or not. The code of the aforementioned things is as follows:

if [ "$TRAVIS_REPO_SLUG" != "loklak/loklak_server" ]; then
    echo "Skipping image update for repo $TRAVIS_REPO_SLUG"
    exit 0
fi

if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then
    echo "Skipping image update for pull request"
    exit 0
fi

if [ "$TRAVIS_BRANCH" != "master" ] && [ "$TRAVIS_BRANCH" != "development" ]; then
    echo "Skipping image update for branch $TRAVIS_BRANCH"
    exit 0
fi

2. Setting up Tag and Decrypting the credentials:

For the Kubernetes deployment, each time the travis build is successful, it takes the commit details from travis and appended into tag details for deployment and gcloud credentials is decrypted from the json file.

openssl aes-256-cbc -K $encrypted_48d01dc243a6_key -iv $encrypted_48d01dc243a6_iv  -in kubernetes/gcloud-credentials.json.enc -out kubernetes/gcloud-credentials.json -d

3. Install, Authenticate and Configure GCloud details with Kubernetes:

In this step, initially Google Cloud SDK should be installed with Kubernetes-

curl https://sdk.cloud.google.com | bash > /dev/null
source ~/google-cloud-sdk/path.bash.inc
gcloud components install kubectl

 

Then, Authenticate Google Cloud using the above mentioned decrypted credentials and finally configure the Google Cloud with the details like zone, project name, cluster details, number of nodes etc.

4. Update the Kubernetes deployment:

Since, in this issue it is specific to the loklak_server/development branch, so in here it checks if the branch is development or not and then updates the deployment using following command:

if [ $TRAVIS_BRANCH == "development" ]; then
    kubectl set image deployment/server --namespace=web server=$TAG
fi

 

Conclusion:

In this post, how to write a script in such a way that with each successful push after travis build how to update the deployment on Kubernetes GCloud.

Resources: