Adding JSONAPI Support in Open Event Android App

The Open Event API Server exposes a well-documented, JSONAPI-compliant REST API that can be used by the Open Event App Generator and Frontend to access and manipulate data. Support for JSONAPI therefore also has to be added to external consumers like the Open Event App Generator and Frontend. In this post I explain how to add JSONAPI support in Android.

There are many client libraries for implementing JSONAPI support in Android or Java, like moshi-jsonapi, morpheus etc. You can find the list here. The main problem is that most of these libraries require the models to inherit from a Resource class, but in the Open Event Android App our models already inherit from RealmObject, and in Java a class can't extend more than one class. So we will be using the jsonapi-converter library, which uses annotation processing to add JSONAPI support.

1. Add dependency

In order to use jsonapi-converter in your app, add the following dependency in your app module's build.gradle file.

dependencies {
	compile 'com.github.jasminb:jsonapi-converter:0.7'
}

2.  Write model class

Models will be used to represent requests and responses. To support JSONAPI we need to take care of the following when writing the models.

  • Each model class must be annotated with com.github.jasminb.jsonapi.annotations.Type annotation
  • Each class must contain a String attribute annotated with com.github.jasminb.jsonapi.annotations.Id annotation
  • All relationships must be annotated with com.github.jasminb.jsonapi.annotations.Relationship annotation

In the Open Event Android app we have many models like event, session, track, microlocation, speaker etc. Here I am only defining the track model because of its simplicity.

@Type("track")
public class Track extends RealmObject {

        	@Id(IntegerIdHandler.class)
        	private int id;
        	private String name;
        	private String description;
        	private String color;
        	private String fontColor;
        	@Relationship("sessions")
        	private RealmList<Session> sessions;

        	//getters and setters
}

Jsonapi-converter uses Jackson for data parsing. To know how to use Jackson for parsing follow my previous blog.

The Type annotation is used to instruct the serialization/deserialization library on how to process the given model class. Each resource must have an id attribute, and the Id annotation is used to flag an attribute of a class as the id. In the class above the id attribute is an int, so we need to specify the IntegerIdHandler class, which is a ResourceHandler, in the annotation. The Relationship annotation is used to designate other resource types as a relationship. The value in the Relationship annotation should match the JSONAPI specification of the server. In the Open Event Project each track has sessions, so we add a Relationship annotation for them.

3.  Setup API service and retrofit

After defining models, define API service interface as you would usually do with standard JSON APIs.

public interface OpenEventAPI {
    @GET("tracks?include=sessions&fields[session]=title")
    Call<List<Track>> getTracks();
}

Now create an ObjectMapper and a Retrofit object and initialize them.

ObjectMapper objectMapper = OpenEventApp.getObjectMapper();
Class[] classes = {Track.class, Session.class};

OpenEventAPI openEventAPI = new Retrofit.Builder()
                    .client(okHttpClient)
                    .baseUrl(Urls.BASE_URL)
                    .addConverterFactory(new JSONAPIConverterFactory(objectMapper, classes))
                    .build()
                    .create(OpenEventAPI.class);

 

The classes array contains all the model classes which will be supported by this Retrofit builder and API service. The main task here is to add a JSONAPIConverterFactory, which will be used to serialize and deserialize data according to the JSONAPI specification. The JSONAPIConverterFactory constructor takes two parameters: an ObjectMapper and the list of supported classes.

4.  Use API service  

Now, after setting everything up according to the above steps, you can use the openEventAPI instance to fetch data from the server.

openEventAPI.getTracks();
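
Note that getTracks() only builds a Call object; it still has to be executed. Below is a minimal sketch of running it asynchronously with Retrofit's enqueue(); the callback body is illustrative and not part of the original post.

// Calling getTracks() only creates the request; enqueue() runs it on a background thread.
openEventAPI.getTracks().enqueue(new Callback<List<Track>>() {
    @Override
    public void onResponse(Call<List<Track>> call, Response<List<Track>> response) {
        if (response.isSuccessful()) {
            List<Track> tracks = response.body();
            // use the parsed Track objects, e.g. save them to Realm or update the UI
        }
    }

    @Override
    public void onFailure(Call<List<Track>> call, Throwable t) {
        // handle network or parsing errors
    }
});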

Conclusion

JSON API is designed to minimize both the number of requests and the amount of data transmitted between clients and servers.


Reset SUSI.AI User Password & Parameter extraction from Link

In this blog I will discuss how Accounts handles an incoming request to reset a password. If users forget their password, they can use the forgot password button on http://accounts.susi.ai, and its implementation is quite straightforward. As soon as a user enters his/her email id and hits the RESET button, an ajax call is made which first checks whether the email id is registered with SUSI.AI or not. If not, a failure message is shown to the user. But if the user is found to be registered in the database, an email is sent to him/her which contains a 30-character-long token. On the server the token is hashed against the user's email id and a validity of 7 days is set on it.
Let us have a look at the Reset Password link one receives.

http://accounts.susi.ai/?token={30 characters long token}

On clicking this link, the user is redirected to http://accounts.susi.ai with the token as a request parameter. At the client side, a check is made to evaluate whether the URL has a token parameter or not. That was the overview. Since http://accounts.susi.ai is based on the ReactJS framework, this is not as simple as the native PHP functionality, but it is more logical and systematic. Let us now take a closer look at how this parameter is searched for, and how the token is extracted and validated.

As you can see, http://accounts.susi.ai and http://accounts.susi.ai/?token={token} both redirect the user to the same URL. So the first task that needs to be accomplished is to check whether a parameter is passed in the URL or not.
First, import addUrlProps and UrlQueryParamTypes from the react-url-query package and PropTypes from the prop-types package. These will be required in the further steps.
Have a look at the code and then we will understand how it works.

const urlPropsQueryConfig = {
  token: { type: UrlQueryParamTypes.string },
};

class Login extends Component {
	static propTypes = {
    // URL props are automatically decoded and passed in based on the config
    token: PropTypes.string,
    // change handlers are automatically generated when given a config.
    // By default they update that single query parameter and maintain existing
    // values in the other parameters.
    onChangeToken: PropTypes.func,
  }

  static defaultProps = {
    token: "null",
  }

Above, in the first step, we have defined the parameter by which Reset Password should be triggered. It means that if and only if the incoming parameter in the URL is token, we move to the next step; otherwise the normal http://accounts.susi.ai page is rendered. We have also defined the data type of the token parameter to be UrlQueryParamTypes.string.
PropTypes are attributes in ReactJS similar to attributes in HTML tags. So again, we have defined the data type. onChangeToken is a custom attribute which is fired whenever the token parameter is modified. To declare default values, we have used defaultProps. If token is not passed in the URL, it defaults to "null". This is still not the last step, as till now we have not checked whether the token is there or not. This is done in the componentDidMount function, a lifecycle function which gets called right after the component is first rendered. In this function, we check that if the value of token is not equal to "null", then a value must have been passed to it; we then extract the token from the URL and redirect the user to the reset password application. Below is the code snippet for better understanding.

componentDidMount() {
		const {
      token
    } = this.props;
		console.log(token)
		if(token !== "null") {
			this.props.history.push('/resetpass/?token='+token);
		}
		if(cookies.get('loggedIn')) {
			this.props.history.push('/userhome', { showLogin: false });
		}
	}

With this the first step of the process has been achieved. In the second step, we extract the token from the URL in a similar fashion, followed by an ajax request to the server which validates the token and sends a response to the client accordingly (the token might be invalid or expired). Success is encoded in a JSON object and sent to the client; an error is thrown as an error, and the client shows an "invalid token" message. On success, the server sends the email id against which the token was hashed, and the client displays it on the page. Now the user has to enter the new password and confirm it. The client also tests whether both fields have the same value. If they are the same, a simple ajax call is made to the server with the new password, the email id and the token. If no error occurs in between (like a connection timeout, the server being down etc.), the client is sent a JSON response again with the "accepted" flag = true and the message "Your password has been changed!".



Download SUSI.AI Setting Files from Data Folder

In this blog, I will discuss how the DownloadDataSettings servlet hosted on SUSI server functions. This post also covers a step by step demonstration on how to use this feature if you have hosted your own custom SUSI server and have admin rights to it. Given below is the endpoint where the request to download a particular file has to be made.

/data/settings

For systematic functionality and workflow, users with admin login are given special access. This allows them to download the settings files and go through them easily when needed. There are various files which contain the email ids of registered users (accounting.json), the user roles associated with them (authorization.json), the groups they are a part of (groups.json) etc. To list all the files in the folder, use the endpoint given below:

/aaa/listSettings.json

How does the above servlet work? Before that, let us see how to get admin rights on your custom SUSI.AI server.
For admin login, it is required that you have access to the files and folders on the server. Sign up with an account and browse to

/data/settings/authorization.json

Find the email id with which you signed up for admin login and change userRole to “admin”. For example,

{
	"email:test@test.com": {
		"permissions": {},
		"userRole": "user"
	}
}

If you have signed up with an email id “test@test.com” and want to give admin access to it, modify the userRole to “admin”. See below.

{
	"email:test@test.com": {
		"permissions": {},
		"userRole": "admin"
	}
}

Until now, the server did not have any email id with admin login or a user role equal to admin. Hence, this exercise is required only for the first admin. Later admins can use the changeUserRole application to give/change/modify user roles for any registered user. By now you must have an admin login session. Let's now see how the download and file-listing servlets work.
First, the server creates a path by locally referencing the settings folder with the help of DAO.data_dir.getPath(). This gives a string path to the data directory containing all the data/settings files. Now the server just has to build a JSONArray by passing a String array, containing the names of all the data/settings files, to the JSONArray constructor. If the process is not successful, then "accepted" = false is sent as an error to the user. The base user role to access the servlet is ADMIN, as only admins are allowed to download data/settings files.
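
Based on that description, the core of the listing logic can be pictured roughly like this (an illustrative sketch, not the exact SUSI source; the key names and variable names are assumptions):

// list all files under data/settings and wrap their names in a JSONArray
File settingsDir = new File(DAO.data_dir.getPath() + "/settings");
String[] fileNames = settingsDir.list();          // names of all files in data/settings
JSONObject json = new JSONObject();
if (fileNames != null) {
    json.put("accepted", true);
    json.put("files", new JSONArray(fileNames));  // JSONArray built from the String array
} else {
    json.put("accepted", false);                  // directory could not be read
}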
The name of the file which you want to download has to be sent in an HTTP GET request as a parameter. For example, if an admin has to download accounting.json to get the list of all registered users, the request is made in the following way:

BASE_URL+/data/settings?file=file_name

*BASE_URL is the URL where the server is hosted. For standard server, use BASE_URL = http://api.susi.ai.

In the initial steps, the server generates a path to the data/settings folder and finds the file whose name it receives in the request. If no filename is specified in the API call, the server sends the accounting.json file by default.

File settings = new File(DAO.data_dir.getPath()+"/settings");
String filePath = settings.getPath(); 
String fileName = post.get("file","accounting"); 
filePath += "/"+fileName+".json";

Next, the server will extract the file and, using the ServletOutputStream class, set appropriate properties and context for the response. This context is, in turn, used to fetch the mime type for the file being served. If the mime type is returned as null, the mime type defaults to application/octet-stream. For more information on mime types, please look at the following link. A complete list of mime types is compiled and documented here.
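
The lookup itself might look roughly like the sketch below (illustrative only; the actual SUSI code may obtain the mime type differently):

// ask the servlet container for the mime type of the requested file
String mimetype = getServletContext().getMimeType(filePath);
if (mimetype == null) {
    // fall back to a generic binary type when the container cannot guess it
    mimetype = "application/octet-stream";
}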

response.setContentType(mimetype);
response.setContentLength((int)file.length());

In the above code snippet, the mime type and length of the file being downloaded are set. Next, we set the header for the download response, using the file name.

response.setHeader("Content-Disposition", "attachment; filename=" + fileName +".json");

All the manual work is done by now. The only thing left is to open a buffered stream, whose size has been defined as a class variable.
Here we use a byte array of 4096 elements and write the file to the client's default download location.

private static final int BUFSIZE = 4096;

byte[] byteBuffer = new byte[BUFSIZE];
int length;
DataInputStream in = new DataInputStream(new FileInputStream(file));

// read the file in 4 KB chunks and stream each chunk to the client
while ((length = in.read(byteBuffer)) != -1) {
    outStream.write(byteBuffer, 0, length);
}

in.close();
outStream.close();

All the above-mentioned steps are enclosed in a try-catch block, which catches any exception and logs it in the log file. This message is also sent to the client for appropriate user information, along with a success or failure indication through a boolean flag. Do not forget to close the input and output streams: leaving them open may lead to memory leaks, and someone with proper knowledge of network and buffer streams might be able to read essential or secured data.



Password Recovery Link Generation in SUSI with JSON

In this blog, I will discuss how the SUSI server processes a password recovery request using JSON. The post will also cover some parts of the client-side implementation for better insight.
All the clients function in the same way. When you click on the forgot password button, the client asks whether you want to recover the password for your own custom server or for the standard one. This choice defines the base URL where the forgot password request is made. If you select the custom server radio button, you will be asked to enter the URL of your server; otherwise the standard base URL will be used. The API endpoint used will be

/aaa/recoverpassword.json

The client makes a POST request with the "forgotemail" parameter containing the email id of the user making the request. Using the email id provided, the server generates a client identity if and only if the email id is registered. If the email id is not found in the database, the server throws an exception with error code 422.

String usermail = call.get("forgotemail", null);
ClientCredential credential = new ClientCredential(ClientCredential.Type.passwd_login, usermail);
ClientIdentity identity = new ClientIdentity(ClientIdentity.Type.email, credential.getName());
if (!DAO.hasAuthentication(credential)) {
	throw new APIException(422, "email does not exist");
}

If the email id is found to be registered against a valid user in the database, a call is made to a method which returns a random string of the length passed in as a parameter. This returned random string acts as the token.
Below is the implementation of the createRandomString(int length) method.

public static String createRandomString(Integer length){
    char[] chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
    StringBuilder sb = new StringBuilder();
    Random random = new Random();
    for (int i = 0; i < length; i++) {
        char c = chars[random.nextInt(chars.length)];
        sb.append(c);
    }
    return sb.toString();
}

This method is defined in the AbstractAPIHandler class. The function createRandomString(int length) initialises a char array with the alphabet (both upper and lower case) and the digits 0 to 9. A StringBuilder object is declared and filled in the for loop.
Next, the token generated is hashed against the user's email id. Since we use a token of length 30 over an alphabet of 62 characters, there are 62^30 possible tokens, so the chance of two tokens being the same is vanishingly small (approximately zero). The validity of the token is set to 7 days (i.e. one week). After that the token expires, and a new token is needed to reset the password.

String token = createRandomString(30);
ClientCredential tokenkey = new ClientCredential(ClientCredential.Type.resetpass_token, token);
Authentication resetauth = new Authentication(tokenkey, DAO.passwordreset);
resetauth.setIdentity(identity);
resetauth.setExpireTime(7 * 24 * 60 * 60);
resetauth.put("one_time", true);

Everything is set by now. The only thing left is to send a mail to the user. For that we call the sendEmail() method of the EmailHandler class. This requires 3 parameters: the user's email id, the subject of the email, and the body of the email. The body contains a verification link. To get this verification link, getVerificationMailContent(String token) is called, and the token generated in the previous step is passed to it as a parameter.
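
Putting those pieces together, the mail dispatch might look roughly like the sketch below; the subject string and the exact owning classes/signatures of these helpers are assumptions based on the description above, not verified SUSI code.

String subject = "SUSI AI password recovery";        // assumed subject text
String body = getVerificationMailContent(token);     // mail template containing the verification link
EmailHandler.sendEmail(usermail, subject, body);      // email id, subject, body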

String verificationLink = DAO.getConfig("host.url", "http://127.0.0.1:9000") + "/apps/resetpass/index.html?token=" + token;

The above command gets the base URL of the server and appends the link to the reset password app along with the token it received in the method call. The rest of the body is stored as a template in the /conf//templates/reset-mail.txt file. Finally, if no exception was caught during the process, the message "Recovery email sent to your email ID. Please check" and accepted = true are encoded into JSON data and sent to the client. If an exception was encountered, the exception message and accepted = false are sent to the client.
Now the client processes the JSON object and notifies the user appropriately.



How to Implement Memory like Servlets in SUSI.AI

In this blog, I'll be discussing how the server gets previous messages from the log files in the SUSI server. SUSI AI clients (Android, iOS and web chat) follow a very simple rule. Whenever a user logs in to the app, the app makes an HTTP GET call to the server in the background, and in response the server returns the chat history.

Link to the API endpoint -> http://api.susi.ai/susi/memory.json

But parsing a lot of data depends on the connection speed. If the connection is poor or slow, loading the full history would cost the user time. To prevent this, the server by default returns the last 10 pairs of messages. It is up to the client how many messages it wants to render. So, for example, if the client only wants the last 2 messages, it has to make a GET request and pass the cognitions parameter. Hence the modified endpoint will be:

http://api.susi.ai/susi/memory.json?cognitions=2

But how does the server process it? Let us see.
Browse to the susi_server/src/ai/susi/server/api/susi/UserService.java file. This is the main working servlet. If you are new and wondering how servlets for SUSI are implemented, please go through how-to-add-a-new-servletapi-to-susi-server first.
This is how the serviceImpl() method looks:

@Override
    public ServiceResponse serviceImpl(Query post, HttpServletResponse response, Authorization user, final JsonObjectWithDefault permissions) throws APIException {

        int cognitionsCount = Math.min(10, post.get("cognitions", 10));
        String client = user.getIdentity().getClient();
        List<SusiCognition> cognitions = DAO.susi.getMemories().getCognitions(client);
        JSONArray coga = new JSONArray();
        for (SusiCognition cognition: cognitions) {
            coga.put(cognition.getJSON());
            if (--cognitionsCount <= 0) break;
        }
        JSONObject json = new JSONObject(true);
        json.put("cognitions", coga);
        return new ServiceResponse(json);
    }

In the first step, we take the minimum of the default value (i.e. 10) and the value of the cognitions GET parameter. That many messages are encoded into a JSONArray and sent to the client.

Whenever the server receives a valid signup request, it creates a directory with the name "email_emailid". In this directory, a log.txt file is maintained which stores all the queries along with the other details associated with them. For example, if a user has signed up with the email id example@example.com, then the path of this directory will be /data/susi/email_example@example.com. If a user queries "http://api.susi.ai/susi/chat.json?timezoneOffset=-330&q=flip+a+coin", then the following entry is logged:

{
	"query": "flip a coin",
	"count": 1,
	"client_id": "",
	"query_date": "2017-06-30T12:22:05.918Z",
	"answers": [{
		"data": [{
			"0": "flip a coin",
			"token_original": "coin",
			"token_canonical": "coin",
			"token_categorized": "coin",
			"timezoneOffset": "-330",
			"answer": "tails",

			"skill": "/susi_skill_data/models/general/entertainment/en/flip_coin.txt",
			"_etherpad_dream": "cricket"
		}],
		"metadata": {
			"count": 1
		},
		"actions": [{
			"type": "answer",
			"expression": "tails"
		}],
		"skills": ["/susi_skill_data/models/general/entertainment/en/flip_coin.txt"]
	}],
	"answer_date": "2017-06-30T12:22:05.928Z",
	"answer_time": 10,
	"language": "en"
}

The server has the user's identity. It uses this identity to append the entry to the respective log file.
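
Purely as an illustration of that append step (this is not the actual SUSI implementation; the file layout follows the /data/susi/email_<id>/log.txt convention described above, and cognitionJson stands for the JSON object shown above):

String email = "example@example.com";                       // the identity's email id
File logFile = new File("data/susi/email_" + email, "log.txt");
try (FileWriter writer = new FileWriter(logFile, true)) {    // true = append mode
    writer.write(cognitionJson.toString() + "\n");           // one JSON entry per line
} catch (IOException e) {
    // a failed write should not break the chat response; just log it
}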

The next steps in retrieving the messages are pretty easy and include getting the identity of the current user session. This identity is used to populate the JSONArray named coga. The array is finally encoded in a JSONObject along with other basic details and returned to the clients, which render the data received and show the messages in an appropriate way.



Simplifying Scrapers using BaseScraper

Loklak Server's main function is to scrape data from websites and other sources and output it in different formats like JSON, XML and RSS. There are many scrapers in the project that scrape data and output it, but they were implemented with different designs and libraries, which makes them different from each other and difficult to change consistently.

Due to this variation in the scrapers' design, it is difficult to modify them and to fix the same issue (if one appears) in each of them. This points to a fault in the design. To solve this problem, inheritance can be brought into the picture. Thus, I created the BaseScraper abstract class so that scrapers can concentrate on fetching data from HTML, while all supportive tasks, like creating a connection from the url, are defined in BaseScraper.

The concept is pretty easy to implement, but for a perfect implementation, there is a need to go through the complete list of tasks a scraper does.

These are the tasks, with descriptions of how they are implemented using BaseScraper:

  1. Endpoint that triggers the scraper

Every search scraper inherits the class AbstractAPIHandler. This is used to fetch the GET parameters from the endpoint, according to which data is scraped. The arguments of the serviceImpl method are used to generate the output, which is returned as a JSONObject.

For this task, the method serviceImpl has been defined in BaseScraper, and the method getData is implemented to return the output. getData is the driver method of the scraper.

public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights, JSONObjectWithDefault permissions) throws APIException {
    this.setExtra(call);
    return this.getData().toJSON(false, "metadata", "posts");
}

 

  2. Constructor

The constructor of a scraper defines the base URL of the website to be scraped, the name of the scraper, and the data structure that holds all GET parameters passed to the scraper. For the GET parameters, a Map is used, fetched from the Query object.

Since every scraper has its own base URL, scraper name and GET parameters, these are defined in the respective scrapers. QuoraProfileScraper is an example which has these variables defined.

  3. Get all input variables

To get all input variables, there are setters and getters defined in BaseScraper for fetching them as a Map from the Query object. There is also an abstract method setParam(). It is defined in the respective scrapers to fetch the parameters useful to that scraper and set them to the scraper's class variables.

// Setter for get parameters from call object
protected void setExtra(Query call) {
    this.extra = call.getMap();
    this.query = call.get("query", "");
    this.setParam();
}

// Getter for a get parameter by its key
public String getExtraValue(String key) {
    String value = "";
    if(this.extra.get(key) != null) {
        value = this.extra.get(key).trim();
    }
    return value;
}

// Definition in QuoraProfileScraper
protected void setParam() {
    if(!"".equals(this.getExtraValue("type"))) {
        this.typeList = Arrays.asList(this.getExtraValue("type").trim().split("\\s*,\\s*"));
    } else {
        this.typeList = new ArrayList<String>();
        this.typeList.add("all");
        this.setExtraValue("type", String.join(",", this.typeList));
    }
}

 

  4. URL creation for the web scraper

The URL creation is implemented in a separate method, as in TwitterScraper. The following is a rough implementation adapted from one of my pull requests:

protected String prepareSearchUrl(String type) {
    URIBuilder url = null;
    String midUrl = "search/";

    try {
        switch(type) {
            case "question":
                url = new URIBuilder(this.baseUrl + midUrl);
                url.addParameter("q", this.query);
                url.addParameter("type", "question");
        .
        .
    }
    .
    .
    return url.toString();
}

 

  5. Get BufferedReader object from InputStream

The getDataFromConnection method fetches a BufferedReader object from the ClientConnection. This object is read line by line by the scrape method to fetch the data. See here.

ClientConnection connection = new ClientConnection(url);
BufferedReader br = getHtml(connection);
.
.
.
public BufferedReader getHtml(ClientConnection connection) {

    if (connection.inputStream == null) {
        return null;
    }

    BufferedReader br = new BufferedReader(new InputStreamReader(connection.inputStream, StandardCharsets.UTF_8));
    return br;
}

 

  6. Scraping of data from HTML

The scrape method for scraping data is declared abstract in BaseScraper and defined in each scraper. This is a perfect example of the split between BaseScraper (see the code here) and a concrete scraper (here); a rough outline is sketched below.
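
A rough outline of that split (the method signature and surrounding details are illustrative, not copied from the Loklak source):

public abstract class BaseScraper {
    // connection handling, parameter handling and output live here ...

    // every concrete scraper only defines how the fetched HTML turns into data objects
    protected abstract Post scrape(BufferedReader br, String type, String url);
}

public class QuoraProfileScraper extends BaseScraper {
    @Override
    protected Post scrape(BufferedReader br, String type, String url) {
        Post typeArray = new Post(true);
        // read br line by line, extract the wanted fields and put them into typeArray
        return typeArray;
    }
}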

  7. Output of data

The output of the scrape method is collected in Post data objects that are implemented for the respective scraper. These Post objects are added to the Timeline iterator, which outputs the data as a JSONArray. Later the objects are output in an enclosing Post wrapper object.

This data could be output directly as a Post object, but adding it to the iterator makes the Post objects capable of being sorted and of being indexed to ElasticSearch.

 



Iterating the Loklak Server data

Loklak Server is amazing for what it does, but it is more impressive how it does its tasks. Iterators are a standard tool in Java, but this project has a customized iterator that iterates over Twitter data objects. This iterator is Timeline.java.

Timeline implements the Iterable interface (not Iterator itself). This interface allows Timeline to be used as an iterator, while adding methods to modify, use or create the data objects. At present, it only iterates Twitter data objects. I am working on modifying it to iterate data objects from all web scrapers.

The following is a simple example of how an iterator is used.

// Initializing arraylist
List<String> stringsList = Arrays.asList("foo", "bar", "baz");

// Using iterator to display contents of stringsList
System.out.print("Contents of stringsList: ");

Iterator<String> iter = stringsList.iterator();
while(iter.hasNext()) {
    System.out.print(iter.next() + " ");
}

 

This iterator can only iterate over the data the way an array does. (Then why do we need it?) It does the task of iterating objects perfectly, but we can add more functionality to the iterator.

 

The Timeline iterator iterates MessageEntry objects, i.e. instances of the superclass of TwitterTweet. According to the Javadocs, “Timeline is a structure which holds tweet for the purpose of presentation, There is no tweet retrieval method here, just an iterator which returns the tweets in reverse appearing order.”

Following are some of the tasks it does:

  1. As an iterator:

This basic use of Timeline is to iterate the MessageEntry objects. It not only iterates the data objects, but also fetches them (See here).

// Declare Timeline object according to order the data object has been created
Timeline tline = new Timeline(Timeline.parseOrder("created_at"));

// Adding data objects to the timeline
tline.add(me1);
tline.add(me2);
.
.
.
// Outputting all data objects as an array of JSON objects
JSONArray postArray = new JSONArray();
for (MessageEntry post : tline) {
    postArray.put(post.toJSON());
}

 

  2. The order of iterating the data objects

Timeline can arrange and iterate the data objects according to the date of creation of the Twitter post, the number of retweets, or the number of favourite counts. For this there is an enum declaration of Order in the Timeline class which is initialized during creation of the Timeline object. [link]

    Timeline tline = new Timeline(Timeline.parseOrder("created_at"));

 

  3. Pagination of data objects

There is a cursor object, and some methods, including getters and setters, to support pagination of the data objects. It is only used internally, but it can also be used to return a section of the result.

  4. writeToIndex method

This method can be used to write all the data fetched by the Timeline iterator to ElasticSearch for indexing, and to a dump that can be used for testing. Thus, indexing of data can be done concurrently while it is iterated. It is implemented here.

  5. Other methods

It also has methods to output all data as JSON, a customized method to add data to the Timeline keeping the user object and the data separate, etc. There are a few more things in this iterable class which are worth exploring.

 



Multithreading implementation in Loklak Server

Loklak Server is a near-realtime system. It performs a large number of tasks which are very costly in terms of resources. Its basic function is to scrape data from websites and output it at the endpoints. In addition to scraping data, there is also a need to perform other tasks like refining and cleaning the data. That is why multiple threads are instantiated. They perform tasks like:

  1. Refining the data and extracting more data

The data fetched needs to be cleaned and refined before outputting it. Some of the examples are:

a) Removal of html tags from tweet text:

After extracting the text from the HTML data and feeding it to the TwitterTweet object, threads are run concurrently to remove all HTML from the text.

b) Unshortening of url links:

The URL links embedded in the tweet text may track users with the help of shortened URLs. To prevent this issue, a thread is instantiated to unshorten the URL links concurrently while the tweet text is cleaned.

  2. Indexing all JSON output data to ElasticSearch

While extracting JSON data as output, there is a method here in Timeline.java that indexes data to ElasticSearch.

Managing multithreading

To manage multithreading, the Loklak Server uses the following:

1. ExecutorService

To deal with a large number of threads, an ExecutorService object is used to handle them, as it helps the JVM to prevent any resource overflow. The threads' lifecycle can be controlled and their creation cost optimized. The best example of an ExecutorService application is here:

.
.
public class TwitterScraper {
    // Creation of at max 40 threads. This sets max number of threads to 40 at a time
    public static final ExecutorService executor = Executors.newFixedThreadPool(40);
    .
    .
    .
    .
    // Feeding of TwitterTweet object with data
    TwitterTweet tweet = new TwitterTweet(
        user.getScreenName(),
        Long.parseLong(tweettimems.value),
        props.get("tweettimename").value,
        props.get("tweetstatusurl").value,
        props.get("tweettext").value,
        Long.parseLong(tweetretweetcount.value),
        Long.parseLong(tweetfavouritecount.value),
        imgs, vids, place_name, place_id,
        user, writeToIndex,  writeToBackend
    );
    // Starting thread to refine TwitterTweet data
    if (tweet.willBeTimeConsuming()) {
       executor.execute(tweet);
    }    .
    .
    .

 

2. Basic Thread class

The Thread class can also be used instead of ExecutorService in cases where there is no resource crunch. However, it is always suggested to use ExecutorService due to its benefits. A Thread implementation can be used as an anonymous class, like here; a generic illustration follows.
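
A generic illustration of an anonymous Thread (not taken from the Loklak source):

new Thread() {
    @Override
    public void run() {
        // e.g. unshorten a URL or clean up tweet text in the background
        System.out.println("running in " + Thread.currentThread().getName());
    }
}.start();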

3. Runnable interface

The Runnable interface can be used to create an anonymous class, or classes that do more than just one task concurrently. In the Loklak Server, TwitterScraper concurrently indexes the data to ElasticSearch, unshortens links and cleans data. Have a look at the implementation here.



Unifying Data from Different Scrapers of loklak server using Post

The Loklak Server project is a software that scrapes data from different websites through different endpoints. It is difficult to create a single endpoint: for that, a decent design for using multiple scrapers is needed, and multiple changes are required. That is why one of the changes I introduced was the Post class, which acts as both a wrapper and an interface for the data objects of the search scrapers (though its implementation in the scrapers is still in progress).

Post is a subclass of JSONObject, which helps in working with JSON data in Java. In other words, Post is a JSONObject with an identity (we call it postId) and a timestamp of the data scraped. It is used to capture data fetched by the web scrapers. The benefit of JSONObject as the superclass is that it provides methods to capture and access data efficiently.

Why Post?

At present there is a class MessageEntry which is the superclass of TwitterTweet (the data object of TwitterScraper). It has numerous methods that can be used by data objects to clean and analyse data. But it has a disadvantage: it is specialized for social websites like Twitter, and becomes redundant for different types of websites like Quora, Github, etc.

The Post object, on the other hand, is a small but powerful and flexible object with the ability to deal with data like a JSONObject. It contains getter and setter methods and identity members used to give each Post object a unique identity. It doesn't have any methods for analysis and cleaning of data, but the MessageEntry class' methods can be used for this purpose.
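
A minimal sketch of what such a class might look like, assuming org.json.JSONObject as the superclass; the field and method names here are illustrative, not the exact Loklak implementation:

public class Post extends JSONObject {

    private String postId;            // unique identity of the scraped object
    private long timestamp;           // time at which the data was scraped

    public Post() {
        this.timestamp = System.currentTimeMillis();
    }

    public String getPostId() {
        return this.postId;
    }

    public void setPostId(String postId) {
        this.postId = postId;
        this.put("post_id", postId);  // keep the JSON view in sync with the identity
    }

    public long getTimestamp() {
        return this.timestamp;
    }
}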

Uses of Post Object

When I started working on the Post object, it could be used as a marker interface for data objects. Following are the advantages I came up with:

1) Accessing the data object of any scraper using its variable. And yes, this is the primary reason it is an interface.

2) In addition to accessing the data objects, one can also directly use it to fetch, modify or use data without knowing which scraper it belongs to. This feature is useful in the Timeline iterator.

This is an example of how the Post interface is used to append two lists of Posts (maybe carrying different types of data) into one.

public void mergePost(PostTimeline list) {
    for (Post post: list) {
        this.add(post);
    }
}

 

Post as a wrapper object

While working on the Post object, I converted it into a class to also use it as a wrapper. But why a wrapper? A wrapper can be used to wrap a list of Post objects into one object. It doesn't have any identity or timestamp. It is just a utility to dump a pack of data objects with homogeneous attributes.

This is an example implementation of the Post object as a wrapper. typeArray is a wrapper which is used to store 2 arrays of data objects. These data object arrays are timeline objects that are saved as JSONArray objects in the Post wrapper.

    Post typeArray = new Post(true);
    switch(type) {
        case "users":
            typeArray.put("users", scrapeProfile(br, url).toArray());
            break;
        case "question":
            typeArray.put("question", scrapeQues(br, url).toArray());
            break;
        default:
            break;
    }

 


 


Shrinking Model Classes Boilerplate in Open Event Android Projects Using Jackson and Lombok

JSON is the de facto standard format used for REST API communication, and for consuming any such API in Android apps like the Open Event Android Client and Organiser App, we need Plain Old Java Objects, or POJOs, to map the JSON attributes to class properties. These are called models, for they model the API response or request. The basic structure of these models contains

  • Private properties representing JSON attributes
  • Getters and Setters for these properties used to change the object or access its data
  • A toString() method which converts object state to a string, useful for logging and debugging purposes
  • equals() and hashCode() methods if we want to compare two objects

These can easily and automatically be generated by any modern IDE, but they add unnecessarily to the code base for a relatively simple model class, and are also difficult to maintain. If you add, remove, or rename a property, you have to change its getters/setters, toString and the other standard data class methods.

There are a couple of ways to handle it:

  • Google's Auto Value: Creates immutable class builders and creators, with all standard methods. But, as it generates a new class for the model, you need to add a library to make it work with JSON parsers and Retrofit. Secondly, there is no way to change the object attributes, as they are immutable. It is generally a good practice to make your models immutable, but if you are manipulating their data in your application and saving it in your database, that won't be possible except by creating a copy of the object. Also, not all database storage libraries support immutable models
  • Kotlin's Data Classes: Kotlin has a nice way to make models using data classes, but it has certain limitations too. Only parameters in the primary constructor will be included in the generated data methods, and for creating a no-argument constructor (required by certain database libraries and annotation processors), you either need to assign a default value to each property or call the primary constructor filling in all the default values, which is a boilerplate of its own. Secondly, sometimes 3rd party libraries are needed to make data classes work correctly with some JSON parsing frameworks, and you probably don't want to include Kotlin in your project just for data classes
  • Lombok: We’ll be talking about it later in this blog
  • Immutables, Xtend Lang, etc

This is one kind of boilerplate; the other kind is JSON property names. As we know, Java uses camelCase notation for properties, while JSON attributes are mostly:

  • Snake Cased: property_name
  • Kebab Cased: property-name

Whether you are using GSON, Jackson or any other JSON parsing framework, it works great for non-ambiguous property names which are the same in both JSON and Java, but it requires special naming attributes for translating JSON attributes to camelCase and vice versa. Wouldn't you want your parser to intelligently convert property_name to propertyName without you having to write the tedious mapping, which is not only boilerplate, but also error prone in case your API changes and you forget to update the annotations, or you make a spelling mistake, as they are just non-type-safe strings?

These boilerplates cause serious regressions during development for what should be a simple Java model of a simple API response. The two kinds of boilerplate are also related, as all JSON parsers look for getters and setters for private fields, so there are two layers of potential errors in modeling JSON to Java models. This should be a lot simpler than it is. So, in this blog, we'll see how we can configure our project to be free of this boilerplate and less error prone. We reduced approximately 70% of the boilerplate using this configuration in our projects. For example, the Event model class we had (our biggest) shrank from 590 lines to just 74!

We will use a simpler class for our example here:

public class CallForPapers {

    private String announcement;
    @JsonProperty("starts-at")
    private String startsAt;
    private String privacy;
    @JsonProperty("ends-at")
    private String endsAt;

    // Getters and Setters

    @Override
    public String toString() {
        return "CallForPapers{" +
                "announcement='" + announcement + '\'' +
                ", startsAt='" + startsAt + '\'' +
                ", privacy='" + privacy + '\'' +
                ", endsAt='" + endsAt + '\'' +
                '}';
    }
}

 

Note that getters and setters have been omitted for brevity. The actual class is 57 lines long

As you can see, we are using the @JsonProperty annotation to map the starts-at attribute to the startsAt property, and similarly for endsAt. First, we'll remove this boilerplate from our code. Note that this seems a bit overkill for 2 attributes, but imagine the time you'll save by not having to maintain the mappings for 100s of attributes across the whole project.

Jackson is smart enough to map different naming styles to one another in both serializing and deserializing. This is done using a naming strategy class in Jackson. There is an option to configure it globally, but I found that it did not work for our case, so we had to apply it to each model. This is done simply by adding another annotation on top of the class declaration and removing the @JsonProperty annotations from the fields.

@JsonNaming(PropertyNamingStrategy.KebabCaseStrategy.class)
public class CallForPapers {

    private String announcement;
    private String startsAt;
    private String privacy;
    private String endsAt;

    // Getters and Setters

    @Override
    public String toString() {
        return "CallForPapers{" +
                "announcement='" + announcement + '\'' +
                ", startsAt='" + startsAt + '\'' +
                ", privacy='" + privacy + '\'' +
                ", endsAt='" + endsAt + '\'' +
                '}';
    }
}

 

Our class looks like this now. But be careful to properly name your getters and setters, because now Jackson will map attributes by method names. So if you name the setter for startsAt setStartsAt, it will automatically understand that the attribute to be mapped is "starts-at". But if the method name is something else, it won't be able to correctly map the fields. If your properties are not private, Jackson may use them directly to map the fields, so be sure to name your public properties correctly as well.

Note: If your API does not use kebab case, there are plenty of other naming strategies present in Jackson; one example is

  • PropertyNamingStrategy.SnakeCaseStrategy for attributes like “starts_at”

 Needless to say, this will only work if your API uses a uniform naming strategy.
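
For instance, if the API returned starts_at instead of starts-at, the class-level annotation would simply change to:

@JsonNaming(PropertyNamingStrategy.SnakeCaseStrategy.class)
public class CallForPapers {
    // properties stay in camelCase; Jackson maps starts_at to startsAt automatically
}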

Now we have removed quite a burden from the development lifecycle, but 70% of the class is still getters, setters, toString and other data methods. Next, we'll configure Lombok to automatically generate these for us. First, we need to add Lombok to our project by adding a provided dependency in build.gradle and syncing the project

provided 'org.projectlombok:lombok:1.16.18'

 

Now you'd want to install the Lombok plugin in Android Studio by going to File > Settings > Plugins, searching for and installing Lombok Plugin, and restarting the IDE

 

After you have restarted the IDE, navigate to your model, add the @Data annotation at the top of the class and remove all getters/setters, toString, equals and hashCode. If the plugin was installed correctly and Lombok was picked up from the Gradle dependencies, these will be automatically generated for you at build time without any problem. One way to see the generated methods is to open the Structure view of the class in the Project window.

 

There are many more fun tools in Lombok, and more fine-grained control options are provided. Our class looks like this now

@Data
@JsonNaming(PropertyNamingStrategy.KebabCaseStrategy.class)
public class CallForPapers {
    private String announcement;
    private String startsAt;
    private String privacy;
    private String endsAt;
}

 

Reduced to 16 lines (including imports and package declaration). Now, there are some corner cases that you want to iron out for the integration between Lombok and Jackson to work correctly.

Lombok uses property names for generating its getters and setters. But there's a different convention for handling booleans. For the sake of simplicity, we'll only talk about the primitive boolean here. A primitive boolean property in the standard Java format, for example hasSessions, will generate a getter hasSessions and a setter setHasSessions. Jackson is smart, but it expects a getter named getHasSessions, creating a problem in serialization. Similarly, for a property named isComplete, the generated getter and setter will be isComplete and setComplete, creating a problem in deserialization too. Actually, there are ways Jackson can map boolean values correctly with these getters/setters, but that approach needs to rename the property itself, changing the getters and setters generated by Lombok. There is, however, a way to tell Lombok not to generate this format of getter/setter. You need to create a file named lombok.config in your project's app/ directory and write this in it

lombok.anyConstructor.suppressConstructorProperties = true
lombok.addGeneratedAnnotation = false
lombok.getter.noIsPrefix = true

 

There are some other settings in it that configure it for an Android-specific project

There are some known issues in Android related to Lombok. As Lombok itself is an annotation processor, and there is no fixed order in which annotation processors run, it may create problems with other annotation processors. Dagger had issues with it until they fixed it in a later version. So you might need to check whether any of your libraries depend upon the Lombok-generated code like getters and setters. Certain database libraries do, and Android Data Binding does too. Currently, there is no solution to the problem, as these libraries will throw an error about not finding a getter/setter when they run before Lombok. A possible workaround is to make the properties public, so that instead of using getters and setters, these libraries use the fields directly. This is not a good practice, but as this is a data class and you are already creating getters and setters for all fields, it is not a security vulnerability.

There are tons of options for both Jackson and Lombok, with a lot of features to help the development process, so be sure to check out their documentation.
