Adding Push endpoint to send data from Loklak Search to Loklak Server

To provide enriched and sufficient amount of data to Loklak, Loklak Server should have multiple sources of data. The api/push.json endpoint of loklak server is used in Loklak to post the search result object to server. It will increase the amount and quality of data on server once the Twitter api is supported by Loklak (Work is in progress to add support for twitter api in loklak).

Creating Push Service

The idea is to create a separate service for making a Post request to server. First step would be to create a new ‘PushService’ under ‘services/’ using:

ng g service services/push

Creating model for Push Api Response

Before starting to write code for push service, create a new model for the type of response data obtained from Post request to ‘api/push.json’. For this, create a new file push.ts under ‘models/’ with the code given below and export the respective push interface method in index file.

export interface PushApiResponse {
   status: string;
   records: number;
   mps: number;
   message: string;
}

Writing Post request in Push Service

Next step would be to create a Post request to api/push.json using HttpClient module. Import necessary dependencies and create an object of HttpClient module in constructor and write a PostData() method which would take the data to be send, makes a Post request and returns the Observable of PushApiResponse created above.

import { Injectable } from ‘@angular/core’;
import {
   HttpClient,
   HttpHeaders,
   HttpParams
} from ‘@angular/common/http’;
import { Observable } from ‘rxjs’;
import {
	ApiResponse,
	PushApiResponse
} from ‘../models’;

@Injectable({
   providedIn: ‘root’
})
export class PushService {

   constructor( private http: HttpClient ) { }
   public postData(data: ApiResponse):
   		Observable<PushApiResponse> {

	const httpUrl = ‘https://api.loklak.org/
		api/push.json’;
	const headers = new HttpHeaders({
		‘Content-Type’: ‘application/
			x-www-form-urlencoded’,
		‘Accept’: ‘application/json’,
		‘cache-control’: ‘no-cache’
	});
	const {search_metadata, statuses} = data;
	
	// Converting the object to JSON string.
	const dataToSend = JSON.stringify({
		search_metadata: search_metadata,
		statuses});
	
	// Setting the data to send in
	// HttpParams() with key as ‘data’
	const body = new HttpParams()
		.set(‘data’, dataToSend);
	
	// Making a Post request to api/push.json
	// endpoint. Response Object is converted
	// to PushApiResponse type.
	return this.http.post<PushApiResponse>(
		httpUrl, body, {headers:
		headers
	});
   }
}

 

Note: Data (dataToSend) send to backend should be exactly in same format as obtained from server.

Pushing data into service dynamically

Now the main part is to provide the data to be send into the service. To make it dynamic, import the Push Service in ‘api-search.effects.ts’ file under effects and create the object of Push Service in its constructor.

import { PushService } from ‘../services’;
constructor(
   
   private pushService: PushService
) { }

 

Now, call the pushService object inside ‘relocateAfterSearchSuccess$’ effect method and pass the search response data (payload value of search success action) inside Push Service’s postData() method.

@Effect()
relocateAfterSearchSuccess$: Observable<Action>
   = this.actions$
       .pipe(
           ofType(
               apiAction.ActionTypes
			   	.SEARCH_COMPLETE_SUCCESS,
               apiAction.ActionTypes
			   	.SEARCH_COMPLETE_FAIL
           ),
           withLatestFrom(this.store$),
           map(([action, state]) => {
               this.pushService
			   .postData(action[‘payload’]);
           
       );

Testing Successful Push to Backend

To test the success of Post request, subscribe to the response data and print the response data on console. You should see something like:

Where each of these is a response of one successful Post request.

Resources

Continue ReadingAdding Push endpoint to send data from Loklak Search to Loklak Server

Indexing for multiscrapers in Loklak Server

I recently added multiscraper system which can scrape data from web-scrapers like YoutubeScraper, QuoraScraper, GithubScraper, etc. As scraping is a costly task, it is important to improve it’s efficiency. One of the approach is to index data in cache. TwitterScraper uses multiple sources to optimize the efficiency.

This system uses Post message holder object to store data and PostTimeline (a specialized iterator) to iterate the data objects. This difference in data structures from TwitterScraper leads to the need of different approach to implement indexing of data to ElasticSearch (currently in review process).

These are the following changes I made while implementing ‘indexing of data’ in the project.

1) Writing of data is invoked only using PostTimeline iterator

In TwitterScraper, the data is written in message holder TwitterTweet. So all the tweets are written to index as they are created. Here, when the data is scraped, Writing of the posts is initiated. Scraping of data is considered a heavy process. This approach keeps lower resource usage in average traffic on the server.

protected Post putData(Post typeArray, String key, Timeline2 postList) {
   if(!"cache".equals(this.source)) {
       postList.writeToIndex();
   }
   return this.putData(typeArray, key, postList.toArray());
}

2) One object for holding a message

During the implementation, I kept the same message holder Post and post-iterator PostTimeline from scraping to indexing of data. This helps to keep the structure uniform. Earlier approach involves different types of message wrappers in the way. This approach cuts the processes for looping and transitioning of data structures.

3) Index a list, not a message

In TwitterScraper, as the messages are enqueued in the bulk to be indexed. But in this approach, I have enqueued the complete lists. This approach delays the indexing till the scraper is done with processing the html.

Creating the queue of postlists:

// Add post-lists to queue to be indexed
queueClients.incrementAndGet();
try {
    postQueue.put(postList);
} catch (InterruptedException e) {
DAO.severe(e);
}
queueClients.decrementAndGet();

 

Indexing of the posts in postlists:

// Start indexing of data in post-lists
for (Timeline2 postList: postBulk) {
    if (postList.size() < 1) continue;
    if(postList.dump) {
        // Dumping of data in a file
        writeMessageBulkDump(postList);
    }
    // Indexing of data to ElasticSearch
    writeMessageBulkNoDump(postList);
}

 

4) Categorizing the input parameters

While searching the index, I have divided the query parameters from scraper into 3 categories. The input parameters are added to those categories (implemented using map data structure) and thus data fetched are according to them. These categories are:

// Declaring the QueryBuilder
BoolQueryBuilder query = new BoolQueryBuilder();

 

a) Get the parameter– Get the results for the input fields in map getMap.

// Result must have these fields. Acts as AND operator
if(getMap != null) {
    for(Map.Entry<String, String> field : getMap.entrySet()) {
        query.must(QueryBuilders.termQuery(
field.getKey(), field.getValue()));
    }
}

 

b) Don’t get the parameter- Don’t get the results for the input fields in map notGetMap.

// Result must not have these fields.
if(notGetMap != null) {
    for(Map.Entry<String, String> field : notGetMap.entrySet()) {
        query.mustNot(QueryBuilders.termQuery(
                field.getKey(), field.getValue()));
    }
}

 

c) Get if possible- Get the results with the input fields if they are present in the index.

// Result may preferably also get these fields. Acts as OR operator
if(mayAlsoGetMap != null) {
    for(Map.Entry<String, String> field : mayAlsoGetMap.entrySet()) {
        query.should(QueryBuilders.termQuery(
                field.getKey(), field.getValue()));

    }
}

 

By applying these changes, the scrapers are shifted from a message indexing to list of messages indexing. This way we are keeping load on RAM low, but the aggregation of latest scraped data may be affected. So there will be a need to workaround to solve this issue while scraping itself.

References

Continue ReadingIndexing for multiscrapers in Loklak Server

Uploading Images to SUSI Server

SUSI Skill CMS is a web app to create and modify SUSI Skills. It needs API Endpoints to function and SUSI Server makes it possible. In this blogpost, we will see how to add a servlet to SUSI Server to upload images and files.

The CreateSkillService.java file is the servlet which handles the process of creating new Skills. It requires different user roles to be implemented and hence it extends the AbstractAPIHandler.

Image upload is only possible via a POST request so we will first override the doPost method in this servlet.

  @Override
  protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
  resp.setHeader("Access-Control-Allow-Origin", "*"); // enable CORS

resp.setHeader enables the CORS for the servlet. This is required as POST requests must have CORS enables from the server. This is an important security feature that is provided by the browser.

        Part file = req.getPart("image");
        if (file == null) {
            json.put("accepted", false);
            json.put("message", "Image not given");
        }

Image upload to servers is usually a Multipart Request. So we get the part which is named as “image” in the form data.

When we receive the image file, then we check if the image with the same name exists on the server or not.

Path p = Paths.get(language + File.separator + “images/” + image_name);

        if (image_name == null || Files.exists(p)) {
                json.put("accepted", false);
                json.put("message", "The Image name not given or Image with same name is already present ");
            }

If the same file is present on the server then we return an error to the user requesting to give a unique filename to upload.

Image image = ImageIO.read(filecontent);
BufferedImage bi = this.createResizedCopy(image, 512, 512, true);
if(!Files.exists(Paths.get(language.getPath() + File.separator + "images"))){
   new File(language.getPath() + File.separator + "images").mkdirs();
           }
ImageIO.write(bi, "jpg", new File(language.getPath() + File.separator + "images/" + image_name));

Then we read the content for the image in an Image object. Then we check if images directory exists or not. If there is no image directory in the skill path specified then create a folder named “images”.

We usually prefer square images at the Skill CMS. So we create a resized copy of the image of 512×512 dimensions and save that copy to the directory we created above.

BufferedImage createResizedCopy(Image originalImage, int scaledWidth, int scaledHeight, boolean preserveAlpha) {
        int imageType = preserveAlpha ? BufferedImage.TYPE_INT_RGB : BufferedImage.TYPE_INT_ARGB;
        BufferedImage scaledBI = new BufferedImage(scaledWidth, scaledHeight, imageType);
        Graphics2D g = scaledBI.createGraphics();
        if (preserveAlpha) {
            g.setComposite(AlphaComposite.Src);
        }
        g.drawImage(originalImage, 0, 0, scaledWidth, scaledHeight, null);
        g.dispose();
        return scaledBI;
    }

The function above is used to create a  resized copy of the image of specified dimensions. If the image was a PNG then it also preserves the transparency of the image while creating a copy.

Since the SUSI server follows an API centric approach, all servlets respond in JSON.

       resp.setContentType("application/json");
       resp.setCharacterEncoding("UTF-8");
       resp.getWriter().write(json.toString());’

At last, we set the character encoding and the character set of the output. This helps the clients to parse the data easily.

To see this endpoint in live send a POST request at http://api.susi.ai/cms/createSkill.json.

Resources

Apache Docs: https://commons.apache.org/proper/commons-fileupload/using.html

Multipart POST Request Tutorial: http://www.codejava.net/java-se/networking/upload-files-by-sending-multipart-request-programmatically

Java File Upload tutorial: https://ursaj.com/upload-files-in-java-with-servlet-api

Jetty Project: https://github.com/jetty-project/

Continue ReadingUploading Images to SUSI Server

GET and POST requests

If you wonder how to get or update page resource, you have to read this article.

It’s trivial if you have basic knowledge about HTTP protocol. I’d like to get you little involved to this subject.

So GET and POST are most useful methods in HTTP protocol.

What is HTTP?

Hypertext transfer protocol – allow us to communicate between client and server side. In Open Event project we use web browser as client and for now we use Heroku for server side.

Difference between GET and POST methods

GET – it allows to get data from specified resources

POST – it allows to submit new data to specified resources for example by html form.

GET samples:

For example we use it to get details about event

curl http://open-event-dev.herokuapp.com/api/v2/events/95

Response from server:

Of course you can use this for another needs, If you are a poker player I suppose that you’d like to know how many percentage you have on hand.

curl http://www.propokertools.com/simulations/show?g=he&s=generic&b&d&h1=AA&h2=KK&h3&h4&h5&h6&_

POST samples:

curl -X POST https://example.com/resource.cgi

You can often find this action in a contact page or in a login page.

How does request look in python?

We use Requests library to communication between client and server side. It’s very readable for developers. You can find great documentation  and a lot of code samples on their website. It’s very important to see how it works.

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200

I know that samples are very important, but take a look how Requests library fulfils our requirements in 100%. We have decided to use it because we would like to communicate between android app generator and orga server application. We have needed to send request with params(email, app_name, and api of event url) by post method to android generator resource. It executes the process of sending an email – a package of android application to a provided email address.

data = {
    "email": login.current_user.email,
    "app_name": self.app_name,
    "endpoint": request.url_root + "api/v2/events/" + str(self.event.id)
}
r = requests.post(self.app_link, json=data)

 

Continue ReadingGET and POST requests