URL Unshortening in Java for loklak server

There are many URL shortening services on the internet. They are useful for converting really long URLs into shorter ones. But apart from redirecting to a longer URL, they are often used to track the people visiting those links. One of the components of loklak server is its URL unshortening and redirect resolution service, which keeps websites from tracking the users who follow those links and so improves privacy protection. Here is how this service works in loklak.

Redirect Codes in HTTP

Various standards define 3XX status codes as an indication that the client must perform additional actions to complete the request. These response codes range from 300 to 308, based on the type of redirection.

To check the redirect code of a request, we must first make a request to some URL:

    String urlstring = "http://tinyurl.com/8kmfp";
    HttpRequestBase req = new HttpGet(urlstring);

Next, we configure this request to disable redirects and add a browser-like User-Agent so that websites do not block us as a robot:

    req.setConfig(RequestConfig.custom().setRedirectsEnabled(false).build());
    req.setHeader("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36");

Now we need an HTTP client to execute this request. Here, we will use Apache’s CloseableHttpClient:

    CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionManager(getConnctionManager(true))
            .setDefaultRequestConfig(defaultRequestConfig)
            .build();

The getConnctionManager method returns a pooling connection manager that can reuse existing TCP connections, which makes the requests very fast. It is defined in org.loklak.http.ClientConnection.

Now we have a client and a request. Let’s make our client execute the request to get an HTTP entity that we can work on:

    HttpResponse httpResponse = httpClient.execute(req);
    HttpEntity httpEntity = httpResponse.getEntity();

Now that we have executed the request, we can check the status code of the response by calling the corresponding method:

    if (httpEntity != null) {
        int httpStatusCode = httpResponse.getStatusLine().getStatusCode();
        System.out.println("Status code - " + httpStatusCode);
    } else {
        System.out.println("Request failed");
    }

Hence, we have the HTTP status code for the requests we make.

Getting the Redirect URL

We can simply check the value of the status code to decide whether we have a redirect or not. In the case of a redirect, we can check the “Location” header to know where it redirects.

    if (300 <= httpStatusCode && httpStatusCode <= 308) {
        for (Header header: httpResponse.getAllHeaders()) {
            if (header.getName().equalsIgnoreCase("location")) {
                redirectURL = header.getValue();
            }
        }
    }

Handling Multiple Redirects

We now know how to get the redirect for a URL. But in many cases, URLs redirect multiple times before reaching a final, stable location. To handle these situations, we can repeatedly fetch the redirect URL for intermediate links until the URL stops changing. But we also need to take care of cyclic redirects, so we set a threshold on the number of redirects that we follow:

    String urlstring = "http://tinyurl.com/8kmfp";
    int termination = 10;
    while (termination-- > 0) {
        String unshortened = getRedirect(urlstring);
        if (unshortened.equals(urlstring)) {
            return urlstring;
        }
        urlstring = unshortened;
    }

Here, getRedirect is the method which performs a single redirect for a URL and returns the same URL in case…
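
The redirect-following loop above can be tried without any network access if the single-hop lookup is passed in as a function. The following is only a minimal sketch under that assumption: the class name UnshortenSketch, the methods isRedirect and unshorten, and the in-memory redirect table are illustrative and not part of loklak, and singleHop merely stands in for getRedirect.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper class for illustration; not taken from loklak.
class UnshortenSketch {

    // Same range check as in the article: 300 to 308 counts as a redirect.
    static boolean isRedirect(int statusCode) {
        return 300 <= statusCode && statusCode <= 308;
    }

    // Follows redirects until the URL stops changing or maxHops is used up.
    // singleHop stands in for getRedirect: it returns the next URL, or the
    // same URL when there is no further redirect.
    static String unshorten(String url, UnaryOperator<String> singleHop, int maxHops) {
        String current = url;
        for (int hops = 0; hops < maxHops; hops++) {
            String next = singleHop.apply(current);
            if (next == null || next.equals(current)) {
                return current; // stable location reached
            }
            current = next;
        }
        // Hop limit reached (for example a cyclic redirect): give up here.
        return current;
    }

    public static void main(String[] args) {
        // Made-up redirect table that replaces real HTTP requests.
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://short.example/1", "http://mid.example/page");
        redirects.put("http://mid.example/page", "http://final.example/article");

        UnaryOperator<String> hop = u -> redirects.getOrDefault(u, u);
        System.out.println(unshorten("http://short.example/1", hop, 10));
    }
}
```

Passing the hop function in keeps the termination logic separate from the HTTP client, so the threshold for cyclic redirects can be checked with a small table before wiring in real requests.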
