Loklak Server is more than a scraper. It also provides services such as link unshortening (the reverse of link shortening) and video URL fetching, along with administrative tasks such as reporting the status of a Loklak deployment (used for analysis during Loklak development). Some of these services are used internally, and the rest are available through HTTP endpoints. A few are still incomplete and under development.
Let's go through some of them and see what they do and how they can be used.
1) VideoUrlService
This service extracts the video from a web page that streams it and outputs links to the video file. It is still in development but already functional. At present it can fetch Twitter video links and output them in different video qualities.
Endpoint: /api/videoUrlService.json
Implementation Example:
curl "api.loklak.org/api/videoUrlService.json?id=https://twitter.com/EXOGlobal/status/886182766970257409&id=https://twitter.com/KMbappe/status/885963850708865025"
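The same request can be assembled programmatically. The following is a minimal sketch, not code from Loklak itself: the class name VideoUrlRequest, the helper names, and the base URL are assumptions made for illustration. It percent-encodes each tweet URL before appending it as an id parameter, which assumes the server decodes query parameters in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

class VideoUrlRequest {

    // Builds the request URL for the videoUrlService endpoint.
    // baseUrl is an assumption (for example "https://api.loklak.org"); use your own deployment.
    static String buildUrl(String baseUrl, List<String> tweetUrls) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/api/videoUrlService.json");
        char separator = '?';
        for (String tweetUrl : tweetUrls) {
            // Every tweet URL becomes one "id" query parameter.
            sb.append(separator).append("id=").append(encode(tweetUrl));
            separator = '&';
        }
        return sb.toString();
    }

    // Percent-encodes a value so that ':' and '/' do not break the query string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "https://twitter.com/EXOGlobal/status/886182766970257409",
                "https://twitter.com/KMbappe/status/885963850708865025");
        System.out.println(buildUrl("https://api.loklak.org", tweets));
    }
}
```

Building the URL in code avoids the shell treating the & between the two id parameters as a command separator.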
2) Link Unshortening Service
This service unshortens links. Websites often use shortened URLs to track Internet users. To prevent this, the link unshortening service expands the link and returns the final, untracked URL to the user.
Currently this service is used in TwitterScraper to unshorten the fetched URLs. It also has methods to get the redirect link and to get the final URL when a link passes through multiple shortened links.
Implementation Example from TwitterScraper.java:
Matcher m = timeline_link_pattern.matcher(text);
if (m.find()) {
    String expanded = RedirectUnshortener.unShorten(m.group(2));
    text = m.replaceFirst(" " + expanded);
    continue;
}
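To illustrate the general idea behind unshortening, here is a minimal sketch that follows redirects one hop at a time until a URL no longer redirects. It is not the actual RedirectUnshortener implementation: the class UnshortenSketch, its method names, the hop limit, and the sho.rt example links are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class UnshortenSketch {

    // Follows redirect hops until no further hop exists or maxHops is reached.
    // nextHop returns the redirect target of a URL, or null if the URL does not redirect.
    static String unshorten(String url, Function<String, String> nextHop, int maxHops) {
        String current = url;
        for (int i = 0; i < maxHops; i++) {
            String target = nextHop.apply(current);
            if (target == null || target.equals(current)) {
                break;
            }
            current = target;
        }
        return current;
    }

    // One HTTP hop: sends a HEAD request without auto-following redirects
    // and returns the (absolute) Location target for 3xx responses.
    static String httpHop(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (code < 300 || code >= 400 || location == null) {
                return null;
            }
            // Resolve relative Location headers against the current URL.
            return new URL(new URL(url), location).toString();
        } catch (IOException e) {
            return null; // treat network failures as "no further redirect"
        }
    }

    public static void main(String[] args) {
        // Offline demo with a hypothetical redirect table instead of real HTTP calls.
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://sho.rt/a", "http://sho.rt/b");
        redirects.put("http://sho.rt/b", "https://example.org/article");
        System.out.println(unshorten("http://sho.rt/a", redirects::get, 10));
        // prints https://example.org/article
        // For real links: unshorten(shortUrl, UnshortenSketch::httpHop, 10)
    }
}
```

Passing the hop lookup as a function keeps the loop testable without network access, and the hop limit guards against redirect cycles.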
It can also be exposed as a service and used directly. New features, such as fetching the featured image from a link, could be added to this service. These ideas are still under discussion, and enthusiastic contributions are most welcome.
3) StatusService
This service outputs data about the configuration of a Loklak Server deployment. To access it, use the API endpoint status.json.
It outputs the following data:
a) The number of messages it scrapes per second, minute, hour, day, etc.
b) The server configuration, such as RAM, assigned memory, used memory, number of CPU cores, CPU load, etc.
c) Other application-related configuration, such as the size and specifications of the ElasticSearch shards, the client request header, the number of running threads, etc.
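As a rough illustration of how the system values in (b) and the thread count in (c) can be read on the JVM, here is a small sketch using only standard Java APIs. It is not taken from the StatusService source, and the class name and map keys are hypothetical, chosen only for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

class SystemStatusSketch {

    // Collects a few JVM and OS figures similar in spirit to what a status endpoint reports.
    // The key names are hypothetical and not Loklak's actual JSON field names.
    static Map<String, Object> collect() {
        Runtime rt = Runtime.getRuntime();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("assigned_memory", rt.maxMemory());                  // max heap in bytes
        status.put("used_memory", rt.totalMemory() - rt.freeMemory());  // used heap in bytes
        status.put("available_processors", rt.availableProcessors());
        status.put("system_load_average", os.getSystemLoadAverage());   // negative if unavailable
        status.put("running_threads", ManagementFactory.getThreadMXBean().getThreadCount());
        return status;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Object> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```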
Endpoint: /api/status.json
Implementation Example:
curl api.loklak.org/api/status.json
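For use from a program rather than the shell, the endpoint can be fetched with the standard HttpURLConnection class. This is only a sketch under stated assumptions: the class StatusClientSketch and its methods are made up for illustration, the base URL must be supplied by the caller, and the response is returned as raw JSON text without parsing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class StatusClientSketch {

    // Reads an entire stream as UTF-8 text and closes it.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    // Fetches <baseUrl>/api/status.json and returns the raw JSON body.
    static String fetchStatus(String baseUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/api/status.json").openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(10000);
        try {
            int code = conn.getResponseCode();
            if (code != 200) {
                throw new IOException("Unexpected HTTP status: " + code);
            }
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StatusClientSketch <baseUrl>");
            return;
        }
        // Example argument (an assumption): https://api.loklak.org
        System.out.println(fetchStatus(args[0]));
    }
}
```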
Resources:
- Code URL Shortener: https://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
- URL Shortening-Hashing in Practice: https://blog.codinghorror.com/url-shortening-hashes-in-practice/
- ElasticSearch: https://www.elastic.co/webinars/getting-started-elasticsearch?elektra=home&storm=sub1
- M3U8 format: https://www.lifewire.com/m3u8-file-2621956
- Fetch Video using PHP: https://stackoverflow.com/questions/10896233/how-can-i-retrieve-youtube-video-details-from-video-url-using-php