elasticsearch – blog.fossasia.org

Integrating YaCy Grid Locally with Susper

Post author:praveenojha33
Post published:July 16, 2018
Post category:FOSSASIA
Post comments:0 Comments

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine.The search results can be improved to a great extent by using YaCy-Grid as the new backend for SUSPER. YaCy Grid is the best choice for distributed search topology. The legacy YaCy is made for decentralised and also distributed network. While both the networks are distributed,the YaCy-Grid is centralized and legacy YaCy is decentralized. YaCy Grid facilitates a lot with scaling that will be in our hand and can be done in all aspects(loading, parsing, indexing) with computing power we choose. In YaCy,Solr is embedded. But in YaCy Grid,we will get elasticsearch cluster.They are both built around the core underlying search library Lucene.But elasticsearch will help us to scale almost indefinitely. In this blog, I will show you how to integrate YaCy Grid with Susper locally and how to use it to fetch results.

Implementing YaCy Grid with Susper:

Before using YaCy Grid we need to first setup YaCy Grid and crawl to url using crawl start API, more information about that can be found here Implementing YaCy Grid with Susper and Setting up YaCy Grid locally.

So, once we are done with setup and crawling, we need to begin using its APIs in Susper. Following are some easy steps in which we can show results from YaCy Grid in a separate tab is Susper.

Step 1:

Creating a service to fetch results:

In order to fetch results from local YaCy Grid server we need to create a service to fetch results from local YaCy Grid server. Here is the class in grid-service.ts which fetches results for us.

export class GridSearchService {
 server = 'http://127.0.0.1:8100';
 searchURL = this.server + '/yacy/grid/mcp/index/yacysearch.json?query=';
 constructor(private http: Http,
             private jsonp: Jsonp,
             private store: Store<fromRoot.State>) {
 }
 getSearchResults(searchquery) { 
   return this.http
     .get(this.searchURL+searchquery).map(res =>
         res.json()
     ).catch(this.handleError);
 }

Step 2:

Modifying results.component.ts file

In order to get results from grid-service.ts in results.component.ts we must need to create an instance of the service and use this instance to get the results and store it in variables results.component.ts file and then use these variables to show results in results template. Following is the code that does this for us

ngOnInit() {
   this.grid.getSearchResults(this.searchdata.query).subscribe(res=>{
     this.gridResult=res.channels;
   });
 }

gridClick(){
   this.getPresentPage(1);
   this.resultDisplay = 'grid';
   this.totalgridresults=this.gridResult[0].totalResults;
   this.gridmessage='About ' + this.totalgridresults + ' results';
   this.gridItems=this.gridResult[0].items;
  
   console.log(this.gridItems);
 }

Step 3:

Creating a New tab to show results from YaCy Grid:

Now we need to create a tab in the template where we can use local variables in results.component.ts to show the results following the current design pattern here is the code for that

<li [class.active_view]="Display('grid')" (click)="gridClick()">YaCy_Grid</li>

<!--YaCy Grid-->
 <div class="container-fluid">
     <div class="result message-bar" *ngIf="totalgridresults > 0 && Display('grid')">
       {{gridmessage}}
     </div>
     <div class="autocorrect">
       <app-auto-correct [hidden]="hideAutoCorrect"></app-auto-correct>
     </div>
   </div>
 <div class="grid-result" *ngIf="Display('grid')">
   <div class="feed container">
       <div *ngFor="let item of gridItems" class="result">
         <div class="title">
           <a class="title-pointer" href="{{item.link}}" [style.color]="themeService.titleColor">{{item.title}}</a>
         </div>
         <div class="link">
           <p [style.color]="themeService.linkColor">{{item.link}}</p>
         </div>
         <div class="description">
           <p [style.color]="themeService.descriptionColor">{{item.pubDate|date:'MMMM d, yyyy'}} - {{item.description}}</p>
         </div>
       </div>
   </div>
 </div>
 <!-- END -->

Step 4:

Starting YaCy Grid Locally:

Now all we need is to start YaCy Grid server locally. To start it go in yacy_grid_mcp folder and use

python bin/start_elasticsearch.py

This will start elasticsearch from its respective script.Next use

python bin/start_rabbitmq.py

This will start RabbitMQ server with the required configuration.Next useThis will start elasticsearch from its respective script.Next use

gradle run

To start YaCy Grid locally.

Now we are all done we just need to start Susper using

ng serve

command and type a search query and move to YaCy_Grid tab to see results from YaCy Grid Server.

Here is the image which shows results from YaCy Grid in Susper

Resources

Adding Susper with YaCy Grid Link to Commit
Implementing Susper with YaCy Grid Link to Issue
Setting up YaCy Grid locally Link to Blog
YaCy Grid Repository Link to Repository
YaCy Grid running with Susper Link to Video

Setting up YaCy Grid locally

Post author:praveenojha33
Post published:June 12, 2018
Post category:FOSSASIA GSoC
Post comments:0 Comments

SUSPER is a search interface that uses P2P search engine YaCy . Search results are displayed using Solr server which is embedded into YaCy. The retrieval of search results is done using YaCy search API. When a search request is made in one of the search templates, an HTTP request is made to YaCy and the response is done in JSON. In this blog post I will show how to setup YaCy Grid locally.

What is YaCy Grid ?

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. The required storage functions of the YaCy Grid are:

An asset storage, basically a file sharing environment for YaCy components,an ftp server is used for asset storage.
A message system providing an Enterprise Integration Framework using a message-oriented middleware,RabbitMQ message queues for the message system.
A database system providing search-engine related retrieval functions.It uses Elasticsearch for database operations.

How to setup YaCy Grid locally ?

YaCy Grid have 4 components MCP(Master Connect Program), Loader, Crawler and Parser.

Clone all the components using –recursive flag.

git clone --recursive https://github.com/yacy/yacy_grid_mcp.git
git clone --recursive https://github.com/yacy/yacy_grid_parser.git
git clone --recursive https://github.com/yacy/yacy_grid_crawler.git
git clone --recursive https://github.com/yacy/yacy_grid_loader.git

Now to starting YaCy Grid requires starting Elasticsearch, RabbitMQ with Username `anonymous` and Password `yacy` and an ftp server(it can be omitted as MCP can take over).
All the above steps can also be done in a single step by running a python script in `bin` folder `run_all.py`
Working of `run_all.py` in yacy_grid_mcp:

if not checkportopen(9200):
   print "Elasticsearch is not running"
   mkapps()
   elasticversion = 'elasticsearch-5.6.5'
   if not os.path.isfile(path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz'):
       print('Downloading ' + elasticversion)
       urllib.urlretrieve ('https://artifacts.elastic.co/downloads/elasticsearch/' + elasticversion + '.tar.gz', path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz')
   if not os.path.isdir(path_apphome + '/data/mcp-8100/apps/elasticsearch'):
       print('Decompressing' + elasticversion)
       os.system('tar xfz ' + path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz -C ' + path_apphome + '/data/mcp-8100/apps/')
       os.rename(path_apphome + '/data/mcp-8100/apps/' + elasticversion, path_apphome + '/data/mcp-8100/apps/elasticsearch')
   # run elasticsearch
   print('Running Elasticsearch')
   os.chdir(path_apphome + '/data/mcp-8100/apps/elasticsearch/bin')
   os.system('nohup ./elasticsearch &')

Checks whether Elasticsearch is running or not, if not then runs Elasticsearch.

if checkportopen(15672):
   print "RabbitMQ is Running"
   print "If you have configured it according to YaCy setup press N"
   print "If you have not configured it according to YaCy setup or Do not know what to do press Y"
   n=raw_input()
   if(n=='Y' or n=='y'):
       os.system('service rabbitmq-server stop')
       
if not checkportopen(15672):
   print "rabbitmq is not running"
   os.system('python bin/start_rabbitmq.py')

Checks whether RabbitMQ is running or not, if yes then asks user to configure it according to YaCy Grid setup by pressing Y or else ignore,if not then starts RabbitMQ according to required configuration.

subprocess.call('bin/update_all.sh')

.Updates all the Grid components including MCP.

if not checkportopen(2121):
   print "ftp server is not Running"

Checks for an ftp server and prints message accordingly.

def run_mcp():
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_loader():
   os.system('cd ../yacy_grid_loader')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_crawler():
   os.system('cd ../yacy_grid_crawler')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_parser():
   os.system('cd ../yacy_grid_parser')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

Runs all components of YaCy Grid in separate terminal.

Once user starts it, then he can start using YaCy Grid through terminal.

If a YaCy Grid service has used the MCP once, it learns from the MCP to connect to the infrastructure itself. For example:

a YaCy Grid service starts up and connects to the MCP
the Grid service pushes a message to the message queue using the MCP
the MCP fulfils the message send operation and response with the actual address of the message broker
the YaCy Grid service learns the direct connection information
whenever the YaCy Grid service wants to connect to the message broker again, it can do so using a direct broker connection. This process is done transparently, the Grid service does not need to handle such communication details itself. The routing is done automatically. To use the MCP inside other grid components the git submodule functionality is used.

Resources

YaCy Grid repository https://github.com/yacy/yacy_grid_mcp
SUSPER repository https://github.com/fossasia/susper.com
PR for run_all.py https://github.com/yacy/yacy_grid_mcp/pull/31
Connecting YaCy Grid to SUSPER https://github.com/fossasia/susper.com/issues/999
Use of subprocess in Python2 https://stackoverflow.com/questions/30266166/how-do-you-run-multiple-files-in-multiple-terminal-windows-using-python

Open Event Server: Creating/Rebuilding Elasticsearch Index From Existing Data In a PostgreSQL DB Using Python

Post author:shubham-padia
Post published:November 1, 2017
Post category:FOSSASIA Open Event
Post comments:0 Comments

The Elasticsearch instance in the current Open Event Server deployment is currently just used to store the events and search through it due to limited resources.

The project uses a PostgreSQL database, this blog will focus on setting up a job to create the events index if it does not exist. If the indices exists, the job will delete all the previous the data and rebuild the events index.

Although the project uses Flask framework, the job will be in pure python so that it can run in background properly while the application continues its work. Celery is used for queueing up the aforementioned jobs. For building the job the first step would be to connect to our database:

from config import Config
import psycopg2
conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
cur = conn.cursor()

The next step would be to fetch all the events from the database. We will only be indexing certain attributes of the event which will be useful in search. Rest of them are not stored in the index. The code given below will fetch us a collection of tuples containing the attributes mentioned in the code:

cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()

We will be using the the bulk API, which is significantly fast as compared to adding an event one by one via the API. Elasticsearch-py, the official python client for elasticsearch provides the necessary functionality to work with the bulk API of elasticsearch. The helpers present in the client enable us to use generator expressions to insert the data via the bulk API. The generator expression for events will be as follows:

event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)

We will now delete the events index if it exists. The the event index will be recreated. The generator expression obtained above will be passed to the bulk API helper and the event index will repopulated. The complete code for the function will now be as follows:

@celery.task(name='rebuild.events.elasticsearch')
def cron_rebuild_events_elasticsearch():
   """
   Re-inserts all eligible events into elasticsearch
   :return:
   """
   conn = psycopg2.connect(Config.SQLALCHEMY_DATABASE_URI)
   cur = conn.cursor()
   cur.execute(
       "SELECT id, name, description, searchable_location_name, organizer_name, organizer_description FROM events WHERE state = 'published' and deleted_at is NULL ;")
   events = cur.fetchall()
   event_data = ({'_type': 'event',
                  '_index': 'events',
                  '_id': event_[0],
                  'name': event_[1],
                  'description': event_[2] or None,
                  'searchable_location_name': event_[3] or None,
                  'organizer_name': event_[4] or None,
                  'organizer_description': event_[5] or None}
                 for event_ in events)
   es_store.indices.delete('events')
   es_store.indices.create('events')
   abc = helpers.bulk(es_store, event_data)

Currently we run this job on each week and also on each new deployment. Rebuilding the index is very important as some records may not be indexed when the continuous sync is taking place.

To know more about it please visit https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/

Indexing for multiscrapers in Loklak Server

Post author:Vibhor Verma
Post published:September 4, 2017
Post category:FOSSASIA loklak
Post comments:0 Comments

I recently added multiscraper system which can scrape data from web-scrapers like YoutubeScraper, QuoraScraper, GithubScraper, etc. As scraping is a costly task, it is important to improve it’s efficiency. One of the approach is to index data in cache. TwitterScraper uses multiple sources to optimize the efficiency.

This system uses Post message holder object to store data and PostTimeline (a specialized iterator) to iterate the data objects. This difference in data structures from TwitterScraper leads to the need of different approach to implement indexing of data to ElasticSearch (currently in review process).

These are the following changes I made while implementing ‘indexing of data’ in the project.

1) Writing of data is invoked only using PostTimeline iterator

In TwitterScraper, the data is written in message holder TwitterTweet. So all the tweets are written to index as they are created. Here, when the data is scraped, Writing of the posts is initiated. Scraping of data is considered a heavy process. This approach keeps lower resource usage in average traffic on the server.

protected Post putData(Post typeArray, String key, Timeline2 postList) {
   if(!"cache".equals(this.source)) {
       postList.writeToIndex();
   }
   return this.putData(typeArray, key, postList.toArray());
}

2) One object for holding a message

During the implementation, I kept the same message holder Post and post-iterator PostTimeline from scraping to indexing of data. This helps to keep the structure uniform. Earlier approach involves different types of message wrappers in the way. This approach cuts the processes for looping and transitioning of data structures.

3) Index a list, not a message

In TwitterScraper, as the messages are enqueued in the bulk to be indexed. But in this approach, I have enqueued the complete lists. This approach delays the indexing till the scraper is done with processing the html.

Creating the queue of postlists:

// Add post-lists to queue to be indexed
queueClients.incrementAndGet();
try {
    postQueue.put(postList);
} catch (InterruptedException e) {
DAO.severe(e);
}
queueClients.decrementAndGet();

Indexing of the posts in postlists:

// Start indexing of data in post-lists
for (Timeline2 postList: postBulk) {
    if (postList.size() < 1) continue;
    if(postList.dump) {
        // Dumping of data in a file
        writeMessageBulkDump(postList);
    }
    // Indexing of data to ElasticSearch
    writeMessageBulkNoDump(postList);
}

4) Categorizing the input parameters

While searching the index, I have divided the query parameters from scraper into 3 categories. The input parameters are added to those categories (implemented using map data structure) and thus data fetched are according to them. These categories are:

// Declaring the QueryBuilder
BoolQueryBuilder query = new BoolQueryBuilder();

a) Get the parameter– Get the results for the input fields in map getMap.

// Result must have these fields. Acts as AND operator
if(getMap != null) {
    for(Map.Entry<String, String> field : getMap.entrySet()) {
        query.must(QueryBuilders.termQuery(
field.getKey(), field.getValue()));
    }
}

b) Don’t get the parameter- Don’t get the results for the input fields in map notGetMap.

// Result must not have these fields.
if(notGetMap != null) {
    for(Map.Entry<String, String> field : notGetMap.entrySet()) {
        query.mustNot(QueryBuilders.termQuery(
                field.getKey(), field.getValue()));
    }
}

c) Get if possible- Get the results with the input fields if they are present in the index.

// Result may preferably also get these fields. Acts as OR operator
if(mayAlsoGetMap != null) {
    for(Map.Entry<String, String> field : mayAlsoGetMap.entrySet()) {
        query.should(QueryBuilders.termQuery(
                field.getKey(), field.getValue()));

    }
}

By applying these changes, the scrapers are shifted from a message indexing to list of messages indexing. This way we are keeping load on RAM low, but the aggregation of latest scraped data may be affected. So there will be a need to workaround to solve this issue while scraping itself.

References

Match query with “operator”:”and” via the Java API: https://discuss.elastic.co/t/match-query-with-operator-and-via-the-java-api/67863/2
How to use BoolQueryBuilder: https://stackoverflow.com/questions/40923945/how-to-add-bool-query-inside-a-should-must-method-in-java-api

Creating A Dockerfile For Yacy Grid MCP

Post author:harshit98
Post published:August 22, 2017
Post category:FOSSASIA GSoC
Post comments:1 Comment

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. A YaCy Grid installation consists of a set of micro-services which communicate with each other using a common infrastructure for data persistence. The task was to deploy the second-generation of YaCy Grid. To do so, we first had created a Dockerfile. This dockerfile should start the micro services such as rabbitmq, Apache ftp and elasticsearch in one docker instance along with MCP. The microservices perform following tasks:

Apache ftp server for asset storage.
RabbitMQ message queues for the message system.
Elasticsearch for database operations.

To launch these microservices using Dockerfile, we referred to following documentations regarding running these services locally: https://github.com/yacy/yacy_grid_mcp/blob/master/README.md

For creating a Dockerfile we proceeded as follows:

FROMubuntu:latest
MAINTAINERHarshit Prasad# Update
RUNapt-get update
RUNapt-get upgrade -y# add packages
 # install jdk package for java
RUN apt-get install -y git openjdk-8-jdk
#install gradle required for build

RUN apt-get update && apt-get install -y software-properties-common

RUN add-apt-repository ppa:cwchien/gradle

RUN apt-get update

RUN apt-get install -y wget

RUN wget https://services.gradle.org/distributions/gradle-3.4.1-bin.zip

RUN mkdir /opt/gradle

RUN apt-get install -y unzip

RUN unzip -d /opt/gradle gradle-3.4.1-bin.zip

RUN PATH=$PATH:/opt/gradle/gradle-3.4.1/bin

ENV GRADLE_HOME=/opt/gradle/gradle-3.4.1

ENV PATH=$PATH:$GRADLE_HOME/bin

RUN gradle -v
# install apache ftp server 1.1.0

RUN wget http://www-eu.apache.org/dist/mina/ftpserver/1.1.0/dist/apache-ftpserver-1.1.0.tar.gz

RUN tar xfz apache-ftpserver-1.1.0.tar.gz
# install RabbitMQ server

RUN wget https://www.rabbitmq.com/releases/rabbitmq-server/v3.6.6/rabbitmq-server-generic-unix-3.6.6.tar.xz

RUN tar xf rabbitmq-server-generic-unix-3.6.6.tar.xz
# install erlang language for RabbitMQ

RUN apt-get install -y erlang
# install elasticsearch

RUN wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.0.tar.gz

RUN sha1sum elasticsearch-5.5.0.tar.gz

RUN tar -xzf elasticsearch-5.5.0.tar.gz
# clone yacy_grid_mcp repository

RUN git clone https://github.com/nikhilrayaprolu/yacy_grid_mcp.git

WORKDIR /yacy_grid_mcp
RUN cat docker/config–ftp.properties > ../apache–ftpserver–1.1.0/res/conf/users.properties
# compile

RUN gradle build

RUN mkdir data/mcp-8100/conf/ -p

RUN cp docker/config-mcp.properties data/mcp-8100/conf/config.properties

RUN chmod +x ./docker/start.sh
# Expose web interface ports

 # 2121: ftp, a FTP server to be used for mass data / file storage

 # 5672: rabbitmq, a rabbitmq message queue server to be used for global messages, queues and stacks

 # 9300: elastic, an elasticsearch server or main cluster address for global database storage

 EXPOSE 2121 5672 9300 9200 15672 8100
# Define default command.

ENTRYPOINT [“/bin/bash”, “./docker/start.sh”]

We have created a start.sh file to start RabbitMQ and Apache FTP services. At the end, for compilation gradle run will be executed.

adduser –disabled-password –gecos ” r

 adduser r sudo

echo ‘%sudo ALL=(ALL) NOPASSWD:ALL’ >> /etc/sudoers

 chmod a+rwx /elasticsearch-5.5.0 -R

 su -m r -c ‘/elasticsearch-5.5.0/bin/elasticsearch -Ecluster.name=yacygrid &’

cd /apache–ftpserver–1.1.0

./bin/ftpd.sh res/conf/ftpd–typical.xml &

/rabbitmq_server-3.6.6/sbin/rabbitmq-server -detached

sleep 5s;

/rabbitmq_server-3.6.6/sbin/rabbitmq-plugins enable rabbitmq_management

/rabbitmq_server–3.6.6/sbin/rabbitmqctl add_user yacygrid password4account

echo [{rabbit, [{loopback_users, []}]}]. >> /rabbitmq_server-3.6.6/etc/rabbitmq/rabbitmq.config

/rabbitmq_server-3.6.6/sbin/rabbitmqctl set_permissions -p / yacygrid “.*” “.*” “.*”

cd /yacy_grid_mcp

sleep 5s;

gradle run

start.sh will first add username and then password. Then it will start RabbitMQ along with Apache FTP. For username and password, we have created a separate files to configure their properties during Docker run which can be found here:

Configuration of FTP server: https://github.com/yacy/yacy_grid_mcp/blob/master/docker/config-ftp.properties
Configuration of MCP service: https://github.com/yacy/yacy_grid_mcp/blob/master/docker/config-mcp.properties

The logic behind running all the microservices in one docker instance was: creating each container for microservice and then link those containers with the help of docker-compose.yml file.

The Dockerfile which we have created was corresponding to one image. Another image was elasticsearch which was linked to this Dockerfile. The latest version of elasticsearch image was already available on their site: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

We configured the docker-compose.yml file according to the reference link provided above. The docker-compose file can be found here: https://github.com/yacy/yacy_grid_mcp/blob/master/docker/docker-compose.yml

The source code for the implementation of whole structure can be found here: https://github.com/yacy/yacy_grid_mcp/tree/master/docker

Resources

Dockerfile official documentation: https://docs.docker.com/engine/reference/builder/
Docker Tutorial series by Romin Irani: https://rominirani.com/docker-tutorial-series-a7e6ff90a023

Deploying loklak Server on Kubernetes with External Elasticsearch

Post author:Pratyush
Post published:August 7, 2017
Post category:FOSSASIA GSoC loklak
Post comments:1 Comment

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

– kubernetes.io

Kubernetes is an awesome cloud platform, which ensures that cloud applications run reliably. It runs automated tests, flawless updates, smart roll out and rollbacks, simple scaling and a lot more.

So as a part of GSoC, I worked on taking the loklak server to Kubernetes on Google Cloud Platform. In this blog post, I will be discussing the approach followed to deploy development branch of loklak on Kubernetes.

New Docker Image

Since Kubernetes deployments work on Docker images, we needed one for the loklak project. The existing image would not be up to the mark for Kubernetes as it contained the declaration of volumes and exposing of ports. So I wrote a new Docker image which could be used in Kubernetes.

The image would simply clone loklak server, build the project and trigger the server as CMD –

FROM alpine:latest

ENV LANG=en_US.UTF-8
ENV JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

WORKDIR /loklak_server

RUN apk update && apk add openjdk8 git bash && \
    git clone https://github.com/loklak/loklak_server.git /loklak_server && \
    git checkout development && \
    ./gradlew build -x test -x checkstyleTest -x checkstyleMain -x jacocoTestReport && \
    # Some Configurations and Cleanups

CMD ["bin/start.sh", "-Idn"]

Resources

Resources

1) Writing of data is invoked only using PostTimeline iterator

2) One object for holding a message

3) Index a list, not a message

4) Categorizing the input parameters

References

Resources

New Docker Image

Building and Pushing Docker Image using Travis

Kubernetes Configurations for loklak

Readiness and Liveness Probes

Ports and Volumes

Load Balancer Service

Kubernetes Configurations for Elasticsearch

Docker Image and Environment Variables

Persistent Index using Persistent Cloud Disk

Exposing Kubernetes to Cluster

Connecting loklak to Kubernetes

Verifying persistence of the Elasticsearch Index

Conclusion

Resources

Fields to Consider while Caching

Type of Cache

Key

Value

Timeout

Cache Hit

Filtering results

Cache Miss

Conclusion

Resources

Application

When is data indexing done?

1) Data is scraped:

2) Data is fetched from backend:

How?

Resources:

1) VideoUrlService

2) Link Unshortening Service

3) StatusService

Resources:

Structure of index

Requesting aggregations using Elasticsearch Java API

Performing simple aggregation for a classifier

Writing AggregationBuilder

Processing results

Performing nested aggregations for different countries

Writing a nested aggregation builder

Processing the results

Conclusion

Resources