SUSPER is a search interface that uses the P2P search engine YaCy. Search results are displayed using the Solr server that is embedded in YaCy, and they are retrieved through the YaCy search API. When a search request is made in one of the search templates, an HTTP request is sent to YaCy and the response comes back as JSON. In this blog post I will show how to set up YaCy Grid locally.
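
As a rough illustration of that request and response cycle, the sketch below builds a search URL and pulls titles and links out of a JSON reply. The address `localhost:8090`, the `yacysearch.json` endpoint with its `query` and `maximumRecords` parameters, and the `channels`/`items` layout of the reply are assumptions based on a default YaCy installation, not details taken from this post. The helper names `build_search_url` and `extract_results` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed default address of a local YaCy peer.
YACY_BASE = "http://localhost:8090"


def build_search_url(query, rows=10, base=YACY_BASE):
    """Return the URL for a JSON search request (endpoint name is an assumption)."""
    params = urlencode({"query": query, "maximumRecords": rows})
    return f"{base}/yacysearch.json?{params}"


def extract_results(payload):
    """Collect (title, link) pairs from a decoded JSON response."""
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


if __name__ == "__main__":
    # A hand-written sample reply, so the sketch runs without a YaCy peer.
    sample = json.loads(
        '{"channels": [{"items": [{"title": "YaCy", "link": "https://yacy.net"}]}]}'
    )
    print(build_search_url("yacy grid"))
    print(extract_results(sample))
```

In a real client the URL would be fetched over HTTP and the body decoded with `json.loads` before being handed to `extract_results`.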
What is YaCy Grid?
The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. The required storage functions of the YaCy Grid are:
- An asset storage, basically a file-sharing environment for YaCy components. An FTP server is used for asset storage.
- A message system providing an Enterprise Integration Framework using message-oriented middleware. RabbitMQ message queues are used for the message system.
- A database system providing search-engine-related retrieval functions. Elasticsearch is used for database operations.
How to set up YaCy Grid locally?
YaCy Grid has four components: MCP (Master Connect Program), Loader, Crawler and Parser.
- Clone all the components using the `--recursive` flag.
- Starting YaCy Grid requires starting Elasticsearch, RabbitMQ with username `anonymous` and password `yacy`, and an FTP server (which can be omitted, as the MCP can take over).
- All the above steps can also be done in a single step by running the Python script `run_all.py` in the `bin` folder.
- How `run_all.py` in yacy_grid_mcp works:
  - Checks whether Elasticsearch is running and starts it if it is not.
  - Checks whether RabbitMQ is running. If it is, the user is asked to press Y to configure it for the YaCy Grid setup, or to skip this step. If it is not, RabbitMQ is started with the required configuration.
  - Updates all the Grid components, including the MCP.
  - Checks for an FTP server and prints a message accordingly.
  - Runs each component of YaCy Grid in a separate terminal.
Once the script has started everything, YaCy Grid can be used from the terminal.
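
The checks that `run_all.py` performs can be pictured with a minimal sketch like the one below. It is not the actual script: the port numbers (9200 for Elasticsearch, 5672 for RabbitMQ, 2121 for the FTP server) are assumed defaults, and `is_port_open` and `plan_startup` are hypothetical helpers that only decide what to do, without starting anything.

```python
import socket

# Assumed default ports; adjust them to match the local installation.
DEFAULT_PORTS = {"elasticsearch": 9200, "rabbitmq": 5672, "ftp": 2121}


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_startup(status):
    """Map each service's running state to the action described in the post."""
    actions = {}
    actions["elasticsearch"] = "already running" if status["elasticsearch"] else "start"
    # A running RabbitMQ may still need the YaCy Grid user configured.
    actions["rabbitmq"] = "ask to configure" if status["rabbitmq"] else "start and configure"
    # The FTP server is optional because the MCP can take over asset storage.
    actions["ftp"] = "use" if status["ftp"] else "fall back to MCP"
    return actions


if __name__ == "__main__":
    status = {name: is_port_open(port) for name, port in DEFAULT_PORTS.items()}
    for name, action in plan_startup(status).items():
        print(f"{name}: {action}")
```

Probing a TCP port is only one way to detect a running service. It keeps the sketch independent of how each service was installed.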
If a YaCy Grid service has used the MCP once, it learns from the MCP to connect to the infrastructure itself. For example:
- a YaCy Grid service starts up and connects to the MCP
- the Grid service pushes a message to the message queue using the MCP
- the MCP fulfils the message send operation and responds with the actual address of the message broker
- the YaCy Grid service learns the direct connection information
- whenever the YaCy Grid service wants to connect to the message broker again, it can do so using a direct broker connection
This process is transparent: the Grid service does not need to handle such communication details itself, and the routing is done automatically. To use the MCP inside other grid components, the git submodule functionality is used.
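
The learning step described above can be simulated in a few lines. The sketch below is an in-memory toy, not YaCy Grid code: `Broker`, `MCP` and `GridService` are hypothetical classes, and the broker address is a made-up example value. It only shows the idea that the first send is relayed by the MCP and later sends go to the broker directly.

```python
class Broker:
    """Stand-in for the message broker; stores messages per queue."""

    def __init__(self, address):
        self.address = address
        self.queues = {}

    def send(self, queue, message):
        self.queues.setdefault(queue, []).append(message)


class MCP:
    """Stand-in for the MCP: relays the message and reveals the broker address."""

    def __init__(self, broker):
        self.broker = broker
        self.relayed = 0

    def send(self, queue, message):
        self.broker.send(queue, message)
        self.relayed += 1
        return self.broker.address


class GridService:
    """A service that starts with only the MCP and learns the broker address."""

    def __init__(self, mcp, brokers):
        self.mcp = mcp
        self.brokers = brokers  # address -> Broker, simulating the network
        self.broker_address = None

    def send(self, queue, message):
        if self.broker_address is None:
            # First contact: go through the MCP and remember what it reports.
            self.broker_address = self.mcp.send(queue, message)
            return "via MCP"
        # Later calls: connect to the broker directly.
        self.brokers[self.broker_address].send(queue, message)
        return "direct"


if __name__ == "__main__":
    broker = Broker("amqp://localhost:5672")  # example address, an assumption
    service = GridService(MCP(broker), {broker.address: broker})
    print(service.send("crawler", "job 1"))
    print(service.send("crawler", "job 2"))
    print(broker.queues["crawler"])
```

The caller always uses the same `send` method, which is the sense in which the routing is transparent: the switch from the MCP to the direct connection happens inside the service.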
Resources
- YaCy Grid repository: https://github.com/yacy/yacy_grid_mcp
- SUSPER repository: https://github.com/fossasia/susper.com
- PR for run_all.py: https://github.com/yacy/yacy_grid_mcp/pull/31
- Connecting YaCy Grid to SUSPER: https://github.com/fossasia/susper.com/issues/999
- Use of subprocess in Python 2: https://stackoverflow.com/questions/30266166/how-do-you-run-multiple-files-in-multiple-terminal-windows-using-python