Setting up YaCy Grid locally
SUSPER is a search interface that uses the P2P search engine YaCy. Search results are displayed using the Solr server embedded in YaCy, and they are retrieved through the YaCy search API. When a search request is made in one of the search templates, an HTTP request is sent to YaCy and the response is returned as JSON. In this blog post I will show how to set up YaCy Grid locally.
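To make the request described above concrete, here is a minimal sketch that builds a search URL and decodes the JSON reply. It assumes a YaCy peer listening on `localhost:8090` and the `yacysearch.json` endpoint with `query` and `maximumRecords` parameters. The host, port and function names are assumptions for illustration and are not taken from the SUSPER code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a local YaCy peer; adjust to your installation.
YACY_BASE = "http://localhost:8090"


def build_search_url(query, rows=10, base=YACY_BASE):
    """Return the URL of a JSON search request for `query`."""
    params = urlencode({"query": query, "maximumRecords": rows})
    return base + "/yacysearch.json?" + params


def search(query, rows=10, base=YACY_BASE):
    """Send the request and return the decoded JSON document."""
    with urlopen(build_search_url(query, rows, base), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_search_url("yacy grid", 5)` only builds the URL and needs no running peer. `search(...)` performs the actual HTTP request and will raise an error if nothing is listening at the assumed address.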
What is YaCy Grid?
The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. The required storage functions of the YaCy Grid are:
- An asset storage, basically a file-sharing environment for YaCy components. An FTP server is used for asset storage.
- A message system providing an Enterprise Integration Framework using message-oriented middleware. RabbitMQ message queues are used for the message system.
- A database system providing search-engine-related retrieval functions. Elasticsearch is used for database operations.
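Before starting any Grid component it can help to check that these three services are reachable. The following is a minimal sketch, assuming the local ports that `run_all.py` probes later in this post (9200 for Elasticsearch, 15672 for RabbitMQ, 2121 for the FTP server). The helper name `check_port_open` and the `SERVICES` table are illustrative and are not the script's own code.

```python
import socket

# Ports probed by run_all.py in this post; the names are labels only.
SERVICES = {
    "Elasticsearch": 9200,
    "RabbitMQ": 15672,
    "FTP server": 2121,
}


def check_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def report(services=SERVICES):
    """Map each service name to True (reachable) or False."""
    return {name: check_port_open(port) for name, port in services.items()}
```

Calling `report()` returns a dictionary such as one entry per service with a boolean value, which is enough to decide which services still have to be started.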
How to set up YaCy Grid locally?
YaCy Grid has four components: MCP (Master Connect Program), Loader, Crawler and Parser.
- Clone all the components using the `--recursive` flag.
    git clone --recursive https://github.com/yacy/yacy_grid_mcp.git
    git clone --recursive https://github.com/yacy/yacy_grid_parser.git
    git clone --recursive https://github.com/yacy/yacy_grid_crawler.git
    git clone --recursive https://github.com/yacy/yacy_grid_loader.git
- Starting YaCy Grid requires a running Elasticsearch, a running RabbitMQ with username `anonymous` and password `yacy`, and an FTP server (the FTP server can be omitted because the MCP can take over).
- All of the above steps can also be done in a single step by running the Python script `run_all.py` in the `bin` folder.
- How `run_all.py` in yacy_grid_mcp works:
    if not checkportopen(9200):
        print "Elasticsearch is not running"
        mkapps()
        elasticversion = 'elasticsearch-5.6.5'
        if not os.path.isfile(path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz'):
            print('Downloading ' + elasticversion)
            urllib.urlretrieve ('https://artifacts.elastic.co/downloads/elasticsearch/' + elasticversion + '.tar.gz', path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz')
        if not os.path.isdir(path_apphome + '/data/mcp-8100/apps/elasticsearch'):
            print('Decompressing' + elasticversion)
            os.system('tar xfz ' + path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz -C ' + path_apphome + '/data/mcp-8100/apps/')
            os.rename(path_apphome + '/data/mcp-8100/apps/' + elasticversion, path_apphome + '/data/mcp-8100/apps/elasticsearch')
        # run elasticsearch
        print('Running Elasticsearch')
        os.chdir(path_apphome + '/data/mcp-8100/apps/elasticsearch/bin')
        os.system('nohup ./elasticsearch &')
- Checks whether Elasticsearch is running. If it is not, the script downloads and unpacks it when necessary and then starts it.
    if checkportopen(15672):
        print "RabbitMQ is Running"
        print "If you have configured it according to YaCy setup press N"
        print "If you have not configured it according to YaCy setup or Do not know what to do press Y"
        n=raw_input()
        if(n=='Y' or n=='y'):
            os.system('service rabbitmq-server stop')

    if not checkportopen(15672):
        print "rabbitmq is not running"
        os.system('python bin/start_rabbitmq.py')
- Checks whether RabbitMQ is running. If it is, the script asks whether it should be reconfigured for the YaCy Grid setup (press Y) or left as it is (press N). If RabbitMQ is not running, the script starts it with the required configuration.
    subprocess.call('bin/update_all.sh')
- Updates all the Grid components, including the MCP.
    if not checkportopen(2121):
        print "ftp server is not Running"
- Checks for an FTP server on port 2121 and prints a message if none is running.
    def run_mcp():
        subprocess.call(['gnome-terminal', '-e', "gradle run"])

    def run_loader():
        # Note: 'cd' inside os.system() only affects that child shell,
        # not the working directory of this script.
        os.system('cd ../yacy_grid_loader')
        subprocess.call(['gnome-terminal', '-e', "gradle run"])

    def run_crawler():
        os.system('cd ../yacy_grid_crawler')
        subprocess.call(['gnome-terminal', '-e', "gradle run"])

    def run_parser():
        os.system('cd ../yacy_grid_parser')
        subprocess.call(['gnome-terminal', '-e', "gradle run"])
- Runs each component of YaCy Grid in a separate terminal.
Once all components have been started, you can use YaCy Grid from the terminal.
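One detail of the `run_*` functions is worth a closer look: `os.system('cd ...')` changes the directory only inside a short-lived child shell, so the `gradle run` that follows still starts from the script's own working directory. A possible alternative, shown here only as a sketch and not as the code in `run_all.py`, is to pass the target directory through the `cwd` argument of `subprocess.call`. The `COMPONENTS` list, the function names and the injectable `runner` parameter are assumptions for illustration, and the four repositories are assumed to be cloned side by side.

```python
import os
import subprocess

# Directory names follow the repositories cloned earlier in this post.
COMPONENTS = [
    "yacy_grid_mcp",
    "yacy_grid_loader",
    "yacy_grid_crawler",
    "yacy_grid_parser",
]


def launch_command():
    """Command that opens a new terminal running `gradle run`."""
    return ["gnome-terminal", "-e", "gradle run"]


def start_component(name, parent_dir, runner=subprocess.call):
    """Start one component from its own directory; return runner's result."""
    workdir = os.path.join(parent_dir, name)
    return runner(launch_command(), cwd=workdir)


def start_all(parent_dir, runner=subprocess.call):
    """Start every component in list order, with the MCP first."""
    return [start_component(name, parent_dir, runner) for name in COMPONENTS]
```

With the default `runner`, `start_all("..")` run from inside `yacy_grid_mcp` would open four terminals, one per component directory. Passing a different `runner` makes it possible to try the logic without opening any terminal.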
If a YaCy Grid service has used the MCP once, it learns from the MCP to connect to the infrastructure itself. For example:
- a YaCy Grid service starts up and connects to the MCP
- the Grid service pushes a message to the message queue using the MCP
- the MCP fulfils the message send operation and responds with the actual address of the message broker
- the YaCy Grid service learns the direct connection information
- whenever the YaCy Grid service wants to connect to the message broker again, it can do so using a direct broker connection.
This process is transparent: the Grid service does not need to handle such communication details itself, because the routing is done automatically. To use the MCP inside other Grid components, the git submodule functionality is used.
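The learning steps listed above can be sketched as a small simulation. Everything in it is hypothetical: `FakeMCP`, `FakeBroker`, `GridService` and the broker address are stand-ins invented for illustration and do not reflect the actual YaCy Grid classes or protocol. The sketch only shows the idea that the first message goes through the MCP and later messages go straight to the broker.

```python
class FakeMCP:
    """Stand-in for the MCP: relays a message and reveals the broker address."""

    def __init__(self, broker_address):
        self.broker_address = broker_address
        self.relayed = []

    def send(self, queue, message):
        self.relayed.append((queue, message))
        return {"broker": self.broker_address}


class FakeBroker:
    """Stand-in for a direct connection to the message broker."""

    def __init__(self):
        self.received = []

    def send(self, queue, message):
        self.received.append((queue, message))


class GridService:
    """Sends via the MCP once, then reuses the learned broker connection."""

    def __init__(self, mcp, connect):
        self.mcp = mcp
        self.connect = connect  # callable: broker address -> connection
        self.broker = None

    def send(self, queue, message):
        if self.broker is None:
            # First contact: the MCP delivers the message and tells us
            # where the broker lives.
            reply = self.mcp.send(queue, message)
            self.broker = self.connect(reply["broker"])
            return "via-mcp"
        # Later messages bypass the MCP entirely.
        self.broker.send(queue, message)
        return "direct"
```

A caller only ever uses `GridService.send`, so the switch from the MCP to the direct connection stays hidden from it, which mirrors the transparency described above.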
Resources
- YaCy Grid repository https://github.com/yacy/yacy_grid_mcp
- SUSPER repository https://github.com/fossasia/susper.com
- PR for run_all.py https://github.com/yacy/yacy_grid_mcp/pull/31
- Connecting YaCy Grid to SUSPER https://github.com/fossasia/susper.com/issues/999
- Use of subprocess in Python2 https://stackoverflow.com/questions/30266166/how-do-you-run-multiple-files-in-multiple-terminal-windows-using-python