# Installing Query Server Search and Adding Search Engines
The query server can be used to search for a keyword or phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo and Yandex) and get the results as JSON or XML. The tool also stores the searched query string in a MongoDB database for analytical purposes. (The search engine scraper is based on the scraper at fossasia/searss.) In this blog post, we will walk through how to install Query-Server and add a search engine of your own choice as an enhancement.

## How to clone the repository

Sign up / log in to GitHub and head over to the Query-Server repository. Then follow these steps:

1. Fork the repository: https://github.com/fossasia/query-server
2. Star the repository
3. Clone your forked version to your local machine using `git clone https://github.com/<username>/query-server.git`
4. Add an upstream remote to keep your fork synchronized using `git remote add upstream https://github.com/fossasia/query-server.git`

## Getting Started

Setting up the Query-Server application basically consists of the following steps:

1. Install the Node.js dependencies:

```bash
npm install -g bower
bower install
```

2. Install the Python dependencies (Python 2.7 and 3.4+):

```bash
pip install -r requirements.txt
```

3. Set up the MongoDB server:

```bash
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
sudo service mongod start
```

4. Now, run the query server:

```bash
python app/server.py
```

Go to http://localhost:7001/

## How to contribute: Add a search engine of your own choice

You can add a search engine of your choice to the application alongside the existing ones. Just add or edit four files and you are ready to go.

To add a search engine (say, Exalead):

1. Add `exalead.py` in the `app/scrapers` directory:

```python
from __future__ import print_function
from generalized import Scraper


class Exalead(Scraper):  # Exalead class inheriting Scraper
    """Scraper class for Exalead"""

    def __init__(self):
        self.url = 'https://www.exalead.com/search/web/results/'
        self.defaultStart = 0
        self.startKey = 'start_index'

    def parseResponse(self, soup):
        """Parse the response and return the list of result URLs

        Returns: urls (list) [{'title': title1, 'link': link1}, ...]
        """
        urls = []
        for a in soup.findAll('a', {'class': 'title'}):
            # Scrape data from the corresponding tag
            url_entry = {'title': a.getText(), 'link': a.get('href')}
            urls.append(url_entry)
        return urls
```

Here, scraping the data depends on the tag / class from which we can extract the link and the title of each result.

2. Edit `generalized.py` in the `app/scrapers` directory:

```python
from __future__ import print_function
import json
import sys
from google import Google
from duckduckgo import Duckduckgo
from bing import Bing
from yahoo import Yahoo
from ask import Ask
from yandex import Yandex
from exalead import Exalead  # import exalead.py

scrapers = {
    'g': Google(),
    'b': Bing(),
    'y': Yahoo(),
    'd': Duckduckgo(),
    'a': Ask(),
    'yd': Yandex(),
    't': Exalead()  # add Exalead to scrapers with key 't'
}
```

The `scrapers` dictionary shows which search engines the project currently supports.

3. Edit `server.py` in the `app` directory:

```python
@app.route('/api/v1/search/<search_engine>', methods=['GET'])
def search(search_engine):
    try:
        num = request.args.get('num') or 10
        count…
```
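Before wiring everything through the API, you can sanity-check the new scraper's parsing logic on its own. Below is a minimal sketch: the HTML fragment is made up for illustration, shaped like an Exalead result page where each hit is an `<a class="title">` element. Run it from the `app/scrapers` directory so that `from generalized import Scraper` resolves.

```python
from bs4 import BeautifulSoup
from exalead import Exalead

# Hypothetical HTML fragment mimicking an Exalead results page,
# with one result link carrying the 'title' class.
html = '''
<div class="result">
  <a class="title" href="https://fossasia.org">FOSSASIA</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
print(Exalead().parseResponse(soup))
# Expected output: [{'title': 'FOSSASIA', 'link': 'https://fossasia.org'}]
```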

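Once the new route is in place and the server is running, you can query the engine over HTTP. Here is a rough sketch using `requests`; note that the `query` and `format` parameter names, and whether the endpoint expects the full engine name or the short key from the `scrapers` dictionary, are assumptions to verify against your `server.py`.

```python
import requests

# Assumed endpoint path and parameter names; check server.py
# for the exact query-string contract.
resp = requests.get(
    'http://localhost:7001/api/v1/search/exalead',
    params={'query': 'fossasia', 'format': 'json', 'num': 5},
)
print(resp.json())
```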