Installing Query Server Search and Adding Search Engines

The query server can be used to search for a keyword/phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo and Yandex) and get the results as JSON or XML. The tool also stores each searched query string in a MongoDB database for analytical purposes. (The search engine scraper is based on the scraper at fossasia/searss.)

In this blog post, we will walk through how to install the Query-Server and add a search engine of your choice as an enhancement.

How to clone the repository

Sign up / Login to GitHub and head over to the Query-Server repository. Then follow these steps.

1. Go ahead and fork the repository

https://github.com/fossasia/query-server

2. Star the repository

3. Clone your forked version to your local machine using

git clone https://github.com/<username>/query-server.git

4. Add an upstream remote to synchronize your repository using

git remote add upstream https://github.com/fossasia/query-server.git

Getting Started

Setting up the Query-Server application basically consists of the following steps:

1. Installing Node.js dependencies

npm install -g bower

bower install

2. Installing Python dependencies (Python 2.7 and 3.4+)

pip install -r requirements.txt

3. Setting up MongoDB server

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release   -sc)"/mongodb-org/3.0   multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list

sudo apt-get update

sudo apt-get install -y mongodb-org

sudo service mongod start

4. Now, run the query server:

python app/server.py

Go to http://localhost:7001/
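Once the server is up, you can also query the search API directly. For example, the following request should return JSON results from Google (the route and its parameters are defined in app/server.py, as shown later in this post):

curl "http://localhost:7001/api/v1/search/google?query=fossasia&format=json&num=10"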

How to contribute:

Add a search engine of your own choice

You can add a search engine of your choice, in addition to the ones already supported by the application.

  • Just add or edit 4 files and you are ready to go.

For adding a search engine (say, Exalead):

1. Add exalead.py in the app/scrapers directory:

from __future__ import print_function
from generalized import Scraper


class Exalead(Scraper):  # Exalead class inheriting Scraper
    """Scraper class for Exalead"""

    def __init__(self):
        self.url = 'https://www.exalead.com/search/web/results/'
        self.defaultStart = 0
        self.startKey = 'start_index'

    def parseResponse(self, soup):
        """Parse the response and return the list of results

        Returns: urls (list)
                [{'title': Title1, 'link': url1}, {'title': Title2, 'link': url2}, ...]
        """
        urls = []
        for a in soup.findAll('a', {'class': 'title'}):  # scrape data from the tag holding each result
            url_entry = {'title': a.getText(), 'link': a.get('href')}
            urls.append(url_entry)

        return urls

Here, the scraping logic depends on the tag and class from which we can extract the link and the title of each result page.
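To see what parseResponse does, here is a small, hypothetical test that feeds it a fragment of HTML via BeautifulSoup (the class name title matches what Exalead uses in its result markup):

from bs4 import BeautifulSoup
from exalead import Exalead

html = '<a class="title" href="https://fossasia.org">FOSSASIA</a>'
soup = BeautifulSoup(html, 'html.parser')

scraper = Exalead()
print(scraper.parseResponse(soup))
# [{'title': 'FOSSASIA', 'link': 'https://fossasia.org'}]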

2. Edit generalized.py in app/scrapers directory

from __future__ import print_function
import json
import sys
from google import Google
from duckduckgo import Duckduckgo
from bing import Bing
from yahoo import Yahoo
from ask import Ask
from yandex import Yandex
from exalead import Exalead  # import exalead.py


scrapers = {
    'g': Google(),
    'b': Bing(),
    'y': Yahoo(),
    'd': Duckduckgo(),
    'a': Ask(),
    'yd': Yandex(),
    't': Exalead()  # add Exalead to scrapers with key 't'
}

The scrapers dictionary maps a short key to each search engine that the project supports.
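With this mapping in place, a query can be dispatched to the right scraper through its key. A minimal, hypothetical sketch of that dispatch (the real logic lives in generalized.py and server.py; the search() method and its signature here are assumptions for illustration, not the project's confirmed API):

def run_query(key, query, count=10):
    """Hypothetical dispatch helper for illustration."""
    scraper = scrapers[key]              # e.g. scrapers['t'] -> Exalead()
    return scraper.search(query, count)  # assumed Scraper.search() method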

3. Edit server.py in app directory

@app.route('/api/v1/search/<search_engine>', methods=['GET'])
def search(search_engine):
    try:
        num = request.args.get('num') or 10
        count = int(num)
        qformat = request.args.get('format') or 'json'
        if qformat not in ('json', 'xml'):
            abort(400, 'Not Found - undefined format')

        engine = search_engine
        if engine not in ('google', 'bing', 'duckduckgo', 'yahoo', 'ask', 'yandex', 'exalead'):  # add exalead to the tuple
            err = [404, 'Incorrect search engine', qformat]
            return bad_request(err)

        query = request.args.get('query')
        if not query:
            err = [400, 'Not Found - missing query', qformat]
            return bad_request(err)

This checks whether the requested search engine is supported by the project.
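After these changes, the new engine can be queried through the same API. For example (assuming the server is running locally on the default port 7001):

curl "http://localhost:7001/api/v1/search/exalead?query=fossasia&format=json"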

4.  Edit index.html in app/templates directory

     <button type="submit" value="ask" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/ask_icon.ico') }}" width="30px" alt="Ask Icon"> Ask</button>

     <button type="submit" value="yandex" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/yandex_icon.png') }}" width="30px" alt="Yandex Icon"> Yandex</button>

     <button type="submit" value="exalead" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/exalead_icon.png') }}" width="30px" alt="Exalead Icon"> Exalead</button> # Add button for exalead
  • In a nutshell, scrape the data using the anchor tag with the specific class name.

For example, searching for fossasia using Exalead:

https://www.exalead.com/search/web/results/?q=fossasia&start_index=1

Here, after inspecting the page elements for the links, you will find that the anchor with class name title holds both the link and the title of the webpage. So, scrape the data using the title-classed anchor tag.
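For illustration, each result link in Exalead's markup looks roughly like this (a simplified, hypothetical fragment; the live page has more attributes and nesting):

<!-- one search result, simplified -->
<a class="title" href="https://fossasia.org/">FOSSASIA</a>

The parseResponse method above picks up exactly these anchors by their title class.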

Similarly, you can add other search engines as well.


Make Flask Fast and Reliable – Simple Steps

Flask is a microframework for Python, which is mostly used in web-backend development. Several FOSSASIA projects use Flask, such as Open Event Server, Query Server and Badgeyay. Optimization is indeed one of the most important steps for a successful software product, so in this post a few off-the-hook tricks will be shown that will make your Flask app faster and more reliable.

Flask-Compress

  1. Flask-Compress is a Python package that applies gzip (lossless) compression to your Flask application's responses.
  2. Enough with the theory, now let's get to the coding part:
    1. First, install the module:
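The module is available on PyPI:

pip install flask-compress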

    2. Then, for a basic setup:
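A minimal sketch of that setup, assuming a simple single-module app (the route shown is just for illustration):

from flask import Flask
from flask_compress import Compress

app = Flask(__name__)
Compress(app)  # enable gzip compression for responses


@app.route('/')
def index():
    return 'Hello, compressed world!'  # hypothetical route for illustration


if __name__ == '__main__':
    app.run()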

    3. That's it! All it takes is just a few lines of code to make your Flask app optimized. To know more, check out the flask-compress module.
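To check that compression is actually applied, request the page with a gzip Accept-Encoding header and inspect the response headers (note that Flask-Compress skips very small responses by default, so the body must exceed the minimum size):

curl -s -D - -o /dev/null -H "Accept-Encoding: gzip" http://localhost:5000/
# look for "Content-Encoding: gzip" among the printed headers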

Requirements Directory

  1. A common practice amongst FOSSASIA projects is dividing the requirements.txt file into separate files for development, testing and production.
  2. When projects either use Travis CI for testing or are deployed to cloud services like Heroku, some modules are not really required everywhere. For example, gunicorn is only required for deployment purposes and not for development.
  3. So we keep a separate requirements directory wherein different .txt files are created for different purposes.
  4. The badgeyay project follows this kind of directory structure for its requirements (a sketch is shown after the list below).

  5. As you can see, different .txt files are created for different purposes:
    1. dev.txt – for development
    2. prod.txt – for production (i.e. deployment)
    3. test.txt – for testing
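A sketch of what such a layout can look like; the file contents noted here are illustrative, not copied from badgeyay. A common trick is to have dev.txt and test.txt include prod.txt, so shared dependencies are declared only once:

requirements/
├── dev.txt    # -r prod.txt, plus development tools
├── prod.txt   # flask, gunicorn and other deployment dependencies
└── test.txt   # -r prod.txt, plus pytest and other test tools

Each environment then installs only what it needs, for example:

pip install -r requirements/dev.txt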
