Installing Susper Search Engine and Deploying it to Heroku

Susper is a decentralized search engine that uses the peer-to-peer system YaCy and Apache Solr to crawl and index search results.

Search results are displayed using the Solr server which is embedded into YaCy. All search results must be provided by a YaCy search server, which includes a Solr server with a specialized JSON result writer. When a search request is made in one of the search templates, an HTTP request is made to YaCy. The response is JSON because it is much easier to parse in JavaScript than XML.
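For instance, a JSON search request against a local YaCy peer might look roughly like this (a sketch, assuming YaCy runs on its default port 8090 and exposes the standard yacysearch.json endpoint):

curl "http://localhost:8090/yacysearch.json?query=fossasia&maximumRecords=10"

The JSON response contains the result items that the search templates then render.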

In this blog post, we will talk about how to install the Susper search engine locally and deploy it to Heroku (a cloud application platform).

How to clone the repository

Sign up / Login to GitHub and head over to the Susper repository. Then follow these steps.

  1. Go ahead and fork the repository
https://github.com/fossasia/susper.com

2. Get a clone of the forked version on your local machine using

git clone https://github.com/<username>/susper.com.git

3. Add an upstream remote to synchronize your repository using

git remote add upstream https://github.com/fossasia/susper.com.git
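Later on, to keep your fork in sync with the upstream repository, the usual sequence looks something like this:

git fetch upstream          # fetch the latest changes from fossasia/susper.com
git merge upstream/master   # merge them into your local branch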

Getting Started

Setting up the Susper search application basically consists of the following steps:

  1. First, we will need to install angular-cli by using the following command:
npm install -g @angular/cli

2. After installing angular-cli, we need to install the required node modules, which we do with the following command:

npm install

3. Run the application locally with

ng serve

Go to localhost:4200 where the application will be running locally.

How to Deploy Susper Search Engine to Heroku:

  1. We need to install the Heroku Toolbelt on our machine. Type the following in your Linux terminal:
wget -O- https://toolbelt.heroku.com/install-ubuntu.sh | sh

This installs the Heroku Toolbelt on your machine to access Heroku from the command line.

  2. Create a Procfile inside the root directory and add the line
web: ng serve
  3. Next, we need to log in to our Heroku account (assuming that you have already created one).

Type the following in the terminal:

heroku login

Enter your credentials and login.

  4. Once logged in, we need to create a space on the Heroku server for our application. This is done with the following command:
heroku create
  5. Add the Node.js buildpack to the app:
heroku buildpacks:add --index 1 heroku/nodejs
  6. Then we deploy the code to Heroku:
git push heroku master
git push heroku yourbranch:master # if you are on a branch other than master
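Once the push finishes, you can open the deployed app and watch its logs straight from the Heroku CLI, for example:

heroku open         # opens the deployed app in your browser
heroku logs --tail  # streams the application logs, useful for debugging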


Installing the Loklak Search and Deploying it to Surge

The Loklak search creates a website using the Loklak server as a data source. The goal is to get a search site that offers timeline search as well as custom media, account and geolocation search.

In order to run the service, you can use the API of http://api.loklak.org or install your own Loklak server as the data storage engine. The Loklak server is a server application that collects messages from various social media sources, including Twitter. The server contains a search index and a peer-to-peer index sharing interface. All messages are stored in an Elasticsearch index.
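As a quick check, you can query the public API directly; for example, a search request against the standard loklak search endpoint looks like this:

curl "http://api.loklak.org/api/search.json?q=fossasia"

The JSON response contains the collected messages matching the query.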

The site of this repository is deployed from the GitHub gh-pages branch and is automatically available here: http://loklak.org

In this blog post, we will talk about how to install Loklak_Search locally and deploy it to Surge (static web publishing for front-end developers).

How to clone the repository

Sign up / Login to GitHub and head over to the Loklak_Search repository. Then follow these steps.

  1. Go ahead and fork the repository

https://github.com/fossasia/loklak_search

  2. Get a clone of the forked version on your local machine using

git clone https://github.com/<username>/loklak_search.git

  3. Add an upstream remote to synchronize your repository using

git remote add upstream https://github.com/fossasia/loklak_search.git

Getting Started

Setting up the Loklak search application basically consists of the following steps:

  1. First, we will need to install angular-cli by using the following command:
npm install -g @angular/cli

2. After installing angular-cli, we need to install the required node modules, which we do with the following command:

npm install

3. Run the application locally with

ng serve

Go to localhost:4200 where the application will be running locally.

How to Deploy Loklak Search on Surge:

Surge is a service that publishes static web pages and generates a demo link, which makes it easier for developers to deploy their web apps. There are a lot of benefits to using Surge over generating a demo link with GitHub Pages.

  1. We need to install Surge on our machine. Type the following in your Linux terminal:
npm install --global surge

This installs Surge on your machine so that you can access it from the command line.

  2. In your project directory, just run
surge
  3. After this, it will ask you for three parameters, namely
Email
Password
Domain

After specifying these three parameters, the deployment link with the respective domain is generated.
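For an Angular project like this one, a non-interactive deployment is also possible: build first and then point Surge at the build output. A minimal sketch (the ./dist path and the domain below are placeholders):

ng build                                                  # produces the static files in ./dist
surge --project ./dist --domain my-loklak-demo.surge.sh   # publishes them to the given domain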

Auto deployment of Pull Requests using Surge:

To implement auto-deployment of pull requests using Surge, one can follow these steps:

  • Create a pr_deploy.sh file
  • The pr_deploy.sh file will be executed only after the Travis CI build succeeds, i.e. when Travis CI passes, by running the command bash pr_deploy.sh
#!/usr/bin/env bash
if [ "$TRAVIS_PULL_REQUEST" == "false" ]; then
    echo "Not a PR. Skipping surge deployment."
    exit 0
fi
npm i -g surge
export SURGE_LOGIN=<email-of-dummy-account>  # placeholder: login email of a dummy Surge account
# Token of a dummy account
export SURGE_TOKEN=d1c28a7a75967cc2b4c852cca0d12206
export DEPLOY_DOMAIN=https://pr-${TRAVIS_PULL_REQUEST}-fossasia-LoklakSearch.surge.sh
surge --project ./dist --domain $DEPLOY_DOMAIN;

Here, Travis CI first installs Surge with npm i -g surge, and then we export the environment variables SURGE_LOGIN, SURGE_TOKEN and DEPLOY_DOMAIN.

Now, execute the pr_deploy.sh file from .travis.yml using the command bash pr_deploy.sh, as sketched below.
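A minimal sketch of the relevant part of .travis.yml, assuming pr_deploy.sh lives in the repository root (the rest of the build configuration is omitted):

after_success:
  - bash pr_deploy.sh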


Installing Query Server Search and Adding Search Engines

The query server can be used to search for a keyword/phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo and Yandex) and get the results as JSON or XML. The tool also stores the searched query string in a MongoDB database for analytical purposes. (The search engine scraper is based on the scraper at fossasia/searss.)

In this blog post, we will talk about how to install the Query-Server and add a search engine of your own choice as an enhancement.

How to clone the repository

Sign up / Login to GitHub and head over to the Query-Server repository. Then follow these steps.

1. Go ahead and fork the repository

https://github.com/fossasia/query-server

2. Star the repository

3. Get a clone of the forked version on your local machine using

git clone https://github.com/<username>/query-server.git

4. Add an upstream remote to synchronize your repository using

git remote add upstream https://github.com/fossasia/query-server.git

Getting Started

Setting up the Query-Server application basically consists of the following steps:

1. Installing Node.js dependencies

npm install -g bower

bower install

2. Installing Python dependencies (Python 2.7 and 3.4+)

pip install -r requirements.txt

3. Setting up MongoDB server

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release   -sc)"/mongodb-org/3.0   multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list

sudo apt-get update

sudo apt-get install -y mongodb-org

sudo service mongod start

4. Now, run the query server:

python app/server.py

Go to http://localhost:7001/
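With the server running, you can also hit the search API directly; for example (the route and parameters match the server.py snippet shown later in this post):

curl "http://localhost:7001/api/v1/search/google?query=fossasia&format=json&num=10"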

How to contribute:

Add a search engine of your own choice

You can add a search engine of your choice apart from the ones already present in the application.

  • Just add or edit 4 files and you are ready to go.

For adding a search engine (say, Exalead):

1. Add exalead.py in the app/scrapers directory:

from __future__ import print_function
from generalized import Scraper


class Exalead(Scraper):  # Exalead class inheriting Scraper
    """Scraper class for Exalead"""

    def __init__(self):
        self.url = 'https://www.exalead.com/search/web/results/'
        self.defaultStart = 0
        self.startKey = 'start_index'

    def parseResponse(self, soup):
        """Parses the response and returns the scraped results

        Returns: urls (list)
                 [{'title': Title1, 'link': url1}, {'title': Title2, 'link': url2}, ...]
        """
        urls = []
        for a in soup.findAll('a', {'class': 'title'}):  # scrape data from the corresponding tag
            url_entry = {'title': a.getText(), 'link': a.get('href')}
            urls.append(url_entry)
        return urls

Here, scraping the data depends on the tag/class from which we can extract the link and the title of each result.

2. Edit generalized.py in the app/scrapers directory:

from __future__ import print_function
import json
import sys

from google import Google
from duckduckgo import Duckduckgo
from bing import Bing
from yahoo import Yahoo
from ask import Ask
from yandex import Yandex
from exalead import Exalead  # import exalead.py

scrapers = {
    'g': Google(),
    'b': Bing(),
    'y': Yahoo(),
    'd': Duckduckgo(),
    'a': Ask(),
    'yd': Yandex(),
    't': Exalead()  # add Exalead to scrapers with key 't'
}

From the scrapers dictionary, we can see which search engines the project currently supports.

3. Edit server.py in the app directory:

@app.route('/api/v1/search/<search_engine>', methods=['GET'])
def search(search_engine):
    try:
        num = request.args.get('num') or 10
        count = int(num)
        qformat = request.args.get('format') or 'json'
        if qformat not in ('json', 'xml'):
            abort(400, 'Not Found - undefined format')

        engine = search_engine
        if engine not in ('google', 'bing', 'duckduckgo', 'yahoo', 'ask', 'yandex', 'exalead'):  # add exalead to the tuple
            err = [404, 'Incorrect search engine', qformat]
            return bad_request(err)

        query = request.args.get('query')
        if not query:
            err = [400, 'Not Found - missing query', qformat]
            return bad_request(err)

This checks whether the requested search engine is supported by the project.
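Once these edits are in place, the new engine should be reachable through the same API route as the existing ones, for example:

curl "http://localhost:7001/api/v1/search/exalead?query=fossasia&format=json"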

4. Edit index.html in the app/templates directory:

     <button type="submit" value="ask" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/ask_icon.ico') }}" width="30px" alt="Ask Icon"> Ask</button>

     <button type="submit" value="yandex" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/yandex_icon.png') }}" width="30px" alt="Yandex Icon"> Yandex</button>

     <button type="submit" value="exalead" class="btn btn-lg  search btn-outline"><img src="{{ url_for('static', filename='images/exalead_icon.png') }}" width="30px" alt="Exalead Icon"> Exalead</button> <!-- Add button for Exalead -->
  • In a nutshell, scrape the data using the anchor tag that carries the specific class name.

For example, searching for fossasia using Exalead:

https://www.exalead.com/search/web/results/?q=fossasia&start_index=1

Here, after inspecting the elements around the links, you will find that the anchor tag with the class name title holds both the link and the title of the webpage. So, scrape the data using the title-classed anchor tag, as in the sketch below.
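If you want to verify this outside the project first, a small standalone sketch using requests and BeautifulSoup (both assumed to be installed) reproduces the same extraction:

import requests
from bs4 import BeautifulSoup

# Fetch the Exalead results page for the query "fossasia"
resp = requests.get('https://www.exalead.com/search/web/results/',
                    params={'q': 'fossasia', 'start_index': 1})
soup = BeautifulSoup(resp.text, 'html.parser')

# Anchor tags with class "title" carry both the result title and the link
for a in soup.findAll('a', {'class': 'title'}):
    print(a.getText(), a.get('href'))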

Similarly, you can add other search engines as well.
