Adding Yarn as new Dependency Manager  along with NPM in Susper

Dependency managers are software modules that coordinate the integration of external libraries or packages into larger application stack. Dependency managers use configuration files like composer.json, package.json, build.gradle or pom.xml to determine: What dependency to get, What version of the dependency in particular and, Which repository to get them from. Currently SUSPER has only NPM as a dependency manager which is used to install all dependencies. In this blog, I will describe how we have added facebook’s Yarn as a new dependency manager in Susper

Lets checkout Yarn in detail:

Yarn is a fast and good alternative to NPM. One of the great advantages of Yarn is that while remaining compatible with the npm registry, it replaces the workflow for npm client or other package managers Yarn was created by Facebook, to solve some particular problems that were faced while using NPM. Yarn was developed to deal with inconsistency in dependency installation while scaling and to increase speed.

What is advantages of using Yarn?

  • Improving Network performance:Queuing up the requests and avoiding requests waterfalls helps to maximize network utilization.
  • Checks Package Integrity:Package integrity is checked after each install to avoid corrupt packages installation.
  • Checks Package Integrity:Package integrity is checked after each install to avoid corrupt packages installation.
  • Caching: Yarn helps to install the dependencies without an internet connection if the dependency has been previously installed on the system. This is done by caching.
  • Lock File: Lock files are used to make sure that the node_modules directory has the exact same structure on all development environments.

Source: https://yarnpkg.com/en/

How Yarn is installed along with NPM in SUSPER?

Installing Yarn is super easy. Here are the steps to setup Yarn along with NPM and begin using it as dependency manager.

On Debian or Ubuntu Linux, we can install Yarn via our Debian package repository. We will first need to configure the repository:

curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee 

/etc/apt/sources.list.d/yarn.list

 

Then simply use:

sudo apt-get update && sudo apt-get install yarn

 

Note: Ubuntu 17.04 comes with cmdtest installed by default. If anyone gets any errors from installing yarn, then remove it by sudo apt remove cmdtest first. Refer to this for more information.

If using nvm you can avoid the node installation by doing:

sudo apt-get install --no-install-recommends yarn

 

Test that Yarn is installed by running:

yarn --version

 

Now delete the node_modules folder so that all dependencies installed by npm is removed.

Now use yarn command in project’s repository.

yarn 

 

Wait while dependencies are installed and then we will be done.

What is happening ?

Yarn has created a lock file  yarn.lock. After each operation the file is updated (installing, updating or removing packages) to keep the track of exact package version. If kept in our Git repository we can see that the exact same result in node_modules is made available to all systems.

Resources

  1. Yarn: https://yarnpkg.com/en/
  2. Announcement of Yarn: https://code.facebook.com/posts/1840075619545360
  3. Yarn Vs NPM: https://stackoverflow.com/questions/40027819/when-to-use-yarn-over-npm-what-are-the-differences
Continue ReadingAdding Yarn as new Dependency Manager  along with NPM in Susper

Using Wikipedia API for knowledge graph in SUSPER

Knowledge Graph is way to give a brief description about search query by connecting it to a real world entity. This helps users to get information about exactly what they want. Previously Susper had a Knowledge Graph which was implemented using DBpedia API. But since DBpedia do not provide content over HTTPS connections therefore the content was blocked on susper.com and there was a need to implement the Knowledge Graph using a new API that provide contents over HTTPS. In this blog, I will describe how getting a knowledge graph was made possible using Wikipedia API.

What is Wikipedia API ?

The MediaWiki action API is a web service that provides convenient access to wiki features, data, and metadata over HTTP, via a URL usually at api.php. Clients request particular “actions” by specifying an action parameter, mainly action=query to get information.

The endpoint :

https://en.wikipedia.org/w/api.php

The format :

format=json This tells the API that we want data to be returned in JSON format.

The action :

action=query

The MediaWiki web service API implements dozens of actions and extensions implement many more; the dynamically generated API help documents all available actions on a wiki. In this case, we’re using the “query” action to get some information.

The complete API which is used in SUSPER to extract information of a query is :

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=japan

Where titles=Search_Query, here Japan

How it is implemented in SUSPER?

For implementing it a service has been created which fetches information by setting various URL parameters. This result can be fetched by creating an instance of service and passing search query to getsearchresults(searchquery) function.

export class KnowledgeapiService {
 server = 'https://en.wikipedia.org';
 searchURL = this.server + '/w/api.php?';
 homepage = 'http://susper.com';
 logo = '../images/susper.svg';
 constructor(private http: Http,
             private jsonp: Jsonp,
             private store: Store<fromRoot.State>) {
 }
 getsearchresults(searchquery) {
   let params = new URLSearchParams();
   params.set('origin', '*');
   params.set('format', 'json');
   params.set('action', 'query');
   params.set('prop', 'extracts');
   params.set('exintro', '');
   params.set('explaintext', '');
   params.set('titles', searchquery);
   let headers = new Headers({ 'Accept': 'application/json' });
   let options = new RequestOptions({ headers: headers, search: params });
   return this.http
     .get(this.searchURL, options).map(res =>
         res.json().query.pages
     ).catch(this.handleError);
}

 

Since the result obtained is an observable therefore we have to subscribe for it and then extract information to local variables in infobox.component.ts file.

export class InfoboxComponent implements OnInit {
 public title: string;
 public description: string;
 query$: any;
 resultsearch = '/search';
 constructor(private knowledgeservice: KnowledgeapiService,
             private route: Router,
             private activatedroute: ActivatedRoute,
             private store: Store<fromRoot.State>,
             private ref: ChangeDetectorRef) {
   this.query$ = store.select(fromRoot.getquery);
   this.query$.subscribe( query => {
     if (query) {
       this.knowledgeservice.getsearchresults(query).subscribe(res => {
         const pageId = Object.keys(res)[0];
         if (res[pageId].extract) {
           this.title = res[pageId].title;
           this.description = res[pageId].extract;
         } else {
           this.title = '';
           this.description = '';
         }
       });
     }
   });
 }

The variable title and description are used to display results on results page.

<div *ngIf=“this.description” class=“card”>
  <div>
    <h2><b>{{this.title}}</b></h2>
    <p>{{this.description | slice:0:600}}<a href=‘https://en.wikipedia.org/wiki/{{this.title}}’>..more at Wikipedia</a></p>
  </div>
</div>

Resources

1.MediaWiki API : https://www.mediawiki.org/wiki/API:Main_page

2.Stackoverflow : https://stackoverflow.com/questions/8555320/is-there-a-clean-wikipedia-api-just-for-retrieve-content-summary

3.Angular Docs : https://angular.io/tutorial/toh-pt4

Continue ReadingUsing Wikipedia API for knowledge graph in SUSPER

Integrating YaCy Grid Locally with Susper

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine.The search results can be improved to a great extent by using YaCy-Grid as the new backend for SUSPER. YaCy Grid is the best choice for distributed search topology. The legacy YaCy is made for decentralised and also distributed network. While both the networks are distributed,the YaCy-Grid is centralized and legacy YaCy is decentralized. YaCy Grid facilitates a lot with scaling that will be in our hand and can be done in all aspects​(loading, parsing, indexing) with computing power we choose. In YaCy,Solr is embedded. But in YaCy Grid,we will get elasticsearch cluster.​They are both built around the core underlying search library Lucene.But ​elasticsearch will help us to scale almost indefinitely. In this blog, I will show you how to integrate YaCy Grid with Susper locally and how to use it to fetch results.

Implementing YaCy Grid with Susper:

Before using YaCy Grid we need to first setup YaCy Grid and crawl to url using crawl start API, more information about that can be found here Implementing YaCy Grid with Susper and Setting up YaCy Grid locally.

So, once we are done with setup and crawling, we need to begin using its APIs in Susper. Following are some easy steps in which we can show results from YaCy Grid in a separate tab is Susper.

Step 1:

Creating a service to fetch results:

In order to fetch results from local YaCy Grid server we need to create a service to fetch results from local YaCy Grid server. Here is the class in grid-service.ts which fetches results for us.

export class GridSearchService {
 server = 'http://127.0.0.1:8100';
 searchURL = this.server + '/yacy/grid/mcp/index/yacysearch.json?query=';
 constructor(private http: Http,
             private jsonp: Jsonp,
             private store: Store<fromRoot.State>) {
 }
 getSearchResults(searchquery) { 
   return this.http
     .get(this.searchURL+searchquery).map(res =>
         res.json()
     ).catch(this.handleError);
 }

 

Step 2:

Modifying results.component.ts file

In order to get results from grid-service.ts in results.component.ts we must need to create an instance of the service and use this instance to get the results and store it in variables results.component.ts file and then use these variables to show results in results template. Following is the code that does this for us

ngOnInit() {
   this.grid.getSearchResults(this.searchdata.query).subscribe(res=>{
     this.gridResult=res.channels;
   });
 }

 

gridClick(){
   this.getPresentPage(1);
   this.resultDisplay = 'grid';
   this.totalgridresults=this.gridResult[0].totalResults;
   this.gridmessage='About ' + this.totalgridresults + ' results';
   this.gridItems=this.gridResult[0].items;
  
   console.log(this.gridItems);
 }

 

Step 3:

Creating a New tab to show results from YaCy Grid:

Now we need to create a tab in the template where we can use local variables in results.component.ts to show the results following the current design pattern here is the code for that

<li [class.active_view]="Display('grid')" (click)="gridClick()">YaCy_Grid</li>

<!--YaCy Grid-->
 <div class="container-fluid">
     <div class="result message-bar" *ngIf="totalgridresults > 0 && Display('grid')">
       {{gridmessage}}
     </div>
     <div class="autocorrect">
       <app-auto-correct [hidden]="hideAutoCorrect"></app-auto-correct>
     </div>
   </div>
 <div class="grid-result" *ngIf="Display('grid')">
   <div class="feed container">
       <div *ngFor="let item of gridItems" class="result">
         <div class="title">
           <a class="title-pointer" href="{{item.link}}" [style.color]="themeService.titleColor">{{item.title}}</a>
         </div>
         <div class="link">
           <p [style.color]="themeService.linkColor">{{item.link}}</p>
         </div>
         <div class="description">
           <p [style.color]="themeService.descriptionColor">{{item.pubDate|date:'MMMM d, yyyy'}} - {{item.description}}</p>
         </div>
       </div>
   </div>
 </div>
 <!-- END -->

 

Step 4:

Starting YaCy Grid Locally:

Now all we need is to start YaCy Grid server locally. To start it go in yacy_grid_mcp folder and use

python bin/start_elasticsearch.py

 

This will start elasticsearch from its respective script.Next use

python bin/start_rabbitmq.py

 

This will start RabbitMQ server with the required configuration.Next useThis will start elasticsearch from its respective script.Next use

gradle run

 

To start YaCy Grid locally.

Now we are all done we just need to start Susper using

ng serve

 

command and type a search query and move to YaCy_Grid tab to see results from YaCy Grid Server.

Here is the image which shows results from YaCy Grid in Susper

Resources

Continue ReadingIntegrating YaCy Grid Locally with Susper

Removing vulnerable dependencies from SUSPER

A vulnerability is a problem in a project’s code that could be exploited to damage the confidentiality, integrity, or availability of the project or other projects that use its code. Depending on the severity level and the way your project uses the dependency, vulnerabilities can cause a range of problems for your project or the people who use it.GitHub tracks public vulnerabilities in Ruby gems and NPM packages on MITRE’s Common Vulnerabilities and Exposures (CVE) List.

What were  vulnerabilities in SUSPER ?

SUSPER was having vulnerability in Gemfile.lock, Gemfile.lock makes our application a single package of both your own code and the third-party code it ran the last time you know for sure that everything worked. Specifying exact versions of the third-party code you depend on in your Gemfile would not provide the same guarantee, because gems usually declare a range of versions for their dependencies.

What were vulnerable dependencies in Gemfile.lock ?

Two dependency namely Nokogiri and Yajl-Ruby were having security vulnerability.

Nokogiri is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors whereas

Yajl-Ruby gem is a C binding to the excellent YAJL JSON parsing and generation library. Older versions of both the dependencies were having security vulnerability.

Security alerts for a vulnerable dependency in our repository include a severity level and a link to the affected file in our project. When available, the alerts also include a link to the CVE record and a suggested fix.

What was the suggested fix ?

One way to fix this problem was to update the vulnerable dependencies to latest versions.

The versions of Nokogiri and Yajl-Ruby which were used in SUSPER are:

Nokogiri (~>1.5)

Yajl-Ruby (1.1.0)

What are the best ways to update dependencies without breaking

the project ?

The best way to update a dependency is to check where those dependencies are used in project and what are breaking changes which are introduced within the dependencies.

How vulnerable dependencies were updated ?

Firstly we updated the Bundler the tool we use to update our gems in Gemfile.lock,from version 1.13.6 to 1.16.0.

We then updated Nokogiri dependency and other sub dependencies using  bundle update nokogiri i.e:

mini_portile2 (2.1.0) -> mini_portile2 (2.3.0)

nokogiri (1.6.8.1) ->nokogiri (1.8.2)

Then we checked the project for integrity , and the project was working well.

We then tried to update Yajl-Ruby, but there was a problem in updating Yajl-Ruby,

We later found that Yajl-Ruby was replaced by many other dependencies.

We therefore updated whole Gemfile.lock . Following are two simple steps to update Gemfile.lock

bundle update

bundle install

 

We later checked that whether the new dependencies do not break the current project and we found that there were no breaking changes involved in updated dependencies.

Security alerts for vulnerable dependencies list the affected dependency and, in some cases, use machine learning to suggest a fix from the GitHub community. By default, we receive a weekly email summarizing security alerts for up to 10 of our repositories. We can choose to receive security alerts individually by email, in a daily digest email, in our web notifications, or in the GitHub user interface.

Resources

Continue ReadingRemoving vulnerable dependencies from SUSPER

Setting up YaCy Grid locally

SUSPER is a search interface that uses P2P search engine YaCy . Search results are displayed using Solr server which is embedded into YaCy. The retrieval of search results is done using YaCy search API. When a search request is made in one of the search templates, an HTTP request is made to YaCy and the response is done in JSON. In this blog post I will show how to setup YaCy Grid locally.

What is YaCy Grid ?

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. The required storage functions of the YaCy Grid are:

  1.  An asset storage, basically a file sharing environment for YaCy components,an ftp server is used for asset storage.
  2.  A message system providing an Enterprise Integration Framework using a message-oriented middleware,RabbitMQ message queues for the message system.
  3.  A database system providing search-engine related retrieval functions.It uses Elasticsearch for database operations.

How to setup YaCy Grid locally ?

YaCy Grid have 4 components MCP(Master Connect Program), Loader, Crawler and  Parser.

  1. Clone all the components using –recursive flag.

git clone --recursive https://github.com/yacy/yacy_grid_mcp.git
git clone --recursive https://github.com/yacy/yacy_grid_parser.git
git clone --recursive https://github.com/yacy/yacy_grid_crawler.git
git clone --recursive https://github.com/yacy/yacy_grid_loader.git
  1.  Now to starting YaCy Grid requires starting Elasticsearch, RabbitMQ with Username `anonymous` and Password `yacy` and an ftp server(it can be omitted as MCP can take over).
  2.  All the above steps can also be done in a single step by running a python script in `bin` folder `run_all.py`
  3.  Working of `run_all.py` in yacy_grid_mcp:

if not checkportopen(9200):
   print "Elasticsearch is not running"
   mkapps()
   elasticversion = 'elasticsearch-5.6.5'
   if not os.path.isfile(path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz'):
       print('Downloading ' + elasticversion)
       urllib.urlretrieve ('https://artifacts.elastic.co/downloads/elasticsearch/' + elasticversion + '.tar.gz', path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz')
   if not os.path.isdir(path_apphome + '/data/mcp-8100/apps/elasticsearch'):
       print('Decompressing' + elasticversion)
       os.system('tar xfz ' + path_apphome + '/data/mcp-8100/apps/' + elasticversion + '.tar.gz -C ' + path_apphome + '/data/mcp-8100/apps/')
       os.rename(path_apphome + '/data/mcp-8100/apps/' + elasticversion, path_apphome + '/data/mcp-8100/apps/elasticsearch')
   # run elasticsearch
   print('Running Elasticsearch')
   os.chdir(path_apphome + '/data/mcp-8100/apps/elasticsearch/bin')
   os.system('nohup ./elasticsearch &')

 

  • Checks whether Elasticsearch is running or not, if not then runs Elasticsearch.
if checkportopen(15672):
   print "RabbitMQ is Running"
   print "If you have configured it according to YaCy setup press N"
   print "If you have not configured it according to YaCy setup or Do not know what to do press Y"
   n=raw_input()
   if(n=='Y' or n=='y'):
       os.system('service rabbitmq-server stop')
       
if not checkportopen(15672):
   print "rabbitmq is not running"
   os.system('python bin/start_rabbitmq.py')
  • Checks whether RabbitMQ is running or not, if yes then asks user to configure it according to YaCy Grid setup by pressing Y or else ignore,if not then starts RabbitMQ according to required configuration.
subprocess.call('bin/update_all.sh')
  • .Updates all the Grid components including MCP.

if not checkportopen(2121):
   print "ftp server is not Running"
  • Checks for an ftp server and prints message accordingly.

def run_mcp():
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_loader():
   os.system('cd ../yacy_grid_loader')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_crawler():
   os.system('cd ../yacy_grid_crawler')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

def run_parser():
   os.system('cd ../yacy_grid_parser')
   subprocess.call(['gnome-terminal', '-e', "gradle run"])

 

  • Runs all components of YaCy Grid in separate terminal.

Once user starts it, then he can start using YaCy Grid through terminal.

If a YaCy Grid service has used the MCP once, it learns from the MCP to connect to the infrastructure itself. For example:

  • a YaCy Grid service starts up and connects to the MCP
  • the Grid service pushes a message to the message queue using the MCP
  • the MCP fulfils the message send operation and response with the actual address of the message broker
  • the YaCy Grid service learns the direct connection information
  • whenever the YaCy Grid service wants to connect to the message broker again, it can do so using a direct broker connection. This process is done transparently, the Grid service does not need to handle such communication details itself. The routing is done automatically. To use the MCP inside other grid components the git submodule functionality is used.

Resources

Continue ReadingSetting up YaCy Grid locally