Crawl Job Feature For Susper To Index Websites

The Yacy backend provides search results for Susper by using a web crawler (or spider) to crawl and index data from the internet. The crawler also requires some minimal input from the user.

As stated by Michael Christen (@Orbiter) “a web index is created by loading a lot of web pages first, then parsing the content and placing the result into a search index. The question is: how to get a large list of URLs? This is solved by a crawler: we start with a single web page, extract all links, then load these links and go on. The root of such a process is the ‘Crawl Start’.”

Yacy has a web crawler module that can be accessed here: http://yacy.searchlab.eu/CrawlStartExpert.html. As we would like Susper to be a fully featured front end for Yacy, we also introduced a crawler in Susper. Using the crawler, one can tell Yacy what to process and how to crawl a URL so that its pages are indexed on the Yacy server. To support indexing web pages with the help of the Yacy server, we implemented a ‘Crawl Job’ feature in Susper.

1) Visit http://susper.com/crawlstartexpert and enter information about the sites you want Susper to crawl. Currently, the crawler accepts either a list of URLs or a file containing URLs. You can customise the crawling process by tweaking crawl parameters such as crawling depth, maximum pages per domain, filters, and whether to exclude media.

2) Once crawl parameters are set, click on ‘Start New Crawl Job’ to start the crawling process.

3) A basic authentication pop-up appears. After entering valid credentials, the user receives a success alert and is redirected back to the home page.

The crawl job then starts on the Yacy server according to the crawl parameters.

Implementation of Crawler on Susper:

We have created a separate component and service in Susper for the crawler.

Source code can be found at:

When the user initiates the crawl job by pressing the start button, the startCrawlJob() function of the component is called, which in turn calls the CrawlStart service. We send crawlvalues to the service and subscribe to the returned object, which confirms whether the crawl job has started.

crawlstart.component.ts:-

startCrawlJob() {
 this.crawlstartservice.startCrawlJob(this.crawlvalues).subscribe(res => {
   alert('Started Crawl Job');
   this.router.navigate(['/']);
 }, (err) => {
   if (err === 'Unauthorized') {
     alert("Authentication Error");
   }
 });
};

 

When startCrawlJob() is called on the service, it creates a URLSearchParams object, sets a parameter for each key in the input, and sends them to the Yacy server through a JSONP request.

crawlstart.service.ts

startCrawlJob(crawlvalues) {
 let params = new URLSearchParams();
 for (let key in crawlvalues) {
   if (crawlvalues.hasOwnProperty(key)) {
     params.set(key, crawlvalues[key]);
   }

 }
 params.set('callback', 'JSONP_CALLBACK');


 let options = new RequestOptions({ search: params });
 return this.jsonp
   .get('http://yacy.searchlab.eu/Crawler_p.json', options).map(res => {
     // Return the parsed body so that subscribers receive it.
     return res.json();
   });

}
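
To make the parameter handling concrete, here is a minimal sketch of how a crawlvalues object could be flattened into a query string without Angular. The function name buildCrawlQuery and the sample keys crawlingURL and crawlingDepth are illustrative assumptions, not taken from the Susper source.

```typescript
// Hypothetical helper: mirrors the loop in startCrawlJob(), but returns a
// plain query string instead of an Angular URLSearchParams object.
type CrawlValues = { [key: string]: string | number | boolean };

function buildCrawlQuery(crawlvalues: CrawlValues): string {
  const parts: string[] = [];
  for (const key in crawlvalues) {
    // Skip inherited properties, as the service does.
    if (Object.prototype.hasOwnProperty.call(crawlvalues, key)) {
      parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(String(crawlvalues[key])));
    }
  }
  // Placeholder that a JSONP client replaces with a real callback name.
  parts.push('callback=JSONP_CALLBACK');
  return parts.join('&');
}

// Example values (assumed parameter names, for illustration only).
const exampleCrawlQuery = buildCrawlQuery({
  crawlingURL: 'http://example.com/',
  crawlingDepth: 2,
});
```

Each key of the input becomes one URL-encoded parameter, so adding a new crawl option to the form needs no change in the service.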


Using @Output EventEmitter to Hide Search Suggestions in Angular for Susper Web App

Problem: In Susper the suggestions box doesn’t hide when there are no suggestions. To fix this, we have used @Output to create interaction between the search bar and suggestions box.

Susper gives suggestions to the user as the user types a query. These suggestions are retrieved from the suggest.json endpoint of the Yacy server.

We have a separate component for searching a query and a separate component for showing suggestions (auto-complete.component.ts). The architectural link between the query box, suggestion box and the results page is a bit complicated.

The search bar and the auto-complete component don’t interact directly. Whenever a new query is entered, the search bar triggers an action with a payload that includes the query. On receiving the new query, the auto-complete component calls the Yacy server to get suggestions from the endpoint and displays them inside the suggestion box. Whenever a user makes a new query, the search bar implementation opens the suggestion box even if there are no results. So there has to be a way to inform the search bar component that the suggestions box has received empty results, so that the search bar can hide it.

To achieve this we used @Output to emit an event:

@Output() hidecomponent: EventEmitter<any> = new EventEmitter<any>();

autocomplete.component.ts:-

this.autocompleteservice.getsearchresults(query).subscribe(res => {
 if (res) {
   if (res[0]) {
     this.results = res[1];
     if (this.results.length === 0) {
       this.hidecomponent.emit(1);
     } else {
       this.hidecomponent.emit(0);
     }
   }
 }
});

 

Then, in the search bar component, this event is bound to a function hidesuggestions(), which takes care of hiding the suggestion box.

searchbar.component.html

<app-auto-complete (hidecomponent)="hidesuggestions($event)" id="auto-box" [hidden]="!ShowAuto()"></app-auto-complete>

 

searchbar.component.ts

hidesuggestions(data: number) {
 if (data === 1) {
   this.displayStatus = 'hidebox';
 } else {
   this.displayStatus = 'showbox';
 }
}
ShowAuto() {
 return (this.displayStatus === 'showbox');
}

 

Here you can see that the auto-complete component’s hidden attribute in searchbar.component.html is bound to the ShowAuto() function, which hides the suggestions box whenever there are no results.
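
The child-to-parent flow can be tried outside Angular with a small sketch. SimpleEmitter is a stand-in for Angular's EventEmitter, and the class names AutoCompleteSketch and SearchBarSketch are hypothetical; only the emit/subscribe behaviour needed for the illustration is modelled.

```typescript
// SimpleEmitter is a stand-in for Angular's EventEmitter (assumption: only
// emit() and subscribe() are needed for this illustration).
class SimpleEmitter<T> {
  private listeners: Array<(value: T) => void> = [];
  subscribe(listener: (value: T) => void): void {
    this.listeners.push(listener);
  }
  emit(value: T): void {
    this.listeners.forEach(listener => listener(value));
  }
}

// Child: emits 1 when there is nothing to show, 0 otherwise.
class AutoCompleteSketch {
  hidecomponent = new SimpleEmitter<number>();
  results: string[] = [];
  receive(suggestions: string[]): void {
    this.results = suggestions;
    this.hidecomponent.emit(suggestions.length === 0 ? 1 : 0);
  }
}

// Parent: reacts to the child's event, like (hidecomponent)="hidesuggestions($event)".
class SearchBarSketch {
  displayStatus = 'showbox';
  constructor(child: AutoCompleteSketch) {
    child.hidecomponent.subscribe(data => this.hidesuggestions(data));
  }
  hidesuggestions(data: number): void {
    this.displayStatus = data === 1 ? 'hidebox' : 'showbox';
  }
  ShowAuto(): boolean {
    return this.displayStatus === 'showbox';
  }
}

const autoCompleteSketch = new AutoCompleteSketch();
const searchBarSketch = new SearchBarSketch(autoCompleteSketch);
autoCompleteSketch.receive([]);           // no suggestions: the box is hidden
const hiddenWhenEmpty = !searchBarSketch.ShowAuto();
autoCompleteSketch.receive(['susper']);   // suggestions arrive: the box is shown again
const shownWithResults = searchBarSketch.ShowAuto();
```

The child never touches the parent's state directly; it only announces what happened, and the parent decides how to react.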

The GIF below shows how the suggestions feature works on Susper.

Source code related to this implementation is available at this pull


Multiple Page Rendering on a Single Query in Susper Angular Front-end

Problem: Susper used to render a new results page for each new character input. It should render a single page for the final query as reported in issue 371. For instance, the browser’s back button shows five pages for each of the five characters entered as a query.

Solution: The problem was caused by this line:

this.router.navigate(['/search'], {queryParams: this.searchdata});

Previously, this line lived in the search-bar component and was called on every character entered.

Fix: To fix this issue, we needed to call router.navigate only when we receive results, not on each character input.

So we first removed the line that caused the issue from the search-bar component and replaced it with

this.store.dispatch(new queryactions.QueryServerAction(query));

 

This triggers a QueryServer action, which makes a request to the Yacy endpoint for search results.

Now, in app.component.ts, we subscribe to resultscomponentchange$, which fires only when new search results are received, and we navigate to the results page inside that subscription.

this.resultscomponentchange$ = store.select(fromRoot.getItems);
this.resultscomponentchange$.subscribe(res => {
 if (this.searchdata.query.length > 0) {
   this.router.navigate(['/search'], {queryParams: this.searchdata});
 }

});
this.wholequery$ = store.select(fromRoot.getwholequery);
this.wholequery$.subscribe(data => {
 this.searchdata = data;
});
if (localStorage.getItem('resultscount')) {
 this.store.dispatch(new queryactions.QueryServerAction({'query': '', start: 0, rows: 10, search: false}));
}

 

 

With this change the problem was fixed: only one page is now rendered for a valid search. Source code for this implementation is available in this pull.
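
The idea behind the fix can be sketched without Angular or ngrx. The functions navigate, onQueryTyped and onResultsReceived are hypothetical stand-ins for the router, the dispatched query action and the results subscription. The sketch also simplifies by assuming that results arrive once after typing stops.

```typescript
// Hypothetical stand-in for the router: records every navigation.
const recordedNavigations: string[] = [];
function navigate(queryText: string): void {
  recordedNavigations.push('/search?query=' + encodeURIComponent(queryText));
}

let latestTypedQuery = '';

// Called on every keystroke: only records the query, never navigates.
function onQueryTyped(queryText: string): void {
  latestTypedQuery = queryText;
}

// Called when results arrive (the role of the resultscomponentchange$ subscription).
function onResultsReceived(): void {
  if (latestTypedQuery.length > 0) {
    navigate(latestTypedQuery);
  }
}

// Typing "india" produces five keystrokes but, in this sketch, one batch of results.
for (const typed of ['i', 'in', 'ind', 'indi', 'india']) {
  onQueryTyped(typed);
}
onResultsReceived();
```

Because navigation is tied to results rather than keystrokes, the browser history gains one entry per search instead of one per character.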


Using RouterLink in the Susper Angular Frontend to Speed up the Loading Time

In Susper, whenever the user clicked on certain links, the whole application used to load again, which made the page take longer to load. But in Single Page Applications (SPAs) we don’t need to reload the whole application. In fact, SPAs are known to load internal pages faster than traditional HTML web pages. To achieve this, we have to inform the application that a link leads to an internal page, so that the application doesn’t reload completely and reinitialize itself. In Angular, this can be done by replacing href with routerLink on the <a> tag.

routerLink, when used with the <a> tag as in

<a routerLink="/contact" routerLinkActive="active">Contact</a>

does not reload the whole page. Instead, the Angular router handles the navigation on the client and renders the contact component in place of <router-outlet></router-outlet>.

At most, the application requests the data or code that the contact component needs, which takes far less time than a complete reload of the page.

The time graph below shows the requests made when an <a> tag with href was clicked.

As you can see, it takes more than 3 seconds to load the page.

But when you use [routerLink] as an attribute for navigation, you find the page being displayed in just a blink.

What have we done in Susper?

In Susper, in issue #167, @mariobehling noticed that some links were loading slowly. On looking into the issue and reproducing it, I found that the whole page was being reloaded. Checking the <a> tag showed that an “href” attribute was used instead of Angular’s “[routerLink]” attribute. I made a pull request changing href to “[routerLink]”, which made these links load around 3x faster than before.

https://github.com/fossasia/susper.com/pull/234/files


How we implemented an InfoBox similar to Google in Susper

Research Work: This was initially proposed by @mariobehling in https://github.com/fossasia/susper.com/issues/181, where he suggested building an infobox similar to those of Google or DuckDuckGo.

Later, Michael Christen (0rb1t3r) pointed to the DBpedia API, which provides structured data extracted from Wikipedia.

One example of using the DBpedia API is: http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=place&QueryString=berlin

More information about the structured Knowledge Graphs is available at https://en.wikipedia.org/wiki/Knowledge_Graph

Implementation:

We created an infobox component to display the data related to infobox https://github.com/fossasia/susper.com/tree/master/src/app/infobox

It takes care of rendering and styling the data retrieved from the DBpedia API.

Infobox.component.html :

<div *ngIf="results?.length > 0" class="card">
  <div>
    <h2><b>{{this.results[0].label}}</b></h2>
    <p>{{this.results[0].description}}</p>
  </div>
  <div class="card-container">
    <h3><b>Related Searches</b></h3>
    <div *ngFor="let result of results">
      <a [routerLink]="resultsearch" [queryParams]="{query: result.label}">{{result.label}}</a>
    </div>
  </div>
</div>

The infobox.component.ts makes a call to Knowledge service with the required query, and the knowledge service makes a get request to the DBpedia API and retrieves the results.

infobox.component.ts

this.query$.subscribe(query => {
  if (query) {
    this.knowledgeservice.getsearchresults(query).subscribe(res => {
      if (res.results) {
        this.results = res.results;
      }
    });
  }
});

knowledgeapi.service.ts

getsearchresults(searchquery) {
  let params = new URLSearchParams();
  params.set('QueryString', searchquery);

  let headers = new Headers({ 'Accept': 'application/json' });
  let options = new RequestOptions({ headers: headers, search: params });
  return this.http
    .get(this.searchURL, options).map(res =>
      res.json()
    ).catch(this.handleError);
}

To pass parameters with an HTTP request, we create a URLSearchParams object, set the parameters on it, and pass it in the RequestOptions given to http.get. The line let headers = new Headers({ 'Accept': 'application/json' }); tells the API to send us the data in JSON format.

Finally, the infobox component receives the results and displays them on Susper.

The whole code for this implementation can be found in this pull:
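
As a rough illustration of what the component does with the response, the sketch below turns a lookup response into the values the template displays. The response shape is an assumption inferred from the fields used in the template (results[n].label and results[n].description). The names toInfoboxView and InfoboxView are hypothetical, and the sample data is made up.

```typescript
// Assumed response shape, inferred from the template fields used above;
// real responses carry more fields.
interface LookupResult {
  label: string;
  description: string;
}
interface LookupResponse {
  results?: LookupResult[];
}

interface InfoboxView {
  title: string;
  summary: string;
  related: string[];
}

// Returns null when there is nothing to show, matching *ngIf="results?.length > 0".
function toInfoboxView(res: LookupResponse): InfoboxView | null {
  if (!res.results || res.results.length === 0) {
    return null;
  }
  return {
    title: res.results[0].label,
    summary: res.results[0].description,
    related: res.results.map(r => r.label),
  };
}

// Made-up sample data, for illustration only.
const sampleInfoboxView = toInfoboxView({
  results: [
    { label: 'Berlin', description: 'Capital of Germany.' },
    { label: 'Berlin Wall', description: 'Former barrier dividing Berlin.' },
  ],
});
const emptyInfoboxView = toInfoboxView({});
```

The first result supplies the heading and description, and every label is reused as a "Related Searches" link.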

https://github.com/fossasia/susper.com/pull/288


Calling an API in Angular: Using Ngrx/Redux Architecture and Yacy API for Susper

Initially, in Susper we retrieved data from Yacy using a service in Angular 2, but later we introduced redux architecture, which resolved many issues and also made the code structured. In the past when Web APIs were not standardised people used to make their own architecture to implement each functionality. Web APIs have simplified the process of sending a query to an external server and standardised the process of sharing one’s own work with others.

The rise of Internet and mobile content in the recent past has resulted in many developers decoupling the front end and back end of their projects by exposing the APIs they create, so that Android and iOS devices can interact with them using Web APIs. If you are new to building web APIs, a good place to start is https://zapier.com/learn/apis/chapter-1-introduction-to-apis/ .

To understand how the Susper front end implements API calls from Yacy, it’s essential to understand the ngrx redux architecture inspired by react redux which helps manage the state. In case you are new to redux, please go through this to learn more about it and look at this sample app before proceeding with the rest of this blog post

In Susper we have implemented a front end for peer-to-peer decentralised Search Engine Yacy using Yacy Search API.

The services here are very similar to the angular services that seasoned angular js developers are familiar with. This service implementation in the project is responsible for making the calls to the API whenever a query is made.

https://github.com/fossasia/susper.com/blob/master/src/app/search.service.ts

where we implemented a searchService:

 

getsearchresults(searchquery) {
  let params = new URLSearchParams();
  for (let key in searchquery) {
    if (searchquery.hasOwnProperty(key)) {
      params.set(key, searchquery[key]);
    }
  }
  params.set('wt', 'yjson');
  params.set('callback', 'JSONP_CALLBACK');
  params.set('facet', 'true');
  params.set('facet.mincount', '1');
  params.append('facet.field', 'host_s');
  params.append('facet.field', 'url_protocol_s');
  params.append('facet.field', 'author_sxt');
  params.append('facet.field', 'collection_sxt');
  return this.jsonp
    .get('http://yacy.searchlab.eu/solr/select', {search: params}).map(res =>
      res.json()[0]
    ).catch(this.handleError);
}

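The service above sets the callback parameter to JSONP_CALLBACK. Angular's Jsonp client replaces this placeholder with the name of a generated callback function, and the server wraps its JSON response in a call to that function so the browser can load it as a script. This is different from plain HTTP APIs, where one typically asks for JSON by sending the header Accept: application/json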

*JSONP is JSON with padding, that is, you put a string at the beginning and a pair of parentheses around it*
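
The "padding" is easy to see in a small sketch. The helper names wrapJsonp and unwrapJsonp and the callback name handleResults are hypothetical. A real JSONP client does not parse the text by hand but loads the response as a script and lets the browser call the function.

```typescript
// What a server does for JSONP: wrap the JSON text in a call to the named callback.
function wrapJsonp(callbackName: string, data: unknown): string {
  return callbackName + '(' + JSON.stringify(data) + ')';
}

// Strips the padding and reads the JSON inside. Parsing by hand here only
// illustrates the format; browsers execute the response as a script instead.
function unwrapJsonp(callbackName: string, body: string): unknown {
  const prefix = callbackName + '(';
  if (body.indexOf(prefix) !== 0 || body.charAt(body.length - 1) !== ')') {
    throw new Error('not a JSONP response for ' + callbackName);
  }
  return JSON.parse(body.slice(prefix.length, body.length - 1));
}

const paddedResponse = wrapJsonp('handleResults', { totalResults: 3 });
const unwrappedResponse = unwrapJsonp('handleResults', paddedResponse) as { totalResults: number };
```

Here paddedResponse is the string handleResults({"totalResults":3}), which is valid JavaScript and can therefore be fetched across origins with a script tag.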

So where have we used Redux? Basically, every Redux-based project has an action and a reducer for each state in the store. For the search implementation, our reducers and actions are at https://github.com/fossasia/susper.com/blob/master/src/app/reducers

And

https://github.com/fossasia/susper.com/blob/master/src/app/actions

Going through the architecture: when a user types something in the search bar, the query action is dispatched with this.store.dispatch(new query.QueryServerAction(event.target.value));

Which is of type QUERYSERVER

 

export class QueryServerAction implements Action {
  type = ActionTypes.QUERYSERVER;
  constructor(public payload: any) {}
}

export type Actions
  = QueryAction|QueryServerAction ;

 

When the above action is dispatched, the following effect is called:

@Injectable()
export class ApiSearchEffects {
  @Effect()
  search$: Observable<any>
    = this.actions$
      .ofType(query.ActionTypes.QUERYSERVER)
      .debounceTime(300)
      .map((action: query.QueryServerAction) => action.payload)
      .switchMap(querypay => {
        if (querypay === '') {
          return empty();
        }
        const nextSearch$ = this.actions$.ofType(query.ActionTypes.QUERYSERVER).skip(1);
        this.searchService.getsearchresults(querypay)
          .takeUntil(nextSearch$)
          .subscribe((response) => {
            this.store.dispatch(new search.SearchAction(response));
            return empty();
          });
        return empty();
      });
}

As you can see, the effect calls the service we built earlier, that is, searchService.getsearchresults().

On response, it dispatches the response as a payload to SearchAction.

On receiving the payload with the SearchAction, the search reducer takes over and stores the payload in the store's state.

export function reducer(state: State = initialState, action: search.Actions): State {
  switch (action.type) {
    case search.ActionTypes.CHANGE: {
      const search = action.payload;
      return Object.assign({}, state, {
        searchresults: search,
        items: search.channels[0].items,
        totalResults: Number(search.channels[0].totalResults) || 0,
        navigation: search.channels[0].navigation,
      });
    }
    default: {
      return state;
    }
  }
}

The results can then be displayed on the results page by subscribing to the store, as in this.items$ = store.select(fromRoot.getItems);
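
For readers who want to try the reducer pattern in isolation, here is a minimal self-contained sketch. The state and action shapes are simplified assumptions (the real Yacy response and Susper state carry more fields), and the names searchReducer, SearchState and the 'OTHER' action are hypothetical.

```typescript
// Assumed, simplified shapes; the real response and state have more fields.
interface Channel {
  items: string[];
  totalResults: string;
}
interface SearchState {
  items: string[];
  totalResults: number;
}
interface ChangeAction {
  type: 'CHANGE';
  payload: { channels: Channel[] };
}
interface OtherAction {
  type: 'OTHER';
}

const initialSearchState: SearchState = { items: [], totalResults: 0 };

// A reducer is a pure function: (old state, action) -> new state.
function searchReducer(
  state: SearchState = initialSearchState,
  action: ChangeAction | OtherAction
): SearchState {
  switch (action.type) {
    case 'CHANGE': {
      const channel = action.payload.channels[0];
      // Copy instead of mutating, so subscribers can detect the change by reference.
      return {
        ...state,
        items: channel.items,
        totalResults: Number(channel.totalResults) || 0,
      };
    }
    default:
      return state;
  }
}

const stateAfterChange = searchReducer(undefined, {
  type: 'CHANGE',
  payload: { channels: [{ items: ['a', 'b'], totalResults: '2' }] },
});
const stateAfterOther = searchReducer(stateAfterChange, { type: 'OTHER' });
```

Unknown actions return the same state object, which is what lets store.select skip re-rendering when nothing relevant changed.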

Why go through all this instead of making a direct service call? There are two reasons to use ngrx store along with ngrx effects.

  1. With a store, the search results are available to all components.
  2. With instant search, a query is sent to the server for each character typed. If the responses do not arrive in order, the wrong results are displayed: for instance, someone searching for ‘India’ might be shown the results for ‘Ind’. We ran into this while developing Susper (https://github.com/fossasia/susper.com/issues/256): “When a user searches, there is a search performed while typing. The search results that are often shown do not always reflect the final search term, they show a result that appeared while the user was typing it in.” We solved this issue using takeUntil(nextSearch$) in search-effect.ts.
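
The out-of-order problem from the second point can be reproduced without RxJS. The sketch below uses a plain request counter instead of takeUntil, so it drops stale responses rather than unsubscribing from them. The class name LatestOnly is hypothetical.

```typescript
// Hypothetical helper: remembers the id of the newest request and drops
// responses that belong to an older one.
class LatestOnly<T> {
  private latestId = 0;
  current: T | undefined = undefined;

  // Call when a request is sent; returns the function that delivers its response.
  begin(): (response: T) => void {
    const id = ++this.latestId;
    return (response: T) => {
      if (id === this.latestId) {
        this.current = response; // still the newest request: accept
      }
      // otherwise a newer request exists, so the stale response is ignored
    };
  }
}

const shownResults = new LatestOnly<string>();
const deliverInd = shownResults.begin();   // request for "Ind"
const deliverIndia = shownResults.begin(); // request for "India"

// Responses arrive out of order: "India" first, then the slower "Ind".
deliverIndia('results for India');
deliverInd('results for Ind');
const finallyShown = shownResults.current;
```

Even though the response for ‘Ind’ arrives last, the displayed value stays on the results for ‘India’, which is the behaviour the effect achieves with takeUntil.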

How to make your Website as a default Search Engine

Many users end up using whichever search engine is predefined in their browser. In Firefox, you usually see a search box at the top right. In Chrome, you can simply type anything into the URL bar and it goes to the standard search engine. The browser vendors predefine these search engines: Google on Chrome, Yahoo or another engine on Firefox depending on the language version, and Bing on Internet Explorer/Edge. This shuts out new and independent search engines like Susper. However, we want to help users and give them a choice.

 

At Susper, we integrated a small toast modal which allows users to make Susper their default search engine. They can simply go to susper.com and they will see a small option to change their search engine.

 Search box on Firefox

Implementation

We implemented this feature in a simple three-step procedure.

1. Generate a plugin for your search engine

Create a plugin for your search engine at:

http://mycroftproject.com/submitos.html

Tick the “show full instructions” checkbox at the top of the form to understand what each field means.

Finally, a plugin is generated in XML format, which we have to distribute to our user base.
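
To give an idea of what such a plugin file contains, here is a sketch that builds a minimal description document, assuming the generated plugin follows the OpenSearch description format. The function names are hypothetical. The search URL template is a guess based on the /search route and its query parameter, not the contents of the real susper.xml.

```typescript
interface SearchPluginInfo {
  shortName: string;
  description: string;
  // URL template; {searchTerms} is the OpenSearch placeholder for the query.
  searchTemplate: string;
}

// Escape the characters that are special in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function buildOpenSearchXml(info: SearchPluginInfo): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">',
    '  <ShortName>' + escapeXml(info.shortName) + '</ShortName>',
    '  <Description>' + escapeXml(info.description) + '</Description>',
    '  <Url type="text/html" template="' + escapeXml(info.searchTemplate) + '"/>',
    '</OpenSearchDescription>',
  ].join('\n');
}

// Assumed values, for illustration only.
const pluginXml = buildOpenSearchXml({
  shortName: 'Susper',
  description: 'Susper decentralised search',
  searchTemplate: 'http://susper.com/search?query={searchTerms}',
});
```

The browser substitutes the user's query for {searchTerms} and opens the resulting URL, which is all a search plugin needs to do.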

2. Distribute the plugin to our user base

We at susper.com have implemented a toast at the bottom right of the website’s homepage.

When the user clicks on Install Susper, the button triggers an alert box.

Check the box “make this the current search engine” to make Susper your default search engine.

 

Implementation code:

 

 

We first check whether the search engine is already installed using window.external.IsSearchProviderInstalled; if it is not, we show the toast to the user. When the user clicks the Install button, the window.external.AddSearchProvider API is called to install Susper.

 

<div id="set-susper-default">
  <h3>Set Susper as your default search engine on Mozilla!</h3>
  <ol>
    <li><button id="install-susper">Install susper</button></li>
    <li>Mark the checkbox to set Susper as your default search engine</li>
    <li>Start searching!</li>
  </ol>
  <button id="cancel-installation">Cancel</button>
</div>
<script>
  $(document).ready(function () {
    if (window.external && window.external.IsSearchProviderInstalled) {
      var isInstalled = window.external.IsSearchProviderInstalled("http://susper.com");
      if (!isInstalled) {
        $("#set-susper-default").show();
      }
    }
    $("#install-susper").on("click", function () {
      window.external.AddSearchProvider("http://susper.com/susper.xml");
    });
    $("#cancel-installation").on("click", function () {
      $("#set-susper-default").remove();
    });
  });
</script>

In this way, we are able to give users an option to choose Susper as a default search engine.

More details regarding the implementation of this feature in susper could be checked at this pull https://github.com/fossasia/susper.com/pull/62 .
