First, replace the Python interpreter and the Request and BeautifulSoup libraries with the Node JS interpreter and the Request and Cheerio JS libraries.
TIP: pass `--save` to npm when installing a library (for example, `npm install --save cheerio`) so that the dependency is recorded in package.json.
2) Request library: used to load the webpage to be processed, similar to its Python counterpart.
Step#1: Fetch the HTML from the webpage with the help of the Request library.
We pass the URL to the Request function to fetch the webpage, and the response body is saved to the `html` variable. The scrapeTimeAndDate() function then scrapes data from that HTML.
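A minimal sketch of this step, assuming the callback-style `request(url, callback)` API of the Request library. The `fetchHtml` wrapper, its optional `requestFn` parameter and the placeholder URL in the usage comment are illustrative assumptions, not the original scraper's code; `scrapeTimeAndDate()` is assumed to be defined elsewhere and to accept the HTML string.

```javascript
// Fetch a page and hand its HTML to a Node-style callback.
// `requestFn` defaults to the Request library; it is a parameter only so that
// a stand-in can be supplied (an assumption of this sketch).
function fetchHtml(url, onHtml, requestFn = require('request')) {
  requestFn(url, function (error, response, body) {
    if (error) {
      // Network-level failure (DNS, timeout, refused connection, ...)
      onHtml(error, null);
      return;
    }
    if (response.statusCode !== 200) {
      onHtml(new Error('Unexpected status code: ' + response.statusCode), null);
      return;
    }
    const html = body; // the fetched page, saved to the `html` variable
    onHtml(null, html);
  });
}

// Usage (the URL is a placeholder):
// fetchHtml('https://example.com/worldclock', function (error, html) {
//   if (error) throw error;
//   scrapeTimeAndDate(html);
// });
```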
Step#2: Scrape the important data from the HTML using Cheerio JS.
The list of dates and times for the locations is embedded in a table tag, so we will iterate through the <td> tags and extract their text.
- a) Load the HTML into Cheerio JS, just as we would load it into BeautifulSoup in Python.
- b) Find the first tr tag inside the table tag.
- c) Iterate through the td tags using the each() function. This function acts like a for loop in Python, iterating through the list of elements from which data will be extracted.
- d) Extract the text from each element.
Cheerio JS loads the HTML and traverses it using the DOM model, which treats the HTML as a tree. So, navigate to the tag you need and scrape the data you want.
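Steps a) to d) can be sketched as follows, assuming Cheerio's `load()`, `find()`, `first()`, `each()` and `text()` methods. The function body is an illustration rather than the original scraper, and the optional `cheerioLib` parameter is a hypothetical addition that lets a caller supply the library instance.

```javascript
// Collect the text of every <td> in the first <tr> of the table.
function scrapeTimeAndDate(html, cheerioLib = require('cheerio')) {
  // a) Load the HTML into Cheerio (comparable to building a BeautifulSoup object)
  const $ = cheerioLib.load(html);

  // b) Find the first tr tag inside the table tag
  const firstRow = $('table').find('tr').first();

  // c) Iterate through the td tags; each() calls the function once per element
  const cells = [];
  firstRow.find('td').each(function (index, element) {
    // d) Extract the text of the current element
    cells.push($(element).text().trim());
  });

  return cells;
}
```

Returning a plain array keeps the traversal separate from whatever the caller does with the data, such as printing it or converting it to JSON.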
Some other useful functions:
1) $(selector, [context], [root])
Returns an object for the elements matching selector (any tag, class or id). The selector is searched within context, which in turn is searched within root.
2) $("table").attr(name, value)
With only name, gets the value of that attribute on the first matched tag; with value as well, sets the attribute to `value`.
3) $("table").html()
Gets the HTML enclosed in the tag.
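The helper functions above can be tried out with a short sketch like the one below. It assumes Cheerio's `$(selector, context)`, `attr(name)` and `html()` calls; the `describeTable` function, the property names it returns and the optional `cheerioLib` parameter are hypothetical and exist only for illustration.

```javascript
// Summarise a table using the selector-with-context form, attr() and html().
function describeTable(html, cheerioLib = require('cheerio')) {
  const $ = cheerioLib.load(html);
  return {
    // $(selector, context): match td tags only inside table tags
    cellCount: $('td', 'table').length,
    // attr(name): read an attribute of the first matched tag
    tableId: $('table').attr('id'),
    // html(): the HTML enclosed in the first matched tag
    firstRowHtml: $('tr').html(),
  };
}
```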
For more functions, see the Cheerio JS documentation.
Step#3: Execute the scraper from the command line using the node command.
- Loklak Scraper JS: https://github.com/fossasia/loklak_scraper_js
- Loklak Wok Android Project: https://github.com/fossasia/loklak_wok_android
- Node JS: https://nodejs.org/en/about/
- Cheerio JS: https://cheerio.js.org/
- Request JS: https://github.com/request/request
- BlueBird JS: http://bluebirdjs.com/docs/getting-started.html