This Node.js module supports scraping several search engines.
Currently, scraping is supported for:
* Google
* Google News
* Google News, new version (https://news.google.com)
* Google Images
* Bing
* Baidu
* YouTube
* InfoSpace
* DuckDuckGo
* WebCrawler
This module uses puppeteer. It was created by the developer of https://github.com/NikolaiT/GoogleScraper, a module with 1800 stars on GitHub.
### Technical Notes
Scraping is done with a headless Chromium browser driven by the automation library puppeteer. Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.
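
The following minimal sketch shows what controlling headless Chromium through puppeteer looks like. It is not se-scraper's internal code. The helper name `fetchPageTitle` is hypothetical, and the sketch assumes puppeteer has been installed separately with `npm install puppeteer`.

```javascript
// Minimal sketch (not se-scraper's own code): drive headless Chromium
// with puppeteer. Assumes `npm install puppeteer` has been run.

/**
 * Hypothetical helper: load one page and return its <title>.
 * @param {string} url - page to load, e.g. a search results URL
 * @param {string} userAgent - user agent string the browser should send
 */
async function fetchPageTitle(url, userAgent) {
  // Required lazily so this file can be loaded without puppeteer present.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.setUserAgent(userAgent);
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.title();
  } finally {
    // Always shut down the browser process, even if navigation fails.
    await browser.close();
  }
}
```

For example, `fetchPageTitle('https://www.bing.com/search?q=puppeteer', someUserAgent).then(console.log)` would print the title of a Bing results page. This sketch launches a fresh browser for each call to keep it short. A scraper handling many keywords would more likely reuse one browser instance across pages.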
Multithreading is not supported for now: each `scrape()` call uses only one scraping worker.
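
The single-worker model can be illustrated with a small sketch in which keywords are processed strictly one after another. The function `scrapeSequentially` and the fake scraper `fakeScrapeOne` are hypothetical stand-ins, not part of se-scraper's API. The fake scraper records how many calls overlap, to show that only one is ever in flight.

```javascript
// Hypothetical sketch of "one worker per scrape() call": each keyword is
// awaited before the next one starts, so requests never overlap.
async function scrapeSequentially(keywords, scrapeOne) {
  const results = {};
  for (const keyword of keywords) {
    results[keyword] = await scrapeOne(keyword);
  }
  return results;
}

// Fake scraper standing in for a real page load; it tracks concurrency.
let inFlight = 0;
let maxInFlight = 0;

async function fakeScrapeOne(keyword) {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated work
  inFlight -= 1;
  return `results for ${keyword}`;
}

// Usage: maxInFlight stays at 1 because the loop awaits each keyword.
const done = scrapeSequentially(['first', 'second', 'third'], fakeScrapeOne)
  .then((results) => {
    console.log(results, 'max concurrent calls:', maxInFlight);
    return results;
  });
```

The cost of this design is throughput. Total time grows linearly with the number of keywords, in exchange for a simple, predictable request pattern.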
If you need to deploy scraping to the cloud (AWS or Azure), you can contact me at hire@incolumitas.com.
### Installation and Usage
Install with
```bash
npm install se-scraper
```
Use se-scraper by calling it with a script such as the one below.
```javascript
const se_scraper = require('se-scraper');

let config = {
    // the user agent to scrape with
    user_agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    // if random_user_agent is set to true, a random user agent is chosen