### 24.12.2018
- fix interface to scrape() [DONE]
- add to GitHub

### 24.1.2019
- fix issue #3: add support for a keyword file

### 27.1.2019
- Add functionality to block images and CSS from loading as described here:
  - https://www.scrapehero.com/how-to-increase-web-scraping-speed-using-puppeteer/
  - https://www.scrapehero.com/how-to-build-a-web-scraper-using-puppeteer-and-node-js/
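
A rough sketch of the image/CSS blocking item, assuming Puppeteer's request interception (`page.setRequestInterception(true)` plus a `request` event handler). The names `shouldBlock`, `handleRequest` and `blockImagesAndCss` are hypothetical and not part of this project; the decision logic is kept separate from Puppeteer so it can be tested without launching a browser.

```javascript
// Resource types to skip. 'image' and 'stylesheet' are values that
// Puppeteer's request.resourceType() can return.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'stylesheet']);

// Hypothetical helper: true if a request of this type should be aborted.
function shouldBlock(resourceType) {
    return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Hypothetical handler for the 'request' event. It only relies on the
// resourceType(), abort() and continue() methods of the request object.
function handleRequest(request) {
    if (shouldBlock(request.resourceType())) {
        return request.abort();
    }
    return request.continue();
}

// Hypothetical wiring; `page` is assumed to be a Puppeteer Page that was
// created elsewhere (e.g. via browser.newPage()).
async function blockImagesAndCss(page) {
    await page.setRequestInterception(true);
    page.on('request', handleRequest);
}
```

Calling `await blockImagesAndCss(page)` before `page.goto(...)` would be the intended usage; whether this actually speeds up a given scraper would need to be measured.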

### 29.1.2019
- implement proxy support functionality
- implement proxy check
- implement scraping of more than one page
  - do it for Google
  - and Bing
- implement DuckDuckGo scraping

### 30.1.2019
- modify all scrapers to use the generic class where it makes sense
  - Bing, Baidu, Google, DuckDuckGo

### 7.2.2019
- add num_requests to test cases [done]

### 25.2.2019
- https://antoinevastel.com/crawler/2018/09/20/parallel-crawler-puppeteer.html
- add support for browsing with multiple browsers, using this neat library:
  - https://github.com/thomasdondorf/puppeteer-cluster [done]

### 28.2.2019
- write test case for multiple browsers/proxies
- write test case and example for multiple tabs with Bing
- make README.md nicer, using https://github.com/thomasdondorf/puppeteer-cluster/blob/master/README.md as a template

### TODO:
- fix DuckDuckGo test case!!!
- add test case for Infospace
- add test cases for the following Google parameters:
  - num
  - start
  - some language settings
- write test case for proxy support and cluster support
- add support for captcha-solving services
- check if new instances run the same browser and if we can have one proxy per tab worker
- write test case for:
  - pluggable
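
For the Google parameter test cases listed above (`num`, `start`, language settings), a test could start from a small URL builder like the sketch below. `buildGoogleSearchUrl` is a hypothetical name, not an existing function of this project, and using `hl` for the language setting is an assumption, since the TODO does not say which parameter is meant.

```javascript
// Hypothetical helper that builds a Google search URL from a keyword and
// optional parameters. `num` and `start` come from the TODO item; `hl`
// (interface language) is an assumed stand-in for "language settings".
function buildGoogleSearchUrl(keyword, options = {}) {
    const params = new URLSearchParams({ q: keyword });
    // Only add parameters that were explicitly set.
    for (const name of ['num', 'start', 'hl']) {
        if (options[name] !== undefined) {
            params.set(name, String(options[name]));
        }
    }
    return 'https://www.google.com/search?' + params.toString();
}

console.log(buildGoogleSearchUrl('web scraping', { num: 20, start: 10, hl: 'en' }));
// prints https://www.google.com/search?q=web+scraping&num=20&start=10&hl=en
```

A test case could then compare the URL the scraper requests against the output of such a builder for each parameter combination.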