Commit Graph

111 Commits

Author SHA1 Message Date
Nikolai Tschacher
d1e9b21269 added google maps scraper 2019-06-29 17:00:19 +02:00
Nikolai Tschacher
593f3a95e5 Merge pull request #33 from TDenoncin/add-html-output-rework
Add html output option
2019-06-26 15:38:38 +02:00
HugoPoi
d9ac9f4162 Add test for html_output, refactor the results return 2019-06-26 12:03:42 +02:00
Thomas
a0e63aa4b0 Use bing_setting.bing_domain if defined for startUrl 2019-06-25 17:16:17 +02:00
Thomas
a3ebe357a4 Add html_output functionality
Pagination support for html output
Change return value to keep it compliant with the current version of se-scraper
2019-06-25 17:02:34 +02:00
Nikolai Tschacher
0d7f6dcd11 worked on issue #31 2019-06-18 22:23:52 +02:00
Nikolai Tschacher
80d23a9d57 users may pass their own user agents, different browsers have random user agents and not the same now 2019-06-17 21:25:45 +02:00
Nikolai Tschacher
ebe9ba8ea9 added option to throw on detection 2019-06-17 15:02:44 +02:00
Nikolai Tschacher
caa93df3b0 random user agent fixed 2019-06-17 12:01:13 +02:00
Nikolai Tschacher
0c9f353cb2 remove hardcoded sleep() in Google Image 2019-06-17 00:03:13 +02:00
Nikolai Tschacher
43d5732de7 resolved issue #30, custom scrapers now possible. new npm version 2019-06-13 12:34:39 +02:00
Nikolai Tschacher
06d500f75c . 2019-06-12 21:25:40 +02:00
Nikolai Tschacher
784e887787 fixed issue #22 2019-06-12 21:25:20 +02:00
Nikolai Tschacher
db5fbb23d2 removed unnecessary sleeping times 2019-06-12 18:14:49 +02:00
Nikolai Tschacher
5bf7c94b9a new version 2019-06-11 22:01:27 +02:00
Nikolai Tschacher
d4d06f7d67 need to edit readme 2019-06-11 18:34:51 +02:00
Nikolai Tschacher
35943e7449 minor stuff 2019-06-11 18:33:11 +02:00
Nikolai Tschacher
7e06944fa1 updated README 2019-06-11 18:27:34 +02:00
Nikolai Tschacher
6825c97790 changed api big time 2019-06-11 18:16:59 +02:00
Nikolai Tschacher
3d69f4e249 added a proxy test script 2019-05-06 21:54:23 +02:00
Nikolai Tschacher
1593759556 passing chrome flags directly now possible 2019-04-01 15:33:26 +02:00
Nikolai Tschacher
775dcfa077 proxy mgmt better 2019-03-22 18:55:17 +01:00
Nikolai Tschacher
b82c769bb1 google_news_old supports google_news_old_settings now 2019-03-20 15:28:04 +01:00
Nikolai Tschacher
1bed9c5854 fixed issue 12 2019-03-20 11:50:43 +01:00
Nikolai Tschacher
7a8c6f13f0 fixed #11 by improving baidu a lot in speed and quality 2019-03-14 23:33:46 +01:00
Nikolai Tschacher
51d617442d added support for amazon 2019-03-10 20:02:42 +01:00
Nikolai Tschacher
dd1f36076e can now parse args from string to json 2019-03-07 15:50:36 +01:00
Nikolai Tschacher
62b3b688b4 minor fixes 2019-03-07 13:16:12 +01:00
Nikolai Tschacher
7b52b4e62f added support for custom query string parameters 2019-03-06 00:08:25 +01:00
Nikolai Tschacher
7239e23cba fixed pluggable 2019-03-03 16:46:10 +01:00
Nikolai Tschacher
8cbf37eaba minor improvements 2019-03-02 22:32:26 +01:00
Nikolai Tschacher
abf4458e46 fixed quotes in user agent. this led to cloudflare detecting the scraper. very bad. 2019-03-01 16:02:30 +01:00
Nikolai Tschacher
79d32a315a fixed some errors and way better README 2019-02-28 15:34:25 +01:00
Nikolai Tschacher
089e410ec6 support for multiple browsers and proxies 2019-02-27 20:58:13 +01:00
Nikolai Tschacher
393b9c0450 Merge pull request #8 from NikolaiT/add-license-1
Create LICENSE
2019-02-08 00:58:27 +01:00
Nikolai Tschacher
fb3f2836e4 Create LICENSE 2019-02-08 00:58:15 +01:00
Nikolai Tschacher
53c9ebf467 Merge pull request #7 from NikolaiT/add-code-of-conduct-1
Create CODE_OF_CONDUCT.md
2019-02-08 00:54:28 +01:00
Nikolai Tschacher
9521c54c77 Create CODE_OF_CONDUCT.md 2019-02-08 00:54:10 +01:00
Nikolai Tschacher
77c332d7c8 updated readme 2019-02-07 16:26:11 +01:00
Nikolai Tschacher
7b5048b8ee num_keywords are counted now. added to pluggable 2019-02-07 16:21:56 +01:00
Nikolai Tschacher
7572ebd314 added chrome detection evasion techniques 2019-02-07 16:09:38 +01:00
Nikolai Tschacher
d5b147296e ticker search OOP now and added tests 2019-01-31 22:13:22 +01:00
Nikolai Tschacher
d35a602994 added clean test cases for bing and duckduckgo 2019-01-31 15:36:27 +01:00
Nikolai Tschacher
7441c57a43 removed generic tests. too complicated 2019-01-31 14:58:07 +01:00
Nikolai Tschacher
c60d0f3528 clean test case for google is passing 2019-01-31 14:57:34 +01:00
Nikolai Tschacher
987e3d7342 tested and works 2019-01-30 23:53:09 +01:00
Nikolai Tschacher
581568ff18 cleaned up google scrapers. All scrapers are classes now. from 600 LOC to 400 LOC. HIGH IQ MOVE 2019-01-30 20:24:03 +01:00
Nikolai Tschacher
4306848657 implemented generic scraping class 2019-01-30 16:05:08 +01:00
Nikolai Tschacher
9e62f23451 resolved some issues. proxy possible now. scraping for more than one page possible now 2019-01-29 22:48:08 +01:00
Nikolai Tschacher
89441070cd before_keyword_scraped() hook supported 2019-01-29 13:29:24 +01:00