56 Commits

Author SHA1 Message Date
Nikolai Tschacher
1694ee92d0 updated to puppeteer 2.0 2019-11-08 16:21:16 +01:00
Nikolai Tschacher
da69913272 added detected status to metadata 2019-10-06 15:34:18 +02:00
Nikolai Tschacher
4a3a0e6fd4 better pluggable api 2019-10-05 19:39:33 +02:00
Nikolai Tschacher
4953d9da7a changed version 2019-09-23 23:39:06 +02:00
Nikolai Tschacher
07f3dceba1 fixed google SERP title, better docker support 2019-09-23 16:46:22 +02:00
Nikolai Tschacher
b25f7a4285 added test to my working tree 2019-09-13 18:28:19 +02:00
Nikolai Tschacher
21378dab02 removed some search engines, added tests for existing, added yandex search engines 2019-09-13 16:15:33 +02:00
Nikolai Tschacher
77d6c4f04a removed some stuff 2019-09-12 10:43:57 +02:00
Nikolai Tschacher
e661241f6f added some parsing to google 2019-08-16 20:10:40 +02:00
Nikolai Tschacher
98414259fe docker support added 2019-08-13 17:35:06 +02:00
Nikolai Tschacher
19a172c654 better tests 2019-08-13 15:28:30 +02:00
Nikolai Tschacher
0f7e89c272 added little bug in cleaning 2019-08-12 17:16:37 +02:00
Nikolai Tschacher
87fcdd35d5 readme in static tests 2019-08-12 00:01:02 +02:00
Nikolai Tschacher
78fe12390b better user agents now, added option to include screenshots as base64 in results 2019-07-18 20:19:15 +02:00
Nikolai Tschacher
fcbe66b56b using random user agents now from https://github.com/intoli/user-agents 2019-07-18 19:34:09 +02:00
Nikolai Tschacher
59154694f2 fixed issue https://github.com/NikolaiT/se-scraper/issues/37 2019-07-18 19:14:33 +02:00
Nikolai Tschacher
1fc7f0d1c8 fixed a badboy 2019-07-11 16:54:32 +02:00
Nikolai Tschacher
dab25f9068 added google shopping results 2019-07-11 16:42:01 +02:00
Nikolai Tschacher
a413cb54ef parsing ads works for duckduckgo, google, bing. tested. 2019-07-07 19:38:28 +02:00
Nikolai Tschacher
bbebe3ce60 parsing ads is supported now for google, bing and duckduckgo 2019-07-06 21:42:13 +02:00
Nikolai Tschacher
09c1255400 removed some superfluous stuff 2019-07-02 18:04:01 +02:00
Nikolai Tschacher
5e8ff1cb34 Merge branch 'master' of https://github.com/NikolaiT/se-scraper 2019-06-29 17:01:25 +02:00
Nikolai Tschacher
d1e9b21269 added google maps scraper 2019-06-29 17:00:19 +02:00
HugoPoi
d9ac9f4162 Add test for html_output, refactor the results return 2019-06-26 12:03:42 +02:00
Nikolai Tschacher
80d23a9d57 users may pass their own user agents, different browsers have random user agents and not the same now 2019-06-17 21:25:45 +02:00
Nikolai Tschacher
ebe9ba8ea9 added option to throw on detection 2019-06-17 15:02:44 +02:00
Nikolai Tschacher
caa93df3b0 random user agent fixed 2019-06-17 12:01:13 +02:00
Nikolai Tschacher
43d5732de7 resolved issue #30, custom scrapers now possible. new npm version 2019-06-13 12:34:39 +02:00
Nikolai Tschacher
db5fbb23d2 removed unnecessary sleeping times 2019-06-12 18:14:49 +02:00
Nikolai Tschacher
5bf7c94b9a new version 2019-06-11 22:01:27 +02:00
Nikolai Tschacher
7e06944fa1 updated README 2019-06-11 18:27:34 +02:00
Nikolai Tschacher
6825c97790 changed api big time 2019-06-11 18:16:59 +02:00
Nikolai Tschacher
3d69f4e249 added a proxy test script 2019-05-06 21:54:23 +02:00
Nikolai Tschacher
1593759556 passing chrome flags directly now possible 2019-04-01 15:33:26 +02:00
Nikolai Tschacher
b82c769bb1 google_news_old supports google_news_old_settings now 2019-03-20 15:28:04 +01:00
Nikolai Tschacher
7a8c6f13f0 fixed #11 by improving baidu a lot in speed and quality 2019-03-14 23:33:46 +01:00
Nikolai Tschacher
51d617442d added support for amazon 2019-03-10 20:02:42 +01:00
Nikolai Tschacher
dd1f36076e can now parse args from string to json 2019-03-07 15:50:36 +01:00
Nikolai Tschacher
7b52b4e62f added support for custom query string parameters 2019-03-06 00:08:25 +01:00
Nikolai Tschacher
7239e23cba fixed pluggable 2019-03-03 16:46:10 +01:00
Nikolai Tschacher
8cbf37eaba minor improvements 2019-03-02 22:32:26 +01:00
Nikolai Tschacher
abf4458e46 fixed quotes in user agent. this led to cloudflare detecting the scraper. very bad. 2019-03-01 16:02:30 +01:00
Nikolai Tschacher
79d32a315a fixed some errors and way better README 2019-02-28 15:34:25 +01:00
Nikolai Tschacher
089e410ec6 support for multiple browsers and proxies 2019-02-27 20:58:13 +01:00
Nikolai Tschacher
7b5048b8ee num_keywords are counted now. added to pluggable 2019-02-07 16:21:56 +01:00
Nikolai Tschacher
7572ebd314 added chrome detection evasion techniques 2019-02-07 16:09:38 +01:00
Nikolai Tschacher
c60d0f3528 clean test case for google is passing 2019-01-31 14:57:34 +01:00
Nikolai Tschacher
9e62f23451 resolved some issues. proxy possible now. scraping for more than one page possible now 2019-01-29 22:48:08 +01:00
Nikolai Tschacher
89441070cd before_keyword_scraped() hook supported 2019-01-29 13:29:24 +01:00
Nikolai Tschacher
c5e3e84e1d minor changes 2019-01-27 22:11:41 +01:00