HugoPoi | 394b567db6 | test: add user_agent tests, add html_output tests | 2020-01-10 09:35:24 +01:00
HugoPoi | cac6b87e92 | test: Bing tests working, refactor proxy for tests | 2020-01-08 14:40:28 +01:00
HugoPoi | f192e4ebb4 | test: remove legacy tests | 2020-01-07 16:43:17 +01:00
HugoPoi | 392c43390e | test(google): add real integration/unit tests for google module | 2020-01-03 19:21:34 +01:00
HugoPoi | 8f40057534 | refactor(cluster): use custom concurrency for puppeteer-cluster | 2019-12-20 19:44:59 +01:00
HugoPoi | bcd181111b | refactor(log): remove common.js, use winston and debug | 2019-12-15 17:56:22 +01:00
HugoPoi | b4a86fcc51 | refactor(proxy): remove proxy option not working replace by proxies | 2019-12-13 18:02:22 +01:00
David Solé | ca9f5f7f50 | Added post install script to build the puppeteer-cluster, and also added the updated dependencies from puppeteer-cluster | 2019-11-22 00:37:29 +01:00
Nikolai Tschacher | 1694ee92d0 | updated to puppeteer 2.0 | 2019-11-08 16:21:16 +01:00
Nikolai Tschacher | da69913272 | added detected status to metadata | 2019-10-06 15:34:18 +02:00
Nikolai Tschacher | 4a3a0e6fd4 | better pluggable api | 2019-10-05 19:39:33 +02:00
Nikolai Tschacher | 4953d9da7a | changed version | 2019-09-23 23:39:06 +02:00
Nikolai Tschacher | 07f3dceba1 | fixed google SERP title, better docker support | 2019-09-23 16:46:22 +02:00
Nikolai Tschacher | b25f7a4285 | added test to my working tree | 2019-09-13 18:28:19 +02:00
Nikolai Tschacher | 21378dab02 | removed some search engines, added tests for existing, added yandex search engines | 2019-09-13 16:15:33 +02:00
Nikolai Tschacher | 77d6c4f04a | removed some stuff | 2019-09-12 10:43:57 +02:00
Nikolai Tschacher | e661241f6f | added some parsing to google | 2019-08-16 20:10:40 +02:00
Nikolai Tschacher | 98414259fe | docker support added | 2019-08-13 17:35:06 +02:00
Nikolai Tschacher | 19a172c654 | better tests | 2019-08-13 15:28:30 +02:00
Nikolai Tschacher | 0f7e89c272 | added little bug in cleaning | 2019-08-12 17:16:37 +02:00
Nikolai Tschacher | 87fcdd35d5 | readme in static tests | 2019-08-12 00:01:02 +02:00
Nikolai Tschacher | 78fe12390b | better user agents now, added option to include screenshots as base64 in results | 2019-07-18 20:19:15 +02:00
Nikolai Tschacher | fcbe66b56b | using random user agents now from https://github.com/intoli/user-agents | 2019-07-18 19:34:09 +02:00
Nikolai Tschacher | 59154694f2 | fixed issue https://github.com/NikolaiT/se-scraper/issues/37 | 2019-07-18 19:14:33 +02:00
Nikolai Tschacher | 1fc7f0d1c8 | fixed a badboy | 2019-07-11 16:54:32 +02:00
Nikolai Tschacher | dab25f9068 | added google shopping results | 2019-07-11 16:42:01 +02:00
Nikolai Tschacher | a413cb54ef | parsing ads works for duckduckgo, google, bing. tested. | 2019-07-07 19:38:28 +02:00
Nikolai Tschacher | bbebe3ce60 | parsing ads is supported now for google, bing and duckduckgo | 2019-07-06 21:42:13 +02:00
Nikolai Tschacher | 09c1255400 | removed some superfluous stuff | 2019-07-02 18:04:01 +02:00
Nikolai Tschacher | 5e8ff1cb34 | Merge branch 'master' of https://github.com/NikolaiT/se-scraper | 2019-06-29 17:01:25 +02:00
Nikolai Tschacher | d1e9b21269 | added google maps scraper | 2019-06-29 17:00:19 +02:00
HugoPoi | d9ac9f4162 | Add test for html_output, refactor the results return | 2019-06-26 12:03:42 +02:00
Nikolai Tschacher | 80d23a9d57 | users may pass their own user agents, different browsers have random user agents and not the same now | 2019-06-17 21:25:45 +02:00
Nikolai Tschacher | ebe9ba8ea9 | added option to throw on detection | 2019-06-17 15:02:44 +02:00
Nikolai Tschacher | caa93df3b0 | random user agent fixed | 2019-06-17 12:01:13 +02:00
Nikolai Tschacher | 43d5732de7 | resolved issue #30, custom scrapers now possible. new npm version | 2019-06-13 12:34:39 +02:00
Nikolai Tschacher | db5fbb23d2 | removed unnecessary sleeping times | 2019-06-12 18:14:49 +02:00
Nikolai Tschacher | 5bf7c94b9a | new version | 2019-06-11 22:01:27 +02:00
Nikolai Tschacher | 7e06944fa1 | updated README | 2019-06-11 18:27:34 +02:00
Nikolai Tschacher | 6825c97790 | changed api big time | 2019-06-11 18:16:59 +02:00
Nikolai Tschacher | 3d69f4e249 | added a proxy test script | 2019-05-06 21:54:23 +02:00
Nikolai Tschacher | 1593759556 | passing chrome flags directly now possible | 2019-04-01 15:33:26 +02:00
Nikolai Tschacher | b82c769bb1 | google_news_old supports google_news_old_settings now | 2019-03-20 15:28:04 +01:00
Nikolai Tschacher | 7a8c6f13f0 | fixed #11 by improving baidu a lot in speed and quality | 2019-03-14 23:33:46 +01:00
Nikolai Tschacher | 51d617442d | added support for amazon | 2019-03-10 20:02:42 +01:00
Nikolai Tschacher | dd1f36076e | can now parse args from string to json | 2019-03-07 15:50:36 +01:00
Nikolai Tschacher | 7b52b4e62f | added support for custom query string parameters | 2019-03-06 00:08:25 +01:00
Nikolai Tschacher | 7239e23cba | fixed pluggable | 2019-03-03 16:46:10 +01:00
Nikolai Tschacher | 8cbf37eaba | minor improvements | 2019-03-02 22:32:26 +01:00
Nikolai Tschacher | abf4458e46 | fixed quotes in user agent. this led to cloudflare detecting the scraper. very bad. | 2019-03-01 16:02:30 +01:00