fixed google SERP title, better docker support

Nikolai Tschacher 2019-09-23 16:46:22 +02:00
parent b25f7a4285
commit 07f3dceba1
3 changed files with 38 additions and 8 deletions

View File

@@ -6,10 +6,11 @@
 This node module allows you to scrape search engines concurrently with different proxies.
-If you don't have much technical experience or don't want to purchase proxies, you can use [my scraping service](https://scrapeulous.com/).
+If you don't have extensive technical experience or don't want to purchase proxies, you can use [my scraping service](https://scrapeulous.com/).
-##### Table of Contents
+#### Table of Contents
 - [Installation](#installation)
+- [Docker](#docker-support)
 - [Minimal Example](#minimal-example)
 - [Quickstart](#quickstart)
 - [Contribute](#contribute)
@@ -75,7 +76,7 @@ If you **don't** want puppeteer to download a complete chromium browser, add thi
 export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
 ```
-### Docker Image
+### Docker Support
 I will maintain a public docker image of se-scraper. Pull the docker image with the command:
@@ -83,7 +84,30 @@ I will maintain a public docker image of se-scraper. Pull the docker image with
 docker pull tschachn/se-scraper
 ```
-When the image is running, you may start scrape jobs via an HTTP API:
+Confirm that the docker image was correctly pulled:
+```bash
+docker image ls
+```
+It should show something like this:
+```
+tschachn/se-scraper secondtry 897e1aeeba78 21 minutes ago 1.29GB
+```
+You can check the [latest tag here](https://hub.docker.com/r/tschachn/se-scraper/tags). In the example below, the latest tag is **secondtry**. This will most likely change to **latest** in the future.
+Run the docker image and map the internal port 3000 to the external port 3000:
+```bash
+$ docker run -p 3000:3000 tschachn/se-scraper:secondtry
+Running on http://0.0.0.0:3000
+```
+When the image is running, you may start scrape jobs via an HTTP API:
 ```bash
 curl -XPOST http://0.0.0.0:3000 -H 'Content-Type: application/json' \

View File

@@ -1,6 +1,6 @@
 {
   "name": "se-scraper",
-  "version": "1.5.1",
+  "version": "1.5.2",
   "description": "A module using puppeteer to scrape several search engines such as Google, Bing and Duckduckgo",
   "homepage": "https://scrapeulous.com/",
   "main": "index.js",

View File

@@ -16,13 +16,19 @@ class GoogleScraper extends Scraper {
         const results = [];
         $('#center_col .g').each((i, link) => {
-            results.push({
+            let obj = {
                 link: $(link).find('.r a').attr('href'),
-                title: $(link).find('.r a').text(),
+                title: $(link).find('.r a h3').text(),
                 snippet: $(link).find('span.st').text(),
                 visible_link: $(link).find('.r cite').text(),
                 date: $(link).find('span.f').text() || '',
-            })
+            };
+            if (obj.date) {
+                obj.date = obj.date.replace(' - ', '');
+            }
+            results.push(obj);
         });
 
         // parse ads
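
The per-result logic in the hunk above can be sketched as a standalone snippet. This is a hypothetical helper (`buildResult` is not part of se-scraper): the cheerio lookups are replaced with plain field values so only the object construction and the date cleanup from the diff remain, and the sample input values are made up for illustration:

```javascript
// Hypothetical standalone sketch of the per-result handling added above.
// Field names match the diff; input is plain data instead of a cheerio node.
function buildResult({ link, title, snippet, visible_link, date }) {
    let obj = {
        link: link,
        title: title,
        snippet: snippet,
        visible_link: visible_link,
        date: date || '',
    };
    // Google appends ' - ' after the date inside span.f; strip it.
    if (obj.date) {
        obj.date = obj.date.replace(' - ', '');
    }
    return obj;
}

const result = buildResult({
    link: 'https://example.org/',
    title: 'Example Domain',
    snippet: 'Example snippet text',
    visible_link: 'example.org',
    date: '23.09.2019 - ',
});
console.log(result.date); // '23.09.2019'
```

The `date || ''` fallback mirrors the diff's handling of results without a `span.f` element, so the cleanup branch is skipped for empty dates.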