13 Commits

Author SHA1 Message Date
kik4444
0d518bf813
query web --query should return list<list<string>> like the scraper crate's ElementRef::text() (#11705)

# Description
## Problem
I tried converting one of my Rust web scrapers to Nushell just to see
how it would be done, but quickly ran into an issue that proved annoying
to fix without diving into the source.

For instance, let's say we have the following HTML
```html
<p>Hello there, <span style="color: red;">World</span></p>
```
and we want to extract only the text within the `p` element, but not the
`span`. With the current version of nu_plugin_query, if we run this code
```nushell
echo `<p>Hello there, <span style="color: red;">World</span></p>` | query web -q "p" | get 0
# returns "Hello there, World"

# but we want only "Hello there, "
```
we will get back a `list<string>` that contains 1 string `Hello there,
World`.
To avoid scraping the span, we would have to do something like this
```nushell
const html = `<p>Hello there, <span style="color: red;">World</span></p>`
$html
| query web -q "p"
| get 0
| str replace ($html | query web -q "p > span" | get 0) ""
# returns "Hello there, "
```
In other words, we would have to make a sub scrape of the text we
*don't* want in order to subtract it from the text we *do* want.

## Solution
I didn't like this behavior, so I decided to change it. I modified the
`execute_selector_query` function to collect all text nodes in the HTML
element matching the query. Now `query web --query` will return a
`list<list<string>>`
```nushell
echo `<p>Hello there, <span style="color: red;">World</span></p>` | query web -q "p" | get 0 | to json --raw
# returns ["Hello there, ","World"]
```
This also brings `query web --query`'s behavior more in line with
[scraper's
ElementRef::text()](https://docs.rs/scraper/latest/scraper/element_ref/struct.ElementRef.html#method.text)
which "Returns an iterator over descendent text nodes", allowing you to
choose how much of an element's text you want to scrape without
resorting to string substitutions.
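
To illustrate the idea, here is a minimal, self-contained sketch of collecting descendant text nodes. The `Node` type and `text_nodes` function are hypothetical stand-ins for this example only; the plugin itself works on the scraper crate's parsed tree, not on this enum.

```rust
// Hypothetical stand-in for a parsed HTML tree (not the scraper crate's types).
#[derive(Debug)]
enum Node {
    Text(String),
    Element(Vec<Node>),
}

/// Collect every descendant text node, in document order.
fn text_nodes(node: &Node) -> Vec<String> {
    match node {
        Node::Text(t) => vec![t.clone()],
        Node::Element(children) => children.iter().flat_map(text_nodes).collect(),
    }
}

/// Models `<p>Hello there, <span style="color: red;">World</span></p>`.
fn sample() -> Node {
    Node::Element(vec![
        Node::Text("Hello there, ".to_string()),
        Node::Element(vec![Node::Text("World".to_string())]),
    ])
}

fn main() {
    let parts = text_nodes(&sample());
    println!("{:?}", parts); // ["Hello there, ", "World"]
    // Joining the parts reproduces the old single-string result.
    println!("{}", parts.concat()); // Hello there, World
}
```

Keeping the parts separate lets the caller take only `parts[0]`, or join them to get the previous behavior.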

## Consequences
As this is a user-facing change, the usage examples will produce
different results than before. For example
```nushell
http get https://phoronix.com | query web --query 'header'
```
will return a list of lists of 1 string each, whereas before it was just
a list of strings.

I only modified the 3rd example
```nushell
# old
http get https://www.nushell.sh | query web --query 'h2, h2 + p' | group 2 | each {rotate --ccw tagline description} | flatten
# new
http get https://www.nushell.sh | query web --query 'h2, h2 + p' | each {str join} | group 2 | each {rotate --ccw tagline description} | flatten
```
to make it behave like before, for two reasons. First, I thought this
example ought to show the same results as before.
Second, the original example now panics! If we run the original 3rd
example with my modifications, we get a panic
```
thread 'main' panicked at crates/nu-protocol/src/value/record.rs:34:9:
assertion `left == right` failed
  left: 2
 right: 17
```
This happens because `rotate` receives a list of lists where the inner
lists have a different number of elements.

However this panic is unrelated to the changes I've made, because it can
be triggered easily without using the plugin. For instance
```nushell
# this is fine
[[[one] [two]] [[three] [four]]] | each {rotate --ccw tagline description}

# this panics!
[[[one] [two]] [[three] [four five]]] | each {rotate --ccw tagline description}
```
Though beyond the scope of this PR, I thought I'd mention this bug since
I found it while testing the usage examples. However, I intend to make a
proper issue about it tomorrow.
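
As a rough illustration of the invariant behind that assertion, the sketch below pairs column names with values and reports a length mismatch as an error instead of asserting. `build_record` is a hypothetical helper written for this example; it is not nushell's actual record code.

```rust
/// Pair column names with values, refusing mismatched lengths
/// instead of panicking on an assertion.
fn build_record(cols: &[&str], vals: &[&str]) -> Result<Vec<(String, String)>, String> {
    if cols.len() != vals.len() {
        return Err(format!(
            "expected {} values, got {}",
            cols.len(),
            vals.len()
        ));
    }
    Ok(cols
        .iter()
        .zip(vals)
        .map(|(c, v)| (c.to_string(), v.to_string()))
        .collect())
}

fn main() {
    let cols = ["tagline", "description"];
    // Two names, two values: fine.
    println!("{:?}", build_record(&cols, &["one", "two"]));
    // Two names, three values: an error rather than a panic.
    println!("{:?}", build_record(&cols, &["three", "four", "five"]));
}
```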

# User-Facing Changes
`query web --query "css selector"` now returns a `list<list<string>>`
instead of a `list<string>` to make it more in line with [scraper's
ElementRef::text()](https://docs.rs/scraper/latest/scraper/element_ref/struct.ElementRef.html#method.text).

# Tests + Formatting
I ran `cargo fmt --all -- --check`, `cargo clippy --workspace -- -D
warnings -D clippy::unwrap_used` and the tests in the plugin.

# After Submitting
PR that updates the documentation to match the new 3rd example:
https://github.com/nushell/nushell.github.io/pull/1235
2024-02-02 19:40:47 -06:00
Darren Schroeder
2e5a857983
update query web wiki example (#11709)
# Description

This PR tries to make `query web` more resilient and easier to debug
with the `--inspect` parameter when trying to scrape tables. Previously
it would just fail; now it at least tries to give you a hint.

Here is some example output from a query that went wrong.
```
❯ http get https://en.wikipedia.org/wiki/List_of_cities_in_India_by_population | query web --as-table [Rank City 'Population(2011)[3]' 'Population(2001)[3][a]' 'State or union territory'] --inspect
Passed in Column Headers = ["Rank", "City", "Population(2011)[3]", "Population(2001)[3][a]", "State or union territory"]

First 2048 HTML chars = <!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-0 vector-feature-client-preferences-disabled vector-feature-client-prefs-pinned-disabled vector-toc-available" lang="en" dir="ltr">
<head>
<meta charset="UTF-8">
<title>List of cities in India by population - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-0 vector-feature-client-preferences-disabled vector-feature-client-prefs-pinned-disabled vector-toc-available";var cookie=document.cookie.match(/(?:^|; )enwikimwclientpreferences=([^;]+)/);if(cookie){cookie[1].split('%2C').forEach(function(pref){className=className.replace(new RegExp('(^| )'+pref.replace(/-clientpref-\w+$|[^\w-]+/g,'')+'-clientpref-\\w+( |$)'),'$1'+pref+'$2');});}document.documentElement.className=className;}());RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["",
"January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"9ecdad8f-2dbd-4245-b54d-9c57aea5ca45","wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_cities_in_India_by_population","wgTitle":"List of cities in India by population","wgCurRevisionId":1192093210,"wgRev

Potential HTML Headers = ["City", "Population(2011)[3]", "Population(2001)[3][a]", "State or unionterritory", "Ref"]

Potential HTML Headers = ["City", "Population(2011)[5]", "Population(2001)", "State or unionterritory"]

Potential HTML Headers = [".mw-parser-output .navbar{display:inline;font-size:88%;font-weight:normal}.mw-parser-output .navbar-collapse{float:left;text-align:left}.mw-parser-output .navbar-boxtext{word-spacing:0}.mw-parser-output .navbar ul{display:inline-block;white-space:nowrap;line-height:inherit}.mw-parser-output .navbar-brackets::before{margin-right:-0.125em;content:\"[ \"}.mw-parser-output .navbar-brackets::after{margin-left:-0.125em;content:\" ]\"}.mw-parser-output .navbar li{word-spacing:-0.125em}.mw-parser-output .navbar a>span,.mw-parser-output .navbar a>abbr{text-decoration:inherit}.mw-parser-output .navbar-mini abbr{font-variant:small-caps;border-bottom:none;text-decoration:none;cursor:inherit}.mw-parser-output .navbar-ct-full{font-size:114%;margin:0 7em}.mw-parser-output .navbar-ct-mini{font-size:114%;margin:0 4em}vtePopulation of cities in India"]

Potential HTML Headers = ["vteGeography of India"]

╭──────────────────────────┬─────────────────────────────────────────────────────╮
│ Rank                     │ error: no data found (column name may be incorrect) │
│ City                     │ error: no data found (column name may be incorrect) │
│ Population(2011)[3]      │ error: no data found (column name may be incorrect) │
│ Population(2001)[3][a]   │ error: no data found (column name may be incorrect) │
│ State or union territory │ error: no data found (column name may be incorrect) │
╰──────────────────────────┴─────────────────────────────────────────────────────╯
```
The key here is to compare the `Passed in Column Headers` with the
`Potential HTML Headers`. Coupled with the error table at the bottom,
that should give you a hint that, in this situation, Wikipedia has
changed the column names yet again. So we need to update the column
headers in our `query web` statement to get closer to what we want.

```
❯ http get https://en.wikipedia.org/wiki/List_of_cities_in_India_by_population | query web --as-table [City 'Population(2011)[3]' 'Population(2001)[3][a]' 'State or unionterritory' 'Ref']
╭─#──┬───────City───────┬─Population(2011)[3]─┬─Population(2001)[3][a]─┬─State or unionterritory─┬──Ref───╮
│ 0  │ Mumbai           │ 12,442,373          │ 11,978,450             │ Maharashtra             │ [3]    │
│ 1  │ Delhi            │ 11,034,555          │ 9,879,172              │ Delhi                   │ [3]    │
│ 2  │ Bangalore        │ 8,443,675           │ 5,682,293              │ Karnataka               │ [3]    │
│ 3  │ Hyderabad        │ 6,993,262           │ 5,496,960              │ Telangana               │ [3]    │
│ 4  │ Ahmedabad        │ 5,577,940           │ 4,470,006              │ Gujarat                 │ [3]    │
│ 5  │ Chennai          │ 4,646,732           │ 4,343,645              │ Tamil Nadu              │ [3]    │
│ 6  │ Kolkata          │ 4,496,694           │ 4,580,546              │ West Bengal             │ [3]    │
│ 7  │ Surat            │ 4,467,797           │ 2,788,126              │ Gujarat                 │ [3]    │
│ 8  │ Pune             │ 3,124,458           │ 2,538,473              │ Maharashtra             │ [3]    │
│ 9  │ Jaipur           │ 3,046,163           │ 2,322,575              │ Rajasthan               │ [3]    │
│ 10 │ Lucknow          │ 2,817,105           │ 2,185,927              │ Uttar Pradesh           │ [3]    │
│ 11 │ Kanpur           │ 2,765,348           │ 2,551,337              │ Uttar Pradesh           │ [3]    │
│ 12 │ Nagpur           │ 2,405,665           │ 2,052,066              │ Maharashtra             │ [3]    │
```
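
The manual comparison of `Passed in Column Headers` against `Potential HTML Headers` can be sketched in a few lines. `missing_headers` is a hypothetical helper for this example and assumes exact string matching, which may not be how the plugin itself matches headers.

```rust
/// Return the requested headers that do not appear among the
/// headers found in the page (exact string comparison).
fn missing_headers<'a>(requested: &[&'a str], found: &[&str]) -> Vec<&'a str> {
    requested
        .iter()
        .copied()
        .filter(|h| !found.iter().any(|f| f == h))
        .collect()
}

fn main() {
    // Headers passed to `--as-table` in the failing command.
    let requested = [
        "Rank",
        "City",
        "Population(2011)[3]",
        "Population(2001)[3][a]",
        "State or union territory",
    ];
    // First `Potential HTML Headers` line from the `--inspect` output.
    let found = [
        "City",
        "Population(2011)[3]",
        "Population(2001)[3][a]",
        "State or unionterritory",
        "Ref",
    ];
    println!("{:?}", missing_headers(&requested, &found));
    // ["Rank", "State or union territory"]
}
```

The two headers it reports are exactly the ones that were dropped or respelled in the working command.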
2024-02-02 09:03:28 -06:00
Eric Hodel
7071617f18
Allow plugins to receive configuration from the nushell configuration (#10955)
# Description

When nushell calls a plugin it now sends a configuration `Value` from
the nushell config under `$env.config.plugins.PLUGIN_SHORT_NAME`. This
allows plugin authors to read configuration provided by plugin users.

The `PLUGIN_SHORT_NAME` must match the registered filename after
`nu_plugin_`. If you register `target/debug/nu_plugin_config` the
`PLUGIN_SHORT_NAME` will be `config` and the nushell config will look like:

        $env.config = {
          # ...
          plugins: {
            config: [
              some
              values
            ]
          }
        }
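
The naming rule can be sketched as follows. `plugin_short_name` is a hypothetical helper for illustration; how nushell actually derives the name (for example, how it treats file extensions) is an assumption here.

```rust
use std::path::Path;

/// Derive the config key from a plugin path: the file stem with the
/// `nu_plugin_` prefix removed. Returns `None` if the prefix is missing.
fn plugin_short_name(path: &str) -> Option<&str> {
    Path::new(path)
        .file_stem()?
        .to_str()?
        .strip_prefix("nu_plugin_")
}

fn main() {
    println!("{:?}", plugin_short_name("target/debug/nu_plugin_config")); // Some("config")
    println!("{:?}", plugin_short_name("target/debug/something_else")); // None
}
```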

Configuration may also use a closure which allows passing values from
`$env` to a plugin:

        $env.config = {
          # ...
          plugins: {
            config: {||
              $env.some_value
            }
          }
        }

This is a breaking change for the plugin API as the `Plugin::run()`
function now accepts a new configuration argument which is an
`&Option<Value>`. If no configuration was supplied the value is `None`.

Plugins compiled after this change should work with older nushell, and
will behave as if the configuration was not set.
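
A plugin author would then branch on the optional configuration. The sketch below uses a hypothetical two-variant `Value` stand-in and a made-up `configured_items` helper; it is not the real `nu_protocol::Value` or the real `Plugin::run()` signature, only an illustration of handling `&Option<Value>`.

```rust
// Hypothetical stand-in for nushell's `Value`; only two variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    List(Vec<Value>),
}

/// Read a list of strings from the optional plugin configuration.
/// `None` means nothing was set under `$env.config.plugins`.
fn configured_items(config: &Option<Value>) -> Vec<String> {
    match config {
        Some(Value::List(items)) => items
            .iter()
            .filter_map(|v| match v {
                Value::String(s) => Some(s.clone()),
                _ => None, // ignore nested lists in this sketch
            })
            .collect(),
        Some(Value::String(s)) => vec![s.clone()],
        None => Vec::new(),
    }
}

fn main() {
    // Mirrors `plugins: { config: [some values] }` from the example above.
    let config = Some(Value::List(vec![
        Value::String("some".to_string()),
        Value::String("values".to_string()),
    ]));
    println!("{:?}", configured_items(&config)); // ["some", "values"]
    println!("{:?}", configured_items(&None)); // []
}
```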

Initially discussed in #10867

# User-Facing Changes

* Plugins can read configuration data stored in `$env.config.plugins`
* The plugin `CallInfo` now includes a `config` entry, existing plugins
will require updates

# Tests + Formatting

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting

- [ ] Update [Creating a plugin (in
Rust)](https://www.nushell.sh/contributor-book/plugins.html#creating-a-plugin-in-rust)
[source](https://github.com/nushell/nushell.github.io/blob/main/contributor-book/plugins.md)
- [ ] Add "Configuration" section to [Plugins
documentation](https://www.nushell.sh/contributor-book/plugins.html)
2024-01-15 16:59:47 +08:00
Darren Schroeder
a093e66822
update query web param --as-table from Table to List (#11531)
# Description

This is a small change that updates the `--as-table`/`-t` parameter to
`SyntaxShape::List` instead of `SyntaxShape::Table`. It was always
supposed to be a list of headers. Not sure where Table came from.

2024-01-12 13:26:40 -06:00
Hofer-Julian
5f2089a15b
Add long options for misc and network (#10753) 2023-10-19 18:16:44 +02:00
Darren Schroeder
02318cf3a7
update query web example because wikipedia changed their page (#10173)
# Description

This PR updates one of the `query web` examples because the Wikipedia
page changed. The example works again.

![image](https://github.com/nushell/nushell/assets/343840/72658c98-a339-4e76-96da-56d725e7a0e1)


2023-08-31 11:00:30 -05:00
mike
8e38596bc9
allow tables to have annotations (#9613)
# Description

follow up to #8529 and #8914

this works very similarly to record annotations, only difference being
that

```sh
table<name: string>
      ^^^^  ^^^^^^
      |     | 
      |     represents the type of the items in that column
      |
      represents the column name
```
more info on the syntax can be found
[here](https://github.com/nushell/nushell/pull/8914#issue-1672113520)

# User-Facing Changes

**[BREAKING CHANGE]**
this change adds a field to `SyntaxShape::Table` so any plugins that
used it will have to update and include the field. though if you are
unsure of the type the table expects, `SyntaxShape::Table(vec![])` will
suffice
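
As a rough sketch of what the update looks like for a plugin author, the stand-in enum below models a `Table` variant carrying column annotations. The field type `Vec<(String, SyntaxShape)>` and the `column_names` helper are assumptions made for this example, not a copy of the real `nu_protocol` definition.

```rust
// Stand-in for the shape enum; the field type of `Table` is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum SyntaxShape {
    String,
    Table(Vec<(String, SyntaxShape)>),
}

/// List the annotated column names of a table shape, if any.
fn column_names(shape: &SyntaxShape) -> Vec<&str> {
    match shape {
        SyntaxShape::Table(cols) => cols.iter().map(|(name, _)| name.as_str()).collect(),
        _ => Vec::new(),
    }
}

fn main() {
    // Corresponds to the annotation `table<name: string>`.
    let annotated = SyntaxShape::Table(vec![("name".to_string(), SyntaxShape::String)]);
    // No annotation: accepts any table.
    let untyped = SyntaxShape::Table(vec![]);
    println!("{:?}", column_names(&annotated)); // ["name"]
    println!("{:?}", column_names(&untyped)); // []
}
```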
2023-07-07 11:06:09 +02:00
Justin Ma
e672689a76
Fix docs building error caused by missing end tag (#8477) 2023-03-16 19:41:19 +08:00
Yethal
f4a129a792
Added examples to query web plugin (#8171)
Co-authored-by: Yethal <nosuchemail@email.com>
2023-02-22 19:01:15 +00:00
WindSoilder
055edd886d
Make plugin commands support examples. (#7984)
# Description

As the title says: we currently can't provide examples for plugin
commands. This PR makes that possible.


# User-Facing Changes

Take plugin `nu-example-1` as example:
```
❯ nu-example-1 -h
PluginSignature test 1 for plugin. Returns Value::Nothing

Usage:
  > nu-example-1 {flags} <a> <b> (opt) ...(rest)

Flags:
  -h, --help - Display the help message for this command
  -f, --flag - a flag for the signature
  -n, --named <String> - named string

Parameters:
  a <int>: required integer value
  b <string>: required string value
  (optional) opt <int>: Optional number
  ...rest <string>: rest value string

Examples:
  running example with an int value and string value
  > nu-example-1 3 bb
```

The `Examples` section is newly added.

## Basic idea behind these changes
When nushell queries plugin signatures, the plugin just returns its
signature without any examples, so nushell has no idea about the
examples of plugin commands.
To add the feature, we just make the plugin return its signature with
examples.

Before:
```
        1. get signature
         ----------------> 
Nushell ------------------  Plugin
        <-----------------
        2. returns Vec<Signature>
```

After:
```
        1. get signature
        ----------------> 
Nushell ------------------  Plugin
        <-----------------
        2. returns Vec<PluginSignature>
```
        
When writing plugin signatures to `$nu.plugin-path`:
Serialize `<PluginSignature>` rather than `<Signature>`, which enables
us to serialize examples to `$nu.plugin-path`.
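
The shape of the change can be sketched with stand-in types: a wrapper that carries the plain signature together with its examples. The struct and field names below (`sig`, `examples`, `description`, `example`) and the `render_examples` helper are assumptions for illustration, not the exact definitions from the nushell crates.

```rust
// Stand-in types; field names are assumptions for this sketch.
#[derive(Debug, Clone)]
struct Signature {
    name: String,
    usage: String,
}

#[derive(Debug, Clone)]
struct PluginExample {
    description: String,
    example: String,
}

/// Bundles a signature with its examples so both travel together.
#[derive(Debug, Clone)]
struct PluginSignature {
    sig: Signature,
    examples: Vec<PluginExample>,
}

/// Render a help-like summary including the examples section.
fn render_examples(ps: &PluginSignature) -> String {
    let mut out = format!("{}: {}\n\nExamples:\n", ps.sig.name, ps.sig.usage);
    for ex in &ps.examples {
        out.push_str(&format!("  {}\n  > {}\n", ex.description, ex.example));
    }
    out
}

/// Sample data taken from the `nu-example-1 -h` output above.
fn sample_signature() -> PluginSignature {
    PluginSignature {
        sig: Signature {
            name: "nu-example-1".to_string(),
            usage: "PluginSignature test 1 for plugin. Returns Value::Nothing".to_string(),
        },
        examples: vec![PluginExample {
            description: "running example with an int value and string value".to_string(),
            example: "nu-example-1 3 bb".to_string(),
        }],
    }
}

fn main() {
    print!("{}", render_examples(&sample_signature()));
}
```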

## Shortcoming
It's a breaking change because `Plugin::signature` is changed, and it
requires plugin authors to change their code for the new signatures.

Fortunately it should be easy to change: for a Rust-based plugin, we
just need to do a global replace of the word `Signature` with the word
`PluginSignature` in the plugin project.

The content of plugin-path is already really large. If one plugin has
many examples, it results in an even larger `$nu.plugin-path`, which
does not really scale. A solution would be to save the registration
information in a binary format rather than `json`. But I think that's
another story.

2023-02-08 16:14:18 -06:00
Jakub Žádník
2873e943b3
Add search terms to Command and Signature (#4980)
* Add search terms to command

* Rename Signature desc to usage

To be named uniformly with extra_usage

* Throw in foldl search term for reduce

* Add missing usage to post

* Add search terms to signature

* Try to add capnp Signature serialization
2022-03-27 22:25:30 +03:00
JT
f5f9d56c37
Move to a standard kebab/snake style (#4509) 2022-02-17 09:55:17 -05:00
Darren Schroeder
004d7b5ff0
query command with json, web, xml (#870)
* query command with json, web, xml

* query xml now working

* clippy

* comment out web tests

* Initial work on query web

For now we can query everything except tables

* Support for querying tables

Now we can query multiple tables just like before, now the only thing
missing is the test coverage

* finish off

* comment out web test

Co-authored-by: Luccas Mateus de Medeiros Gomes <luccasmmg@gmail.com>
2022-02-01 12:45:48 -06:00