nushell/crates/nu-std
Bahex b6e84879b6
add multiple grouper support to group-by (#14337)
- closes #14330 

Related:
- #2607 
- #14019
- #14316 

# Description
This PR changes `group-by` to support grouping by multiple `grouper`
arguments.

# Changes

- No grouper: no change in behavior 
- Single grouper
  - `--to-table=false`: no change in behavior
  - `--to-table=true`:
    - closure grouper: named group0
    - cell-path grouper: named after the cell-path
- Multiple groupers:
  - `--to-table=false`: nested groups
- `--to-table=true`: one column for each grouper argument, followed by
the `items` column
    - columns corresponding to cell-paths are named after them
- columns corresponding to closure groupers are named `group{i}` where
`i` is the index of the grouper argument

# Examples
```nushell
> [1 3 1 3 2 1 1] | group-by
╭───┬───────────╮
│   │ ╭───┬───╮ │
│ 1 │ │ 0 │ 1 │ │
│   │ │ 1 │ 1 │ │
│   │ │ 2 │ 1 │ │
│   │ │ 3 │ 1 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 3 │ │ 0 │ 3 │ │
│   │ │ 1 │ 3 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 2 │ │ 0 │ 2 │ │
│   │ ╰───┴───╯ │
╰───┴───────────╯

> [1 3 1 3 2 1 1] | group-by --to-table
╭─#─┬─group─┬───items───╮
│ 0 │ 1     │ ╭───┬───╮ │
│   │       │ │ 0 │ 1 │ │
│   │       │ │ 1 │ 1 │ │
│   │       │ │ 2 │ 1 │ │
│   │       │ │ 3 │ 1 │ │
│   │       │ ╰───┴───╯ │
│ 1 │ 3     │ ╭───┬───╮ │
│   │       │ │ 0 │ 3 │ │
│   │       │ │ 1 │ 3 │ │
│   │       │ ╰───┴───╯ │
│ 2 │ 2     │ ╭───┬───╮ │
│   │       │ │ 0 │ 2 │ │
│   │       │ ╰───┴───╯ │
╰─#─┴─group─┴───items───╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 }
╭───────┬───────────╮
│       │ ╭───┬───╮ │
│ false │ │ 0 │ 1 │ │
│       │ │ 1 │ 1 │ │
│       │ │ 2 │ 1 │ │
│       │ │ 3 │ 1 │ │
│       │ ╰───┴───╯ │
│       │ ╭───┬───╮ │
│ true  │ │ 0 │ 3 │ │
│       │ │ 1 │ 3 │ │
│       │ │ 2 │ 2 │ │
│       │ ╰───┴───╯ │
╰───────┴───────────╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 } --to-table
╭─#─┬─group0─┬───items───╮
│ 0 │ false  │ ╭───┬───╮ │
│   │        │ │ 0 │ 1 │ │
│   │        │ │ 1 │ 1 │ │
│   │        │ │ 2 │ 1 │ │
│   │        │ │ 3 │ 1 │ │
│   │        │ ╰───┴───╯ │
│ 1 │ true   │ ╭───┬───╮ │
│   │        │ │ 0 │ 3 │ │
│   │        │ │ 1 │ 3 │ │
│   │        │ │ 2 │ 2 │ │
│   │        │ ╰───┴───╯ │
╰─#─┴─group0─┴───items───╯
```

```nushell
let data = [
    [name, lang, year];
    [andres, rb, "2019"],
    [jt, rs, "2019"],
    [storm, rs, "2021"]
]

> $data
╭─#─┬──name──┬─lang─┬─year─╮
│ 0 │ andres │ rb   │ 2019 │
│ 1 │ jt     │ rs   │ 2019 │
│ 2 │ storm  │ rs   │ 2021 │
╰─#─┴──name──┴─lang─┴─year─╯
```

```nushell
> $data | group-by lang
╭────┬──────────────────────────────╮
│    │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ rb │ │ 0 │ andres │ rb   │ 2019 │ │
│    │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│    │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│ rs │ │ 0 │ jt    │ rs   │ 2019 │  │
│    │ │ 1 │ storm │ rs   │ 2021 │  │
│    │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰────┴──────────────────────────────╯
```

Group column is now named after the grouper, to allow multiple groupers.
```nushell
> $data | group-by lang --to-table  # column names changed!
╭─#─┬─lang─┬────────────items─────────────╮
│ 0 │ rb   │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │ │ 0 │ jt    │ rs   │ 2019 │  │
│   │      │ │ 1 │ storm │ rs   │ 2021 │  │
│   │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴────────────items─────────────╯
```

Grouping by multiple columns makes finer grained aggregations possible.
```nushell
> $data | group-by lang year --to-table
╭─#─┬─lang─┬─year─┬────────────items─────────────╮
│ 0 │ rb   │ 2019 │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ 2019 │ ╭─#─┬─name─┬─lang─┬─year─╮   │
│   │      │      │ │ 0 │ jt   │ rs   │ 2019 │   │
│   │      │      │ ╰─#─┴─name─┴─lang─┴─year─╯   │
│ 2 │ rs   │ 2021 │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │      │ │ 0 │ storm │ rs   │ 2021 │  │
│   │      │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴─year─┴────────────items─────────────╯
```

Grouping by multiple columns, without `--to-table` returns a nested
structure.
This is equivalent to `$data | group-by year | split-by lang`, making
`split-by` obsolete.
```nushell
> $data | group-by lang year
╭────┬─────────────────────────────────────────╮
│    │ ╭──────┬──────────────────────────────╮ │
│ rb │ │      │ ╭─#─┬──name──┬─lang─┬─year─╮ │ │
│    │ │ 2019 │ │ 0 │ andres │ rb   │ 2019 │ │ │
│    │ │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │ │
│    │ ╰──────┴──────────────────────────────╯ │
│    │ ╭──────┬─────────────────────────────╮  │
│ rs │ │      │ ╭─#─┬─name─┬─lang─┬─year─╮  │  │
│    │ │ 2019 │ │ 0 │ jt   │ rs   │ 2019 │  │  │
│    │ │      │ ╰─#─┴─name─┴─lang─┴─year─╯  │  │
│    │ │      │ ╭─#─┬─name──┬─lang─┬─year─╮ │  │
│    │ │ 2021 │ │ 0 │ storm │ rs   │ 2021 │ │  │
│    │ │      │ ╰─#─┴─name──┴─lang─┴─year─╯ │  │
│    │ ╰──────┴─────────────────────────────╯  │
╰────┴─────────────────────────────────────────╯
```

From #2607:
> Here's a couple more examples without much explanation. This one shows
adding two grouping keys. I'm always wanting to add more columns when
using group-by and it just-work™️ `gb.exe -f movies-2.csv -k 3,2 -s 7
--skip_header`
> 
> ```
>  k:3                   | k:2       | count | sum:7
> -----------------------+-----------+-------+--------------------
>  20th Century Fox      | Drama     | 1     | 117.09
>  20th Century Fox      | Romance   | 1     | 39.66
>  CBS                   | Comedy    | 1     | 77.09
>  Disney                | Animation | 4     | 1264.23
>  Disney                | Comedy    | 4     | 950.27
>  Fox                   | Comedy    | 5     | 661.85
>  Independent           | Comedy    | 7     | 399.07
>  Independent           | Drama     | 4     | 69.75
>  Independent           | Romance   | 7     | 1048.75
>  Independent           | romance   | 1     | 29.37
> ...
> ```

This example can be achieved like this:
```nushell
> open movies-2.csv
  | group-by "Lead Studio" Genre --to-table
  | insert count {get items | length}
  | insert sum { get items."Worldwide Gross" | math sum}
  | reject items
  | sort-by "Lead Studio" Genre
╭─#──┬──────Lead Studio──────┬───Genre───┬─count─┬───sum───╮
│ 0  │ 20th Century Fox      │ Drama     │     1 │  117.09 │
│ 1  │ 20th Century Fox      │ Romance   │     1 │   39.66 │
│ 2  │ CBS                   │ Comedy    │     1 │   77.09 │
│ 3  │ Disney                │ Animation │     4 │ 1264.23 │
│ 4  │ Disney                │ Comedy    │     4 │  950.27 │
│ 5  │ Fox                   │ Comedy    │     5 │  661.85 │
│ 6  │ Fox                   │ comedy    │     1 │   60.72 │
│ 7  │ Independent           │ Comedy    │     7 │  399.07 │
│ 8  │ Independent           │ Drama     │     4 │   69.75 │
│ 9  │ Independent           │ Romance   │     7 │ 1048.75 │
│ 10 │ Independent           │ romance   │     1 │   29.37 │
...
```
2024-11-15 06:40:49 -06:00
..
src No longer autoload deprecated-dirs (#14242) 2024-11-05 21:53:41 +01:00
std No longer autoload deprecated-dirs (#14242) 2024-11-05 21:53:41 +01:00
tests Fix dirs removal warning (#14029) 2024-10-09 08:03:33 -05:00
Cargo.toml Bump to dev version 0.100.1 (#14328) 2024-11-14 10:04:39 +01:00
CONTRIBUTING.md Surprising symlink resolution for std path add (#13258) 2024-06-28 18:11:48 -05:00
LICENSE add LICENSE to nu-std (#8803) 2023-04-07 13:39:21 -07:00
README.md Change the usage misnomer to "description" (#13598) 2024-08-22 12:02:08 +02:00
testing.nu add multiple grouper support to group-by (#14337) 2024-11-15 06:40:49 -06:00

Welcome to the standard library of `nushell`!

The standard library is a pure-nushell collection of custom commands which provide interactive utilities and building blocks for users writing casual scripts or complex applications.

To see what's here:

> use std
> scope commands | select name description | where name =~ "std "
#┬───────────name────────────┬───────────────────description───────────────────
0│std assert                 │Universal assert command
1│std assert equal           │Assert $left == $right
2│std assert error           │Assert that executing the code generates an error
3│std assert greater         │Assert $left > $right
4│std assert greater or equal│Assert $left >= $right
             ...                                     ...
─┴───────────────────────────┴─────────────────────────────────────────────────

🧰 Using the standard library in the REPL or in scripts

All commands in the standard library must be "imported" into the running environment (the interactive read-execute-print-loop (REPL) or a .nu script) using the use command.

You can choose to import the whole module, but then must refer to individual commands with a std prefix, e.g:

use std

std log debug "Running now"
std assert (1 == 2)

Or you can enumerate the specific commands you want to import and invoke them without the std prefix.

use std ["log debug" assert]

log debug "Running again"
assert (2 == 1)

This is probably the form of import you'll want to add to your env.nu for interactive use.

✏️ contribute to the standard library

You're invited to contribute to the standard library! See CONTRIBUTING.md for details