<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx
you can also mention related issues, PRs or discussions!
-->
# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.
Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->
This PR adds a new command that outputs a NuDataFrame or NuLazyFrame in
its repr format, which can then be ingested in another polars instance.
Advantages of serializing a dataframe in this format are that it can be
viewed as a table, carries type information, and can easily be copied to
the clipboard.
```nushell
# In Nushell
> [[a b]; [2025-01-01 2] [2025-01-02 4]] | polars into-df | polars into-lazy | polars into-repr
shape: (2, 2)
┌─────────────────────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ datetime[ns] ┆ i64 │
╞═════════════════════╪═════╡
│ 2025-01-01 00:00:00 ┆ 2 │
│ 2025-01-02 00:00:00 ┆ 4 │
└─────────────────────┴─────┘
```
```python
# In python
>>> import polars as pl
>>> df = pl.from_repr("""
... shape: (2, 2)
... ┌─────────────────────┬─────┐
... │ a ┆ b │
... │ --- ┆ --- │
... │ datetime[ns] ┆ i64 │
... ╞═════════════════════╪═════╡
... │ 2025-01-01 00:00:00 ┆ 2 │
... │ 2025-01-02 00:00:00 ┆ 4 │
... └─────────────────────┴─────┘""")
shape: (2, 2)
┌─────────────────────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ datetime[ns] ┆ i64 │
╞═════════════════════╪═════╡
│ 2025-01-01 00:00:00 ┆ 2 │
│ 2025-01-02 00:00:00 ┆ 4 │
└─────────────────────┴─────┘
>>> df.select(pl.col("a").dt.offset_by("12m"))
shape: (2, 1)
┌─────────────────────┐
│ a │
│ --- │
│ datetime[ns] │
╞═════════════════════╡
│ 2025-01-01 00:12:00 │
│ 2025-01-02 00:12:00 │
└─────────────────────┘
```
# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->
A new command `polars into-repr` is added. No other commands are
impacted by the changes in this PR.
# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.
Make sure you've run and fixed any issues with these commands:
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library
> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
Examples were added in the command definition.
# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx
you can also mention related issues, PRs or discussions!
-->
# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.
Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->
This PR adds type checking of all command input types at run-time.
Generally, these errors should be caught by the parser, but sometimes we
can't know the type of a value at parse-time. The simplest example is
using the `echo` command, which has an output type of `any`, so
prefixing a literal with `echo` will bypass parse-time type checking.
Before this PR, each command has to individually check its input types.
This can result in scenarios where the input/output types don't match
the actual command behavior. This can cause valid usage with an
non-`any` type to become a parse-time error if a command is missing that
type in its pipeline input/output (`drop nth` and `history import` do
this before this PR). Alternatively, a command may not list a type in
its input/output types, but doesn't actually reject that type in its
code, which can have unintended side effects (`get` does this on an
empty pipeline input, and `sort` used to before #13154).
After this PR, the type of the pipeline input is checked to ensure it
matches one of the input types listed in the proceeding command's
input/output types. While each of the issues in the "before this PR"
section could be addressed with each command individually, this PR
solves this issue for _all_ commands.
**This will likely cause some breakage**, as some commands have
incorrect input/output types, and should be adjusted. Also, some scripts
may have erroneous usage of commands. In writing this PR, I discovered
that `toolkit.nu` was passing `null` values to `str join`, which doesn't
accept nothing types (if folks think it should, we can adjust it in this
PR or in a different PR). I found some issues in the standard library
and its tests. I also found that carapace's vendor script had an
incorrect chaining of `get -i`:
```nushell
let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)
```
Before this PR, if the `get -i 0` ever actually did evaluate to `null`,
the second `get` invocation would error since `get` doesn't operate on
`null` values. After this PR, this is immediately a run-time error,
alerting the user to the problematic code. As a side note, we'll need to
PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to
carapace.
A notable exception to the type checking is commands with input type of
`nothing -> <type>`. In this case, any input type is allowed. This
allows piping values into the command without an error being thrown. For
example, `123 | echo $in` would be an error without this exception.
Additionally, custom types bypass type checking (I believe this also
happens during parsing, but not certain)
I added a `is_subtype` method to `Value` and `PipelineData`. It
functions slightly differently than `get_type().is_subtype()`, as noted
in the doccomments. Notably, it respects structural typing of lists and
tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]`
is a subtype of `table<a: int>`, whereas the type returned by
`Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some
special handling for `ListStream`s and `ByteStream`s. The latter was
needed for this PR to work properly with external commands.
Here's some examples.
Before:
```nu
1..2 | drop nth 1
Error: nu::parser::input_type_mismatch
× Command does not support range input.
╭─[entry #9:1:8]
1 │ 1..2 | drop nth 1
· ────┬───
· ╰── command doesn't support range input
╰────
echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯
```
After this PR, I've adjusted `drop nth`'s input/output types to accept
range input.
Before this PR, zip accepted any value despite not being listed in its
input/output types. This caused different behavior depending on if you
triggered a parse error or not:
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support int input.
# => ╭─[entry #3:1:5]
# => 1 │ 1 | zip [2]
# => · ─┬─
# => · ╰── command doesn't support int input
# => ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │ │ │ 0 │ 1 │ │
# => │ │ │ 1 │ 2 │ │
# => │ │ ╰───┴───╯ │
# => ╰───┴───────────╯
```
After this PR, it works the same in both cases. For cases like this, if
we do decide we want `zip` or other commands to accept any input value,
then we should explicitly add that to the input types.
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support int input.
# => ╭─[entry #3:1:5]
# => 1 │ 1 | zip [2]
# => · ─┬─
# => · ╰── command doesn't support int input
# => ╰────
echo 1 | zip [2]
# => Error: nu:🐚:only_supports_this_input_type
# =>
# => × Input type not supported.
# => ╭─[entry #14:2:6]
# => 2 │ echo 1 | zip [2]
# => · ┬ ─┬─
# => · │ ╰── only list<any> and range input data is supported
# => · ╰── input type: int
# => ╰────
```
# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->
**Breaking change**: The type of a command's input is now checked
against the input/output types of that command at run-time. While these
errors should mostly be caught at parse-time, in cases where they can't
be detected at parse-time they will be caught at run-time instead. This
applies to both internal commands and custom commands.
Example function and corresponding parse-time error (same before and
after PR):
```nushell
def foo []: int -> nothing {
print $"my cool int is ($in)"
}
1 | foo
# => my cool int is 1
"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support string input.
# => ╭─[entry #16:1:17]
# => 1 │ "evil string" | foo
# => · ─┬─
# => · ╰── command doesn't support string input
# => ╰────
# =>
```
Before:
```nu
echo "evil string" | foo
# => my cool int is evil string
```
After:
```nu
echo "evil string" | foo
# => Error: nu:🐚:only_supports_this_input_type
# =>
# => × Input type not supported.
# => ╭─[entry #17:1:6]
# => 1 │ echo "evil string" | foo
# => · ──────┬────── ─┬─
# => · │ ╰── only int input data is supported
# => · ╰── input type: string
# => ╰────
```
Known affected internal commands which erroneously accepted any type:
* `str join`
* `zip`
* `reduce`
# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.
Make sure you've run and fixed any issues with these commands:
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library
> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
* Play whack-a-mole with the commands and scripts this will inevitably
break
# Description
Fix the docs repo CI build error here:
https://github.com/nushell/nushell.github.io/actions/runs/12425087184/job/34691291790#step:5:18
The doc generated by `make_docs.nu` for `polars profile` command will
make the CI build fail due to the indention error of markdown front
matters. I used to fix it manually before, for the long run, it's better
to fix it from the source code.
- fixes#14572
# Description
This allowed columns to be coalesced on full joins with `polars join`,
providing functionality simlar to the old `--outer` join behavior.
# User-Facing Changes
- Provides a new flag `--coalesce-columns` on the `polars join` command
The `--name` flag of `polars with-column` only works when used with an
eager dataframe. I will not work with lazy dataframes and it will not
work when used with expressions (which forces a conversion to a
lazyframe). This pull request adds better documentation to the flags and
errors messages when used in cases where it will not work.
Closes#13654
# User-Facing Changes
- Short flags are now fully type-checked,
including null and record signatures for literal arguments:
```nushell
def test [-v: record<l: int>] {};
test -v null # error
test -v {l: ""} # error
def test2 [-v: int] {};
let v = ""
test2 -v $v # error
```
- `polars unpivot` `--index`/`--on` and `into value --columns`
now accept `list` values
# Description
Provides the ability to decomes struct columns into seperate columns for
each field:
<img width="655" alt="Screenshot 2024-10-16 at 09 57 22"
src="https://github.com/user-attachments/assets/6706bd36-8d38-4365-b58d-ba82f2d5ba9a">
# User-Facing Changes
- provides a new command `polars unnest` for decomposing struct fields
into separate columns.
Closes#13654
# User-Facing Changes
- Short flags are now fully type-checked,
including null and record signatures for literal arguments:
```nushell
def test [-v: record<l: int>] {};
test -v null # error
test -v {l: ""} # error
def test2 [-v: int] {};
let v = ""
test2 -v $v # error
```
- `polars unpivot` `--index`/`--on` and `into value --columns`
now accept `list` values
# Description
Provides the ability to decomes struct columns into seperate columns for
each field:
<img width="655" alt="Screenshot 2024-10-16 at 09 57 22"
src="https://github.com/user-attachments/assets/6706bd36-8d38-4365-b58d-ba82f2d5ba9a">
# User-Facing Changes
- provides a new command `polars unnest` for decomposing struct fields
into separate columns.
# Description
This fixes an issue with converting to a dataframe when specifying a
struct in the schema. Things like the following now work correctly:
```nushell
[[foo bar]; [{a: "a_0", b:"b_0"} 1] [{a: "a_1", b: "b_1" } 2]] | polars into-df -s {foo: {a: str, b: str}, bar: u8}
```
# Description
Introduces a new flag `--truncate-ragged-lines` for `polars open` that
will truncate lines that are longer than the schema.
# User-Facing Changes
- Introduction of the flag `--truncate-ragged-lines` for `polars open`
# Description
This request exposes the prelude::polars::len expression. It is ended
for doing fast select count(*) like operations:
<img width="626" alt="Screenshot 2024-09-26 at 18 14 45"
src="https://github.com/user-attachments/assets/74285fc6-f99c-46e0-9226-9a7d41738d78">
# User-Facing Changes
- Introduction of the `polars len` command
# Description
Fixes: #12726 and #13185
Previously converting columns that contained null caused polars to force
a dtype of object even when using a schema.
Now:
1. When using a schema, the type the schema defines for the column will
always be used.
2. When a schema is not used, the previous type is used when a value is
null.
# User-Facing Changes
- The type defined by the schema we be respected when passing in a null
value `[a]; [null] | polars into-df -s {a: str}` will create a df with
an str dtype column with one null value versus a column of type object.
- *BREAKING CHANGE* If you define a schema, all columns must be in the
schema.
# Description
In order to be more consistent with it's nu counterpart, `polars save`
now returns an empty pipeline instead of a message of the saved file.
# User-Facing Changes
- `polars save` no longer displays a save message, making it consistent
with `save` behavior.
# Description
This PR:
- Removes the lazy_command, expr_command macros and migrates the
commands that were utilizing them.
- Removes the custom logic in DataFrameValues::is_equals to use the
polars DataFrame version of PartialEq
- Adds examples to commands that previously did not have examples or had
inadequate ones.
NOTE: A lot of examples now have a `polars sort` at the end. This is
needed due to the comparison in the result. The new polars version of
equals cares about the ordering. I removed the custom equals logic as it
causes comparisons to lock up when comparing dataframes that contain a
row that contains a list. I discovered this issue when adding examples
to `polars implode`
# Description
Previously there were no examples or explanations that you can use '*'
to select all columns. Updated description and added a new example.