Commit Graph

46 Commits

Author SHA1 Message Date
Jack Wright
0f6996b70d
Support for reading Categorical and Enum types (#15292)
Fixes https://github.com/nushell/nushell/issues/15281

# Description
Provides the ability to read dataframes with Categorical and Enum data.

The ability to write Categorical and Enum data will be provided in a future
PR.
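A minimal sketch of what this enables, assuming a hypothetical `data.parquet` file containing a Categorical or Enum column:

```nushell
# opening a file with Categorical/Enum columns should now succeed
# (file name and schema output are assumptions for illustration)
polars open data.parquet | polars schema
```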
2025-03-12 22:11:00 +01:00
Jack Wright
2dab65f852
Polars: Map pq extension to parquet files (#15284)
# Description
Files with the `.pq` extension will automatically be treated as parquet
files.
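For illustration, a sketch using a hypothetical `data.pq` file:

```nushell
# .pq is now treated the same as .parquet
polars open data.pq
```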

closes #15282
2025-03-10 16:25:34 -05:00
Matthias Meschede
88bbe4abaa
Add Xor to polars plugin nu_expressions (#15249)
Solution for #15242, based on PR #15248.

Allows doing this:

```
~/Projects/nushell> [[a, b]; [1., 2.], [3.,3.], [4., 6.]] | polars into-df | polars filter (((polars col a) < 2) xor ((polars col b) > 5))
╭───┬──────┬──────╮
│ # │  a   │  b   │
├───┼──────┼──────┤
│ 0 │ 1.00 │ 2.00 │
│ 1 │ 4.00 │ 6.00 │
╰───┴──────┴──────╯
```
2025-03-05 08:03:35 -08:00
Jack Wright
c504c93a1d
Polars: Minor code cleanup (#15144)
# Description
Removing TODOs and dead code from a previous refactor
2025-02-19 09:47:21 -08:00
Ian Manske
62e56d3581
Rework operator type errors (#14429)
# Description

This PR adds two new `ParseError` and `ShellError` cases for type errors
relating to operators.
- `OperatorUnsupportedType` is used when a type is not supported by an
operator in any way, shape, or form. E.g., `+` does not support `bool`.
- `OperatorIncompatibleTypes` is used when an operator is used with types
it supports, but the combination of types provided cannot be used
together. E.g., `filesize + duration` is not a valid combination.

The other preexisting error cases related to operators have been removed
and replaced with the new ones above. Namely:

- `ShellError::OperatorMismatch`
- `ShellError::UnsupportedOperator`
- `ParseError::UnsupportedOperationLHS`
- `ParseError::UnsupportedOperationRHS`
- `ParseError::UnsupportedOperationTernary`

# User-Facing Changes

- `help operators` now lists the precedence of `not` as 55 instead of 0
(above the other boolean operators). Fixes #13675.
- `math median` and `math mode` now ignore NaN values so that `[NaN NaN]
| math median` and `[NaN NaN] | math mode` no longer trigger a type
error. Instead, it's now an empty input error. Fixing this in earnest
can be left for a future PR.
- Comparisons with `nan` now return false instead of causing an error.
E.g., `1 == nan` is now `false`.
- All the operator type errors have been standardized and reworked. In
particular, they can now have a help message, which is currently used
for type errors relating to `++`.

```nu
[1] ++ 2
```
```
Error: nu::parser::operator_unsupported_type

  × The '++' operator does not work on values of type 'int'.
   ╭─[entry #1:1:5]
 1 │ [1] ++ 2
   ·     ─┬ ┬
   ·      │ ╰── int
   ·      ╰── does not support 'int'
   ╰────
  help: if you meant to append a value to a list or a record to a table, use the `append` command or wrap the value in a list. For example: `$list ++ $value` should be
        `$list ++ [$value]` or `$list | append $value`.
```
2025-02-12 20:03:40 -08:00
Jack Wright
c0b4d19761
Polars upgrade to 0.46 (#14933)
Upgraded to Polars 0.46
2025-01-27 13:01:39 -06:00
pyz4
0ad5f4389c
nu_plugin_polars: add polars into-repr to display dataframe in portable repr format (#14917)

# Description
This PR adds a new command that outputs a NuDataFrame or NuLazyFrame in
its repr format, which can then be ingested in another polars instance.
Advantages of serializing a dataframe in this format are that it can be
viewed as a table, carries type information, and can easily be copied to
the clipboard.

```nushell
# In Nushell
> [[a b]; [2025-01-01 2] [2025-01-02 4]] | polars into-df | polars into-lazy | polars into-repr

shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[ns]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2025-01-01 00:00:00 ┆ 2   │
│ 2025-01-02 00:00:00 ┆ 4   │
└─────────────────────┴─────┘
```

```python
# In python
>>> import polars as pl
>>> df = pl.from_repr("""
... shape: (2, 2)
... ┌─────────────────────┬─────┐
... │ a                   ┆ b   │
... │ ---                 ┆ --- │
... │ datetime[ns]        ┆ i64 │
... ╞═════════════════════╪═════╡
... │ 2025-01-01 00:00:00 ┆ 2   │
... │ 2025-01-02 00:00:00 ┆ 4   │
... └─────────────────────┴─────┘""")
shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[ns]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2025-01-01 00:00:00 ┆ 2   │
│ 2025-01-02 00:00:00 ┆ 4   │
└─────────────────────┴─────┘

>>> df.select(pl.col("a").dt.offset_by("12m"))
shape: (2, 1)
┌─────────────────────┐
│ a                   │
│ ---                 │
│ datetime[ns]        │
╞═════════════════════╡
│ 2025-01-01 00:12:00 │
│ 2025-01-02 00:12:00 │
└─────────────────────┘
```

# User-Facing Changes
A new command `polars into-repr` is added. No other commands are
impacted by the changes in this PR.

# Tests + Formatting
Examples were added in the command definition.

2025-01-27 06:02:18 -06:00
Jack Wright
23ba613b00
Polars AWS S3 support (#14648)
# Description

Provides Amazon S3 support.

- Utilizes your existing AWS CLI configuration
- Supports AWS SSO
- Supports
[gimme-aws-creds](https://github.com/Nike-Inc/gimme-aws-creds)
- Respects the `AWS_PROFILE` environment variable for selecting the
profile config
- Supports the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and
`AWS_REGION` environment variables for configuring without an AWS config
file (see the sketch below)

Usage:
```nushell
polars open s3://bucket/and/path.parquet
```

Supports:
- CSV
- Parquet
- NDJSON / json lines
- Arrow

Doesn't support:
- Eager dataframes
- Avro
- JSON
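As a sketch, configuring through environment variables without an AWS config file (the credential values are placeholders):

```nushell
# placeholder values for illustration only
$env.AWS_ACCESS_KEY_ID = "<access-key-id>"
$env.AWS_SECRET_ACCESS_KEY = "<secret-access-key>"
$env.AWS_REGION = "us-east-1"
polars open s3://bucket/and/path.parquet
```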
2024-12-25 06:15:50 -06:00
Jack Wright
219b44a04f
Improve handling of columns with null values (#14588)
Addresses some null handling issues in #6882

# Description

This changes the implementation of guessing a column type when a schema
is not specified.

New behavior:
1. Use the first non-`Value::Nothing` value type as the column's data type.
2. If the value type changes (ignoring `Value::Nothing`) in subsequent
values, the data type will be changed to `DataType::Object("Value", None)`.
3. If a column does not have a value type, `DataType::Object("Value",
None)` will be assumed.
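A sketch of the new rules (the exact dtype names printed are assumptions):

```nushell
# rule 1: the first non-null value type determines the column dtype
[[a]; [null] [1] [2]] | polars into-df | polars schema
# rule 2: a changing value type falls back to object
[[a]; [1] ["x"]] | polars into-df | polars schema
```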
2024-12-14 18:36:01 -06:00
Jack Wright
c63bb81c3e
Convert Filesize to Int (#14491)
# Description
Fixes the conversion of `Value::Filesize` to `Value::Int`, allowing things
like `ps | polars into-df` to work correctly.
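A quick way to verify the conversion, as a sketch:

```nushell
# filesize columns such as mem should now come through as integer dtypes
ps | polars into-df | polars schema
```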
2024-12-03 06:08:41 -06:00
Jack Wright
0172ad8461
Upgrading to polars 0.44 (#14478)
Upgrading to polars 0.44
2024-11-29 19:39:07 -06:00
Jack Wright
9d0f69ac50
Add support for converting polars decimal values to nushell values (#14343)
Adds support for converting from the polars decimal type to nushell values.

This fix works by first converting a polars decimal series to an f64
series, then converting to `Value::Float`.

Co-authored-by: Jack Wright <jack.wright@nike.com>
2024-11-15 12:10:38 +08:00
Darren Schroeder
b6eda33438
allow != for polars (#14263)
# Description

This PR fixes a problem where not-equal (`!=`) in polars wasn't working
with strings.

## Before
```nushell
let a = ls | polars into-df
$a.type != "dir"
Error: nu::shell::type_mismatch

  × Type mismatch during operation.
   ╭─[entry #16:1:1]
 1 │ $a.type != "dir"
   · ─┬      ─┬ ──┬──
   ·  │       │   ╰── string
   ·  │       ╰── type mismatch for operator
   ·  ╰── NuDataFrame
   ╰────
```

## After
```nushell
let a = ls | polars into-df
$a.type != "dir"
╭──#──┬─type──╮
│ 0   │ false │
│ 1   │ false │
│ 2   │ false │
...
```

/cc @ayax79 to make sure I did this right.

2024-11-06 15:58:22 -08:00
Ian Manske
e87a35104a
Remove as_i64 and as_f64 (#14258)
# Description
Turns out there are duplicate conversion functions: `as_i64` and
`as_f64`. In most cases, these can be replaced with `as_int` and
`as_float`, respectively.
2024-11-05 09:28:56 +01:00
Jack Wright
ae54d05930
Upgrade to polars 0.43 (#14148)
Upgrades the polars plugin to polars version 0.43
2024-10-23 19:14:24 +02:00
Jack Wright
2df91e7f92
Removed CustomValue portion of CustomValue type name strings. (#14054)
# Description

This changes the names returned by `CustomValue::name()` of the various
custom value structs to just the name of the thing they represent.
For instance, "DataFrameCustomValue" is now just "DataFrame".

# User-Facing Changes
- Places such as errors, where NuDataFrameCustomValue would previously be
seen, now just show NuDataFrame.
2024-10-11 06:41:24 -05:00
Jack Wright
1d6ac16530
polars into-df struct fix (#13977)
# Description
This fixes an issue with converting to a dataframe when specifying a
struct in the schema. Things like the following now work correctly:
```nushell
 [[foo bar]; [{a: "a_0", b:"b_0"} 1] [{a: "a_1", b: "b_1" } 2]] | polars into-df -s {foo: {a: str, b: str}, bar: u8}
```
2024-10-02 05:59:14 -05:00
Skyler Hawthorne
5fa9d76500
polars: add binary type support (#13830)
# Description
This adds support for reading and writing binary types in the polars
commands.

The `BinaryOffset` type can be read into a Nushell native `Value` type
no problem, but unfortunately this is a lossy conversion, as there's
no Nushell-native semantic equivalent to the fixed size binary type
in Arrow.

# User-Facing Changes

`polars open` and `polars save` now work with binary types.
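A hypothetical round trip with a binary column (the file name is made up for illustration):

```nushell
[[data]; [0x[DEADBEEF]]] | polars into-df | polars save tmp.parquet
polars open tmp.parquet
```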
2024-09-23 06:28:41 -05:00
Jack Wright
af77bc60e2
Improved null handling when converting from nu -> dataframe. (#13855)
# Description
Fixes: #12726 and #13185

Previously, converting columns that contained null values caused polars to
force a dtype of object, even when using a schema.

Now:
1. When using a schema, the type the schema defines for the column will
always be used.
2. When a schema is not used, the previous type is used when a value is
null.

# User-Facing Changes
- The type defined by the schema will be respected when passing in a null
value: `[[a]; [null]] | polars into-df -s {a: str}` will create a df with
a str dtype column holding one null value, versus a column of type object
(see the sketch below).
- *BREAKING CHANGE* If you define a schema, all columns must be in the
schema.
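A sketch of the first point, using `polars schema` to inspect the dtype:

```nushell
# the column gets dtype str with one null value, not dtype object
[[a]; [null]] | polars into-df -s {a: str} | polars schema
```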
2024-09-16 18:07:13 -05:00
Jack Wright
c535c24d03
catch unwrap on panics with polars collect (#13850)
# Description
This resurrects the work from #12866 and fixes #12732.

Polars panics for a plethora of reasons. While handling panics is
generally frowned upon, in cases like `polars collect` a panic causes a
lot of work to be lost. Often you might have multiple dataframes in
memory, try one operation, and lose all state.

While it is possible that a panic can leave things in a strange state, it
is pretty unlikely as part of a polars pipeline. Most of the time polars
operations are not mutating dataframes in memory, but rather creating a
new dataframe with the operations applied. This is always the case with a
lazy pipeline. After the collect call, the original dataframes are still
intact, and I haven't observed any side effects.
2024-09-15 07:21:02 -05:00
Jack Wright
8d60c0d35d
Migrating polars commands away from macros, removed custom DataFrame comparison. (#13829)
# Description
This PR:
- Removes the `lazy_command` and `expr_command` macros and migrates the
commands that were utilizing them.
- Removes the custom logic in DataFrameValues::is_equals to use the
polars DataFrame version of PartialEq
- Adds examples to commands that previously did not have examples or had
inadequate ones.

NOTE: A lot of examples now have a `polars sort` at the end. This is
needed because of the comparison in the result: the polars version of
equals cares about ordering. I removed the custom equals logic as it
caused comparisons to lock up when comparing dataframes that contain a
row containing a list. I discovered this issue when adding examples
to `polars implode`.
2024-09-11 10:33:05 -07:00
Jack Wright
0119534f61
Expression support polars replace and polars replace-all (#13726)
# Description
Adds the ability for `polars replace` and `polars replace-all` to work
as expressions.

# User-Facing Changes
- `polars replace` can be used with polars expressions
- `polars replace-all` can be used with polars expressions
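A sketch of expression usage; the `--pattern`/`--replace` flag names are assumptions carried over from the series forms of these commands:

```nushell
[[a]; ["abc"] ["abd"]] | polars into-df | polars select (polars col a | polars replace-all --pattern "ab" --replace "X")
```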
2024-08-29 13:59:44 -07:00
Jack Wright
4ff33933dd
Merge polars sink and polars to-* to polars save (#13568)
# Description
This pull request merges `polars sink` and `polars to-*` into one
command `polars save`.


# User-Facing Changes
- `polars to-*` commands have all been replaced with `polars save`. When
saving a lazy frame to a type that supports a polars sink operation, a
sink operation will be performed. Sink operations are much more
performant, performing a collect while streaming to the file system.
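A sketch with a hypothetical file name; any former `polars to-parquet`-style call becomes:

```nushell
[[a]; [1] [2]] | polars into-df | polars save out.parquet
```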
2024-08-08 09:46:45 -07:00
Jack Wright
8316a1597e
Polars: Check to see if the cache is empty before enabling GC. More logging (#13286)
There was a bug where any time plugin cache remove was called, the
plugin GC was turned back on. This probably happened when I added the
reference-counter logic.
2024-07-03 06:44:26 -05:00
Jack Wright
720b4cbd01
Polars 0.41 Upgrade (#13238)
# Description
Upgrading to Polars 0.41

# User-Facing Changes
* `polars melt` has been renamed to `polars unpivot` to match the change
in the polars API. Additionally, it now supports lazy dataframes, and a
`--streamable` option was introduced to use the polars streaming engine
for lazy frames.
* The parameter `outer` has been replaced with `full` in `polars join`
to match polars change.
* `polars value-count` now supports the column (rename count column),
parallelize (multithread), sort, and normalize options.

The list of polars changes can be found
[here](https://github.com/pola-rs/polars/releases/tag/rs-0.41.2)
2024-06-28 06:37:45 -05:00
Jack Wright
021b8633cb
Allow the addition of an index column to be optional (#13097)
Per discussion in the Discord dataframes channel with @maxim-uvarov and pyz.

When converting a dataframe to a nushell value via `polars into-nu`,
the index column should not be added by default; it should only be added
when specifying `--index` (see the sketch below).
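A sketch of both behaviors:

```nushell
# no index column by default
[[a]; [1] [2]] | polars into-df | polars into-nu
# opt in explicitly
[[a]; [1] [2]] | polars into-df | polars into-nu --index
```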
2024-06-10 10:45:25 +08:00
Jack Wright
650ae537c3
Fix the use of right hand expressions in operations (#13096)
As reported by @maxim-uvarov and pyz in the dataframes discord channel:

```nushell
[[a b]; [1 1] [1 2] [2 1] [2 2] [3 1] [3 2]] | polars into-df | polars with-column ((polars col a) / (polars col b)) --name c


  × Type mismatch.
   ╭─[entry #45:1:102]
 1 │ [[a b]; [1 1] [1 2] [2 1] [2 2] [3 1] [3 2]] | polars into-df | polars with-column ((polars col a) / (polars col b)) --name c
   ·                                                                                                      ───────┬──────
   ·                                                                                                             ╰── Right hand side not a dataframe expression
   ╰────
```

This pull request corrects the type casting on the right hand side and
allows more than just polars literal expressions.
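With the fix, the same pipeline now runs; restated as a sketch:

```nushell
[[a b]; [1 1] [1 2] [2 1] [2 2] [3 1] [3 2]] | polars into-df | polars with-column ((polars col a) / (polars col b)) --name c
```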
2024-06-10 10:44:04 +08:00
Jack Wright
a6b1d1f6d9
Upgrade to polars 0.40 (#13069)
Upgrading to polars 0.40
2024-06-06 07:26:47 +08:00
Jack Wright
b10325dff1
Allow int values to be converted into floats. (#13025)
Addresses the bug found by @maxim-uvarov when trying to coerce an int
Value to a polars float:

<img width="863" alt="image"
src="https://github.com/nushell/nushell/assets/56345/4d858812-a7b3-4296-98f4-dce0c544b4c6">

Conversion now works correctly:

<img width="891" alt="Screenshot 2024-05-31 at 14 28 51"
src="https://github.com/nushell/nushell/assets/56345/78d9f711-7ad5-4503-abc6-7aba64a2e675">
2024-06-04 18:51:11 -07:00
Ian Manske
84b7a99adf
Revert "Polars lazy refactor (#12669)" (#12962)
This reverts commit 68adc4657f.

# Description

Reverts the lazyframe refactor (#12669) for the next release, since
there are still a few lingering issues. This temporarily solves #12863
and #12828. After the release, the lazyframes can be added back and
cleaned up.
2024-05-24 18:09:26 -05:00
Ian Manske
aec41f3df0
Add Span merging functions (#12511)
# Description
This PR adds a few functions to `Span` for merging spans together:
- `Span::append`: merges two spans that are known to be in order.
- `Span::concat`: returns a span that encompasses all the spans in a
slice. The spans must be in order.
- `Span::merge`: merges two spans (no order necessary).
- `Span::merge_many`: merges an iterator of spans into a single span (no
order necessary).

These are meant to replace the free-standing `nu_protocol::span`
function.

The spans in a `LiteCommand` (the `parts`) should always be in order
based on the lite parser and lexer. So, the parser code sees the most
usage of `Span::append` and `Span::concat` where the order is known. In
other code areas, `Span::merge` and `Span::merge_many` are used since
the order between spans is often not known.
2024-05-16 22:34:49 +00:00
Ian Manske
6fd854ed9f
Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------: | ------------: | -----------: |
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00
Jack Wright
98369985b1
Allow custom value operations to work on eager and lazy dataframes interchangeably. (#12819)
Fixes Bug #12809 

The example that @maxim-uvarov posted now works as expected:

<img width="1223" alt="Screenshot 2024-05-09 at 16 21 01"
src="https://github.com/nushell/nushell/assets/56345/a4df62e3-e432-4c09-8e25-9a6c198741a3">
2024-05-13 18:17:31 -05:00
Jack Wright
68adc4657f
Polars lazy refactor (#12669)
This moves to predominantly supporting only lazy dataframes for most
operations. It removes a lot of the type conversion between lazy and
eager dataframes based on what was inputted into the command.

For the most part the changes will mean:
* You will need to run `polars collect` after performing operations (see
the sketch below)
* The `into-lazy` command has been removed as it is redundant.
* When opening files, a lazy frame will be output by default if the
reader supports lazy frames

A list of individual command changes can be found
[here](https://hackmd.io/@nucore/Bk-3V-hW0)
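A sketch of the new shape of a pipeline (the file name is hypothetical):

```nushell
# operations stay lazy until an explicit collect
polars open data.parquet | polars filter ((polars col a) > 1) | polars collect
```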

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-05-06 23:19:11 +00:00
Stefan Holderbach
be6137d136
Fix clippy::wrong_self_convention in polars plugin (#12737)
Expected `into_` for `fn(self) -> T`
2024-05-02 19:31:51 +02:00
Jack Wright
a60381a932
Added commands for working with the plugin cache. (#12576)
# Description
This pull request provides three new commands:
`polars store-ls` - moved from `polars ls`. It provides the list of all
objects stored in the plugin cache
`polars store-rm` - deletes a cached object
`polars store-get` - gets an object from the cache.

The addition of `polars store-get` required adding a reference_count to
cached entries. `polars store-get` is the only command that will increment
this value. `polars store-rm` will remove the value regardless of its
count. Calls to PolarsPlugin::custom_value_dropped will decrement the
value.

The store- prefix was chosen because there is already a `polars cache`
command. These commands were not made sub-commands, as there isn't a way
to display help for sub-commands in plugins (e.g. `polars store`
displaying help), and I felt the store- prefix seemed fine anyway.

The output of `polars store-ls` now shows the reference count for each
object.
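A sketch of the workflow:

```nushell
let df = ([[a]; [1] [2]] | polars into-df)  # caches a dataframe object
polars store-ls                             # lists cached objects with reference counts
```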

# User-Facing Changes
`polars ls` has now moved to `polars store-ls`

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-21 19:43:43 -05:00
Jack Wright
cc7b5c5a26
Only mark collected dataframes as from_lazy=false when collect is called from the collect command. (#12571)
I had previously changed NuLazyFrame::collect to set the NuDataFrame's
from_lazy field to false to prevent conversion back to a lazy frame. It
appears there are cases where this conversion should still happen.
Instead, I am only setting from_lazy=false inside the `polars collect`
command.

[Related discord
message](https://discord.com/channels/601130461678272522/1227612017171501136/1230600465159421993)

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-18 17:10:38 -05:00
Jack Wright
410f3c5c8a
Upgrading nu_plugin_polars to polars 0.39.1 (#12551)
# Description
Upgrading nu_plugin_polars to polars 0.39.1

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-17 06:35:09 -05:00
Jack Wright
1661bb68f9
Cleaning up to_pipe_line_data and cache_and_to_value, making them part of CustomValueSupport (#12528)
# Description

This is just some cleanup. I moved to_pipeline_data and to_cache_value
to the CustomValueSupport trait, where I should've put them to begin
with.

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-16 06:35:52 -05:00
Jack Wright
5f818eaefe
Ensure that lazy frames converted via to-lazy are not converted back to eager frames later in the pipeline. (#12525)
# Description
@maxim-uvarov discovered the following error:
```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
Error:   × Error using as series
   ╭─[entry #1:1:68]
 1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
   ·                                                                    ──────┬──────
   ·                                                                          ╰── dataframe has more than one column
   ╰────
 ```
 
During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to an eager dataframe. In order to keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This causes the conversion logic to not attempt to convert the dataframe later in the pipeline.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-15 18:29:42 -05:00
Devyn Cairns
2ae9ad8676
Copy-on-write for record values (#12305)
# Description
This adds a `SharedCow` type as a transparent copy-on-write pointer that
clones to unique on mutate.

As an initial test, the `Record` within `Value::Record` is shared.

There are some pretty big wins for performance. I'll post benchmark
results in a comment. The biggest winner is nested access, which would
previously have cloned the records for each cell path follow and no
longer has to.

The reusability of the `SharedCow` type is nice and I think it could be
used to clean up the previous work I did with `Arc` in `EngineState`.
It's meant to be a mostly transparent clone-on-write that just clones on
`.to_mut()` or `.into_owned()` if there are actually multiple
references, but avoids cloning if the reference is unique.

# User-Facing Changes
- `Value::Record` field is a different type (plugin authors)

# Tests + Formatting
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting
- [ ] use for `EngineState`
- [ ] use for `Value::List`
2024-04-14 01:42:03 +00:00
Jack Wright
10a9a17b8c
Two consecutive calls to into-lazy should not fail (#12505)
# Description

From @maxim-uvarov's
[post](https://discord.com/channels/601130461678272522/1227612017171501136/1228656319704203375).

When calling `into-lazy` back to back in a pipeline, an error should not
occur:

```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars into-lazy
Error: nu::shell::cant_convert

  × Can't convert to NuDataFrame.
   ╭─[entry #1:1:30]
 1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars into-lazy
   ·                              ────────┬───────
   ·                                      ╰── can't convert NuLazyFrameCustomValue to NuDataFrame
   ╰────
 ```

This pull request ensures that custom values of NuLazyFrameCustomValue are properly converted when passed in.

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 13:00:46 -05:00
Jack Wright
b9dd47ebb7
Polars 0.38 upgrade (#12506)
# Description
Polars 0.38 upgrade for both the dataframe crate and the polars plugin.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 13:00:04 -05:00
Jack Wright
1bded8572c
Ensure that two columns named index don't exist when converting a Dataframe to a nu Value. (#12501)
# Description
@maxim-uvarov discovered an issue with the current implementation. When
executing `[[index a]; [1 1]] | polars into-df`, a plugin_failed_to_decode
error occurs. This happens because a Record is created with two columns
named "index", as an index column is added during conversion. This pull
request addresses the problem by not adding an index column if there is
already a column named "index" in the dataframe.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 06:33:29 -05:00
Jack Wright
b9c2f9ee56
displaying span information, creation time, and size with polars ls (#12472)
# Description
`polars ls` is already different from `dfr ls`. Currently it just shows
the cache key, columns, rows, and type. I have added:
- creation time
- size
- span contents
- span start and end

<img width="1471" alt="Screenshot 2024-04-10 at 17 27 06"
src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408">

# Tests + Formatting
Done

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-12 09:23:46 -05:00
Jack Wright
efc1cfa939
Move dataframes support to a plugin (#12220)
WIP

This PR covers the migration of crates/nu-cmd-dataframes to a new plugin,
./crates/nu_plugin_polars

## TODO List

Other:
- [X] Fix examples
- [x] Fix Plugin Test Harness
- [X] Move Cache to Mutex<BTreeMap>
- [X] Logic for disabling/enabling plugin GC based on whether items are
cached.
- [x] NuExpression custom values
- [X] Optimize caching (don't cache every object creation). 
- [x] Fix dataframe operations (in NuDataFrameCustomValue::operations)
- [x] Added plugin_debug! macro for checking an env variable
POLARS_PLUGIN_DEBUG

Fix duplicated commands:
- [x] There are two polars median commands, one for lazy and one for
expr; there should only be one that works for both. I temporarily
called one polars expr-median (inside expressions_macros.rs)
- [x] polars quantile (lazy and expr); the expr one is temporarily
expr-quantile
- [x] polars is-in (renamed one to series-is-in)

Commands:
- [x] AppendDF
- [x] CastDF
- [X] ColumnsDF
- [x] DataTypes
- [x] Summary
- [x] DropDF
- [x] DropDuplicates
- [x] DropNulls
- [x] Dummies
- [x] FilterWith
- [X] FirstDF
- [x] GetDF
- [x] LastDF
- [X] ListDF
- [x] MeltDF
- [X] OpenDataFrame
- [x] QueryDf
- [x] RenameDF
- [x] SampleDF
- [x] SchemaDF
- [x] ShapeDF
- [x] SliceDF
- [x] TakeDF
- [X] ToArrow
- [x] ToAvro
- [X] ToCSV
- [X] ToDataFrame
- [X] ToNu
- [x] ToParquet
- [x] ToJsonLines
- [x] WithColumn
- [x] ExprAlias
- [x] ExprArgWhere
- [x] ExprCol
- [x] ExprConcatStr
- [x] ExprCount
- [x] ExprLit
- [x] ExprWhen
- [x] ExprOtherwise
- [x] ExprQuantile
- [x] ExprList
- [x] ExprAggGroups
- [x] ExprCount
- [x] ExprIsIn
- [x] ExprNot
- [x] ExprMax
- [x] ExprMin
- [x] ExprSum
- [x] ExprMean
- [x] ExprMedian
- [x] ExprStd
- [x] ExprVar
- [x] ExprDatePart
- [X] LazyAggregate
- [x] LazyCache
- [X] LazyCollect
- [x] LazyFetch
- [x] LazyFillNA
- [x] LazyFillNull
- [x] LazyFilter
- [x] LazyJoin
- [x] LazyQuantile
- [x] LazyMedian
- [x] LazyReverse
- [x] LazySelect
- [x] LazySortBy
- [x] ToLazyFrame
- [x] ToLazyGroupBy
- [x] LazyExplode
- [x] LazyFlatten
- [x] AllFalse
- [x] AllTrue
- [x] ArgMax
- [x] ArgMin
- [x] ArgSort
- [x] ArgTrue
- [x] ArgUnique
- [x] AsDate
- [x] AsDateTime
- [x] Concatenate
- [x] Contains
- [x] Cumulative
- [x] GetDay
- [x] GetHour
- [x] GetMinute
- [x] GetMonth
- [x] GetNanosecond
- [x] GetOrdinal
- [x] GetSecond
- [x] GetWeek
- [x] GetWeekDay
- [x] GetYear
- [x] IsDuplicated
- [x] IsIn
- [x] IsNotNull
- [x] IsNull
- [x] IsUnique
- [x] NNull
- [x] NUnique
- [x] NotSeries
- [x] Replace
- [x] ReplaceAll
- [x] Rolling
- [x] SetSeries
- [x] SetWithIndex
- [x] Shift
- [x] StrLengths
- [x] StrSlice
- [x] StrFTime
- [x] ToLowerCase
- [x] ToUpperCase
- [x] Unique
- [x] ValueCount

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-09 19:31:43 -05:00