<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx
you can also mention related issues, PRs or discussions!
-->
# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.
Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->
This PR adds a number of math functions under a single `polars math`
command that apply to one or more column expressions.
Note, `polars math` currently resides in the new module
dataframe/command/command/computation/math.rs. I'm open to alternative
organization and naming suggestions.
```nushell
Collection of math functions to be applied on one or more column expressions
This is an incomplete implementation of the available functions listed here: https://docs.pola.rs/api/python/stable/reference/expressions/computation.html.
The following functions are currently available:
- abs
- cos
- dot <expression>
- exp
- log <base; default e>
- log1p
- sign
- sin
- sqrt
Usage:
> polars math <type> ...(args)
Flags:
-h, --help: Display the help message for this command
Parameters:
type <string>: Function name. See extra description for full list of accepted values
...args <any>: Extra arguments required by some functions
Input/output types:
╭───┬────────────┬────────────╮
│ # │ input │ output │
├───┼────────────┼────────────┤
│ 0 │ expression │ expression │
╰───┴────────────┴────────────╯
Examples:
Apply function to column expression
> [[a]; [0] [-1] [2] [-3] [4]]
| polars into-df
| polars select [
(polars col a | polars math abs | polars as a_abs)
(polars col a | polars math sign | polars as a_sign)
(polars col a | polars math exp | polars as a_exp)]
| polars collect
╭───┬───────┬────────┬────────╮
│ # │ a_abs │ a_sign │ a_exp │
├───┼───────┼────────┼────────┤
│ 0 │ 0 │ 0 │ 1.000 │
│ 1 │ 1 │ -1 │ 0.368 │
│ 2 │ 2 │ 1 │ 7.389 │
│ 3 │ 3 │ -1 │ 0.050 │
│ 4 │ 4 │ 1 │ 54.598 │
╰───┴───────┴────────┴────────╯
Specify arguments for select functions. See description for more information.
> [[a]; [0] [1] [2] [4] [8] [16]]
| polars into-df
| polars select [
(polars col a | polars math log 2 | polars as a_base2)]
| polars collect
╭───┬─────────╮
│ # │ a_base2 │
├───┼─────────┤
│ 0 │ -inf │
│ 1 │ 0.000 │
│ 2 │ 1.000 │
│ 3 │ 2.000 │
│ 4 │ 3.000 │
│ 5 │ 4.000 │
╰───┴─────────╯
Specify arguments for select functions. See description for more information.
> [[a b]; [0 0] [1 1] [2 2] [3 3] [4 4] [5 5]]
| polars into-df
| polars select [
(polars col a | polars math dot (polars col b) | polars as ab)]
| polars collect
╭───┬────────╮
│ # │ ab │
├───┼────────┤
│ 0 │ 55.000 │
╰───┴────────╯
```
# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->
No breaking changes.
# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.
Make sure you've run and fixed any issues with these commands:
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library
> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
Example tests were added to `polars math`.
# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
# Description
This pull request does a lot of the heavy lifting needed to supported
more complex dtypes like categorical dtypes. It introduces a new
CustomValue, NuDataType and makes NuSchema a full CustomValue. Further
more it introduces a new command `polars into-dtype` that allows a dtype
to be created. This can then be passed into schemas when they are
created.
```nu
> ❯ : let dt = ("str" | polars to-dtype)
> ❯ : [[a b]; ["one" "two"]] | polars into-df -s {a: $dt, b: str} | polars schema
╭───┬─────╮
│ a │ str │
│ b │ str │
╰───┴─────╯
```
# User-Facing Changes
- Introduces new command `polars into-dtype`, allows dtype variables to
be passed in during schema creation.
# Description
This resurrects the work from #12866 and fixes#12732.
Polars panics for a plethora or reasons. While handling panics is
generally frowned upon, in cases like with `polars collect` a panic
cause a lot of work to be lost. Often you might have multiple dataframes
in memory and you are trying one operation and lose all state.
While it possible the panic can leave things a strange state, it is
pretty unlikely as part of a polars pipeline. Most of the time polars
objects are not manipulating dataframes in memory mutability, but rather
creating a new dataframe the operations being applied. This is always
the case with a lazy pipeline. After the collect call, the original
dataframes are intact still and I haven't observed any side effects.
# Description
This PR:
- Removes the lazy_command, expr_command macros and migrates the
commands that were utilizing them.
- Removes the custom logic in DataFrameValues::is_equals to use the
polars DataFrame version of PartialEq
- Adds examples to commands that previously did not have examples or had
inadequate ones.
NOTE: A lot of examples now have a `polars sort` at the end. This is
needed due to the comparison in the result. The new polars version of
equals cares about the ordering. I removed the custom equals logic as it
causes comparisons to lock up when comparing dataframes that contain a
row that contains a list. I discovered this issue when adding examples
to `polars implode`
There was a bug where anytime the plugin cache remove was called, the
plugin gc was turned back on. This probably happened when I added the
reference counter logic.
In this pull request, I converted the `perf` function within `nu_utils`
to a macro. This change facilitates easier usage within plugins by
allowing the use of `env_logger` and setting `RUST_LOG=nu_plugin_polars`
(or another plugin). Without this conversion, the `RUST_LOG` variable
would need to be set to `RUST_LOG=nu_utils::utils`, which is less
intuitive and impossible to narrow the perf results to one plugin.
# Description
This allows plugins to report their version (and potentially other
metadata in the future). The version is shown in `plugin list` and in
`version`.
The metadata is stored in the registry file, and reflects whatever was
retrieved on `plugin add`, not necessarily the running binary. This can
help you to diagnose if there's some kind of mismatch with what you
expect. We could potentially use this functionality to show a warning or
error if a plugin being run does not have the same version as what was
in the cache file, suggesting `plugin add` be run again, but I haven't
done that at this point.
It is optional, and it requires the plugin author to make some code
changes if they want to provide it, since I can't automatically
determine the version of the calling crate or anything tricky like that
to do it.
Example:
```
> plugin list | select name version is_running pid
╭───┬────────────────┬─────────┬────────────┬─────╮
│ # │ name │ version │ is_running │ pid │
├───┼────────────────┼─────────┼────────────┼─────┤
│ 0 │ example │ 0.93.1 │ false │ │
│ 1 │ gstat │ 0.93.1 │ false │ │
│ 2 │ inc │ 0.93.1 │ false │ │
│ 3 │ python_example │ 0.1.0 │ false │ │
╰───┴────────────────┴─────────┴────────────┴─────╯
```
cc @maxim-uvarov (he asked for it)
# User-Facing Changes
- `plugin list` gets a `version` column
- `version` shows plugin versions when available
- plugin authors *should* add `fn metadata()` to their `impl Plugin`,
but don't have to
# Tests + Formatting
Tested the low level stuff and also the `plugin list` column.
# After Submitting
- [ ] update plugin guide docs
- [ ] update plugin protocol docs (`Metadata` call & response)
- [ ] update plugin template (`fn metadata()` should be easy)
- [ ] release notes
This allows performance debugging to be turned on by setting:
```nushell
$env.POLARS_PLUGIN_PERF = "true"
```
Furthermore, this improves the other plugin debugging by allowing the
env variable for debugging to be set at any time versus having to be
available when nushell is launched:
```nushell
$env.POLARS_PLUGIN_DEBUG = "true"
```
This plugin introduces a `perf` function that will output timing
results. This works very similar to the perf function available in
nu_utils::utils::perf. This version prints everything to std error to
not break the plugin stream and uses the engine interface to see if the
env variable is configured.
This pull requests uses this `perf` function when:
* opening csv files as dataframes
* opening json lines files as dataframes
This will hopefully help provide some more fine grained information on
how long it takes polars to open different dataframes. The `perf` can
also be utilized later for other dataframes use cases.
# Description
This pull request provides three new commands:
`polars store-ls` - moved from `polars ls`. It provides the list of all
object stored in the plugin cache
`polars store-rm` - deletes a cached object
`polars store-get` - gets an object from the cache.
The addition of `polars store-get` required adding a reference_count to
cached entries. `polars get` is the only command that will increment
this value. `polars rm` will remove the value despite it's count. Calls
to PolarsPlugin::custom_value_dropped will decrement the value.
The prefix store- was chosen due to there already being a `polars cache`
command. These commands were not made sub-commands as there isn't a way
to display help for sub commands in plugins (e.g. `polars store`
displaying help) and I felt the store- seemed fine anyways.
The output of `polars store-ls` now shows the reference count for each
object.
# User-Facing Changes
polars ls has now moved to polars store-ls
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
@maxim-uvarov discovered the following error:
```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
Error: × Error using as series
╭─[entry #1:1:68]
1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
· ──────┬──────
· ╰── dataframe has more than one column
╰────
```
During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to a eager dataframe. In order to keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This causes the conversion logic to not attempt to convert the dataframe later in the pipeline.
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
`polars ls` is already different that `dfr ls`. Currently it just shows
the cache key, columns, rows, and type. I have added:
- creation time
- size
- span contents
- span start and end
<img width="1471" alt="Screenshot 2024-04-10 at 17 27 06"
src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408">
# Tests + Formatting
Done
Co-authored-by: Jack Wright <jack.wright@disqo.com>