# Description
This resurrects the work from #12866 and fixes#12732.
Polars panics for a plethora or reasons. While handling panics is
generally frowned upon, in cases like with `polars collect` a panic
cause a lot of work to be lost. Often you might have multiple dataframes
in memory and you are trying one operation and lose all state.
While it possible the panic can leave things a strange state, it is
pretty unlikely as part of a polars pipeline. Most of the time polars
objects are not manipulating dataframes in memory mutability, but rather
creating a new dataframe the operations being applied. This is always
the case with a lazy pipeline. After the collect call, the original
dataframes are intact still and I haven't observed any side effects.
# Description
This PR:
- Removes the lazy_command, expr_command macros and migrates the
commands that were utilizing them.
- Removes the custom logic in DataFrameValues::is_equals to use the
polars DataFrame version of PartialEq
- Adds examples to commands that previously did not have examples or had
inadequate ones.
NOTE: A lot of examples now have a `polars sort` at the end. This is
needed due to the comparison in the result. The new polars version of
equals cares about the ordering. I removed the custom equals logic as it
causes comparisons to lock up when comparing dataframes that contain a
row that contains a list. I discovered this issue when adding examples
to `polars implode`
There was a bug where anytime the plugin cache remove was called, the
plugin gc was turned back on. This probably happened when I added the
reference counter logic.
In this pull request, I converted the `perf` function within `nu_utils`
to a macro. This change facilitates easier usage within plugins by
allowing the use of `env_logger` and setting `RUST_LOG=nu_plugin_polars`
(or another plugin). Without this conversion, the `RUST_LOG` variable
would need to be set to `RUST_LOG=nu_utils::utils`, which is less
intuitive and impossible to narrow the perf results to one plugin.
# Description
This allows plugins to report their version (and potentially other
metadata in the future). The version is shown in `plugin list` and in
`version`.
The metadata is stored in the registry file, and reflects whatever was
retrieved on `plugin add`, not necessarily the running binary. This can
help you to diagnose if there's some kind of mismatch with what you
expect. We could potentially use this functionality to show a warning or
error if a plugin being run does not have the same version as what was
in the cache file, suggesting `plugin add` be run again, but I haven't
done that at this point.
It is optional, and it requires the plugin author to make some code
changes if they want to provide it, since I can't automatically
determine the version of the calling crate or anything tricky like that
to do it.
Example:
```
> plugin list | select name version is_running pid
╭───┬────────────────┬─────────┬────────────┬─────╮
│ # │ name │ version │ is_running │ pid │
├───┼────────────────┼─────────┼────────────┼─────┤
│ 0 │ example │ 0.93.1 │ false │ │
│ 1 │ gstat │ 0.93.1 │ false │ │
│ 2 │ inc │ 0.93.1 │ false │ │
│ 3 │ python_example │ 0.1.0 │ false │ │
╰───┴────────────────┴─────────┴────────────┴─────╯
```
cc @maxim-uvarov (he asked for it)
# User-Facing Changes
- `plugin list` gets a `version` column
- `version` shows plugin versions when available
- plugin authors *should* add `fn metadata()` to their `impl Plugin`,
but don't have to
# Tests + Formatting
Tested the low level stuff and also the `plugin list` column.
# After Submitting
- [ ] update plugin guide docs
- [ ] update plugin protocol docs (`Metadata` call & response)
- [ ] update plugin template (`fn metadata()` should be easy)
- [ ] release notes
This allows performance debugging to be turned on by setting:
```nushell
$env.POLARS_PLUGIN_PERF = "true"
```
Furthermore, this improves the other plugin debugging by allowing the
env variable for debugging to be set at any time versus having to be
available when nushell is launched:
```nushell
$env.POLARS_PLUGIN_DEBUG = "true"
```
This plugin introduces a `perf` function that will output timing
results. This works very similar to the perf function available in
nu_utils::utils::perf. This version prints everything to std error to
not break the plugin stream and uses the engine interface to see if the
env variable is configured.
This pull requests uses this `perf` function when:
* opening csv files as dataframes
* opening json lines files as dataframes
This will hopefully help provide some more fine grained information on
how long it takes polars to open different dataframes. The `perf` can
also be utilized later for other dataframes use cases.
# Description
This pull request provides three new commands:
`polars store-ls` - moved from `polars ls`. It provides the list of all
object stored in the plugin cache
`polars store-rm` - deletes a cached object
`polars store-get` - gets an object from the cache.
The addition of `polars store-get` required adding a reference_count to
cached entries. `polars get` is the only command that will increment
this value. `polars rm` will remove the value despite it's count. Calls
to PolarsPlugin::custom_value_dropped will decrement the value.
The prefix store- was chosen due to there already being a `polars cache`
command. These commands were not made sub-commands as there isn't a way
to display help for sub commands in plugins (e.g. `polars store`
displaying help) and I felt the store- seemed fine anyways.
The output of `polars store-ls` now shows the reference count for each
object.
# User-Facing Changes
polars ls has now moved to polars store-ls
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
@maxim-uvarov discovered the following error:
```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
Error: × Error using as series
╭─[entry #1:1:68]
1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
· ──────┬──────
· ╰── dataframe has more than one column
╰────
```
During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to a eager dataframe. In order to keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This causes the conversion logic to not attempt to convert the dataframe later in the pipeline.
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
`polars ls` is already different that `dfr ls`. Currently it just shows
the cache key, columns, rows, and type. I have added:
- creation time
- size
- span contents
- span start and end
<img width="1471" alt="Screenshot 2024-04-10 at 17 27 06"
src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408">
# Tests + Formatting
Done
Co-authored-by: Jack Wright <jack.wright@disqo.com>