Commit Graph

17 Commits

Author SHA1 Message Date
63b94dbd28 Added expression support for polars contains (#13769)
# Description
Adds the ability to use `polars contains` as an expression:
<img width="785" alt="Screenshot 2024-09-03 at 14 39 03"
src="https://github.com/user-attachments/assets/35c0d4e5-6bef-4974-a31f-463c7203bd03">


# User-Facing Changes
- `polars contains` can now be used with expressions
2024-09-04 12:19:45 +02:00
eb0de25d19 Expression support for polars strftime (#13767)
# Description
Allows `polars strftime` to be used as an expression:
<img width="849" alt="Screenshot 2024-09-03 at 13 14 11"
src="https://github.com/user-attachments/assets/2a987fdf-cf00-4c57-b6e1-0b706c594c20">

# User-Facing Changes
- `polars strftime` can now be used as an expression.
2024-09-04 12:19:29 +02:00
0119534f61 Expression support polars replace and polars replace-all (#13726)
# Description
Adds the ability for `polars replace` and `polars replace-all` to work
as expressions.

# User-Facing Changes
- `polars replace` can be used with polars expressions
- `polars replace-all` can be used with polars expressions
2024-08-29 13:59:44 -07:00
644bebf4c6 Expression support for polars uppercase and polars lowercase (#13724) 2024-08-28 14:08:16 -07:00
a39e94de8a Added polars commands for converting string columns to integer and decimal columns (#13711)
# Description
Introduces two new polars commands for converting string columns to
decimal and integer columns:

<img width="740" alt="Screenshot 2024-08-27 at 15 32 28"
src="https://github.com/user-attachments/assets/f9573b6e-48f6-4bbf-8782-39ffb95eb934">

<img width="720" alt="Screenshot 2024-08-27 at 15 33 46"
src="https://github.com/user-attachments/assets/90a66bb5-fa78-4ed3-8b2b-ae05cddd2f3a">

# User-Facing Changes
- Addition of the `polars integer` command
- Addition of the `polars decimal` command
2024-08-28 07:54:31 -05:00
95b78eee25 Change the usage misnomer to "description" (#13598)
# Description
    
The meaning of the word usage is specific to describing how a command
function is *used* and not a synonym for general description. Usage can
be used to describe the SYNOPSIS or EXAMPLES sections of a man page
where the permitted argument combinations are shown or example *uses*
are given.
Let's not confuse people and call it what it is a description.

Our `help` command already creates its own *Usage* section based on the
available arguments and doesn't refer to the description with usage.

# User-Facing Changes

`help commands` and `scope commands` will now use `description` or
`extra_description`
`usage`-> `description`
`extra_usage` -> `extra_description`

Breaking change in the plugin protocol:

In the signature record communicated with the engine.
`usage`-> `description`
`extra_usage` -> `extra_description`

The same rename also takes place for the methods on
`SimplePluginCommand` and `PluginCommand`

# Tests + Formatting
- Updated plugin protocol specific changes
# After Submitting
- [ ] update plugin protocol doc
2024-08-22 12:02:08 +02:00
720b4cbd01 Polars 0.41 Upgrade (#13238)
# Description
Upgrading to Polars 0.41

# User-Facing Changes
* `polars melt` has been renamed to `polars unpivot` to match the change
in the polars API. Additionally, it now supports lazy dataframes.
Introduced a `--streamable` option to use the polars streaming engine
for lazy frames.
* The parameter `outer` has been replaced with `full` in `polars join`
to match polars change.
* `polars value-count` now supports the column (rename count column),
parallelize (multithread), sort, and normalize options.

The list of polars changes can be found
[here](https://github.com/pola-rs/polars/releases/tag/rs-0.41.2)
2024-06-28 06:37:45 -05:00
a6b1d1f6d9 Upgrade to polars 0.40 (#13069)
Upgrading to polars 0.40
2024-06-06 07:26:47 +08:00
84b7a99adf Revert "Polars lazy refactor (#12669)" (#12962)
This reverts commit 68adc4657f.

# Description

Reverts the lazyframe refactor (#12669) for the next release, since
there are still a few lingering issues. This temporarily solves #12863
and #12828. After the release, the lazyframes can be added back and
cleaned up.
2024-05-24 18:09:26 -05:00
6fd854ed9f Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00
68adc4657f Polars lazy refactor (#12669)
This moves to predominantly supporting only lazy dataframes for most
operations. It removes a lot of the type conversion between lazy and
eager dataframes based on what was inputted into the command.

For the most part the changes will mean:
* You will need to run `polars collect` after performing operations
* The into-lazy command has been removed as it is redundant.
* When opening files a lazy frame will be outputted by default if the
reader supports lazy frames

A list of individual command changes can be found
[here](https://hackmd.io/@nucore/Bk-3V-hW0)

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-05-06 23:19:11 +00:00
a1287f7b3f add more tests to the polars plugin (#12719)
# Description

I added some more tests to our mighty `polars` ~~, yet I don't know how
to add expected results in some of them. I would like to ask for help.~~

~~My experiments are in the last commit: [polars:
experiments](f7e5e72019).
Without those experiments `cargo test` goes well.~~
 
UPD. I moved out my unsuccessful test experiments into a separate
[branch](https://github.com/maxim-uvarov/nushell/blob/polars-tests-broken2/).
So, this branch seems ready for a merge.

@ayax79, maybe you'll find time for me please? It's not urgent for sure.

P.S. I'm very new to git. Please feel free to give me any suggestions on
how I should use it better
2024-05-03 20:14:55 -05:00
be6137d136 Fix clippy::wrong_self_convention in polars plugin (#12737)
Expected `into_` for `fn(self) -> T`
2024-05-02 19:31:51 +02:00
884d5312bb add tests to polars unique (#12683)
# Description

I would like to help with `polars` plugin development and add tests to
all the `polars` command's existing params.

Since I have never written any lines of Rust, even though the task of
creating tests is relatively simple, I would like to ask for feedback to
ensure I did everything correctly here.
2024-04-27 12:04:54 -05:00
a60381a932 Added commands for working with the plugin cache. (#12576)
# Description
This pull request provides three new commands:
`polars store-ls` - moved from `polars ls`. It provides the list of all
object stored in the plugin cache
`polars store-rm` - deletes a cached object
`polars store-get` - gets an object from the cache. 

The addition of `polars store-get` required adding a reference_count to
cached entries. `polars get` is the only command that will increment
this value. `polars rm` will remove the value despite it's count. Calls
to PolarsPlugin::custom_value_dropped will decrement the value.

The prefix store- was chosen due to there already being a `polars cache`
command. These commands were not made sub-commands as there isn't a way
to display help for sub commands in plugins (e.g. `polars store`
displaying help) and I felt the store- seemed fine anyways.

The output of `polars store-ls` now shows the reference count for each
object.

# User-Facing Changes
polars ls has now moved to polars store-ls

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-21 19:43:43 -05:00
1661bb68f9 Cleaning up to_pipe_line_data and cache_and_to_value, making them part of CustomValueSupport (#12528)
# Description

This is just some cleanup. I moved to_pipeline_data and to_cache_value
to the CustomValueSupport trait, where I should've put them to begin
with.

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-16 06:35:52 -05:00
efc1cfa939 Move dataframes support to a plugin (#12220)
WIP

This PR covers migration crates/nu-cmd-dataframes to a new plugin
./crates/nu_plugin_polars

## TODO List

Other:
- [X] Fix examples
- [x] Fix Plugin Test Harness
- [X] Move Cache to Mutex<BTreeMap>
- [X] Logic for disabling/enabling plugin GC based off whether items are
cached.
- [x] NuExpression custom values
- [X] Optimize caching (don't cache every object creation). 
- [x] Fix dataframe operations (in NuDataFrameCustomValue::operations)
- [x] Added plugin_debug! macro that for checking an env variable
POLARS_PLUGIN_DEBUG

Fix duplicated commands:
- [x] There are two polars median commands, one for lazy and one for
expr.. there should only be one that works for both. I temporarily
called on polars expr-median (inside expressions_macros.rs)
- [x] polars quantile (lazy, and expr). the expr one is temporarily
expr-median
- [x] polars is-in (renamed one series-is-in)

Commands:
- [x] AppendDF
- [x] CastDF
- [X] ColumnsDF
- [x] DataTypes
- [x] Summary
- [x] DropDF
- [x] DropDuplicates
- [x] DropNulls
- [x] Dummies
- [x] FilterWith
- [X] FirstDF
- [x] GetDF
- [x] LastDF
- [X] ListDF
- [x] MeltDF
- [X] OpenDataFrame
- [x] QueryDf
- [x] RenameDF
- [x] SampleDF
- [x] SchemaDF
- [x] ShapeDF
- [x] SliceDF
- [x] TakeDF
- [X] ToArrow
- [x] ToAvro
- [X] ToCSV
- [X] ToDataFrame
- [X] ToNu
- [x] ToParquet
- [x] ToJsonLines
- [x] WithColumn
- [x] ExprAlias
- [x] ExprArgWhere
- [x] ExprCol
- [x] ExprConcatStr
- [x] ExprCount
- [x] ExprLit
- [x] ExprWhen
- [x] ExprOtherwise
- [x] ExprQuantile
- [x] ExprList
- [x] ExprAggGroups
- [x] ExprCount
- [x] ExprIsIn
- [x] ExprNot
- [x] ExprMax
- [x] ExprMin
- [x] ExprSum
- [x] ExprMean
- [x] ExprMedian
- [x] ExprStd
- [x] ExprVar
- [x] ExprDatePart
- [X] LazyAggregate
- [x] LazyCache
- [X] LazyCollect
- [x] LazyFetch
- [x] LazyFillNA
- [x] LazyFillNull
- [x] LazyFilter
- [x] LazyJoin
- [x] LazyQuantile
- [x] LazyMedian
- [x] LazyReverse
- [x] LazySelect
- [x] LazySortBy
- [x] ToLazyFrame
- [x] ToLazyGroupBy
- [x] LazyExplode
- [x] LazyFlatten
- [x] AllFalse
- [x] AllTrue
- [x] ArgMax
- [x] ArgMin
- [x] ArgSort
- [x] ArgTrue
- [x] ArgUnique
- [x] AsDate
- [x] AsDateTime
- [x] Concatenate
- [x] Contains
- [x] Cumulative
- [x] GetDay
- [x] GetHour
- [x] GetMinute
- [x] GetMonth
- [x] GetNanosecond
- [x] GetOrdinal
- [x] GetSecond
- [x] GetWeek
- [x] GetWeekDay
- [x] GetYear
- [x] IsDuplicated
- [x] IsIn
- [x] IsNotNull
- [x] IsNull
- [x] IsUnique
- [x] NNull
- [x] NUnique
- [x] NotSeries
- [x] Replace
- [x] ReplaceAll
- [x] Rolling
- [x] SetSeries
- [x] SetWithIndex
- [x] Shift
- [x] StrLengths
- [x] StrSlice
- [x] StrFTime
- [x] ToLowerCase
- [x] ToUpperCase
- [x] Unique
- [x] ValueCount

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-09 19:31:43 -05:00