nushell

mirror of https://github.com/nushell/nushell.git synced 2025-05-31 07:08:22 +02:00

Author	SHA1	Message	Date
Jack Wright	a6b1d1f6d9	Upgrade to polars 0.40 (#13069 ) Upgrading to polars 0.40	2024-06-06 07:26:47 +08:00
Jack Wright	b10325dff1	Allow int values to be converted into floats. (#13025 ) Addresses the bug found by @maxim-uvarov when trying to coerce an int Value to a polars float. Conversion now works correctly.	2024-06-04 18:51:11 -07:00
Ian Manske	84b7a99adf	Revert "Polars lazy refactor (#12669 )" (#12962 ) This reverts commit 68adc4657f9c57bb7090df3984a5d2931f8e7358. # Description Reverts the lazyframe refactor (#12669) for the next release, since there are still a few lingering issues. This temporarily solves #12863 and #12828. After the release, the lazyframes can be added back and cleaned up.	2024-05-24 18:09:26 -05:00
Ian Manske	aec41f3df0	Add `Span` merging functions (#12511 ) # Description This PR adds a few functions to `Span` for merging spans together: - `Span::append`: merges two spans that are known to be in order. - `Span::concat`: returns a span that encompasses all the spans in a slice. The spans must be in order. - `Span::merge`: merges two spans (no order necessary). - `Span::merge_many`: merges an iterator of spans into a single span (no order necessary). These are meant to replace the free-standing `nu_protocol::span` function. The spans in a `LiteCommand` (the `parts`) should always be in order based on the lite parser and lexer. So, the parser code sees the most usage of `Span::append` and `Span::concat` where the order is known. In other code areas, `Span::merge` and `Span::merge_many` are used since the order between spans is often not known.	2024-05-16 22:34:49 +00:00
Ian Manske	6fd854ed9f	Replace `ExternalStream` with new `ByteStream` type (#12774 ) # Description This PR introduces a `ByteStream` type which is a `Read`-able stream of bytes. Internally, it has an enum over three different byte stream sources: ```rust pub enum ByteStreamSource { Read(Box<dyn Read + Send + 'static>), File(File), Child(ChildProcess), } ``` This is in comparison to the current `RawStream` type, which is an `Iterator<Item = Vec<u8>>` and has to allocate for each read chunk. Currently, `PipelineData::ExternalStream` serves a weird dual role where it is either external command output or a wrapper around `RawStream`. `ByteStream` makes this distinction clearer (via `ByteStreamSource`) and replaces `PipelineData::ExternalStream` in this PR: ```rust pub enum PipelineData { Empty, Value(Value, Option<PipelineMetadata>), ListStream(ListStream, Option<PipelineMetadata>), ByteStream(ByteStream, Option<PipelineMetadata>), } ``` The PR is relatively large, but a decent amount of it is just repetitive changes. This PR fixes #7017, fixes #10763, and fixes #12369. This PR also improves performance when piping external commands. Nushell should, in most cases, have competitive pipeline throughput compared to, e.g., bash. \| Command \| Before (MB/s) \| After (MB/s) \| Bash (MB/s) \| \| -------------------------------------------------- \| -------------:\| ------------:\| -----------:\| \| `throughput \\| rg 'x'` \| 3059 \| 3744 \| 3739 \| \| `throughput \\| nu --testbin relay o> /dev/null` \| 3508 \| 8087 \| 8136 \| # User-Facing Changes - This is a breaking change for the plugin communication protocol, because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`. Plugins now only have to deal with a single input stream, as opposed to the previous three streams: stdout, stderr, and exit code. - The output of `describe` has been changed for external/byte streams. - Temporary breaking change: `bytes starts-with` no longer works with byte streams. This is to keep the PR smaller, and `bytes ends-with` already does not work on byte streams. - If a process core dumped, then instead of having a `Value::Error` in the `exit_code` column of the output returned from `complete`, it is now a `Value::Int` with the negation of the signal number. # After Submitting - Update docs and book as necessary - Release notes (e.g., plugin protocol changes) - Adapt/convert commands to work with byte streams (high priority is `str length`, `bytes starts-with`, and maybe `bytes ends-with`). - Refactor the `tee` code, Devyn has already done some work on this. --------- Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>	2024-05-16 07:11:18 -07:00
Jack Wright	98369985b1	Allow custom value operations to work on eager and lazy dataframes interchangeably. (#12819 ) Fixes Bug #12809. The example that @maxim-uvarov posted now works as expected.	2024-05-13 18:17:31 -05:00
Jack Wright	68adc4657f	Polars lazy refactor (#12669 ) This moves to predominantly supporting only lazy dataframes for most operations. It removes a lot of the type conversion between lazy and eager dataframes based on what was inputted into the command. For the most part the changes will mean: * You will need to run `polars collect` after performing operations * The into-lazy command has been removed as it is redundant. * When opening files a lazy frame will be outputted by default if the reader supports lazy frames A list of individual command changes can be found [here](https://hackmd.io/@nucore/Bk-3V-hW0) --------- Co-authored-by: Ian Manske <ian.manske@pm.me>	2024-05-06 23:19:11 +00:00
Stefan Holderbach	be6137d136	Fix clippy::wrong_self_convention in polars plugin (#12737 ) Expected `into_` for `fn(self) -> T`	2024-05-02 19:31:51 +02:00
Jack Wright	a60381a932	Added commands for working with the plugin cache. (#12576 ) # Description This pull request provides three new commands: `polars store-ls` - moved from `polars ls`. It provides the list of all objects stored in the plugin cache. `polars store-rm` - deletes a cached object. `polars store-get` - gets an object from the cache. The addition of `polars store-get` required adding a reference_count to cached entries. `polars store-get` is the only command that will increment this value. `polars store-rm` will remove the value regardless of its count. Calls to PolarsPlugin::custom_value_dropped will decrement the value. The prefix store- was chosen because there is already a `polars cache` command. These commands were not made sub-commands as there isn't a way to display help for sub-commands in plugins (e.g. `polars store` displaying help) and I felt the store- prefix seemed fine anyway. The output of `polars store-ls` now shows the reference count for each object. # User-Facing Changes `polars ls` has now moved to `polars store-ls` --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-21 19:43:43 -05:00
Jack Wright	cc7b5c5a26	Only mark collected dataframes as from_lazy=false when collect is called from the collect command. (#12571 ) I had previously changed NuLazyFrame::collect to set the NuDataFrame's from_lazy field to false to prevent conversion back to a lazy frame. It appears there are cases where this should happen. Instead, I am only setting from_lazy=false inside the `polars collect` command. [Related discord message](https://discord.com/channels/601130461678272522/1227612017171501136/1230600465159421993) Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-18 17:10:38 -05:00
Jack Wright	410f3c5c8a	Upgrading nu_plugin_polars to polars 0.39.1 (#12551 ) # Description Upgrading nu_plugin_polars to polars 0.39.1 Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-17 06:35:09 -05:00
Jack Wright	1661bb68f9	Cleaning up to_pipe_line_data and cache_and_to_value, making them part of CustomValueSupport (#12528 ) # Description This is just some cleanup. I moved to_pipeline_data and to_cache_value to the CustomValueSupport trait, where I should've put them to begin with. Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-16 06:35:52 -05:00
Jack Wright	5f818eaefe	Ensure that lazy frames converted via to-lazy are not converted back to eager frames later in the pipeline. (#12525 ) # Description @maxim-uvarov discovered the following error: ``` > [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars sort-by a \| polars unique --subset [a] Error: × Error using as series ╭─[entry #1:1:68] 1 │ [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars sort-by a \| polars unique --subset [a] · ──────┬────── · ╰── dataframe has more than one column ╰──── ``` During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to an eager dataframe. To keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This keeps the conversion logic from attempting to convert the dataframe later in the pipeline. --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-15 18:29:42 -05:00
Devyn Cairns	2ae9ad8676	Copy-on-write for record values (#12305 ) # Description This adds a `SharedCow` type as a transparent copy-on-write pointer that clones to unique on mutate. As an initial test, the `Record` within `Value::Record` is shared. There are some pretty big wins for performance. I'll post benchmark results in a comment. The biggest winner is nested access, as that would have cloned the records for each cell path follow before and it doesn't have to anymore. The reusability of the `SharedCow` type is nice and I think it could be used to clean up the previous work I did with `Arc` in `EngineState`. It's meant to be a mostly transparent clone-on-write that just clones on `.to_mut()` or `.into_owned()` if there are actually multiple references, but avoids cloning if the reference is unique. # User-Facing Changes - `Value::Record` field is a different type (plugin authors) # Tests + Formatting - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting - [ ] use for `EngineState` - [ ] use for `Value::List`	2024-04-14 01:42:03 +00:00
Jack Wright	10a9a17b8c	Two consecutive calls to into-lazy should not fail (#12505 ) # Description From @maxim-uvarov's [post](https://discord.com/channels/601130461678272522/1227612017171501136/1228656319704203375). When calling `into-lazy` back to back in a pipeline, an error should not occur: ``` > [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars into-lazy Error: nu::shell::cant_convert × Can't convert to NuDataFrame. ╭─[entry #1:1:30] 1 │ [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars into-lazy · ────────┬─────── · ╰── can't convert NuLazyFrameCustomValue to NuDataFrame ╰──── ``` This pull request ensures that custom values of type NuLazyFrameCustomValue are properly converted when passed in. Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-13 13:00:46 -05:00
Jack Wright	b9dd47ebb7	Polars 0.38 upgrade (#12506 ) # Description Polars 0.38 upgrade for both the dataframe crate and the polars plugin. --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-13 13:00:04 -05:00
Jack Wright	1bded8572c	Ensure that two columns named index don't exist when converting a Dataframe to a nu Value. (#12501 ) # Description @maxim-uvarov discovered an issue with the current implementation. When executing [[index a]; [1 1]] \| polars into-df, a plugin_failed_to_decode error occurs. This happens because a Record is created with two columns named "index" as an index column is added during conversion. This pull request addresses the problem by not adding an index column if there is already a column named "index" in the dataframe. --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-13 06:33:29 -05:00
Jack Wright	b9c2f9ee56	displaying span information, creation time, and size with polars ls (#12472 ) # Description `polars ls` is already different than `dfr ls`. Currently it just shows the cache key, columns, rows, and type. I have added: - creation time - size - span contents - span start and end # Tests + Formatting Done Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-12 09:23:46 -05:00
Jack Wright	efc1cfa939	Move dataframes support to a plugin (#12220 ) WIP This PR covers migrating crates/nu-cmd-dataframes to a new plugin, ./crates/nu_plugin_polars ## TODO List Other: - [X] Fix examples - [x] Fix Plugin Test Harness - [X] Move Cache to Mutex<BTreeMap> - [X] Logic for disabling/enabling plugin GC based on whether items are cached. - [x] NuExpression custom values - [X] Optimize caching (don't cache every object creation). - [x] Fix dataframe operations (in NuDataFrameCustomValue::operations) - [x] Added plugin_debug! macro for checking the env variable POLARS_PLUGIN_DEBUG Fix duplicated commands: - [x] There are two polars median commands, one for lazy and one for expr. There should only be one that works for both. I temporarily called one polars expr-median (inside expressions_macros.rs) - [x] polars quantile (lazy, and expr). the expr one is temporarily expr-median - [x] polars is-in (renamed one series-is-in) Commands: - [x] AppendDF - [x] CastDF - [X] ColumnsDF - [x] DataTypes - [x] Summary - [x] DropDF - [x] DropDuplicates - [x] DropNulls - [x] Dummies - [x] FilterWith - [X] FirstDF - [x] GetDF - [x] LastDF - [X] ListDF - [x] MeltDF - [X] OpenDataFrame - [x] QueryDf - [x] RenameDF - [x] SampleDF - [x] SchemaDF - [x] ShapeDF - [x] SliceDF - [x] TakeDF - [X] ToArrow - [x] ToAvro - [X] ToCSV - [X] ToDataFrame - [X] ToNu - [x] ToParquet - [x] ToJsonLines - [x] WithColumn - [x] ExprAlias - [x] ExprArgWhere - [x] ExprCol - [x] ExprConcatStr - [x] ExprCount - [x] ExprLit - [x] ExprWhen - [x] ExprOtherwise - [x] ExprQuantile - [x] ExprList - [x] ExprAggGroups - [x] ExprCount - [x] ExprIsIn - [x] ExprNot - [x] ExprMax - [x] ExprMin - [x] ExprSum - [x] ExprMean - [x] ExprMedian - [x] ExprStd - [x] ExprVar - [x] ExprDatePart - [X] LazyAggregate - [x] LazyCache - [X] LazyCollect - [x] LazyFetch - [x] LazyFillNA - [x] LazyFillNull - [x] LazyFilter - [x] LazyJoin - [x] LazyQuantile - [x] LazyMedian - [x] LazyReverse - [x] LazySelect - [x] LazySortBy - [x] ToLazyFrame - [x] ToLazyGroupBy - [x] LazyExplode - [x] LazyFlatten - [x] AllFalse - [x] AllTrue - [x] ArgMax - [x] ArgMin - [x] ArgSort - [x] ArgTrue - [x] ArgUnique - [x] AsDate - [x] AsDateTime - [x] Concatenate - [x] Contains - [x] Cumulative - [x] GetDay - [x] GetHour - [x] GetMinute - [x] GetMonth - [x] GetNanosecond - [x] GetOrdinal - [x] GetSecond - [x] GetWeek - [x] GetWeekDay - [x] GetYear - [x] IsDuplicated - [x] IsIn - [x] IsNotNull - [x] IsNull - [x] IsUnique - [x] NNull - [x] NUnique - [x] NotSeries - [x] Replace - [x] ReplaceAll - [x] Rolling - [x] SetSeries - [x] SetWithIndex - [x] Shift - [x] StrLengths - [x] StrSlice - [x] StrFTime - [x] ToLowerCase - [x] ToUpperCase - [x] Unique - [x] ValueCount --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-09 19:31:43 -05:00
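
The `Span` merging commit (aec41f3df0) in the log above names four functions: `append`, `concat`, `merge`, and `merge_many`. The following is a minimal sketch of how such helpers could behave, based only on the commit message. The `Span` struct here is a simplified stand-in with two `usize` fields, not the real `nu_protocol::Span`, and the empty-input behaviour (an empty span at offset 0) is an assumption made for illustration.

```rust
/// Simplified stand-in for a source span; NOT the real `nu_protocol::Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    fn new(start: usize, end: usize) -> Self {
        assert!(start <= end, "span start must not exceed its end");
        Span { start, end }
    }

    /// Joins two spans that are known to be in order (`self` comes first).
    fn append(self, other: Span) -> Span {
        debug_assert!(self.start <= other.start, "spans must be in order");
        Span::new(self.start, other.end)
    }

    /// Covers every span in an ordered slice.
    /// Assumption: an empty slice yields an empty span at offset 0.
    fn concat(spans: &[Span]) -> Span {
        match (spans.first(), spans.last()) {
            (Some(first), Some(last)) => first.append(*last),
            _ => Span::new(0, 0),
        }
    }

    /// Merges two spans without requiring any particular order.
    fn merge(self, other: Span) -> Span {
        Span::new(self.start.min(other.start), self.end.max(other.end))
    }

    /// Merges any number of spans, in any order, into one covering span.
    /// Assumption: an empty iterator yields an empty span at offset 0.
    fn merge_many(spans: impl IntoIterator<Item = Span>) -> Span {
        spans
            .into_iter()
            .reduce(Span::merge)
            .unwrap_or(Span::new(0, 0))
    }
}

fn main() {
    let a = Span::new(0, 4);
    let b = Span::new(6, 10);
    let c = Span::new(12, 15);

    println!("append:     {:?}", a.append(b));
    println!("concat:     {:?}", Span::concat(&[a, b, c]));
    println!("merge:      {:?}", b.merge(a));
    println!("merge_many: {:?}", Span::merge_many([c, a, b]));
}
```

The split mirrors the reasoning in the commit message: the ordered variants only need the first start and the last end, while the unordered variants must compare every span.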

19 Commits
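
The copy-on-write commit (2ae9ad8676) in the log above describes `SharedCow` as a pointer that clones on `.to_mut()` or `.into_owned()` only when there are multiple references. The following sketch illustrates that behaviour using the standard library's `Arc::make_mut` and `Arc::try_unwrap`. The wrapper shown here, including the `is_unique` helper, is a hypothetical illustration and not the actual nushell type.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical copy-on-write wrapper; NOT the real nushell `SharedCow`.
#[derive(Debug, Clone)]
struct SharedCow<T: Clone>(Arc<T>);

impl<T: Clone> SharedCow<T> {
    fn new(value: T) -> Self {
        SharedCow(Arc::new(value))
    }

    /// Returns a mutable reference, cloning the inner value first
    /// only if another handle still shares it.
    fn to_mut(&mut self) -> &mut T {
        Arc::make_mut(&mut self.0)
    }

    /// Takes the inner value, cloning only if it is still shared.
    fn into_owned(self) -> T {
        Arc::try_unwrap(self.0).unwrap_or_else(|shared| (*shared).clone())
    }

    /// Illustrative helper: true when no other handle shares the value.
    fn is_unique(&self) -> bool {
        Arc::strong_count(&self.0) == 1
    }
}

impl<T: Clone> Deref for SharedCow<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

fn main() {
    let mut a = SharedCow::new(vec![1, 2, 3]);
    let b = a.clone(); // cheap: only the reference count changes

    a.to_mut().push(4); // `a` is shared with `b`, so the Vec is cloned here

    println!("a = {:?}, b = {:?}", *a, *b);
    println!("a unique: {}, b unique: {}", a.is_unique(), b.is_unique());
}
```

Reads go through `Deref` and never clone, which fits the commit's point that nested access no longer needs to clone records on each cell path follow.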