Commit Graph

13 Commits

Author SHA1 Message Date
b10325dff1 Allow int values to be converted into floats. (#13025)
Addresses the bug found by @maxim-uvarov when trying to coerce an int
Value to a polars float:

<img width="863" alt="image"
src="https://github.com/nushell/nushell/assets/56345/4d858812-a7b3-4296-98f4-dce0c544b4c6">

Conversion now works correctly:

<img width="891" alt="Screenshot 2024-05-31 at 14 28 51"
src="https://github.com/nushell/nushell/assets/56345/78d9f711-7ad5-4503-abc6-7aba64a2e675">
2024-06-04 18:51:11 -07:00
84b7a99adf Revert "Polars lazy refactor (#12669)" (#12962)
This reverts commit 68adc4657f.

# Description

Reverts the lazyframe refactor (#12669) for the next release, since
there are still a few lingering issues. This temporarily solves #12863
and #12828. After the release, the lazyframes can be added back and
cleaned up.
2024-05-24 18:09:26 -05:00
aec41f3df0 Add Span merging functions (#12511)
# Description
This PR adds a few functions to `Span` for merging spans together:
- `Span::append`: merges two spans that are known to be in order.
- `Span::concat`: returns a span that encompasses all the spans in a
slice. The spans must be in order.
- `Span::merge`: merges two spans (no order necessary).
- `Span::merge_many`: merges an iterator of spans into a single span (no
order necessary).

These are meant to replace the free-standing `nu_protocol::span`
function.

The spans in a `LiteCommand` (the `parts`) should always be in order
based on the lite parser and lexer. So, the parser code sees the most
usage of `Span::append` and `Span::concat` where the order is known. In
other code areas, `Span::merge` and `Span::merge_many` are used since
the order between spans is often not known.
2024-05-16 22:34:49 +00:00
6fd854ed9f Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00
98369985b1 Allow custom value operations to work on eager and lazy dataframes interchangeably. (#12819)
Fixes Bug #12809 

The example that @maxim-uvarov posted now works as expected:

<img width="1223" alt="Screenshot 2024-05-09 at 16 21 01"
src="https://github.com/nushell/nushell/assets/56345/a4df62e3-e432-4c09-8e25-9a6c198741a3">
2024-05-13 18:17:31 -05:00
68adc4657f Polars lazy refactor (#12669)
This moves to predominantly supporting only lazy dataframes for most
operations. It removes a lot of the type conversion between lazy and
eager dataframes based on what was inputted into the command.

For the most part the changes will mean:
* You will need to run `polars collect` after performing operations
* The into-lazy command has been removed as it is redundant.
* When opening files a lazy frame will be outputted by default if the
reader supports lazy frames

A list of individual command changes can be found
[here](https://hackmd.io/@nucore/Bk-3V-hW0)

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-05-06 23:19:11 +00:00
410f3c5c8a Upgrading nu_plugin_polars to polars 0.39.1 (#12551)
# Description
Upgrading nu_plugin_polars to polars 0.39.1

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-17 06:35:09 -05:00
2ae9ad8676 Copy-on-write for record values (#12305)
# Description
This adds a `SharedCow` type as a transparent copy-on-write pointer that
clones to unique on mutate.

As an initial test, the `Record` within `Value::Record` is shared.

There are some pretty big wins for performance. I'll post benchmark
results in a comment. The biggest winner is nested access, as that would
have cloned the records for each cell path follow before and it doesn't
have to anymore.

The reusability of the `SharedCow` type is nice and I think it could be
used to clean up the previous work I did with `Arc` in `EngineState`.
It's meant to be a mostly transparent clone-on-write that just clones on
`.to_mut()` or `.into_owned()` if there are actually multiple
references, but avoids cloning if the reference is unique.

# User-Facing Changes
- `Value::Record` field is a different type (plugin authors)

# Tests + Formatting
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting
- [ ] use for `EngineState`
- [ ] use for `Value::List`
2024-04-14 01:42:03 +00:00
10a9a17b8c Two consecutive calls to into-lazy should not fail (#12505)
# Description

From @maxim-uvarov's
[post](https://discord.com/channels/601130461678272522/1227612017171501136/1228656319704203375).

When calling `to-lazy` back to back in a pipeline, an error should not
occur:

```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars into-lazy
Error: nu:🐚:cant_convert

  × Can't convert to NuDataFrame.
   ╭─[entry #1:1:30]
 1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars into-lazy
   ·                              ────────┬───────
   ·                                      ╰── can't convert NuLazyFrameCustomValue to NuDataFrame
   ╰────
 ```

This pull request ensures that custom value's of NuLazyFrameCustomValue are properly converted when passed in.

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 13:00:46 -05:00
b9dd47ebb7 Polars 0.38 upgrade (#12506)
# Description
Polars 0.38 upgrade for both the dataframe crate and the polars plugin.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 13:00:04 -05:00
1bded8572c Ensure that two columns named index don't exist when converting a Dataframe to a nu Value. (#12501)
# Description
@maxim-uvarov discovered an issue with the current implementation. When
executing [[index a]; [1 1]] | polars into-df, a plugin_failed_to_decode
error occurs. This happens because a Record is created with two columns
named "index" as an index column is added during conversion. This pull
request addresses the problem by not adding an index column if there is
already a column named "index" in the dataframe.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 06:33:29 -05:00
b9c2f9ee56 displaying span information, creation time, and size with polars ls (#12472)
# Description
`polars ls` is already different that `dfr ls`. Currently it just shows
the cache key, columns, rows, and type. I have added:
- creation time
- size
- span contents
-  span start and end

<img width="1471" alt="Screenshot 2024-04-10 at 17 27 06"
src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408">

# Tests + Formatting
Done

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-12 09:23:46 -05:00
efc1cfa939 Move dataframes support to a plugin (#12220)
WIP

This PR covers migration crates/nu-cmd-dataframes to a new plugin
./crates/nu_plugin_polars

## TODO List

Other:
- [X] Fix examples
- [x] Fix Plugin Test Harness
- [X] Move Cache to Mutex<BTreeMap>
- [X] Logic for disabling/enabling plugin GC based off whether items are
cached.
- [x] NuExpression custom values
- [X] Optimize caching (don't cache every object creation). 
- [x] Fix dataframe operations (in NuDataFrameCustomValue::operations)
- [x] Added plugin_debug! macro that for checking an env variable
POLARS_PLUGIN_DEBUG

Fix duplicated commands:
- [x] There are two polars median commands, one for lazy and one for
expr.. there should only be one that works for both. I temporarily
called on polars expr-median (inside expressions_macros.rs)
- [x] polars quantile (lazy, and expr). the expr one is temporarily
expr-median
- [x] polars is-in (renamed one series-is-in)

Commands:
- [x] AppendDF
- [x] CastDF
- [X] ColumnsDF
- [x] DataTypes
- [x] Summary
- [x] DropDF
- [x] DropDuplicates
- [x] DropNulls
- [x] Dummies
- [x] FilterWith
- [X] FirstDF
- [x] GetDF
- [x] LastDF
- [X] ListDF
- [x] MeltDF
- [X] OpenDataFrame
- [x] QueryDf
- [x] RenameDF
- [x] SampleDF
- [x] SchemaDF
- [x] ShapeDF
- [x] SliceDF
- [x] TakeDF
- [X] ToArrow
- [x] ToAvro
- [X] ToCSV
- [X] ToDataFrame
- [X] ToNu
- [x] ToParquet
- [x] ToJsonLines
- [x] WithColumn
- [x] ExprAlias
- [x] ExprArgWhere
- [x] ExprCol
- [x] ExprConcatStr
- [x] ExprCount
- [x] ExprLit
- [x] ExprWhen
- [x] ExprOtherwise
- [x] ExprQuantile
- [x] ExprList
- [x] ExprAggGroups
- [x] ExprCount
- [x] ExprIsIn
- [x] ExprNot
- [x] ExprMax
- [x] ExprMin
- [x] ExprSum
- [x] ExprMean
- [x] ExprMedian
- [x] ExprStd
- [x] ExprVar
- [x] ExprDatePart
- [X] LazyAggregate
- [x] LazyCache
- [X] LazyCollect
- [x] LazyFetch
- [x] LazyFillNA
- [x] LazyFillNull
- [x] LazyFilter
- [x] LazyJoin
- [x] LazyQuantile
- [x] LazyMedian
- [x] LazyReverse
- [x] LazySelect
- [x] LazySortBy
- [x] ToLazyFrame
- [x] ToLazyGroupBy
- [x] LazyExplode
- [x] LazyFlatten
- [x] AllFalse
- [x] AllTrue
- [x] ArgMax
- [x] ArgMin
- [x] ArgSort
- [x] ArgTrue
- [x] ArgUnique
- [x] AsDate
- [x] AsDateTime
- [x] Concatenate
- [x] Contains
- [x] Cumulative
- [x] GetDay
- [x] GetHour
- [x] GetMinute
- [x] GetMonth
- [x] GetNanosecond
- [x] GetOrdinal
- [x] GetSecond
- [x] GetWeek
- [x] GetWeekDay
- [x] GetYear
- [x] IsDuplicated
- [x] IsIn
- [x] IsNotNull
- [x] IsNull
- [x] IsUnique
- [x] NNull
- [x] NUnique
- [x] NotSeries
- [x] Replace
- [x] ReplaceAll
- [x] Rolling
- [x] SetSeries
- [x] SetWithIndex
- [x] Shift
- [x] StrLengths
- [x] StrSlice
- [x] StrFTime
- [x] ToLowerCase
- [x] ToUpperCase
- [x] Unique
- [x] ValueCount

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-09 19:31:43 -05:00