Commit Graph

16 Commits

Author SHA1 Message Date
Jack Wright
a6b1d1f6d9
Upgrade to polars 0.40 (#13069)
Upgrading to polars 0.40
2024-06-06 07:26:47 +08:00
Wind
ad5a6cdc00
bump version to 0.94.3 (#13055) 2024-06-05 06:52:40 +08:00
Devyn Cairns
6635b74d9d
Bump version to 0.94.2 (#13014)
Version bump after 0.94.1 patch release.
2024-06-03 10:28:35 +03:00
Devyn Cairns
f3991f2080
Bump version to 0.94.1 (#12988)
Merge this PR before merging any other PRs.
2024-05-28 22:41:23 +00:00
Jakub Žádník
61182deb96
Bump version to 0.94.0 (#12987) 2024-05-28 12:04:09 -07:00
Darren Schroeder
0c5a67f4e5
make polars plugin use mimalloc (#12967)
# Description
@maxim-uvarov did a ton of research and work with the dply-rs author and
ritchie from polars and found out that the allocator matters on macos
and it seems to be what was messing up the performance of polars plugin.
ritchie suggested to use jemalloc but i switched it to mimalloc to match
nushell and it seems to run better.

## Before (default allocator)
note - using 1..10 vs 1..100 since it takes so long. also notice how
high the `max` timings are compared to mimalloc below.
```nushell
❯ 1..10 | each {timeit {polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null}} |   | {mean: ($in | math avg), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in)ns' | into duration)}
╭────────┬─────────────────────────╮
│ mean   │ 4sec 999ms 605µs 995ns  │
│ min    │ 983ms 627µs 42ns        │
│ max    │ 13sec 398ms 135µs 791ns │
│ stddev │ 3sec 476ms 479µs 939ns  │
╰────────┴─────────────────────────╯
❯ use std bench
❯ bench { polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null } -n 10
╭───────┬────────────────────────╮
│ mean  │ 6sec 220ms 783µs 983ns │
│ min   │ 1sec 184ms 997µs 708ns │
│ max   │ 18sec 882ms 81µs 708ns │
│ std   │ 5sec 350ms 375µs 697ns │
│ times │ [list 10 items]        │
╰───────┴────────────────────────╯
```

## After (using mimalloc)
```nushell
❯ 1..100 | each {timeit {polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null}} |   | {mean: ($in | math avg), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in)ns' | into duration)}
╭────────┬───────────────────╮
│ mean   │ 103ms 728µs 902ns │
│ min    │ 97ms 107µs 42ns   │
│ max    │ 149ms 430µs 84ns  │
│ stddev │ 5ms 690µs 664ns   │
╰────────┴───────────────────╯
❯ use std bench
❯ bench { polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null } -n 100
╭───────┬───────────────────╮
│ mean  │ 103ms 620µs 195ns │
│ min   │ 97ms 541µs 166ns  │
│ max   │ 130ms 262µs 166ns │
│ std   │ 4ms 948µs 654ns   │
│ times │ [list 100 items]  │
╰───────┴───────────────────╯
```

## After (using jemalloc - just for comparison)
```nushell
❯ 1..100 | each {timeit {polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null}} |   | {mean: ($in | math avg), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in)ns' | into duration)}

╭────────┬───────────────────╮
│ mean   │ 113ms 939µs 777ns │
│ min    │ 108ms 337µs 333ns │
│ max    │ 166ms 467µs 458ns │
│ stddev │ 6ms 175µs 618ns   │
╰────────┴───────────────────╯
❯ use std bench
❯ bench { polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null } -n 100
╭───────┬───────────────────╮
│ mean  │ 114ms 363µs 530ns │
│ min   │ 108ms 804µs 833ns │
│ max   │ 143ms 521µs 459ns │
│ std   │ 5ms 88µs 56ns     │
│ times │ [list 100 items]  │
╰───────┴───────────────────╯
```

## After (using parquet + mimalloc)
```nushell
❯ 1..100 | each {timeit {polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null}} |   | {mean: ($in | math avg), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in)ns' | into duration)}
╭────────┬──────────────────╮
│ mean   │ 34ms 255µs 492ns │
│ min    │ 31ms 787µs 250ns │
│ max    │ 76ms 408µs 416ns │
│ stddev │ 4ms 472µs 916ns  │
╰────────┴──────────────────╯
❯ use std bench
❯ bench { polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect | null } -n 100
╭───────┬──────────────────╮
│ mean  │ 34ms 897µs 562ns │
│ min   │ 31ms 518µs 542ns │
│ max   │ 65ms 943µs 625ns │
│ std   │ 3ms 450µs 741ns  │
│ times │ [list 100 items] │
╰───────┴──────────────────╯
```

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
2024-05-25 09:10:01 -05:00
Ian Manske
84b7a99adf
Revert "Polars lazy refactor (#12669)" (#12962)
This reverts commit 68adc4657f.

# Description

Reverts the lazyframe refactor (#12669) for the next release, since
there are still a few lingering issues. This temporarily solves #12863
and #12828. After the release, the lazyframes can be added back and
cleaned up.
2024-05-24 18:09:26 -05:00
Ian Manske
905e3d0715
Remove dataframes crate and feature (#12889)
# Description
Removes the old `nu-cmd-dataframe` crate in favor of the polars plugin.
As such, this PR also removes the `dataframe` feature, related CI, and
full releases of nushell.
2024-05-20 17:22:08 +00:00
Jack Wright
68adc4657f
Polars lazy refactor (#12669)
This moves to predominantly supporting only lazy dataframes for most
operations. It removes a lot of the type conversion between lazy and
eager dataframes based on what was inputted into the command.

For the most part the changes will mean:
* You will need to run `polars collect` after performing operations
* The into-lazy command has been removed as it is redundant.
* When opening files a lazy frame will be outputted by default if the
reader supports lazy frames

A list of individual command changes can be found
[here](https://hackmd.io/@nucore/Bk-3V-hW0)

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-05-06 23:19:11 +00:00
Devyn Cairns
21ebdfe8d7
Bump version to 0.93.1 (#12710)
# Description

Next patch/dev release, `0.93.1`
2024-05-01 17:19:20 -05:00
Devyn Cairns
3b220e07e3
Bump version to 0.93.0 (#12709)
# Description

Bump version to `0.93.0`
2024-04-30 15:51:13 -07:00
Jack Wright
410f3c5c8a
Upgrading nu_plugin_polars to polars 0.39.1 (#12551)
# Description
Upgrading nu_plugin_polars to polars 0.39.1

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-17 06:35:09 -05:00
Jack Wright
b9dd47ebb7
Polars 0.38 upgrade (#12506)
# Description
Polars 0.38 upgrade for both the dataframe crate and the polars plugin.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-13 13:00:04 -05:00
Jack Wright
f975c9923a
Handle relative paths correctly on polars to-(parquet|jsonl|arrow|etc) commands (#12486)
# Description

All polars commands that output a file were not handling relative paths
correctly.

A command like
``` [[a b]; [6 2] [1 4] [4 1]] | polars into-df | polars to-parquet foo.json``` 
was outputting the foo.json to the directory of the plugin executable. 

This pull request pulls in nu-path and using it for resolving the file paths.

Related discussion
https://discord.com/channels/601130461678272522/1227612017171501136/1227889870358183966

# User-Facing Changes
None

# Tests + Formatting
Done, added tests for each of the polars to-* commands.

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-12 19:30:37 -05:00
Stefan Holderbach
872945ae8e
Bump version to 0.92.3 (#12476) 2024-04-12 08:00:43 -05:00
Jack Wright
efc1cfa939
Move dataframes support to a plugin (#12220)
WIP

This PR covers migration crates/nu-cmd-dataframes to a new plugin
./crates/nu_plugin_polars

## TODO List

Other:
- [X] Fix examples
- [x] Fix Plugin Test Harness
- [X] Move Cache to Mutex<BTreeMap>
- [X] Logic for disabling/enabling plugin GC based off whether items are
cached.
- [x] NuExpression custom values
- [X] Optimize caching (don't cache every object creation). 
- [x] Fix dataframe operations (in NuDataFrameCustomValue::operations)
- [x] Added plugin_debug! macro that for checking an env variable
POLARS_PLUGIN_DEBUG

Fix duplicated commands:
- [x] There are two polars median commands, one for lazy and one for
expr.. there should only be one that works for both. I temporarily
called on polars expr-median (inside expressions_macros.rs)
- [x] polars quantile (lazy, and expr). the expr one is temporarily
expr-median
- [x] polars is-in (renamed one series-is-in)

Commands:
- [x] AppendDF
- [x] CastDF
- [X] ColumnsDF
- [x] DataTypes
- [x] Summary
- [x] DropDF
- [x] DropDuplicates
- [x] DropNulls
- [x] Dummies
- [x] FilterWith
- [X] FirstDF
- [x] GetDF
- [x] LastDF
- [X] ListDF
- [x] MeltDF
- [X] OpenDataFrame
- [x] QueryDf
- [x] RenameDF
- [x] SampleDF
- [x] SchemaDF
- [x] ShapeDF
- [x] SliceDF
- [x] TakeDF
- [X] ToArrow
- [x] ToAvro
- [X] ToCSV
- [X] ToDataFrame
- [X] ToNu
- [x] ToParquet
- [x] ToJsonLines
- [x] WithColumn
- [x] ExprAlias
- [x] ExprArgWhere
- [x] ExprCol
- [x] ExprConcatStr
- [x] ExprCount
- [x] ExprLit
- [x] ExprWhen
- [x] ExprOtherwise
- [x] ExprQuantile
- [x] ExprList
- [x] ExprAggGroups
- [x] ExprCount
- [x] ExprIsIn
- [x] ExprNot
- [x] ExprMax
- [x] ExprMin
- [x] ExprSum
- [x] ExprMean
- [x] ExprMedian
- [x] ExprStd
- [x] ExprVar
- [x] ExprDatePart
- [X] LazyAggregate
- [x] LazyCache
- [X] LazyCollect
- [x] LazyFetch
- [x] LazyFillNA
- [x] LazyFillNull
- [x] LazyFilter
- [x] LazyJoin
- [x] LazyQuantile
- [x] LazyMedian
- [x] LazyReverse
- [x] LazySelect
- [x] LazySortBy
- [x] ToLazyFrame
- [x] ToLazyGroupBy
- [x] LazyExplode
- [x] LazyFlatten
- [x] AllFalse
- [x] AllTrue
- [x] ArgMax
- [x] ArgMin
- [x] ArgSort
- [x] ArgTrue
- [x] ArgUnique
- [x] AsDate
- [x] AsDateTime
- [x] Concatenate
- [x] Contains
- [x] Cumulative
- [x] GetDay
- [x] GetHour
- [x] GetMinute
- [x] GetMonth
- [x] GetNanosecond
- [x] GetOrdinal
- [x] GetSecond
- [x] GetWeek
- [x] GetWeekDay
- [x] GetYear
- [x] IsDuplicated
- [x] IsIn
- [x] IsNotNull
- [x] IsNull
- [x] IsUnique
- [x] NNull
- [x] NUnique
- [x] NotSeries
- [x] Replace
- [x] ReplaceAll
- [x] Rolling
- [x] SetSeries
- [x] SetWithIndex
- [x] Shift
- [x] StrLengths
- [x] StrSlice
- [x] StrFTime
- [x] ToLowerCase
- [x] ToUpperCase
- [x] Unique
- [x] ValueCount

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
2024-04-09 19:31:43 -05:00