nushell

mirror of https://github.com/nushell/nushell.git synced 2025-07-17 23:02:00 +02:00

Author	SHA1	Message	Date
pyz4	37bc922a67	feat(polars): add `polars math` expression (#15822 ) # Description This PR adds a number of math functions under a single `polars math` command that apply to one or more column expressions. Note that `polars math` currently resides in the new module dataframe/command/command/computation/math.rs. I'm open to alternative organization and naming suggestions. ```nushell Collection of math functions to be applied on one or more column expressions This is an incomplete implementation of the available functions listed here: https://docs.pola.rs/api/python/stable/reference/expressions/computation.html. The following functions are currently available: - abs - cos - dot <expression> - exp - log <base; default e> - log1p - sign - sin - sqrt Usage: > polars math <type> ...(args) Flags: -h, --help: Display the help message for this command Parameters: type <string>: Function name. See extra description for full list of accepted values ...args <any>: Extra arguments required by some functions Input/output types: ╭───┬────────────┬────────────╮ │ # │ input │ output │ ├───┼────────────┼────────────┤ │ 0 │ expression │ expression │ ╰───┴────────────┴────────────╯ Examples: Apply function to column expression > [[a]; [0] [-1] [2] [-3] [4]] | polars into-df | polars select [ (polars col a | polars math abs | polars as a_abs) (polars col a | polars math sign | polars as a_sign) (polars col a | polars math exp | polars as a_exp)] | polars collect ╭───┬───────┬────────┬────────╮ │ # │ a_abs │ a_sign │ a_exp │ ├───┼───────┼────────┼────────┤ │ 0 │ 0 │ 0 │ 1.000 │ │ 1 │ 1 │ -1 │ 0.368 │ │ 2 │ 2 │ 1 │ 7.389 │ │ 3 │ 3 │ -1 │ 0.050 │ │ 4 │ 4 │ 1 │ 54.598 │ ╰───┴───────┴────────┴────────╯ Specify arguments for select functions. See description for more information. > [[a]; [0] [1] [2] [4] [8] [16]] | polars into-df | polars select [ (polars col a | polars math log 2 | polars as a_base2)] | polars collect ╭───┬─────────╮ │ # │ a_base2 │ ├───┼─────────┤ │ 0 │ -inf │ │ 1 │ 0.000 │ │ 2 │ 1.000 │ │ 3 │ 2.000 │ │ 4 │ 3.000 │ │ 5 │ 4.000 │ ╰───┴─────────╯ Specify arguments for select functions. See description for more information. > [[a b]; [0 0] [1 1] [2 2] [3 3] [4 4] [5 5]] | polars into-df | polars select [ (polars col a | polars math dot (polars col b) | polars as ab)] | polars collect ╭───┬────────╮ │ # │ ab │ ├───┼────────┤ │ 0 │ 55.000 │ ╰───┴────────╯ ``` # User-Facing Changes No breaking changes. # Tests + Formatting Example tests were added to `polars math`.	2025-05-27 16:35:48 -07:00
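To make the semantics of a few of these functions concrete, here is a minimal plain-Rust sketch that computes `abs`, `sign`, `log <base>` and `dot` over a column modelled as a `&[f64]`. It is not the polars API or the plugin's implementation; the helper names are hypothetical, and the sketch only reproduces the arithmetic behind the example outputs (for instance `log 2` of 0 giving `-inf`, and the dot product of 0..=5 with itself giving 55).

```rust
/// Element-wise absolute value, as in `polars math abs`.
fn abs(values: &[f64]) -> Vec<f64> {
    values.iter().map(|v| v.abs()).collect()
}

/// Element-wise sign returning -1.0, 0.0 or 1.0, as in `polars math sign`.
/// `f64::signum` is avoided on purpose: it returns 1.0 for 0.0.
fn sign(values: &[f64]) -> Vec<f64> {
    values
        .iter()
        .map(|&v| {
            if v > 0.0 {
                1.0
            } else if v < 0.0 {
                -1.0
            } else {
                0.0
            }
        })
        .collect()
}

/// Logarithm with an explicit base, as in `polars math log <base>`.
/// log(0) is -inf, matching the first row of the `a_base2` example.
fn log_base(values: &[f64], base: f64) -> Vec<f64> {
    values.iter().map(|v| v.log(base)).collect()
}

/// Dot product of two equally long columns, as in `polars math dot <expr>`;
/// it reduces both columns to a single value.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "columns must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [0.0, -1.0, 2.0, -3.0, 4.0];
    println!("abs:  {:?}", abs(&a));
    println!("sign: {:?}", sign(&a));
    println!("log2: {:?}", log_base(&[0.0, 1.0, 2.0, 4.0, 8.0, 16.0], 2.0));
    let c = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
    println!("dot:  {}", dot(&c, &c)); // 55
}
```

Note that `dot` is a reduction, which is why the third example returns a single row.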
Jack Wright	c2ac8f730e	Rust 1.85, edition=2024 (#15741 )	2025-05-13 16:49:30 +02:00
Jack Wright	b0f9cda9b5	Introduction of NuDataType and `polars dtype` (#15529 ) # Description This pull request does a lot of the heavy lifting needed to support more complex dtypes like categorical dtypes. It introduces a new CustomValue, NuDataType, and makes NuSchema a full CustomValue. Furthermore, it introduces a new command `polars into-dtype` that allows a dtype to be created. This can then be passed into schemas when they are created. ```nu > ❯ : let dt = ("str" | polars to-dtype) > ❯ : [[a b]; ["one" "two"]] | polars into-df -s {a: $dt, b: str} | polars schema ╭───┬─────╮ │ a │ str │ │ b │ str │ ╰───┴─────╯ ``` # User-Facing Changes - Introduces the new command `polars into-dtype`, which allows dtype variables to be passed in during schema creation.	2025-04-09 08:13:49 -07:00
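A dtype value that is built once and then reused in several schemas can be sketched as a small recursive enum plus a parser. This is a hypothetical illustration: `DType` and `parse_dtype` are invented names, only a handful of type strings are handled, and it is not how `NuDataType` is implemented.

```rust
/// Hypothetical, simplified stand-in for a dataframe dtype.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Str,
    I64,
    F64,
    Bool,
    List(Box<DType>),
}

/// Parses strings such as "str" or "list<i64>" into a `DType`.
fn parse_dtype(s: &str) -> Result<DType, String> {
    let s = s.trim();
    // Nested list types are parsed recursively, so "list<list<str>>" works too.
    if let Some(inner) = s.strip_prefix("list<").and_then(|rest| rest.strip_suffix('>')) {
        return Ok(DType::List(Box::new(parse_dtype(inner)?)));
    }
    match s {
        "str" => Ok(DType::Str),
        "i64" => Ok(DType::I64),
        "f64" => Ok(DType::F64),
        "bool" => Ok(DType::Bool),
        other => Err(format!("unknown dtype: {other}")),
    }
}

fn main() {
    // Build the dtype once, then reuse it in a schema, similar to `let dt = ...`.
    let dt = parse_dtype("str").unwrap();
    let schema: Vec<(&str, DType)> = vec![("a", dt.clone()), ("b", parse_dtype("str").unwrap())];
    println!("{schema:?}");
}
```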
Matthias Meschede	966cebec34	Adds polars list-contains command (#15304 ) # Description This PR adds the `polars list-contains` command. It works like this: ``` ~/Projects/nushell/nushell> let df = [[a]; [[a,b,c]] [[b,c,d]] [[c,d,f]]] | polars into-df -s {a: list<str>}; ~/Projects/nushell/nushell> $df | polars with-column [(polars col a | polars list-contains (polars lit a) | polars as b)] | polars collect ╭───┬───────────┬───────╮ │ # │ a │ b │ ├───┼───────────┼───────┤ │ 0 │ ╭───┬───╮ │ true │ │ │ │ 0 │ a │ │ │ │ │ │ 1 │ b │ │ │ │ │ │ 2 │ c │ │ │ │ │ ╰───┴───╯ │ │ │ 1 │ ╭───┬───╮ │ false │ │ │ │ 0 │ b │ │ │ │ │ │ 1 │ c │ │ │ │ │ │ 2 │ d │ │ │ │ │ ╰───┴───╯ │ │ │ 2 │ ╭───┬───╮ │ false │ │ │ │ 0 │ c │ │ │ │ │ │ 1 │ d │ │ │ │ │ │ 2 │ f │ │ │ │ │ ╰───┴───╯ │ │ ╰───┴───────────┴───────╯ ``` or ``` ~/Projects/nushell/nushell> let df = [[a, b]; [[a,b,c], a] [[b,c,d], f] [[c,d,f], f]] | polars into-df -s {a: list<str>, b: str} ~/Projects/nushell/nushell> $df | polars with-column [(polars col a | polars list-contains b | polars as c)] | polars collect ╭───┬───────────┬───┬───────╮ │ # │ a │ b │ c │ ├───┼───────────┼───┼───────┤ │ 0 │ ╭───┬───╮ │ a │ true │ │ │ │ 0 │ a │ │ │ │ │ │ │ 1 │ b │ │ │ │ │ │ │ 2 │ c │ │ │ │ │ │ ╰───┴───╯ │ │ │ │ 1 │ ╭───┬───╮ │ f │ false │ │ │ │ 0 │ b │ │ │ │ │ │ │ 1 │ c │ │ │ │ │ │ │ 2 │ d │ │ │ │ │ │ ╰───┴───╯ │ │ │ │ 2 │ ╭───┬───╮ │ f │ true │ │ │ │ 0 │ c │ │ │ │ │ │ │ 1 │ d │ │ │ │ │ │ │ 2 │ f │ │ │ │ │ │ ╰───┴───╯ │ │ │ ╰───┴───────────┴───┴───────╯ ``` or ``` ~/Projects/nushell/nushell> let df = [[a, b]; [[1,2,3], 4] [[2,4,1], 2] [[2,1,6], 3]] | polars into-df -s {a: list<i64>, b: i64} ~/Projects/nushell/nushell> $df | polars with-column [(polars col a | polars list-contains ((polars col b) * 2) | polars as c)] | polars collect ╭───┬───────────┬───┬───────╮ │ # │ a │ b │ c │ ├───┼───────────┼───┼───────┤ │ 0 │ ╭───┬───╮ │ 4 │ false │ │ │ │ 0 │ 1 │ │ │ │ │ │ │ 1 │ 2 │ │ │ │ │ │ │ 2 │ 3 │ │ │ │ │ │ ╰───┴───╯ │ │ │ │ 1 │ ╭───┬───╮
│ 2 │ true │ │ │ │ 0 │ 2 │ │ │ │ │ │ │ 1 │ 4 │ │ │ │ │ │ │ 2 │ 1 │ │ │ │ │ │ ╰───┴───╯ │ │ │ │ 2 │ ╭───┬───╮ │ 3 │ true │ │ │ │ 0 │ 2 │ │ │ │ │ │ │ 1 │ 1 │ │ │ │ │ │ │ 2 │ 6 │ │ │ │ │ │ ╰───┴───╯ │ │ │ ╰───┴───────────┴───┴───────╯ ``` Let me know what you think. I'm a bit surprised that a list by default seems to get converted to "object" when doing `into-df` which is why I added the extra `-s` flag every time to explicitly force it into a list.	2025-03-12 08:25:03 -07:00
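Row by row, the operation asks whether each row's list contains that row's needle; a literal needle is simply the same value repeated for every row. A minimal plain-Rust sketch of those semantics (not the polars implementation; `list_contains` is a hypothetical helper) reproduces the third example, where the needle is `b * 2`:

```rust
/// For each row, reports whether `lists[i]` contains `needles[i]`.
/// A literal needle (like `polars lit a`) is modelled by repeating it per row.
fn list_contains<T: PartialEq>(lists: &[Vec<T>], needles: &[T]) -> Vec<bool> {
    assert_eq!(lists.len(), needles.len(), "one needle per row");
    lists
        .iter()
        .zip(needles)
        .map(|(list, needle)| list.contains(needle))
        .collect()
}

fn main() {
    let a: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![2, 4, 1], vec![2, 1, 6]];
    let b: Vec<i64> = vec![4, 2, 3];
    // Equivalent of `polars list-contains ((polars col b) * 2)`.
    let doubled: Vec<i64> = b.iter().map(|x| x * 2).collect();
    println!("{:?}", list_contains(&a, &doubled)); // [false, true, true]
}
```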
Jack Wright	23ba613b00	Polars AWS S3 support (#14648 ) # Description Provides Amazon S3 support. - Utilizes your existing AWS cli configuration. - Supports AWS SSO - Supports [gimme-aws-creds](https://github.com/Nike-Inc/gimme-aws-creds). - respects the settings of AWS_PROFILE environment variable for selecting profile config - AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION environment variables for configuring without an AWS config Usage: ```nushell polars open s3://bucket/and/path.parquet ``` Supports: - CSV - Parquet - NDJSON / json lines - Arrow Doesn't support: - eager dataframes - Avro - JSON	2024-12-25 06:15:50 -06:00
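The supported and unsupported formats listed here can be pictured as a dispatch on the object key of an `s3://bucket/key` URL. The sketch below is hypothetical: `parse_s3_target` is an invented helper, the extension-to-format mapping (including treating `.jsonl` as NDJSON) is an assumption, and credential handling through the AWS config or environment variables is deliberately left out.

```rust
/// Formats listed as supported for S3 (extension mapping is an assumption).
#[derive(Debug, PartialEq)]
enum S3Format {
    Csv,
    Parquet,
    NdJson,
    Arrow,
}

#[derive(Debug, PartialEq)]
struct S3Target {
    bucket: String,
    key: String,
    format: S3Format,
}

/// Hypothetical helper: splits `s3://bucket/key` and picks a reader from the
/// key's extension. Formats listed as unsupported (e.g. JSON, Avro) are rejected.
fn parse_s3_target(url: &str) -> Result<S3Target, String> {
    let rest = url.strip_prefix("s3://").ok_or("not an s3:// URL")?;
    let (bucket, key) = rest.split_once('/').ok_or("missing object key")?;
    if bucket.is_empty() || key.is_empty() {
        return Err("empty bucket or object key".to_string());
    }
    let ext = key
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase())
        .unwrap_or_default();
    let format = match ext.as_str() {
        "csv" => S3Format::Csv,
        "parquet" => S3Format::Parquet,
        "ndjson" | "jsonl" => S3Format::NdJson,
        "arrow" => S3Format::Arrow,
        other => return Err(format!("unsupported format for S3: {other:?}")),
    };
    Ok(S3Target { bucket: bucket.to_string(), key: key.to_string(), format })
}

fn main() {
    println!("{:?}", parse_s3_target("s3://bucket/and/path.parquet"));
    println!("{:?}", parse_s3_target("s3://bucket/data.json"));
}
```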
Jack Wright	c535c24d03	catch unwrap on panics with `polars collect` (#13850 ) # Description This resurrects the work from #12866 and fixes #12732. Polars panics for a plethora of reasons. While handling panics is generally frowned upon, in cases like `polars collect` a panic can cause a lot of work to be lost. Often you might have multiple dataframes in memory while trying one operation, and lose all state. While it is possible that the panic leaves things in a strange state, it is pretty unlikely as part of a polars pipeline. Most of the time polars objects are not mutating dataframes in memory, but rather creating a new dataframe with the operations applied. This is always the case with a lazy pipeline. After the collect call, the original dataframes are still intact and I haven't observed any side effects.	2024-09-15 07:21:02 -05:00
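The general technique is to run the possibly panicking operation inside `std::panic::catch_unwind` and turn the panic payload into an ordinary error. A minimal sketch, assuming a hypothetical `collect_catching_panics` wrapper (not the plugin's actual code), is shown here; wrapping the closure in `AssertUnwindSafe` is the caller's explicit promise that observing state after a panic is acceptable, which is the trade-off argued for in this commit.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `f` and converts a panic into an `Err`, so state held elsewhere
/// (other cached dataframes) is not lost. The default panic hook still
/// prints the panic message to stderr.
fn collect_catching_panics<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> Result<T, String>,
{
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        // No panic: pass the closure's own result through unchanged.
        Ok(result) => result,
        // Panic: extract a message if the payload is a string.
        Err(payload) => {
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            Err(format!("polars panicked: {msg}"))
        }
    }
}

fn main() {
    let ok = collect_catching_panics(|| Ok::<i32, String>(42));
    let failed: Result<i32, String> = collect_catching_panics(|| panic!("boom"));
    println!("{ok:?} {failed:?}");
}
```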
Jack Wright	8d60c0d35d	Migrating polars commands away from macros, removed custom DataFrame comparison. (#13829 ) # Description This PR: - Removes the lazy_command, expr_command macros and migrates the commands that were utilizing them. - Removes the custom logic in DataFrameValues::is_equals in favor of the polars DataFrame implementation of PartialEq - Adds examples to commands that previously did not have examples or had inadequate ones. NOTE: A lot of examples now have a `polars sort` at the end. This is needed because example results are compared for equality, and the new polars version of equals cares about row ordering. I removed the custom equals logic because it caused comparisons to lock up when comparing dataframes that contain a row that contains a list. I discovered this issue when adding examples to `polars implode`	2024-09-11 10:33:05 -07:00
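The reason for the trailing `polars sort` can be shown with plain Rust collections: `Vec` equality is order-sensitive, so two result sets holding the same rows in a different order only compare equal after both are sorted. `equal_ignoring_row_order` is a hypothetical helper, not part of the plugin.

```rust
/// Compares two result sets while ignoring row order, by sorting copies of
/// both before comparing them.
fn equal_ignoring_row_order<T: Ord + Clone>(a: &[T], b: &[T]) -> bool {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let expected = vec![(1, "a"), (2, "b"), (3, "c")];
    let actual = vec![(3, "c"), (1, "a"), (2, "b")];
    // Plain equality is order-sensitive, like an order-aware dataframe comparison.
    println!("plain: {}", expected == actual); // false
    println!("sorted: {}", equal_ignoring_row_order(&expected, &actual)); // true
}
```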
Jack Wright	f531cc2058	Polars command reorg (#13798 ) # Description Housekeeping. Restructures polars modules as discussed in: https://docs.google.com/spreadsheets/d/1gyA58i_yTXKCJ5DbO_RxBNAlK6S7C1M22ppKwVLZltc/edit?usp=sharing	2024-09-06 13:46:37 -07:00
Jack Wright	8316a1597e	Polars: Check to see if the cache is empty before enabling GC. More logging (#13286 ) There was a bug where anytime the plugin cache remove was called, the plugin gc was turned back on. This probably happened when I added the reference counter logic.	2024-07-03 06:44:26 -05:00
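The shape of this fix can be sketched with a hypothetical cache type: removing an entry may only re-enable garbage collection once nothing is cached any more. The `Cache` struct and its `gc_enabled` flag are illustrative assumptions, not the plugin's real types.

```rust
use std::collections::BTreeMap;

/// Hypothetical plugin cache; while it holds objects the plugin must not be
/// garbage-collected, otherwise the cached dataframes would be lost.
struct Cache {
    items: BTreeMap<String, String>,
    gc_enabled: bool,
}

impl Cache {
    fn new() -> Self {
        Cache { items: BTreeMap::new(), gc_enabled: true }
    }

    fn insert(&mut self, key: &str, value: &str) {
        self.items.insert(key.to_string(), value.to_string());
        // Something is cached now, so keep the plugin alive.
        self.gc_enabled = false;
    }

    fn remove(&mut self, key: &str) -> Option<String> {
        let removed = self.items.remove(key);
        // Buggy version: `self.gc_enabled = true;` unconditionally.
        // Fixed version: re-enable GC only when the cache is empty.
        if self.items.is_empty() {
            self.gc_enabled = true;
        }
        removed
    }
}

fn main() {
    let mut cache = Cache::new();
    cache.insert("df1", "first");
    cache.insert("df2", "second");
    cache.remove("df1");
    println!("gc after removing one of two: {}", cache.gc_enabled); // false
    cache.remove("df2");
    println!("gc after emptying the cache: {}", cache.gc_enabled); // true
}
```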
Jack Wright	1f1f581357	Converted perf function to be a macro. Utilized the perf macro within the polars plugin. (#13224 ) In this pull request, I converted the `perf` function within `nu_utils` to a macro. This change facilitates easier usage within plugins by allowing the use of `env_logger` and setting `RUST_LOG=nu_plugin_polars` (or another plugin). Without this conversion, the `RUST_LOG` variable would need to be set to `RUST_LOG=nu_utils::utils`, which is less intuitive and makes it impossible to narrow the perf results to one plugin.	2024-06-27 18:56:56 -05:00
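Why a macro helps can be shown with `module_path!()`: inside a function it always names the module that defines the function, whereas inside a `macro_rules!` body it expands at each call site. Logging macros such as those of the `log` crate use the module path as their default target, and that target is what `RUST_LOG` filters on. The sketch is hypothetical (`perf!`, `utils` and `plugin` are invented names) and prints to stderr instead of going through a logger.

```rust
/// Hypothetical `perf!` macro. `module_path!()` is expanded at the call site,
/// so the reported target is the caller's module, not this definition's.
macro_rules! perf {
    ($label:expr, $start:expr) => {{
        let target: &'static str = module_path!();
        eprintln!("[{}] {} took {:?}", target, $label, $start.elapsed());
        target
    }};
}

mod utils {
    /// Function-based helper: `module_path!()` always names `utils`,
    /// no matter which module calls it.
    pub fn perf_fn(label: &str, start: std::time::Instant) -> &'static str {
        let target: &'static str = module_path!();
        eprintln!("[{}] {} took {:?}", target, label, start.elapsed());
        target
    }
}

mod plugin {
    /// Returns the targets reported by the macro and by the function.
    pub fn open_csv() -> (&'static str, &'static str) {
        let start = std::time::Instant::now();
        let from_macro = perf!("open csv", start);
        let from_fn = crate::utils::perf_fn("open csv", start);
        (from_macro, from_fn)
    }
}

fn main() {
    let (from_macro, from_fn) = plugin::open_csv();
    println!("macro target: {from_macro}, function target: {from_fn}");
}
```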
Devyn Cairns	91d44f15c1	Allow plugins to report their own version and store it in the registry (#12883 ) # Description This allows plugins to report their version (and potentially other metadata in the future). The version is shown in `plugin list` and in `version`. The metadata is stored in the registry file, and reflects whatever was retrieved on `plugin add`, not necessarily the running binary. This can help you diagnose whether there's some kind of mismatch with what you expect. We could potentially use this functionality to show a warning or error if a plugin being run does not have the same version as what was in the cache file, suggesting `plugin add` be run again, but I haven't done that at this point. It is optional, and it requires the plugin author to make some code changes if they want to provide it, since I can't automatically determine the version of the calling crate or anything tricky like that to do it. Example: ``` > plugin list | select name version is_running pid ╭───┬────────────────┬─────────┬────────────┬─────╮ │ # │ name │ version │ is_running │ pid │ ├───┼────────────────┼─────────┼────────────┼─────┤ │ 0 │ example │ 0.93.1 │ false │ │ │ 1 │ gstat │ 0.93.1 │ false │ │ │ 2 │ inc │ 0.93.1 │ false │ │ │ 3 │ python_example │ 0.1.0 │ false │ │ ╰───┴────────────────┴─────────┴────────────┴─────╯ ``` cc @maxim-uvarov (he asked for it) # User-Facing Changes - `plugin list` gets a `version` column - `version` shows plugin versions when available - plugin authors should add `fn metadata()` to their `impl Plugin`, but don't have to # Tests + Formatting Tested the low level stuff and also the `plugin list` column. # After Submitting - [ ] update plugin guide docs - [ ] update plugin protocol docs (`Metadata` call & response) - [ ] update plugin template (`fn metadata()` should be easy) - [ ] release notes	2024-06-21 06:27:09 -05:00
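Making the version optional for plugin authors fits naturally with a trait method that has a default body. The trait and types here are hypothetical and simplified (they are not the real `nu-plugin` API); they only illustrate how a registry entry could record a version when one is reported and leave it empty otherwise.

```rust
/// Hypothetical metadata record (not the real nu-plugin type).
#[derive(Debug, Default, Clone, PartialEq)]
struct PluginMetadata {
    version: Option<String>,
}

/// Hypothetical plugin trait. Because `metadata` has a default body,
/// implementing it is optional.
trait Plugin {
    fn name(&self) -> &str;
    fn metadata(&self) -> PluginMetadata {
        PluginMetadata::default()
    }
}

struct Example;
struct Legacy;

impl Plugin for Example {
    fn name(&self) -> &str {
        "example"
    }
    // Opts in by reporting a version.
    fn metadata(&self) -> PluginMetadata {
        PluginMetadata { version: Some("0.93.1".to_string()) }
    }
}

impl Plugin for Legacy {
    fn name(&self) -> &str {
        "legacy"
    }
    // No `metadata` override: the default (no version) is used.
}

/// What a registry entry could record at `plugin add` time: (name, version).
fn registry_entry(plugin: &dyn Plugin) -> (String, Option<String>) {
    (plugin.name().to_string(), plugin.metadata().version)
}

fn main() {
    let plugins: [&dyn Plugin; 2] = [&Example, &Legacy];
    for p in plugins {
        println!("{:?}", registry_entry(p));
    }
}
```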
Jack Wright	20834c9d47	Added the ability to turn on performance debugging through an env var for the polars plugin (#13191 ) This allows performance debugging to be turned on by setting: ```nushell $env.POLARS_PLUGIN_PERF = "true" ``` Furthermore, this improves the other plugin debugging by allowing the env variable for debugging to be set at any time rather than having to be available when nushell is launched: ```nushell $env.POLARS_PLUGIN_DEBUG = "true" ``` This plugin introduces a `perf` function that will output timing results. This works very similarly to the perf function available in nu_utils::utils::perf. This version prints everything to stderr so as not to break the plugin stream, and uses the engine interface to see if the env variable is configured. This pull request uses this `perf` function when: * opening csv files as dataframes * opening json lines files as dataframes This will hopefully provide more fine-grained information on how long it takes polars to open different dataframes. The `perf` function can also be used later for other dataframe use cases.	2024-06-20 16:37:38 -07:00
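A minimal sketch of the two ideas, assuming the flag is read from the process environment (the real plugin queries the engine interface instead): the variable is checked on every call rather than once at start-up, which is what allows it to be set at any time, and the timing goes to stderr so the plugin's stdout stream is untouched. `perf_enabled_from`, `perf_enabled` and `timed` are hypothetical names.

```rust
use std::env;
use std::time::Instant;

/// Interprets the flag value; only the exact string "true" enables perf output.
fn perf_enabled_from(value: Option<&str>) -> bool {
    value == Some("true")
}

/// Hypothetical check, read on every call rather than cached at start-up,
/// so setting the variable later in a session takes effect immediately.
/// Assumption: the process environment stands in for the engine interface.
fn perf_enabled() -> bool {
    perf_enabled_from(env::var("POLARS_PLUGIN_PERF").ok().as_deref())
}

/// Runs `f` and, when enabled, reports its duration on stderr so that the
/// plugin's stdout protocol stream is left untouched.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    if perf_enabled() {
        eprintln!("perf: {label} took {:?}", start.elapsed());
    }
    out
}

fn main() {
    let rows = timed("open csv", || vec![1, 2, 3].len());
    println!("read {rows} rows");
}
```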
Jack Wright	a60381a932	Added commands for working with the plugin cache. (#12576 ) # Description This pull request provides three new commands: `polars store-ls` - moved from `polars ls`. It provides the list of all objects stored in the plugin cache `polars store-rm` - deletes a cached object `polars store-get` - gets an object from the cache. The addition of `polars store-get` required adding a reference_count to cached entries. `polars get` is the only command that will increment this value. `polars rm` will remove the value regardless of its count. Calls to PolarsPlugin::custom_value_dropped will decrement the value. The prefix store- was chosen due to there already being a `polars cache` command. These commands were not made sub-commands as there isn't a way to display help for sub commands in plugins (e.g. `polars store` displaying help) and I felt the store- prefix seemed fine anyway. The output of `polars store-ls` now shows the reference count for each object. # User-Facing Changes polars ls has now moved to polars store-ls --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-21 19:43:43 -05:00
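The reference-counting rules described in this commit (get increments, a dropped custom value decrements, `store-rm` removes unconditionally) can be sketched with a `BTreeMap`. Everything here is hypothetical, including the assumptions that a freshly stored entry starts with a count of 1 and is evicted once its count reaches zero.

```rust
use std::collections::BTreeMap;

/// Hypothetical cache entry with a reference count.
struct Entry {
    value: String,
    reference_count: u32,
}

#[derive(Default)]
struct Store {
    entries: BTreeMap<String, Entry>,
}

impl Store {
    /// Stores a new object. Assumption: it starts with one reference.
    fn insert(&mut self, key: &str, value: &str) {
        let entry = Entry { value: value.to_string(), reference_count: 1 };
        self.entries.insert(key.to_string(), entry);
    }

    /// Like `store-get`: hands out another reference, so the count goes up.
    fn get(&mut self, key: &str) -> Option<String> {
        let entry = self.entries.get_mut(key)?;
        entry.reference_count += 1;
        Some(entry.value.clone())
    }

    /// Like `custom_value_dropped`: one reference fewer. Assumption: the
    /// entry is evicted once nothing references it.
    fn dropped(&mut self, key: &str) {
        let now_unused = match self.entries.get_mut(key) {
            Some(entry) => {
                entry.reference_count = entry.reference_count.saturating_sub(1);
                entry.reference_count == 0
            }
            None => false,
        };
        if now_unused {
            self.entries.remove(key);
        }
    }

    /// Like `store-rm`: removes the entry regardless of its count.
    fn remove(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }

    /// Like `store-ls`: each key with its current reference count.
    fn list(&self) -> Vec<(String, u32)> {
        self.entries
            .iter()
            .map(|(key, entry)| (key.clone(), entry.reference_count))
            .collect()
    }
}

fn main() {
    let mut store = Store::default();
    store.insert("df", "a dataframe");
    store.get("df");
    println!("{:?}", store.list()); // [("df", 2)]
    store.remove("df");
    println!("{:?}", store.list()); // []
}
```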
Jack Wright	5f818eaefe	Ensure that lazy frames converted via to-lazy are not converted back to eager frames later in the pipeline. (#12525 ) # Description @maxim-uvarov discovered the following error: ``` > [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a] Error: × Error using as series ╭─[entry #1:1:68] 1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a] · ──────┬────── · ╰── dataframe has more than one column ╰──── ``` During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to an eager dataframe. To keep this from happening, I explicitly mark the dataframe as not having come from an eager frame. This causes the conversion logic to skip converting the dataframe later in the pipeline. --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-15 18:29:42 -05:00
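The fix can be modelled as a flag carried by the lazy frame: a command that lazifies an eager input internally marks it `from_eager = true` and hands back an eager result, whereas an explicit `into-lazy` marks it `false`, so later commands leave it lazy. The types and functions here are hypothetical simplifications, not the plugin's real lazy-frame type.

```rust
/// Hypothetical simplified lazy frame: a list of pending operations plus a
/// flag recording whether it was created implicitly from an eager frame.
#[derive(Debug)]
struct LazyFrame {
    ops: Vec<String>,
    from_eager: bool,
}

#[derive(Debug)]
enum Value {
    Eager(String),
    Lazy(LazyFrame),
}

/// Explicit `into-lazy`: the user asked for a lazy frame, so mark it as not
/// coming from an eager frame. Setting `true` here would reproduce the bug.
fn into_lazy(df: &str) -> Value {
    Value::Lazy(LazyFrame { ops: vec![format!("scan {df}")], from_eager: false })
}

/// A command implemented on lazy frames. Eager input is lazified internally
/// with `from_eager = true`, and only such frames are collected back.
fn apply(input: Value, op: &str) -> Value {
    let mut lf = match input {
        Value::Eager(df) => LazyFrame { ops: vec![format!("scan {df}")], from_eager: true },
        Value::Lazy(lf) => lf,
    };
    lf.ops.push(op.to_string());
    if lf.from_eager {
        Value::Eager(format!("collect({})", lf.ops.join(" -> ")))
    } else {
        Value::Lazy(lf)
    }
}

fn main() {
    let piped = apply(apply(into_lazy("df"), "sort-by a"), "unique --subset [a]");
    println!("{piped:?}"); // stays a Value::Lazy
    let eager = apply(Value::Eager("df".to_string()), "sort-by a");
    println!("{eager:?}"); // comes back as a Value::Eager
}
```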
Ian Manske	211d9c685c	Fix clippy lint (#12504 ) Just fixes a clippy lint.	2024-04-13 16:19:32 +00:00
Jack Wright	b9c2f9ee56	displaying span information, creation time, and size with polars ls (#12472 ) # Description `polars ls` is already different than `dfr ls`. Currently it just shows the cache key, columns, rows, and type. I have added: - creation time - size - span contents - span start and end [Screenshot: `polars ls` output] # Tests + Formatting Done Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-12 09:23:46 -05:00
Jack Wright	efc1cfa939	Move dataframes support to a plugin (#12220 ) WIP This PR covers migrating crates/nu-cmd-dataframes to a new plugin ./crates/nu_plugin_polars ## TODO List Other: - [X] Fix examples - [x] Fix Plugin Test Harness - [X] Move Cache to Mutex<BTreeMap> - [X] Logic for disabling/enabling plugin GC based off whether items are cached. - [x] NuExpression custom values - [X] Optimize caching (don't cache every object creation). - [x] Fix dataframe operations (in NuDataFrameCustomValue::operations) - [x] Added plugin_debug! macro for checking an env variable POLARS_PLUGIN_DEBUG Fix duplicated commands: - [x] There are two polars median commands, one for lazy and one for expr; there should only be one that works for both. I temporarily called one polars expr-median (inside expressions_macros.rs) - [x] polars quantile (lazy, and expr). The expr one is temporarily expr-median - [x] polars is-in (renamed one series-is-in) Commands: - [x] AppendDF - [x] CastDF - [X] ColumnsDF - [x] DataTypes - [x] Summary - [x] DropDF - [x] DropDuplicates - [x] DropNulls - [x] Dummies - [x] FilterWith - [X] FirstDF - [x] GetDF - [x] LastDF - [X] ListDF - [x] MeltDF - [X] OpenDataFrame - [x] QueryDf - [x] RenameDF - [x] SampleDF - [x] SchemaDF - [x] ShapeDF - [x] SliceDF - [x] TakeDF - [X] ToArrow - [x] ToAvro - [X] ToCSV - [X] ToDataFrame - [X] ToNu - [x] ToParquet - [x] ToJsonLines - [x] WithColumn - [x] ExprAlias - [x] ExprArgWhere - [x] ExprCol - [x] ExprConcatStr - [x] ExprCount - [x] ExprLit - [x] ExprWhen - [x] ExprOtherwise - [x] ExprQuantile - [x] ExprList - [x] ExprAggGroups - [x] ExprCount - [x] ExprIsIn - [x] ExprNot - [x] ExprMax - [x] ExprMin - [x] ExprSum - [x] ExprMean - [x] ExprMedian - [x] ExprStd - [x] ExprVar - [x] ExprDatePart - [X] LazyAggregate - [x] LazyCache - [X] LazyCollect - [x] LazyFetch - [x] LazyFillNA - [x] LazyFillNull - [x] LazyFilter - [x] LazyJoin - [x] LazyQuantile - [x] LazyMedian - [x] LazyReverse - [x] LazySelect - [x] LazySortBy -
[x] ToLazyFrame - [x] ToLazyGroupBy - [x] LazyExplode - [x] LazyFlatten - [x] AllFalse - [x] AllTrue - [x] ArgMax - [x] ArgMin - [x] ArgSort - [x] ArgTrue - [x] ArgUnique - [x] AsDate - [x] AsDateTime - [x] Concatenate - [x] Contains - [x] Cumulative - [x] GetDay - [x] GetHour - [x] GetMinute - [x] GetMonth - [x] GetNanosecond - [x] GetOrdinal - [x] GetSecond - [x] GetWeek - [x] GetWeekDay - [x] GetYear - [x] IsDuplicated - [x] IsIn - [x] IsNotNull - [x] IsNull - [x] IsUnique - [x] NNull - [x] NUnique - [x] NotSeries - [x] Replace - [x] ReplaceAll - [x] Rolling - [x] SetSeries - [x] SetWithIndex - [x] Shift - [x] StrLengths - [x] StrSlice - [x] StrFTime - [x] ToLowerCase - [x] ToUpperCase - [x] Unique - [x] ValueCount --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-09 19:31:43 -05:00

17 Commits