nushell

mirror of https://github.com/nushell/nushell.git synced 2025-07-18 23:24:28 +02:00

Author	SHA1	Message	Date
Jack Wright	23ba613b00	Polars AWS S3 support (#14648 ) # Description Provides Amazon S3 support. - Utilizes your existing AWS cli configuration. - Supports AWS SSO - Supports [gimme-aws-creds](https://github.com/Nike-Inc/gimme-aws-creds). - respects the settings of AWS_PROFILE environment variable for selecting profile config - AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION environment variables for configuring without an AWS config Usage: ```nushell polars open s3://bucket/and/path.parquet ``` Supports: - CSV - Parquet - NDJSON / json lines - Arrow Doesn't support: - eager dataframes - Avro - JSON	2024-12-25 06:15:50 -06:00
Jack Wright	c535c24d03	catch unwrap on panics with `polars collect` (#13850 ) # Description This resurrects the work from #12866 and fixes #12732. Polars panics for a plethora or reasons. While handling panics is generally frowned upon, in cases like with `polars collect` a panic cause a lot of work to be lost. Often you might have multiple dataframes in memory and you are trying one operation and lose all state. While it possible the panic can leave things a strange state, it is pretty unlikely as part of a polars pipeline. Most of the time polars objects are not manipulating dataframes in memory mutability, but rather creating a new dataframe the operations being applied. This is always the case with a lazy pipeline. After the collect call, the original dataframes are intact still and I haven't observed any side effects.	2024-09-15 07:21:02 -05:00
Jack Wright	8d60c0d35d	Migrating polars commands away from macros, removed custom DataFrame comparison. (#13829 ) # Description This PR: - Removes the lazy_command, expr_command macros and migrates the commands that were utilizing them. - Removes the custom logic in DataFrameValues::is_equals to use the polars DataFrame version of PartialEq - Adds examples to commands that previously did not have examples or had inadequate ones. NOTE: A lot of examples now have a `polars sort` at the end. This is needed due to the comparison in the result. The new polars version of equals cares about the ordering. I removed the custom equals logic as it causes comparisons to lock up when comparing dataframes that contain a row that contains a list. I discovered this issue when adding examples to `polars implode`	2024-09-11 10:33:05 -07:00
Jack Wright	f531cc2058	Polars command reorg (#13798 ) # Description House keeping. Restructures polars modules as discussed in: https://docs.google.com/spreadsheets/d/1gyA58i_yTXKCJ5DbO_RxBNAlK6S7C1M22ppKwVLZltc/edit?usp=sharing	2024-09-06 13:46:37 -07:00
Jack Wright	8316a1597e	Polars: Check to see if the cache is empty before enabling GC. More logging (#13286 ) There was a bug where anytime the plugin cache remove was called, the plugin gc was turned back on. This probably happened when I added the reference counter logic.	2024-07-03 06:44:26 -05:00
Jack Wright	1f1f581357	Converted perf function to be a macro. Utilized the perf macro within the polars plugin. (#13224 ) In this pull request, I converted the `perf` function within `nu_utils` to a macro. This change facilitates easier usage within plugins by allowing the use of `env_logger` and setting `RUST_LOG=nu_plugin_polars` (or another plugin). Without this conversion, the `RUST_LOG` variable would need to be set to `RUST_LOG=nu_utils::utils`, which is less intuitive and impossible to narrow the perf results to one plugin.	2024-06-27 18:56:56 -05:00
Devyn Cairns	91d44f15c1	Allow plugins to report their own version and store it in the registry (#12883 ) # Description This allows plugins to report their version (and potentially other metadata in the future). The version is shown in `plugin list` and in `version`. The metadata is stored in the registry file, and reflects whatever was retrieved on `plugin add`, not necessarily the running binary. This can help you to diagnose if there's some kind of mismatch with what you expect. We could potentially use this functionality to show a warning or error if a plugin being run does not have the same version as what was in the cache file, suggesting `plugin add` be run again, but I haven't done that at this point. It is optional, and it requires the plugin author to make some code changes if they want to provide it, since I can't automatically determine the version of the calling crate or anything tricky like that to do it. Example: ``` > plugin list \| select name version is_running pid ╭───┬────────────────┬─────────┬────────────┬─────╮ │ # │ name │ version │ is_running │ pid │ ├───┼────────────────┼─────────┼────────────┼─────┤ │ 0 │ example │ 0.93.1 │ false │ │ │ 1 │ gstat │ 0.93.1 │ false │ │ │ 2 │ inc │ 0.93.1 │ false │ │ │ 3 │ python_example │ 0.1.0 │ false │ │ ╰───┴────────────────┴─────────┴────────────┴─────╯ ``` cc @maxim-uvarov (he asked for it) # User-Facing Changes - `plugin list` gets a `version` column - `version` shows plugin versions when available - plugin authors should add `fn metadata()` to their `impl Plugin`, but don't have to # Tests + Formatting Tested the low level stuff and also the `plugin list` column. # After Submitting - [ ] update plugin guide docs - [ ] update plugin protocol docs (`Metadata` call & response) - [ ] update plugin template (`fn metadata()` should be easy) - [ ] release notes	2024-06-21 06:27:09 -05:00
Jack Wright	20834c9d47	Added the ability to turn on performance debugging through and env var for the polars plugin (#13191 ) This allows performance debugging to be turned on by setting: ```nushell $env.POLARS_PLUGIN_PERF = "true" ``` Furthermore, this improves the other plugin debugging by allowing the env variable for debugging to be set at any time versus having to be available when nushell is launched: ```nushell $env.POLARS_PLUGIN_DEBUG = "true" ``` This plugin introduces a `perf` function that will output timing results. This works very similar to the perf function available in nu_utils::utils::perf. This version prints everything to std error to not break the plugin stream and uses the engine interface to see if the env variable is configured. This pull requests uses this `perf` function when: * opening csv files as dataframes * opening json lines files as dataframes This will hopefully help provide some more fine grained information on how long it takes polars to open different dataframes. The `perf` can also be utilized later for other dataframes use cases.	2024-06-20 16:37:38 -07:00
Jack Wright	a60381a932	Added commands for working with the plugin cache. (#12576 ) # Description This pull request provides three new commands: `polars store-ls` - moved from `polars ls`. It provides the list of all object stored in the plugin cache `polars store-rm` - deletes a cached object `polars store-get` - gets an object from the cache. The addition of `polars store-get` required adding a reference_count to cached entries. `polars get` is the only command that will increment this value. `polars rm` will remove the value despite it's count. Calls to PolarsPlugin::custom_value_dropped will decrement the value. The prefix store- was chosen due to there already being a `polars cache` command. These commands were not made sub-commands as there isn't a way to display help for sub commands in plugins (e.g. `polars store` displaying help) and I felt the store- seemed fine anyways. The output of `polars store-ls` now shows the reference count for each object. # User-Facing Changes polars ls has now moved to polars store-ls --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-21 19:43:43 -05:00
Jack Wright	5f818eaefe	Ensure that lazy frames converted via to-lazy are not converted back to eager frames later in the pipeline. (#12525 ) # Description @maxim-uvarov discovered the following error: ``` > [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars sort-by a \| polars unique --subset [a] Error: × Error using as series ╭─[entry #1:1:68] 1 │ [[a b]; [6 2] [1 4] [4 1]] \| polars into-lazy \| polars sort-by a \| polars unique --subset [a] · ──────┬────── · ╰── dataframe has more than one column ╰──── ``` During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to a eager dataframe. In order to keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This causes the conversion logic to not attempt to convert the dataframe later in the pipeline. --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-15 18:29:42 -05:00
Ian Manske	211d9c685c	Fix clippy lint (#12504 ) Just fixes a clippy lint.	2024-04-13 16:19:32 +00:00
Jack Wright	b9c2f9ee56	displaying span information, creation time, and size with polars ls (#12472 ) # Description `polars ls` is already different that `dfr ls`. Currently it just shows the cache key, columns, rows, and type. I have added: - creation time - size - span contents - span start and end <img width="1471" alt="Screenshot 2024-04-10 at 17 27 06" src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408"> # Tests + Formatting Done Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-12 09:23:46 -05:00
Jack Wright	efc1cfa939	Move dataframes support to a plugin (#12220 ) WIP This PR covers migration crates/nu-cmd-dataframes to a new plugin ./crates/nu_plugin_polars ## TODO List Other: - [X] Fix examples - [x] Fix Plugin Test Harness - [X] Move Cache to Mutex<BTreeMap> - [X] Logic for disabling/enabling plugin GC based off whether items are cached. - [x] NuExpression custom values - [X] Optimize caching (don't cache every object creation). - [x] Fix dataframe operations (in NuDataFrameCustomValue::operations) - [x] Added plugin_debug! macro that for checking an env variable POLARS_PLUGIN_DEBUG Fix duplicated commands: - [x] There are two polars median commands, one for lazy and one for expr.. there should only be one that works for both. I temporarily called on polars expr-median (inside expressions_macros.rs) - [x] polars quantile (lazy, and expr). the expr one is temporarily expr-median - [x] polars is-in (renamed one series-is-in) Commands: - [x] AppendDF - [x] CastDF - [X] ColumnsDF - [x] DataTypes - [x] Summary - [x] DropDF - [x] DropDuplicates - [x] DropNulls - [x] Dummies - [x] FilterWith - [X] FirstDF - [x] GetDF - [x] LastDF - [X] ListDF - [x] MeltDF - [X] OpenDataFrame - [x] QueryDf - [x] RenameDF - [x] SampleDF - [x] SchemaDF - [x] ShapeDF - [x] SliceDF - [x] TakeDF - [X] ToArrow - [x] ToAvro - [X] ToCSV - [X] ToDataFrame - [X] ToNu - [x] ToParquet - [x] ToJsonLines - [x] WithColumn - [x] ExprAlias - [x] ExprArgWhere - [x] ExprCol - [x] ExprConcatStr - [x] ExprCount - [x] ExprLit - [x] ExprWhen - [x] ExprOtherwise - [x] ExprQuantile - [x] ExprList - [x] ExprAggGroups - [x] ExprCount - [x] ExprIsIn - [x] ExprNot - [x] ExprMax - [x] ExprMin - [x] ExprSum - [x] ExprMean - [x] ExprMedian - [x] ExprStd - [x] ExprVar - [x] ExprDatePart - [X] LazyAggregate - [x] LazyCache - [X] LazyCollect - [x] LazyFetch - [x] LazyFillNA - [x] LazyFillNull - [x] LazyFilter - [x] LazyJoin - [x] LazyQuantile - [x] LazyMedian - [x] LazyReverse - [x] LazySelect - [x] LazySortBy - [x] ToLazyFrame - [x] ToLazyGroupBy - [x] LazyExplode - [x] LazyFlatten - [x] AllFalse - [x] AllTrue - [x] ArgMax - [x] ArgMin - [x] ArgSort - [x] ArgTrue - [x] ArgUnique - [x] AsDate - [x] AsDateTime - [x] Concatenate - [x] Contains - [x] Cumulative - [x] GetDay - [x] GetHour - [x] GetMinute - [x] GetMonth - [x] GetNanosecond - [x] GetOrdinal - [x] GetSecond - [x] GetWeek - [x] GetWeekDay - [x] GetYear - [x] IsDuplicated - [x] IsIn - [x] IsNotNull - [x] IsNull - [x] IsUnique - [x] NNull - [x] NUnique - [x] NotSeries - [x] Replace - [x] ReplaceAll - [x] Rolling - [x] SetSeries - [x] SetWithIndex - [x] Shift - [x] StrLengths - [x] StrSlice - [x] StrFTime - [x] ToLowerCase - [x] ToUpperCase - [x] Unique - [x] ValueCount --------- Co-authored-by: Jack Wright <jack.wright@disqo.com>	2024-04-09 19:31:43 -05:00

13 Commits