Add "--as-columns" flag to polars into-df (#13449)

<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->
Per discussion on
[Discord](https://discord.com/channels/601130461678272522/864228801851949077/1265718178927870045)
# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->

To facilitate column-oriented dataframe construction, this PR adds an
`--as-columns` flag to the `polars into-df` command. When the flag is
specified and the input is a record of lists, each list is treated as a
column rather than a cell value, so `{a: [1 3], b: [2 4]} | polars
into-df --as-columns` returns the same dataframe as `[[a b]; [1 2] [3 4]]
| polars into-df`.
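
For illustration, a minimal session sketch (the rendered dataframe below is assumed; exact display formatting may differ between versions):

```nushell
# Column-oriented: each list in the record becomes a column.
> {a: [1 3], b: [2 4]} | polars into-df --as-columns
╭───┬───┬───╮
│ # │ a │ b │
├───┼───┼───┤
│ 0 │ 1 │ 2 │
│ 1 │ 3 │ 4 │
╰───┴───┴───╯

# Row-oriented equivalent: a table literal where each inner list is a row.
> [[a b]; [1 2] [3 4]] | polars into-df    # same dataframe as above
```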


# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

A new `--as-columns` flag is added; semantics are unchanged when the
flag is not specified.
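
As a sketch of the unchanged default behavior (the exact cell representation of list values is assumed here):

```nushell
# Without --as-columns, a record of lists is still treated as a single row,
# so each list stays a cell value instead of becoming a column.
> {a: [1 3], b: [2 4]} | polars into-df
```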

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

---------

Co-authored-by: Ben Yang <ben@ya.ng>

@@ -37,6 +37,11 @@ impl PluginCommand for ToDataFrame {
r#"Polars Schema in format [{name: str}]. CSV, JSON, and JSONL files"#,
Some('s'),
)
.switch(
"as-columns",
r#"When input shape is record of lists, treat each list as column values."#,
Some('c'),
)
.input_output_type(Type::Any, Type::Custom("dataframe".into()))
.category(Category::Custom("dataframe".into()))
}
@@ -64,6 +69,27 @@ impl PluginCommand for ToDataFrame {
.into_value(Span::test_data()),
),
},
Example {
description: "Takes a record of lists and creates a dataframe",
example: "{a: [1 3], b: [2 4]} | polars into-df --as-columns",
result: Some(
NuDataFrame::try_from_columns(
vec![
Column::new(
"a".to_string(),
vec![Value::test_int(1), Value::test_int(3)],
),
Column::new(
"b".to_string(),
vec![Value::test_int(2), Value::test_int(4)],
),
],
None,
)
.expect("simple df for test should not fail")
.into_value(Span::test_data()),
),
},
Example {
description: "Takes a list of tables and creates a dataframe",
example: "[[1 2 a] [3 4 b] [5 6 c]] | polars into-df",
@@ -182,7 +208,39 @@ impl PluginCommand for ToDataFrame {
.map(|schema| NuSchema::try_from(&schema))
.transpose()?;
let df = NuDataFrame::try_from_iter(plugin, input.into_iter(), maybe_schema.clone())?;
let maybe_as_columns = call.has_flag("as-columns")?;
let df = if !maybe_as_columns {
NuDataFrame::try_from_iter(plugin, input.into_iter(), maybe_schema.clone())?
} else {
match &input {
PipelineData::Value(Value::Record { val, .. }, _) => {
let items: Result<Vec<(String, Vec<Value>)>, &str> = val
.iter()
.map(|(k, v)| match v.to_owned().into_list() {
Ok(v) => Ok((k.to_owned(), v)),
_ => Err("error"),
})
.collect();
match items {
Ok(items) => {
let columns = items
.iter()
.map(|(k, v)| Column::new(k.to_owned(), v.to_owned()))
.collect::<Vec<Column>>();
NuDataFrame::try_from_columns(columns, maybe_schema)?
}
Err(_) => NuDataFrame::try_from_iter(
plugin,
input.into_iter(),
maybe_schema.clone(),
)?,
}
}
_ => NuDataFrame::try_from_iter(plugin, input.into_iter(), maybe_schema.clone())?,
}
};
df.to_pipeline_data(plugin, engine, call.head)
.map_err(LabeledError::from)
}

@@ -23,7 +23,7 @@ impl PluginCommand for ToNu {
}
fn usage(&self) -> &str {
"Converts a dataframe or an expression into into nushell value for access and exploration."
"Converts a dataframe or an expression into nushell value for access and exploration."
}
fn signature(&self) -> Signature {