feat(polars): introducing new polars replace (#15706)


# Description
This PR ports the polars expressions `replace`
(https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.replace.html)
and `replace_strict`
(https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.replace_strict.html).
See the examples below.

As a consequence, the existing `polars replace` and `polars replace-all`
commands have been renamed to `polars str-replace` and
`polars str-replace-all`, which brings their naming in line with
`polars str-join` and the other str commands.

```nushell

Usage:
  > polars replace {flags} <old> (new)

Flags:
  -h, --help: Display the help message for this command
  -s, --strict: Require that all values must be replaced or throw an error (ignored if `old` or `new` are expressions).
  -d, --default <any>: Set values that were not replaced to this value. If no default is specified (the default), an error is raised if any values were not replaced. Accepts expression input. Non-expression inputs are parsed as literals.
  -t, --return-dtype <string>: Data type of the resulting expression. If set to `null` (default), the data type is determined automatically based on the other inputs.

Parameters:
  old <one_of(record, list<any>)>: Values to be replaced
  new <list<any>>: Values to replace by (optional)

Input/output types:
  ╭───┬────────────┬────────────╮
  │ # │   input    │   output   │
  ├───┼────────────┼────────────┤
  │ 0 │ expression │ expression │
  ╰───┴────────────┴────────────╯

Examples:
  Replace column with different values of same type
  > [[a]; [1] [1] [2] [2]]
                | polars into-df
                | polars select (polars col a | polars replace [1 2] [10 20])
                | polars collect
  ╭───┬────╮
  │ # │ a  │
  ├───┼────┤
  │ 0 │ 10 │
  │ 1 │ 10 │
  │ 2 │ 20 │
  │ 3 │ 20 │
  ╰───┴────╯

  Replace column with different values of another type
  > [[a]; [1] [1] [2] [2]]
                | polars into-df
                | polars select (polars col a | polars replace [1 2] [a b] --strict)
                | polars collect
  ╭───┬───╮
  │ # │ a │
  ├───┼───┤
  │ 0 │ a │
  │ 1 │ a │
  │ 2 │ b │
  │ 3 │ b │
  ╰───┴───╯

  Replace column with different values based on expressions (cannot be used with strict)
  > [[a]; [1] [1] [2] [2]]
                | polars into-df
                | polars select (polars col a | polars replace [(polars col a | polars max)] [(polars col a | polars max | $in + 5)])
                | polars collect
  ╭───┬───╮
  │ # │ a │
  ├───┼───┤
  │ 0 │ 1 │
  │ 1 │ 1 │
  │ 2 │ 7 │
  │ 3 │ 7 │
  ╰───┴───╯

  Replace column with different values based on expressions with default
  > [[a]; [1] [1] [2] [3]]
                | polars into-df
                | polars select (polars col a | polars replace [1] [10] --default (polars col a | polars max | $in * 100) --strict)
                | polars collect
  ╭───┬─────╮
  │ # │  a  │
  ├───┼─────┤
  │ 0 │  10 │
  │ 1 │  10 │
  │ 2 │ 300 │
  │ 3 │ 300 │
  ╰───┴─────╯

  Replace column with different values based on expressions with default and return dtype
  > [[a]; [1] [1] [2] [3]]
                | polars into-df
                | polars select (polars col a | polars replace [1] [10] --default (polars col a | polars max | $in * 100) --strict --return-dtype str)
                | polars collect
  ╭───┬─────╮
  │ # │  a  │
  ├───┼─────┤
  │ 0 │ 10  │
  │ 1 │ 10  │
  │ 2 │ 300 │
  │ 3 │ 300 │
  ╰───┴─────╯

  Replace column with different values using a record
  > [[a]; [1] [1] [2] [2]]
                | polars into-df
                | polars select (polars col a | polars replace {1: a, 2: b} --strict --return-dtype str)
                | polars collect
  ╭───┬───╮
  │ # │ a │
  ├───┼───┤
  │ 0 │ a │
  │ 1 │ a │
  │ 2 │ b │
  │ 3 │ b │
  ╰───┴───╯
```
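The difference between the two modes in the help text above can be sketched in plain Rust. This is only a simplified, std-only model of the semantics, not the plugin's or polars' implementation: the function names `replace` and `replace_strict` mirror the polars expressions, but the signatures, the `i64`/`&str` types and the error message are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Non-strict mode: values without a mapping pass through unchanged.
fn replace(values: &[i64], old: &[i64], new: &[i64]) -> Vec<i64> {
    let mapping: HashMap<i64, i64> = old.iter().copied().zip(new.iter().copied()).collect();
    values.iter().map(|v| *mapping.get(v).unwrap_or(v)).collect()
}

/// Strict mode: every value needs a mapping unless a default is supplied.
/// The output type may differ from the input type (here: i64 -> String).
fn replace_strict(
    values: &[i64],
    old: &[i64],
    new: &[&str],
    default: Option<&str>,
) -> Result<Vec<String>, String> {
    let mapping: HashMap<i64, &str> = old.iter().copied().zip(new.iter().copied()).collect();
    values
        .iter()
        .map(|v| {
            mapping
                .get(v)
                .copied()
                .or(default) // fall back to the default, if any
                .map(str::to_string)
                .ok_or_else(|| format!("no replacement for value {v}"))
        })
        .collect()
}

fn main() {
    let column = [1, 1, 2, 3];
    // 3 has no mapping and is kept as-is.
    println!("{:?}", replace(&column, &[1, 2], &[10, 20]));
    // 3 has no mapping and no default, so strict mode reports an error.
    println!("{:?}", replace_strict(&column, &[1, 2], &["a", "b"], None));
    // With a default, unmatched values are filled in instead.
    println!("{:?}", replace_strict(&column, &[1, 2], &["a", "b"], Some("z")));
}
```

In this model, strict mode is the only one that can change the element type, which matches the examples above, where string replacements for an integer column are always combined with `--strict`.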

# User-Facing Changes
**BREAKING CHANGE**: `polars replace` and `polars replace-all` have been
renamed to `polars str-replace` and `polars str-replace-all`.

The new `polars replace` now replaces elements in a series/column rather
than patterns within strings.
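The argument rules the new command enforces can be summarised with a small sketch. It is a std-only model of the checks performed in the command's `run` function, under the assumption that values are plain strings; `Flags`, `Old`, `check_flags` and `pair_values` are hypothetical names that do not exist in the plugin.

```rust
/// Hypothetical stand-in for the flags parsed from the command line.
struct Flags {
    strict: bool,
    default: Option<String>,
    return_dtype: Option<String>,
}

/// Hypothetical stand-in for the `old` argument: a record or a list.
enum Old {
    Record(Vec<(String, String)>),
    List(Vec<String>),
}

/// `--return-dtype` and `--default` are only accepted together with `--strict`.
fn check_flags(flags: &Flags) -> Result<(), String> {
    if flags.return_dtype.is_some() && !flags.strict {
        return Err("`return-dtype` may only be used with `strict`".to_string());
    }
    if flags.default.is_some() && !flags.strict {
        return Err("`default` may only be used with `strict`".to_string());
    }
    Ok(())
}

/// A record supplies both sides of the mapping; a list needs a matching `new` list.
fn pair_values(
    old: Old,
    new: Option<Vec<String>>,
) -> Result<(Vec<String>, Vec<String>), String> {
    match (old, new) {
        (Old::Record(entries), None) => Ok(entries.into_iter().unzip()),
        (Old::List(old), Some(new)) => Ok((old, new)),
        _ => Err("Invalid arguments".to_string()),
    }
}

fn main() {
    let flags = Flags { strict: false, default: Some("0".into()), return_dtype: None };
    println!("{:?}", check_flags(&flags));

    let record = Old::Record(vec![("1".into(), "a".into()), ("2".into(), "b".into())]);
    println!("{:?}", pair_values(record, None));

    // A list for `old` without a `new` list is rejected.
    println!("{:?}", pair_values(Old::List(vec!["1".into()]), None));
}
```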

# Tests + Formatting
Example tests were added.

# After Submitting
pyz4 2025-06-01 15:32:56 -04:00 committed by GitHub
parent 2b524cd861
commit bdc7cdbcc4
6 changed files with 357 additions and 22 deletions


@@ -84,6 +84,7 @@ features = [
   "parquet",
   "pivot",
   "random",
+  "replace",
   "rolling_window",
   "rows",
   "round_series",


@@ -40,6 +40,7 @@ mod unnest;
 mod unpivot;
 mod with_column;
 use filter::LazyFilter;
+mod replace;
 mod shift;
 mod unique;
@@ -68,6 +69,7 @@ pub use last::LastDF;
 pub use lit::ExprLit;
 use query_df::QueryDf;
 pub use rename::RenameDF;
+pub use replace::Replace;
 pub use sample::SampleDF;
 pub use shift::Shift;
 pub use slice::SliceDF;
@@ -114,6 +116,7 @@ pub(crate) fn data_commands() -> Vec<Box<dyn PluginCommand<Plugin = PolarsPlugin
         Box::new(select::LazySelect),
         Box::new(LazySortBy),
         Box::new(LazyFilter),
+        Box::new(Replace),
         Box::new(Shift),
         Box::new(struct_json_encode::StructJsonEncode),
         Box::new(qcut::QCutSeries),


@@ -0,0 +1,335 @@
+use crate::{
+    PolarsPlugin,
+    values::{CustomValueSupport, NuDataFrame, NuExpression, str_to_dtype},
+};
+use nu_plugin::{EngineInterface, EvaluatedCall, PluginCommand};
+use nu_protocol::{
+    Category, Example, LabeledError, PipelineData, ShellError, Signature, Span, SyntaxShape, Type,
+    Value,
+};
+use polars::{df, prelude::*};
+#[derive(Clone)]
+pub struct Replace;
+impl PluginCommand for Replace {
+    type Plugin = PolarsPlugin;
+    fn name(&self) -> &str {
+        "polars replace"
+    }
+    fn description(&self) -> &str {
+        "Create an expression that replaces old values with new values"
+    }
+    fn signature(&self) -> Signature {
+        Signature::build(self.name())
+            .required(
+                "old",
+                SyntaxShape::OneOf(vec![SyntaxShape::Record(vec![]), SyntaxShape::List(Box::new(SyntaxShape::Any))]),
+                "Values to be replaced",
+            )
+            .optional(
+                "new",
+                SyntaxShape::List(Box::new(SyntaxShape::Any)),
+                "Values to replace by",
+            )
+            .switch(
+                "strict",
+                "Require that all values must be replaced or throw an error (ignored if `old` or `new` are expressions).",
+                Some('s'),
+            )
+            .named(
+                "default",
+                SyntaxShape::Any,
+                "Set values that were not replaced to this value. If no default is specified (the default), an error is raised if any values were not replaced. Accepts expression input. Non-expression inputs are parsed as literals.",
+                Some('d'),
+            )
+            .named(
+                "return-dtype",
+                SyntaxShape::String,
+                "Data type of the resulting expression. If set to `null` (default), the data type is determined automatically based on the other inputs.",
+                Some('t'),
+            )
+            .input_output_type(
+                Type::Custom("expression".into()),
+                Type::Custom("expression".into()),
+            )
+            .category(Category::Custom("expression".into()))
+    }
+    fn examples(&self) -> Vec<Example> {
+        vec![
+            Example {
+                description: "Replace column with different values of same type",
+                example: "[[a]; [1] [1] [2] [2]]
+                | polars into-df
+                | polars select (polars col a | polars replace [1 2] [10 20])
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => [10, 10, 20, 20])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+            Example {
+                description: "Replace column with different values of another type",
+                example: "[[a]; [1] [1] [2] [2]]
+                | polars into-df
+                | polars select (polars col a | polars replace [1 2] [a b] --strict)
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => ["a", "a", "b", "b"])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+            Example {
+                description: "Replace column with different values based on expressions (cannot be used with strict)",
+                example: "[[a]; [1] [1] [2] [2]]
+                | polars into-df
+                | polars select (polars col a | polars replace [(polars col a | polars max)] [(polars col a | polars max | $in + 5)])
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => [1, 1, 7, 7])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+            Example {
+                description: "Replace column with different values based on expressions with default",
+                example: "[[a]; [1] [1] [2] [3]]
+                | polars into-df
+                | polars select (polars col a | polars replace [1] [10] --default (polars col a | polars max | $in * 100) --strict)
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => [10, 10, 300, 300])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+            Example {
+                description: "Replace column with different values based on expressions with default and return dtype",
+                example: "[[a]; [1] [1] [2] [3]]
+                | polars into-df
+                | polars select (polars col a | polars replace [1] [10] --default (polars col a | polars max | $in * 100) --strict --return-dtype str)
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => ["10", "10", "300", "300"])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+            Example {
+                description: "Replace column with different values using a record",
+                example: "[[a]; [1] [1] [2] [2]]
+                | polars into-df
+                | polars select (polars col a | polars replace {1: a, 2: b} --strict --return-dtype str)
+                | polars collect",
+                result: Some(
+                    NuDataFrame::from(
+                        df!("a" => ["a", "a", "b", "b"])
+                            .expect("simple df for test should not fail"),
+                    )
+                    .into_value(Span::test_data()),
+                ),
+            },
+        ]
+    }
+    fn search_terms(&self) -> Vec<&str> {
+        vec!["replace"]
+    }
+    fn run(
+        &self,
+        plugin: &Self::Plugin,
+        engine: &EngineInterface,
+        call: &EvaluatedCall,
+        input: PipelineData,
+    ) -> Result<PipelineData, LabeledError> {
+        let (old_vals, new_vals) = match (call.req(0)?, call.opt::<Value>(1)?) {
+            (Value::Record { val, .. }, None) => val
+                .iter()
+                .map(|(key, value)| (Value::string(key, call.head), value.clone()))
+                .collect::<Vec<(Value, Value)>>()
+                .into_iter()
+                .unzip(),
+            (Value::List { vals: old_vals, .. }, Some(Value::List { vals: new_vals, .. })) => {
+                (old_vals, new_vals)
+            }
+            (_, _) => {
+                return Err(LabeledError::from(ShellError::GenericError {
+                    error: "Invalid arguments".into(),
+                    msg: "".into(),
+                    span: Some(call.head),
+                    help: Some("`old` must be either a record or list. If `old` is a record, then `new` must not be specified. Otherwise, `new` must also be a list".into()),
+                    inner: vec![],
+                }));
+            }
+        };
+        // let new_vals: Vec<Value> = call.req(1)?;
+        let old = values_to_expr(plugin, call.head, old_vals)?;
+        let new = values_to_expr(plugin, call.head, new_vals)?;
+        let strict = call.has_flag("strict")?;
+        let return_dtype = match call.get_flag::<String>("return-dtype")? {
+            Some(dtype) => {
+                if !strict {
+                    return Err(LabeledError::from(ShellError::GenericError {
+                        error: "`return-dtype` may only be used with `strict`".into(),
+                        msg: "".into(),
+                        span: Some(call.head),
+                        help: None,
+                        inner: vec![],
+                    }));
+                }
+                Some(str_to_dtype(&dtype, call.head)?)
+            }
+            None => None,
+        };
+        let default = match call.get_flag::<Value>("default")? {
+            Some(default) => {
+                if !strict {
+                    return Err(LabeledError::from(ShellError::GenericError {
+                        error: "`default` may only be used with `strict`".into(),
+                        msg: "".into(),
+                        span: Some(call.head),
+                        help: None,
+                        inner: vec![],
+                    }));
+                }
+                Some(values_to_expr(plugin, call.head, vec![default])?)
+            }
+            None => None,
+        };
+        let expr = NuExpression::try_from_pipeline(plugin, input, call.head)?;
+        let expr: NuExpression = if strict {
+            expr.into_polars()
+                .replace_strict(old, new, default, return_dtype)
+                .into()
+        } else {
+            expr.into_polars().replace(old, new).into()
+        };
+        expr.to_pipeline_data(plugin, engine, call.head)
+            .map_err(LabeledError::from)
+    }
+}
+fn values_to_expr(
+    plugin: &PolarsPlugin,
+    span: Span,
+    values: Vec<Value>,
+) -> Result<Expr, ShellError> {
+    match values.first() {
+        Some(Value::Int { .. }) => {
+            let series_values = values
+                .into_iter()
+                .filter_map(|v| match v {
+                    Value::Int { val, .. } => Some(val),
+                    _ => None,
+                })
+                .collect::<Vec<i64>>();
+            Ok(lit(Series::new("old".into(), &series_values)))
+        }
+        Some(Value::Bool { .. }) => {
+            let series_values = values
+                .into_iter()
+                .filter_map(|v| match v {
+                    Value::Bool { val, .. } => Some(val),
+                    _ => None,
+                })
+                .collect::<Vec<bool>>();
+            Ok(lit(Series::new("old".into(), &series_values)))
+        }
+        Some(Value::Float { .. }) => {
+            let series_values = values
+                .into_iter()
+                .filter_map(|v| match v {
+                    Value::Float { val, .. } => Some(val),
+                    _ => None,
+                })
+                .collect::<Vec<f64>>();
+            Ok(lit(Series::new("old".into(), &series_values)))
+        }
+        Some(Value::String { .. }) => {
+            let series_values = values
+                .into_iter()
+                .filter_map(|v| match v {
+                    Value::String { val, .. } => Some(val),
+                    _ => None,
+                })
+                .collect::<Vec<String>>();
+            Ok(lit(Series::new("old".into(), &series_values)))
+        }
+        Some(Value::Custom { .. }) => {
+            if values.len() > 1 {
+                return Err(ShellError::GenericError {
+                    error: "Multiple expressions to be replaced is not supported".into(),
+                    msg: "".into(),
+                    span: Some(span),
+                    help: None,
+                    inner: vec![],
+                });
+            }
+            NuExpression::try_from_value(
+                plugin,
+                values
+                    .first()
+                    .expect("Presence of first element is enforced at argument parsing."),
+            )
+            .map(|expr| expr.into_polars())
+        }
+        x @ Some(_) => Err(ShellError::GenericError {
+            error: "Cannot convert input to expression".into(),
+            msg: "".into(),
+            span: Some(span),
+            help: Some(format!("Unexpected type: {x:?}")),
+            inner: vec![],
+        }),
+        None => Err(ShellError::GenericError {
+            error: "Missing input values".into(),
+            msg: "".into(),
+            span: Some(span),
+            help: None,
+            inner: vec![],
+        }),
+    }
+}
+#[cfg(test)]
+mod test {
+    use super::*;
+    use crate::test::test_polars_plugin_command;
+    #[test]
+    fn test_examples() -> Result<(), nu_protocol::ShellError> {
+        test_polars_plugin_command(&Replace)
+    }
+}
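
One behaviour of the `values_to_expr` helper above is easy to miss: the type of the first list element selects the series type, and the `filter_map` then skips any later element of a different type instead of reporting an error. The following std-only sketch models that behaviour. `Val`, `Column` and `to_column` are hypothetical names used for illustration and only two of the value types are modelled.

```rust
/// Hypothetical stand-in for the Nushell values passed in a list.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for the typed series built from the list.
#[derive(Debug, PartialEq)]
enum Column {
    Ints(Vec<i64>),
    Strs(Vec<String>),
}

/// The first element decides the column type; mismatched elements are dropped.
fn to_column(values: Vec<Val>) -> Result<Column, String> {
    match values.first() {
        Some(Val::Int(_)) => Ok(Column::Ints(
            values
                .into_iter()
                .filter_map(|v| match v {
                    Val::Int(i) => Some(i),
                    _ => None, // silently skipped
                })
                .collect(),
        )),
        Some(Val::Str(_)) => Ok(Column::Strs(
            values
                .into_iter()
                .filter_map(|v| match v {
                    Val::Str(s) => Some(s),
                    _ => None, // silently skipped
                })
                .collect(),
        )),
        None => Err("Missing input values".to_string()),
    }
}

fn main() {
    // The string in the middle is dropped because the first element is an integer.
    let mixed = vec![Val::Int(1), Val::Str("x".into()), Val::Int(2)];
    println!("{:?}", to_column(mixed));
    println!("{:?}", to_column(vec![]));
}
```

If the real helper behaves like this model, a mixed-type `old` or `new` list would yield a shorter series than the caller wrote, so keeping each list homogeneous is the safer usage.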


@@ -1,9 +1,9 @@
 mod concat_str;
 mod contains;
-mod replace;
-mod replace_all;
 mod str_join;
 mod str_lengths;
+mod str_replace;
+mod str_replace_all;
 mod str_slice;
 mod str_split;
 mod str_strip_chars;
@@ -15,10 +15,10 @@ use nu_plugin::PluginCommand;
 pub use concat_str::ExprConcatStr;
 pub use contains::Contains;
-pub use replace::Replace;
-pub use replace_all::ReplaceAll;
 pub use str_join::StrJoin;
 pub use str_lengths::StrLengths;
+pub use str_replace::StrReplace;
+pub use str_replace_all::StrReplaceAll;
 pub use str_slice::StrSlice;
 pub use to_lowercase::ToLowerCase;
 pub use to_uppercase::ToUpperCase;
@@ -27,8 +27,8 @@ pub(crate) fn string_commands() -> Vec<Box<dyn PluginCommand<Plugin = PolarsPlug
     vec![
         Box::new(ExprConcatStr),
         Box::new(Contains),
-        Box::new(Replace),
-        Box::new(ReplaceAll),
+        Box::new(StrReplace),
+        Box::new(StrReplaceAll),
         Box::new(str_split::StrSplit),
         Box::new(str_strip_chars::StrStripChars),
         Box::new(StrJoin),


@@ -15,13 +15,13 @@ use nu_protocol::{
 use polars::prelude::{IntoSeries, StringNameSpaceImpl, lit};
 #[derive(Clone)]
-pub struct Replace;
+pub struct StrReplace;
-impl PluginCommand for Replace {
+impl PluginCommand for StrReplace {
     type Plugin = PolarsPlugin;
     fn name(&self) -> &str {
-        "polars replace"
+        "polars str-replace"
     }
     fn description(&self) -> &str {
@@ -59,7 +59,7 @@ impl PluginCommand for Replace {
         vec![
             Example {
                 description: "Replaces string in column",
-                example: "[[a]; [abc] [abcabc]] | polars into-df | polars select (polars col a | polars replace --pattern ab --replace AB) | polars collect",
+                example: "[[a]; [abc] [abcabc]] | polars into-df | polars select (polars col a | polars str-replace --pattern ab --replace AB) | polars collect",
                 result: Some(
                     NuDataFrame::try_from_columns(
                         vec![Column::new(
@@ -74,7 +74,7 @@ impl PluginCommand for Replace {
             },
             Example {
                 description: "Replaces string",
-                example: "[abc abc abc] | polars into-df | polars replace --pattern ab --replace AB",
+                example: "[abc abc abc] | polars into-df | polars str-replace --pattern ab --replace AB",
                 result: Some(
                     NuDataFrame::try_from_columns(
                         vec![Column::new(
@@ -101,7 +101,6 @@ impl PluginCommand for Replace {
         call: &EvaluatedCall,
         input: PipelineData,
     ) -> Result<PipelineData, LabeledError> {
-        let metadata = input.metadata();
         let value = input.into_value(call.head)?;
         match PolarsPluginObject::try_from_value(plugin, &value)? {
             PolarsPluginObject::NuDataFrame(df) => command_df(plugin, engine, call, df),
@@ -119,7 +118,6 @@ impl PluginCommand for Replace {
             )),
         }
         .map_err(LabeledError::from)
-        .map(|pd| pd.set_metadata(metadata))
     }
 }
@@ -193,6 +191,6 @@ mod test {
     #[test]
     fn test_examples() -> Result<(), ShellError> {
-        test_polars_plugin_command(&Replace)
+        test_polars_plugin_command(&StrReplace)
     }
 }


@@ -15,13 +15,13 @@ use nu_protocol::{
 use polars::prelude::{IntoSeries, StringNameSpaceImpl, lit};
 #[derive(Clone)]
-pub struct ReplaceAll;
+pub struct StrReplaceAll;
-impl PluginCommand for ReplaceAll {
+impl PluginCommand for StrReplaceAll {
     type Plugin = PolarsPlugin;
     fn name(&self) -> &str {
-        "polars replace-all"
+        "polars str-replace-all"
     }
     fn description(&self) -> &str {
@@ -59,7 +59,7 @@ impl PluginCommand for ReplaceAll {
         vec![
             Example {
                 description: "Replaces string in a column",
-                example: "[[a]; [abac] [abac] [abac]] | polars into-df | polars select (polars col a | polars replace-all --pattern a --replace A) | polars collect",
+                example: "[[a]; [abac] [abac] [abac]] | polars into-df | polars select (polars col a | polars str-replace-all --pattern a --replace A) | polars collect",
                 result: Some(
                     NuDataFrame::try_from_columns(
                         vec![Column::new(
@@ -78,7 +78,7 @@ impl PluginCommand for ReplaceAll {
             },
             Example {
                 description: "Replaces string",
-                example: "[abac abac abac] | polars into-df | polars replace-all --pattern a --replace A",
+                example: "[abac abac abac] | polars into-df | polars str-replace-all --pattern a --replace A",
                 result: Some(
                     NuDataFrame::try_from_columns(
                         vec![Column::new(
@@ -105,7 +105,6 @@ impl PluginCommand for ReplaceAll {
         call: &EvaluatedCall,
         input: PipelineData,
     ) -> Result<PipelineData, LabeledError> {
-        let metadata = input.metadata();
         let value = input.into_value(call.head)?;
         match PolarsPluginObject::try_from_value(plugin, &value)? {
             PolarsPluginObject::NuDataFrame(df) => command_df(plugin, engine, call, df),
@@ -123,7 +122,6 @@ impl PluginCommand for ReplaceAll {
             )),
         }
         .map_err(LabeledError::from)
-        .map(|pd| pd.set_metadata(metadata))
     }
 }
@@ -197,6 +195,6 @@ mod test {
     #[test]
     fn test_examples() -> Result<(), ShellError> {
-        test_polars_plugin_command(&ReplaceAll)
+        test_polars_plugin_command(&StrReplaceAll)
     }
 }