WMR b67b6f7fc5
Add a datepart expression for dfr to be used with dfr with-column (#9285)
# Description

Today the only way to extract date parts from a dfr series is the
`dfr get-*` set of commands. These create a new dataframe containing
only the date part, which is almost entirely useless: as far as I can
tell, there's no way to append the result as a series to the original
dataframe. In discussion with fdncred on Discord, we decided the best
route was to add an expression that can be used to create or modify
columns with `dfr with-column`. Expressions are the way you manipulate
series within a dataframe.

I'd like feedback on this approach; I think it's a fair way to handle
things. An example to test it:

```nushell
[[record_time]; [(date now)]] | dfr into-df | dfr with-column [
    ((dfr col record_time) | dfr datepart nanosecond | dfr as "ns"),
    (dfr col record_time | dfr datepart second | dfr as "s"),
    (dfr col record_time | dfr datepart minute | dfr as "m"),
    (dfr col record_time | dfr datepart hour | dfr as "h")
]
```
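
As a rough guide to what the four new columns hold, this sketch derives the same parts from a raw timestamp. It assumes the value is nanoseconds since the Unix epoch read in UTC, and it ignores time zones and leap seconds; `DateParts` and `date_parts` are hypothetical helpers, not part of Polars or Nushell. The real expression operates on Polars datetime columns, so this only illustrates the arithmetic.

```rust
/// Date parts matching the "h", "m", "s" and "ns" columns in the example.
/// Hypothetical helper type; assumes UTC nanoseconds since the Unix epoch.
#[derive(Debug, PartialEq)]
struct DateParts {
    hour: i64,
    minute: i64,
    second: i64,
    nanosecond: i64,
}

fn date_parts(ts_ns: i64) -> DateParts {
    const NS_PER_SEC: i64 = 1_000_000_000;
    const SECS_PER_DAY: i64 = 86_400;
    // Euclidean division keeps pre-1970 (negative) timestamps on the
    // correct side of each second and day boundary.
    let secs = ts_ns.div_euclid(NS_PER_SEC);
    let nanosecond = ts_ns.rem_euclid(NS_PER_SEC);
    // Seconds elapsed since midnight UTC (leap seconds ignored, as in Unix time).
    let secs_of_day = secs.rem_euclid(SECS_PER_DAY);
    DateParts {
        hour: secs_of_day / 3_600,
        minute: (secs_of_day % 3_600) / 60,
        second: secs_of_day % 60,
        nanosecond,
    }
}
```

For example, `date_parts(1_685_457_678_123_456_789)` (2023-05-30 14:41:18.123456789 UTC) yields hour 14, minute 41, second 18 and nanosecond 123456789.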

I'm also proposing that we deprecate the `dfr get-*` commands. I haven't been able to find any meaningful use for them, and this approach makes more sense because it attaches the extracted date part to its row in the original dataframe as a new column.

# User-Facing Changes

Adds `dfr datepart` as an expression.

# Tests + Formatting
I still need to add better assertion-based tests. I'm also not sure how to properly write the `test_dataframe` test at the bottom, but I will revisit it as part of this PR. I wanted to get feedback early.

# After Submitting

---------

Co-authored-by: Robert Waugh <robert@waugh.io>
2023-05-30 09:41:18 -05:00

mod custom_value;

use nu_protocol::{PipelineData, ShellError, Span, Value};
use polars::prelude::{col, AggExpr, Expr, Literal};
use serde::{Deserialize, Deserializer, Serialize, Serializer};

// Polars expression wrapper for Nushell operations.
// The expression is held in an Option so that the Deserialize trait
// can be implemented easily (deserializing yields an empty wrapper).
#[derive(Default, Clone, Debug)]
pub struct NuExpression(Option<Expr>);

// Mocked serialization of the NuExpression object
impl Serialize for NuExpression {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.serialize_none()
    }
}

// Mocked deserialization of the NuExpression object
impl<'de> Deserialize<'de> for NuExpression {
    fn deserialize<D>(_deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        Ok(NuExpression::default())
    }
}

// Referenced access to the real Expr
impl AsRef<Expr> for NuExpression {
    fn as_ref(&self) -> &polars::prelude::Expr {
        // The only case when there cannot be an expr is if it is created
        // using the default function or if created by deserializing something
        self.0.as_ref().expect("there should always be an expression")
    }
}

impl AsMut<Expr> for NuExpression {
    fn as_mut(&mut self) -> &mut polars::prelude::Expr {
        // The only case when there cannot be an expr is if it is created
        // using the default function or if created by deserializing something
        self.0.as_mut().expect("there should always be an expression")
    }
}

impl From<Expr> for NuExpression {
    fn from(expr: Expr) -> Self {
        Self(Some(expr))
    }
}

impl NuExpression {
    pub fn into_value(self, span: Span) -> Value {
        Value::CustomValue {
            val: Box::new(self),
            span,
        }
    }

    pub fn try_from_value(value: Value) -> Result<Self, ShellError> {
        match value {
            Value::CustomValue { val, span } => match val.as_any().downcast_ref::<Self>() {
                Some(expr) => Ok(NuExpression(expr.0.clone())),
                None => Err(ShellError::CantConvert {
                    to_type: "lazy expression".into(),
                    from_type: "non-dataframe".into(),
                    span,
                    help: None,
                }),
            },
            Value::String { val, .. } => Ok(val.lit().into()),
            Value::Int { val, .. } => Ok(val.lit().into()),
            Value::Bool { val, .. } => Ok(val.lit().into()),
            Value::Float { val, .. } => Ok(val.lit().into()),
            x => Err(ShellError::CantConvert {
                to_type: "lazy expression".into(),
                from_type: x.get_type().to_string(),
                span: x.span()?,
                help: None,
            }),
        }
    }

    pub fn try_from_pipeline(input: PipelineData, span: Span) -> Result<Self, ShellError> {
        let value = input.into_value(span);
        Self::try_from_value(value)
    }

    pub fn can_downcast(value: &Value) -> bool {
        match value {
            Value::CustomValue { val, .. } => val.as_any().downcast_ref::<Self>().is_some(),
            Value::List { vals, .. } => vals.iter().all(Self::can_downcast),
            Value::String { .. } | Value::Int { .. } | Value::Bool { .. } | Value::Float { .. } => {
                true
            }
            _ => false,
        }
    }

    pub fn into_polars(self) -> Expr {
        self.0.expect("Expression cannot be none to convert")
    }

    pub fn apply_with_expr<F>(self, other: NuExpression, f: F) -> Self
    where
        F: Fn(Expr, Expr) -> Expr,
    {
        let expr = self.0.expect("Lazy expression must not be empty to apply");
        let other = other.0.expect("Lazy expression must not be empty to apply");
        f(expr, other).into()
    }

    pub fn to_value(&self, span: Span) -> Value {
        expr_to_value(self.as_ref(), span)
    }

    // Convenient function to extract multiple Expr that could be inside a nushell Value
    pub fn extract_exprs(value: Value) -> Result<Vec<Expr>, ShellError> {
        ExtractedExpr::extract_exprs(value).map(ExtractedExpr::into_exprs)
    }
}

// Enum to represent the parsing of the expressions from Value
#[derive(Debug)]
enum ExtractedExpr {
    Single(Expr),
    List(Vec<ExtractedExpr>),
}

impl ExtractedExpr {
    fn into_exprs(self) -> Vec<Expr> {
        match self {
            Self::Single(expr) => vec![expr],
            Self::List(expressions) => expressions
                .into_iter()
                .flat_map(ExtractedExpr::into_exprs)
                .collect(),
        }
    }

    fn extract_exprs(value: Value) -> Result<ExtractedExpr, ShellError> {
        match value {
            Value::String { val, .. } => Ok(ExtractedExpr::Single(col(val.as_str()))),
            Value::CustomValue { .. } => NuExpression::try_from_value(value)
                .map(NuExpression::into_polars)
                .map(ExtractedExpr::Single),
            Value::List { vals, .. } => vals
                .into_iter()
                .map(Self::extract_exprs)
                .collect::<Result<Vec<ExtractedExpr>, ShellError>>()
                .map(ExtractedExpr::List),
            x => Err(ShellError::CantConvert {
                to_type: "expression".into(),
                from_type: x.get_type().to_string(),
                span: x.span()?,
                help: None,
            }),
        }
    }
}

pub fn expr_to_value(expr: &Expr, span: Span) -> Value {
    let cols = vec!["expr".to_string(), "value".to_string()];
    match expr {
        Expr::Alias(expr, alias) => {
            let expr = expr_to_value(expr.as_ref(), span);
            let alias = Value::String {
                val: alias.as_ref().into(),
                span,
            };
            let cols = vec!["expr".into(), "alias".into()];
            Value::Record {
                cols,
                vals: vec![expr, alias],
                span,
            }
        }
        Expr::Column(name) => {
            let expr_type = Value::String {
                val: "column".to_string(),
                span,
            };
            let value = Value::String {
                val: name.to_string(),
                span,
            };
            let vals = vec![expr_type, value];
            Value::Record { cols, vals, span }
        }
        Expr::Columns(columns) => {
            let expr_type = Value::String {
                val: "columns".into(),
                span,
            };
            let value = Value::List {
                vals: columns
                    .iter()
                    .map(|col| Value::String {
                        val: col.clone(),
                        span,
                    })
                    .collect(),
                span,
            };
            let vals = vec![expr_type, value];
            Value::Record { cols, vals, span }
        }
        Expr::Literal(literal) => {
            let expr_type = Value::String {
                val: "literal".into(),
                span,
            };
            let value = Value::String {
                val: format!("{literal:?}"),
                span,
            };
            let vals = vec![expr_type, value];
            Value::Record { cols, vals, span }
        }
        Expr::BinaryExpr { left, op, right } => {
            let left_val = expr_to_value(left, span);
            let right_val = expr_to_value(right, span);
            let operator = Value::String {
                val: format!("{op:?}"),
                span,
            };
            let cols = vec!["left".into(), "op".into(), "right".into()];
            Value::Record {
                cols,
                vals: vec![left_val, operator, right_val],
                span,
            }
        }
        Expr::Ternary {
            predicate,
            truthy,
            falsy,
        } => {
            let predicate = expr_to_value(predicate.as_ref(), span);
            let truthy = expr_to_value(truthy.as_ref(), span);
            let falsy = expr_to_value(falsy.as_ref(), span);
            let cols = vec!["predicate".into(), "truthy".into(), "falsy".into()];
            Value::Record {
                cols,
                vals: vec![predicate, truthy, falsy],
                span,
            }
        }
        Expr::Agg(agg_expr) => {
            let value = match agg_expr {
                AggExpr::Min { input: expr, .. }
                | AggExpr::Max { input: expr, .. }
                | AggExpr::Median(expr)
                | AggExpr::NUnique(expr)
                | AggExpr::First(expr)
                | AggExpr::Last(expr)
                | AggExpr::Mean(expr)
                | AggExpr::Implode(expr)
                | AggExpr::Count(expr)
                | AggExpr::Sum(expr)
                | AggExpr::AggGroups(expr)
                | AggExpr::Std(expr, _)
                | AggExpr::Var(expr, _) => expr_to_value(expr.as_ref(), span),
                AggExpr::Quantile {
                    expr,
                    quantile,
                    interpol,
                } => {
                    let expr = expr_to_value(expr.as_ref(), span);
                    let quantile = expr_to_value(quantile.as_ref(), span);
                    let interpol = Value::String {
                        val: format!("{interpol:?}"),
                        span,
                    };
                    let cols = vec!["expr".into(), "quantile".into(), "interpol".into()];
                    Value::Record {
                        cols,
                        vals: vec![expr, quantile, interpol],
                        span,
                    }
                }
            };
            let expr_type = Value::String {
                val: "agg".into(),
                span,
            };
            let vals = vec![expr_type, value];
            Value::Record { cols, vals, span }
        }
        Expr::Count => {
            let expr = Value::String {
                val: "count".into(),
                span,
            };
            let cols = vec!["expr".into()];
            Value::Record {
                cols,
                vals: vec![expr],
                span,
            }
        }
        Expr::Wildcard => {
            let expr = Value::String {
                val: "wildcard".into(),
                span,
            };
            let cols = vec!["expr".into()];
            Value::Record {
                cols,
                vals: vec![expr],
                span,
            }
        }
        Expr::Explode(expr) => {
            let expr = expr_to_value(expr.as_ref(), span);
            let cols = vec!["expr".into()];
            Value::Record {
                cols,
                vals: vec![expr],
                span,
            }
        }
        Expr::KeepName(expr) => {
            let expr = expr_to_value(expr.as_ref(), span);
            let cols = vec!["expr".into()];
            Value::Record {
                cols,
                vals: vec![expr],
                span,
            }
        }
        Expr::Nth(i) => {
            let expr = Value::int(*i, span);
            let cols = vec!["expr".into()];
            Value::Record {
                cols,
                vals: vec![expr],
                span,
            }
        }
        Expr::DtypeColumn(dtypes) => {
            let vals = dtypes
                .iter()
                .map(|d| Value::String {
                    val: format!("{d}"),
                    span,
                })
                .collect();
            Value::List { vals, span }
        }
        Expr::Sort { expr, options } => {
            let expr = expr_to_value(expr.as_ref(), span);
            let options = Value::String {
                val: format!("{options:?}"),
                span,
            };
            let cols = vec!["expr".into(), "options".into()];
            Value::Record {
                cols,
                vals: vec![expr, options],
                span,
            }
        }
        Expr::Cast {
            expr,
            data_type,
            strict,
        } => {
            let expr = expr_to_value(expr.as_ref(), span);
            let dtype = Value::String {
                val: format!("{data_type:?}"),
                span,
            };
            let strict = Value::Bool { val: *strict, span };
            let cols = vec!["expr".into(), "dtype".into(), "strict".into()];
            Value::Record {
                cols,
                vals: vec![expr, dtype, strict],
                span,
            }
        }
        Expr::Take { expr, idx } => {
            let expr = expr_to_value(expr.as_ref(), span);
            let idx = expr_to_value(idx.as_ref(), span);
            let cols = vec!["expr".into(), "idx".into()];
            Value::Record {
                cols,
                vals: vec![expr, idx],
                span,
            }
        }
        Expr::SortBy {
            expr,
            by,
            descending,
        } => {
            let expr = expr_to_value(expr.as_ref(), span);
            let by: Vec<Value> = by.iter().map(|b| expr_to_value(b, span)).collect();
            let by = Value::List { vals: by, span };
            let descending: Vec<Value> = descending
                .iter()
                .map(|r| Value::Bool { val: *r, span })
                .collect();
            let descending = Value::List {
                vals: descending,
                span,
            };
            let cols = vec!["expr".into(), "by".into(), "descending".into()];
            Value::Record {
                cols,
                vals: vec![expr, by, descending],
                span,
            }
        }
        Expr::Filter { input, by } => {
            let input = expr_to_value(input.as_ref(), span);
            let by = expr_to_value(by.as_ref(), span);
            let cols = vec!["input".into(), "by".into()];
            Value::Record {
                cols,
                vals: vec![input, by],
                span,
            }
        }
        Expr::Slice {
            input,
            offset,
            length,
        } => {
            let input = expr_to_value(input.as_ref(), span);
            let offset = expr_to_value(offset.as_ref(), span);
            let length = expr_to_value(length.as_ref(), span);
            let cols = vec!["input".into(), "offset".into(), "length".into()];
            Value::Record {
                cols,
                vals: vec![input, offset, length],
                span,
            }
        }
        Expr::Exclude(expr, excluded) => {
            let expr = expr_to_value(expr.as_ref(), span);
            let excluded = excluded
                .iter()
                .map(|e| Value::String {
                    val: format!("{e:?}"),
                    span,
                })
                .collect();
            let excluded = Value::List {
                vals: excluded,
                span,
            };
            let cols = vec!["expr".into(), "excluded".into()];
            Value::Record {
                cols,
                vals: vec![expr, excluded],
                span,
            }
        }
        Expr::RenameAlias { expr, function } => {
            let expr = expr_to_value(expr.as_ref(), span);
            let function = Value::String {
                val: format!("{function:?}"),
                span,
            };
            let cols = vec!["expr".into(), "function".into()];
            Value::Record {
                cols,
                vals: vec![expr, function],
                span,
            }
        }
        Expr::AnonymousFunction {
            input,
            function,
            output_type,
            options,
        } => {
            let input: Vec<Value> = input.iter().map(|e| expr_to_value(e, span)).collect();
            let input = Value::List { vals: input, span };
            let function = Value::String {
                val: format!("{function:?}"),
                span,
            };
            let output_type = Value::String {
                val: format!("{output_type:?}"),
                span,
            };
            let options = Value::String {
                val: format!("{options:?}"),
                span,
            };
            let cols = vec![
                "input".into(),
                "function".into(),
                "output_type".into(),
                "options".into(),
            ];
            Value::Record {
                cols,
                vals: vec![input, function, output_type, options],
                span,
            }
        }
        Expr::Function {
            input,
            function,
            options,
        } => {
            let input: Vec<Value> = input.iter().map(|e| expr_to_value(e, span)).collect();
            let input = Value::List { vals: input, span };
            let function = Value::String {
                val: format!("{function:?}"),
                span,
            };
            let options = Value::String {
                val: format!("{options:?}"),
                span,
            };
            let cols = vec!["input".into(), "function".into(), "options".into()];
            Value::Record {
                cols,
                vals: vec![input, function, options],
                span,
            }
        }
        Expr::Cache { input, id } => {
            let input = expr_to_value(input.as_ref(), span);
            let id = Value::String {
                val: format!("{id:?}"),
                span,
            };
            let cols = vec!["input".into(), "id".into()];
            Value::Record {
                cols,
                vals: vec![input, id],
                span,
            }
        }
        Expr::Window {
            function,
            partition_by,
            order_by,
            options,
        } => {
            let function = expr_to_value(function, span);
            let partition_by: Vec<Value> = partition_by
                .iter()
                .map(|e| expr_to_value(e, span))
                .collect();
            let partition_by = Value::List {
                vals: partition_by,
                span,
            };
            let order_by = order_by
                .as_ref()
                .map(|e| expr_to_value(e.as_ref(), span))
                .unwrap_or_else(|| Value::nothing(span));
            let options = Value::String {
                val: format!("{options:?}"),
                span,
            };
            let cols = vec![
                "function".into(),
                "partition_by".into(),
                "order_by".into(),
                "options".into(),
            ];
            Value::Record {
                cols,
                vals: vec![function, partition_by, order_by, options],
                span,
            }
        }
    }
}
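
Two ideas carry most of this file: flattening possibly nested Nushell values into a list of expressions (`ExtractedExpr`), and turning an expression tree back into nested records for display (`expr_to_value`). Both can be sketched with self-contained stand-ins. The `Expr` and `Value` enums and the `record` helper below are simplified, hypothetical substitutes for `polars::prelude::Expr` and `nu_protocol::Value`, and the sketch folds the two-step parse-then-flatten of `ExtractedExpr` into a single recursive function.

```rust
// Simplified, hypothetical stand-ins for polars::prelude::Expr and
// nu_protocol::Value; the real enums have many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Alias(Box<Expr>, String),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Int(i64),
    List(Vec<Value>),
    // Plays the role of Value::CustomValue wrapping a NuExpression.
    Expr(Expr),
    Record { cols: Vec<String>, vals: Vec<Value> },
}

// Strings name columns, wrapped expressions pass through, lists are
// flattened recursively, and anything else is rejected.
fn extract_exprs(value: Value) -> Result<Vec<Expr>, String> {
    match value {
        Value::String(name) => Ok(vec![Expr::Column(name)]),
        Value::Expr(expr) => Ok(vec![expr]),
        Value::List(vals) => {
            let mut out = Vec::new();
            for v in vals {
                // `?` stops at the first element that cannot be converted.
                out.extend(extract_exprs(v)?);
            }
            Ok(out)
        }
        other => Err(format!("cannot convert {other:?} to an expression")),
    }
}

// Builds a record from column names and values.
fn record(cols: &[&str], vals: Vec<Value>) -> Value {
    Value::Record {
        cols: cols.iter().map(|c| c.to_string()).collect(),
        vals,
    }
}

// Leaf expressions become {expr, value} records; an alias nests the
// description of its inner expression under "expr".
fn expr_to_value(expr: &Expr) -> Value {
    match expr {
        Expr::Column(name) => record(
            &["expr", "value"],
            vec![Value::String("column".into()), Value::String(name.clone())],
        ),
        Expr::Literal(n) => record(
            &["expr", "value"],
            vec![Value::String("literal".into()), Value::String(format!("{n:?}"))],
        ),
        Expr::Alias(inner, alias) => record(
            &["expr", "alias"],
            vec![expr_to_value(inner), Value::String(alias.clone())],
        ),
    }
}
```

In this toy, a nested list such as `["a", [<literal 1>, "b"]]` flattens to `Column("a")`, `Literal(1)`, `Column("b")`, which is the same left-to-right order the `flat_map` in `into_exprs` produces.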