nushell/crates/nu-command/src/sort_utils.rs
Benjamin Lee 53beba7acc
Support passing an empty list to sort, uniq, sort-by, and uniq-by (issue #5957) (#8669)
# Description

Currently, all four of these commands return a (rather-confusing)
spanless error when passed an empty list:

```
> [] | sort
Error: 
  × no values to work with
  help: no values to work with
```

This PR changes these commands to always output `[]` if the input is
`[]`.

```
> [] | sort
╭────────────╮
│ empty list │
╰────────────╯
> [] | uniq-by foo
╭────────────╮
│ empty list │
╰────────────╯
```

I'm not sure what the original logic was here, but in the case of `sort`
and `uniq`, I think the current behavior is straightforwardly wrong.

`sort-by` and `uniq-by` are a bit more complicated, since they currently
try to perform some validation that the specified column name is present
in the input (see #8667 for problems with this validation, where a
possible outcome is removing the validation entirely). When passed `[]`,
it's not possible to do any validation because there are no records.
This opens up the possibility for situations like the following:

```
> [[foo]; [5] [6]] | where foo < 3 | sort-by bar
╭────────────╮
│ empty list │
╰────────────╯
```

I think there's a strong argument that `[]` is the best output for these
commands as well, since it makes pipelines like `$table | filter
$condition | sort-by $column` more predictable. Currently, this pipeline
will throw an error if `filter` evaluates to `[]`, but work fine
otherwise. This makes it difficult to write reliable code, especially
since users are not likely to encounter the `filter -> []` case in
testing (issue #5957). The only workaround is to insert manual checks
for an empty result. IMO, this is significantly worse than the "you can
typo a column name without getting an error" problem shown above.

Other commands that take column arguments (`get`, `select`, `rename`,
etc) already have `[] -> []`, so there's existing precedent for this
behavior.

The core question here is "what columns does `[]` have"? The current
behavior of `sort-by` is "no columns", while the current behavior of
`select` is "all possible columns". Both answers lead to accepting some
likely-buggy code without throwing on error, but in order to do better
here we would need something like `Value::Table` that tracks columns on
empty tables.

If other people disagree with this logic, I'm happy to split out the
`sort-by` and `uniq-by` changes into another PR.

# User-Facing Changes

`sort`, `uniq`, `sort-by`, and `uniq-by` now return `[]` instead of
throwing an error when input is `[]`.

# After Submitting

> If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.

The existing behavior was not documented, and the new behavior is what
you would expect by default, so I don't think we need to update
documentation.

---------

Co-authored-by: Reilly Wood <reilly.wood@icloud.com>
2023-03-29 19:55:38 -07:00

363 lines
12 KiB
Rust

use alphanumeric_sort::compare_str;
use nu_engine::column::nonexistent_column;
use nu_protocol::{ShellError, Span, Value};
use std::cmp::Ordering;
// This module includes sorting functionality that is useful in sort-by and elsewhere.
// Eventually it would be nice to find a better home for it; sorting logic is only coupled
// to commands for historical reasons.
/// Sort a value. This only makes sense for lists and list-like things,
/// so for everything else we just return the value as-is.
/// CustomValues are converted to their base value and then sorted.
pub fn sort_value(
val: &Value,
sort_columns: Vec<String>,
ascending: bool,
insensitive: bool,
natural: bool,
) -> Result<Value, ShellError> {
match val {
Value::List { vals, span } => {
let mut vals = vals.clone();
sort(&mut vals, sort_columns, *span, insensitive, natural)?;
if !ascending {
vals.reverse();
}
Ok(Value::List { vals, span: *span })
}
Value::CustomValue { val, span } => {
let base_val = val.to_base_value(*span)?;
sort_value(&base_val, sort_columns, ascending, insensitive, natural)
}
_ => Ok(val.to_owned()),
}
}
/// Sort a value in-place. This is more efficient than sort_value() because it
/// avoids cloning, but it does not work for CustomValues; they are returned as-is.
pub fn sort_value_in_place(
val: &mut Value,
sort_columns: Vec<String>,
ascending: bool,
insensitive: bool,
natural: bool,
) -> Result<(), ShellError> {
if let Value::List { vals, span } = val {
sort(vals, sort_columns, *span, insensitive, natural)?;
if !ascending {
vals.reverse();
}
}
Ok(())
}
pub fn sort(
vec: &mut [Value],
sort_columns: Vec<String>,
span: Span,
insensitive: bool,
natural: bool,
) -> Result<(), ShellError> {
match vec.first() {
Some(Value::Record {
cols,
vals: _input_vals,
span: val_span,
}) => {
if sort_columns.is_empty() {
// This uses the same format as the 'requires a column name' error in split_by.rs
return Err(ShellError::GenericError(
"expected name".into(),
"requires a column name to sort table data".into(),
Some(span),
None,
Vec::new(),
));
}
if let Some(nonexistent) = nonexistent_column(sort_columns.clone(), cols.to_vec()) {
return Err(ShellError::CantFindColumn {
col_name: nonexistent,
span,
src_span: *val_span,
});
}
// check to make sure each value in each column in the record
// that we asked for is a string. So, first collect all the columns
// that we asked for into vals, then later make sure they're all
// strings.
let mut vals = vec![];
for item in vec.iter() {
for col in &sort_columns {
let val = item
.get_data_by_key(col)
.unwrap_or_else(|| Value::nothing(Span::unknown()));
vals.push(val);
}
}
let should_sort_case_insensitively = insensitive
&& vals
.iter()
.all(|x| matches!(x.get_type(), nu_protocol::Type::String));
let should_sort_case_naturally = natural
&& vals
.iter()
.all(|x| matches!(x.get_type(), nu_protocol::Type::String));
vec.sort_by(|a, b| {
compare(
a,
b,
&sort_columns,
span,
should_sort_case_insensitively,
should_sort_case_naturally,
)
});
}
_ => {
vec.sort_by(|a, b| {
if insensitive {
let lowercase_left = match a {
Value::String { val, span } => Value::String {
val: val.to_ascii_lowercase(),
span: *span,
},
_ => a.clone(),
};
let lowercase_right = match b {
Value::String { val, span } => Value::String {
val: val.to_ascii_lowercase(),
span: *span,
},
_ => b.clone(),
};
if natural {
match (lowercase_left.as_string(), lowercase_right.as_string()) {
(Ok(left), Ok(right)) => compare_str(left, right),
_ => Ordering::Equal,
}
} else {
lowercase_left
.partial_cmp(&lowercase_right)
.unwrap_or(Ordering::Equal)
}
} else if natural {
match (a.as_string(), b.as_string()) {
(Ok(left), Ok(right)) => compare_str(left, right),
_ => Ordering::Equal,
}
} else {
a.partial_cmp(b).unwrap_or(Ordering::Equal)
}
});
}
}
Ok(())
}
pub fn compare(
left: &Value,
right: &Value,
columns: &[String],
span: Span,
insensitive: bool,
natural: bool,
) -> Ordering {
for column in columns {
let left_value = left.get_data_by_key(column);
let left_res = match left_value {
Some(left_res) => left_res,
None => Value::Nothing { span },
};
let right_value = right.get_data_by_key(column);
let right_res = match right_value {
Some(right_res) => right_res,
None => Value::Nothing { span },
};
let result = if insensitive {
let lowercase_left = match left_res {
Value::String { val, span } => Value::String {
val: val.to_ascii_lowercase(),
span,
},
_ => left_res,
};
let lowercase_right = match right_res {
Value::String { val, span } => Value::String {
val: val.to_ascii_lowercase(),
span,
},
_ => right_res,
};
if natural {
match (lowercase_left.as_string(), lowercase_right.as_string()) {
(Ok(left), Ok(right)) => compare_str(left, right),
_ => Ordering::Equal,
}
} else {
lowercase_left
.partial_cmp(&lowercase_right)
.unwrap_or(Ordering::Equal)
}
} else if natural {
match (left_res.as_string(), right_res.as_string()) {
(Ok(left), Ok(right)) => compare_str(left, right),
_ => Ordering::Equal,
}
} else {
left_res.partial_cmp(&right_res).unwrap_or(Ordering::Equal)
};
if result != Ordering::Equal {
return result;
}
}
Ordering::Equal
}
#[test]
fn test_sort_value() {
let val = Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
],
span: Span::test_data(),
};
let sorted_alphabetically =
sort_value(&val, vec!["fruit".to_string()], true, false, false).unwrap();
assert_eq!(
sorted_alphabetically,
Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
],
span: Span::test_data(),
}
);
let sorted_by_count_desc =
sort_value(&val, vec!["count".to_string()], false, false, false).unwrap();
assert_eq!(
sorted_by_count_desc,
Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
],
span: Span::test_data(),
}
);
}
#[test]
fn test_sort_value_in_place() {
let mut val = Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
],
span: Span::test_data(),
};
sort_value_in_place(&mut val, vec!["fruit".to_string()], true, false, false).unwrap();
assert_eq!(
val,
Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
],
span: Span::test_data(),
}
);
sort_value_in_place(&mut val, vec!["count".to_string()], false, false, false).unwrap();
assert_eq!(
val,
Value::List {
vals: vec![
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("apple"), Value::test_int(9)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("orange"), Value::test_int(7)],
),
Value::test_record(
vec!["fruit", "count"],
vec![Value::test_string("pear"), Value::test_int(3)],
),
],
span: Span::test_data(),
}
);
}