mirror of
https://github.com/nushell/nushell.git
synced 2025-08-10 09:18:04 +02:00
Add run-time type checking for command pipeline input (#14741)
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> <type>`. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table<a: int>`, whereas the type returned by `Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list<any> and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> * Play whack-a-mole with the commands and scripts this will inevitably break
This commit is contained in:
@ -122,6 +122,46 @@ impl PipelineData {
|
||||
}
|
||||
}
|
||||
|
||||
/// Determine if the `PipelineData` is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`.
|
||||
///
|
||||
/// This check makes no effort to collect a stream, so it may be a different result
|
||||
/// than would be returned by calling [`Value::is_subtype()`] on the result of
|
||||
/// [`.into_value()`](Self::into_value).
|
||||
///
|
||||
/// A `ListStream` acts the same as an empty list type: it is a subtype of any [`list`](Type::List)
|
||||
/// or [`table`](Type::Table) type. After converting to a value, it may become a more specific type.
|
||||
/// For example, a `ListStream` is a subtype of `list<int>` and `list<string>`.
|
||||
/// If calling [`.into_value()`](Self::into_value) results in a `list<int>`,
|
||||
/// then the value would not be a subtype of `list<string>`, in contrast to the original `ListStream`.
|
||||
///
|
||||
/// A `ByteStream` is a subtype of [`string`](Type::String) if it is coercible into a string.
|
||||
/// Likewise, a `ByteStream` is a subtype of [`binary`](Type::Binary) if it is coercible into a binary value.
|
||||
pub fn is_subtype_of(&self, other: &Type) -> bool {
|
||||
match (self, other) {
|
||||
(_, Type::Any) => true,
|
||||
(PipelineData::Empty, Type::Nothing) => true,
|
||||
(PipelineData::Value(val, ..), ty) => val.is_subtype_of(ty),
|
||||
|
||||
// a list stream could be a list with any type, including a table
|
||||
(PipelineData::ListStream(..), Type::List(..) | Type::Table(..)) => true,
|
||||
|
||||
(PipelineData::ByteStream(stream, ..), Type::String)
|
||||
if stream.type_().is_string_coercible() =>
|
||||
{
|
||||
true
|
||||
}
|
||||
(PipelineData::ByteStream(stream, ..), Type::Binary)
|
||||
if stream.type_().is_binary_coercible() =>
|
||||
{
|
||||
true
|
||||
}
|
||||
|
||||
(PipelineData::Empty, _) => false,
|
||||
(PipelineData::ListStream(..), _) => false,
|
||||
(PipelineData::ByteStream(..), _) => false,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn into_value(self, span: Span) -> Result<Value, ShellError> {
|
||||
match self {
|
||||
PipelineData::Empty => Ok(Value::nothing(span)),
|
||||
|
@ -51,7 +51,11 @@ impl Type {
|
||||
Self::Custom(name.into())
|
||||
}
|
||||
|
||||
pub fn is_subtype(&self, other: &Type) -> bool {
|
||||
/// Determine of the [`Type`] is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`.
|
||||
///
|
||||
/// This should only be used at parse-time. If you have a concrete [`Value`](crate::Value) or [`PipelineData`](crate::PipelineData),
|
||||
/// you should use their respective [`is_subtype_of`] methods instead.
|
||||
pub fn is_subtype_of(&self, other: &Type) -> bool {
|
||||
// Structural subtyping
|
||||
let is_subtype_collection = |this: &[(String, Type)], that: &[(String, Type)]| {
|
||||
if this.is_empty() || that.is_empty() {
|
||||
@ -61,7 +65,7 @@ impl Type {
|
||||
} else {
|
||||
that.iter().all(|(col_y, ty_y)| {
|
||||
if let Some((_, ty_x)) = this.iter().find(|(col_x, _)| col_x == col_y) {
|
||||
ty_x.is_subtype(ty_y)
|
||||
ty_x.is_subtype_of(ty_y)
|
||||
} else {
|
||||
false
|
||||
}
|
||||
@ -74,7 +78,7 @@ impl Type {
|
||||
(Type::Float, Type::Number) => true,
|
||||
(Type::Int, Type::Number) => true,
|
||||
(_, Type::Any) => true,
|
||||
(Type::List(t), Type::List(u)) if t.is_subtype(u) => true, // List is covariant
|
||||
(Type::List(t), Type::List(u)) if t.is_subtype_of(u) => true, // List is covariant
|
||||
(Type::Record(this), Type::Record(that)) | (Type::Table(this), Type::Table(that)) => {
|
||||
is_subtype_collection(this, that)
|
||||
}
|
||||
@ -227,21 +231,21 @@ mod tests {
|
||||
#[test]
|
||||
fn test_reflexivity() {
|
||||
for ty in Type::iter() {
|
||||
assert!(ty.is_subtype(&ty));
|
||||
assert!(ty.is_subtype_of(&ty));
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_any_is_top_type() {
|
||||
for ty in Type::iter() {
|
||||
assert!(ty.is_subtype(&Type::Any));
|
||||
assert!(ty.is_subtype_of(&Type::Any));
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_number_supertype() {
|
||||
assert!(Type::Int.is_subtype(&Type::Number));
|
||||
assert!(Type::Float.is_subtype(&Type::Number));
|
||||
assert!(Type::Int.is_subtype_of(&Type::Number));
|
||||
assert!(Type::Float.is_subtype_of(&Type::Number));
|
||||
}
|
||||
|
||||
#[test]
|
||||
@ -250,7 +254,7 @@ mod tests {
|
||||
for ty2 in Type::iter() {
|
||||
let list_ty1 = Type::List(Box::new(ty1.clone()));
|
||||
let list_ty2 = Type::List(Box::new(ty2.clone()));
|
||||
assert_eq!(list_ty1.is_subtype(&list_ty2), ty1.is_subtype(&ty2));
|
||||
assert_eq!(list_ty1.is_subtype_of(&list_ty2), ty1.is_subtype_of(&ty2));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -813,6 +813,66 @@ impl Value {
|
||||
}
|
||||
}
|
||||
|
||||
/// Determine of the [`Value`] is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`
|
||||
///
|
||||
/// If you have a [`Value`], this method should always be used over chaining [`Value::get_type`] with [`Type::is_subtype_of`](crate::Type::is_subtype_of).
|
||||
///
|
||||
/// This method is able to leverage that information encoded in a `Value` to provide more accurate
|
||||
/// type comparison than if one were to collect the type into [`Type`](crate::Type) value with [`Value::get_type`].
|
||||
///
|
||||
/// Empty lists are considered subtypes of all list<T> types.
|
||||
///
|
||||
/// Lists of mixed records where some column is present in all record is a subtype of `table<column>`.
|
||||
/// For example, `[{a: 1, b: 2}, {a: 1}]` is a subtype of `table<a: int>` (but not `table<a: int, b: int>`).
|
||||
///
|
||||
/// See also: [`PipelineData::is_subtype_of`](crate::PipelineData::is_subtype_of)
|
||||
pub fn is_subtype_of(&self, other: &Type) -> bool {
|
||||
// records are structurally typed
|
||||
let record_compatible = |val: &Value, other: &[(String, Type)]| match val {
|
||||
Value::Record { val, .. } => other
|
||||
.iter()
|
||||
.all(|(key, ty)| val.get(key).is_some_and(|inner| inner.is_subtype_of(ty))),
|
||||
_ => false,
|
||||
};
|
||||
|
||||
// All cases matched explicitly to ensure this does not accidentally allocate `Type` if any composite types are introduced in the future
|
||||
match (self, other) {
|
||||
(_, Type::Any) => true,
|
||||
|
||||
// `Type` allocation for scalar types is trivial
|
||||
(
|
||||
Value::Bool { .. }
|
||||
| Value::Int { .. }
|
||||
| Value::Float { .. }
|
||||
| Value::String { .. }
|
||||
| Value::Glob { .. }
|
||||
| Value::Filesize { .. }
|
||||
| Value::Duration { .. }
|
||||
| Value::Date { .. }
|
||||
| Value::Range { .. }
|
||||
| Value::Closure { .. }
|
||||
| Value::Error { .. }
|
||||
| Value::Binary { .. }
|
||||
| Value::CellPath { .. }
|
||||
| Value::Nothing { .. },
|
||||
_,
|
||||
) => self.get_type().is_subtype_of(other),
|
||||
|
||||
// matching composite types
|
||||
(val @ Value::Record { .. }, Type::Record(inner)) => record_compatible(val, inner),
|
||||
(Value::List { vals, .. }, Type::List(inner)) => {
|
||||
vals.iter().all(|val| val.is_subtype_of(inner))
|
||||
}
|
||||
(Value::List { vals, .. }, Type::Table(inner)) => {
|
||||
vals.iter().all(|val| record_compatible(val, inner))
|
||||
}
|
||||
(Value::Custom { val, .. }, Type::Custom(inner)) => val.type_name() == **inner,
|
||||
|
||||
// non-matching composite types
|
||||
(Value::Record { .. } | Value::List { .. } | Value::Custom { .. }, _) => false,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn get_data_by_key(&self, name: &str) -> Option<Value> {
|
||||
let span = self.span();
|
||||
match self {
|
||||
@ -3880,6 +3940,136 @@ mod tests {
|
||||
}
|
||||
}
|
||||
|
||||
mod is_subtype {
|
||||
use crate::Type;
|
||||
|
||||
use super::*;
|
||||
|
||||
fn assert_subtype_equivalent(value: &Value, ty: &Type) {
|
||||
assert_eq!(value.is_subtype_of(ty), value.get_type().is_subtype_of(ty));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_list() {
|
||||
let ty_int_list = Type::list(Type::Int);
|
||||
let ty_str_list = Type::list(Type::String);
|
||||
let ty_any_list = Type::list(Type::Any);
|
||||
let ty_list_list_int = Type::list(Type::list(Type::Int));
|
||||
|
||||
let list = Value::test_list(vec![
|
||||
Value::test_int(1),
|
||||
Value::test_int(2),
|
||||
Value::test_int(3),
|
||||
]);
|
||||
|
||||
assert_subtype_equivalent(&list, &ty_int_list);
|
||||
assert_subtype_equivalent(&list, &ty_str_list);
|
||||
assert_subtype_equivalent(&list, &ty_any_list);
|
||||
|
||||
let list = Value::test_list(vec![
|
||||
Value::test_int(1),
|
||||
Value::test_string("hi"),
|
||||
Value::test_int(3),
|
||||
]);
|
||||
|
||||
assert_subtype_equivalent(&list, &ty_int_list);
|
||||
assert_subtype_equivalent(&list, &ty_str_list);
|
||||
assert_subtype_equivalent(&list, &ty_any_list);
|
||||
|
||||
let list = Value::test_list(vec![Value::test_list(vec![Value::test_int(1)])]);
|
||||
|
||||
assert_subtype_equivalent(&list, &ty_list_list_int);
|
||||
|
||||
// The type of an empty lists is a subtype of any list or table type
|
||||
let ty_table = Type::Table(Box::new([
|
||||
("a".into(), Type::Int),
|
||||
("b".into(), Type::Int),
|
||||
("c".into(), Type::Int),
|
||||
]));
|
||||
let empty = Value::test_list(vec![]);
|
||||
|
||||
assert_subtype_equivalent(&empty, &ty_any_list);
|
||||
assert!(empty.is_subtype_of(&ty_int_list));
|
||||
assert!(empty.is_subtype_of(&ty_table));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_record() {
|
||||
let ty_abc = Type::Record(Box::new([
|
||||
("a".into(), Type::Int),
|
||||
("b".into(), Type::Int),
|
||||
("c".into(), Type::Int),
|
||||
]));
|
||||
let ty_ab = Type::Record(Box::new([("a".into(), Type::Int), ("b".into(), Type::Int)]));
|
||||
let ty_inner = Type::Record(Box::new([("inner".into(), ty_abc.clone())]));
|
||||
|
||||
let record_abc = Value::test_record(record! {
|
||||
"a" => Value::test_int(1),
|
||||
"b" => Value::test_int(2),
|
||||
"c" => Value::test_int(3),
|
||||
});
|
||||
let record_ab = Value::test_record(record! {
|
||||
"a" => Value::test_int(1),
|
||||
"b" => Value::test_int(2),
|
||||
});
|
||||
|
||||
assert_subtype_equivalent(&record_abc, &ty_abc);
|
||||
assert_subtype_equivalent(&record_abc, &ty_ab);
|
||||
assert_subtype_equivalent(&record_ab, &ty_abc);
|
||||
assert_subtype_equivalent(&record_ab, &ty_ab);
|
||||
|
||||
let record_inner = Value::test_record(record! {
|
||||
"inner" => record_abc
|
||||
});
|
||||
assert_subtype_equivalent(&record_inner, &ty_inner);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_table() {
|
||||
let ty_abc = Type::Table(Box::new([
|
||||
("a".into(), Type::Int),
|
||||
("b".into(), Type::Int),
|
||||
("c".into(), Type::Int),
|
||||
]));
|
||||
let ty_ab = Type::Table(Box::new([("a".into(), Type::Int), ("b".into(), Type::Int)]));
|
||||
let ty_list_any = Type::list(Type::Any);
|
||||
|
||||
let record_abc = Value::test_record(record! {
|
||||
"a" => Value::test_int(1),
|
||||
"b" => Value::test_int(2),
|
||||
"c" => Value::test_int(3),
|
||||
});
|
||||
let record_ab = Value::test_record(record! {
|
||||
"a" => Value::test_int(1),
|
||||
"b" => Value::test_int(2),
|
||||
});
|
||||
|
||||
let table_abc = Value::test_list(vec![record_abc.clone(), record_abc.clone()]);
|
||||
let table_ab = Value::test_list(vec![record_ab.clone(), record_ab.clone()]);
|
||||
|
||||
assert_subtype_equivalent(&table_abc, &ty_abc);
|
||||
assert_subtype_equivalent(&table_abc, &ty_ab);
|
||||
assert_subtype_equivalent(&table_ab, &ty_abc);
|
||||
assert_subtype_equivalent(&table_ab, &ty_ab);
|
||||
assert_subtype_equivalent(&table_abc, &ty_list_any);
|
||||
|
||||
let table_mixed = Value::test_list(vec![record_abc.clone(), record_ab.clone()]);
|
||||
assert_subtype_equivalent(&table_mixed, &ty_abc);
|
||||
assert!(table_mixed.is_subtype_of(&ty_ab));
|
||||
|
||||
let ty_a = Type::Table(Box::new([("a".into(), Type::Any)]));
|
||||
let table_mixed_types = Value::test_list(vec![
|
||||
Value::test_record(record! {
|
||||
"a" => Value::test_int(1),
|
||||
}),
|
||||
Value::test_record(record! {
|
||||
"a" => Value::test_string("a"),
|
||||
}),
|
||||
]);
|
||||
assert!(table_mixed_types.is_subtype_of(&ty_a));
|
||||
}
|
||||
}
|
||||
|
||||
mod into_string {
|
||||
use chrono::{DateTime, FixedOffset};
|
||||
|
||||
|
Reference in New Issue
Block a user