From 214714e0ab424a5a7408ecc71ed0643f80cbc258 Mon Sep 17 00:00:00 2001 From: 132ikl <132@ikl.sh> Date: Wed, 8 Jan 2025 17:09:47 -0500 Subject: [PATCH] Add run-time type checking for command pipeline input (#14741) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # Description This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> `. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table`, whereas the type returned by `Value::get_type` is a `list`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu::shell::only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu::shell::only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting - :green_circle: `toolkit fmt` - :green_circle: `toolkit clippy` - :green_circle: `toolkit test` - :green_circle: `toolkit test stdlib` # After Submitting * Play whack-a-mole with the commands and scripts this will inevitably break --- .../src/commands/history/history_import.rs | 1 + .../nu-cmd-extra/src/extra/filters/rotate.rs | 3 + crates/nu-cmd-lang/src/core_commands/do_.rs | 23 +-- crates/nu-cmd-lang/src/example_support.rs | 12 +- .../nu-command/src/conversions/into/string.rs | 1 + crates/nu-command/src/filters/drop/nth.rs | 9 +- crates/nu-command/src/filters/get.rs | 15 ++ crates/nu-command/src/filters/headers.rs | 9 +- crates/nu-command/src/filters/reject.rs | 9 + crates/nu-command/src/formats/from/csv.rs | 1 - crates/nu-command/src/formats/from/tsv.rs | 5 +- crates/nu-command/src/system/nu_check.rs | 1 + crates/nu-command/tests/commands/group_by.rs | 4 +- crates/nu-command/tests/commands/reduce.rs | 2 +- crates/nu-command/tests/commands/split_by.rs | 61 ++++++ .../tests/format_conversions/csv.rs | 2 +- crates/nu-engine/src/eval.rs | 27 +-- crates/nu-engine/src/eval_ir.rs | 83 +++++++- .../nu-protocol/src/pipeline/pipeline_data.rs | 40 ++++ crates/nu-protocol/src/ty.rs | 20 +- crates/nu-protocol/src/value/mod.rs | 190 ++++++++++++++++++ crates/nu-std/testing.nu | 2 +- crates/nu-std/tests/test_iter.nu | 4 +- .../src/dataframe/command/boolean/when.rs | 11 +- toolkit.nu | 12 +- 25 files changed, 455 insertions(+), 92 deletions(-) create mode 100644 crates/nu-command/tests/commands/split_by.rs diff --git a/crates/nu-cli/src/commands/history/history_import.rs b/crates/nu-cli/src/commands/history/history_import.rs index d56bc1da03..7666bd7eb7 100644 --- a/crates/nu-cli/src/commands/history/history_import.rs +++ b/crates/nu-cli/src/commands/history/history_import.rs @@ -35,6 +35,7 @@ Note that history item IDs are ignored when importing from file."# .category(Category::History) .input_output_types(vec![ (Type::Nothing, Type::Nothing), + (Type::String, Type::Nothing), (Type::List(Box::new(Type::String)), Type::Nothing), (Type::table(), Type::Nothing), ]) diff --git a/crates/nu-cmd-extra/src/extra/filters/rotate.rs b/crates/nu-cmd-extra/src/extra/filters/rotate.rs index 2ebe6478e4..4c25e82d37 100644 --- a/crates/nu-cmd-extra/src/extra/filters/rotate.rs +++ b/crates/nu-cmd-extra/src/extra/filters/rotate.rs @@ -13,6 +13,8 @@ impl Command for Rotate { .input_output_types(vec![ (Type::record(), Type::table()), (Type::table(), Type::table()), + (Type::list(Type::Any), Type::table()), + (Type::String, Type::table()), ]) .switch("ccw", "rotate counter clockwise", None) .rest( @@ -21,6 +23,7 @@ impl Command for Rotate { "the names to give columns once rotated", ) .category(Category::Filters) + .allow_variants_without_examples(true) } fn description(&self) -> &str { diff --git a/crates/nu-cmd-lang/src/core_commands/do_.rs b/crates/nu-cmd-lang/src/core_commands/do_.rs index 05d22c6cea..62ddea5324 100644 --- a/crates/nu-cmd-lang/src/core_commands/do_.rs +++ b/crates/nu-cmd-lang/src/core_commands/do_.rs @@ -294,22 +294,13 @@ fn bind_args_to( .expect("internal error: all custom parameters must have var_ids"); if let Some(result) = val_iter.next() { let param_type = param.shape.to_type(); - if required && !result.get_type().is_subtype(¶m_type) { - // need to check if result is an empty list, and param_type is table or list - // nushell needs to pass type checking for the case. - let empty_list_matches = result - .as_list() - .map(|l| l.is_empty() && matches!(param_type, Type::List(_) | Type::Table(_))) - .unwrap_or(false); - - if !empty_list_matches { - return Err(ShellError::CantConvert { - to_type: param.shape.to_type().to_string(), - from_type: result.get_type().to_string(), - span: result.span(), - help: None, - }); - } + if required && !result.is_subtype_of(¶m_type) { + return Err(ShellError::CantConvert { + to_type: param.shape.to_type().to_string(), + from_type: result.get_type().to_string(), + span: result.span(), + help: None, + }); } stack.add_var(var_id, result); } else if let Some(value) = ¶m.default_value { diff --git a/crates/nu-cmd-lang/src/example_support.rs b/crates/nu-cmd-lang/src/example_support.rs index c4d532058a..a26876b78c 100644 --- a/crates/nu-cmd-lang/src/example_support.rs +++ b/crates/nu-cmd-lang/src/example_support.rs @@ -19,18 +19,15 @@ pub fn check_example_input_and_output_types_match_command_signature( // Skip tests that don't have results to compare to if let Some(example_output) = example.result.as_ref() { - if let Some(example_input_type) = + if let Some(example_input) = eval_pipeline_without_terminal_expression(example.example, cwd, engine_state) { - let example_input_type = example_input_type.get_type(); - let example_output_type = example_output.get_type(); - let example_matches_signature = signature_input_output_types .iter() .any(|(sig_in_type, sig_out_type)| { - example_input_type.is_subtype(sig_in_type) - && example_output_type.is_subtype(sig_out_type) + example_input.is_subtype_of(sig_in_type) + && example_output.is_subtype_of(sig_out_type) && { witnessed_type_transformations .insert((sig_in_type.clone(), sig_out_type.clone())); @@ -38,6 +35,9 @@ pub fn check_example_input_and_output_types_match_command_signature( } }); + let example_input_type = example_input.get_type(); + let example_output_type = example_output.get_type(); + // The example type checks as a cell path operation if both: // 1. The command is declared to operate on cell paths. // 2. The example_input_type is list or record or table, and the example diff --git a/crates/nu-command/src/conversions/into/string.rs b/crates/nu-command/src/conversions/into/string.rs index b116de9752..f98f3aa5f8 100644 --- a/crates/nu-command/src/conversions/into/string.rs +++ b/crates/nu-command/src/conversions/into/string.rs @@ -38,6 +38,7 @@ impl Command for SubCommand { (Type::Filesize, Type::String), (Type::Date, Type::String), (Type::Duration, Type::String), + (Type::Range, Type::String), ( Type::List(Box::new(Type::Any)), Type::List(Box::new(Type::String)), diff --git a/crates/nu-command/src/filters/drop/nth.rs b/crates/nu-command/src/filters/drop/nth.rs index 282aa41c1d..02d97b70a3 100644 --- a/crates/nu-command/src/filters/drop/nth.rs +++ b/crates/nu-command/src/filters/drop/nth.rs @@ -13,10 +13,11 @@ impl Command for DropNth { fn signature(&self) -> Signature { Signature::build("drop nth") - .input_output_types(vec![( - Type::List(Box::new(Type::Any)), - Type::List(Box::new(Type::Any)), - )]) + .input_output_types(vec![ + (Type::Range, Type::list(Type::Number)), + (Type::list(Type::Any), Type::list(Type::Any)), + ]) + .allow_variants_without_examples(true) .required( "row number or row range", // FIXME: we can make this accept either Int or Range when we can compose SyntaxShapes diff --git a/crates/nu-command/src/filters/get.rs b/crates/nu-command/src/filters/get.rs index 6dfc66ba56..54f99386af 100644 --- a/crates/nu-command/src/filters/get.rs +++ b/crates/nu-command/src/filters/get.rs @@ -30,6 +30,7 @@ If multiple cell paths are given, this will produce a list of values."# ), (Type::table(), Type::Any), (Type::record(), Type::Any), + (Type::Nothing, Type::Nothing), ]) .required( "cell_path", @@ -163,6 +164,20 @@ fn action( } } + match input { + PipelineData::Empty => return Err(ShellError::PipelineEmpty { dst_span: span }), + // Allow chaining of get -i + PipelineData::Value(val @ Value::Nothing { .. }, ..) if !ignore_errors => { + return Err(ShellError::OnlySupportsThisInputType { + exp_input_type: "table or record".into(), + wrong_type: "nothing".into(), + dst_span: span, + src_span: val.span(), + }) + } + _ => (), + } + if rest.is_empty() { follow_cell_path_into_stream(input, signals, cell_path.members, span, !sensitive) } else { diff --git a/crates/nu-command/src/filters/headers.rs b/crates/nu-command/src/filters/headers.rs index 1a9de5367e..6798b3142a 100644 --- a/crates/nu-command/src/filters/headers.rs +++ b/crates/nu-command/src/filters/headers.rs @@ -11,14 +11,7 @@ impl Command for Headers { fn signature(&self) -> Signature { Signature::build(self.name()) - .input_output_types(vec![ - (Type::table(), Type::table()), - ( - // Tables with missing values are List - Type::List(Box::new(Type::Any)), - Type::table(), - ), - ]) + .input_output_types(vec![(Type::table(), Type::table())]) .category(Category::Filters) } diff --git a/crates/nu-command/src/filters/reject.rs b/crates/nu-command/src/filters/reject.rs index 058ad1d70b..e15c6b65e4 100644 --- a/crates/nu-command/src/filters/reject.rs +++ b/crates/nu-command/src/filters/reject.rs @@ -15,6 +15,7 @@ impl Command for Reject { .input_output_types(vec![ (Type::record(), Type::record()), (Type::table(), Type::table()), + (Type::list(Type::Any), Type::list(Type::Any)), ]) .switch( "ignore-errors", @@ -161,6 +162,14 @@ impl Command for Reject { Value::test_record(record! { "name" => Value::test_string("Cargo.lock") }), ])), }, + Example { + description: "Reject item in list", + example: "[1 2 3] | reject 1", + result: Some(Value::test_list(vec![ + Value::test_int(1), + Value::test_int(3), + ])), + }, ] } } diff --git a/crates/nu-command/src/formats/from/csv.rs b/crates/nu-command/src/formats/from/csv.rs index 74b9bcd33f..8ec9ac49d3 100644 --- a/crates/nu-command/src/formats/from/csv.rs +++ b/crates/nu-command/src/formats/from/csv.rs @@ -13,7 +13,6 @@ impl Command for FromCsv { Signature::build("from csv") .input_output_types(vec![ (Type::String, Type::table()), - (Type::String, Type::list(Type::Any)), ]) .named( "separator", diff --git a/crates/nu-command/src/formats/from/tsv.rs b/crates/nu-command/src/formats/from/tsv.rs index 2d77342307..39fae9c010 100644 --- a/crates/nu-command/src/formats/from/tsv.rs +++ b/crates/nu-command/src/formats/from/tsv.rs @@ -11,10 +11,7 @@ impl Command for FromTsv { fn signature(&self) -> Signature { Signature::build("from tsv") - .input_output_types(vec![ - (Type::String, Type::table()), - (Type::String, Type::list(Type::Any)), - ]) + .input_output_types(vec![(Type::String, Type::table())]) .named( "comment", SyntaxShape::String, diff --git a/crates/nu-command/src/system/nu_check.rs b/crates/nu-command/src/system/nu_check.rs index 945440cca4..ff3047a490 100644 --- a/crates/nu-command/src/system/nu_check.rs +++ b/crates/nu-command/src/system/nu_check.rs @@ -14,6 +14,7 @@ impl Command for NuCheck { fn signature(&self) -> Signature { Signature::build("nu-check") .input_output_types(vec![ + (Type::Nothing, Type::Bool), (Type::String, Type::Bool), (Type::List(Box::new(Type::Any)), Type::Bool), ]) diff --git a/crates/nu-command/tests/commands/group_by.rs b/crates/nu-command/tests/commands/group_by.rs index 232bbcba0f..a5b6328128 100644 --- a/crates/nu-command/tests/commands/group_by.rs +++ b/crates/nu-command/tests/commands/group_by.rs @@ -23,7 +23,7 @@ fn groups() { #[test] fn errors_if_given_unknown_column_name() { - let sample = r#"{ + let sample = r#"[{ "nu": { "committers": [ {"name": "Andrés N. Robalino"}, @@ -41,7 +41,7 @@ fn errors_if_given_unknown_column_name() { ["punto", "cero"] ] } -} +}] "#; let actual = nu!(pipeline(&format!( diff --git a/crates/nu-command/tests/commands/reduce.rs b/crates/nu-command/tests/commands/reduce.rs index cf58d093bb..4e3c6caff5 100644 --- a/crates/nu-command/tests/commands/reduce.rs +++ b/crates/nu-command/tests/commands/reduce.rs @@ -114,7 +114,7 @@ fn error_reduce_fold_type_mismatch() { fn error_reduce_empty() { let actual = nu!(pipeline("reduce { |it, acc| $acc + $it }")); - assert!(actual.err.contains("needs input")); + assert!(actual.err.contains("no input value was piped in")); } #[test] diff --git a/crates/nu-command/tests/commands/split_by.rs b/crates/nu-command/tests/commands/split_by.rs new file mode 100644 index 0000000000..a458a01a25 --- /dev/null +++ b/crates/nu-command/tests/commands/split_by.rs @@ -0,0 +1,61 @@ +use nu_test_support::fs::Stub::EmptyFile; +use nu_test_support::playground::Playground; +use nu_test_support::{nu, pipeline}; + +#[test] +fn splits() { + let sample = r#" + [[first_name, last_name, rusty_at, type]; + [Andrés, Robalino, "10/11/2013", A], + [JT, Turner, "10/12/2013", B], + [Yehuda, Katz, "10/11/2013", A]] + "#; + + let actual = nu!(pipeline(&format!( + r#" + {sample} + | group-by rusty_at + | split-by type + | get A."10/11/2013" + | length + "# + ))); + + assert_eq!(actual.out, "2"); +} + +#[test] +fn errors_if_no_input() { + Playground::setup("split_by_no_input", |dirs, _sandbox| { + let actual = nu!(cwd: dirs.test(), pipeline("split-by type")); + + assert!(actual.err.contains("no input value was piped in")); + }) +} + +#[test] +fn errors_if_non_record_input() { + Playground::setup("split_by_test_2", |dirs, sandbox| { + sandbox.with_files(&[ + EmptyFile("los.txt"), + EmptyFile("tres.txt"), + EmptyFile("amigos.txt"), + EmptyFile("arepas.clu"), + ]); + + let input_mismatch = nu!(cwd: dirs.test(), pipeline("5 | split-by type")); + + assert!(input_mismatch.err.contains("doesn't support int input")); + + let only_supports = nu!( + cwd: dirs.test(), pipeline( + " + ls + | get name + | split-by type + " + )); + + assert!(only_supports.err.contains("Input type not supported")); + }) +} diff --git a/crates/nu-command/tests/format_conversions/csv.rs b/crates/nu-command/tests/format_conversions/csv.rs index 4f50ab1492..c071b3566c 100644 --- a/crates/nu-command/tests/format_conversions/csv.rs +++ b/crates/nu-command/tests/format_conversions/csv.rs @@ -394,7 +394,7 @@ fn list_not_table_error() { "# )); - assert!(actual.err.contains("can't convert")) + assert!(actual.err.contains("Input type not supported")) } #[test] diff --git a/crates/nu-engine/src/eval.rs b/crates/nu-engine/src/eval.rs index 0e3917290f..3f47e4a42e 100644 --- a/crates/nu-engine/src/eval.rs +++ b/crates/nu-engine/src/eval.rs @@ -8,7 +8,7 @@ use nu_protocol::{ engine::{Closure, EngineState, Stack}, eval_base::Eval, BlockId, Config, DataSource, IntoPipelineData, PipelineData, PipelineMetadata, ShellError, - Span, Type, Value, VarId, ENV_VARIABLE_ID, + Span, Value, VarId, ENV_VARIABLE_ID, }; use nu_utils::IgnoreCaseExt; use std::sync::Arc; @@ -65,24 +65,13 @@ pub fn eval_call( if let Some(arg) = call.positional_nth(param_idx) { let result = eval_expression::(engine_state, caller_stack, arg)?; let param_type = param.shape.to_type(); - if required && !result.get_type().is_subtype(¶m_type) { - // need to check if result is an empty list, and param_type is table or list - // nushell needs to pass type checking for the case. - let empty_list_matches = result - .as_list() - .map(|l| { - l.is_empty() && matches!(param_type, Type::List(_) | Type::Table(_)) - }) - .unwrap_or(false); - - if !empty_list_matches { - return Err(ShellError::CantConvert { - to_type: param.shape.to_type().to_string(), - from_type: result.get_type().to_string(), - span: result.span(), - help: None, - }); - } + if required && !result.is_subtype_of(¶m_type) { + return Err(ShellError::CantConvert { + to_type: param.shape.to_type().to_string(), + from_type: result.get_type().to_string(), + span: result.span(), + help: None, + }); } callee_stack.add_var(var_id, result); } else if let Some(value) = ¶m.default_value { diff --git a/crates/nu-engine/src/eval_ir.rs b/crates/nu-engine/src/eval_ir.rs index 34e47732b1..0091963b53 100644 --- a/crates/nu-engine/src/eval_ir.rs +++ b/crates/nu-engine/src/eval_ir.rs @@ -1009,6 +1009,8 @@ fn eval_call( let args_len = caller_stack.arguments.get_len(*args_base); let decl = engine_state.get_decl(decl_id); + check_input_types(&input, decl.signature(), head)?; + // Set up redirect modes let mut caller_stack = caller_stack.push_redirection(redirect_out.take(), redirect_err.take()); @@ -1246,15 +1248,7 @@ fn gather_arguments( /// Type check helper. Produces `CantConvert` error if `val` is not compatible with `ty`. fn check_type(val: &Value, ty: &Type) -> Result<(), ShellError> { - if match val { - // An empty list is compatible with any list or table type - Value::List { vals, .. } if vals.is_empty() => { - matches!(ty, Type::Any | Type::List(_) | Type::Table(_)) - } - // FIXME: the allocation that might be required here is not great, it would be nice to be - // able to just directly check whether a value is compatible with a type - _ => val.get_type().is_subtype(ty), - } { + if val.is_subtype_of(ty) { Ok(()) } else { Err(ShellError::CantConvert { @@ -1266,6 +1260,77 @@ fn check_type(val: &Value, ty: &Type) -> Result<(), ShellError> { } } +/// Type check pipeline input against command's input types +fn check_input_types( + input: &PipelineData, + signature: Signature, + head: Span, +) -> Result<(), ShellError> { + let io_types = signature.input_output_types; + + // If a command doesn't have any input/output types, then treat command input type as any + if io_types.is_empty() { + return Ok(()); + } + + // If a command only has a nothing input type, then allow any input data + match io_types.first() { + Some((Type::Nothing, _)) if io_types.len() == 1 => { + return Ok(()); + } + _ => (), + } + + // Errors and custom values bypass input type checking + if matches!( + input, + PipelineData::Value(Value::Error { .. } | Value::Custom { .. }, ..) + ) { + return Ok(()); + } + + // Check if the input type is compatible with *any* of the command's possible input types + if io_types + .iter() + .any(|(command_type, _)| input.is_subtype_of(command_type)) + { + return Ok(()); + } + + let mut input_types = io_types + .iter() + .map(|(input, _)| input.to_string()) + .collect::>(); + + let expected_string = match input_types.len() { + 0 => { + return Err(ShellError::NushellFailed { + msg: "Command input type strings is empty, despite being non-zero earlier" + .to_string(), + }) + } + 1 => input_types.swap_remove(0), + 2 => input_types.join(" and "), + _ => { + input_types + .last_mut() + .expect("Vector with length >2 has no elements") + .insert_str(0, "and "); + input_types.join(", ") + } + }; + + match input { + PipelineData::Empty => Err(ShellError::PipelineEmpty { dst_span: head }), + _ => Err(ShellError::OnlySupportsThisInputType { + exp_input_type: expected_string, + wrong_type: input.get_type().to_string(), + dst_span: head, + src_span: input.span().unwrap_or(Span::unknown()), + }), + } +} + /// Get variable from [`Stack`] or [`EngineState`] fn get_var(ctx: &EvalContext<'_>, var_id: VarId, span: Span) -> Result { match var_id { diff --git a/crates/nu-protocol/src/pipeline/pipeline_data.rs b/crates/nu-protocol/src/pipeline/pipeline_data.rs index 8979b76ba9..b9d523b3da 100644 --- a/crates/nu-protocol/src/pipeline/pipeline_data.rs +++ b/crates/nu-protocol/src/pipeline/pipeline_data.rs @@ -122,6 +122,46 @@ impl PipelineData { } } + /// Determine if the `PipelineData` is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`. + /// + /// This check makes no effort to collect a stream, so it may be a different result + /// than would be returned by calling [`Value::is_subtype()`] on the result of + /// [`.into_value()`](Self::into_value). + /// + /// A `ListStream` acts the same as an empty list type: it is a subtype of any [`list`](Type::List) + /// or [`table`](Type::Table) type. After converting to a value, it may become a more specific type. + /// For example, a `ListStream` is a subtype of `list` and `list`. + /// If calling [`.into_value()`](Self::into_value) results in a `list`, + /// then the value would not be a subtype of `list`, in contrast to the original `ListStream`. + /// + /// A `ByteStream` is a subtype of [`string`](Type::String) if it is coercible into a string. + /// Likewise, a `ByteStream` is a subtype of [`binary`](Type::Binary) if it is coercible into a binary value. + pub fn is_subtype_of(&self, other: &Type) -> bool { + match (self, other) { + (_, Type::Any) => true, + (PipelineData::Empty, Type::Nothing) => true, + (PipelineData::Value(val, ..), ty) => val.is_subtype_of(ty), + + // a list stream could be a list with any type, including a table + (PipelineData::ListStream(..), Type::List(..) | Type::Table(..)) => true, + + (PipelineData::ByteStream(stream, ..), Type::String) + if stream.type_().is_string_coercible() => + { + true + } + (PipelineData::ByteStream(stream, ..), Type::Binary) + if stream.type_().is_binary_coercible() => + { + true + } + + (PipelineData::Empty, _) => false, + (PipelineData::ListStream(..), _) => false, + (PipelineData::ByteStream(..), _) => false, + } + } + pub fn into_value(self, span: Span) -> Result { match self { PipelineData::Empty => Ok(Value::nothing(span)), diff --git a/crates/nu-protocol/src/ty.rs b/crates/nu-protocol/src/ty.rs index d1b6f7140a..8f97c6f3fc 100644 --- a/crates/nu-protocol/src/ty.rs +++ b/crates/nu-protocol/src/ty.rs @@ -51,7 +51,11 @@ impl Type { Self::Custom(name.into()) } - pub fn is_subtype(&self, other: &Type) -> bool { + /// Determine of the [`Type`] is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`. + /// + /// This should only be used at parse-time. If you have a concrete [`Value`](crate::Value) or [`PipelineData`](crate::PipelineData), + /// you should use their respective [`is_subtype_of`] methods instead. + pub fn is_subtype_of(&self, other: &Type) -> bool { // Structural subtyping let is_subtype_collection = |this: &[(String, Type)], that: &[(String, Type)]| { if this.is_empty() || that.is_empty() { @@ -61,7 +65,7 @@ impl Type { } else { that.iter().all(|(col_y, ty_y)| { if let Some((_, ty_x)) = this.iter().find(|(col_x, _)| col_x == col_y) { - ty_x.is_subtype(ty_y) + ty_x.is_subtype_of(ty_y) } else { false } @@ -74,7 +78,7 @@ impl Type { (Type::Float, Type::Number) => true, (Type::Int, Type::Number) => true, (_, Type::Any) => true, - (Type::List(t), Type::List(u)) if t.is_subtype(u) => true, // List is covariant + (Type::List(t), Type::List(u)) if t.is_subtype_of(u) => true, // List is covariant (Type::Record(this), Type::Record(that)) | (Type::Table(this), Type::Table(that)) => { is_subtype_collection(this, that) } @@ -227,21 +231,21 @@ mod tests { #[test] fn test_reflexivity() { for ty in Type::iter() { - assert!(ty.is_subtype(&ty)); + assert!(ty.is_subtype_of(&ty)); } } #[test] fn test_any_is_top_type() { for ty in Type::iter() { - assert!(ty.is_subtype(&Type::Any)); + assert!(ty.is_subtype_of(&Type::Any)); } } #[test] fn test_number_supertype() { - assert!(Type::Int.is_subtype(&Type::Number)); - assert!(Type::Float.is_subtype(&Type::Number)); + assert!(Type::Int.is_subtype_of(&Type::Number)); + assert!(Type::Float.is_subtype_of(&Type::Number)); } #[test] @@ -250,7 +254,7 @@ mod tests { for ty2 in Type::iter() { let list_ty1 = Type::List(Box::new(ty1.clone())); let list_ty2 = Type::List(Box::new(ty2.clone())); - assert_eq!(list_ty1.is_subtype(&list_ty2), ty1.is_subtype(&ty2)); + assert_eq!(list_ty1.is_subtype_of(&list_ty2), ty1.is_subtype_of(&ty2)); } } } diff --git a/crates/nu-protocol/src/value/mod.rs b/crates/nu-protocol/src/value/mod.rs index b9c942d3b0..5ffe376269 100644 --- a/crates/nu-protocol/src/value/mod.rs +++ b/crates/nu-protocol/src/value/mod.rs @@ -813,6 +813,66 @@ impl Value { } } + /// Determine of the [`Value`] is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other` + /// + /// If you have a [`Value`], this method should always be used over chaining [`Value::get_type`] with [`Type::is_subtype_of`](crate::Type::is_subtype_of). + /// + /// This method is able to leverage that information encoded in a `Value` to provide more accurate + /// type comparison than if one were to collect the type into [`Type`](crate::Type) value with [`Value::get_type`]. + /// + /// Empty lists are considered subtypes of all list types. + /// + /// Lists of mixed records where some column is present in all record is a subtype of `table`. + /// For example, `[{a: 1, b: 2}, {a: 1}]` is a subtype of `table` (but not `table`). + /// + /// See also: [`PipelineData::is_subtype_of`](crate::PipelineData::is_subtype_of) + pub fn is_subtype_of(&self, other: &Type) -> bool { + // records are structurally typed + let record_compatible = |val: &Value, other: &[(String, Type)]| match val { + Value::Record { val, .. } => other + .iter() + .all(|(key, ty)| val.get(key).is_some_and(|inner| inner.is_subtype_of(ty))), + _ => false, + }; + + // All cases matched explicitly to ensure this does not accidentally allocate `Type` if any composite types are introduced in the future + match (self, other) { + (_, Type::Any) => true, + + // `Type` allocation for scalar types is trivial + ( + Value::Bool { .. } + | Value::Int { .. } + | Value::Float { .. } + | Value::String { .. } + | Value::Glob { .. } + | Value::Filesize { .. } + | Value::Duration { .. } + | Value::Date { .. } + | Value::Range { .. } + | Value::Closure { .. } + | Value::Error { .. } + | Value::Binary { .. } + | Value::CellPath { .. } + | Value::Nothing { .. }, + _, + ) => self.get_type().is_subtype_of(other), + + // matching composite types + (val @ Value::Record { .. }, Type::Record(inner)) => record_compatible(val, inner), + (Value::List { vals, .. }, Type::List(inner)) => { + vals.iter().all(|val| val.is_subtype_of(inner)) + } + (Value::List { vals, .. }, Type::Table(inner)) => { + vals.iter().all(|val| record_compatible(val, inner)) + } + (Value::Custom { val, .. }, Type::Custom(inner)) => val.type_name() == **inner, + + // non-matching composite types + (Value::Record { .. } | Value::List { .. } | Value::Custom { .. }, _) => false, + } + } + pub fn get_data_by_key(&self, name: &str) -> Option { let span = self.span(); match self { @@ -3880,6 +3940,136 @@ mod tests { } } + mod is_subtype { + use crate::Type; + + use super::*; + + fn assert_subtype_equivalent(value: &Value, ty: &Type) { + assert_eq!(value.is_subtype_of(ty), value.get_type().is_subtype_of(ty)); + } + + #[test] + fn test_list() { + let ty_int_list = Type::list(Type::Int); + let ty_str_list = Type::list(Type::String); + let ty_any_list = Type::list(Type::Any); + let ty_list_list_int = Type::list(Type::list(Type::Int)); + + let list = Value::test_list(vec![ + Value::test_int(1), + Value::test_int(2), + Value::test_int(3), + ]); + + assert_subtype_equivalent(&list, &ty_int_list); + assert_subtype_equivalent(&list, &ty_str_list); + assert_subtype_equivalent(&list, &ty_any_list); + + let list = Value::test_list(vec![ + Value::test_int(1), + Value::test_string("hi"), + Value::test_int(3), + ]); + + assert_subtype_equivalent(&list, &ty_int_list); + assert_subtype_equivalent(&list, &ty_str_list); + assert_subtype_equivalent(&list, &ty_any_list); + + let list = Value::test_list(vec![Value::test_list(vec![Value::test_int(1)])]); + + assert_subtype_equivalent(&list, &ty_list_list_int); + + // The type of an empty lists is a subtype of any list or table type + let ty_table = Type::Table(Box::new([ + ("a".into(), Type::Int), + ("b".into(), Type::Int), + ("c".into(), Type::Int), + ])); + let empty = Value::test_list(vec![]); + + assert_subtype_equivalent(&empty, &ty_any_list); + assert!(empty.is_subtype_of(&ty_int_list)); + assert!(empty.is_subtype_of(&ty_table)); + } + + #[test] + fn test_record() { + let ty_abc = Type::Record(Box::new([ + ("a".into(), Type::Int), + ("b".into(), Type::Int), + ("c".into(), Type::Int), + ])); + let ty_ab = Type::Record(Box::new([("a".into(), Type::Int), ("b".into(), Type::Int)])); + let ty_inner = Type::Record(Box::new([("inner".into(), ty_abc.clone())])); + + let record_abc = Value::test_record(record! { + "a" => Value::test_int(1), + "b" => Value::test_int(2), + "c" => Value::test_int(3), + }); + let record_ab = Value::test_record(record! { + "a" => Value::test_int(1), + "b" => Value::test_int(2), + }); + + assert_subtype_equivalent(&record_abc, &ty_abc); + assert_subtype_equivalent(&record_abc, &ty_ab); + assert_subtype_equivalent(&record_ab, &ty_abc); + assert_subtype_equivalent(&record_ab, &ty_ab); + + let record_inner = Value::test_record(record! { + "inner" => record_abc + }); + assert_subtype_equivalent(&record_inner, &ty_inner); + } + + #[test] + fn test_table() { + let ty_abc = Type::Table(Box::new([ + ("a".into(), Type::Int), + ("b".into(), Type::Int), + ("c".into(), Type::Int), + ])); + let ty_ab = Type::Table(Box::new([("a".into(), Type::Int), ("b".into(), Type::Int)])); + let ty_list_any = Type::list(Type::Any); + + let record_abc = Value::test_record(record! { + "a" => Value::test_int(1), + "b" => Value::test_int(2), + "c" => Value::test_int(3), + }); + let record_ab = Value::test_record(record! { + "a" => Value::test_int(1), + "b" => Value::test_int(2), + }); + + let table_abc = Value::test_list(vec![record_abc.clone(), record_abc.clone()]); + let table_ab = Value::test_list(vec![record_ab.clone(), record_ab.clone()]); + + assert_subtype_equivalent(&table_abc, &ty_abc); + assert_subtype_equivalent(&table_abc, &ty_ab); + assert_subtype_equivalent(&table_ab, &ty_abc); + assert_subtype_equivalent(&table_ab, &ty_ab); + assert_subtype_equivalent(&table_abc, &ty_list_any); + + let table_mixed = Value::test_list(vec![record_abc.clone(), record_ab.clone()]); + assert_subtype_equivalent(&table_mixed, &ty_abc); + assert!(table_mixed.is_subtype_of(&ty_ab)); + + let ty_a = Type::Table(Box::new([("a".into(), Type::Any)])); + let table_mixed_types = Value::test_list(vec![ + Value::test_record(record! { + "a" => Value::test_int(1), + }), + Value::test_record(record! { + "a" => Value::test_string("a"), + }), + ]); + assert!(table_mixed_types.is_subtype_of(&ty_a)); + } + } + mod into_string { use chrono::{DateTime, FixedOffset}; diff --git a/crates/nu-std/testing.nu b/crates/nu-std/testing.nu index e409920569..71a634333a 100644 --- a/crates/nu-std/testing.nu +++ b/crates/nu-std/testing.nu @@ -28,7 +28,7 @@ def valid-annotations [] { # Returns a table containing the list of function names together with their annotations (comments above the declaration) def get-annotated [ file: path -]: path -> table { +]: nothing -> table { let raw_file = ( open $file | lines diff --git a/crates/nu-std/tests/test_iter.nu b/crates/nu-std/tests/test_iter.nu index 3ace342788..291ecffe90 100644 --- a/crates/nu-std/tests/test_iter.nu +++ b/crates/nu-std/tests/test_iter.nu @@ -37,7 +37,7 @@ def iter_intersperse [] { let res = (1..4 | iter intersperse 0) assert equal $res [1 0 2 0 3 0 4] - let res = (4 | iter intersperse 1) + let res = ([4] | iter intersperse 1) assert equal $res [4] } @@ -92,7 +92,7 @@ def iter_zip_with [] { assert equal $res [3 5 7] - let res = (42 | iter zip-with [1 2 3] {|a, b| $a // $b}) + let res = ([42] | iter zip-with [1 2 3] {|a, b| $a // $b}) assert equal $res [42] let res = (2..5 | iter zip-with 4 {|a, b| $a * $b}) diff --git a/crates/nu_plugin_polars/src/dataframe/command/boolean/when.rs b/crates/nu_plugin_polars/src/dataframe/command/boolean/when.rs index 886ee50f3b..9c6976aa54 100644 --- a/crates/nu_plugin_polars/src/dataframe/command/boolean/when.rs +++ b/crates/nu_plugin_polars/src/dataframe/command/boolean/when.rs @@ -35,10 +35,13 @@ impl PluginCommand for ExprWhen { SyntaxShape::Any, "expression that will be applied when predicate is true", ) - .input_output_type( - Type::Custom("expression".into()), - Type::Custom("expression".into()), - ) + .input_output_types(vec![ + (Type::Nothing, Type::Custom("expression".into())), + ( + Type::Custom("expression".into()), + Type::Custom("expression".into()), + ), + ]) .category(Category::Custom("expression".into())) } diff --git a/toolkit.nu b/toolkit.nu index 8b34e9a53b..c185f2ea7f 100644 --- a/toolkit.nu +++ b/toolkit.nu @@ -46,7 +46,7 @@ export def clippy [ ^cargo clippy --workspace --exclude nu_plugin_* - --features ($features | str join ",") + --features ($features | default [] | str join ",") -- -D warnings -D clippy::unwrap_used @@ -62,7 +62,7 @@ export def clippy [ --tests --workspace --exclude nu_plugin_* - --features ($features | str join ",") + --features ($features | default [] | str join ",") -- -D warnings ) @@ -96,13 +96,13 @@ export def test [ if $workspace { ^cargo nextest run --all } else { - ^cargo nextest run --features ($features | str join ",") + ^cargo nextest run --features ($features | default [] | str join ",") } } else { if $workspace { ^cargo test --workspace } else { - ^cargo test --features ($features | str join ",") + ^cargo test --features ($features | default [] | str join ",") } } } @@ -345,7 +345,7 @@ export def build [ ...features: string@"nu-complete list features" # a space-separated list of feature to install with Nushell --all # build all plugins with Nushell ] { - build-nushell ($features | str join ",") + build-nushell ($features | default [] | str join ",") if not $all { return @@ -384,7 +384,7 @@ export def install [ --all # install all plugins with Nushell ] { touch crates/nu-cmd-lang/build.rs # needed to make sure `version` has the correct `commit_hash` - ^cargo install --path . --features ($features | str join ",") --locked --force + ^cargo install --path . --features ($features | default [] | str join ",") --locked --force if not $all { return }