Add run-time type checking for command pipeline input (#14741)

# Description  This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> <type>`. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table<a: int>`, whereas the type returned by `Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list<any> and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes  **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting  - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting  * Play whack-a-mole with the commands and scripts this will inevitably break
2025-08-09 17:25:15 +02:00 · 2025-01-08 17:09:47 -05:00
parent d894c8befe
commit 214714e0ab
25 changed files with 455 additions and 92 deletions
--- a/crates/nu-engine/src/eval.rs
+++ b/crates/nu-engine/src/eval.rs
@ -8,7 +8,7 @@ use nu_protocol::{
    engine::{Closure, EngineState, Stack},
    eval_base::Eval,
    BlockId, Config, DataSource, IntoPipelineData, PipelineData, PipelineMetadata, ShellError,
-    Span, Type, Value, VarId, ENV_VARIABLE_ID,
+    Span, Value, VarId, ENV_VARIABLE_ID,
 };
 use nu_utils::IgnoreCaseExt;
 use std::sync::Arc;
@ -65,24 +65,13 @@ pub fn eval_call<D: DebugContext>(
            if let Some(arg) = call.positional_nth(param_idx) {
                let result = eval_expression::<D>(engine_state, caller_stack, arg)?;
                let param_type = param.shape.to_type();
-                if required && !result.get_type().is_subtype(&param_type) {
-                    // need to check if result is an empty list, and param_type is table or list
-                    // nushell needs to pass type checking for the case.
-                    let empty_list_matches = result
-                        .as_list()
-                        .map(|l| {
-                            l.is_empty() && matches!(param_type, Type::List(_) | Type::Table(_))
-                        })
-                        .unwrap_or(false);
-
-                    if !empty_list_matches {
-                        return Err(ShellError::CantConvert {
-                            to_type: param.shape.to_type().to_string(),
-                            from_type: result.get_type().to_string(),
-                            span: result.span(),
-                            help: None,
-                        });
-                    }
+                if required && !result.is_subtype_of(&param_type) {
+                    return Err(ShellError::CantConvert {
+                        to_type: param.shape.to_type().to_string(),
+                        from_type: result.get_type().to_string(),
+                        span: result.span(),
+                        help: None,
+                    });
                }
                callee_stack.add_var(var_id, result);
            } else if let Some(value) = &param.default_value {
--- a/crates/nu-engine/src/eval_ir.rs
+++ b/crates/nu-engine/src/eval_ir.rs
@ -1009,6 +1009,8 @@ fn eval_call<D: DebugContext>(
    let args_len = caller_stack.arguments.get_len(*args_base);
    let decl = engine_state.get_decl(decl_id);

+    check_input_types(&input, decl.signature(), head)?;
+
    // Set up redirect modes
    let mut caller_stack = caller_stack.push_redirection(redirect_out.take(), redirect_err.take());

@ -1246,15 +1248,7 @@ fn gather_arguments(

 /// Type check helper. Produces `CantConvert` error if `val` is not compatible with `ty`.
 fn check_type(val: &Value, ty: &Type) -> Result<(), ShellError> {
-    if match val {
-        // An empty list is compatible with any list or table type
-        Value::List { vals, .. } if vals.is_empty() => {
-            matches!(ty, Type::Any | Type::List(_) | Type::Table(_))
-        }
-        // FIXME: the allocation that might be required here is not great, it would be nice to be
-        // able to just directly check whether a value is compatible with a type
-        _ => val.get_type().is_subtype(ty),
-    } {
+    if val.is_subtype_of(ty) {
        Ok(())
    } else {
        Err(ShellError::CantConvert {
@ -1266,6 +1260,77 @@ fn check_type(val: &Value, ty: &Type) -> Result<(), ShellError> {
    }
 }

+/// Type check pipeline input against command's input types
+fn check_input_types(
+    input: &PipelineData,
+    signature: Signature,
+    head: Span,
+) -> Result<(), ShellError> {
+    let io_types = signature.input_output_types;
+
+    // If a command doesn't have any input/output types, then treat command input type as any
+    if io_types.is_empty() {
+        return Ok(());
+    }
+
+    // If a command only has a nothing input type, then allow any input data
+    match io_types.first() {
+        Some((Type::Nothing, _)) if io_types.len() == 1 => {
+            return Ok(());
+        }
+        _ => (),
+    }
+
+    // Errors and custom values bypass input type checking
+    if matches!(
+        input,
+        PipelineData::Value(Value::Error { .. } | Value::Custom { .. }, ..)
+    ) {
+        return Ok(());
+    }
+
+    // Check if the input type is compatible with *any* of the command's possible input types
+    if io_types
+        .iter()
+        .any(|(command_type, _)| input.is_subtype_of(command_type))
+    {
+        return Ok(());
+    }
+
+    let mut input_types = io_types
+        .iter()
+        .map(|(input, _)| input.to_string())
+        .collect::<Vec<String>>();
+
+    let expected_string = match input_types.len() {
+        0 => {
+            return Err(ShellError::NushellFailed {
+                msg: "Command input type strings is empty, despite being non-zero earlier"
+                    .to_string(),
+            })
+        }
+        1 => input_types.swap_remove(0),
+        2 => input_types.join(" and "),
+        _ => {
+            input_types
+                .last_mut()
+                .expect("Vector with length >2 has no elements")
+                .insert_str(0, "and ");
+            input_types.join(", ")
+        }
+    };
+
+    match input {
+        PipelineData::Empty => Err(ShellError::PipelineEmpty { dst_span: head }),
+        _ => Err(ShellError::OnlySupportsThisInputType {
+            exp_input_type: expected_string,
+            wrong_type: input.get_type().to_string(),
+            dst_span: head,
+            src_span: input.span().unwrap_or(Span::unknown()),
+        }),
+    }
+}
+
 /// Get variable from [`Stack`] or [`EngineState`]
 fn get_var(ctx: &EvalContext<'_>, var_id: VarId, span: Span) -> Result<Value, ShellError> {
    match var_id {