Files
nushell/crates/nu-command/src/system/nu_check.rs
132ikl 214714e0ab Add run-time type checking for command pipeline input (#14741)
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->

# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->

This PR adds type checking of all command input types at run-time.
Generally, these errors should be caught by the parser, but sometimes we
can't know the type of a value at parse-time. The simplest example is
using the `echo` command, which has an output type of `any`, so
prefixing a literal with `echo` will bypass parse-time type checking.

Before this PR, each command has to individually check its input types.
This can result in scenarios where the input/output types don't match
the actual command behavior. This can cause valid usage with an
non-`any` type to become a parse-time error if a command is missing that
type in its pipeline input/output (`drop nth` and `history import` do
this before this PR). Alternatively, a command may not list a type in
its input/output types, but doesn't actually reject that type in its
code, which can have unintended side effects (`get` does this on an
empty pipeline input, and `sort` used to before #13154).

After this PR, the type of the pipeline input is checked to ensure it
matches one of the input types listed in the proceeding command's
input/output types. While each of the issues in the "before this PR"
section could be addressed with each command individually, this PR
solves this issue for _all_ commands.

**This will likely cause some breakage**, as some commands have
incorrect input/output types, and should be adjusted. Also, some scripts
may have erroneous usage of commands. In writing this PR, I discovered
that `toolkit.nu` was passing `null` values to `str join`, which doesn't
accept nothing types (if folks think it should, we can adjust it in this
PR or in a different PR). I found some issues in the standard library
and its tests. I also found that carapace's vendor script had an
incorrect chaining of `get -i`:

```nushell
let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)
```

Before this PR, if the `get -i 0` ever actually did evaluate to `null`,
the second `get` invocation would error since `get` doesn't operate on
`null` values. After this PR, this is immediately a run-time error,
alerting the user to the problematic code. As a side note, we'll need to
PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to
carapace.

A notable exception to the type checking is commands with input type of
`nothing -> <type>`. In this case, any input type is allowed. This
allows piping values into the command without an error being thrown. For
example, `123 | echo $in` would be an error without this exception.
Additionally, custom types bypass type checking (I believe this also
happens during parsing, but not certain)

I added a `is_subtype` method to `Value` and `PipelineData`. It
functions slightly differently than `get_type().is_subtype()`, as noted
in the doccomments. Notably, it respects structural typing of lists and
tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]`
is a subtype of `table<a: int>`, whereas the type returned by
`Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some
special handling for `ListStream`s and `ByteStream`s. The latter was
needed for this PR to work properly with external commands.

Here's some examples.

Before:
```nu
1..2 | drop nth 1
Error: nu::parser::input_type_mismatch

  × Command does not support range input.
   ╭─[entry #9:1:8]
 1 │ 1..2 | drop nth 1
   ·        ────┬───
   ·            ╰── command doesn't support range input
   ╰────

echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯
```

After this PR, I've adjusted `drop nth`'s input/output types to accept
range input.

Before this PR, zip accepted any value despite not being listed in its
input/output types. This caused different behavior depending on if you
triggered a parse error or not:
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │   │ │ 0 │ 1 │ │
# => │   │ │ 1 │ 2 │ │
# => │   │ ╰───┴───╯ │
# => ╰───┴───────────╯
```

After this PR, it works the same in both cases. For cases like this, if
we do decide we want `zip` or other commands to accept any input value,
then we should explicitly add that to the input types.
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => Error: nu:🐚:only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #14:2:6]
# =>  2 │ echo 1 | zip [2]
# =>    ·      ┬   ─┬─
# =>    ·      │    ╰── only list<any> and range input data is supported
# =>    ·      ╰── input type: int
# =>    ╰────
```

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

**Breaking change**: The type of a command's input is now checked
against the input/output types of that command at run-time. While these
errors should mostly be caught at parse-time, in cases where they can't
be detected at parse-time they will be caught at run-time instead. This
applies to both internal commands and custom commands.

Example function and corresponding parse-time error (same before and
after PR):
```nushell
def foo []: int -> nothing {
  print $"my cool int is ($in)"
}

1 | foo
# => my cool int is 1

"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support string input.
# =>    ╭─[entry #16:1:17]
# =>  1 │ "evil string" | foo
# =>    ·                 ─┬─
# =>    ·                  ╰── command doesn't support string input
# =>    ╰────
# => 
```

Before:
```nu
echo "evil string" | foo
# => my cool int is evil string
```

After:
```nu
echo "evil string" | foo
# => Error: nu:🐚:only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #17:1:6]
# =>  1 │ echo "evil string" | foo
# =>    ·      ──────┬──────   ─┬─
# =>    ·            │          ╰── only int input data is supported
# =>    ·            ╰── input type: string
# =>    ╰────
```

Known affected internal commands which erroneously accepted any type:
* `str join`
* `zip`
* `reduce`

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`


# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
* Play whack-a-mole with the commands and scripts this will inevitably
break
2025-01-08 23:09:47 +01:00

326 lines
11 KiB
Rust

use nu_engine::{command_prelude::*, find_in_dirs_env, get_dirs_var_from_call};
use nu_parser::{parse, parse_module_block, parse_module_file_or_dir, unescape_unquote_string};
use nu_protocol::engine::{FileStack, StateWorkingSet};
use std::path::Path;
#[derive(Clone)]
pub struct NuCheck;
impl Command for NuCheck {
fn name(&self) -> &str {
"nu-check"
}
fn signature(&self) -> Signature {
Signature::build("nu-check")
.input_output_types(vec![
(Type::Nothing, Type::Bool),
(Type::String, Type::Bool),
(Type::List(Box::new(Type::Any)), Type::Bool),
])
// type is string to avoid automatically canonicalizing the path
.optional("path", SyntaxShape::String, "File path to parse.")
.switch("as-module", "Parse content as module", Some('m'))
.switch("debug", "Show error messages", Some('d'))
.category(Category::Strings)
}
fn description(&self) -> &str {
"Validate and parse input content."
}
fn search_terms(&self) -> Vec<&str> {
vec!["syntax", "parse", "debug"]
}
fn run(
&self,
engine_state: &EngineState,
stack: &mut Stack,
call: &Call,
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let path_arg: Option<Spanned<String>> = call.opt(engine_state, stack, 0)?;
let as_module = call.has_flag(engine_state, stack, "as-module")?;
let is_debug = call.has_flag(engine_state, stack, "debug")?;
// DO NOT ever try to merge the working_set in this command
let mut working_set = StateWorkingSet::new(engine_state);
let input_span = input.span().unwrap_or(call.head);
match input {
PipelineData::Value(Value::String { val, .. }, ..) => {
let contents = Vec::from(val);
if as_module {
parse_module(&mut working_set, None, &contents, is_debug, input_span)
} else {
parse_script(&mut working_set, None, &contents, is_debug, input_span)
}
}
PipelineData::ListStream(stream, ..) => {
let config = stack.get_config(engine_state);
let list_stream = stream.into_string("\n", &config);
let contents = Vec::from(list_stream);
if as_module {
parse_module(&mut working_set, None, &contents, is_debug, call.head)
} else {
parse_script(&mut working_set, None, &contents, is_debug, call.head)
}
}
PipelineData::ByteStream(stream, ..) => {
let contents = stream.into_bytes()?;
if as_module {
parse_module(&mut working_set, None, &contents, is_debug, call.head)
} else {
parse_script(&mut working_set, None, &contents, is_debug, call.head)
}
}
_ => {
if let Some(path_str) = path_arg {
let path_span = path_str.span;
// look up the path as relative to FILE_PWD or inside NU_LIB_DIRS (same process as source-env)
let path = match find_in_dirs_env(
&path_str.item,
engine_state,
stack,
get_dirs_var_from_call(stack, call),
) {
Ok(path) => {
if let Some(path) = path {
path
} else {
return Err(ShellError::FileNotFound {
file: path_str.item,
span: path_span,
});
}
}
Err(error) => return Err(error),
};
let result = if as_module || path.is_dir() {
parse_file_or_dir_module(
path.to_string_lossy().as_bytes(),
&mut working_set,
is_debug,
path_span,
call.head,
)
} else {
// Unlike `parse_file_or_dir_module`, `parse_file_script` parses the content directly,
// without adding the file to the stack. Therefore we need to handle this manually.
working_set.files = FileStack::with_file(path.clone());
parse_file_script(&path, &mut working_set, is_debug, path_span, call.head)
// The working set is not merged, so no need to pop the file from the stack.
};
result
} else {
Err(ShellError::GenericError {
error: "Failed to execute command".into(),
msg: "Requires path argument if ran without pipeline input".into(),
span: Some(call.head),
help: Some("Please run 'nu-check --help' for more details".into()),
inner: vec![],
})
}
}
}
}
fn examples(&self) -> Vec<Example> {
vec![
Example {
description: "Parse a input file as script(Default)",
example: "nu-check script.nu",
result: None,
},
Example {
description: "Parse a input file as module",
example: "nu-check --as-module module.nu",
result: None,
},
Example {
description: "Parse a input file by showing error message",
example: "nu-check --debug script.nu",
result: None,
},
Example {
description: "Parse a byte stream as script by showing error message",
example: "open foo.nu | nu-check --debug script.nu",
result: None,
},
Example {
description: "Parse an internal stream as module by showing error message",
example: "open module.nu | lines | nu-check --debug --as-module module.nu",
result: None,
},
Example {
description: "Parse a string as script",
example: "$'two(char nl)lines' | nu-check ",
result: None,
},
Example {
description: "Heuristically parse which begins with script first, if it sees a failure, try module afterwards",
example: "nu-check -a script.nu",
result: None,
},
Example {
description: "Heuristically parse by showing error message",
example: "open foo.nu | lines | nu-check --all --debug",
result: None,
},
]
}
}
fn parse_module(
working_set: &mut StateWorkingSet,
filename: Option<String>,
contents: &[u8],
is_debug: bool,
call_head: Span,
) -> Result<PipelineData, ShellError> {
let filename = filename.unwrap_or_else(|| "empty".to_string());
let file_id = working_set.add_file(filename.clone(), contents);
let new_span = working_set.get_span_for_file(file_id);
let starting_error_count = working_set.parse_errors.len();
parse_module_block(working_set, new_span, filename.as_bytes());
check_parse(
starting_error_count,
working_set,
is_debug,
Some(
"If the content is intended to be a script, please try to remove `--as-module` flag "
.to_string(),
),
call_head,
)
}
fn parse_script(
working_set: &mut StateWorkingSet,
filename: Option<&str>,
contents: &[u8],
is_debug: bool,
call_head: Span,
) -> Result<PipelineData, ShellError> {
let starting_error_count = working_set.parse_errors.len();
parse(working_set, filename, contents, false);
check_parse(starting_error_count, working_set, is_debug, None, call_head)
}
fn check_parse(
starting_error_count: usize,
working_set: &StateWorkingSet,
is_debug: bool,
help: Option<String>,
call_head: Span,
) -> Result<PipelineData, ShellError> {
if starting_error_count != working_set.parse_errors.len() {
let msg = format!(
r#"Found : {}"#,
working_set
.parse_errors
.first()
.expect("Missing parser error")
);
if is_debug {
Err(ShellError::GenericError {
error: "Failed to parse content".into(),
msg,
span: Some(call_head),
help,
inner: vec![],
})
} else {
Ok(PipelineData::Value(Value::bool(false, call_head), None))
}
} else {
Ok(PipelineData::Value(Value::bool(true, call_head), None))
}
}
fn parse_file_script(
path: &Path,
working_set: &mut StateWorkingSet,
is_debug: bool,
path_span: Span,
call_head: Span,
) -> Result<PipelineData, ShellError> {
let filename = check_path(working_set, path_span, call_head)?;
if let Ok(contents) = std::fs::read(path) {
parse_script(working_set, Some(&filename), &contents, is_debug, call_head)
} else {
Err(ShellError::IOErrorSpanned {
msg: "Could not read path".to_string(),
span: path_span,
})
}
}
fn parse_file_or_dir_module(
path_bytes: &[u8],
working_set: &mut StateWorkingSet,
is_debug: bool,
path_span: Span,
call_head: Span,
) -> Result<PipelineData, ShellError> {
let _ = check_path(working_set, path_span, call_head)?;
let starting_error_count = working_set.parse_errors.len();
let _ = parse_module_file_or_dir(working_set, path_bytes, path_span, None);
if starting_error_count != working_set.parse_errors.len() {
if is_debug {
let msg = format!(
r#"Found : {}"#,
working_set
.parse_errors
.first()
.expect("Missing parser error")
);
Err(ShellError::GenericError {
error: "Failed to parse content".into(),
msg,
span: Some(path_span),
help: Some("If the content is intended to be a script, please try to remove `--as-module` flag ".into()),
inner: vec![],
})
} else {
Ok(PipelineData::Value(Value::bool(false, call_head), None))
}
} else {
Ok(PipelineData::Value(Value::bool(true, call_head), None))
}
}
fn check_path(
working_set: &mut StateWorkingSet,
path_span: Span,
call_head: Span,
) -> Result<String, ShellError> {
let bytes = working_set.get_span_contents(path_span);
let (filename, err) = unescape_unquote_string(bytes, path_span);
if let Some(e) = err {
Err(ShellError::GenericError {
error: "Could not escape filename".to_string(),
msg: "could not escape filename".to_string(),
span: Some(call_head),
help: Some(format!("Returned error: {e}")),
inner: vec![],
})
} else {
Ok(filename)
}
}