forked from extern/nushell
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> <type>`. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table<a: int>`, whereas the type returned by `Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list<any> and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> * Play whack-a-mole with the commands and scripts this will inevitably break
326 lines
11 KiB
Rust
326 lines
11 KiB
Rust
use nu_engine::{command_prelude::*, find_in_dirs_env, get_dirs_var_from_call};
|
|
use nu_parser::{parse, parse_module_block, parse_module_file_or_dir, unescape_unquote_string};
|
|
use nu_protocol::engine::{FileStack, StateWorkingSet};
|
|
use std::path::Path;
|
|
|
|
#[derive(Clone)]
|
|
pub struct NuCheck;
|
|
|
|
impl Command for NuCheck {
|
|
fn name(&self) -> &str {
|
|
"nu-check"
|
|
}
|
|
|
|
fn signature(&self) -> Signature {
|
|
Signature::build("nu-check")
|
|
.input_output_types(vec![
|
|
(Type::Nothing, Type::Bool),
|
|
(Type::String, Type::Bool),
|
|
(Type::List(Box::new(Type::Any)), Type::Bool),
|
|
])
|
|
// type is string to avoid automatically canonicalizing the path
|
|
.optional("path", SyntaxShape::String, "File path to parse.")
|
|
.switch("as-module", "Parse content as module", Some('m'))
|
|
.switch("debug", "Show error messages", Some('d'))
|
|
.category(Category::Strings)
|
|
}
|
|
|
|
fn description(&self) -> &str {
|
|
"Validate and parse input content."
|
|
}
|
|
|
|
fn search_terms(&self) -> Vec<&str> {
|
|
vec!["syntax", "parse", "debug"]
|
|
}
|
|
|
|
fn run(
|
|
&self,
|
|
engine_state: &EngineState,
|
|
stack: &mut Stack,
|
|
call: &Call,
|
|
input: PipelineData,
|
|
) -> Result<PipelineData, ShellError> {
|
|
let path_arg: Option<Spanned<String>> = call.opt(engine_state, stack, 0)?;
|
|
let as_module = call.has_flag(engine_state, stack, "as-module")?;
|
|
let is_debug = call.has_flag(engine_state, stack, "debug")?;
|
|
|
|
// DO NOT ever try to merge the working_set in this command
|
|
let mut working_set = StateWorkingSet::new(engine_state);
|
|
|
|
let input_span = input.span().unwrap_or(call.head);
|
|
|
|
match input {
|
|
PipelineData::Value(Value::String { val, .. }, ..) => {
|
|
let contents = Vec::from(val);
|
|
if as_module {
|
|
parse_module(&mut working_set, None, &contents, is_debug, input_span)
|
|
} else {
|
|
parse_script(&mut working_set, None, &contents, is_debug, input_span)
|
|
}
|
|
}
|
|
PipelineData::ListStream(stream, ..) => {
|
|
let config = stack.get_config(engine_state);
|
|
let list_stream = stream.into_string("\n", &config);
|
|
let contents = Vec::from(list_stream);
|
|
|
|
if as_module {
|
|
parse_module(&mut working_set, None, &contents, is_debug, call.head)
|
|
} else {
|
|
parse_script(&mut working_set, None, &contents, is_debug, call.head)
|
|
}
|
|
}
|
|
PipelineData::ByteStream(stream, ..) => {
|
|
let contents = stream.into_bytes()?;
|
|
|
|
if as_module {
|
|
parse_module(&mut working_set, None, &contents, is_debug, call.head)
|
|
} else {
|
|
parse_script(&mut working_set, None, &contents, is_debug, call.head)
|
|
}
|
|
}
|
|
_ => {
|
|
if let Some(path_str) = path_arg {
|
|
let path_span = path_str.span;
|
|
|
|
// look up the path as relative to FILE_PWD or inside NU_LIB_DIRS (same process as source-env)
|
|
let path = match find_in_dirs_env(
|
|
&path_str.item,
|
|
engine_state,
|
|
stack,
|
|
get_dirs_var_from_call(stack, call),
|
|
) {
|
|
Ok(path) => {
|
|
if let Some(path) = path {
|
|
path
|
|
} else {
|
|
return Err(ShellError::FileNotFound {
|
|
file: path_str.item,
|
|
span: path_span,
|
|
});
|
|
}
|
|
}
|
|
Err(error) => return Err(error),
|
|
};
|
|
|
|
let result = if as_module || path.is_dir() {
|
|
parse_file_or_dir_module(
|
|
path.to_string_lossy().as_bytes(),
|
|
&mut working_set,
|
|
is_debug,
|
|
path_span,
|
|
call.head,
|
|
)
|
|
} else {
|
|
// Unlike `parse_file_or_dir_module`, `parse_file_script` parses the content directly,
|
|
// without adding the file to the stack. Therefore we need to handle this manually.
|
|
working_set.files = FileStack::with_file(path.clone());
|
|
parse_file_script(&path, &mut working_set, is_debug, path_span, call.head)
|
|
// The working set is not merged, so no need to pop the file from the stack.
|
|
};
|
|
|
|
result
|
|
} else {
|
|
Err(ShellError::GenericError {
|
|
error: "Failed to execute command".into(),
|
|
msg: "Requires path argument if ran without pipeline input".into(),
|
|
span: Some(call.head),
|
|
help: Some("Please run 'nu-check --help' for more details".into()),
|
|
inner: vec![],
|
|
})
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
fn examples(&self) -> Vec<Example> {
|
|
vec![
|
|
Example {
|
|
description: "Parse a input file as script(Default)",
|
|
example: "nu-check script.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Parse a input file as module",
|
|
example: "nu-check --as-module module.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Parse a input file by showing error message",
|
|
example: "nu-check --debug script.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Parse a byte stream as script by showing error message",
|
|
example: "open foo.nu | nu-check --debug script.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Parse an internal stream as module by showing error message",
|
|
example: "open module.nu | lines | nu-check --debug --as-module module.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Parse a string as script",
|
|
example: "$'two(char nl)lines' | nu-check ",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Heuristically parse which begins with script first, if it sees a failure, try module afterwards",
|
|
example: "nu-check -a script.nu",
|
|
result: None,
|
|
},
|
|
Example {
|
|
description: "Heuristically parse by showing error message",
|
|
example: "open foo.nu | lines | nu-check --all --debug",
|
|
result: None,
|
|
},
|
|
]
|
|
}
|
|
}
|
|
|
|
fn parse_module(
|
|
working_set: &mut StateWorkingSet,
|
|
filename: Option<String>,
|
|
contents: &[u8],
|
|
is_debug: bool,
|
|
call_head: Span,
|
|
) -> Result<PipelineData, ShellError> {
|
|
let filename = filename.unwrap_or_else(|| "empty".to_string());
|
|
|
|
let file_id = working_set.add_file(filename.clone(), contents);
|
|
let new_span = working_set.get_span_for_file(file_id);
|
|
|
|
let starting_error_count = working_set.parse_errors.len();
|
|
parse_module_block(working_set, new_span, filename.as_bytes());
|
|
|
|
check_parse(
|
|
starting_error_count,
|
|
working_set,
|
|
is_debug,
|
|
Some(
|
|
"If the content is intended to be a script, please try to remove `--as-module` flag "
|
|
.to_string(),
|
|
),
|
|
call_head,
|
|
)
|
|
}
|
|
|
|
fn parse_script(
|
|
working_set: &mut StateWorkingSet,
|
|
filename: Option<&str>,
|
|
contents: &[u8],
|
|
is_debug: bool,
|
|
call_head: Span,
|
|
) -> Result<PipelineData, ShellError> {
|
|
let starting_error_count = working_set.parse_errors.len();
|
|
parse(working_set, filename, contents, false);
|
|
check_parse(starting_error_count, working_set, is_debug, None, call_head)
|
|
}
|
|
|
|
fn check_parse(
|
|
starting_error_count: usize,
|
|
working_set: &StateWorkingSet,
|
|
is_debug: bool,
|
|
help: Option<String>,
|
|
call_head: Span,
|
|
) -> Result<PipelineData, ShellError> {
|
|
if starting_error_count != working_set.parse_errors.len() {
|
|
let msg = format!(
|
|
r#"Found : {}"#,
|
|
working_set
|
|
.parse_errors
|
|
.first()
|
|
.expect("Missing parser error")
|
|
);
|
|
|
|
if is_debug {
|
|
Err(ShellError::GenericError {
|
|
error: "Failed to parse content".into(),
|
|
msg,
|
|
span: Some(call_head),
|
|
help,
|
|
inner: vec![],
|
|
})
|
|
} else {
|
|
Ok(PipelineData::Value(Value::bool(false, call_head), None))
|
|
}
|
|
} else {
|
|
Ok(PipelineData::Value(Value::bool(true, call_head), None))
|
|
}
|
|
}
|
|
|
|
fn parse_file_script(
|
|
path: &Path,
|
|
working_set: &mut StateWorkingSet,
|
|
is_debug: bool,
|
|
path_span: Span,
|
|
call_head: Span,
|
|
) -> Result<PipelineData, ShellError> {
|
|
let filename = check_path(working_set, path_span, call_head)?;
|
|
|
|
if let Ok(contents) = std::fs::read(path) {
|
|
parse_script(working_set, Some(&filename), &contents, is_debug, call_head)
|
|
} else {
|
|
Err(ShellError::IOErrorSpanned {
|
|
msg: "Could not read path".to_string(),
|
|
span: path_span,
|
|
})
|
|
}
|
|
}
|
|
|
|
fn parse_file_or_dir_module(
|
|
path_bytes: &[u8],
|
|
working_set: &mut StateWorkingSet,
|
|
is_debug: bool,
|
|
path_span: Span,
|
|
call_head: Span,
|
|
) -> Result<PipelineData, ShellError> {
|
|
let _ = check_path(working_set, path_span, call_head)?;
|
|
|
|
let starting_error_count = working_set.parse_errors.len();
|
|
let _ = parse_module_file_or_dir(working_set, path_bytes, path_span, None);
|
|
|
|
if starting_error_count != working_set.parse_errors.len() {
|
|
if is_debug {
|
|
let msg = format!(
|
|
r#"Found : {}"#,
|
|
working_set
|
|
.parse_errors
|
|
.first()
|
|
.expect("Missing parser error")
|
|
);
|
|
Err(ShellError::GenericError {
|
|
error: "Failed to parse content".into(),
|
|
msg,
|
|
span: Some(path_span),
|
|
help: Some("If the content is intended to be a script, please try to remove `--as-module` flag ".into()),
|
|
inner: vec![],
|
|
})
|
|
} else {
|
|
Ok(PipelineData::Value(Value::bool(false, call_head), None))
|
|
}
|
|
} else {
|
|
Ok(PipelineData::Value(Value::bool(true, call_head), None))
|
|
}
|
|
}
|
|
|
|
fn check_path(
|
|
working_set: &mut StateWorkingSet,
|
|
path_span: Span,
|
|
call_head: Span,
|
|
) -> Result<String, ShellError> {
|
|
let bytes = working_set.get_span_contents(path_span);
|
|
let (filename, err) = unescape_unquote_string(bytes, path_span);
|
|
if let Some(e) = err {
|
|
Err(ShellError::GenericError {
|
|
error: "Could not escape filename".to_string(),
|
|
msg: "could not escape filename".to_string(),
|
|
span: Some(call_head),
|
|
help: Some(format!("Returned error: {e}")),
|
|
inner: vec![],
|
|
})
|
|
} else {
|
|
Ok(filename)
|
|
}
|
|
}
|