nushell/crates/nu-parser/tests/test_parser_unicode_escapes.rs

84 lines
2.9 KiB
Rust
Raw Normal View History

#![cfg(test)]
use nu_parser::*;
use nu_protocol::{
IO and redirection overhaul (#11934) # Description The PR overhauls how IO redirection is handled, allowing more explicit and fine-grain control over `stdout` and `stderr` output as well as more efficient IO and piping. To summarize the changes in this PR: - Added a new `IoStream` type to indicate the intended destination for a pipeline element's `stdout` and `stderr`. - The `stdout` and `stderr` `IoStream`s are stored in the `Stack` and to avoid adding 6 additional arguments to every eval function and `Command::run`. The `stdout` and `stderr` streams can be temporarily overwritten through functions on `Stack` and these functions will return a guard that restores the original `stdout` and `stderr` when dropped. - In the AST, redirections are now directly part of a `PipelineElement` as a `Option<Redirection>` field instead of having multiple different `PipelineElement` enum variants for each kind of redirection. This required changes to the parser, mainly in `lite_parser.rs`. - `Command`s can also set a `IoStream` override/redirection which will apply to the previous command in the pipeline. This is used, for example, in `ignore` to allow the previous external command to have its stdout redirected to `Stdio::null()` at spawn time. In contrast, the current implementation has to create an os pipe and manually consume the output on nushell's side. File and pipe redirections (`o>`, `e>`, `e>|`, etc.) have precedence over overrides from commands. This PR improves piping and IO speed, partially addressing #10763. Using the `throughput` command from that issue, this PR gives the following speedup on my setup for the commands below: | Command | Before (MB/s) | After (MB/s) | Bash (MB/s) | | --------------------------- | -------------:| ------------:| -----------:| | `throughput o> /dev/null` | 1169 | 52938 | 54305 | | `throughput \| ignore` | 840 | 55438 | N/A | | `throughput \| null` | Error | 53617 | N/A | | `throughput \| rg 'x'` | 1165 | 3049 | 3736 | | `(throughput) \| rg 'x'` | 810 | 3085 | 3815 | (Numbers above are the median samples for throughput) This PR also paves the way to refactor our `ExternalStream` handling in the various commands. For example, this PR already fixes the following code: ```nushell ^sh -c 'echo -n "hello "; sleep 0; echo "world"' | find "hello world" ``` This returns an empty list on 0.90.1 and returns a highlighted "hello world" on this PR. Since the `stdout` and `stderr` `IoStream`s are available to commands when they are run, then this unlocks the potential for more convenient behavior. E.g., the `find` command can disable its ansi highlighting if it detects that the output `IoStream` is not the terminal. Knowing the output streams will also allow background job output to be redirected more easily and efficiently. # User-Facing Changes - External commands returned from closures will be collected (in most cases): ```nushell 1..2 | each {|_| nu -c "print a" } ``` This gives `["a", "a"]` on this PR, whereas this used to print "a\na\n" and then return an empty list. ```nushell 1..2 | each {|_| nu -c "print -e a" } ``` This gives `["", ""]` and prints "a\na\n" to stderr, whereas this used to return an empty list and print "a\na\n" to stderr. - Trailing new lines are always trimmed for external commands when piping into internal commands or collecting it as a value. (Failure to decode the output as utf-8 will keep the trailing newline for the last binary value.) In the current nushell version, the following three code snippets differ only in parenthesis placement, but they all also have different outputs: 1. `1..2 | each { ^echo a }` ``` a a ╭────────────╮ │ empty list │ ╰────────────╯ ``` 2. `1..2 | each { (^echo a) }` ``` ╭───┬───╮ │ 0 │ a │ │ 1 │ a │ ╰───┴───╯ ``` 3. `1..2 | (each { ^echo a })` ``` ╭───┬───╮ │ 0 │ a │ │ │ │ │ 1 │ a │ │ │ │ ╰───┴───╯ ``` But in this PR, the above snippets will all have the same output: ``` ╭───┬───╮ │ 0 │ a │ │ 1 │ a │ ╰───┴───╯ ``` - All existing flags on `run-external` are now deprecated. - File redirections now apply to all commands inside a code block: ```nushell (nu -c "print -e a"; nu -c "print -e b") e> test.out ``` This gives "a\nb\n" in `test.out` and prints nothing. The same result would happen when printing to stdout and using a `o>` file redirection. - External command output will (almost) never be ignored, and ignoring output must be explicit now: ```nushell (^echo a; ^echo b) ``` This prints "a\nb\n", whereas this used to print only "b\n". This only applies to external commands; values and internal commands not in return position will not print anything (e.g., `(echo a; echo b)` still only prints "b"). - `complete` now always captures stderr (`do` is not necessary). # After Submitting The language guide and other documentation will need to be updated.
2024-03-14 21:51:55 +01:00
ast::Expr,
engine::{EngineState, StateWorkingSet},
};
pub fn do_test(test: &[u8], expected: &str, error_contains: Option<&str>) {
let engine_state = EngineState::new();
let mut working_set = StateWorkingSet::new(&engine_state);
let block = parse(&mut working_set, None, test, true);
match working_set.parse_errors.first() {
None => {
assert_eq!(block.len(), 1);
IO and redirection overhaul (#11934) # Description The PR overhauls how IO redirection is handled, allowing more explicit and fine-grain control over `stdout` and `stderr` output as well as more efficient IO and piping. To summarize the changes in this PR: - Added a new `IoStream` type to indicate the intended destination for a pipeline element's `stdout` and `stderr`. - The `stdout` and `stderr` `IoStream`s are stored in the `Stack` and to avoid adding 6 additional arguments to every eval function and `Command::run`. The `stdout` and `stderr` streams can be temporarily overwritten through functions on `Stack` and these functions will return a guard that restores the original `stdout` and `stderr` when dropped. - In the AST, redirections are now directly part of a `PipelineElement` as a `Option<Redirection>` field instead of having multiple different `PipelineElement` enum variants for each kind of redirection. This required changes to the parser, mainly in `lite_parser.rs`. - `Command`s can also set a `IoStream` override/redirection which will apply to the previous command in the pipeline. This is used, for example, in `ignore` to allow the previous external command to have its stdout redirected to `Stdio::null()` at spawn time. In contrast, the current implementation has to create an os pipe and manually consume the output on nushell's side. File and pipe redirections (`o>`, `e>`, `e>|`, etc.) have precedence over overrides from commands. This PR improves piping and IO speed, partially addressing #10763. Using the `throughput` command from that issue, this PR gives the following speedup on my setup for the commands below: | Command | Before (MB/s) | After (MB/s) | Bash (MB/s) | | --------------------------- | -------------:| ------------:| -----------:| | `throughput o> /dev/null` | 1169 | 52938 | 54305 | | `throughput \| ignore` | 840 | 55438 | N/A | | `throughput \| null` | Error | 53617 | N/A | | `throughput \| rg 'x'` | 1165 | 3049 | 3736 | | `(throughput) \| rg 'x'` | 810 | 3085 | 3815 | (Numbers above are the median samples for throughput) This PR also paves the way to refactor our `ExternalStream` handling in the various commands. For example, this PR already fixes the following code: ```nushell ^sh -c 'echo -n "hello "; sleep 0; echo "world"' | find "hello world" ``` This returns an empty list on 0.90.1 and returns a highlighted "hello world" on this PR. Since the `stdout` and `stderr` `IoStream`s are available to commands when they are run, then this unlocks the potential for more convenient behavior. E.g., the `find` command can disable its ansi highlighting if it detects that the output `IoStream` is not the terminal. Knowing the output streams will also allow background job output to be redirected more easily and efficiently. # User-Facing Changes - External commands returned from closures will be collected (in most cases): ```nushell 1..2 | each {|_| nu -c "print a" } ``` This gives `["a", "a"]` on this PR, whereas this used to print "a\na\n" and then return an empty list. ```nushell 1..2 | each {|_| nu -c "print -e a" } ``` This gives `["", ""]` and prints "a\na\n" to stderr, whereas this used to return an empty list and print "a\na\n" to stderr. - Trailing new lines are always trimmed for external commands when piping into internal commands or collecting it as a value. (Failure to decode the output as utf-8 will keep the trailing newline for the last binary value.) In the current nushell version, the following three code snippets differ only in parenthesis placement, but they all also have different outputs: 1. `1..2 | each { ^echo a }` ``` a a ╭────────────╮ │ empty list │ ╰────────────╯ ``` 2. `1..2 | each { (^echo a) }` ``` ╭───┬───╮ │ 0 │ a │ │ 1 │ a │ ╰───┴───╯ ``` 3. `1..2 | (each { ^echo a })` ``` ╭───┬───╮ │ 0 │ a │ │ │ │ │ 1 │ a │ │ │ │ ╰───┴───╯ ``` But in this PR, the above snippets will all have the same output: ``` ╭───┬───╮ │ 0 │ a │ │ 1 │ a │ ╰───┴───╯ ``` - All existing flags on `run-external` are now deprecated. - File redirections now apply to all commands inside a code block: ```nushell (nu -c "print -e a"; nu -c "print -e b") e> test.out ``` This gives "a\nb\n" in `test.out` and prints nothing. The same result would happen when printing to stdout and using a `o>` file redirection. - External command output will (almost) never be ignored, and ignoring output must be explicit now: ```nushell (^echo a; ^echo b) ``` This prints "a\nb\n", whereas this used to print only "b\n". This only applies to external commands; values and internal commands not in return position will not print anything (e.g., `(echo a; echo b)` still only prints "b"). - `complete` now always captures stderr (`do` is not necessary). # After Submitting The language guide and other documentation will need to be updated.
2024-03-14 21:51:55 +01:00
let pipeline = &block.pipelines[0];
assert_eq!(pipeline.len(), 1);
let element = &pipeline.elements[0];
assert!(element.redirection.is_none());
assert_eq!(element.expr.expr, Expr::String(expected.to_string()));
}
Some(pev) => match error_contains {
None => {
panic!("Err:{pev:#?}");
}
Some(contains_string) => {
let full_err = format!("{pev:#?}");
assert!(
full_err.contains(contains_string),
"Expected error containing {contains_string}, instead got {full_err}"
);
}
},
}
}
// cases that all should work
#[test]
pub fn unicode_escapes_in_strings() {
pub struct Tc(&'static [u8], &'static str);
let test_vec = vec![
Tc(b"\"hello \\u{6e}\\u{000075}\\u{073}hell\"", "hello nushell"),
// template: Tc(br#""<string literal without #'s>"", "<Rust literal comparand>")
//deprecated Tc(br#""\u006enu\u0075\u0073\u0073""#, "nnuuss"),
Tc(br#""hello \u{6e}\u{000075}\u{073}hell""#, "hello nushell"),
Tc(br#""\u{39}8\u{10ffff}""#, "98\u{10ffff}"),
Tc(br#""abc\u{41}""#, "abcA"), // at end of string
Tc(br#""\u{41}abc""#, "Aabc"), // at start of string
Tc(br#""\u{a}""#, "\n"), // single digit
];
for tci in test_vec {
println!("Expecting: {}", tci.1);
do_test(tci.0, tci.1, None);
}
}
// cases that all should fail (in expected way)
#[test]
pub fn unicode_escapes_in_strings_expected_failures() {
// input, substring of expected failure
pub struct Tc(&'static [u8], &'static str);
let test_vec = vec![
// template: Tc(br#""<string literal without #'s>"", "<pattern in expected error>")
//deprecated Tc(br#""\u06e""#, "any shape"), // 4digit too short, next char is EOF
//deprecatedTc(br#""\u06ex""#, "any shape"), // 4digit too short, next char is non-hex-digit
Syntax errors for string and int (#7952) # Description Added a few syntax errors in ints and strings, changed parser to stop and show that error rather than continue trying to parse those tokens as some other shape. However, I don't see how to push this direction much further, and most of the classic confusing errors can't be changed. Flagged as WIP for the moment, but passes all checks and works better than current release: 1. I have yet to figure out how to make these errors refer back to the book, as I see some other errors do. 2. How to give syntax error when malformed int is first token in line? Currently parsed as external command, user gets confusing error message. 3. Would like to be more strict with *decimal* int literals (lacking, e.g, `0x' prefix). Need to tinker more with the order of parse shape calls, currently, float is tried after int, so '1.4' has to be passed. _(Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience.)_ ```bash 〉"\z" Error: ╭─[entry #3:1:1] 1 │ "\z" · ─┬─ · ╰── Syntax error in string, unrecognized character after escape '\'. ╰──── ``` Canonic presentation of a syntax error. ```bash 〉" \u{01ffbogus}" Error: × Invalid syntax ╭─[entry #2:1:1] 1 │ " \u{01ffbogus}" · ───────┬────── · ╰── Syntax error in string, expecting 1 to 6 hex digits in unicode escape '\u{X...}', max value 10FFFF. ╰──── ``` Malformed unicode escape in string, flagged as error. String parse can be opinionated, it's the last shape tried. ```bash 〉0x22bogus Error: nu::shell::external_command (link) × External command failed ╭─[entry #4:1:1] 1 │ 0x22bogus · ────┬──── · ╰── executable was not found ╰──── help: No such file or directory (os error 2) ``` A *correct* number in first token would be evaluated, but an *incorrect* one is treated as external command? Confusing to users. ```bash 〉0 + 0x22bogus Error: × Invalid syntax ╭─[entry #5:1:1] 1 │ 0 + 0x22bogus · ────┬──── · ╰── Syntax error in int, invalid digits in radix 16 int. ╰──── ``` Can give syntax error if token is unambiguously int literal. e.g has 0b or 0x prefix, could not be a float. ```bash 〉0 + 098bogus Error: nu::parser::unsupported_operation (link) × Types mismatched for operation. ╭─[entry #6:1:1] 1 │ 0 + 098bogus · ┬ ┬ ────┬─── · │ │ ╰── string · │ ╰── doesn't support these values. · ╰── int ╰──── help: Change int or string to be the right types and try again. ``` But *decimal* literal (no prefix) can't be too strict. Parser is going to try float later. So '1.4' must be passed. # User-Facing Changes First and foremost, more specific error messages for typos in string and int literals. Probably improves interactive user experience. But a script that was causing and then checking for specific error might notice a different error message. _(List of all changes that impact the user experience here. This helps us keep track of breaking changes.)_ # Tests + Formatting Added (positive and negative unit tests in `cargo test -p nu-parser`. Didn't add integration tests. Make sure you've run and fixed any issues with these commands: - [x] `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect` to check that you're using the standard code style - [x] `cargo test --workspace` to check that all tests pass # After Submitting If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --------- Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
2023-02-13 17:09:50 +01:00
Tc(br#""hello \u{6e""#, "missing '}'"), // extended, missing close delim
Tc(
br#""\u{39}8\u{000000000000000000000000000000000000000000000037}""#,
Syntax errors for string and int (#7952) # Description Added a few syntax errors in ints and strings, changed parser to stop and show that error rather than continue trying to parse those tokens as some other shape. However, I don't see how to push this direction much further, and most of the classic confusing errors can't be changed. Flagged as WIP for the moment, but passes all checks and works better than current release: 1. I have yet to figure out how to make these errors refer back to the book, as I see some other errors do. 2. How to give syntax error when malformed int is first token in line? Currently parsed as external command, user gets confusing error message. 3. Would like to be more strict with *decimal* int literals (lacking, e.g, `0x' prefix). Need to tinker more with the order of parse shape calls, currently, float is tried after int, so '1.4' has to be passed. _(Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience.)_ ```bash 〉"\z" Error: ╭─[entry #3:1:1] 1 │ "\z" · ─┬─ · ╰── Syntax error in string, unrecognized character after escape '\'. ╰──── ``` Canonic presentation of a syntax error. ```bash 〉" \u{01ffbogus}" Error: × Invalid syntax ╭─[entry #2:1:1] 1 │ " \u{01ffbogus}" · ───────┬────── · ╰── Syntax error in string, expecting 1 to 6 hex digits in unicode escape '\u{X...}', max value 10FFFF. ╰──── ``` Malformed unicode escape in string, flagged as error. String parse can be opinionated, it's the last shape tried. ```bash 〉0x22bogus Error: nu::shell::external_command (link) × External command failed ╭─[entry #4:1:1] 1 │ 0x22bogus · ────┬──── · ╰── executable was not found ╰──── help: No such file or directory (os error 2) ``` A *correct* number in first token would be evaluated, but an *incorrect* one is treated as external command? Confusing to users. ```bash 〉0 + 0x22bogus Error: × Invalid syntax ╭─[entry #5:1:1] 1 │ 0 + 0x22bogus · ────┬──── · ╰── Syntax error in int, invalid digits in radix 16 int. ╰──── ``` Can give syntax error if token is unambiguously int literal. e.g has 0b or 0x prefix, could not be a float. ```bash 〉0 + 098bogus Error: nu::parser::unsupported_operation (link) × Types mismatched for operation. ╭─[entry #6:1:1] 1 │ 0 + 098bogus · ┬ ┬ ────┬─── · │ │ ╰── string · │ ╰── doesn't support these values. · ╰── int ╰──── help: Change int or string to be the right types and try again. ``` But *decimal* literal (no prefix) can't be too strict. Parser is going to try float later. So '1.4' must be passed. # User-Facing Changes First and foremost, more specific error messages for typos in string and int literals. Probably improves interactive user experience. But a script that was causing and then checking for specific error might notice a different error message. _(List of all changes that impact the user experience here. This helps us keep track of breaking changes.)_ # Tests + Formatting Added (positive and negative unit tests in `cargo test -p nu-parser`. Didn't add integration tests. Make sure you've run and fixed any issues with these commands: - [x] `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect` to check that you're using the standard code style - [x] `cargo test --workspace` to check that all tests pass # After Submitting If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --------- Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
2023-02-13 17:09:50 +01:00
"must be 1-6 hex digits",
), // hex too long, but small value
Syntax errors for string and int (#7952) # Description Added a few syntax errors in ints and strings, changed parser to stop and show that error rather than continue trying to parse those tokens as some other shape. However, I don't see how to push this direction much further, and most of the classic confusing errors can't be changed. Flagged as WIP for the moment, but passes all checks and works better than current release: 1. I have yet to figure out how to make these errors refer back to the book, as I see some other errors do. 2. How to give syntax error when malformed int is first token in line? Currently parsed as external command, user gets confusing error message. 3. Would like to be more strict with *decimal* int literals (lacking, e.g, `0x' prefix). Need to tinker more with the order of parse shape calls, currently, float is tried after int, so '1.4' has to be passed. _(Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience.)_ ```bash 〉"\z" Error: ╭─[entry #3:1:1] 1 │ "\z" · ─┬─ · ╰── Syntax error in string, unrecognized character after escape '\'. ╰──── ``` Canonic presentation of a syntax error. ```bash 〉" \u{01ffbogus}" Error: × Invalid syntax ╭─[entry #2:1:1] 1 │ " \u{01ffbogus}" · ───────┬────── · ╰── Syntax error in string, expecting 1 to 6 hex digits in unicode escape '\u{X...}', max value 10FFFF. ╰──── ``` Malformed unicode escape in string, flagged as error. String parse can be opinionated, it's the last shape tried. ```bash 〉0x22bogus Error: nu::shell::external_command (link) × External command failed ╭─[entry #4:1:1] 1 │ 0x22bogus · ────┬──── · ╰── executable was not found ╰──── help: No such file or directory (os error 2) ``` A *correct* number in first token would be evaluated, but an *incorrect* one is treated as external command? Confusing to users. ```bash 〉0 + 0x22bogus Error: × Invalid syntax ╭─[entry #5:1:1] 1 │ 0 + 0x22bogus · ────┬──── · ╰── Syntax error in int, invalid digits in radix 16 int. ╰──── ``` Can give syntax error if token is unambiguously int literal. e.g has 0b or 0x prefix, could not be a float. ```bash 〉0 + 098bogus Error: nu::parser::unsupported_operation (link) × Types mismatched for operation. ╭─[entry #6:1:1] 1 │ 0 + 098bogus · ┬ ┬ ────┬─── · │ │ ╰── string · │ ╰── doesn't support these values. · ╰── int ╰──── help: Change int or string to be the right types and try again. ``` But *decimal* literal (no prefix) can't be too strict. Parser is going to try float later. So '1.4' must be passed. # User-Facing Changes First and foremost, more specific error messages for typos in string and int literals. Probably improves interactive user experience. But a script that was causing and then checking for specific error might notice a different error message. _(List of all changes that impact the user experience here. This helps us keep track of breaking changes.)_ # Tests + Formatting Added (positive and negative unit tests in `cargo test -p nu-parser`. Didn't add integration tests. Make sure you've run and fixed any issues with these commands: - [x] `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect` to check that you're using the standard code style - [x] `cargo test --workspace` to check that all tests pass # After Submitting If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --------- Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
2023-02-13 17:09:50 +01:00
Tc(br#""\u{110000}""#, "max value 10FFF"), // max unicode <= 0x10ffff
];
for tci in test_vec {
println!("Expecting failure containing: {}", tci.1);
do_test(tci.0, "--success not expected--", Some(tci.1));
}
}