nushell/crates/nu-parser/tests/test_parser_unicode_escapes.rs
Bob Hyman 007916c2c1
Syntax errors for string and int (#7952)
# Description

Added a few syntax errors in ints and strings, changed parser to stop
and show that error rather than continue trying to parse those tokens as
some other shape. However, I don't see how to push this direction much
further, and most of the classic confusing errors can't be changed.

Flagged as WIP for the moment, but passes all checks and works better
than current release:
1. I have yet to figure out how to make these errors refer back to the
book, as I see some other errors do.
2. How to give syntax error when malformed int is first token in line?
Currently parsed as external command, user gets confusing error message.
3. Would like to be more strict with *decimal* int literals (lacking,
e.g, `0x' prefix). Need to tinker more with the order of parse shape
calls, currently, float is tried after int, so '1.4' has to be passed.

_(Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.)_

```bash

〉"\z"
Error: 
   ╭─[entry #3:1:1]
 1 │ "\z"
   ·  ─┬─
   ·   ╰── Syntax error in string, unrecognized character after escape '\'.
   ╰────
```
Canonic presentation of a syntax error.
```bash
〉"  \u{01ffbogus}"
Error: 
  × Invalid syntax
   ╭─[entry #2:1:1]
 1 │ "  \u{01ffbogus}"
   ·    ───────┬──────
   ·           ╰── Syntax error in string, expecting 1 to 6 hex digits in unicode escape '\u{X...}', max value 10FFFF.
   ╰────
```
Malformed unicode escape in string, flagged as error.  
String parse can be opinionated, it's the last shape tried.
```bash
〉0x22bogus
Error: nu:🐚:external_command (link)
  × External command failed
   ╭─[entry #4:1:1]
1 │ 0x22bogus
   · ────┬────
   ·     ╰── executable was not found
   ╰────
  help: No such file or directory (os error 2)
```
A *correct* number in first token would be evaluated, but an *incorrect*
one is treated as external command? Confusing to users.
```bash
〉0 + 0x22bogus
Error: 
  × Invalid syntax
   ╭─[entry #5:1:1]
1 │ 0 + 0x22bogus
   ·     ────┬────
   ·         ╰── Syntax error in int, invalid digits in radix 16 int.
   ╰────
```
Can give syntax error if token is unambiguously int literal. e.g has 0b
or 0x prefix, could not be a float.
```bash
〉0 + 098bogus
Error: nu::parser::unsupported_operation (link)

  × Types mismatched for operation.
   ╭─[entry #6:1:1]
 1 │ 0 + 098bogus
   · ┬ ┬ ────┬───
   · │ │     ╰── string
   · │ ╰── doesn't support these values.
   · ╰── int
   ╰────
  help: Change int or string to be the right types and try again.
```
But *decimal* literal (no prefix) can't be too strict. Parser is going
to try float later. So '1.4' must be passed.

# User-Facing Changes

First and foremost, more specific error messages for typos in string and
int literals. Probably improves interactive user experience.

But a script that was causing and then checking for specific error might
notice a different error message.

_(List of all changes that impact the user experience here. This helps
us keep track of breaking changes.)_

# Tests + Formatting

Added (positive and negative unit tests in `cargo test -p nu-parser`.
Didn't add integration tests.

Make sure you've run and fixed any issues with these commands:

- [x] `cargo fmt --all -- --check` to check standard code formatting
(`cargo fmt --all` applies these changes)
- [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A
clippy::needless_collect` to check that you're using the standard code
style
- [x] `cargo test --workspace` to check that all tests pass

# After Submitting

If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.

---------

Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
2023-02-13 16:09:50 +00:00

90 lines
3.1 KiB
Rust

#![cfg(test)]
//use nu_parser::ParseError;
use nu_parser::*;
use nu_protocol::{
//ast::{Expr, Expression, PipelineElement},
ast::{Expr, PipelineElement},
//engine::{Command, EngineState, Stack, StateWorkingSet},
engine::{EngineState, StateWorkingSet},
//Signature, SyntaxShape,
};
pub fn do_test(test: &[u8], expected: &str, error_contains: Option<&str>) {
let engine_state = EngineState::new();
let mut working_set = StateWorkingSet::new(&engine_state);
let (block, err) = parse(&mut working_set, None, test, true, &[]);
match err {
None => {
assert_eq!(block.len(), 1);
let expressions = &block[0];
assert_eq!(expressions.len(), 1);
if let PipelineElement::Expression(_, expr) = &expressions[0] {
assert_eq!(expr.expr, Expr::String(expected.to_string()))
} else {
panic!("Not an expression")
}
}
Some(pev) => match error_contains {
None => {
panic!("Err:{pev:#?}");
}
Some(contains_string) => {
let full_err = format!("{pev:#?}");
assert!(
full_err.contains(contains_string),
"Expected error containing {contains_string}, instead got {full_err}"
);
}
},
}
}
// cases that all should work
#[test]
pub fn unicode_escapes_in_strings() {
pub struct Tc(&'static [u8], &'static str);
let test_vec = vec![
Tc(b"\"hello \\u{6e}\\u{000075}\\u{073}hell\"", "hello nushell"),
// template: Tc(br#""<string literal without #'s>"", "<Rust literal comparand>")
//deprecated Tc(br#""\u006enu\u0075\u0073\u0073""#, "nnuuss"),
Tc(br#""hello \u{6e}\u{000075}\u{073}hell""#, "hello nushell"),
Tc(br#""\u{39}8\u{10ffff}""#, "98\u{10ffff}"),
Tc(br#""abc\u{41}""#, "abcA"), // at end of string
Tc(br#""\u{41}abc""#, "Aabc"), // at start of string
Tc(br#""\u{a}""#, "\n"), // single digit
];
for tci in test_vec {
println!("Expecting: {}", tci.1);
do_test(tci.0, tci.1, None);
}
}
// cases that all should fail (in expected way)
#[test]
pub fn unicode_escapes_in_strings_expected_failures() {
// input, substring of expected failure
pub struct Tc(&'static [u8], &'static str);
let test_vec = vec![
// template: Tc(br#""<string literal without #'s>"", "<pattern in expected error>")
//deprecated Tc(br#""\u06e""#, "any shape"), // 4digit too short, next char is EOF
//deprecatedTc(br#""\u06ex""#, "any shape"), // 4digit too short, next char is non-hex-digit
Tc(br#""hello \u{6e""#, "missing '}'"), // extended, missing close delim
Tc(
br#""\u{39}8\u{000000000000000000000000000000000000000000000037}""#,
"must be 1-6 hex digits",
), // hex too long, but small value
Tc(br#""\u{110000}""#, "max value 10FFF"), // max unicode <= 0x10ffff
];
for tci in test_vec {
println!("Expecting failure containing: {}", tci.1);
do_test(tci.0, "--success not expected--", Some(tci.1));
}
}