nushell/crates/nu-parser/src/lex.rs


use nu_protocol::{ParseError, Span};

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TokenContents {
    Item,
    Comment,
    Pipe,
    PipePipe,
    AssignmentOperator,
    ErrGreaterPipe,
    OutErrGreaterPipe,
    Semicolon,
    OutGreaterThan,
    OutGreaterGreaterThan,
    ErrGreaterThan,
    ErrGreaterGreaterThan,
    OutErrGreaterThan,
    OutErrGreaterGreaterThan,
    Eol,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Token {
    pub contents: TokenContents,
    pub span: Span,
}

impl Token {
    pub fn new(contents: TokenContents, span: Span) -> Token {
        Token { contents, span }
    }
}

#[derive(Clone, Copy, Debug)]
pub enum BlockKind {
    Paren,
    CurlyBracket,
    SquareBracket,
    AngleBracket,
}

impl BlockKind {
    fn closing(self) -> u8 {
        match self {
            BlockKind::Paren => b')',
            BlockKind::SquareBracket => b']',
            BlockKind::CurlyBracket => b'}',
            BlockKind::AngleBracket => b'>',
        }
    }
}
// A baseline token is terminated if it's not nested inside of a paired
// delimiter and the next character is one of: `|`, `;`, `#` or any
// whitespace.
fn is_item_terminator(
    block_level: &[BlockKind],
    c: u8,
    additional_whitespace: &[u8],
    special_tokens: &[u8],
) -> bool {
    block_level.is_empty()
        && (c == b' '
            || c == b'\t'
            || c == b'\n'
            || c == b'\r'
            || c == b'|'
            || c == b';'
            || additional_whitespace.contains(&c)
            || special_tokens.contains(&c))
}
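
// Illustrative sketch of the terminator rule above: with no open delimiters a
// `|` ends the current item, but inside a `[ ... ]` pair it does not.
#[cfg(test)]
#[test]
fn item_terminator_sketch() {
    assert!(is_item_terminator(&[], b'|', &[], &[]));
    assert!(!is_item_terminator(&[BlockKind::SquareBracket], b'|', &[], &[]));
}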
/// Assignment operators have special handling distinct from math expressions, as they cause the
/// rest of the pipeline to be consumed.
pub fn is_assignment_operator(bytes: &[u8]) -> bool {
    matches!(bytes, b"=" | b"+=" | b"++=" | b"-=" | b"*=" | b"/=")
}
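
// Quick sketch: assignment operators are matched exactly, so the comparison
// operator `==` is not treated as assignment.
#[cfg(test)]
#[test]
fn assignment_operator_sketch() {
    assert!(is_assignment_operator(b"="));
    assert!(is_assignment_operator(b"+="));
    assert!(!is_assignment_operator(b"=="));
}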

// A special token is a byte that stands alone as its own token. For example,
// when parsing a signature you may want `:` to be able to separate tokens and
// also to be handled as its own token, signaling that you're about to parse a
// type, as in `foo:bar`.
fn is_special_item(block_level: &[BlockKind], c: u8, special_tokens: &[u8]) -> bool {
    block_level.is_empty() && special_tokens.contains(&c)
}
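
// Illustrative sketch: `:` passed as a special token stands alone at the top
// level, but not while nested inside a delimiter pair.
#[cfg(test)]
#[test]
fn special_item_sketch() {
    assert!(is_special_item(&[], b':', &[b':']));
    assert!(!is_special_item(&[BlockKind::Paren], b':', &[b':']));
}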

pub fn lex_item(
    input: &[u8],
    curr_offset: &mut usize,
    span_offset: usize,
    additional_whitespace: &[u8],
    special_tokens: &[u8],
    in_signature: bool,
) -> (Token, Option<ParseError>) {
    // This variable tracks the starting character of a string literal, so that
    // we remain inside the string literal lexer mode until we encounter the
    // closing quote.
    let mut quote_start: Option<u8> = None;

    let mut in_comment = false;

    let token_start = *curr_offset;

    // This Vec tracks paired delimiters
    let mut block_level: Vec<BlockKind> = vec![];

    // The process of slurping up a baseline token repeats:
    //
    // - String literal, which begins with `'` or `"`, and continues until
    //   the same character is encountered again.
    // - Delimiter pair, which begins with `[`, `(`, or `{`, and continues until
    //   the matching closing delimiter is found, skipping comments and string
    //   literals.
    // - When not nested inside of a delimiter pair, when a terminating
    //   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
    //   token is done.
    // - Otherwise, accumulate the character into the current baseline token.
    while let Some(c) = input.get(*curr_offset) {
        let c = *c;

        if let Some(start) = quote_start {
            // Check if we're in an escape sequence
            if c == b'\\' && start == b'"' {
                // Go ahead and consume the escape character if possible
                if input.get(*curr_offset + 1).is_some() {
                    // Successfully escaped the character
                    *curr_offset += 2;
                    continue;
                } else {
                    let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

                    return (
                        Token {
                            contents: TokenContents::Item,
                            span,
                        },
                        Some(ParseError::UnexpectedEof(
                            (start as char).to_string(),
                            Span::new(span.end - 1, span.end),
                        )),
                    );
                }
            }
            // If we encountered the closing quote character for the current
            // string, we're done with the current string.
            if c == start {
                // Also need to check to make sure we aren't escaped
                quote_start = None;
            }
        } else if c == b'#' {
            if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                break;
            }
            in_comment = true;
        } else if c == b'\n' || c == b'\r' {
            in_comment = false;
            if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                break;
            }
        } else if in_comment {
            if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                break;
            }
        } else if is_special_item(&block_level, c, special_tokens) && token_start == *curr_offset {
            *curr_offset += 1;
            break;
        } else if c == b'\'' || c == b'"' || c == b'`' {
            // We encountered the opening quote of a string literal.
            quote_start = Some(c);
        } else if c == b'[' {
            // We encountered an opening `[` delimiter.
            block_level.push(BlockKind::SquareBracket);
        } else if c == b'<' && in_signature {
            block_level.push(BlockKind::AngleBracket);
        } else if c == b'>' && in_signature {
            if let Some(BlockKind::AngleBracket) = block_level.last() {
                let _ = block_level.pop();
            }
        } else if c == b']' {
            // We encountered a closing `]` delimiter. Pop off the opening `[`
            // delimiter.
            if let Some(BlockKind::SquareBracket) = block_level.last() {
                let _ = block_level.pop();
            }
        } else if c == b'{' {
            // We encountered an opening `{` delimiter.
            block_level.push(BlockKind::CurlyBracket);
        } else if c == b'}' {
            // We encountered a closing `}` delimiter. Pop off the opening `{`.
            if let Some(BlockKind::CurlyBracket) = block_level.last() {
                let _ = block_level.pop();
            } else {
                // We encountered a closing `}` delimiter, but the last opening
                // delimiter was not a `{`. This is an error.
                let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

                *curr_offset += 1;
                return (
                    Token {
                        contents: TokenContents::Item,
                        span,
                    },
                    Some(ParseError::Unbalanced(
                        "{".to_string(),
                        "}".to_string(),
                        Span::new(span.end, span.end + 1),
                    )),
                );
            }
        } else if c == b'(' {
            // We encountered an opening `(` delimiter.
            block_level.push(BlockKind::Paren);
        } else if c == b')' {
            // We encountered a closing `)` delimiter. Pop off the opening `(`.
            if let Some(BlockKind::Paren) = block_level.last() {
                let _ = block_level.pop();
            } else {
                // We encountered a closing `)` delimiter, but the last opening
                // delimiter was not a `(`. This is an error.
                let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

                *curr_offset += 1;
                return (
                    Token {
                        contents: TokenContents::Item,
                        span,
                    },
                    Some(ParseError::Unbalanced(
                        "(".to_string(),
                        ")".to_string(),
                        Span::new(span.end, span.end + 1),
                    )),
                );
            }
        } else if c == b'r' && input.get(*curr_offset + 1) == Some(b'#').as_ref() {
            // already checked `r#` pattern, so it's a raw string.
            let lex_result = lex_raw_string(input, curr_offset, span_offset);
            let span = Span::new(span_offset + token_start, span_offset + *curr_offset);
            if let Err(e) = lex_result {
                return (
                    Token {
                        contents: TokenContents::Item,
                        span,
                    },
                    Some(e),
                );
            }
        } else if c == b'|' && is_redirection(&input[token_start..*curr_offset]) {
            // matches err>| etc.
            *curr_offset += 1;
            break;
        } else if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
            break;
        }

        *curr_offset += 1;
    }

    let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

    if let Some(delim) = quote_start {
        // The non-lite parse trims quotes on both sides, so we add the expected quote so that
        // anyone wanting to consume this partial parse (e.g., completions) will be able to get
        // correct information from the non-lite parse.
        return (
            Token {
                contents: TokenContents::Item,
                span,
            },
            Some(ParseError::UnexpectedEof(
                (delim as char).to_string(),
                Span::new(span.end - 1, span.end),
            )),
        );
    }

    // If there are still unclosed opening delimiters, report the missing closer
    if let Some(block) = block_level.last() {
        let delim = block.closing();
        let cause = ParseError::UnexpectedEof(
            (delim as char).to_string(),
            Span::new(span.end - 1, span.end),
        );

        return (
            Token {
                contents: TokenContents::Item,
                span,
            },
            Some(cause),
        );
    }

    // If we didn't accumulate any characters, it's an unexpected error.
    if *curr_offset - token_start == 0 {
        return (
            Token {
                contents: TokenContents::Item,
                span,
            },
            Some(ParseError::UnexpectedEof("command".to_string(), span)),
        );
    }

    let mut err = None;
    let output = match &input[(span.start - span_offset)..(span.end - span_offset)] {
        bytes if is_assignment_operator(bytes) => Token {
            contents: TokenContents::AssignmentOperator,
            span,
        },
        b"out>" | b"o>" => Token {
            contents: TokenContents::OutGreaterThan,
            span,
        },
        b"out>>" | b"o>>" => Token {
            contents: TokenContents::OutGreaterGreaterThan,
            span,
        },
        b"out>|" | b"o>|" => {
            err = Some(ParseError::Expected(
                "`|`. Redirecting stdout to a pipe is the same as normal piping.",
                span,
            ));
            Token {
                contents: TokenContents::Item,
                span,
            }
        }
        b"err>" | b"e>" => Token {
            contents: TokenContents::ErrGreaterThan,
            span,
        },
        b"err>>" | b"e>>" => Token {
            contents: TokenContents::ErrGreaterGreaterThan,
            span,
        },
        b"err>|" | b"e>|" => Token {
            contents: TokenContents::ErrGreaterPipe,
            span,
        },
        b"out+err>" | b"err+out>" | b"o+e>" | b"e+o>" => Token {
            contents: TokenContents::OutErrGreaterThan,
            span,
        },
        b"out+err>>" | b"err+out>>" | b"o+e>>" | b"e+o>>" => Token {
            contents: TokenContents::OutErrGreaterGreaterThan,
            span,
        },
        b"out+err>|" | b"err+out>|" | b"o+e>|" | b"e+o>|" => Token {
            contents: TokenContents::OutErrGreaterPipe,
            span,
        },
        b"&&" => {
            err = Some(ParseError::ShellAndAnd(span));
            Token {
                contents: TokenContents::Item,
                span,
            }
        }
        b"2>" => {
            err = Some(ParseError::ShellErrRedirect(span));
            Token {
                contents: TokenContents::Item,
                span,
            }
        }
        b"2>&1" => {
            err = Some(ParseError::ShellOutErrRedirect(span));
            Token {
                contents: TokenContents::Item,
                span,
            }
        }
        _ => Token {
            contents: TokenContents::Item,
            span,
        },
    };

    (output, err)
}
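
// Minimal usage sketch for `lex_item`: a redirection word such as `err>` is
// consumed as a single token and classified, leaving the offset on the
// terminating whitespace.
#[cfg(test)]
#[test]
fn lex_item_redirection_sketch() {
    let mut offset = 0;
    let (token, err) = lex_item(b"err> out.txt", &mut offset, 0, &[], &[], false);
    assert!(err.is_none());
    assert_eq!(token.contents, TokenContents::ErrGreaterThan);
    assert_eq!(token.span, Span::new(0, 4));
    assert_eq!(offset, 4); // stopped at the space before `out.txt`
}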
fn lex_raw_string(
    input: &[u8],
    curr_offset: &mut usize,
    span_offset: usize,
) -> Result<(), ParseError> {
    // A raw string literal looks like `echo r#'Look, I can use 'single quotes'!'#`
    // If the next character is `#` we're probably looking at a raw string literal
    // so we need to read all the text until we find a closing `#`. This raw string
    // can contain any character, including newlines and double quotes without needing
    // to escape them.
    //
    // A raw string can have multiple `#` characters as a prefix,
    // in case the string itself contains `'#` or `#'`.
    // E.g: r##'I can use '#' in a raw string'##
    let mut prefix_sharp_cnt = 0;
    let start = *curr_offset;
    while let Some(b'#') = input.get(start + prefix_sharp_cnt + 1) {
        prefix_sharp_cnt += 1;
    }

    // curr_offset is on the character `r`; we need to move forward and skip all
    // `#` characters.
    //
    // e.g: r###'<body>
    //      ^
    //      curr_offset
    *curr_offset += prefix_sharp_cnt + 1;

    // the next one should be a single quote.
    if input.get(*curr_offset) != Some(&b'\'') {
        return Err(ParseError::Expected(
            "'",
            Span::new(span_offset + *curr_offset, span_offset + *curr_offset + 1),
        ));
    }

    *curr_offset += 1;
    let mut matches = false;
    while let Some(ch) = input.get(*curr_offset) {
        // check for the postfix: `'` followed by `prefix_sharp_cnt` `#`s
        if *ch == b'#' {
            let start_ch = input[*curr_offset - prefix_sharp_cnt];
            let postfix = &input[*curr_offset - prefix_sharp_cnt + 1..=*curr_offset];
            if start_ch == b'\'' && postfix.iter().all(|x| *x == b'#') {
                matches = true;
                break;
            }
        }
        *curr_offset += 1
    }

    if !matches {
        let mut expected = '\''.to_string();
        expected.push_str(&"#".repeat(prefix_sharp_cnt));
        return Err(ParseError::UnexpectedEof(
            expected,
            Span::new(span_offset + *curr_offset - 1, span_offset + *curr_offset),
        ));
    }
    Ok(())
}
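
// Sketch of the raw-string scanner's contract: `curr_offset` starts on the `r`
// and, on success, is left on the final `#` of the postfix.
#[cfg(test)]
#[test]
fn lex_raw_string_sketch() {
    let input = b"r#'can hold 'quotes' freely'#";
    let mut offset = 0;
    assert!(lex_raw_string(input, &mut offset, 0).is_ok());
    assert_eq!(offset, input.len() - 1);
}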

pub fn lex_signature(
    input: &[u8],
    span_offset: usize,
    additional_whitespace: &[u8],
    special_tokens: &[u8],
    skip_comment: bool,
) -> (Vec<Token>, Option<ParseError>) {
    let mut state = LexState {
        input,
        output: Vec::new(),
        error: None,
        span_offset,
    };
    lex_internal(
        &mut state,
        additional_whitespace,
        special_tokens,
        skip_comment,
        true,
        None,
    );
    (state.output, state.error)
}
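
// Usage sketch, with `:` as a caller-supplied special token (as the signature
// parser does). In signature mode `<`/`>` nest, so a type annotation like
// `list<int>` survives as a single item. Assumes `lex_internal` (defined later
// in this file) walks the input item by item, skipping whitespace.
#[cfg(test)]
#[test]
fn lex_signature_sketch() {
    let (tokens, err) = lex_signature(b"nums: list<int>", 0, &[], &[b':'], true);
    assert!(err.is_none());
    assert_eq!(tokens.len(), 3); // `nums`, `:`, `list<int>`
}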

#[derive(Debug)]
pub struct LexState<'a> {
    pub input: &'a [u8],
    pub output: Vec<Token>,
    pub error: Option<ParseError>,
    pub span_offset: usize,
}
/// Lex until the output is `max_tokens` longer than before the call, or until the input is exhausted.
/// The return value indicates how many tokens the call added to / removed from the output.
///
/// The behaviour here is non-obvious when `additional_whitespace` doesn't include newline:
/// If you pass a `state` where the last token in the output is an Eol, this might *remove* tokens.
pub fn lex_n_tokens(
    state: &mut LexState,
    additional_whitespace: &[u8],
    special_tokens: &[u8],
    skip_comment: bool,
    max_tokens: usize,
) -> isize {
    let n_tokens = state.output.len();
    lex_internal(
        state,
        additional_whitespace,
        special_tokens,
        skip_comment,
        false,
        Some(max_tokens),
    );
    // If this lex_internal call reached the end of the input, there may now be fewer tokens
    // in the output than before.
    let tokens_n_diff = (state.output.len() as isize) - (n_tokens as isize);
    let next_offset = state.output.last().map(|token| token.span.end);
    if let Some(next_offset) = next_offset {
        state.input = &state.input[next_offset - state.span_offset..];
        state.span_offset = next_offset;
    }
    tokens_n_diff
}
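
// Sketch of incremental driving: ask for at most two tokens, and the state's
// input/span_offset are advanced past what was consumed. Assumes `lex_internal`
// (defined later in this file) honors the `max_tokens` limit.
#[cfg(test)]
#[test]
fn lex_n_tokens_sketch() {
    let mut state = LexState {
        input: b"a b c",
        output: Vec::new(),
        error: None,
        span_offset: 0,
    };
    assert_eq!(lex_n_tokens(&mut state, &[], &[], true, 2), 2);
    assert_eq!(state.span_offset, 3); // end of the second token, `b`
}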
2021-06-30 03:42:56 +02:00
pub fn lex(
input: &[u8],
span_offset: usize,
2021-07-17 00:11:15 +02:00
additional_whitespace: &[u8],
special_tokens: &[u8],
2021-11-21 19:13:09 +01:00
skip_comment: bool,
allow lists to have type annotations (#8529) this pr refines #8270 and closes #8109 # description examples: the original syntax is okay ```nu def okay [nums: list] {} # the type of list will be list<any> ``` empty annotations are allowed in any variation the last two may be caught by a future formatter, but do not affect `nu` code currently ```nu def okay [nums: list<>] {} # okay def okay [nums: list< >] {} # weird but also okay def okay [nums: list< >] {} # also weird but okay ``` types are allowed (See [notes](#notes) below) ```nu def okay [nums: list<int>] {} # `test [a b c]` will throw an error def okay [nums: list< int > {} # any amount of space within the angle brackets is okay def err [nums: list <int>] {} # this is not okay, `nums` and `<int>` will be parsed as # two separate params, ``` nested annotations are allowed in many variations ```nu def okay [items: list<list<int>>] {} def okay [items: list<list>] {} ``` any unterminated annotation is caught ```nu Error: nu::parser::unexpected_eof × Unexpected end of code. ╭─[source:1:1] 1 │ def err [nums: list<int] {} · ▲ · ╰── expected closing > ╰──── ``` unknown types are flagged ```nu Error: nu::parser::unknown_type × Unknown type. ╭─[source:1:1] 1 │ def err [nums: list<str>] {} · ─┬─ · ╰── unknown type ╰──── Error: nu::parser::unknown_type × Unknown type. ╭─[source:1:1] 1 │ def err [nums: list<int, string>] {} · ─────┬───── · ╰── unknown type ╰──── ``` # notes the error message for mismatched types in not as intuitive ```nu Error: nu::parser::parse_mismatch × Parse mismatch during operation. ╭─[source:1:1] 1 │ def err [nums: list<int>] {}; err [a b c] · ┬ · ╰── expected int ╰──── ``` it should be something like this ```nu Error: nu::parser::parse_mismatch × Parse mismatch during operation. ╭─[source:1:1] 1 │ def err [nums: list<int>] {}; err [a b c] · ──┬── · ╰── expected list<int> ╰──── ``` this is currently not implemented
2023-03-24 12:54:06 +01:00
) -> (Vec<Token>, Option<ParseError>) {
Fix parsing record values containing colons (#13413) This PR is an attempt to fix #8257 and fix #10985 (which is duplicate-ish) # Description The parser currently doesn't know how to deal with colons appearing while lexing whitespace-terminated tokens specifying a record value. Most notably, this means you can't use datetime literals in record value position (and as a consequence, `| to nuon | from nuon` roundtrips can fail), but it also means that bare words containing colons cause a non-useful error message. ![image](https://github.com/user-attachments/assets/f04a8417-ee18-44e7-90eb-a0ecef943a0f) `parser::parse_record` calls `lex::lex` with the `:` colon character in the `special_tokens` argument. This allows colons to terminate record keys, but as a side effect, it also causes colons to terminate record *values*. I added a new function `lex::lex_n_tokens`, which allows the caller to drive the lexing process more explicitly, and used it in `parser::parse_record` to let colons terminate record keys while not giving them special treatment when appearing in record values. This PR description previously said: *Another approach suggested in one of the issues was to support an additional datetime literal format that doesn't require colons. I like that that wouldn't require new `lex::lex_internal` behaviour, but an advantage of my approach is that it also newly allows for string record values given as bare words containing colons. I think this eliminates another possible source of confusion.* It was determined that this is undesirable, and in the current state of this PR, bare word record values with colons are rejected explicitly. The better error message is still a win. # User-Facing Changes In addition to the above, this PR also disables the use of "special" (non-item) tokens in record key and value position, and the use of a single bare `:` as a record key. Examples of behaviour *before* this PR: ```nu { a: b } # Valid, same as { 'a': 'b' } { a: b:c } # Error: expected ':' { a: 2024-08-13T22:11:09 } # Error: expected ':' { :: 1 } # Valid, same as { ':': 1 } { ;: 1 } # Valid, same as { ';': 1 } { a: || } # Valid, same as { 'a': '||' } ``` Examples of behaviour *after* this PR: ```nu { a: b } # (Unchanged) Valid, same as { 'a': 'b' } { a: b:c } # Error: colon in bare word specifying record value { a: 2024-08-13T22:11:09 } # Valid, same as { a: (2024-08-13T22:11:09) } { :: 1 } # Error: colon in bare word specifying record key { ;: 1 } # Error: expected item in record key position { a: || } # Error: expected item in record value position ``` # Tests + Formatting I added tests, but I'm not sure if they're sufficient and in the right place. # After Submitting I don't think documentation changes are needed for this, but please let me know if you disagree.
2024-08-28 22:53:56 +02:00
let mut state = LexState {
input,
output: Vec::new(),
error: None,
span_offset,
};
lex_internal(
&mut state,
additional_whitespace,
special_tokens,
skip_comment,
false,
None,
);
(state.output, state.error)
}
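
/// Core lexing loop behind `lex` (and, per #13413, the caller-driven
/// `lex_n_tokens`). Tokens are appended to `state.output`; `max_tokens`
/// (when `Some`) caps how many new tokens this call may produce.
///
/// A minimal usage sketch, assuming the `LexState` fields shown above and
/// a zero span offset (illustrative only, not a doctest):
///
/// ```ignore
/// let mut state = LexState {
///     input: b"{ a: 1 }",
///     output: Vec::new(),
///     error: None,
///     span_offset: 0,
/// };
/// // Treat `:` as special so it can terminate a record key, skip comments,
/// // and emit at most two tokens.
/// lex_internal(&mut state, &[], &[b':'], true, false, Some(2));
/// ```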
fn lex_internal(
state: &mut LexState,
additional_whitespace: &[u8],
special_tokens: &[u8],
skip_comment: bool,
// within signatures we want to treat `<` and `>` specially
in_signature: bool,
max_tokens: Option<usize>,
) {
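// Remember how many tokens were already present so the `max_tokens` cap
// below counts only the tokens produced by this call.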
let initial_output_len = state.output.len();
let mut curr_offset = 0;
let mut is_complete = true;
while let Some(c) = state.input.get(curr_offset) {
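// Stop early once this call has produced `max_tokens` new tokens; this is
// what lets a caller (e.g. `lex_n_tokens`, added in #13413 for record
// parsing) drive the lexer a few tokens at a time.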
if max_tokens
.is_some_and(|max_tokens| state.output.len() >= initial_output_len + max_tokens)
{
break;
}
let c = *c;
if c == b'|' {
// We're looking at a `|`: it's either a single pipe or the start of `||`.
let idx = curr_offset;
let prev_idx = idx;
curr_offset += 1;
// If the next character is `|`, we're looking at a `||`.
if let Some(c) = state.input.get(curr_offset) {
if *c == b'|' {
let idx = curr_offset;
curr_offset += 1;
state.output.push(Token::new(
TokenContents::PipePipe,
Span::new(state.span_offset + prev_idx, state.span_offset + idx + 1),
));
continue;
}
}
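// Illustrative example: `a || b` lexes as Item, PipePipe, Item. The
// parser later rejects a bare `||` with a hint to use `or` or `try`
// instead (see #7241).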
// Otherwise, it's just a regular `|` token.
// Before we push it, check whether the previous token was a newline (Eol).
// If so, this pipe is a continuation of the previous line.
if let Some(prev) = state.output.last_mut() {
match prev.contents {
TokenContents::Eol => {
*prev = Token::new(
TokenContents::Pipe,
Span::new(state.span_offset + idx, state.span_offset + idx + 1),
);
// This pipe also continues the previous line when the line(s) above are
// comment lines: each comment contributes an Eol + Comment pair, and we
// strip those Eol tokens so the pipeline stays joined (see #9436).
//
// The last token is now TokenContents::Pipe and doesn't need checking,
// so start scanning two tokens back.
let mut offset = 2;
while state.output.len() > offset {
let index = state.output.len() - offset;
if state.output[index].contents == TokenContents::Comment
&& state.output[index - 1].contents == TokenContents::Eol
{
state.output.remove(index - 1);
offset += 1;
} else {
break;
}
}
}
_ => {
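// Fallback arm: no special handling applies, so emit a plain Pipe token.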
state.output.push(Token::new(
TokenContents::Pipe,
Span::new(state.span_offset + idx, state.span_offset + idx + 1),
));
}
}
} else {
state.output.push(Token::new(
TokenContents::Pipe,
Span::new(state.span_offset + idx, state.span_offset + idx + 1),
));
}
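// A pipe leaves the pipeline incomplete until another item follows.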
is_complete = false;
} else if c == b';' {
// If the next character is a `;`, we're looking at a semicolon token.
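// e.g. in `ls | ;`, the pipe leaves the pipeline incomplete, so the
// semicolon is reported as ExtraTokens.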
if !is_complete && state.error.is_none() {
state.error = Some(ParseError::ExtraTokens(Span::new(
curr_offset,
curr_offset + 1,
)));
}
let idx = curr_offset;
curr_offset += 1;
state.output.push(Token::new(
TokenContents::Semicolon,
Span::new(state.span_offset + idx, state.span_offset + idx + 1),
));
} else if c == b'\r' {
// Ignore a stand-alone carriage return
curr_offset += 1;
} else if c == b'\n' {
// If the next character is a newline, we're looking at an EOL (end of line) token.
let idx = curr_offset;
curr_offset += 1;
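// Callers may list b'\n' in `additional_whitespace`; the newline is then
// consumed without emitting an Eol token.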
if !additional_whitespace.contains(&c) {
state.output.push(Token::new(
TokenContents::Eol,
Span::new(state.span_offset + idx, state.span_offset + idx + 1),
));
}
} else if c == b'#' {
// If the next character is `#`, we're at the beginning of a line
// comment. The comment continues until the next newline.
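// e.g. lexing `ls # recent files\n` yields an Item for `ls` and a
// Comment token spanning `# recent files`.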
let mut start = curr_offset;
while let Some(input) = state.input.get(curr_offset) {
if *input == b'\n' {
if !skip_comment {
state.output.push(Token::new(
TokenContents::Comment,
Span::new(state.span_offset + start, state.span_offset + curr_offset),
));
}
start = curr_offset;
break;
} else {
curr_offset += 1;
}
}
if start != curr_offset && !skip_comment {
state.output.push(Token::new(
TokenContents::Comment,
Span::new(state.span_offset + start, state.span_offset + curr_offset),
));
}
} else if c == b' ' || c == b'\t' || additional_whitespace.contains(&c) {
// If the next character is non-newline whitespace, skip it.
curr_offset += 1;
} else {
let (token, err) = lex_item(
state.input,
&mut curr_offset,
state.span_offset,
additional_whitespace,
special_tokens,
in_signature,
);
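// Keep only the first error encountered; later lexing errors are discarded.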
if state.error.is_none() {
state.error = err;
}
is_complete = true;
state.output.push(token);
}
}
}
/// True if this is the start of a redirection. Does not match `>>` or `>|` forms.
fn is_redirection(token: &[u8]) -> bool {
matches!(
token,
b"o>" | b"out>" | b"e>" | b"err>" | b"o+e>" | b"e+o>" | b"out+err>" | b"err+out>"
)
}
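
// A minimal test sketch (an illustrative addition, not from the original
// source; the module and test names are placeholders). The expectations
// follow directly from the `matches!` arms of `is_redirection` above.
#[cfg(test)]
mod is_redirection_sketch {
    use super::is_redirection;

    #[test]
    fn matches_only_single_redirection_operators() {
        // Short and long spellings of the redirection operators match.
        assert!(is_redirection(b"o>"));
        assert!(is_redirection(b"err>"));
        assert!(is_redirection(b"out+err>"));
        // Append (`>>`) and redirect-to-pipe (`>|`) spellings do not.
        assert!(!is_redirection(b"o>>"));
        assert!(!is_redirection(b"e>|"));
        // Nor does an ordinary bare word.
        assert!(!is_redirection(b"ls"));
    }
}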