Fix exponential parser time on sequence of [[[[ (#10439)


# Description

Before this change, parsing `[[[[[[[[[[[[[[[[[[[[[[` would cause nushell
to consume several gigabytes of memory. Now parsing should take linear time.

The old code first tried parsing the head of the table as a list and
then checked whether it got more arguments. If it didn't, it threw away
the previous result and tried to parse the whole thing as a list. This
means we called `parse_list_expression` twice for each call to
`parse_table_expression`, resulting in exponential growth.
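
To illustrate the growth, here is a toy model of that control flow. It is not nushell's parser: `calls_old` is a hypothetical function that only counts simulated list-parser calls, assuming each nesting level triggers one speculative head parse and one fallback parse, and that both recurse into the remaining brackets.

```rust
/// Toy model (an assumption, not nushell code): number of simulated
/// list-parser calls for `depth` nested, unclosed `[` brackets when the
/// head is parsed speculatively, discarded, and the whole span re-parsed.
fn calls_old(depth: u32) -> u64 {
    if depth == 0 {
        return 0;
    }
    // One call for the speculative head parse and one for the fallback;
    // each of them descends into the remaining `depth - 1` brackets.
    2 * (1 + calls_old(depth - 1))
}
```

In this model the count doubles with every extra bracket, so `calls_old(22)` is already 8,388,606 simulated calls.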

The fix is to simply check that we have all the arguments we need before
parsing the head of the table, so we know that we will either call
`parse_list_expression` only on sub-expressions or on the whole thing,
never both.
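
The check-first idea can be sketched in isolation with a slice pattern. The `Tok` and `Shape` types and the `classify` function below are hypothetical stand-ins invented for this sketch; the real change works on nushell's token and span types shown in the diff.

```rust
/// Hypothetical token type for this sketch only.
#[derive(Debug, PartialEq)]
enum Tok {
    Item(String),
    Semicolon,
}

/// Hypothetical result: which single parse path would be taken.
#[derive(Debug, PartialEq)]
enum Shape {
    Table,
    List,
}

/// Decide between "table" and "list" by inspecting the tokens up front,
/// before any recursive parsing of the head would happen.
fn classify(tokens: &[Tok]) -> Shape {
    // Require a head, a separator, and at least one more token.
    let [first, second, rest @ ..] = tokens else {
        return Shape::List;
    };
    let head_is_list = matches!(first, Tok::Item(s) if s.starts_with('['));
    if !head_is_list || *second != Tok::Semicolon || rest.is_empty() {
        return Shape::List;
    }
    Shape::Table
}
```

Because every rejection happens before the head is parsed, each input takes exactly one of the two paths.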

Fixes #10438


# User-Facing Changes
Should give a noticeable speedup when typing a sequence of `[[[[[[` open
brackets.

# Tests + Formatting

I would like to add tests, but I'm not sure how to do that without
crashing CI with an OOM on regression.
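
One possible direction, sketched with the standard library only: run the work on a helper thread and give up waiting after a deadline. `run_with_timeout` is a hypothetical helper, not part of nushell's test utilities, and it only bounds how long the caller waits. It does not stop the helper thread or cap its memory, so it would not prevent an OOM by itself.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `work` on another thread and return its result
/// if it finishes within `limit`, or `None` if the deadline passes first.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails if the receiver already timed out; ignore that.
        let _ = tx.send(work());
    });
    // The helper thread is not killed on timeout; it keeps running detached.
    rx.recv_timeout(limit).ok()
}
```

A test could then assert that the returned value is `Some(..)`, turning a slow regression into a failure instead of a hang.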

- [x] Don't forget to add tests that cover your changes.
- [x] `cargo fmt --all -- --check` to check standard code formatting
(`cargo fmt --all` applies these changes)
- [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used`
to check that you're using the standard code style
- [x] `cargo test --workspace` to check that all tests pass (on Windows
make sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- [x] `cargo run -- -c "use std testing; testing run-tests --path
crates/nu-std"` to run the tests for the standard library

# After Submitting
If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
Andreas Källberg 2023-09-20 17:53:48 +02:00 committed by GitHub
parent 7980ad9f7f
commit 8d8b44342b
2 changed files with 14 additions and 18 deletions

@@ -4004,29 +4004,18 @@ fn parse_table_expression(working_set: &mut StateWorkingSet, span: Span) -> Expr
         working_set.error(err);
     }
-    let head = if let Some(first) = tokens.first() {
-        if working_set.get_span_contents(first.span).starts_with(b"[") {
-            parse_list_expression(working_set, first.span, &SyntaxShape::Any)
-        } else {
-            return parse_list_expression(working_set, span, &SyntaxShape::Any);
-        }
-    } else {
+    // Check that we have all arguments first, before trying to parse the first
+    // in order to avoid exponential parsing time
+    let [first, second, rest @ ..] = &tokens[..] else {
         return parse_list_expression(working_set, span, &SyntaxShape::Any);
     };
-    if tokens
-        .get(1)
-        .filter(|second| second.contents == TokenContents::Semicolon)
-        .is_none()
+    if !working_set.get_span_contents(first.span).starts_with(b"[")
+        || second.contents != TokenContents::Semicolon
+        || rest.is_empty()
     {
         return parse_list_expression(working_set, span, &SyntaxShape::Any);
     };
-    let rest = &tokens[2..];
-    if rest.is_empty() {
-        return parse_list_expression(working_set, span, &SyntaxShape::Any);
-    }
+    let head = parse_list_expression(working_set, first.span, &SyntaxShape::Any);
     let head = {
         let Expression {
             expr: Expr::List(vals),

@@ -455,6 +455,13 @@ fn single_value_row_condition() -> TestResult {
     )
 }
+#[test]
+fn performance_nested_lists() -> TestResult {
+    // Parser used to be exponential on deeply nested lists
+    // TODO: Add a timeout
+    fail_test(r#"[[[[[[[[[[[[[[[[[[[[[[[[[[[["#, "Unexpected end of code")
+}
 #[test]
 fn unary_not_1() -> TestResult {
     run_test(r#"not false"#, "true")