Prevent cubic time on nested parentheses (#10467)

# Description

When `parse_range` gets an item like `((((1..2))))`, it tries to parse
`((((1` with a long chain of recursive parsers, namely:
- parse_value
- parse_paren_expr
- parse_full_cell_path
- parse_block
- parse_pipeline
- parse_builtin_commands
- parse_expression
- parse_math_expression
- parse_value
- ...

where `parse_paren_expr` calls `parse_range` in turn. Because
`parse_paren_expr` can call `parse_range` at any point in the chain, which
then continues the chain, we get a quadratic number of function calls, each
linear in the size of the input, hence cubic time overall.

By checking with the lexer that the parens are matched, we prevent the
long chain from being called on unmatched parens. This is still more
quadratic than it needs to be; to fix that, we should process parens
only once, instead of on each recursive call.
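
As a rough illustration of the idea (not the code in this PR, which reuses the existing `lex` function), the check amounts to scanning the text before `..` once and skipping the sub-parsers when its parens do not match. The helpers `parens_balanced` and `range_lhs_worth_parsing` below are hypothetical names, and this sketch ignores quotes, brackets, and braces, which the real lexer also handles:

```rust
/// Hypothetical helper: true if every `(` in `input` is closed by a later `)`.
/// One linear pass; it does not understand strings, brackets, or braces.
fn parens_balanced(input: &[u8]) -> bool {
    let mut depth: usize = 0;
    for &byte in input {
        match byte {
            b'(' => depth += 1,
            b')' => {
                if depth == 0 {
                    // A closing paren with nothing open.
                    return false;
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    depth == 0
}

/// Hypothetical helper: returns the text before the first `..` only when it
/// is worth handing to the (expensive) recursive parsers.
fn range_lhs_worth_parsing(token: &str) -> Option<&str> {
    let pos = token.find("..")?;
    let lhs = &token[..pos];
    if parens_balanced(lhs.as_bytes()) {
        Some(lhs)
    } else {
        None
    }
}
```

For `((((1..2))))` the left-hand side is `((((1`, which is unbalanced, so the sketch rejects it without descending into the parser chain. For `(1 + 2)..5` it returns `(1 + 2)`.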

# User-Facing Changes
Speed improvements in some edge cases

# Tests + Formatting
Not sure how to test this; maybe I could add a benchmark.
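
One possible shape for such a benchmark, as a sketch only: generate inputs with increasing nesting depth and time a parse call on each. `nested_range` and `time_depths` are hypothetical helpers, and the `parse` closure is a placeholder for whichever parser entry point the real benchmark would call.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: wraps `1..2` in `depth` pairs of parens,
/// e.g. depth 4 gives "((((1..2))))".
fn nested_range(depth: usize) -> String {
    format!("{}1..2{}", "(".repeat(depth), ")".repeat(depth))
}

/// Hypothetical helper: runs `parse` once per depth and records the elapsed
/// time. `parse` stands in for the real parser entry point.
fn time_depths<F: FnMut(&str)>(depths: &[usize], mut parse: F) -> Vec<(usize, Duration)> {
    depths
        .iter()
        .map(|&depth| {
            let input = nested_range(depth);
            let start = Instant::now();
            parse(&input);
            (depth, start.elapsed())
        })
        .collect()
}
```

Comparing timings at depths such as 8, 16, and 32 would show how the cost grows: roughly 8x per doubling suggests cubic behavior, roughly 4x suggests quadratic. Depths should stay modest, since very deep nesting can overflow the stack.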

# After Submitting

# Other notes
Found using the fuzzer, by setting a timeout on max run-time. It also
found a stack-overflow on too many parentheses, which this doesn't fix.
This commit is contained in:
Andreas Källberg 2023-09-22 18:24:35 +02:00 committed by GitHub
parent 4880721b73
commit 6df001f72d

@@ -1425,6 +1425,21 @@ pub fn parse_range(working_set: &mut StateWorkingSet, span: Span) -> Expression
             return garbage(span);
         }
     };
+    // Avoid calling sub-parsers on unmatched parens, to prevent quadratic time on things like ((((1..2))))
+    // No need to call the expensive parse_value on "((((1"
+    if dotdot_pos[0] > 0 {
+        let (_tokens, err) = lex(
+            &contents[..dotdot_pos[0]],
+            span.start,
+            &[],
+            &[b'.', b'?'],
+            true,
+        );
+        if let Some(_err) = err {
+            working_set.error(ParseError::Expected("Valid expression before ..", span));
+            return garbage(span);
+        }
+    }
     let (inclusion, range_op_str, range_op_span) = if let Some(pos) = token.find("..<") {
         if pos == range_op_pos {