Overhaul $in expressions (#13357)

# Description

This grew quite a bit beyond its original scope, but I've tried to make
`$in` a bit more consistent and easier to work with.

Instead of the parser generating calls to `collect` and creating
closures, this adds `Expr::Collect` which just evaluates in the same
scope and doesn't require any closure.

When `$in` is detected in an expression, it is replaced with a new
variable (also called `$in`) and wrapped in `Expr::Collect`. During
eval, this expression is evaluated directly, with the input and with
that new variable set to the collected value.
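
As a rough illustration of that rewrite, here is a toy model rather than the real parser. `Expr`, `VarId`, the `next_var_id` counter and the integer-only `eval` are simplified stand-ins assumed for this example, not the actual `nu-protocol` definitions:

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; these are not the real nu-protocol types.
type VarId = usize;
const IN_VARIABLE_ID: VarId = 0;

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Var(VarId),
    Add(Box<Expr>, Box<Expr>),
    /// Evaluate the inner expression with the collected input bound to the variable.
    Collect(VarId, Box<Expr>),
}

/// Rewrite every reference to the shared `$in` id into the fresh variable.
fn replace_in_variable(expr: &mut Expr, new_id: VarId) {
    match expr {
        Expr::Int(_) => {}
        Expr::Var(id) => {
            if *id == IN_VARIABLE_ID {
                *id = new_id;
            }
        }
        Expr::Add(lhs, rhs) => {
            replace_in_variable(lhs, new_id);
            replace_in_variable(rhs, new_id);
        }
        Expr::Collect(_, inner) => replace_in_variable(inner, new_id),
    }
}

/// Allocate a fresh variable, substitute it for `$in`, and wrap the result.
fn wrap_expr_with_collect(mut expr: Expr, next_var_id: &mut VarId) -> Expr {
    let var_id = *next_var_id;
    *next_var_id += 1;
    replace_in_variable(&mut expr, var_id);
    Expr::Collect(var_id, Box::new(expr))
}

/// Evaluate in the caller's scope: no closure, just a temporary binding.
/// Panics on an unbound variable, which is acceptable for a sketch.
fn eval(expr: &Expr, input: i64, vars: &mut HashMap<VarId, i64>) -> i64 {
    match expr {
        Expr::Int(n) => *n,
        Expr::Var(id) => vars[id],
        Expr::Add(lhs, rhs) => eval(lhs, input, vars) + eval(rhs, input, vars),
        Expr::Collect(var_id, inner) => {
            vars.insert(*var_id, input);
            let out = eval(inner, input, vars);
            vars.remove(var_id);
            out
        }
    }
}
```

In this model, `$in + 1` becomes `Collect(v, v + 1)` for a fresh `v`, and evaluating it binds `v` to the input only while the inner expression runs.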

Other than being faster and less prone to gotchas, it also makes it
possible to typecheck the output of an expression containing `$in`,
which is nice. This is a breaking change though, because of the lack of
the closure and because now typechecking will actually happen. Also, I
haven't attempted to typecheck the input yet.

The IR generated now just looks like this:

```gas
collect        %in
clone          %tmp, %in
store-variable $in, %tmp
# %out <- ...expression... <- %in
drop-variable  $in
```

(where `$in` is the local variable created for this collection, and not
`IN_VARIABLE_ID`)

which is a lot better than having to create a closure and call `collect
--keep-env`, dealing with all of the capture gathering and allocation
that entails. Ideally we can also detect whether that input is actually
needed, so maybe we don't have to clone, but I haven't tried to do that
yet. Theoretically now that the variable is a unique one every time, it
should be possible to give it a type - I just don't know how to
determine that yet.
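
To make the instruction sequence above concrete, here is a minimal sketch of a toy register machine stepping through it. The `Machine`, `Value` and `Instruction` types are hypothetical and only named after the mnemonics in the listing; they are not the real IR definitions:

```rust
use std::collections::HashMap;

// Hypothetical toy types, named after the mnemonics in the IR listing.
type RegId = usize;
type VarId = usize;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Stream(Vec<i64>),
    List(Vec<i64>),
}

#[derive(Debug, Clone, Copy)]
enum Instruction {
    Collect { src_dst: RegId },
    Clone { dst: RegId, src: RegId },
    StoreVariable { var_id: VarId, src: RegId },
    DropVariable { var_id: VarId },
}

#[derive(Debug, Default)]
struct Machine {
    registers: HashMap<RegId, Value>,
    variables: HashMap<VarId, Value>,
}

impl Machine {
    fn step(&mut self, instruction: Instruction) {
        match instruction {
            // Turn a stream into a concrete value, in place.
            Instruction::Collect { src_dst } => {
                if let Some(value) = self.registers.remove(&src_dst) {
                    let collected = match value {
                        Value::Stream(items) => Value::List(items),
                        other => other,
                    };
                    self.registers.insert(src_dst, collected);
                }
            }
            // Copy a register so the original stays available as input.
            Instruction::Clone { dst, src } => {
                if let Some(value) = self.registers.get(&src).cloned() {
                    self.registers.insert(dst, value);
                }
            }
            // Move the register's value into the variable.
            Instruction::StoreVariable { var_id, src } => {
                if let Some(value) = self.registers.remove(&src) {
                    self.variables.insert(var_id, value);
                }
            }
            // Unbind the variable once the expression is done.
            Instruction::DropVariable { var_id } => {
                self.variables.remove(&var_id);
            }
        }
    }
}
```

The clone is what keeps the collected value available both as the expression's input (`%in`) and as the `$in` variable, which is why skipping it when the input isn't needed would be a saving.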

On top of that, I've also reworked how `$in` works in pipeline-initial
position. Previously, it was a little bit inconsistent. For example,
this worked:

```nushell
> 3 | do { let x = $in; let y = $in; print $x $y }
3
3
```

However, this caused a runtime "variable not found" error on the second
`$in`:

```nushell
> def foo [] { let x = $in; let y = $in; print $x $y }; 3 | foo
Error: nu::shell::variable_not_found

  × Variable not found
   ╭─[entry #115:1:35]
 1 │ def foo [] { let x = $in; let y = $in; print $x $y }; 3 | foo
   ·                                   ─┬─
   ·                                    ╰── variable not found
   ╰────
```

I've fixed this by making detection of `$in` in the first pipeline element
*always* happen at the block level, so if you use `$in` in pipeline-initial
position anywhere in a block, it will collect with an implicit
subexpression around the whole thing, and you can then use that `$in`
more than once. In doing this I also rewrote `parse_pipeline()`, and
hopefully it's now a bit more straightforward and possibly more efficient
too.
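
A minimal sketch of that block-level check, using heavily simplified stand-in types. `Element`, `Pipeline`, `Block` and the `uses_in` flag are assumptions made for illustration; the real parser inspects expressions rather than a boolean:

```rust
// Hypothetical, heavily simplified stand-ins for the parser's AST types.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    /// An ordinary pipeline element; `uses_in` records whether it mentions `$in`.
    Command { name: &'static str, uses_in: bool },
    /// A whole block evaluated with the collected input bound to one shared `$in`.
    Collect(Box<Block>),
}

#[derive(Debug, Clone, Default, PartialEq)]
struct Pipeline {
    elements: Vec<Element>,
}

#[derive(Debug, Clone, Default, PartialEq)]
struct Block {
    pipelines: Vec<Pipeline>,
}

/// True if any pipeline in the block has `$in` in its first element.
fn first_element_uses_in(block: &Block) -> bool {
    block
        .pipelines
        .iter()
        .flat_map(|pipeline| pipeline.elements.first())
        .any(|element| matches!(element, Element::Command { uses_in: true, .. }))
}

/// Wrap the whole block once, so every pipeline-initial `$in` sees the same value.
fn wrap_block_if_needed(block: Block, is_subexpression: bool) -> Block {
    if !is_subexpression && first_element_uses_in(&block) {
        Block {
            pipelines: vec![Pipeline {
                elements: vec![Element::Collect(Box::new(block))],
            }],
        }
    } else {
        block
    }
}
```

Because the wrap happens once per block rather than once per pipeline, `let x = $in; let y = $in` ends up reading the same collected value twice instead of consuming the input on the first use.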

Finally, I've tried to make `let` and `mut` a lot more straightforward
with how they handle the rest of the pipeline, and using a redirection
with `let`/`mut` now does what you'd expect if you assume that they
consume the whole pipeline - the redirection is just processed as
normal. These both work now:

```nushell
let x = ^foo err> err.txt
let y = ^foo out+err>| str length
```

It was previously possible to accomplish this with a subexpression, but
it just seemed like a weird gotcha that you couldn't do it. Intuitively,
`let` and `mut` just seem to take the whole line.
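
The merging step can be sketched roughly as follows. This is a toy version: `Span` here is a hypothetical two-field struct and `merge_let_parts` is an illustrative helper, standing in for the span handling that the new `parse_pipeline()` does with `parts_including_redirection()`:

```rust
/// Hypothetical stand-in for a source span (byte offsets); not the real `Span` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Join two spans into one running from the start of `self` to the end of `other`.
    fn append(self, other: Span) -> Span {
        Span {
            start: self.start,
            end: other.end,
        }
    }
}

/// Each inner `Vec<Span>` holds one pipeline command's parts, redirections included.
/// Keeps the first three parts (`let`, the name, `=`) and folds everything after
/// them, across all commands, into a single right-hand-side span.
fn merge_let_parts(commands: &[Vec<Span>]) -> Vec<Span> {
    let Some(first) = commands.first() else {
        return Vec::new();
    };

    let remainder = first
        .iter()
        .skip(3)
        .chain(commands[1..].iter().flatten())
        .copied()
        .reduce(Span::append);

    first.iter().take(3).copied().chain(remainder).collect()
}
```

Since the merged span simply runs from the first remaining part to the last, any pipe or redirection text in between is covered without special handling, which is what lets the right-hand side be parsed as an ordinary pipeline.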

- closes #13137

# User-Facing Changes
- `$in` will behave more consistently with blocks and closures, since
the entire block is now just wrapped to handle it if it appears in the
first pipeline element
- `$in` no longer creates a closure, so what can be done within an
expression containing `$in` is less restrictive
- Expressions containing `$in` are now type checked, rather than just
resulting in `any`. However, `$in` itself is still `any`, so this isn't
quite perfect yet
- Redirections are now allowed in `let` and `mut` and behave pretty much
how you'd expect

# Tests + Formatting
Added tests to cover the new behaviour.

# After Submitting
- [ ] release notes (definitely breaking change)
This commit is contained in:
Devyn Cairns
2024-07-17 14:02:42 -07:00
committed by GitHub
parent f976c31887
commit aa7d7d0cc3
24 changed files with 631 additions and 273 deletions


@@ -5299,6 +5299,7 @@ pub fn parse_expression(working_set: &mut StateWorkingSet, spans: &[Span]) -> Ex
             let mut block = Block::default();
             let ty = output.ty.clone();
             block.pipelines = vec![Pipeline::from_vec(vec![output])];
+            block.span = Some(Span::concat(spans));
             compile_block(working_set, &mut block);
@@ -5393,9 +5394,19 @@ pub fn parse_builtin_commands(
     match name {
         b"def" => parse_def(working_set, lite_command, None).0,
         b"extern" => parse_extern(working_set, lite_command, None),
-        b"let" => parse_let(working_set, &lite_command.parts),
+        b"let" => parse_let(
+            working_set,
+            &lite_command
+                .parts_including_redirection()
+                .collect::<Vec<Span>>(),
+        ),
         b"const" => parse_const(working_set, &lite_command.parts),
-        b"mut" => parse_mut(working_set, &lite_command.parts),
+        b"mut" => parse_mut(
+            working_set,
+            &lite_command
+                .parts_including_redirection()
+                .collect::<Vec<Span>>(),
+        ),
         b"for" => {
             let expr = parse_for(working_set, lite_command);
             Pipeline::from_vec(vec![expr])
@@ -5647,169 +5658,73 @@ pub(crate) fn redirecting_builtin_error(
     }
 }
-pub fn parse_pipeline(
-    working_set: &mut StateWorkingSet,
-    pipeline: &LitePipeline,
-    is_subexpression: bool,
-    pipeline_index: usize,
-) -> Pipeline {
+pub fn parse_pipeline(working_set: &mut StateWorkingSet, pipeline: &LitePipeline) -> Pipeline {
+    let first_command = pipeline.commands.first();
+    let first_command_name = first_command
+        .and_then(|command| command.parts.first())
+        .map(|span| working_set.get_span_contents(*span));
     if pipeline.commands.len() > 1 {
-        // Special case: allow `let` and `mut` to consume the whole pipeline, eg) `let abc = "foo" | str length`
-        if let Some(&first) = pipeline.commands[0].parts.first() {
-            let first = working_set.get_span_contents(first);
-            if first == b"let" || first == b"mut" {
-                let name = if first == b"let" { "let" } else { "mut" };
-                let mut new_command = LiteCommand {
-                    comments: vec![],
-                    parts: pipeline.commands[0].parts.clone(),
-                    pipe: None,
-                    redirection: None,
-                };
+        // Special case: allow "let" or "mut" to consume the whole pipeline, if this is a pipeline
+        // with multiple commands
+        if matches!(first_command_name, Some(b"let" | b"mut")) {
+            // Merge the pipeline into one command
+            let first_command = first_command.expect("must be Some");
-                if let Some(redirection) = pipeline.commands[0].redirection.as_ref() {
-                    working_set.error(redirecting_builtin_error(name, redirection));
-                }
+            let remainder_span = first_command
+                .parts_including_redirection()
+                .skip(3)
+                .chain(
+                    pipeline.commands[1..]
+                        .iter()
+                        .flat_map(|command| command.parts_including_redirection()),
+                )
+                .reduce(Span::append);
-                for element in &pipeline.commands[1..] {
-                    if let Some(redirection) = pipeline.commands[0].redirection.as_ref() {
-                        working_set.error(redirecting_builtin_error(name, redirection));
-                    } else {
-                        new_command.parts.push(element.pipe.expect("pipe span"));
-                        new_command.comments.extend_from_slice(&element.comments);
-                        new_command.parts.extend_from_slice(&element.parts);
-                    }
-                }
+            let parts = first_command
+                .parts
+                .iter()
+                .take(3) // the let/mut start itself
+                .copied()
+                .chain(remainder_span) // everything else
+                .collect();
-                // if the 'let' is complete enough, use it, if not, fall through for now
-                if new_command.parts.len() > 3 {
-                    let rhs_span = Span::concat(&new_command.parts[3..]);
+            let comments = pipeline
+                .commands
+                .iter()
+                .flat_map(|command| command.comments.iter())
+                .copied()
+                .collect();
-                    new_command.parts.truncate(3);
-                    new_command.parts.push(rhs_span);
-                    let mut pipeline = parse_builtin_commands(working_set, &new_command);
-                    if pipeline_index == 0 {
-                        let let_decl_id = working_set.find_decl(b"let");
-                        let mut_decl_id = working_set.find_decl(b"mut");
-                        for element in pipeline.elements.iter_mut() {
-                            if let Expr::Call(call) = &element.expr.expr {
-                                if Some(call.decl_id) == let_decl_id
-                                    || Some(call.decl_id) == mut_decl_id
-                                {
-                                    // Do an expansion
-                                    if let Some(Expression {
-                                        expr: Expr::Block(block_id),
-                                        ..
-                                    }) = call.positional_iter().nth(1)
-                                    {
-                                        let block = working_set.get_block(*block_id);
-                                        if let Some(element) = block
-                                            .pipelines
-                                            .first()
-                                            .and_then(|p| p.elements.first())
-                                            .cloned()
-                                        {
-                                            if element.has_in_variable(working_set) {
-                                                let element = wrap_element_with_collect(
-                                                    working_set,
-                                                    &element,
-                                                );
-                                                let block = working_set.get_block_mut(*block_id);
-                                                block.pipelines[0].elements[0] = element;
-                                            }
-                                        }
-                                    }
-                                    continue;
-                                } else if element.has_in_variable(working_set) && !is_subexpression
-                                {
-                                    *element = wrap_element_with_collect(working_set, element);
-                                }
-                            } else if element.has_in_variable(working_set) && !is_subexpression {
-                                *element = wrap_element_with_collect(working_set, element);
-                            }
-                        }
-                    }
-                    return pipeline;
-                }
-            }
-        }
-        let mut elements = pipeline
-            .commands
-            .iter()
-            .map(|element| parse_pipeline_element(working_set, element))
-            .collect::<Vec<_>>();
-        if is_subexpression {
-            for element in elements.iter_mut().skip(1) {
-                if element.has_in_variable(working_set) {
-                    *element = wrap_element_with_collect(working_set, element);
-                }
-            }
+            let new_command = LiteCommand {
+                pipe: None,
+                comments,
+                parts,
+                redirection: None,
+            };
+            parse_builtin_commands(working_set, &new_command)
         } else {
-            for element in elements.iter_mut() {
-                if element.has_in_variable(working_set) {
-                    *element = wrap_element_with_collect(working_set, element);
-                }
-            }
-        }
-        Pipeline { elements }
-    } else {
-        if let Some(&first) = pipeline.commands[0].parts.first() {
-            let first = working_set.get_span_contents(first);
-            if first == b"let" || first == b"mut" {
-                if let Some(redirection) = pipeline.commands[0].redirection.as_ref() {
-                    let name = if first == b"let" { "let" } else { "mut" };
-                    working_set.error(redirecting_builtin_error(name, redirection));
-                }
-            }
-        }
-        let mut pipeline = parse_builtin_commands(working_set, &pipeline.commands[0]);
-        let let_decl_id = working_set.find_decl(b"let");
-        let mut_decl_id = working_set.find_decl(b"mut");
-        if pipeline_index == 0 {
-            for element in pipeline.elements.iter_mut() {
-                if let Expr::Call(call) = &element.expr.expr {
-                    if Some(call.decl_id) == let_decl_id || Some(call.decl_id) == mut_decl_id {
-                        // Do an expansion
-                        if let Some(Expression {
-                            expr: Expr::Block(block_id),
-                            ..
-                        }) = call.positional_iter().nth(1)
-                        {
-                            let block = working_set.get_block(*block_id);
-                            if let Some(element) = block
-                                .pipelines
-                                .first()
-                                .and_then(|p| p.elements.first())
-                                .cloned()
-                            {
-                                if element.has_in_variable(working_set) {
-                                    let element = wrap_element_with_collect(working_set, &element);
-                                    let block = working_set.get_block_mut(*block_id);
-                                    block.pipelines[0].elements[0] = element;
-                                }
-                            }
-                        }
-                        continue;
-                    } else if element.has_in_variable(working_set) && !is_subexpression {
-                        *element = wrap_element_with_collect(working_set, element);
+            // Parse a normal multi command pipeline
+            let elements: Vec<_> = pipeline
+                .commands
+                .iter()
+                .enumerate()
+                .map(|(index, element)| {
+                    let element = parse_pipeline_element(working_set, element);
+                    // Handle $in for pipeline elements beyond the first one
+                    if index > 0 && element.has_in_variable(working_set) {
+                        wrap_element_with_collect(working_set, element.clone())
+                    } else {
+                        element
                     }
-                } else if element.has_in_variable(working_set) && !is_subexpression {
-                    *element = wrap_element_with_collect(working_set, element);
-                }
-            }
-        }
+                })
+                .collect();
-        pipeline
+            Pipeline { elements }
+        }
+    } else {
+        // If there's only one command in the pipeline, this could be a builtin command
+        parse_builtin_commands(working_set, &pipeline.commands[0])
     }
 }
@@ -5840,18 +5755,45 @@ pub fn parse_block(
     }
     let mut block = Block::new_with_capacity(lite_block.block.len());
+    block.span = Some(span);
-    for (idx, lite_pipeline) in lite_block.block.iter().enumerate() {
-        let pipeline = parse_pipeline(working_set, lite_pipeline, is_subexpression, idx);
+    for lite_pipeline in &lite_block.block {
+        let pipeline = parse_pipeline(working_set, lite_pipeline);
         block.pipelines.push(pipeline);
     }
+    // If this is not a subexpression and there are any pipelines where the first element has $in,
+    // we can wrap the whole block in collect so that they all reference the same $in
+    if !is_subexpression
+        && block
+            .pipelines
+            .iter()
+            .flat_map(|pipeline| pipeline.elements.first())
+            .any(|element| element.has_in_variable(working_set))
+    {
+        // Move the block out to prepare it to become a subexpression
+        let inner_block = std::mem::take(&mut block);
+        block.span = inner_block.span;
+        let ty = inner_block.output_type();
+        let block_id = working_set.add_block(Arc::new(inner_block));
+        // Now wrap it in a Collect expression, and put it in the block as the only pipeline
+        let subexpression = Expression::new(working_set, Expr::Subexpression(block_id), span, ty);
+        let collect = wrap_expr_with_collect(working_set, subexpression);
+        block.pipelines.push(Pipeline {
+            elements: vec![PipelineElement {
+                pipe: None,
+                expr: collect,
+                redirection: None,
+            }],
+        });
+    }
     if scoped {
         working_set.exit_scope();
     }
-    block.span = Some(span);
     let errors = type_check::check_block_input_output(working_set, &block);
     if !errors.is_empty() {
         working_set.parse_errors.extend_from_slice(&errors);
@@ -6220,6 +6162,10 @@ pub fn discover_captures_in_expr(
                 discover_captures_in_expr(working_set, &match_.1, seen, seen_blocks, output)?;
             }
         }
+        Expr::Collect(var_id, expr) => {
+            seen.push(*var_id);
+            discover_captures_in_expr(working_set, expr, seen, seen_blocks, output)?
+        }
         Expr::RowCondition(block_id) | Expr::Subexpression(block_id) => {
             let block = working_set.get_block(*block_id);
@@ -6270,28 +6216,28 @@ pub fn discover_captures_in_expr(
 fn wrap_redirection_with_collect(
     working_set: &mut StateWorkingSet,
-    target: &RedirectionTarget,
+    target: RedirectionTarget,
 ) -> RedirectionTarget {
     match target {
         RedirectionTarget::File { expr, append, span } => RedirectionTarget::File {
             expr: wrap_expr_with_collect(working_set, expr),
-            span: *span,
-            append: *append,
+            span,
+            append,
         },
-        RedirectionTarget::Pipe { span } => RedirectionTarget::Pipe { span: *span },
+        RedirectionTarget::Pipe { span } => RedirectionTarget::Pipe { span },
     }
 }
 fn wrap_element_with_collect(
     working_set: &mut StateWorkingSet,
-    element: &PipelineElement,
+    element: PipelineElement,
 ) -> PipelineElement {
     PipelineElement {
         pipe: element.pipe,
-        expr: wrap_expr_with_collect(working_set, &element.expr),
-        redirection: element.redirection.as_ref().map(|r| match r {
+        expr: wrap_expr_with_collect(working_set, element.expr),
+        redirection: element.redirection.map(|r| match r {
             PipelineRedirection::Single { source, target } => PipelineRedirection::Single {
-                source: *source,
+                source,
                 target: wrap_redirection_with_collect(working_set, target),
             },
             PipelineRedirection::Separate { out, err } => PipelineRedirection::Separate {
@@ -6302,65 +6248,24 @@ fn wrap_element_with_collect(
     }
 }
-fn wrap_expr_with_collect(working_set: &mut StateWorkingSet, expr: &Expression) -> Expression {
+fn wrap_expr_with_collect(working_set: &mut StateWorkingSet, expr: Expression) -> Expression {
     let span = expr.span;
-    if let Some(decl_id) = working_set.find_decl(b"collect") {
-        let mut output = vec![];
+    // IN_VARIABLE_ID should get replaced with a unique variable, so that we don't have to
+    // execute as a closure
+    let var_id = working_set.add_variable(b"$in".into(), expr.span, Type::Any, false);
+    let mut expr = expr.clone();
+    expr.replace_in_variable(working_set, var_id);
-        let var_id = IN_VARIABLE_ID;
-        let mut signature = Signature::new("");
-        signature.required_positional.push(PositionalArg {
-            var_id: Some(var_id),
-            name: "$in".into(),
-            desc: String::new(),
-            shape: SyntaxShape::Any,
-            default_value: None,
-        });
-        let mut block = Block {
-            pipelines: vec![Pipeline::from_vec(vec![expr.clone()])],
-            signature: Box::new(signature),
-            ..Default::default()
-        };
-        compile_block(working_set, &mut block);
-        let block_id = working_set.add_block(Arc::new(block));
-        output.push(Argument::Positional(Expression::new(
-            working_set,
-            Expr::Closure(block_id),
-            span,
-            Type::Any,
-        )));
-        output.push(Argument::Named((
-            Spanned {
-                item: "keep-env".to_string(),
-                span: Span::new(0, 0),
-            },
-            None,
-            None,
-        )));
-        // The containing, synthetic call to `collect`.
-        // We don't want to have a real span as it will confuse flattening
-        // The args are where we'll get the real info
-        Expression::new(
-            working_set,
-            Expr::Call(Box::new(Call {
-                head: Span::new(0, 0),
-                arguments: output,
-                decl_id,
-                parser_info: HashMap::new(),
-            })),
-            span,
-            Type::Any,
-        )
-    } else {
-        Expression::garbage(working_set, span)
-    }
+    // Bind the custom `$in` variable for that particular expression
+    let ty = expr.ty.clone();
+    Expression::new(
+        working_set,
+        Expr::Collect(var_id, Box::new(expr)),
+        span,
+        // We can expect it to have the same result type
+        ty,
+    )
 }
 // Parses a vector of u8 to create an AST Block. If a file name is given, then