Debugger experiments (#11441)

# Description  This PR adds a new evaluator path with callbacks to a mutable trait object implementing a Debugger trait. The trait object can do anything, e.g., profiling, code coverage, step debugging. Currently, entering/leaving a block and a pipeline element is marked with callbacks, but more callbacks can be added as necessary. Not all callbacks need to be used by all debuggers; unused ones are simply empty calls. A simple profiler is implemented as a proof of concept. The debugging support is implementing by making `eval_xxx()` functions generic depending on whether we're debugging or not. This has zero computational overhead, but makes the binary slightly larger (see benchmarks below). `eval_xxx()` variants called from commands (like `eval_block_with_early_return()` in `each`) are chosen with a dynamic dispatch for two reasons: to not grow the binary size due to duplicating the code of many commands, and for the fact that it isn't possible because it would make Command trait objects object-unsafe. In the future, I hope it will be possible to allow plugin callbacks such that users would be able to implement their profiler plugins instead of having to recompile Nushell. [DAP](https://microsoft.github.io/debug-adapter-protocol/) would also be interesting to explore. Try `help debug profile`. ## Screenshots Basic output: ![profiler_new](https://github.com/nushell/nushell/assets/25571562/418b9df0-b659-4dcb-b023-2d5fcef2c865) To profile with more granularity, increase the profiler depth (you'll see that repeated `is-windows` calls take a large chunk of total time, making it a good candidate for optimizing): ![profiler_new_m3](https://github.com/nushell/nushell/assets/25571562/636d756d-5d56-460c-a372-14716f65f37f) ## Benchmarks ### Binary size Binary size increase vs. main: **+40360 bytes**. _(Both built with `--release --features=extra,dataframe`.)_ ### Time ```nushell # bench_debug.nu use std bench let test = { 1..100 | each { ls | each {|row| $row.name | str length } } | flatten | math avg } print 'debug:' let res2 = bench { debug profile $test } --pretty print $res2 ``` ```nushell # bench_nodebug.nu use std bench let test = { 1..100 | each { ls | each {|row| $row.name | str length } } | flatten | math avg } print 'no debug:' let res1 = bench { do $test } --pretty print $res1 ``` `cargo run --release -- bench_debug.nu` is consistently 1--2 ms slower than `cargo run --release -- bench_nodebug.nu` due to the collection overhead + gathering the report. This is expected. When gathering more stuff, the overhead is obviously higher. `cargo run --release -- bench_nodebug.nu` vs. `nu bench_nodebug.nu` I didn't measure any difference. Both benchmarks report times between 97 and 103 ms randomly, without one being consistently higher than the other. This suggests that at least in this particular case, when not running any debugger, there is no runtime overhead. ## API changes This PR adds a generic parameter to all `eval_xxx` functions that forces you to specify whether you use the debugger. You can resolve it in two ways: * Use a provided helper that will figure it out for you. If you wanted to use `eval_block(&engine_state, ...)`, call `let eval_block = get_eval_block(&engine_state); eval_block(&engine_state, ...)` * If you know you're in an evaluation path that doesn't need debugger support, call `eval_block::<WithoutDebug>(&engine_state, ...)` (this is the case of hooks, for example). I tried to add more explanation in the docstring of `debugger_trait.rs`. ## TODO - [x] Better profiler output to reduce spam of iterative commands like `each` - [x] Resolve `TODO: DEBUG` comments - [x] Resolve unwraps - [x] Add doc comments - [x] Add usage and extra usage for `debug profile`, explaining all columns # User-Facing Changes  Hopefully none. # Tests + Formatting  # After Submitting
2025-08-10 12:58:11 +02:00 · 2024-03-08 20:21:35 +02:00
parent a9ddc58f21
commit 14d1c67863
81 changed files with 1297 additions and 205 deletions
--- a/crates/nu-engine/src/eval.rs
+++ b/crates/nu-engine/src/eval.rs
@ -1,5 +1,6 @@
 use crate::{current_dir_str, get_config, get_full_help};
 use nu_path::expand_path_with;
+use nu_protocol::debugger::{DebugContext, WithoutDebug};
 use nu_protocol::{
    ast::{
        Argument, Assignment, Block, Call, Expr, Expression, ExternalArgument, PathMember,
@ -10,10 +11,11 @@ use nu_protocol::{
    Config, DeclId, IntoPipelineData, IntoSpanned, PipelineData, RawStream, ShellError, Span,
    Spanned, Type, Value, VarId, ENV_VARIABLE_ID,
 };
+
 use std::thread::{self, JoinHandle};
 use std::{borrow::Cow, collections::HashMap};

-pub fn eval_call(
+pub fn eval_call<D: DebugContext>(
    engine_state: &EngineState,
    caller_stack: &mut Stack,
    call: &Call,
@ -75,7 +77,7 @@ pub fn eval_call(
                .expect("internal error: all custom parameters must have var_ids");

            if let Some(arg) = call.positional_nth(param_idx) {
-                let result = eval_expression(engine_state, caller_stack, arg)?;
+                let result = eval_expression::<D>(engine_state, caller_stack, arg)?;
                let param_type = param.shape.to_type();
                if required && !result.get_type().is_subtype(&param_type) {
                    // need to check if result is an empty list, and param_type is table or list
@ -110,7 +112,7 @@ pub fn eval_call(
            for result in call.rest_iter_flattened(
                decl.signature().required_positional.len()
                    + decl.signature().optional_positional.len(),
-                |expr| eval_expression(engine_state, caller_stack, expr),
+                |expr| eval_expression::<D>(engine_state, caller_stack, expr),
            )? {
                rest_items.push(result);
            }
@ -136,7 +138,7 @@ pub fn eval_call(
                    if let (Some(spanned), Some(short)) = (&call_named.1, named.short) {
                        if spanned.item == short.to_string() {
                            if let Some(arg) = &call_named.2 {
-                                let result = eval_expression(engine_state, caller_stack, arg)?;
+                                let result = eval_expression::<D>(engine_state, caller_stack, arg)?;

                                callee_stack.add_var(var_id, result);
                            } else if let Some(value) = &named.default_value {
@ -148,7 +150,7 @@ pub fn eval_call(
                        }
                    } else if call_named.0.item == named.long {
                        if let Some(arg) = &call_named.2 {
-                            let result = eval_expression(engine_state, caller_stack, arg)?;
+                            let result = eval_expression::<D>(engine_state, caller_stack, arg)?;

                            callee_stack.add_var(var_id, result);
                        } else if let Some(value) = &named.default_value {
@ -172,7 +174,7 @@ pub fn eval_call(
            }
        }

-        let result = eval_block_with_early_return(
+        let result = eval_block_with_early_return::<D>(
            engine_state,
            &mut callee_stack,
            block,
@ -293,12 +295,12 @@ fn eval_external(
    command.run(engine_state, stack, &call, input)
 }

-pub fn eval_expression(
+pub fn eval_expression<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    expr: &Expression,
 ) -> Result<Value, ShellError> {
-    <EvalRuntime as Eval>::eval(engine_state, stack, expr)
+    <EvalRuntime as Eval>::eval::<D>(engine_state, stack, expr)
 }

 /// Checks the expression to see if it's a internal or external call. If so, passes the input
@ -307,7 +309,7 @@ pub fn eval_expression(
 ///
 /// It returns PipelineData with a boolean flag, indicating if the external failed to run.
 /// The boolean flag **may only be true** for external calls, for internal calls, it always to be false.
-pub fn eval_expression_with_input(
+pub fn eval_expression_with_input<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    expr: &Expression,
@ -325,9 +327,9 @@ pub fn eval_expression_with_input(
                let mut call = call.clone();
                call.redirect_stdout = redirect_stdout;
                call.redirect_stderr = redirect_stderr;
-                input = eval_call(engine_state, stack, &call, input)?;
+                input = eval_call::<D>(engine_state, stack, &call, input)?;
            } else {
-                input = eval_call(engine_state, stack, call, input)?;
+                input = eval_call::<D>(engine_state, stack, call, input)?;
            }
        }
        Expression {
@ -352,7 +354,7 @@ pub fn eval_expression_with_input(
            let block = engine_state.get_block(*block_id);

            // FIXME: protect this collect with ctrl-c
-            input = eval_subexpression(engine_state, stack, block, input)?;
+            input = eval_subexpression::<D>(engine_state, stack, block, input)?;
        }

        elem @ Expression {
@ -367,19 +369,19 @@ pub fn eval_expression_with_input(
                let block = engine_state.get_block(*block_id);

                // FIXME: protect this collect with ctrl-c
-                input = eval_subexpression(engine_state, stack, block, input)?;
+                input = eval_subexpression::<D>(engine_state, stack, block, input)?;
                let value = input.into_value(*span);
                input = value
                    .follow_cell_path(&full_cell_path.tail, false)?
                    .into_pipeline_data()
            }
            _ => {
-                input = eval_expression(engine_state, stack, elem)?.into_pipeline_data();
+                input = eval_expression::<D>(engine_state, stack, elem)?.into_pipeline_data();
            }
        },

        elem => {
-            input = eval_expression(engine_state, stack, elem)?.into_pipeline_data();
+            input = eval_expression::<D>(engine_state, stack, elem)?.into_pipeline_data();
        }
    };

@ -406,7 +408,7 @@ fn might_consume_external_result(input: PipelineData) -> (PipelineData, bool) {
 }

 #[allow(clippy::too_many_arguments)]
-fn eval_element_with_input(
+fn eval_element_with_input<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    element: &PipelineElement,
@ -416,7 +418,9 @@ fn eval_element_with_input(
    redirect_combine: bool,
    stderr_writer_jobs: &mut Vec<DataSaveJob>,
 ) -> Result<(PipelineData, bool), ShellError> {
-    match element {
+    D::enter_element(engine_state, element);
+
+    let result = match element {
        PipelineElement::Expression(pipe_span, expr)
        | PipelineElement::OutErrPipedExpression(pipe_span, expr) => {
            if matches!(element, PipelineElement::OutErrPipedExpression(..))
@ -446,7 +450,7 @@ fn eval_element_with_input(
                    )?;
                    Ok(might_consume_external_result(result))
                }
-                _ => eval_expression_with_input(
+                _ => eval_expression_with_input::<D>(
                    engine_state,
                    stack,
                    expr,
@ -483,7 +487,7 @@ fn eval_element_with_input(
                    })
                }
            };
-            eval_expression_with_input(
+            eval_expression_with_input::<D>(
                engine_state,
                stack,
                expr,
@ -514,7 +518,7 @@ fn eval_element_with_input(
                        );
                        match result_out_stream {
                            None => {
-                                eval_call(engine_state, stack, &save_call, input).map(|_| {
+                                eval_call::<D>(engine_state, stack, &save_call, input).map(|_| {
                                    // save is internal command, normally it exists with non-ExternalStream
                                    // but here in redirection context, we make it returns ExternalStream
                                    // So nu handles exit_code correctly
@ -602,7 +606,7 @@ fn eval_element_with_input(
                        Some((*err_span, err_expr.clone(), *err_append_mode)),
                    );

-                    eval_call(engine_state, stack, &save_call, input).map(|_| {
+                    eval_call::<D>(engine_state, stack, &save_call, input).map(|_| {
                        // save is internal command, normally it exists with non-ExternalStream
                        // but here in redirection context, we make it returns ExternalStream
                        // So nu handles exit_code correctly
@ -648,7 +652,7 @@ fn eval_element_with_input(
                }
                _ => {
                    // we need to redirect output, so the result can be saved and pass to `save` command.
-                    eval_element_with_input(
+                    eval_element_with_input::<D>(
                        engine_state,
                        stack,
                        &PipelineElement::Expression(*cmd_span, cmd_exp.clone()),
@ -661,7 +665,7 @@ fn eval_element_with_input(
                    .map(|x| x.0)?
                }
            };
-            eval_element_with_input(
+            eval_element_with_input::<D>(
                engine_state,
                stack,
                &PipelineElement::Redirection(
@ -677,7 +681,7 @@ fn eval_element_with_input(
                stderr_writer_jobs,
            )
        }
-        PipelineElement::And(_, expr) => eval_expression_with_input(
+        PipelineElement::And(_, expr) => eval_expression_with_input::<D>(
            engine_state,
            stack,
            expr,
@ -685,7 +689,7 @@ fn eval_element_with_input(
            redirect_stdout,
            redirect_stderr,
        ),
-        PipelineElement::Or(_, expr) => eval_expression_with_input(
+        PipelineElement::Or(_, expr) => eval_expression_with_input::<D>(
            engine_state,
            stack,
            expr,
@ -693,7 +697,11 @@ fn eval_element_with_input(
            redirect_stdout,
            redirect_stderr,
        ),
-    }
+    };
+
+    D::leave_element(engine_state, element, &result);
+
+    result
 }

 // In redirection context, if nushell gets an ExternalStream
@ -836,7 +844,7 @@ fn is_redirect_combine_required(elements: &[PipelineElement], idx: usize) -> boo
        )
 }

-pub fn eval_block_with_early_return(
+pub fn eval_block_with_early_return<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    block: &Block,
@ -844,7 +852,7 @@ pub fn eval_block_with_early_return(
    redirect_stdout: bool,
    redirect_stderr: bool,
 ) -> Result<PipelineData, ShellError> {
-    match eval_block(
+    match eval_block::<D>(
        engine_state,
        stack,
        block,
@ -857,7 +865,7 @@ pub fn eval_block_with_early_return(
    }
 }

-pub fn eval_block(
+pub fn eval_block<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    block: &Block,
@ -865,6 +873,8 @@ pub fn eval_block(
    redirect_stdout: bool,
    redirect_stderr: bool,
 ) -> Result<PipelineData, ShellError> {
+    D::enter_block(engine_state, block);
+
    let num_pipelines = block.len();

    for (pipeline_idx, pipeline) in block.pipelines.iter().enumerate() {
@ -891,7 +901,7 @@ pub fn eval_block(
            let redirect_combine = is_redirect_combine_required(elements, idx);

            // if eval internal command failed, it can just make early return with `Err(ShellError)`.
-            let eval_result = eval_element_with_input(
+            let eval_result = eval_element_with_input::<D>(
                engine_state,
                stack,
                element,
@ -962,16 +972,18 @@ pub fn eval_block(
        }
    }

+    D::leave_block(engine_state, block);
+
    Ok(input)
 }

-pub fn eval_subexpression(
+pub fn eval_subexpression<D: DebugContext>(
    engine_state: &EngineState,
    stack: &mut Stack,
    block: &Block,
    input: PipelineData,
 ) -> Result<PipelineData, ShellError> {
-    eval_block(engine_state, stack, block, input, true, false)
+    eval_block::<D>(engine_state, stack, block, input, true, false)
 }

 pub fn eval_variable(
@ -1096,7 +1108,8 @@ impl DataSaveJob {
            inner: thread::Builder::new()
                .name("stderr saver".to_string())
                .spawn(move || {
-                    let result = eval_call(&engine_state, &mut stack, &save_call, input);
+                    let result =
+                        eval_call::<WithoutDebug>(&engine_state, &mut stack, &save_call, input);
                    if let Err(err) = result {
                        eprintln!("WARNING: error occurred when redirect to stderr: {:?}", err);
                    }
@ -1166,14 +1179,14 @@ impl Eval for EvalRuntime {
        eval_variable(engine_state, stack, var_id, span)
    }

-    fn eval_call(
+    fn eval_call<D: DebugContext>(
        engine_state: &EngineState,
        stack: &mut Stack,
        call: &Call,
        _: Span,
    ) -> Result<Value, ShellError> {
        // FIXME: protect this collect with ctrl-c
-        Ok(eval_call(engine_state, stack, call, PipelineData::empty())?.into_value(call.head))
+        Ok(eval_call::<D>(engine_state, stack, call, PipelineData::empty())?.into_value(call.head))
    }

    fn eval_external_call(
@ -1198,7 +1211,7 @@ impl Eval for EvalRuntime {
        .into_value(span))
    }

-    fn eval_subexpression(
+    fn eval_subexpression<D: DebugContext>(
        engine_state: &EngineState,
        stack: &mut Stack,
        block_id: usize,
@ -1207,7 +1220,10 @@ impl Eval for EvalRuntime {
        let block = engine_state.get_block(block_id);

        // FIXME: protect this collect with ctrl-c
-        Ok(eval_subexpression(engine_state, stack, block, PipelineData::empty())?.into_value(span))
+        Ok(
+            eval_subexpression::<D>(engine_state, stack, block, PipelineData::empty())?
+                .into_value(span),
+        )
    }

    fn regex_match(
@ -1221,7 +1237,7 @@ impl Eval for EvalRuntime {
        lhs.regex_match(engine_state, op_span, rhs, invert, expr_span)
    }

-    fn eval_assignment(
+    fn eval_assignment<D: DebugContext>(
        engine_state: &EngineState,
        stack: &mut Stack,
        lhs: &Expression,
@ -1230,28 +1246,28 @@ impl Eval for EvalRuntime {
        op_span: Span,
        _expr_span: Span,
    ) -> Result<Value, ShellError> {
-        let rhs = eval_expression(engine_state, stack, rhs)?;
+        let rhs = eval_expression::<D>(engine_state, stack, rhs)?;

        let rhs = match assignment {
            Assignment::Assign => rhs,
            Assignment::PlusAssign => {
-                let lhs = eval_expression(engine_state, stack, lhs)?;
+                let lhs = eval_expression::<D>(engine_state, stack, lhs)?;
                lhs.add(op_span, &rhs, op_span)?
            }
            Assignment::MinusAssign => {
-                let lhs = eval_expression(engine_state, stack, lhs)?;
+                let lhs = eval_expression::<D>(engine_state, stack, lhs)?;
                lhs.sub(op_span, &rhs, op_span)?
            }
            Assignment::MultiplyAssign => {
-                let lhs = eval_expression(engine_state, stack, lhs)?;
+                let lhs = eval_expression::<D>(engine_state, stack, lhs)?;
                lhs.mul(op_span, &rhs, op_span)?
            }
            Assignment::DivideAssign => {
-                let lhs = eval_expression(engine_state, stack, lhs)?;
+                let lhs = eval_expression::<D>(engine_state, stack, lhs)?;
                lhs.div(op_span, &rhs, op_span)?
            }
            Assignment::AppendAssign => {
-                let lhs = eval_expression(engine_state, stack, lhs)?;
+                let lhs = eval_expression::<D>(engine_state, stack, lhs)?;
                lhs.append(op_span, &rhs, op_span)?
            }
        };
@ -1273,7 +1289,8 @@ impl Eval for EvalRuntime {
                        // As such, give it special treatment here.
                        let is_env = var_id == &ENV_VARIABLE_ID;
                        if is_env || engine_state.get_var(*var_id).mutable {
-                            let mut lhs = eval_expression(engine_state, stack, &cell_path.head)?;
+                            let mut lhs =
+                                eval_expression::<D>(engine_state, stack, &cell_path.head)?;

                            lhs.upsert_data_at_cell_path(&cell_path.tail, rhs)?;
                            if is_env {