nushell/crates/nu-command/src/bytes/starts_with.rs
Doru c602b5a1e8
special-case ExternalStream in bytes starts-with (#8203)
# Description
`bytes starts-with` converts the input into a `Value` before running
.starts_with to find if the binary matches. This has two side effects:
it makes the code simpler, only dealing in whole values, and simplifying
a lot of input pipeline handling and value transforming it would
otherwise have to do. _Especially_ in the presence of a cell path to
drill into. It also makes buffers the entire input into memory, which
can take up a lot of memory when dealing with large files, especially if
you only want to check the first few bytes (like for a magic number).

This PR adds a special branch on PipelineData::ExternalStream with a
streaming version of starts_with.

# User-Facing Changes
Opening large files and running bytes starts-with on them will not take
a long time.

# Tests + Formatting

Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A
clippy::needless_collect` to check that you're using the standard code
style
- `cargo test --workspace` to check that all tests pass

# Drawbacks
Streaming checking is more complicated, and there may be bugs. I tested
it with multiple chunks with string data and binary data and it seems to
work alright up to 8k and over bytes, though.

The existing `operate` method still exists because the way it handles
cell paths and values is complicated. This causes some "code
duplication", or at least some intent duplication, between the value
code and the streaming code. This might be worthwhile considering the
performance gains (approaching infinity on larger inputs).

Another thing to consider is that my ExternalStream branch considers
string data as valid input. The operate branch only parses Binary
values, so it would fail. `open` is kind of unpredictable on whether it
returns string data or binary data, even when passing `--raw`. I think
this can be a problem but not really one I'm trying to tackle in this
PR, so, it's worth considering.
2023-02-26 15:17:44 +01:00

175 lines
5.7 KiB
Rust

use crate::input_handler::{operate, CmdArgument};
use nu_engine::CallExt;
use nu_protocol::ast::Call;
use nu_protocol::ast::CellPath;
use nu_protocol::engine::{Command, EngineState, Stack};
use nu_protocol::Category;
use nu_protocol::IntoPipelineData;
use nu_protocol::{Example, PipelineData, ShellError, Signature, Span, SyntaxShape, Type, Value};
struct Arguments {
pattern: Vec<u8>,
cell_paths: Option<Vec<CellPath>>,
}
impl CmdArgument for Arguments {
fn take_cell_paths(&mut self) -> Option<Vec<CellPath>> {
self.cell_paths.take()
}
}
#[derive(Clone)]
pub struct BytesStartsWith;
impl Command for BytesStartsWith {
fn name(&self) -> &str {
"bytes starts-with"
}
fn signature(&self) -> Signature {
Signature::build("bytes starts-with")
.input_output_types(vec![(Type::Binary, Type::Bool)])
.required("pattern", SyntaxShape::Binary, "the pattern to match")
.rest(
"rest",
SyntaxShape::CellPath,
"for a data structure input, check if bytes at the given cell paths start with the pattern",
)
.category(Category::Bytes)
}
fn usage(&self) -> &str {
"Check if bytes starts with a pattern"
}
fn search_terms(&self) -> Vec<&str> {
vec!["pattern", "match", "find", "search"]
}
fn run(
&self,
engine_state: &EngineState,
stack: &mut Stack,
call: &Call,
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let pattern: Vec<u8> = call.req(engine_state, stack, 0)?;
let cell_paths: Vec<CellPath> = call.rest(engine_state, stack, 1)?;
let cell_paths = (!cell_paths.is_empty()).then_some(cell_paths);
let arg = Arguments {
pattern,
cell_paths,
};
match input {
PipelineData::ExternalStream {
stdout: Some(stream),
span,
..
} => {
let mut i = 0;
for item in stream {
let byte_slice = match &item {
// String and binary data are valid byte patterns
Ok(Value::String { val, .. }) => val.as_bytes(),
Ok(Value::Binary { val, .. }) => val,
// If any Error value is output, echo it back
Ok(v @ Value::Error { .. }) => return Ok(v.clone().into_pipeline_data()),
// Unsupported data
Ok(other) => {
return Ok(Value::Error {
error: ShellError::OnlySupportsThisInputType(
"string and binary".into(),
other.get_type().to_string(),
span,
// This line requires the Value::Error match above.
other.expect_span(),
),
}
.into_pipeline_data());
}
Err(err) => return Err(err.to_owned()),
};
let max = byte_slice.len().min(arg.pattern.len() - i);
if byte_slice[..max] == arg.pattern[i..i + max] {
i += max;
if i >= arg.pattern.len() {
return Ok(Value::boolean(true, span).into_pipeline_data());
}
} else {
return Ok(Value::boolean(false, span).into_pipeline_data());
}
}
// We reached the end of the stream and never returned,
// the pattern wasn't exhausted so it probably doesn't match
Ok(Value::boolean(false, span).into_pipeline_data())
}
_ => operate(
starts_with,
arg,
input,
call.head,
engine_state.ctrlc.clone(),
),
}
}
fn examples(&self) -> Vec<Example> {
vec![
Example {
description: "Checks if binary starts with `0x[1F FF AA]`",
example: "0x[1F FF AA AA] | bytes starts-with 0x[1F FF AA]",
result: Some(Value::test_bool(true)),
},
Example {
description: "Checks if binary starts with `0x[1F]`",
example: "0x[1F FF AA AA] | bytes starts-with 0x[1F]",
result: Some(Value::test_bool(true)),
},
Example {
description: "Checks if binary starts with `0x[1F]`",
example: "0x[1F FF AA AA] | bytes starts-with 0x[11]",
result: Some(Value::test_bool(false)),
},
]
}
}
fn starts_with(val: &Value, args: &Arguments, span: Span) -> Value {
match val {
Value::Binary {
val,
span: val_span,
} => Value::boolean(val.starts_with(&args.pattern), *val_span),
// Propagate errors by explicitly matching them before the final case.
Value::Error { .. } => val.clone(),
other => Value::Error {
error: ShellError::OnlySupportsThisInputType(
"binary".into(),
other.get_type().to_string(),
span,
// This line requires the Value::Error match above.
other.expect_span(),
),
},
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_examples() {
use crate::test_examples;
test_examples(BytesStartsWith {})
}
}