Mirror of https://github.com/nushell/nushell.git, synced 2025-08-09 08:26:22 +02:00
Replace ExternalStream with new ByteStream type (#12774)
# Description

This PR introduces a `ByteStream` type which is a `Read`-able stream of bytes. Internally, it has an enum over three different byte stream sources:

```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an `Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where it is either external command output or a wrapper around `RawStream`. `ByteStream` makes this distinction more clear (via `ByteStreamSource`) and replaces `PipelineData::ExternalStream` in this PR:

```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell should, in most cases, have competitive pipeline throughput compared to, e.g., bash.

| Command                                          | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| ------------------------------------------------ | -------------:| ------------:| -----------:|
| `throughput \| rg 'x'`                           |          3059 |         3744 |        3739 |
| `throughput \| nu --testbin relay o> /dev/null`  |          3508 |         8087 |        8136 |

# User-Facing Changes

- This is a breaking change for the plugin communication protocol, because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`. Plugins now only have to deal with a single input stream, as opposed to the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with byte streams. This is to keep the PR smaller, and `bytes ends-with` already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in the `exit_code` column of the output returned from `complete`, it now is a `Value::Int` with the negation of the signal number.

# After Submitting

- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is `str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
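
The description contrasts a `Read`-able byte stream with an `Iterator<Item = Vec<u8>>` that allocates per chunk. The following is a minimal sketch of that idea using only the standard library; it is not the code from this PR. The names `Source`, `drain`, and `chunked` are hypothetical, and an in-memory `Cursor` stands in for the `File` and `ChildProcess` variants so the sketch runs without external resources. `buf_size` and `chunk_size` are assumed to be greater than zero.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical, simplified stand-in for the `ByteStreamSource` idea:
/// one enum over several byte sources, all consumed through `Read`.
pub enum Source {
    /// Any boxed reader (mirrors the `Read` variant in the description).
    Reader(Box<dyn Read + Send + 'static>),
    /// An in-memory buffer, used here instead of `File`/`ChildProcess`.
    Memory(Cursor<Vec<u8>>),
}

impl Read for Source {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Delegate to whichever source is active.
        match self {
            Source::Reader(reader) => reader.read(buf),
            Source::Memory(cursor) => cursor.read(buf),
        }
    }
}

/// Drain a source through one reusable buffer.
/// Returns (total bytes read, number of reads that returned data).
pub fn drain(mut source: Source, buf_size: usize) -> io::Result<(usize, usize)> {
    let mut buf = vec![0u8; buf_size]; // allocated once, reused for every read
    let mut total = 0;
    let mut reads = 0;
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        total += n;
        reads += 1;
    }
    Ok((total, reads))
}

/// Chunk-iterator style for comparison: every chunk is a freshly allocated `Vec<u8>`.
pub fn chunked(data: &[u8], chunk_size: usize) -> impl Iterator<Item = Vec<u8>> + '_ {
    data.chunks(chunk_size).map(|chunk| chunk.to_vec())
}
```

In this sketch the consumer decides the buffer size and reuses it, whereas the chunk iterator hands out a new allocation per item; that difference is one plausible contributor to the throughput numbers above, though the PR does not break the gain down by cause.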
@@ -57,16 +57,12 @@ documentation link at https://docs.rs/encoding_rs/latest/encoding_rs/#statics"#
         let encoding: Option<Spanned<String>> = call.opt(engine_state, stack, 0)?;
 
         match input {
-            PipelineData::ExternalStream { stdout: None, .. } => Ok(PipelineData::empty()),
-            PipelineData::ExternalStream {
-                stdout: Some(stream),
-                span: input_span,
-                ..
-            } => {
-                let bytes: Vec<u8> = stream.into_bytes()?.item;
+            PipelineData::ByteStream(stream, ..) => {
+                let span = stream.span();
+                let bytes = stream.into_bytes()?;
                 match encoding {
                     Some(encoding_name) => super::encoding::decode(head, encoding_name, &bytes),
-                    None => super::encoding::detect_encoding_name(head, input_span, &bytes)
+                    None => super::encoding::detect_encoding_name(head, span, &bytes)
                         .map(|encoding| encoding.decode(&bytes).0.into_owned())
                         .map(|s| Value::string(s, head)),
                 }
@@ -81,13 +81,10 @@ documentation link at https://docs.rs/encoding_rs/latest/encoding_rs/#statics"#
         let ignore_errors = call.has_flag(engine_state, stack, "ignore-errors")?;
 
         match input {
-            PipelineData::ExternalStream { stdout: None, .. } => Ok(PipelineData::empty()),
-            PipelineData::ExternalStream {
-                stdout: Some(stream),
-                ..
-            } => {
+            PipelineData::ByteStream(stream, ..) => {
+                let span = stream.span();
                 let s = stream.into_string()?;
-                super::encoding::encode(head, encoding, &s.item, s.span, ignore_errors)
+                super::encoding::encode(head, encoding, &s, span, ignore_errors)
                     .map(|val| val.into_pipeline_data())
             }
             PipelineData::Value(v, ..) => {
@@ -208,30 +208,21 @@ fn operate(
                 }
             })
             .into()),
-        PipelineData::ExternalStream { stdout: None, .. } => Ok(PipelineData::Empty),
-        PipelineData::ExternalStream {
-            stdout: Some(stream),
-            ..
-        } => {
-            // Collect all `stream` chunks into a single `chunk` to be able to deal with matches that
-            // extend across chunk boundaries.
-            // This is a stop-gap solution until the `regex` crate supports streaming or an alternative
-            // solution is found.
-            // See https://github.com/nushell/nushell/issues/9795
-            let str = stream.into_string()?.item;
+        PipelineData::ByteStream(stream, ..) => {
+            if let Some(lines) = stream.lines() {
+                let iter = ParseIter {
+                    captures: VecDeque::new(),
+                    regex,
+                    columns,
+                    iter: lines,
+                    span: head,
+                    ctrlc,
+                };
 
-            // let iter = stream.lines();
-
-            let iter = ParseIter {
-                captures: VecDeque::new(),
-                regex,
-                columns,
-                iter: std::iter::once(Ok(str)),
-                span: head,
-                ctrlc,
-            };
-
-            Ok(ListStream::new(iter, head, None).into())
+                Ok(ListStream::new(iter, head, None).into())
+            } else {
+                Ok(PipelineData::Empty)
+            }
         }
     }
 }
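
The last hunk stops collecting the whole stream into one string and instead feeds `ParseIter` an iterator of lines. The sketch below illustrates that pattern with the standard library only; it is an assumption-laden illustration, not Nushell's implementation. `parse_lines` and `key_value` are hypothetical names, and a plain `key=value` splitter stands in for the regex so the example needs no external crate.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Lazily apply `matcher` to each line of `reader`, yielding one item per
/// matching line. I/O errors are passed through instead of being dropped.
pub fn parse_lines<R, F, T>(reader: R, matcher: F) -> impl Iterator<Item = io::Result<T>>
where
    R: Read,
    F: Fn(&str) -> Option<T>,
{
    BufReader::new(reader)
        .lines() // reads one line at a time; the full input is never buffered
        .filter_map(move |line| match line {
            Ok(text) => matcher(&text).map(Ok), // skip lines that do not match
            Err(err) => Some(Err(err)),
        })
}

/// Example matcher (stand-in for a regex): split `key=value` at the first `=`.
pub fn key_value(line: &str) -> Option<(String, String)> {
    let (key, value) = line.split_once('=')?;
    Some((key.to_string(), value.to_string()))
}
```

One consequence of matching per line, visible in this sketch, is that a pattern can no longer match text that spans a line break, whereas the removed code collected everything first specifically to handle matches crossing chunk boundaries.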