Commit Graph

10 Commits

Author SHA1 Message Date
Devyn Cairns
c61075e20e
Add string/binary type color to ByteStream (#12897)
# Description

This PR allows byte streams to optionally be colored as being
specifically binary or string data, which guarantees that they'll be
converted to `Binary` or `String` appropriately on `into_value()`,
making them compatible with `Type` guarantees. This makes them
significantly more broadly usable for command input and output.

There is still an `Unknown` type for byte streams coming from external
commands, which uses the same behavior as we previously did where it's a
string if it's UTF-8.

A small number of commands were updated to take advantage of this, just
to prove the point. I will be adding more after this merges.

# User-Facing Changes
- New types in `describe`: `string (stream)`, `binary (stream)`
- These commands now return a stream if their input was a stream:
  - `into binary`
  - `into string`
  - `bytes collect`
  - `str join`
  - `first` (binary)
  - `last` (binary)
  - `take` (binary)
  - `skip` (binary)
- Streams that are explicitly binary colored will print as a streaming
hexdump
  - example:
    ```nushell
    1.. | each { into binary } | bytes collect
    ```

# Tests + Formatting
I've added some tests to cover it at a basic level, and it doesn't break
anything existing, but I do think more would be nice. Some of those will
come when I modify more commands to stream.

# After Submitting
There are a few things I'm not quite satisfied with:

- **String trimming behavior.** We automatically trim newlines from
streams from external commands, but I don't think we should do this with
internal commands. If I call a command that happens to turn my string
into a stream, I don't want the newline to suddenly disappear. I changed
this to specifically do it only on `Child` and `File`, but I don't know
if this is quite right, and maybe we should bring back the old flag for
`trim_end_newline`
- **Known binary always resulting in a hexdump.** It would be nice to
have a `print --raw`, so that we can put binary data on stdout
explicitly if we want to. This PR doesn't change how external commands
work though - they still dump straight to stdout.

Otherwise, here's the normal checklist:

- [ ] release notes
- [ ] docs update for plugin protocol changes (added `type` field)

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-05-20 00:35:32 +00:00
Ian Manske
6fd854ed9f
Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00
Ian Manske
e879d4ecaf
ListStream touchup (#12524)
# Description

Does some misc changes to `ListStream`:
- Moves it into its own module/file separate from `RawStream`.
- `ListStream`s now have an associated `Span`.
- This required changes to `ListStreamInfo` in `nu-plugin`. Note sure if
this is a breaking change for the plugin protocol.
- Hides the internals of `ListStream` but also adds a few more methods.
- This includes two functions to more easily alter a stream (these take
a `ListStream` and return a `ListStream` instead of having to go through
the whole `into_pipeline_data(..)` route).
  -  `map`: takes a `FnMut(Value) -> Value`
  - `modify`: takes a function to modify the inner stream.
2024-05-05 16:00:59 +00:00
Devyn Cairns
72f3942c37
Upgrade to interprocess 2.0.0 (#12729)
# Description

This fixes #12724. NetBSD confirmed to work with this change.

The update also behaves a bit better in some ways - it automatically
unlinks and reclaims sockets on Unix, and doesn't try to flush/sync the
socket on Windows, so I was able to remove that platform-specific logic.

They also have a way to split the socket so I could just use one socket
now, but I haven't tried to do that yet. That would be more of a
breaking change but I think it's more straightforward.

# User-Facing Changes

- Hopefully more platforms work

# Tests + Formatting
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
2024-05-02 22:31:33 -07:00
Ian Manske
847646e44e
Remove lazy records (#12682)
# Description
Removes lazy records from the language, following from the reasons
outlined in #12622. Namely, this should make semantics more clear and
will eliminate concerns regarding maintainability.

# User-Facing Changes
- Breaking change: `lazy make` is removed.
- Breaking change: `describe --collect-lazyrecords` flag is removed.
- `sys` and `debug info` now return regular records.

# After Submitting
- Update nushell book if necessary.
- Explore new `sys` and `debug info` APIs to prevent them from taking
too long (e.g., subcommands or taking an optional column/cell-path
argument).
2024-05-03 08:36:10 +08:00
Devyn Cairns
ad6deadf24
Flush on every plugin Data message (#12728)
# Description

This helps to ensure data produced on a stream is immediately available
to the consumer of the stream. The BufWriter introduced for performance
reasons in 0.93 exposed the behavior that data messages wouldn't make it
to the other side until they filled the buffer in @cablehead's
[`nu_plugin_from_sse`](https://github.com/cablehead/nu_plugin_from_sse).

I had originally not flushed on every `Data` message because I figured
that it isn't really critical that the other side sees those messages
immediately, since they're not used for control and they are flushed
when waiting for acknowledgement or when the buffer is too full anyway.

Increasing the amount of data that can be sent with a single underlying
write increases performance, but this interferes with some plugins that
want to use streams in a more real-time way. In the future I would like
to make this configurable, maybe even per-command, so that a command can
decide what the priority is. But for now I think this is reasonable.

In the worst case, this decreases performance by about 40%, when sending
very small values (just numbers). But for larger values, this PR
actually increases performance by about 20%, because I've increased the
buffer size about 2x to 16,384 bytes. The previous value of 8,192 bytes
was too small to fit a full buffer coming from an external command, so
doubling it makes sense, and now a write of a buffer from an external
command can be done in exactly one write call, which I think makes
sense. I'm doing this at the same time because flushing each data
message would make it very likely that each individual data message from
an external stream would require exactly two writes rather than
approximately one (amortized).

Again, hopefully the tradeoff isn't too bad, and if it is I'll just make
it configurable.

# User-Facing Changes

- Performance of plugin streams will be a bit different
- Plugins that expect to send streams in real-time will work again

# Tests + Formatting

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
2024-05-02 23:51:16 +00:00
Devyn Cairns
21ebdfe8d7
Bump version to 0.93.1 (#12710)
# Description

Next patch/dev release, `0.93.1`
2024-05-01 17:19:20 -05:00
Devyn Cairns
3b220e07e3
Bump version to 0.93.0 (#12709)
# Description

Bump version to `0.93.0`
2024-04-30 15:51:13 -07:00
Devyn Cairns
648486400c
Fix missing local socket feature (#12698)
# Description

So sorry to do this during the pre-release freeze, but my plugin crate
split PR broke local socket mode, because `nu-plugin-protocol` didn't
have the compile feature to advertise the `LocalSocket` protocol
feature.

This is a very simple, configuration-only bugfix that I think really
needs to be merged before the release, or else local socket mode won't
work at all.

# Tests + Formatting

There's an oversight in my testing that caused this to not be caught:
the engine really did have the feature, but it just wasn't advertising
it, so for `stress_internals` it was still able to use it successfully.
Post-release I'll try to make sure this is properly handled somehow.
2024-04-29 15:02:56 +08:00
Devyn Cairns
0c4d5330ee
Split the plugin crate (#12563)
# Description

This breaks `nu-plugin` up into four crates:

- `nu-plugin-protocol`: just the type definitions for the protocol, no
I/O. If someone wanted to wire up something more bare metal, maybe for
async I/O, they could use this.
- `nu-plugin-core`: the shared stuff between engine/plugin. Less stable
interface.
- `nu-plugin-engine`: everything required for the engine to talk to
plugins. Less stable interface.
- `nu-plugin`: everything required for the plugin to talk to the engine,
what plugin developers use. Should be the most stable interface.

No changes are made to the interface exposed by `nu-plugin` - it should
all still be there. Re-exports from `nu-plugin-protocol` or
`nu-plugin-core` are used as required. Plugins shouldn't ever have to
use those crates directly.

This should be somewhat faster to compile as `nu-plugin-engine` and
`nu-plugin` can compile in parallel, and the engine doesn't need
`nu-plugin` and plugins don't need `nu-plugin-engine` (except for test
support), so that should reduce what needs to be compiled too.

The only significant change here other than splitting stuff up was to
break the `source` out of `PluginCustomValue` and create a new
`PluginCustomValueWithSource` type that contains that instead. One bonus
of that is we get rid of the option and it's now more type-safe, but it
also means that the logic for that stuff (actually running the plugin
for custom value ops) can live entirely within the `nu-plugin-engine`
crate.

# User-Facing Changes
- New crates.
- Added `local-socket` feature for `nu` to try to make it possible to
compile without that support if needed.

# Tests + Formatting
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
2024-04-27 12:08:12 -05:00