nushell/crates/nu-command
alex-tdrn 40e629beb1
Fix multibyte codepoint handling in detect columns --guess (#13272)

# Description
This PR fixes #13269. The splitting code in `guess_width.rs` was
slicing strings with char indices instead of byte indices. This works
for 1-byte code points, but panics or returns wrong results as soon as
multibyte code points appear in the input. I originally discovered this
by piping `winget list` into `detect columns --guess`, since winget
sometimes uses the Unicode ellipsis symbol (`…`), which is 3 bytes long
when encoded in UTF-8.
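
For illustration, the mismatch between char indices and byte indices can be reproduced in a few lines. This is only a minimal sketch, not the code from `guess_width.rs`: `slice_by_chars` is a hypothetical helper and the sample line is made up.

```rust
/// Hypothetical helper (not taken from `guess_width.rs`): slice `s` by the
/// character positions `[start, end)` by first translating them to byte offsets.
fn slice_by_chars(s: &str, start: usize, end: usize) -> &str {
    // Byte offset of the `n`-th char, or `s.len()` if `n` is past the end.
    let byte_at = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    let (from, to) = (byte_at(start), byte_at(end));
    // Guard against `start > end` so the byte range is always valid.
    &s[from..to.max(from)]
}

fn main() {
    // Made-up sample line; '…' is one char but three bytes in UTF-8.
    let line = "Name… Id";
    assert_eq!(line.chars().count(), 8);
    assert_eq!(line.len(), 10);

    // Byte 5 falls inside '…', so `&line[..5]` would panic at runtime.
    assert!(!line.is_char_boundary(5));

    // Converting char positions to byte offsets first gives the intended slices.
    assert_eq!(slice_by_chars(line, 0, 5), "Name…");
    assert_eq!(slice_by_chars(line, 6, 8), "Id");
}
```

Walking `char_indices()` for every lookup is linear in the string length, so code that slices repeatedly would more likely compute the byte offsets once and reuse them.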

# User-Facing Changes
`detect columns --guess` no longer crashes on multibyte Unicode input.

before:

![image](https://github.com/nushell/nushell/assets/20356389/833cd732-be3b-4158-97f7-0ca2616ce23f)

after:

![image](https://github.com/nushell/nushell/assets/20356389/15358b40-4083-4a33-9f2c-87e63f39d985)


# Tests + Formatting
- Added tests to `guess_width.rs` covering multibyte code points as well
as combining diacritical marks

# After Submitting
2024-06-29 16:12:17 -05:00
..
src Fix multibyte codepoint handling in detect columns --guess (#13272) 2024-06-29 16:12:17 -05:00
tests Fix find command output bug in the case of taking ByteStream input. (#13246) 2024-06-27 09:46:10 -05:00
Cargo.toml Bumping version to 0.95.1 (#13231) 2024-06-25 18:26:07 -07:00
LICENSE Fix rest of license year ranges (#8727) 2023-04-04 09:03:29 +12:00