string | fill counts clusters, not graphemes; and doesn't count ANSI escape codes (#8134)

Enhancement of new `fill` command (#7846) to handle content including
ANSI escape codes for formatting or multi-code-point Unicode grapheme
clusters.
In both of these cases, the content is (many) bytes longer than its
visible length, and `fill` was counting the extra bytes so not adding
enough fill characters.

# Description

This script:
```rust
# the teacher emoji `\u{1F9D1}\u{200D}\u{1F3EB}` is 3 code points, but only 1 print position wide.
echo "This output should be 3 print positions wide, with leading and trailing `+`"
$"\u{1F9D1}\u{200D}\u{1F3EB}" | fill -c "+" -w 3 -a "c"

echo "This output should be 3 print positions wide, with leading and trailing `+`"
$"(ansi green)a(ansi reset)" | fill -c "+" -w 3 -a c
echo ""
```

Was producing this output:
```rust
This output should be 3 print positions wide, with leading and trailing `+`
🧑‍🏫

This output should be 3 print positions wide, with leading and trailing `+`
a
```

After this PR, it produces this output:
```rust
This output should be 3 print positions wide, with leading and trailing `+`
+🧑‍🏫+

This output should be 3 print positions wide, with leading and trailing `+`
+a+
```
# User-Facing Changes

Users may have to undo fixes they may have introduced to work around the
former behavior. I have one such in my prompt string that I can now
revert.

# Tests + Formatting

Don't forget to add tests that cover your changes.
-- Done

Make sure you've run and fixed any issues with these commands:

- [x] `cargo fmt --all -- --check` to check standard code formatting
(`cargo fmt --all` applies these changes)
- [x] `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A
clippy::needless_collect` to check that you're using the standard code
style
- [x] `cargo test --workspace` to check that all tests pass

# After Submitting

`fill` command not documented in the book, and it still talks about `str
lpad/rpad`. I'll fix.

Note added dependency on a new library `print-positions`, which is an
iterator that yields a complete print position (cluster + Ansi sequence)
per call. Should this be vendored?
This commit is contained in:
Bob Hyman
2023-02-20 07:32:20 -05:00
committed by GitHub
parent 527c44ed84
commit 0a8c9b22b0
6 changed files with 56 additions and 5 deletions

View File

@ -5,7 +5,7 @@ use nu_protocol::{
engine::{Command, EngineState, Stack},
Category, Example, PipelineData, ShellError, Signature, Span, SyntaxShape, Type, Value,
};
use unicode_width::UnicodeWidthStr;
use print_positions::print_positions;
#[derive(Clone)]
pub struct Fill;
@ -225,7 +225,8 @@ fn fill_string(s: &str, args: &Arguments, span: Span) -> Value {
fn pad(s: &str, width: usize, pad_char: &str, alignment: FillAlignment, truncate: bool) -> String {
// Attribution: Most of this function was taken from https://github.com/ogham/rust-pad and tweaked. Thank you!
// Use width instead of len for graphical display
let cols = UnicodeWidthStr::width(s);
let cols = print_positions(s).count();
if cols >= width {
if truncate {