Improve case insensitivity consistency (#10884)

# Description

Add an extension trait `IgnoreCaseExt` to nu_utils which adds some case
insensitivity helpers, and use them throughout nu to improve the
handling of case insensitivity. Proper case folding is done via unicase,
which is already a dependency via mime_guess from nu-command.

In practice, much of the code still calls `to_lowercase`, because unicase
only provides direct comparison and doesn't expose a `to_folded_case`
yet. Since we do a lot of `contains`/`starts_with`/`ends_with` checks,
`eq_ignore_case` alone is not sufficient. But if we get access to proper
case folding in the future, we can switch to it with a change in one
place.

Also, calling `to_folded_case` instead of `to_lowercase` makes the intent
clearer at the call site when the result is used exclusively for case
insensitive comparison, even though it still just calls `to_lowercase`
for now.
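
As a rough illustration of the shape described above, here is a minimal
sketch of such an extension trait. It is not the code from this commit:
the method bodies are assumptions, `eq_ignore_case` compares lowercased
strings as a std-only stand-in for the unicase comparison, and
`contains_ignore_case` is a hypothetical helper added only to show a
call site.

```rust
/// Sketch of a case-insensitivity extension trait (bodies are assumptions).
pub trait IgnoreCaseExt {
    /// Returns a form of the string intended only for case insensitive
    /// comparison. Here it simply lowercases, which is a stand-in for
    /// real Unicode case folding.
    fn to_folded_case(&self) -> String;

    /// Compares two strings while ignoring case.
    /// Assumption: implemented by comparing the folded forms; the commit
    /// message says the real comparison goes through unicase.
    fn eq_ignore_case(&self, other: &str) -> bool;
}

impl IgnoreCaseExt for str {
    fn to_folded_case(&self) -> String {
        // Single place to swap in proper case folding later.
        self.to_lowercase()
    }

    fn eq_ignore_case(&self, other: &str) -> bool {
        self.to_folded_case() == other.to_folded_case()
    }
}

/// Hypothetical call site: a substring check needs the folded strings,
/// which is why an equality helper alone is not enough.
pub fn contains_ignore_case(haystack: &str, needle: &str) -> bool {
    haystack
        .to_folded_case()
        .contains(&needle.to_folded_case())
}
```

With this layout, call sites read as "fold, then compare", and only
`to_folded_case` would need to change if a real folding function became
available.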

# User-Facing Changes

- Some commands that were supposed to be case insensitive were only
insensitive to ASCII case (a-z); they are now case insensitive for
non-ASCII characters as well.
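
To make the difference concrete, the sketch below contrasts an
ASCII-only comparison with a Unicode-aware one using only the standard
library. The function names `eq_ascii_only` and `eq_unicode_aware` are
hypothetical, and lowercasing stands in for proper case folding.

```rust
/// ASCII-only comparison: only A-Z and a-z are treated as equal.
pub fn eq_ascii_only(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

/// Unicode-aware comparison, approximated by lowercasing both sides.
/// Assumption: lowercasing is used here in place of real case folding.
pub fn eq_unicode_aware(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

Both functions agree on `"README"` vs `"readme"`, but only the
Unicode-aware one treats `"ПРИВЕТ"` and `"привет"` as equal.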

# Tests + Formatting

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

---------

Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
This commit is contained in:
Christopher Durham
2023-11-08 17:58:54 -05:00
committed by GitHub
parent aed4b626b8
commit 0f600bc3f5
35 changed files with 176 additions and 122 deletions

@@ -1016,29 +1016,19 @@ fn in_char_specifiers(specifiers: &[CharSpecifier], c: char, options: MatchOptio
 CharRange(start, end) => {
     // FIXME: work with non-ascii chars properly (issue #1347)
     if !options.case_sensitive && c.is_ascii() && start.is_ascii() && end.is_ascii() {
-        let start = start.to_ascii_lowercase();
-        let end = end.to_ascii_lowercase();
-        let start_up = start
-            .to_uppercase()
-            .next()
-            .expect("internal error: getting start uppercase");
-        let end_up = end
-            .to_uppercase()
-            .next()
-            .expect("internal error: getting end uppercase");
         // only allow case insensitive matching when
         // both start and end are within a-z or A-Z
-        if start != start_up && end != end_up {
+        if start.is_ascii_alphabetic() && end.is_ascii_alphabetic() {
+            let start = start.to_ascii_lowercase();
+            let end = end.to_ascii_lowercase();
             let c = c.to_ascii_lowercase();
-            if c >= start && c <= end {
+            if (start..=end).contains(&c) {
                 return true;
             }
         }
     }
-    if c >= start && c <= end {
+    if (start..=end).contains(&c) {
         return true;
     }
 }
@@ -1279,7 +1269,7 @@ mod test {
 fn test_range_pattern() {
     let pat = Pattern::new("a[0-9]b").unwrap();
     for i in 0..10 {
-        assert!(pat.matches(&format!("a{}b", i)));
+        assert!(pat.matches(&format!("a{}b", i)), "a{i}b =~ a[0-9]b");
     }
     assert!(!pat.matches("a_b"));