nushell/crates/nu-cmd-base/src/arg_glob.rs
Bob Hyman 09b3dab35d
Allow filesystem commands to access files with glob metachars in name (#10694)
(squashed version of #10557, clean commit history and review thread)

Fixes #10571, also potentially: #10364, #10211, #9558, #9310,


# Description
Changes processing of arguments to filesystem commands that are source
paths or globs.
Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a
different globbing interface) or `glob` (because it uses a different
globbing library).

The core of the change is to lookup the argument first as a file and
only glob if it is not. That way,
a path containing glob metacharacters can be referenced without glob
quoting, though it will have to be single quoted to avoid nushell
parsing.

Before: A file path that looks like a glob is not matched by the glob
specified as a (source) argument and takes some thinking about to
access. You might say the glob pattern shadows a file with the same
spelling.
```
> ls a*
╭───┬────────┬──────┬──────┬────────────────╮
│ # │  name  │ type │ size │    modified    │
├───┼────────┼──────┼──────┼────────────────┤
│ 0 │ a[bc]d │ file │  0 B │ 34 seconds ago │
│ 1 │ abd    │ file │  0 B │ now            │
│ 2 │ acd    │ file │  0 B │ now            │
╰───┴────────┴──────┴──────┴────────────────╯

> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd

> ## Note -- a[bc]d *not* copied, and seemingly hard to access.
> cp --verbose 'a\[bc\]d' dest
Error:   × No matches found
   ╭─[entry #33:1:1]
 1 │ cp --verbose 'a\[bc\]d' dest
   ·              ─────┬────
   ·                   ╰── no matches found
   ╰────

> #.. but is accessible with enough glob quoting.
> cp --verbose 'a[[]bc[]]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
```
Before_2: if file has glob metachars but isn't a valid pattern, user
gets a confusing error:

```
> touch 'a[b'
> cp 'a[b' dest
Error:   × Pattern syntax error near position 30: invalid range pattern
   ╭─[entry #13:1:1]
 1 │ cp 'a[b' dest
   ·    ──┬──
   ·      ╰── invalid pattern
   ╰────
```

After: Args to cp, mv, etc. are tried first as literal files, and only
as globs if not found to be files.

```
> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
> cp --verbose '[a][bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd
```
After_2: file with glob metachars but invalid pattern just works.
(though Windows does not allow file name to contain `*`.).

```
> cp --verbose 'a[b' dest
copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b
```

So, with this fix, a file shadows a glob pattern with the same spelling.
If you have such a file and really want to use the glob pattern, you
will have to glob quote some of the characters in the pattern. I think
that's less confusing to the user: if ls shows a file with a weird name,
s/he'll still be able to copy, rename or delete it.

# User-Facing Changes
Could break some existing scripts. If user happened to have a file with
a globbish name but was using a glob pattern with the same spelling, the
new version will process the file and not expand the glob.

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use std testing; testing run-tests --path
crates/nu-std"` to run the tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

---------

Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
2023-10-18 13:31:15 -05:00

208 lines
6.6 KiB
Rust

// utilities for expanding globs in command arguments
use nu_glob::{glob_with_parent, MatchOptions, Paths};
use nu_protocol::{ShellError, Spanned};
use std::fs;
use std::path::{Path, PathBuf};
// standard glob options to use for filesystem command arguments
const GLOB_PARAMS: MatchOptions = MatchOptions {
case_sensitive: true,
require_literal_separator: false,
require_literal_leading_dot: false,
recursive_match_hidden_dir: true,
};
// handle an argument that could be a literal path or a glob.
// if literal path, return just that (whether user can access it or not).
// if glob, expand into matching paths, using GLOB_PARAMS options.
pub fn arg_glob(
pattern: &Spanned<String>, // alleged path or glob
cwd: &Path, // current working directory
) -> Result<Paths, ShellError> {
arg_glob_opt(pattern, cwd, GLOB_PARAMS)
}
// variant of [arg_glob] that requires literal dot prefix in pattern to match dot-prefixed path.
pub fn arg_glob_leading_dot(pattern: &Spanned<String>, cwd: &Path) -> Result<Paths, ShellError> {
arg_glob_opt(
pattern,
cwd,
MatchOptions {
require_literal_leading_dot: true,
..GLOB_PARAMS
},
)
}
fn arg_glob_opt(
pattern: &Spanned<String>,
cwd: &Path,
options: MatchOptions,
) -> Result<Paths, ShellError> {
// remove ansi coloring (?)
let pattern = {
Spanned {
item: nu_utils::strip_ansi_string_unlikely(pattern.item.clone()),
span: pattern.span,
}
};
// if there's a file with same path as the pattern, just return that.
let pp = cwd.join(&pattern.item);
let md = fs::metadata(pp);
#[allow(clippy::single_match)]
match md {
Ok(_metadata) => {
return Ok(Paths::single(&PathBuf::from(pattern.item), cwd));
}
// file not found, but also "invalid chars in file" (e.g * on Windows). Fall through and glob
Err(_) => {}
}
// user wasn't referring to a specific thing in filesystem, try to glob it.
match glob_with_parent(&pattern.item, options, cwd) {
Ok(p) => Ok(p),
Err(pat_err) => {
Err(ShellError::InvalidGlobPattern(
pat_err.msg.into(),
pattern.span, // improve specificity
))
}
}
}
#[cfg(test)]
mod test {
use super::*;
use nu_glob::GlobResult;
use nu_protocol::{Span, Spanned};
use nu_test_support::fs::Stub::EmptyFile;
use nu_test_support::playground::Playground;
use rstest::rstest;
fn spanned_string(str: &str) -> Spanned<String> {
Spanned {
item: str.to_string(),
span: Span::test_data(),
}
}
#[test]
fn does_something() {
let act = arg_glob(&spanned_string("*"), &PathBuf::from("."));
assert!(act.is_ok());
for f in act.expect("checked ok") {
match f {
Ok(p) => {
assert!(!p.to_str().unwrap().is_empty());
}
Err(e) => panic!("unexpected error {:?}", e),
};
}
}
#[test]
fn glob_format_error() {
let act = arg_glob(&spanned_string(r#"ab]c[def"#), &PathBuf::from("."));
assert!(act.is_err());
}
#[rstest]
#[case("*", 4, "no dirs")]
#[case("**/*", 7, "incl dirs")]
fn glob_subdirs(#[case] pat: &str, #[case] exp_count: usize, #[case] case: &str) {
Playground::setup("glob_subdirs", |dirs, sandbox| {
sandbox.with_files(vec![
EmptyFile("yehuda.txt"),
EmptyFile("jttxt"),
EmptyFile("andres.txt"),
]);
sandbox.mkdir(".children");
sandbox.within(".children").with_files(vec![
EmptyFile("timothy.txt"),
EmptyFile("tiffany.txt"),
EmptyFile("trish.txt"),
]);
let p: Vec<GlobResult> = arg_glob(&spanned_string(pat), &dirs.test)
.expect("no error")
.collect();
assert_eq!(
exp_count,
p.iter().filter(|i| i.is_ok()).count(),
" case: {case} ",
);
// expected behavior -- that directories are included in results (if name matches pattern)
let t = p
.iter()
.any(|i| i.as_ref().unwrap().to_string_lossy().contains(".children"));
assert!(t, "check for dir, case {case}");
})
}
#[rstest]
#[case("yehuda.txt", true, 1, "matches literal path")]
#[case("*", false, 3, "matches glob")]
#[case(r#"bad[glob.foo"#, true, 1, "matches literal, would be bad glob pat")]
fn exact_vs_glob(
#[case] pat: &str,
#[case] exp_matches_input: bool,
#[case] exp_count: usize,
#[case] case: &str,
) {
Playground::setup("exact_vs_glob", |dirs, sandbox| {
sandbox.with_files(vec![
EmptyFile("yehuda.txt"),
EmptyFile("jttxt"),
EmptyFile("bad[glob.foo"),
]);
let res = arg_glob(&spanned_string(pat), &dirs.test)
.expect("no error")
.collect::<Vec<GlobResult>>();
eprintln!("res: {:?}", res);
if exp_matches_input {
assert_eq!(
exp_count,
res.len(),
" case {case}: matches input, but count not 1? "
);
assert_eq!(
&res[0].as_ref().unwrap().to_string_lossy(),
pat, // todo: is it OK for glob to return relative paths (not to current cwd, but to arg cwd of arg_glob)?
);
} else {
assert_eq!(exp_count, res.len(), " case: {}: matched glob", case);
}
})
}
#[rstest]
#[case(r#"realbad[glob.foo"#, true, 1, "error, bad glob")]
fn exact_vs_bad_glob(
// if path doesn't exist but pattern is not valid glob, should get error.
#[case] pat: &str,
#[case] _exp_matches_input: bool,
#[case] _exp_count: usize,
#[case] _tag: &str,
) {
Playground::setup("exact_vs_bad_glob", |dirs, sandbox| {
sandbox.with_files(vec![
EmptyFile("yehuda.txt"),
EmptyFile("jttxt"),
EmptyFile("bad[glob.foo"),
]);
let res = arg_glob(&spanned_string(pat), &dirs.test);
assert!(res
.expect_err("expected error")
.to_string()
.contains("Invalid glob pattern"));
})
}
}