nushell/crates/nu-command
132ikl 430b2746b8
Parse XML documents with DTDs by default, and add --disallow-dtd flag (#15272)
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->

# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->
This PR allows `from xml` to parse XML documents with [document type
declarations](https://en.wikipedia.org/wiki/Document_type_declaration)
by default. This is especially notable since many HTML documents start
with `<!DOCTYPE html>`, and `roxmltree` should be able to parse some
simple HTML documents. The security concerns with DTDs are [XXE
attacks](https://en.wikipedia.org/wiki/XML_external_entity_attack), and
[exponential entity expansion
attacks](https://en.wikipedia.org/wiki/Billion_laughs_attack).
`roxmltree` [doesn't
support](d2c7801624/src/tokenizer.rs (L535-L547))
external entities (it parses them, but doesn't do anything with them),
so it is not vulnerable to XXE attacks. Additionally, `roxmltree` has
[some
safeguards](d2c7801624/src/parse.rs (L424-L452))
in place to prevent exponential entity expansion, so enabling DTDs by
default is relatively safe. The worst case is no worse than running
`loop {}`, so I think allowing DTDs by default is best, and DTDs can
still be disabled with `--disallow-dtd` if needed.

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->
* Allows `from xml` to parse XML documents with [document type
declarations](https://en.wikipedia.org/wiki/Document_type_declaration)
by default, and adds a `--disallow-dtd` flag to disallow parsing
documents with DTDs.

This PR also improves the errors in `from xml` by pointing at the issue
in the XML source. Example:

```
$ open --raw foo.xml | from xml 
Error:   × Failed to parse XML
   ╭─[2:7]
 1 │ <html>
 2 │     <p<>hi</p>
   ·       ▲
   ·       ╰── Unexpected character <, expected a whitespace
 3 │ </html>
   ╰────
```

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
N/A

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
N/A
2025-03-12 08:09:55 -05:00
..
src Parse XML documents with DTDs by default, and add --disallow-dtd flag (#15272) 2025-03-12 08:09:55 -05:00
tests bugfix: math commands now return error with infinite range [#15135] (#15236) 2025-03-11 14:40:26 +01:00
Cargo.toml feat(random uuid): add support for uuid versions other than 4. (#15239) 2025-03-06 14:21:52 +01:00
LICENSE Fix rest of license year ranges (#8727) 2023-04-04 09:03:29 +12:00
README.md Add top-level crate documentation/READMEs (#12907) 2024-07-14 10:10:41 +02:00

This crate contains the majority of our commands

We allow ourselves to move some of the commands in nu-command to nu-cmd-* crates as needed.

Internal Nushell crate

This crate implements components of Nushell and is not designed to support plugin authors or other users directly.