mirror of
https://github.com/nushell/nushell.git
synced 2025-05-29 06:17:54 +02:00
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR allows `from xml` to parse XML documents with [document type declarations](https://en.wikipedia.org/wiki/Document_type_declaration) by default. This is especially notable since many HTML documents start with `<!DOCTYPE html>`, and `roxmltree` should be able to parse some simple HTML documents. The security concerns with DTDs are [XXE attacks](https://en.wikipedia.org/wiki/XML_external_entity_attack), and [exponential entity expansion attacks](https://en.wikipedia.org/wiki/Billion_laughs_attack). `roxmltree` [doesn't support](d2c7801624/src/tokenizer.rs (L535-L547)
) external entities (it parses them, but doesn't do anything with them), so it is not vulnerable to XXE attacks. Additionally, `roxmltree` has [some safeguards](d2c7801624/src/parse.rs (L424-L452)
) in place to prevent exponential entity expansion, so enabling DTDs by default is relatively safe. The worst case is no worse than running `loop {}`, so I think allowing DTDs by default is best, and DTDs can still be disabled with `--disallow-dtd` if needed. # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> * Allows `from xml` to parse XML documents with [document type declarations](https://en.wikipedia.org/wiki/Document_type_declaration) by default, and adds a `--disallow-dtd` flag to disallow parsing documents with DTDs. This PR also improves the errors in `from xml` by pointing at the issue in the XML source. Example: ``` $ open --raw foo.xml | from xml Error: × Failed to parse XML ╭─[2:7] 1 │ <html> 2 │ <p<>hi</p> · ▲ · ╰── Unexpected character <, expected a whitespace 3 │ </html> ╰──── ``` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> N/A # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> N/A
nu-protocol
The nu-protocol crate holds the definitions of structs/traits that are used throughout Nushell. This gives us one way to expose them to many other crates, as well as make these definitions available to each other, without causing mutually recursive dependencies.