The text that is printed is generated when building assets, by analyzing LICENSE
and NOTICE files that comes with syntaxes and themes.
We take this opportunity to also add a NOTICE file as defined by Apache License 2.0.
We started to use lazycell because syntect already used it. But syntect has
changed to use once_cell. So we should also do that to prepare for using the
upcoming version of syntect.
I had to use a `lazy_static` due to that the clap API that only accepts a
reference to a version string. And, in our code, only a 'static reference to a
version string.
Code could probably be refactored to accept a "normal" reference, but that would
be a major undertaking.
By forwarding the task to find the `Plain Text` syntax to `assets`. Not only does
the code become simpler; we also get rid of a call to `self.get_syntax_set()`
which is beneficial to the long term goal of replacing `syntaxes.bin` with
`minimal_syntaxes.bin`.
Note that the use of `.expect()` is not a regression in error handling. It was
previously hidden in `.find_syntax_plain_text()`.
This information is useful when you want to build several SyntaxSets, but
without having to duplicate SyntaxDefinitions. For example:
"Rust" has no dependencies. But "Markdown" depends on "Rust". With the data
structures this code adds, we know that "Rust" is a dependent syntax for
"Markdown", and can construct a SyntaxSet that takes that into account.
Note that code has a temporary environment flag to ignore any information about
dependents when constructing SyntaxSets. Code that makes use of the new data
structure will be added later.
This significantly speeds up the startup time of bat, since only a single
linked SyntaxDefinition is loaded for each file. The size increase of the
binary is just ~400 kB.
In order for startup time to be improved, the --language arg must be used, and
it must match one of the following names:
"Plain Text", "ActionScript", "AppleScript", "Batch File", "NAnt Build File",
"C#", "C", "CSS", "D", "Diff", "Erlang", "Go", "Haskell", "JSON", "Java
Properties", "BibTeX", "LaTeX Log", "TeX", "Lisp", "Lua", "MATLAB", "Pascal",
"R", "Regular Expression", "Rust", "SQL", "Scala", "Tcl", "XML", "YAML", "Apache
Conf", "ARM Assembly", "Assembly (x86_64)", "CMakeCache", "Comma Separated
Values", "Cabal", "CoffeeScript", "CpuInfo", "Dart Analysis Output", "Dart",
"Dockerfile", "DotENV", "F#", "Friendly Interactive Shell (fish)", "Fortran
(Fixed Form)", "Fortran (Modern)", "Fortran Namelist", "fstab", "GLSL",
"GraphQL", "Groff/troff", "group", "hosts", "INI", "Jinja2", "jsonnet",
"Kotlin", "Less", "LLVM", "Lean", "MemInfo", "Nim", "Ninja", "Nix", "passwd",
"PowerShell", "Protocol Buffer (TEXT)", "Puppet", "Rego", "resolv", "Robot
Framework", "SML", "Strace", "Stylus", "Solidity", "Vyper", "Swift",
"SystemVerilog", "TOML", "Terraform", "TypeScript", "TypeScriptReact",
"Verilog", "VimL", "Zig", "gnuplot", "log", "requirements.txt", "Highlight
non-printables", "Private Key", "varlink"
Later commits will improve startup time for more code paths.
* fix some typos and misspellings
* CHANGELOG.md: Add Performance section (preliminary)
* Add a CHANGELOG.md entry for this PR
This will be needed to later support zero-copy deserialization of independent
syntax sets, but is interesting and useful on its own.
Instead of deferring serialization and deserialization to syntect, we implement it
ourselves in the same way, but make compression optional.
We can't use #[from] on Error::Msg(String) because String does not implement Error.
(Which it shouldn't; see e.g. https://internals.rust-lang.org/t/impl-error-for-string/8881.)
So we implement From manually for Error::Msg, since our current code was written
in that way for error-chain.
Move code to build assets to its own file. That results in better modularity and flexibility.
It also allows us to simplify HighlightingAssets a lot, since it will now always
be initialized with a SerializedSyntaxSet.
To improve startup performance, we will later load smaller `SyntaxSet`s instead
of one giant one. However, the current API assumes only one `SyntaxSet` is ever used,
and that that implicitly is the `SyntaxSet` from which returned `SyntaxReference`s
comes.
This change changes the API to reflect that `SyntaxSet` and `SyntaxReference`
are tightly coupled, and enables the use of several `SyntaxSet`.
Instead of 100 ms - 50 ms, startup takes 10 ms - 5 ms.
HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat
output to a file (see Config::loop_through), so by loading the SyntaxSet only
when needed, we radically improve startup time when it is not needed.
They are just a way to get access to data embedded in the binary, so they don't
conceptually belong inside HighlightingAssets.
This has the nice side effect of getting HighlightingAssets::from_cache() and
::from_binary(), that are highly related, next to each other.
Or rather, introduce new versions of these methods and deprecate the old ones.
This is preparation to enable robust and user-friendly support for lazy-loading.
With lazy-loading, we don't know if the SyntaxSet is valid until after we try to
use it, so wherever we try to use it, we need to return a Result. See discussion
about panics in #1747.
Using BufReader makes sense for large files, but assets are never large enough
to require buffering. It is significantly faster to load the file contents in
one go, so let's do that instead.
Closes#1753
It already now reduces code duplication slightly, but will become even more
useful in the future when we add more complicated logic such as lazy-loading.
Since we only modify `pub(crate)` items, the stable bat-as-a-library API is not
affected.
This takes us one step closer to making SyntaxSet lazy-loaded, which in turn
takes us one step closer to solving #951.
This fixes a bug on Windows where `Command::new` would also run
executables from the current working directory, possibly resulting in
accidental runs of programs called `less`.
Otherwise Rust 1.53.0 gets confused during `cargo doc` because it thinks
we want an actual URL:
warning: this URL is not a hyperlink
--> src/pretty_printer.rs:331:40
|
331 | /// The title for the input (e.g. "http://example.com/example.txt")
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<http://example.com/example.txt>`
|
= note: `#[warn(rustdoc::bare_urls)]` on by default
= note: bare URLs are not automatically turned into clickable links
It was perhaps also a bit confusing to give an URL as an example in the
first place, because according to our own API example
`examples/inputs.rs` it is meant to be more a free-text thing.