This significantly speeds up the startup time of bat, since only a single
linked SyntaxDefinition is loaded for each file. The size increase of the
binary is just ~400 kB.
In order for startup time to be improved, the --language arg must be used, and
it must match one of the following names:
"Plain Text", "ActionScript", "AppleScript", "Batch File", "NAnt Build File",
"C#", "C", "CSS", "D", "Diff", "Erlang", "Go", "Haskell", "JSON", "Java
Properties", "BibTeX", "LaTeX Log", "TeX", "Lisp", "Lua", "MATLAB", "Pascal",
"R", "Regular Expression", "Rust", "SQL", "Scala", "Tcl", "XML", "YAML", "Apache
Conf", "ARM Assembly", "Assembly (x86_64)", "CMakeCache", "Comma Separated
Values", "Cabal", "CoffeeScript", "CpuInfo", "Dart Analysis Output", "Dart",
"Dockerfile", "DotENV", "F#", "Friendly Interactive Shell (fish)", "Fortran
(Fixed Form)", "Fortran (Modern)", "Fortran Namelist", "fstab", "GLSL",
"GraphQL", "Groff/troff", "group", "hosts", "INI", "Jinja2", "jsonnet",
"Kotlin", "Less", "LLVM", "Lean", "MemInfo", "Nim", "Ninja", "Nix", "passwd",
"PowerShell", "Protocol Buffer (TEXT)", "Puppet", "Rego", "resolv", "Robot
Framework", "SML", "Strace", "Stylus", "Solidity", "Vyper", "Swift",
"SystemVerilog", "TOML", "Terraform", "TypeScript", "TypeScriptReact",
"Verilog", "VimL", "Zig", "gnuplot", "log", "requirements.txt", "Highlight
non-printables", "Private Key", "varlink"
Later commits will improve startup time for more code paths.
* fix some typos and misspellings
* CHANGELOG.md: Add Performance section (preliminary)
* Add a CHANGELOG.md entry for this PR
This will be needed to later support zero-copy deserialization of independent
syntax sets, but is interesting and useful on its own.
Instead of deferring serialization and deserialization to syntect, we implement it
ourselves in the same way, but make compression optional.
We can't use #[from] on Error::Msg(String) because String does not implement Error.
(Which it shouldn't; see e.g. https://internals.rust-lang.org/t/impl-error-for-string/8881.)
So we implement From manually for Error::Msg, since our current code was written
in that way for error-chain.
Move code to build assets to its own file. That results in better modularity and flexibility.
It also allows us to simplify HighlightingAssets a lot, since it will now always
be initialized with a SerializedSyntaxSet.
To improve startup performance, we will later load smaller `SyntaxSet`s instead
of one giant one. However, the current API assumes only one `SyntaxSet` is ever used,
and that that implicitly is the `SyntaxSet` from which returned `SyntaxReference`s
comes.
This change changes the API to reflect that `SyntaxSet` and `SyntaxReference`
are tightly coupled, and enables the use of several `SyntaxSet`.
Instead of 100 ms - 50 ms, startup takes 10 ms - 5 ms.
HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat
output to a file (see Config::loop_through), so by loading the SyntaxSet only
when needed, we radically improve startup time when it is not needed.
They are just a way to get access to data embedded in the binary, so they don't
conceptually belong inside HighlightingAssets.
This has the nice side effect of getting HighlightingAssets::from_cache() and
::from_binary(), that are highly related, next to each other.
Or rather, introduce new versions of these methods and deprecate the old ones.
This is preparation to enable robust and user-friendly support for lazy-loading.
With lazy-loading, we don't know if the SyntaxSet is valid until after we try to
use it, so wherever we try to use it, we need to return a Result. See discussion
about panics in #1747.
Using BufReader makes sense for large files, but assets are never large enough
to require buffering. It is significantly faster to load the file contents in
one go, so let's do that instead.
Closes#1753
It already now reduces code duplication slightly, but will become even more
useful in the future when we add more complicated logic such as lazy-loading.
Since we only modify `pub(crate)` items, the stable bat-as-a-library API is not
affected.
This takes us one step closer to making SyntaxSet lazy-loaded, which in turn
takes us one step closer to solving #951.
This will fix#614 by making it clear what is wrong by showing the
following error message:
Failed to load one or more themes from
'/Users/me/.config/bat/themes' (reason: 'Invalid syntax theme
settings')
We also need to add a check if theme_dir.exists(), otherwise an absent
dir will seem like an error:
Failed to load one or more themes from
'/Users/me/.config/bat/themes' (reason: 'IO error for
operation on /Users/me/.config/bat/themes: No such file or
directory (os error 2)')
(This is the same check we already have for syntax_dir.)
This macro is intended to be package-internal and is not to be
considered part of the public lib API.
Use it in three places to reduce code duplication. However, main reason
for this refactoring is to allow us to fix#1063 without duplicating the
code yet another time.
The macro can also be used for the "Binary content from {} will not be
printed to the terminal" message if that message starts to use eprintln!
instead (if ever).
To trigger/verify the changed code, the following commands can be used:
cargo run -- --theme=ansi-light tests/examples/single-line.txt
cargo run -- --theme=does-not-exist tests/examples/single-line.txt
cargo run -- --style=grid,rule tests/examples/single-line.txt
This combines ansi-light and ansi-dark into a single theme that works
with both light and dark backgrounds. Instead of specifying white/black,
the ansi theme uses the terminal's default foreground/background color
by setting alpha=01, i.e. #00000001. This is in addition to the alpha=00
encoding where red contains an ANSI color palette number.
Now, `--theme ansi-light` and `--theme ansi-dark` will print a
deprecation notice and use ansi instead (unless the user has a custom
theme named ansi-light or ansi-dark, which would take precedence).