A relatively straightforward change that adds "include" key support in
the main configuration file.
The config parser uses the list under this key to open the included
configuration files, parse them as configs, and append their jobs to the
main config file.
As per some of the discussion items, I modified the proposal to have a
special top-level includes key, and added references to the handling of
includes of specific files.
**Note: This is a WIP PR that presently only adds the documentation for
the feature, so that the proposal can be discussed.**
Background
==========
While trying to use zrepl in my Ansible-driven home lab deployment, I
ran into an interesting problem. My ZFS-based servers subscribe to
different, sometimes overlapping, roles.
Example role distribution between two servers:
```
serverA:
- common
- web
- file
serverB:
- common
- git
- file
```
Each role wants to create and manage a ZFS dataset with its own
replication / backup policies:
- web: pool/web
- git: pool/git
- file: pool/file
At present, the creation of a ZFS dataset from each role is fairly easy,
and so too is the creation of the basic zrepl configuration file from
the "common" role.
However, when each role tries to register its job(s) into the singular
zrepl configuration file, things get tricky.
I could try adding a role at the end that hardcodes the datasets that
need to be backed up, but that seems a bit hacky.
I could also use Ansible's `lineinfile` task to try to idempotently add
each dataset's snapshot jobs to the zrepl configuration file, but that
causes a problem:
Every time the "common" role gets run, the basic zrepl configuration
file gets re-created, causing the web, git, and file roles to all
register changes, because they have to re-insert all their jobs back
into the singular configuration file.
Proposed Solution
=================
The proposed solution is to allow for the distribution of zrepl job
definitions across multiple YAML files that can be included from the
main zrepl configuration file.
```
global: ...
jobs:
include: jobs.d
```
This directive would only be acceptable in the main configuration file
and is mutually exclusive with any other job definitions in that file.
To keep things lean, there will be no conflict resolution provided to
users; job names must be unique across all included job YAML files.
With this feature, the above problem becomes much simpler:
- common: Sets up the global zrepl configuration and the include
directive.
- web/git/file: Each manages its own datasets and creates its
jobs.d/web.yml, jobs.d/git.yml, or jobs.d/file.yml (see the sketch
below).
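To make the proposal concrete, here is a sketch of what one included
file could look like. The dataset name, the job body, and the presence
of a top-level `jobs:` key inside included files are assumptions for
illustration, not settled parts of the proposal:
```
# jobs.d/web.yml -- hypothetical file managed by the "web" role
jobs:
  - name: "web-snapshots"
    type: "snap"
    filesystems: { "pool/web<": true }
    snapshotting:
      type: "periodic"
      interval: 1h
      prefix: "zrepl_"
    pruning:
      keep:
        - type: "grid"
          grid: "24x1h | 14x1d"
          regex: "^zrepl_"
```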
While investigating https://github.com/zrepl/zrepl/issues/700
I checked in on `zrepl status` dependencies and found that
`cview`, which was/is a fork of tview, appears to be unmaintained.
We switched to it 4.5 years ago in a58ce74.
Checking now, `github.com/rivo/tview` seems to be somewhat maintained
again.
I also checked what k9s uses because that tool came to mind as a Go
terminal UI app.
It does use `tview`, but a fork that has diverged substantially.
Maybe in another 4.5 years the ecosystem will have consolidated...
refs https://github.com/zrepl/zrepl/issues/700
It pains me to do it, but, especially with hooks, the Protect
settings are too restrictive.
I wish there were a systemd API that allowed us to self-sandbox,
using these settings, _after_ parsing the config.
fixes https://github.com/zrepl/zrepl/issues/735
This PR adds a new optional field `timestamp_location` that allows the
user to specify a timezone other than the default UTC for use in the
snapshot suffix.
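For illustration, the snapshotting block could then look roughly like
this; the interval, layout string, and zone name below are arbitrary
examples, not defaults:
```
snapshotting:
  type: "periodic"
  interval: 15m
  prefix: "zrepl_"
  timestamp_format: "2006-01-02_15:04:05"  # custom Go reference-time layout
  timestamp_location: "Europe/Prague"      # IANA zone name; default remains UTC
```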
I took @mjasnik 's PR https://github.com/zrepl/zrepl/pull/785 and
refactored+extended it as follows:
* move all formatting logic into its own package
* disallow `dense` and `human` with a location other than UTC, to
protect users from stupidity
* document behavior more clearly
* regression test for existing users
Go upgrade:
- Go 1.23 is current => use that for release builds
- Go 1.22 is less than one year old, so it's desirable to support it.
- The [`Go Toolchains`](https://go.dev/doc/toolchain) stuff is available
  in both of these (it would also be in Go 1.21). That is quite nice
  stuff, but it required some changes to how we pin the Go versions we
  use in CircleCI and in the `release-docker` Makefile target (see the
  sketch below).
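Purely as a hypothetical sketch (not our actual CI config): with the
toolchain mechanism, CI only needs some baseline Go >= 1.21 image, and
`go` resolves the exact toolchain pinned in `go.mod` on its own:
```
# hypothetical CircleCI snippet, for illustration only
version: 2.1
jobs:
  build:
    docker:
      - image: cimg/go:1.22   # baseline Go; the go.mod toolchain directive can pull a newer one
    steps:
      - checkout
      - run: go test ./...
```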
Protobuf upgrade:
- Go to protobuf GH release website
- Download latest locally
- run `sha256sum`
- replace existing pinned hashes
- `make generate`
Deps upgrade:
- `go get -t -u all`
- repository moves aren't handled well automatically, fix manually
- repeat until no changes
fixes https://github.com/zrepl/zrepl/issues/742
Before this PR, when chaining replication from
A => B => C, if B had placeholders and the `filesystems`
included these placeholders, we'd incorrectly
fail the planning phase with error
`sender does not have any versions`.
The non-placeholder child filesystems of these placeholders
would then fail to replicate because of the
initial-replication-dependency-tracking that we do, i.e.,
their parent failed to replicate initially, hence
they fail to replicate as well
(`parent(s) failed during initial replication`).
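For illustration, a sketch of B's side of such a chain; the dataset
names, addresses, and trimmed-down job bodies are made up for this
example and not taken from the issue:
```
# B receives from A into a sink; zrepl creates placeholder parents
# (e.g. pool/sink/A) under root_fs to mirror A's hierarchy.
- name: "sink-from-A"
  type: "sink"
  root_fs: "pool/sink"
  serve:
    type: "tcp"
    listen: ":8888"
    clients: { "192.168.1.10": "A" }

# B forwards the received subtree to C; the filesystems filter matches
# the placeholder parents as well as the real datasets below them,
# which is where planning previously failed.
- name: "push-to-C"
  type: "push"
  connect:
    type: "tcp"
    address: "c.example.com:8888"
  filesystems: { "pool/sink<": true }
  snapshotting:
    type: "manual"
  pruning:
    keep_sender:
      - type: "not_replicated"
    keep_receiver:
      - type: "last_n"
        count: 10
```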
We can do better than that because we have the information
whether a sender-side filesystem is a placeholder.
This PR makes the planner act on that information.
The outcome is that placeholders are replicated as
placeholders (albeit the receiver remains in control
of how these placeholders are created, i.e., `recv.placeholders`).
The mechanism to do it is:
1. Don't plan any replication steps for filesystems that
are placeholders on the sender.
2. Ensure that, if a receiving-side filesystem exists, it
is indeed a placeholder.
Check (2) may seem overly restrictive, but the goal here
is not just to mirror all non-placeholder filesystems, but
also to mirror the hierarchy.
Testing performed:
- [x] confirm with issue reporter that this PR fixes their issue
- [x] add a regression test that fails without the changes in this PR
From https://github.com/zrepl/zrepl/issues/691
The last_n prune rule keeps everything, regardless of whether it matches
the regex, if there are fewer than `count` snapshots. The expectation
would be to never keep snapshots that don't match the regex, regardless
of their number.
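For reference, a keep rule of the shape in question (values
illustrative); the expectation is that snapshots not matching `regex`
are never kept by this rule, even while fewer than `count` matching
snapshots exist:
```
pruning:
  keep:
    - type: "last_n"
      count: 10
      regex: "^zrepl_"
```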
Because some jobs add the client identity to root_fs and other jobs
don't, we can't reliably detect overlapping filesystems. At the same
time, we need the ability to use equal or overlapping root_fs values for
different jobs. For instance, see this config:
```
- name: "zdisk"
  type: "sink"
  root_fs: "zdisk/zrepl"
  serve:
    type: "local"
    listener_name: "zdisk"
```
and
```
- name: "remote-to-zdisk"
  type: "pull"
  connect:
    type: "tls"
  root_fs: "zdisk/zrepl/remote"
```
As you can see, the two jobs have overlapping root_fs values, but the
actual datasets do not overlap, because job `zdisk` saves everything
under `zdisk/zrepl/localhost` (it adds the client identity). So they
actually use two different filesystems: `zdisk/zrepl/localhost` and
`zdisk/zrepl/remote`. We can't detect this situation during the config
check, so let's just remove the check; it's the admin's duty to
configure correct root_fs's.
---------
Co-authored-by: Christian Schwarz <me@cschwarz.com>