This PR adds a new field optional field `timestamp_location` that allows
the user to specify a timezone different than the default UTC for use in
the snapshot suffix.
I took @mjasnik 's PR https://github.com/zrepl/zrepl/pull/785 and
refactored+extended it as follows:
* move all formatting logic into its own package
* disallow `dense` and `human` with formats != UTC to protect users from
stupidity
* document behavior more clearly
* regression test for existing users
Go upgrade:
- Go 1.23 is current => use that for release builds
- Go 1.22 is less than one year old, it's desirable to support it.
- The [`Go Toolchains`](https://go.dev/doc/toolchain) stuff is available
in both of these (would also be in Go 1.21). That is quite nice stuff,
but required some changes to how we versions we use in CircleCI and
the `release-docker` Makefile target.
Protobuf upgrade:
- Go to protobuf GH release website
- Download latest locally
- run `sha256sum`
- replace existing pinned hashes
- `make generate`
Deps upgrade:
- `go get -t -u all`
- repository moves aren't handled well automatically, fix manually
- repeat until no changes
fixes https://github.com/zrepl/zrepl/issues/742
Before this PR, when chaining replication from
A => B => C, if B had placeholders and the `filesystems`
included these placeholders, we'd incorrectly
fail the planning phase with error
`sender does not have any versions`.
The non-placeholder child filesystems of these placeholders
would then fail to replicate because of the
initial-replication-dependency-tracking that we do, i.e.,
their parent failed to initially replication, hence
they fail to replicate as well
(`parent(s) failed during initial replication`).
We can do better than that because we have the information
whether a sender-side filesystem is a placeholder.
This PR makes the planner act on that information.
The outcome is that placeholders are replicated as
placeholders (albeit the receiver remains in control
of how these placeholders are created, i.e., `recv.placeholders`)
The mechanism to do it is:
1. Don't plan any replication steps for filesystems that
are placeholders on the sender.
2. Ensure that, if a receiving-side filesystem exists, it
is indeed a placeholder.
Check (2) may seem overly restrictive, but, the goal here
is not just to mirror all non-placeholder filesystems, but
also to mirror the hierarchy.
Testing performed:
- [x] confirm with issue reporter that this PR fixes their issue
- [x] add a regression test that fails without the changes in this PR
From https://github.com/zrepl/zrepl/issues/691
The last_n prune rule keeps everything, regardless of if it matches the
regex or not, if there are less than count snapshot. The expectation
would be to never keep non-regex snapshots, regardless of number.
Because some jobs add client identity to root_fs and other jobs don't do
that,
we can't reliable detect overlapping of filesystems. And and the same
time we
need an ability to use equal or overlapped root_fs for different jobs.
For
instance see this config:
```
- name: "zdisk"
type: "sink"
root_fs: "zdisk/zrepl"
serve:
type: "local"
listener_name: "zdisk"
```
and
```
- name: "remote-to-zdisk"
type: "pull"
connect:
type: "tls"
root_fs: "zdisk/zrepl/remote"
```
As you can see, two jobs have overlapped root_fs, but actually datasets
are not
overlapped, because job `zdisk` save everything under
`zdisk/zrepl/localhost`,
because it adds client identity. So they actually use two different
filesystems:
`zdisk/zrepl/localhost` and `zdisk/zrepl/remote`. And we can't detect
this
situation during config check. So let's just remove this check, because
it's
admin's duty to configure correct root_fs's.
---------
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Command used:
```
cd build
go get -u github.com/golangci/golangci-lint/cmd/golangci-lint
go mod tidy
```
Further, golangci-lint requires go 1.20 to build, so, use that as the lowest version in the CI.
The sphinxcontrib-versioning seems unmaintainted and I can't
get the fork that we used before this PR working on Python 3.10.
The situation wrt maintenance doesn't seem much better for
sphinx-multiversion, but, at least I could get it to work
with current sphinx versions.
The main problem with sphinx-multiversion is that it doesn't render
anything at `/`. I.e., `https://zrepl.github.io/configuration.html` will
be 404.
That's different from `sphinxcontrib-versioning`, and thus switching
to sphinx-multiversion would break URLs.
We host on GitHub pages and don't control the webserver,
so, we can't use webserver-level redirects to keep the URLs working.
We could create JS-level redirects, or `http-equiv`, but that's ugly as
well.
The simplest solution was to fork sphinx-multiversion and hard-code
zrepl's specific needs into that fork.
The fork is based off v0.2.4 and pinned via requirements.txt.
Here are its unique commits:
https://github.com/Holzhaus/sphinx-multiversion/compare/master...zrepl:sphinx-multiversion:zrepl
We should revisit `sphinx-polyversion` in the future once its docs
improve.
See
https://github.com/Holzhaus/sphinx-multiversion/issues/88#issuecomment-1606221194
This PR updates the various Python packages, as I couldn't get
sphinx-multiversion to work with the (very old) versions that were
pinned in `requirements.txt` prior to this PR.
This PR's `requirements.txt` is from a clean Python 3.10 venv on Ubuntu
22.10 after running
```
pip install sphinx sphinx-rtd-theme
pip install 'git+https://github.com/zrepl/sphinx-multiversion/@52c915d7ad898d9641ec48c8bbccb7d4f079db93#egg=sphinx_multiversion'
```
Before this PR, we would panic in the `check` phase of `endpoint.Send()`'s `TryBatchDestroy` call in the following cases: the current protection strategy does NOT produce a tentative replication cursor AND
* `FromVersion` is a tentative cursor bookmark
* `FromVersion` is a snapshot, and there exists a tentative cursor bookmark for that snapshot
* `FromVersion` is a bookmark != tentative cursor bookmark, but there exists a tentative cursor bookmark for the same snapshot as the `FromVersion` bookmark
In those cases, the `check` concluded that we would delete `FromVersion`.
It came to that conclusion because the tentative cursor isn't part of `obsoleteAbs` if the protection strategy doesn't produce a tentative replication cursor.
The scenarios above can happen if the user changes the protection strategy from "with tentative cursor" to one "without tentative replication cursor", while there is a tentative replication cursor on disk.
The workaround was to rename the tentative cursor.
In all cases above, `TryBatchDestroy` would have destroyed the tentative cursor.
In case 1, that would fail the `Send` step and potentially break replication if the cursor is the last common bookmark. The `check` conclusion was correct.
In cases 2 and 3, deleting the tentative cursor would have been fine because `FromVersion` was a different entity than the tentative cursor. So, destroying the tentative cursor would be the right call.
The solution in this PR is as follows:
* add the `FromVersion` to the `liveAbs` set of live abstractions
* rewrite the `check` closure to use the full dataset path (`fullpath`) to identify the concrete ZFS object instead of the `zfs.FilesystemVersionEqualIdentity`, which is only identified by matching GUID.
* Holds have no dataset path and are not the `FromVersion` in any case, so disregard them.
fixes#666
This PR adds a Prometheus counter called
`zrepl_zfs_list_unmatched_user_specified_dataset_count`.
Monitor for increases of the counter to detect filesystem filter rules that
have no effect because they don't match any local filesystem.
An example use case for this is the following story:
1. Someone sets up zrepl with `filesystems` filter for `zroot/pg14<`.
2. During the upgrade to Postgres 15, they rename the dataset to `zroot/pg15`,
but forget to update the zrepl `filesystems` filter.
3. zrepl will not snapshot / replicate the `zroot/pg15<` datasets.
Since `filesystems` rules are always evaluated on the side that has the datasets,
we can smuggle this functionality into the `zfs` module's `ZFSList` function that
is used by all jobs with a `filesystems` filter.
Dashboard changes:
- histogram with increase in $__interval, one row per job
- table with increase in $__range
- explainer text box, so, people know what the previous two are about
We had to re-arrange some panels, hence the Git diff isn't great.
closes https://github.com/zrepl/zrepl/pull/653
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Co-authored-by: Goran Mekić <meka@tilda.center>