zrepl

mirror of https://github.com/zrepl/zrepl.git synced 2024-11-25 18:04:58 +01:00

Author	SHA1	Message	Date
Christian Schwarz	bbdc6f5465	fix handling of tenative cursor presence if protection strategy doesn't use it (#714 ) Before this PR, we would panic in the `check` phase of `endpoint.Send()`'s `TryBatchDestroy` call in the following cases: the current protection strategy does NOT produce a tentative replication cursor AND * `FromVersion` is a tentative cursor bookmark * `FromVersion` is a snapshot, and there exists a tentative cursor bookmark for that snapshot * `FromVersion` is a bookmark != tentative cursor bookmark, but there exists a tentative cursor bookmark for the same snapshot as the `FromVersion` bookmark In those cases, the `check` concluded that we would delete `FromVersion`. It came to that conclusion because the tentative cursor isn't part of `obsoleteAbs` if the protection strategy doesn't produce a tentative replication cursor. The scenarios above can happen if the user changes the protection strategy from "with tentative cursor" to one "without tentative replication cursor", while there is a tentative replication cursor on disk. The workaround was to rename the tentative cursor. In all cases above, `TryBatchDestroy` would have destroyed the tentative cursor. In case 1, that would fail the `Send` step and potentially break replication if the cursor is the last common bookmark. The `check` conclusion was correct. In cases 2 and 3, deleting the tentative cursor would have been fine because `FromVersion` was a different entity than the tentative cursor. So, destroying the tentative cursor would be the right call. The solution in this PR is as follows: * add the `FromVersion` to the `liveAbs` set of live abstractions * rewrite the `check` closure to use the full dataset path (`fullpath`) to identify the concrete ZFS object instead of the `zfs.FilesystemVersionEqualIdentity`, which is only identified by matching GUID. * Holds have no dataset path and are not the `FromVersion` in any case, so disregard them. fixes #666	2023-07-04 20:21:48 +02:00
Christian Schwarz	2d8c3692ec	rework resume token validation to allow resuming from raw sends of unencrypted datasets Before this change, resuming from an unencrypted dataset with send.raw=true specified wouldn't work with zrepl due to overly restrictive resume token checking. An initial PR to fix this was made in https://github.com/zrepl/zrepl/pull/503 but it didn't address the core of the problem. The core of the problem was that zrepl assumed that if a resume token contained `rawok=true, compressok=true`, the resulting send would be encrypted. But if the sender dataset was unencrypted, such a resume would actually result in an unencrypted send. Which could be totally legitimate but zrepl failed to recognize that. BACKGROUND ========== The following snippets of OpenZFS code are insightful regarding how the various ${X}ok values in the resume token are handled: - `6c3c5fcfbe/module/zfs/dmu_send.c (L1947-L2012)` - `6c3c5fcfbe/module/zfs/dmu_recv.c (L877-L891)` - https://github.com/openzfs/zfs/blob/6c3c5fc/lib/libzfs/libzfs_sendrecv.c#L1663-L1672 Basically, some zfs send flags make the DMU send code set some DMU send stream featureflags, although it's not a pure mapping, i.e, which DMU send stream flags are used depends somewhat on the dataset (e.g., is it encrypted or not, or, does it use zstd or not). Then, the receiver looks at some (but not all) feature flags and maps them to ${X}ok dataset zap attributes. These are funnelled back to the sender 1:1 through the resume_token. And the sender turns them into lzc flags. As an example, let's look at zfs send --raw. if the sender requests a raw send on an unencrypted dataset, the send stream (and hence the resume token) will not have the raw stream featureflag set, and hence the resume token will not have the rawok field set. Instead, it will have compressok, embedok, and depending on whether large blocks are present in the dataset, largeblockok set. WHAT'S ZREPL'S ROLE IN THIS? ============================ zrepl provides a virtual encrypted sendflag that is like `raw`, but further ensures that we only send encrypted datasets. For any other resume token stuff, it shoudn't do any checking, because it's a futile effort to keep up with ZFS send/recv features that are orthogonal to encryption. CHANGES MADE IN THIS COMMIT =========================== - Rip out a bunch of needless checking that zrepl would do during planning. These checks were there to give better error messages, but actually, the error messages created by the endpoint.Sender.Send RPC upon send args validation failure are good enough. - Add platformtests to validate all combinations of (Unencrypted/Encrypted FS) x (send.encrypted = true \| false) x (send.raw = true \| false) for cases both non-resuming and resuming send. Additional manual testing done: 1. With zrepl 0.5, setup with unencrypted dataset, send.raw=true specified, no send.encrypted specified. 2. Observe that regular non-resuming send works, but resuming doesn't work. 3. Upgrade zrepl to this change. 4. Observe that both regular and resuming send works. closes https://github.com/zrepl/zrepl/pull/613	2022-09-25 17:32:02 +02:00
Christian Schwarz	2642c64303	make initial replication policy configurable (most_recent, all, fail) Config: ``` - type: push ... conflict_resolution: initial_replication: most_recent \| all \| fali ``` The ``initial_replication`` option determines which snapshots zrepl replicates if the filesystem has not been replicated before. If ``most_recent`` (the default), the initial replication will only transfer the most recent snapshot, while ignoring previous snapshots. If all snapshots should be replicated, specify ``all``. Use ``fail`` to make replication of the filesystem fail in case there is no corresponding fileystem on the receiver. Code-Level Changes, apart from the obvious: - Rework IncrementalPath()'s return signature. Now returns an error for initial replications as well. - Rename & rework it's consumer, resolveConflict(). Co-authored-by: Graham Christensen <graham@grahamc.com> Fixes https://github.com/zrepl/zrepl/issues/550 Fixes https://github.com/zrepl/zrepl/issues/187 Closes https://github.com/zrepl/zrepl/pull/592	2022-06-26 14:36:59 +02:00
Christian Schwarz	fb6a9be954	fix encrypt-on-receive with placeholders fixes https://github.com/zrepl/zrepl/issues/504 Problem: plain send + recv with root_fs encrypted + placeholders causes plain recvs whereas user would expect encrypt-on-recv Reason: We create placeholder filesytems with -o encryption=off. Thus, children received below those placeholders won't inherit encryption of root_fs. Fix: We'll have three values for `recv.placeholders.encryption: unspecified (default) \| off \| inherit`. When we create a placeholder, we will fail the operation if `recv.placeholders.encryption = unspecified`. The exception is if the placeholder filesystem is to encode the client identity ($root_fs/$client_identity) in a pull job. Those are created in `inherit` mode if the config field is `unspecified` so that users who don't need placeholders are not bothered by these details. Future Work: Automatically warn existing users of encrypt-on-recv about the problem if they are affected. The problem that I hit during implementation of this is that the `encryption` prop's `source` doesn't quite behave like other props: `source` is `default` for `encryption=off` and `-` when `encryption=on`. Hence, we can't use `source` to distinguish the following 2x2 cases: (1) placeholder created with explicit -o encryption=off (2) placeholder created without specifying -o encryption with (A) an encrypted parent at creation time (B) an unencrypted parent at creation time	2021-12-18 15:12:47 +01:00
Christian Schwarz	a8e92971d0	zfs: rewrite SendStream, fix bug in Close() on FreeBSD, add platformtests This commit was motivated by https://github.com/zrepl/zrepl/issues/495 where, on FreeBSD with OpenZFS 2.0, a SendStream.Close() call might wait indefinitely for `zfs send` to exit. The reason is that, due to the refactoring done for redacted send & recv (`30af21b025`), the `dump_bytes` function, which writes to the pipe, executes in a separate thread (synctask taskq) iff not `HAVE_LARGE_STACKS`. The `zfs send` process/thread waits for that taskq thread using an uninterruptible primitive. So when we SIGKILL `zfs send`, that signal doesn't reach the right thread to interrupt the pipe write. Theoretically this affects both Linux and FreeBSD, but most Linux users `HAVE_LARGE_STACKS` and since https://github.com/penzfs/zfs/pull/12350/files OpenZFS on FreeBSD `HAVE_LARGE_STACKS` as well. However, at least until FreeBSD 13.1, possibly for the entire 13 lifecycle, we're going to have to live with that oddity. Measures taken in this commit: - Report the behavior as an upstream bug https://github.com/openzfs/zfs/issues/12500 - Change SendStream code so that it closes zrepl's read-end of the pipe (see comment in code) - Clean up and make explicit SendStream's state handling - Write extensive platformtests for SendStream - They pass on my Linux install and on FreeBSD 12 - FreeBSD 13 still needs testing. fixes https://github.com/zrepl/zrepl/issues/495	2021-09-19 20:11:31 +02:00
InsanePrawn	393fc10a69	[#285 ] support setting zfs send / recv flags in the config (send: -wLcepbS, recv: -ox) Co-authored-by: Christian Schwarz <me@cschwarz.com> Signed-off-by: InsanePrawn <insane.prawny@gmail.com> closes #285 closes #276 closes #24	2021-02-20 17:20:45 +01:00
Christian Schwarz	0c189265e8	platformtests: fix skipping encryption-only tests on systems that don't support encryption (Or split the test into two tests of which one is skipped depending on encryption support)	2020-09-05 16:04:34 +02:00
Christian Schwarz	02db5994fe	[#345 ] fix broken identification of parent-fs for initial replication ordering fixup of `02807279` fixes #345	2020-07-26 20:32:35 +02:00
Christian Schwarz	a0e3dc7040	[#348 ] replication: add platformtest to check behavior on recv fail while still sending Regression test for #348	2020-07-26 20:32:35 +02:00
Christian Schwarz	30cdc1430e	replication + endpoint: replication guarantees: guarantee_{resumability,incremental,nothing} This commit - adds a configuration in which no step holds, replication cursors, etc. are created - removes the send.step_holds.disable_incremental setting - creates a new config option `replication` for active-side jobs - adds the replication.protection.{initial,incremental} settings, each of which can have values - `guarantee_resumability` - `guarantee_incremental` - `guarantee_nothing` (refer to docs/configuration/replication.rst for semantics) The `replication` config from an active side is sent to both endpoint.Sender and endpoint.Receiver for each replication step. Sender and Receiver then act accordingly. For `guarantee_incremental`, we add the new `tentative-replication-cursor` abstraction. The necessity for that abstraction is outlined in https://github.com/zrepl/zrepl/issues/340. fixes https://github.com/zrepl/zrepl/issues/340	2020-07-26 20:32:35 +02:00
Christian Schwarz	1c270b7e39	add option to disable step holds for incremental sends This is a stop-gap solution until we re-write the pruner to support rules for removing step holds. Note that disabling step holds for incremental sends does not affect zrepl's guarantee that incremental replication is always possible: Suppose you yank the external drive during an incremental @from -> @to step: * restarting that step or future incrementals @from -> @to_later` will be possible because the replication cursor bookmark points to @from until the step is complete * resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to. * in that case, the replication algorithm should determine that the resumable state on the receiving side isuseless because @to no longer exists on the sending side, and consequently clear it, and restart an incremental step @from -> @to_later refs #288	2020-06-14 15:26:05 +02:00
Christian Schwarz	94a0fbf953	[#321 ] platformtest: add test for zfs.ZFSHolds	2020-06-14 15:21:36 +02:00
Christian Schwarz	b056e7b2b9	[#321 ] endpoint: ListAbstractions: acutally emit one Abstraction per matching hold	2020-06-14 15:21:36 +02:00
Christian Schwarz	6e927f20f9	[#321 ] platformtest: minimal integration tests for package replication # Conflicts: # platformtest/tests/generated_cases.go	2020-06-14 15:21:36 +02:00
Christian Schwarz	301f163a44	[#321 ] platformtest: generate test case list + coverage tooling	2020-06-14 15:21:36 +02:00

15 Commits