new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold

- **Resumable Send & Recv Support**
  No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
  Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
  Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
  The counterpart to the replication cursor bookmark on the send-side.
  Ensures that incremental replication will always be possible between a sender and receiver.

Design Doc
----------

`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.

The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.

Protocol Version Bump
---------------------

This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.

Replication Cursor Format Change
--------------------------------

The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.
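
To illustrate the cursor naming scheme, here is a minimal Go sketch that builds and parses such a bookmark name. The helper names (`cursorBookmarkName`, `parseCursor`) and the GUID encoding (16 hex digits) are assumptions made for this example and are not taken from the commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cursorPrefix is the fixed part of the new-format cursor bookmark name.
const cursorPrefix = "zrepl_CURSOR_G_"

// cursorBookmarkName is a hypothetical helper that renders the cursor
// bookmark for a filesystem, snapshot GUID, and job ID.
// Assumption: the GUID is rendered as 16 hex digits.
func cursorBookmarkName(fs string, guid uint64, jobID string) string {
	return fmt.Sprintf("%s#%s%016x_J_%s", fs, cursorPrefix, guid, jobID)
}

// parseCursor is the hypothetical inverse of cursorBookmarkName.
func parseCursor(full string) (string, uint64, string, error) {
	fs, name, ok := strings.Cut(full, "#")
	if !ok {
		return "", 0, "", fmt.Errorf("not a bookmark: %q", full)
	}
	if !strings.HasPrefix(name, cursorPrefix) {
		return "", 0, "", fmt.Errorf("not a replication cursor: %q", full)
	}
	rest := strings.TrimPrefix(name, cursorPrefix)
	// The hex GUID contains no underscore, so the first "_J_" separates
	// the GUID from the job ID.
	guidStr, jobID, ok := strings.Cut(rest, "_J_")
	if !ok || jobID == "" {
		return "", 0, "", fmt.Errorf("missing job ID: %q", full)
	}
	guid, err := strconv.ParseUint(guidStr, 16, 64)
	if err != nil {
		return "", 0, "", fmt.Errorf("invalid GUID in %q: %w", full, err)
	}
	return fs, guid, jobID, nil
}

func main() {
	name := cursorBookmarkName("pool/data", 0xdeadbeef, "prod_push")
	fmt.Println(name)
	fmt.Println(parseCursor(name))
}
```

Because the job ID is part of the name, two jobs that send the same filesystem produce distinct bookmarks and cannot overwrite each other's cursor.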

Changes in This Commit
----------------------

- package zfs
    - infrastructure for holds
    - infrastructure for resume token decoding
    - implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
    - ZFSSendArgs to specify a ZFS send operation
        - validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
    - RecvOptions support for `recv -s` flag
    - convert a bunch of ZFS operations to be idempotent
        - achieved through more differentiated error message scraping / additional pre-/post-checks
- package replication/pdu
    - add field for encryption to send request messages
    - add fields for resume handling to send & recv request messages
    - receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
        - can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
        - used to set `last-received-hold`
- package replication/logic
    - introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
    - integrate encryption and resume token support into `Step` struct
- package endpoint
    - move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
        - step-holds + step-bookmarks
        - last-received-hold
        - new replication cursor + old replication cursor compat code
    - adjust `endpoint/endpoint.go` handlers for
        - encryption
        - resumability
        - new replication cursor
        - last-received-hold
- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
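
The resume-token check described for `ZFSSendArgs` can be sketched roughly as follows. This is an illustration only: the `ResumeToken` and `PlannedSend` types and the `validateResumeToken` function are hypothetical, decoding of the token is omitted, and using `0` to mean "full send" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// ResumeToken holds the subset of decoded resume-token fields that this
// sketch compares. Hypothetical type; decoding the token is omitted.
type ResumeToken struct {
	ToName   string // full snapshot name, e.g. "pool/data@b"
	FromGUID uint64 // 0 means "full send" in this sketch (assumption)
	ToGUID   uint64
}

// PlannedSend describes the send that the sender would perform if no
// resume token were available. Hypothetical type.
type PlannedSend struct {
	Filesystem string
	FromGUID   uint64 // 0 means "full send" in this sketch (assumption)
	ToGUID     uint64
}

// validateResumeToken rejects tokens that encode a different filesystem,
// fromguid, or toguid than the planned send.
func validateResumeToken(t ResumeToken, p PlannedSend) error {
	fs, _, ok := strings.Cut(t.ToName, "@")
	if !ok {
		return fmt.Errorf("token target %q is not a snapshot", t.ToName)
	}
	if fs != p.Filesystem {
		return fmt.Errorf("token filesystem %q != planned %q", fs, p.Filesystem)
	}
	if t.FromGUID != p.FromGUID {
		return fmt.Errorf("token fromguid %d != planned %d", t.FromGUID, p.FromGUID)
	}
	if t.ToGUID != p.ToGUID {
		return fmt.Errorf("token toguid %d != planned %d", t.ToGUID, p.ToGUID)
	}
	return nil
}

func main() {
	plan := PlannedSend{Filesystem: "pool/data", FromGUID: 1, ToGUID: 2}
	good := ResumeToken{ToName: "pool/data@b", FromGUID: 1, ToGUID: 2}
	bad := ResumeToken{ToName: "pool/secret@b", FromGUID: 1, ToGUID: 2}
	fmt.Println(validateResumeToken(good, plan))
	fmt.Println(validateResumeToken(bad, plan))
}
```

The point of comparing against the planned send is that a receiver-supplied token cannot make the sender stream data it would not have sent anyway.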
2025-08-15 09:32:25 +02:00 · 2019-09-11 17:19:17 +02:00
parent 9a4763ceee
commit 58c08c855f
72 changed files with 5445 additions and 818 deletions
--- a/docs/configuration/jobs.rst
+++ b/docs/configuration/jobs.rst
@@ -24,6 +24,8 @@ Job Type ``push``
      - |connect-transport|
    * - ``filesystems``
      - |filter-spec| for filesystems to be snapshotted and pushed to the sink
+    * - ``send``
+      - |send-options| 
    * - ``snapshotting``
      - |snapshotting-spec|
    * - ``pruning``
@@ -101,6 +103,8 @@ Job Type ``source``
      - |serve-transport|
    * - ``filesystems``
      - |filter-spec| for filesystems to be snapshotted and exposed to connecting clients
+    * - ``send``
+      - |send-options| 
    * - ``snapshotting``
      - |snapshotting-spec|

--- a/docs/configuration/overview.rst
+++ b/docs/configuration/overview.rst
@@ -90,6 +90,8 @@ It uses the client identity for access control:
 .. TIP::
   The implementation of the ``sink`` job requires that the connecting client identities be valid ZFS filesystem name components.

+.. _overview-how-replication-works:
+
 How Replication Works
 ---------------------

@@ -107,9 +109,8 @@ The following steps take place during replication and can be monitored using the
    * Per filesystem, compute a diff between sender and receiver snapshots
    * Build a list of replication steps

-      * If possible, use incremental sends (``zfs send -i``)
+      * If possible, use incremental and resumable sends (``zfs send -i``)
      * Otherwise, use full send of most recent snapshot on sender
-      * Give up on filesystems that cannot be replicated without data loss

  * Retry on errors that are likely temporary (e.g. network failures).
  * Give up on filesystems where a permanent error was received over RPC.
@@ -119,17 +120,23 @@ The following steps take place during replication and can be monitored using the
  * Perform replication steps in the following order:
    Among all filesystems with pending replication steps, pick the filesystem whose next replication step's snapshot is the oldest.
  * Create placeholder filesystems on the receiving side to mirror the dataset paths on the sender to ``root_fs/${client_identity}``.
-  * After a successful replication step, update the replication cursor bookmark (see below).
+  * Acquire send-side step-holds on the step's ``from`` and ``to`` snapshots.
+  * Perform the replication step.
+  * Move the **replication cursor** bookmark on the sending side (see below).
+  * Move the **last-received-hold** on the receiving side (see below).
+  * Release the send-side step-holds.
   
 The idea behind the execution order of replication steps is that if the sender snapshots all filesystems simultaneously at fixed intervals, the receiver will have all filesystems snapshotted at time ``T1`` before the first snapshot at ``T2 = T1 + $interval`` is replicated.

-.. _replication-cursor-bookmark:
+.. _replication-cursor-and-last-received-hold:

-The **replication cursor bookmark** ``#zrepl_replication_cursor`` is kept per filesystem on the sending side of a replication setup:
-It is a bookmark of the most recent successfully replicated snapshot to the receiving side.
-It is is used by the :ref:`not_replicated <prune-keep-not-replicated>` keep rule to identify all snapshots that have not yet been replicated to the receiving side.
-Regardless of whether that keep rule is used, the bookmark ensures that replication can always continue incrementally.
-Note that there is only one cursor bookmark per filesystem, which prohibits multiple jobs to replicate the same filesystem (:ref:`see below<jobs-multiple-jobs>`).
+**Replication cursor** bookmark and **last-received-hold** are managed by zrepl to ensure that future replications can always be done incrementally:
+the replication cursor is a send-side bookmark of the most recent successfully replicated snapshot,
+and the last-received-hold is a hold of that snapshot on the receiving side.
+The replication cursor has the format ``#zrepl_CURSOR_G_<GUID>_J_<JOBNAME>``.
+The last-received-hold tag has the format ``#zrepl_last_received_J_<JOBNAME>``.
+Encoding the job name in the names ensures that multiple sending jobs can replicate the same filesystem to different receivers without interference.
+The ``zrepl holds list`` subcommand lists all bookmarks and holds managed by zrepl.

 .. _replication-placeholder-property:

@@ -144,9 +151,13 @@ The ``zrepl test placeholder`` command can be used to check whether a filesystem
 .. ATTENTION::

    Currently, zrepl does not replicate filesystem properties.
-    Whe receiving a filesystem, it is never mounted (`-u` flag)  and `mountpoint=none` is set.
+    When receiving a filesystem, it is never mounted (`-u` flag)  and `mountpoint=none` is set.
    This is temporary and being worked on :issue:`24`.

+.. NOTE::
+
+    More details can be found in the design document :repomasterlink:`replication/design.md`.
+

 .. _jobs-multiple-jobs:

@@ -171,7 +182,7 @@ No Overlapping
 Jobs run independently of each other.
 If two jobs match the same filesystem with their ``filesystems`` filter, they will operate on that filesystem independently and potentially in parallel.
 For example, if job A prunes snapshots that job B is planning to replicate, the replication will fail because B assumed the snapshot to still be present.
-More subtle race conditions can occur with the :ref:`replication cursor bookmark <replication-cursor-bookmark>`, which currently only exists once per filesystem.
+However, the next replication attempt will re-examine the situation from scratch and should work.

 N push jobs to 1 sink
 ~~~~~~~~~~~~~~~~~~~~~
--- a/docs/configuration/prune.rst
+++ b/docs/configuration/prune.rst
@@ -66,7 +66,7 @@ Policy ``not_replicated``

 ``not_replicated`` keeps all snapshots that have not been replicated to the receiving side.
 It only makes sense to specify this rule on a sender (source or push job).
-The state required to evaluate this rule is stored in the :ref:`replication cursor bookmark <replication-cursor-bookmark>` on the sending side.
+The state required to evaluate this rule is stored in the :ref:`replication cursor bookmark <replication-cursor-and-last-received-hold>` on the sending side.

 .. _prune-keep-retention-grid:

--- a/docs/configuration/sendrecvoptions.rst
+++ b/docs/configuration/sendrecvoptions.rst
@@ -0,0 +1,45 @@
+.. include:: ../global.rst.inc
+
+
+Send & Recv Options
+===================
+
+.. _job-send-options:
+
+Send Options
+~~~~~~~~~~~~
+
+::
+   
+   jobs:
+   - type: push
+     filesystems: ...
+     send:
+       encrypted: true
+     ...
+
+:ref:`Source<job-source>` and :ref:`push<job-push>` jobs have an optional ``send`` configuration section.
+
+``encrypted`` option
+--------------------
+
+The ``encrypted`` variable controls whether the matched filesystems are sent as `OpenZFS native encryption <http://open-zfs.org/wiki/ZFS-Native_Encryption>`_ raw sends.
+More specifically, if ``encrypted=true``, zrepl
+
+* checks, for each filesystem matched by ``filesystems``, whether the ZFS ``encryption`` property indicates that the filesystem is actually encrypted with ZFS native encryption,
+* invokes the ``zfs send`` subcommand with the ``-w`` option (raw sends), and
+* expects the receiving side to support OpenZFS native encryption (recv will fail otherwise).
+
+Filesystems matched by ``filesystems`` that are not encrypted are not sent and will cause error log messages.
+
+If ``encrypted=false``, zrepl expects that filesystems matching ``filesystems`` are either not encrypted or have their encryption keys loaded.
+
+.. _job-recv-options:
+
+Recv Options
+~~~~~~~~~~~~
+
+:ref:`Sink<job-sink>` and :ref:`pull<job-pull>` jobs have an optional ``recv`` configuration section.
+However, there are currently no variables to configure there.
+
+