mirror of
https://github.com/zrepl/zrepl.git
synced 2024-11-25 01:44:43 +01:00
docs: update multi-job & multi-host setup section
parent
41b4038ad5
commit
8839ed1f95
@@ -80,16 +80,19 @@ The active side (:ref:`push <job-push>` and :ref:`pull <job-pull>` job) executes
|
|||||||
.. TIP::
|
.. TIP::
|
||||||
The progress of the active side can be watched live using the ``zrepl status`` subcommand.
|
The progress of the active side can be watched live using the ``zrepl status`` subcommand.
|
||||||
|
|
||||||
|
.. _overview-passive-side--client-identity:
|
||||||
|
|
||||||
How the Passive Side Works
|
How the Passive Side Works
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
The passive side (:ref:`sink <job-sink>` and :ref:`source <job-source>`) waits for connections from the corresponding active side,
|
The passive side (:ref:`sink <job-sink>` and :ref:`source <job-source>`) waits for connections from the corresponding active side,
|
||||||
using the transport listener type specified in the ``serve`` field of the job configuration.
|
using the transport listener type specified in the ``serve`` field of the job configuration.
|
||||||
Each transport listener provides a client's identity to the passive side job.
|
When a client connects, the transport listener performs listener-specific access control (cert validation, IP ACLs, etc.)
|
||||||
It uses the client identity for access control:
|
and determines the *client identity*.
|
||||||
|
The passive side job then uses this client identity as follows:
|
||||||
|
|
||||||
* The ``sink`` job maps requests from different client identities to their respective sub-filesystem tree ``root_fs/${client_identity}``.
|
* The ``sink`` job maps requests from different client identities to their respective sub-filesystem tree ``root_fs/${client_identity}``.
|
||||||
* The ``source`` job has a whitelist of client identities that are allowed pull access.
|
* The ``source`` might, in the future, embed the client identity in :ref:`zrepl's ZFS abstraction names <zrepl-zfs-abstractions>` in order to support multi-host replication.
|
||||||
|
|
||||||
.. TIP::
|
.. TIP::
|
||||||
The implementation of the ``sink`` job requires that each connecting client identity be a valid ZFS filesystem name component.
|
The implementation of the ``sink`` job requires that each connecting client identity be a valid ZFS filesystem name component.
|
||||||
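As an illustration of the ``sink`` mapping described in this hunk, the following is a minimal Python sketch, not zrepl's actual implementation. The function name ``sink_dataset_for`` and the accepted character set are assumptions made for this example; the real validation rules are defined by ZFS and by zrepl itself.

```python
import re

# Assumption: a conservative character set for a single ZFS dataset name
# component (alphanumerics plus "_", ".", ":" and "-"). The rules enforced by
# ZFS and zrepl may differ; this only illustrates the idea.
_COMPONENT_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.:-]*")


def sink_dataset_for(root_fs: str, client_identity: str) -> str:
    """Return the sub-filesystem tree a sink job would use for one client.

    Hypothetical helper: mirrors the documented ``root_fs/${client_identity}``
    layout and rejects identities that cannot be a single name component.
    """
    if not _COMPONENT_RE.fullmatch(client_identity):
        raise ValueError(
            f"client identity {client_identity!r} is not usable as a "
            "single ZFS filesystem name component"
        )
    return f"{root_fs}/{client_identity}"
```

An identity containing ``/`` is rejected here because it would span more than one dataset path component and could escape the client's own sub-tree.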
@@ -164,7 +167,7 @@ With the background knowledge from the previous paragraph, we now summarize the
|
|||||||
|
|
||||||
.. _replication-placeholder-property:
|
.. _replication-placeholder-property:
|
||||||
|
|
||||||
**Placeholder filesystems** on the receiving side are regular ZFS filesystems with the placeholder property ``zrepl:placeholder=on``.
|
**Placeholder filesystems** on the receiving side are regular ZFS filesystems with the ZFS property ``zrepl:placeholder=on``.
|
||||||
Placeholders allow the receiving side to mirror the sender's ZFS dataset hierarchy without replicating every filesystem at every intermediary dataset path component.
|
Placeholders allow the receiving side to mirror the sender's ZFS dataset hierarchy without replicating every filesystem at every intermediary dataset path component.
|
||||||
Consider the following example: ``S/H/J`` shall be replicated to ``R/sink/job/S/H/J``, but neither ``S/H`` nor ``S`` shall be replicated.
|
Consider the following example: ``S/H/J`` shall be replicated to ``R/sink/job/S/H/J``, but neither ``S/H`` nor ``S`` shall be replicated.
|
||||||
ZFS requires the existence of ``R/sink/job/S`` and ``R/sink/job/S/H`` in order to receive into ``R/sink/job/S/H/J``.
|
ZFS requires the existence of ``R/sink/job/S`` and ``R/sink/job/S/H`` in order to receive into ``R/sink/job/S/H/J``.
|
||||||
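The placeholder example in this hunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name ``required_parent_datasets`` is hypothetical, and the sketch merely lists the intermediary receive-side datasets that must exist. Whether each of them ends up as a placeholder depends on whether the corresponding sender filesystem is itself replicated.

```python
def required_parent_datasets(receive_root: str, sender_fs: str) -> list[str]:
    """List the receive-side datasets that must exist before receiving.

    Hypothetical helper: for every proper ancestor of ``sender_fs``, return
    the corresponding path below ``receive_root``.
    """
    components = sender_fs.split("/")
    parents = []
    # Stop before the last component: that one is the dataset being received.
    for depth in range(1, len(components)):
        parents.append(receive_root + "/" + "/".join(components[:depth]))
    return parents
```

For the example from the text, ``required_parent_datasets("R/sink/job", "S/H/J")`` yields ``R/sink/job/S`` and ``R/sink/job/S/H``, the two datasets that would be created as placeholders because neither ``S`` nor ``S/H`` is replicated.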
@@ -229,39 +232,53 @@ Limitations
|
|||||||
Multiple Jobs & More than 2 Machines
|
Multiple Jobs & More than 2 Machines
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
||||||
|
The quick-start guides focus on simple setups with a single sender and a single receiver.
|
||||||
|
This section documents considerations for more complex setups.
|
||||||
|
|
||||||
.. ATTENTION::
|
.. ATTENTION::
|
||||||
|
|
||||||
When using multiple jobs across single or multiple machines, the following rules are critical to avoid race conditions & data loss:
|
Before you continue, make sure you have a working understanding of :ref:`how zrepl works <overview-how-replication-works>`
|
||||||
|
and :ref:`what zrepl does to ensure <zrepl-zfs-abstractions>` that replication between sender and receiver is always
|
||||||
|
possible without conflicts.
|
||||||
|
This will help you understand why certain kinds of multi-machine setups do not (yet) work.
|
||||||
|
|
||||||
1. The sets of ZFS filesystems matched by the ``filesystems`` filter fields must be disjoint across all jobs configured on a machine.
|
.. NOTE::
|
||||||
2. The ZFS filesystem subtrees of jobs with ``root_fs`` must be disjoint.
|
|
||||||
3. Across all zrepl instances on all machines in the replication domain, there must be a 1:1 correspondence between active and passive jobs.
|
|
||||||
|
|
||||||
Explanations & exceptions to above rules are detailed below.
|
If you can't find your desired configuration, have questions or would like to see improvements to multi-job setups, please `open an issue on GitHub <https://github.com/zrepl/zrepl/issues/new>`_.
|
||||||
|
|
||||||
If you would like to see improvements to multi-job setups, please `open an issue on GitHub <https://github.com/zrepl/zrepl/issues/new>`_.
|
Multiple Jobs on one Machine
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
As a general rule, multiple jobs configured on one machine **must operate on disjoint sets of filesystems**.
|
||||||
|
Otherwise, concurrently running jobs might interfere when operating on the same filesystem.
|
||||||
|
|
||||||
No Overlapping
|
On your setup, ensure that
|
||||||
^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Jobs run independently of each other.
|
* all ``filesystems`` filter specifications are disjoint
|
||||||
If two jobs match the same filesystem with their ``filesystems`` filter, they will operate on that filesystem independently and potentially in parallel.
|
* no ``root_fs`` is a prefix of or equal to another ``root_fs``
|
||||||
For example, if job A prunes snapshots that job B is planning to replicate, the replication will fail because B assumed the snapshot to still be present.
|
* no ``filesystems`` filter matches any ``root_fs``
|
||||||
However, the next replication attempt will re-examine the situation from scratch and should work.
|
|
||||||
|
|
||||||
N push jobs to 1 sink
|
**Exceptions to the rule**:
|
||||||
^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
The :ref:`sink job <job-sink>` namespaces by client identity.
|
* A ``snap`` and ``push`` job on the same machine can match the same ``filesystems``.
|
||||||
It is thus safe to push to one sink job with different client identities.
|
To avoid interference, only one of the jobs should prune snapshots on the sender; the other one should keep all snapshots.
|
||||||
If the push jobs have the same client identity, the filesystems matched by the push jobs must be disjoint to avoid races.
|
Since the jobs won't coordinate, errors in the log are to be expected, but :ref:`zrepl's ZFS abstractions <zrepl-zfs-abstractions>` ensure that ``push`` and ``sink`` can always replicate incrementally.
|
||||||
|
This scenario is detailed in one of the :ref:`quick-start guides <quickstart-backup-to-external-disk>`.
|
||||||
|
|
||||||
N pull jobs from 1 source
|
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Multiple pull jobs pulling from the same source have potential for race conditions during pruning:
|
More Than 2 Machines
|
||||||
each pull job prunes the source side independently, causing replication-prune and prune-prune races.
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
There is currently no way for a pull job to filter which snapshots it should attempt to replicate.
|
This section might be relevant to users who wish to *fan-in* (N machines replicate to 1) or *fan-out* (replicate 1 machine to N machines).
|
||||||
Thus, it is not possible to just manually assert that the prune rules of all pull jobs are disjoint to avoid replication-prune and prune-prune races.
|
|
||||||
|
**Working setups**:
|
||||||
|
|
||||||
|
* N ``push`` identities, 1 ``sink`` (as long as the different push jobs have a different :ref:`client identity <overview-passive-side--client-identity>`)
|
||||||
|
|
||||||
|
* ``sink`` constrains each client to a disjoint sub-tree of the sink-side dataset hierarchy ``${root_fs}/${client_identity}``.
|
||||||
|
Therefore, the different clients cannot interfere.
|
||||||
|
|
||||||
|
|
||||||
|
**Setups that do not work**:
|
||||||
|
|
||||||
|
* N ``pull`` identities, 1 ``source`` job. Tracking :issue:`380`.
|
||||||
|
|
||||||
|
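The ``root_fs`` rule from the "Multiple Jobs on one Machine" section (no ``root_fs`` may equal or contain another) can be checked mechanically. The following Python sketch is an assumption-laden illustration, not part of zrepl: the function names are hypothetical, and the ``root_fs`` values are expected to have been collected by hand from the job configurations on one machine.

```python
from itertools import combinations


def is_same_or_ancestor(a: str, b: str) -> bool:
    """True if dataset ``a`` equals ``b`` or is an ancestor of ``b``.

    The comparison is per path component, so "pool/a" is not treated as an
    ancestor of "pool/ab".
    """
    return a == b or b.startswith(a + "/")


def root_fs_conflicts(root_fs_values: list[str]) -> list[tuple[str, str]]:
    """Return all pairs of ``root_fs`` values that overlap."""
    conflicts = []
    for a, b in combinations(root_fs_values, 2):
        if is_same_or_ancestor(a, b) or is_same_or_ancestor(b, a):
            conflicts.append((a, b))
    return conflicts
```

An empty result means the listed ``root_fs`` values are pairwise disjoint. The sketch does not cover the other two rules, which depend on how the ``filesystems`` filters are evaluated.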