docs: document fan-out replication & add quick-start guide

closes https://github.com/zrepl/zrepl/pull/552
fixes https://github.com/zrepl/zrepl/issues/551

Signed-off-by: Andrew Gunnerson <chillermillerlong@hotmail.com>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Andrew Gunnerson 2021-12-31 18:39:49 -05:00 committed by Christian Schwarz
parent 1ad7df2df3
commit 556fac3002
5 changed files with 221 additions and 3 deletions


@@ -0,0 +1,59 @@
jobs:
  # Separate job for snapshots and pruning
  - name: snapshots
    type: snap
    filesystems:
      'tank<': true # all filesystems
    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 10m
    pruning:
      keep:
        # Keep non-zrepl snapshots
        - type: regex
          negate: true
          regex: '^zrepl_'
        # Time-based snapshot retention
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d
          regex: '^zrepl_'

  # Source job for target B
  - name: target_b
    type: source
    serve:
      type: tls
      listen: :8888
      ca: /etc/zrepl/b.example.com.crt
      cert: /etc/zrepl/a.example.com.crt
      key: /etc/zrepl/a.example.com.key
      client_cns:
        - b.example.com
    filesystems:
      'tank<': true # all filesystems
    # Snapshots are handled by the separate snap job
    snapshotting:
      type: manual

  # Source job for target C
  - name: target_c
    type: source
    serve:
      type: tls
      listen: :8889
      ca: /etc/zrepl/c.example.com.crt
      cert: /etc/zrepl/a.example.com.crt
      key: /etc/zrepl/a.example.com.key
      client_cns:
        - c.example.com
    filesystems:
      'tank<': true # all filesystems
    # Snapshots are handled by the separate snap job
    snapshotting:
      type: manual

  # Source jobs for remaining targets. Each one should listen on a different port
  # and reference the correct certificate and client CN.
  # - name: target_d
  #   ...


@@ -0,0 +1,30 @@
jobs:
  # Pull from source server A
  - name: source_a
    type: pull
    connect:
      type: tls
      # Use the correct port for this specific client (e.g. B is 8888, C is 8889, etc.)
      address: a.example.com:8888
      ca: /etc/zrepl/a.example.com.crt
      # Use the correct key pair for this specific client
      cert: /etc/zrepl/b.example.com.crt
      key: /etc/zrepl/b.example.com.key
      server_cn: a.example.com
    root_fs: pool0/backup
    interval: 10m
    pruning:
      keep_sender:
        # Source does the pruning in its snap job
        - type: regex
          regex: '.*'
      # Receiver-side pruning can be configured as desired on each target server
      keep_receiver:
        # Keep non-zrepl snapshots
        - type: regex
          negate: true
          regex: '^zrepl_'
        # Time-based snapshot retention
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d
          regex: '^zrepl_'


@@ -248,7 +248,7 @@ Limitations
Multiple Jobs & More than 2 Machines
------------------------------------

Most users are served well with a single sender and a single receiver job.
This section documents considerations for more complex setups.

.. ATTENTION::
@@ -288,11 +288,46 @@ This section might be relevant to users who wish to *fan-in* (N machines replicated to one backup server) or *fan-out* (1 machine replicated to N backup servers).
**Working setups**:

* **Fan-in: N servers replicated to one receiver, disjoint dataset trees.**

  * This is the common use case of a centralized backup server.
  * Implementation (a minimal configuration sketch follows this list):

    * N ``push`` jobs (one per sender server), 1 ``sink`` (as long as the different push jobs have a different :ref:`client identity <overview-passive-side--client-identity>`)
    * N ``source`` jobs (one per sender server), N ``pull`` on the receiver server (unique names, disjoint ``root_fs``)

  * The ``sink`` job automatically constrains each client to a disjoint sub-tree of the sink-side dataset hierarchy ``${root_fs}/${client_identity}``.
    Therefore, the different clients cannot interfere.
  * The ``pull`` job only pulls from one host, so it's up to the zrepl user to ensure that the different ``pull`` jobs don't interfere.
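The following is a minimal sketch of the push-based fan-in layout; it is not one of the shipped sample configurations. The hostnames, client identities, job names, the pool ``pool0/backups`` and the plain ``tcp`` transport are illustrative assumptions; production setups would typically use the TLS transport as shown in the quick-start guides.

.. code-block:: yaml

   # zrepl.yml on each sending server, e.g. host1.example.com (hypothetical names)
   jobs:
     - name: push_to_backups
       type: push
       connect:
         type: tcp
         address: "backups.example.com:8888"
       filesystems:
         "tank<": true
       snapshotting:
         type: periodic
         prefix: zrepl_
         interval: 10m
       pruning:
         keep_sender:
           - type: last_n
             count: 10
         keep_receiver:
           - type: grid
             grid: 1x1h(keep=all) | 24x1h | 30x1d
             regex: '^zrepl_'

   # zrepl.yml on the backup server: a single sink job serves all clients.
   # Each client's datasets land under ${root_fs}/${client_identity},
   # e.g. pool0/backups/host1 for the client mapped to identity "host1".
   jobs:
     - name: backups_sink
       type: sink
       root_fs: pool0/backups
       serve:
         type: tcp
         listen: ":8888"
         clients:
           "192.0.2.11": "host1"
           "192.0.2.12": "host2"

The distinct client identities (``host1``, ``host2``) are what keeps the receive-side sub-trees disjoint; as long as no two ``push`` jobs map to the same identity, they cannot interfere.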
.. _fan-out-replication:
* **Fan-out: 1 server replicated to N receivers**

  * Can be implemented either in a pull or push fashion.

    * **pull setup**: 1 ``pull`` job on each receiver server, each with a corresponding **unique** ``source`` job on the sender server.
    * **push setup**: 1 ``sink`` job on each receiver server, each with a corresponding **unique** ``push`` job on the sender server (a minimal sketch of this variant follows the list).

  * It is critical that we have one sending-side job (``source``, ``push``) per receiver.
    The reason is that :ref:`zrepl's ZFS abstractions <zrepl-zfs-abstractions>` (``zrepl zfs-abstraction list``) include the name of the ``source``/``push`` job, but not the receive-side job name or client identity (see :issue:`380`).
    As a counter-example, suppose we used multiple ``pull`` jobs with only one ``source`` job.
    All ``pull`` jobs would share the same :ref:`replication cursor bookmark <replication-cursor-and-last-received-hold>` and trip over each other, breaking incremental replication guarantees quickly.
    The analogous problem exists for 1 ``push`` to N ``sink`` jobs.

  * The ``filesystems`` matched by the sending side jobs (``source``, ``push``) need not necessarily be disjoint.
    For this to work, we need to avoid interference between snapshotting and pruning of the different sending jobs.
    The solution is to centralize sender-side snapshot management in a separate ``snap`` job.
    Snapshotting in the ``source``/``push`` job should then be disabled (``type: manual``).
    Sender-side pruning (``keep_sender``) in the active job (``pull`` / ``push``) must then keep all snapshots, since pruning is handled by the ``snap`` job.

  * **Restore limitations**: when restoring from one of the ``pull`` targets (e.g., using ``zfs send -R``), the replication cursor bookmarks don't exist on the restored system.
    This can break incremental replication to all other receive-sides after restore.

  * See :ref:`the fan-out replication quick-start guide <quickstart-fan-out-replication>` for an example of this setup.
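For the push-based fan-out variant (the quick-start guide covers the pull-based one), a minimal sketch could look as follows. The hostnames, job names, the pool ``pool0/backup`` and the plain ``tcp`` transport are illustrative assumptions; the essential points are the shared ``snap`` job, ``type: manual`` snapshotting in each ``push`` job, and the keep-everything ``keep_sender`` rule.

.. code-block:: yaml

   # zrepl.yml on sender A: the snap job owns snapshotting and sender-side pruning.
   jobs:
     - name: snapshots
       type: snap
       filesystems:
         "tank<": true
       snapshotting:
         type: periodic
         prefix: zrepl_
         interval: 10m
       pruning:
         keep:
           - type: grid
             grid: 1x1h(keep=all) | 24x1h | 30x1d
             regex: '^zrepl_'

     # One push job per receiver (B, C, ...), each with a unique name.
     - name: push_to_b
       type: push
       connect:
         type: tcp
         address: "b.example.com:8888"
       filesystems:
         "tank<": true
       snapshotting:
         type: manual      # snapshots are created by the snap job above
       pruning:
         keep_sender:
           - type: regex
             regex: '.*'   # keep everything; the snap job does the pruning
         keep_receiver:
           - type: grid
             grid: 1x1h(keep=all) | 24x1h | 30x1d
             regex: '^zrepl_'
     # - name: push_to_c
     #   ... same as push_to_b, with the address pointing at C

   # zrepl.yml on each receiver (here B): a sink job that accepts A only.
   jobs:
     - name: sink_from_a
       type: sink
       root_fs: pool0/backup
       serve:
         type: tcp
         listen: ":8888"
         clients:
           "192.0.2.10": "a.example.com"

Note that with ``type: manual`` snapshotting the ``push`` jobs have no built-in schedule, so replication is typically triggered explicitly, for example with ``zrepl signal wakeup push_to_b`` from a cron job.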
**Setups that do not work**:


@@ -33,6 +33,7 @@ Keep the :ref:`full config documentation <configuration_toc>` handy if a config
   quickstart/continuous_server_backup
   quickstart/backup_to_external_disk
   quickstart/fan_out_replication

Use ``zrepl configcheck`` to validate your configuration.
No output indicates that everything is fine.


@@ -0,0 +1,93 @@
.. include:: ../global.rst.inc

.. _quickstart-fan-out-replication:

Fan-out replication
===================

This quick-start example demonstrates how to implement a fan-out replication setup where datasets on a server (A) are replicated to multiple targets (B, C, etc.).
This example uses multiple ``source`` jobs on server A and ``pull`` jobs on the target servers.

.. WARNING::

   Before implementing this setup, please see the caveats listed in the :ref:`fan-out replication configuration overview <fan-out-replication>`.

Overview
--------

On the source server (A), there should be:
* A ``snap`` job

  * Creates the snapshots
  * Handles the pruning of snapshots

* A ``source`` job for target B

  * Accepts connections from server B and B only

* Further ``source`` jobs, one for each additional target (C, D, etc.)

  * Each listens on a unique port
  * Each only accepts connections from its specific target

On each target server, there should be:

* A ``pull`` job that connects to the corresponding ``source`` job on A

  * ``keep_sender`` should keep all snapshots, since A's ``snap`` job handles the pruning
  * ``keep_receiver`` can be configured as appropriate on each target server
Generate TLS Certificates
-------------------------

Mutual TLS via the :ref:`TLS client authentication transport <transport-tcp+tlsclientauth>` can be used to secure the connections between the servers. In this example, a self-signed certificate is created for each server without setting up a CA.

.. code-block:: bash

   source=a.example.com
   targets=(
       b.example.com
       c.example.com
       # ...
   )

   for server in "${source}" "${targets[@]}"; do
       openssl req -x509 -sha256 -nodes \
           -newkey rsa:4096 \
           -days 365 \
           -keyout "${server}.key" \
           -out "${server}.crt" \
           -addext "subjectAltName = DNS:${server}" \
           -subj "/CN=${server}"
   done

   # Distribute each host's keypair
   for server in "${source}" "${targets[@]}"; do
       ssh root@"${server}" mkdir /etc/zrepl
       scp "${server}".{crt,key} root@"${server}":/etc/zrepl/
   done

   # Distribute target certificates to the source
   scp "${targets[@]/%/.crt}" root@"${source}":/etc/zrepl/

   # Distribute source certificate to the targets
   for server in "${targets[@]}"; do
       scp "${source}.crt" root@"${server}":/etc/zrepl/
   done
Configure source server A
-------------------------

.. literalinclude:: ../../config/samples/quickstart_fan_out_replication_source.yml

Configure each target server
----------------------------

.. literalinclude:: ../../config/samples/quickstart_fan_out_replication_target.yml

Go Back To Quickstart Guide
---------------------------

:ref:`Click here <quickstart-apply-config>` to go back to the quickstart guide.