.. include:: global.rst.inc

.. _tutorial:

Tutorial
========

This tutorial shows how zrepl can be used to implement a ZFS-based pull backup.
We assume the following scenario:

* Production server ``app-srv`` with filesystems to back up:

  * ``zroot/var/db``
  * ``zroot/usr/home`` and all its child filesystems
  * **except** ``zroot/usr/home/paranoid`` belonging to a user doing backups themselves

* Backup server ``backup-srv`` with

  * Filesystem ``storage/zrepl/pull/app-srv`` + children dedicated to backups of ``app-srv``

Our backup solution should fulfill the following requirements:

* Snapshot the filesystems on ``app-srv`` *every 10 minutes*
* Incrementally replicate these snapshots to ``storage/zrepl/pull/app-srv/*`` on ``backup-srv``
* Keep only a few snapshots on ``app-srv`` to save disk space
* Keep a fading history (24 hourly, 30 daily, 6 monthly) of snapshots on ``backup-srv``

Analysis
--------

We can model this situation as two jobs:

* A **source job** on ``app-srv``

  * Creates the snapshots
  * Keeps a short history of snapshots to enable incremental replication to ``backup-srv``
  * Accepts connections from ``backup-srv``

* A **pull job** on ``backup-srv``

  * Connects to the ``zrepl daemon`` process on ``app-srv``
  * Pulls the snapshots to ``storage/zrepl/pull/app-srv/*``
  * Fades out snapshots in ``storage/zrepl/pull/app-srv/*`` as they age

Why doesn't the **pull job** create the snapshots before pulling?

As is the case with all distributed systems, the link between ``app-srv`` and ``backup-srv`` might be down for an hour or two.
We do not want to sacrifice our required backup resolution of 10-minute intervals for a temporary connection outage.

When the link comes up again, ``backup-srv`` will catch up on the snapshots taken by ``app-srv`` in the meantime (up to 12 for a two-hour outage), without a gap in our backup history.

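
The catch-up count mentioned above is simple arithmetic: the outage duration divided by the snapshot interval.
A minimal sketch in POSIX shell follows; the function name ``missed_snapshots`` is made up for this illustration and is not part of zrepl: ::

    # missed_snapshots OUTAGE_MINUTES INTERVAL_MINUTES
    # Prints how many snapshots app-srv takes while the link is down.
    missed_snapshots() {
        echo $(( $1 / $2 ))
    }

    # two-hour outage with the 10-minute interval from the requirements
    missed_snapshots 120 10

For a one-hour outage, the same calculation yields 6 snapshots.
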
Install zrepl
-------------

Follow the :ref:`OS-specific installation instructions <installation>` and come back here.

Configure ``backup-srv``
------------------------

We define a **pull job** named ``pull_app-srv`` in the |mainconfig| on host ``backup-srv``: ::

    jobs:
    - name: pull_app-srv
      type: pull
      connect:
        type: ssh+stdinserver
        host: app-srv.example.com
        user: root
        port: 22
        identity_file: /etc/zrepl/ssh/identity
      interval: 10m
      mapping: {
        "<":"storage/zrepl/pull/app-srv"
      }
      initial_repl_policy: most_recent
      snapshot_prefix: zrepl_pull_backup_
      prune:
        policy: grid
        grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d

The ``connect`` section instructs the zrepl daemon to use the ``stdinserver`` transport:
``backup-srv`` will connect to the specified SSH server and expect ``zrepl stdinserver CLIENT_IDENTITY`` to run instead of a shell on the other side.

It uses the private key specified at ``connect.identity_file``, which we still need to create: ::

    cd /etc/zrepl
    mkdir -p ssh
    chmod 0700 ssh
    ssh-keygen -t ed25519 -N '' -f /etc/zrepl/ssh/identity

Note that most use cases do not benefit from separate keypairs per remote endpoint.
Thus, it is sufficient to create one keypair and use it for all ``connect`` directives on one host.

zrepl uses ssh's default ``known_hosts`` file, which must contain a host identification entry for ``app-srv.example.com``.
If that entry does not already exist, we need to generate it.
Run the following command, compare the host fingerprints, and confirm with ``yes`` if they match.
You will not be able to get a shell with the identity file we just generated, which is fine. ::

    ssh -i /etc/zrepl/ssh/identity root@app-srv.example.com

Learn more about :ref:`transport-ssh+stdinserver` transport and the :ref:`pull job <job-pull>` format.

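
As a rough sanity check of the ``prune.grid`` value in the pull job, the time span it covers can be added up by hand.
The sketch below assumes that each ``NxD`` entry describes ``N`` consecutive intervals of length ``D`` and that ``1d`` equals 24 hours.
The helper ``grid_span_hours`` is hypothetical and not part of zrepl: ::

    # grid_span_hours COUNT HOURS [COUNT HOURS ...]
    # Sums COUNT * HOURS over all given pairs and prints the total in hours.
    grid_span_hours() {
        total=0
        while [ "$#" -ge 2 ]; do
            total=$(( total + $1 * $2 ))
            shift 2
        done
        echo "$total"
    }

    # 1x1h(keep=all) | 24x1h | 35x1d | 6x30d, with 1d = 24h and 30d = 720h
    grid_span_hours 1 1  24 1  35 24  6 720

Under these assumptions the grid spans 5185 hours, which is roughly 216 days of history on ``backup-srv``.
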
.. _tutorial-configure-app-srv:

Configure ``app-srv``
---------------------

We define a corresponding **source job** named ``pull_backup`` in the |mainconfig| on host ``app-srv``: ::

    jobs:
    - name: pull_backup
      type: source
      serve:
        type: stdinserver
        client_identity: backup-srv.example.com
      filesystems: {
        "zroot/var/db": "ok",
        "zroot/usr/home<": "ok",
        "zroot/usr/home/paranoid": "!",
      }
      snapshot_prefix: zrepl_pull_backup_
      interval: 10m
      prune:
        policy: grid
        grid: 1x1d(keep=all)
        keep_bookmarks: 144

The ``serve`` section corresponds to the ``connect`` section in the configuration of ``backup-srv``.

We now want to authenticate ``backup-srv`` before allowing it to pull data.
This is done by restricting SSH connections from ``backup-srv`` so that they can only execute the ``stdinserver`` subcommand.

Open ``/root/.ssh/authorized_keys`` and add either of the following lines: ::

    # for OpenSSH >= 7.2
    command="zrepl stdinserver backup-srv.example.com",restrict CLIENT_SSH_KEY
    # for older OpenSSH versions
    command="zrepl stdinserver backup-srv.example.com",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY

.. ATTENTION::

   Replace CLIENT_SSH_KEY with the contents of ``/etc/zrepl/ssh/identity.pub`` from ``backup-srv``.
   Mind the trailing ``.pub`` in the filename.
   The entries **must** be on a single line, including the replaced CLIENT_SSH_KEY.

.. HINT::

   You may need to set the ``PermitRootLogin`` option in ``/etc/ssh/sshd_config`` to ``forced-commands-only`` or a more permissive value for this to work.
   Refer to sshd_config(5) for details.

The argument ``backup-srv.example.com`` is the client identity of ``backup-srv`` as defined in ``jobs.serve.client_identity``.

Again, both :ref:`transport-ssh+stdinserver` transport and the :ref:`job-source` format are documented.

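
Because the ``authorized_keys`` entry must stay on a single line, it can be less error-prone to generate it than to paste it by hand.
The following is only a sketch for the OpenSSH >= 7.2 form of the entry; the function name ``authorized_keys_line`` and the ``/tmp/identity.pub`` path in the usage comment are assumptions made for this illustration: ::

    # authorized_keys_line CLIENT_IDENTITY PUBKEY_FILE
    # Prints one authorized_keys entry that forces the stdinserver subcommand.
    authorized_keys_line() {
        client_identity="$1"
        pubkey_file="$2"
        # strip line breaks so the resulting entry stays on a single line
        pubkey="$(tr -d '\r\n' < "$pubkey_file")"
        printf 'command="zrepl stdinserver %s",restrict %s\n' "$client_identity" "$pubkey"
    }

    # Usage on app-srv, after copying identity.pub over from backup-srv
    # (hypothetical path):
    #   authorized_keys_line backup-srv.example.com /tmp/identity.pub >> /root/.ssh/authorized_keys

Review the appended line before relying on it, since a malformed entry in ``authorized_keys`` would simply make the connection from ``backup-srv`` fail.
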
Apply Configuration Changes
---------------------------

We need to restart the zrepl daemon on **both** ``app-srv`` and ``backup-srv``.
This is :ref:`OS-specific <usage-zrepl-daemon-restarting>`.

Watch it Work
-------------

Run ``zrepl control status`` to view the current activity of the configured jobs.
If a job has encountered problems since it last left the idle state, the output contains useful debug log entries.

Additionally, you can check the detailed structured logs of the ``zrepl daemon`` process and use GNU *watch* to view the snapshots present on both machines.

If you like tmux, here is a handy script that works on FreeBSD: ::

    pkg install gnu-watch tmux
    tmux new-window
    tmux split-window "tail -f /var/log/zrepl.log"
    tmux split-window "gnu-watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled

The Linux equivalent might look like this: ::

    # make sure tmux is installed & let's assume you use systemd + journald
    tmux new-window
    tmux split-window "journalctl -f -u zrepl.service"
    tmux split-window "watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled

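
To compare both machines at a glance, the snapshot listing can also be condensed into a per-filesystem count.
This is a sketch only: the helper ``count_snapshots`` is hypothetical, and it assumes snapshot names of the form ``dataset@snapshot`` as printed by ``zfs list -H -t snapshot -o name``: ::

    # count_snapshots PREFIX
    # Reads snapshot names (dataset@snapshot), one per line, from stdin and
    # prints "dataset count" for snapshots whose name starts with PREFIX.
    count_snapshots() {
        prefix="$1"
        awk -F'@' -v p="$prefix" '
            index($2, p) == 1 { n[$1]++ }
            END { for (fs in n) print fs, n[fs] }
        ' | sort
    }

    # Usage on a host with ZFS:
    #   zfs list -H -t snapshot -o name | count_snapshots zrepl_pull_backup_

Running it on ``app-srv`` and ``backup-srv`` should show small counts on the former and counts that grow over time on the latter.
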
Summary
-------

Congratulations, you have a working pull backup. Where to go next?

* Read more about :ref:`configuration format, options & job types <configuration_toc>`.
* Configure :ref:`logging <logging>` \& :ref:`monitoring <monitoring>`.
* Learn about :ref:`implementation details <implementation_toc>` of zrepl.