---
title: Tutorial
weight: 1
---

This tutorial shows how zrepl can be used to implement a ZFS-based pull backup.

We assume the following scenario:

* Production server `app-srv` with filesystems to back up:
    * `zroot/var/db`
    * `zroot/usr/home` and all its child filesystems
    * except `zroot/usr/home/paranoid`, belonging to a user doing backups themselves
* Backup server `backup-srv` with
    * Filesystem `storage/zrepl/pull/app-srv` + children dedicated to backups of `app-srv`

Our backup solution should fulfill the following requirements:

* Periodically snapshot the filesystems on `app-srv` every 10 minutes
* Incrementally replicate these snapshots to `storage/zrepl/pull/app-srv/*` on `backup-srv`
* Keep only very few snapshots on `app-srv` to save disk space
* Keep a fading history (24 hourly, 30 daily, 6 monthly) of snapshots on `backup-srv`

## Analysis

We can model this situation as two jobs:

* A source job on `app-srv`
    * Creates the snapshots
    * Keeps a short history of snapshots to enable incremental replication to `backup-srv`
    * Accepts connections from `backup-srv`
* A pull job on `backup-srv`
    * Connects to the zrepl daemon process on `app-srv`
    * Pulls the snapshots to `storage/zrepl/pull/app-srv/*`
    * Fades out snapshots in `storage/zrepl/pull/app-srv/*` as they age

{{%expand "Side note: why doesn't the pull job create the snapshots?" %}}

As is the case with all distributed systems, the link between `app-srv` and `backup-srv` might be down for an hour or two. We do not want to sacrifice our required backup resolution of 10-minute intervals for a temporary connection outage.

When the link comes up again, `backup-srv` will happily catch up the 12 snapshots taken by `app-srv` in the meantime, without a gap in our backup history. {{%/expand%}}

## Install zrepl

Follow the [OS-specific installation instructions]({{< relref "install/_index.md" >}}) and come back here.

## Configure backup-srv

We define a pull job named `pull_app-srv` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}}):

```yaml
jobs:
- name: pull_app-srv
  type: pull
  connect:
    type: ssh+stdinserver
    host: app-srv.example.com
    user: root
    port: 22
    identity_file: /etc/zrepl/ssh/app-srv
  interval: 10m
  mapping: {
    "<":"storage/zrepl/pull/app-srv"
  }
  initial_repl_policy: most_recent
  snapshot_prefix: zrepl_pull_backup_
  prune:
    policy: grid
    grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
```
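
Two details of this configuration are worth spelling out. The `mapping` entry with the `<` wildcard places every filesystem pulled from `app-srv` beneath `storage/zrepl/pull/app-srv`, preserving the source path. For example:

```
zroot/var/db    ->  storage/zrepl/pull/app-srv/zroot/var/db
zroot/usr/home  ->  storage/zrepl/pull/app-srv/zroot/usr/home
```

The `grid` specification reads left to right: keep all snapshots from the most recent hour, then one per hour for 24 hours, one per day for 35 days, and one per 30-day interval for 6 such intervals. If the target filesystem `storage/zrepl/pull/app-srv` does not exist yet on `backup-srv`, it is safest to create it up front; the command below is a sketch assuming your pool is really named `storage`:

```bash
zfs create -p storage/zrepl/pull/app-srv
```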

The connect section instructs zrepl to use the stdinserver transport: instead of directly exposing zrepl on app-srv to the internet, backup-srv starts the zrepl stdinserver subcommand on app-srv via SSH. (You can learn more about what happens [here]({{< relref "configuration/transports.md#stdinserver" >}}), or just continue following this tutorial.)
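
Conceptually, what `backup-srv` does for each pull is roughly equivalent to the following manual invocation; this is only a sketch for illustration, using the parameters from the config above, not the exact command zrepl runs:

```bash
ssh -i /etc/zrepl/ssh/app-srv -p 22 root@app-srv.example.com \
    zrepl stdinserver backup-srv.example.com
```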

Thus, we need to create the SSH key pair `/etc/zrepl/ssh/app-srv{,.pub}` which identifies `backup-srv` toward `app-srv`. Execute the following commands on `backup-srv` as the root user:

```bash
cd /etc/zrepl
mkdir -p ssh
chmod 0700 ssh
ssh-keygen -t ed25519 -N '' -f /etc/zrepl/ssh/app-srv
```
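
The public key will be needed on `app-srv` in the next section, so it is convenient to print it now for pasting later:

```bash
cat /etc/zrepl/ssh/app-srv.pub
```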

You can learn more about the [pull job format here]({{< relref "configuration/jobs.md#pull" >}}) but for now we are good to go.

## Configure app-srv

We define a corresponding source job named `pull_backup` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}}) `zrepl.yml`:

```yaml
jobs:
- name: pull_backup
  type: source
  serve:
    type: stdinserver
    client_identity: backup-srv.example.com
  datasets: {
    "zroot/var/db": "ok",
    "zroot/usr/home<": "ok",
    "zroot/usr/home/paranoid": "!",
  }
  snapshot_prefix: zrepl_pull_backup_
  interval: 10m
  prune:
    policy: grid
    grid: 1x1d(keep=all)
```
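
The `datasets` filter implements the scenario from the Analysis section: the most specific rule wins, so the subtree wildcard `<` includes `zroot/usr/home` and everything below it, while the `"!"` rule excludes `zroot/usr/home/paranoid`. As an illustration (`zroot/usr/home/alice` is a hypothetical child dataset):

```
zroot/var/db              -> included (exact match)
zroot/usr/home            -> included (subtree wildcard "<")
zroot/usr/home/alice      -> included (matched by the subtree wildcard)
zroot/usr/home/paranoid   -> excluded ("!")
```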

The `serve` section corresponds to the `connect` section in the configuration of `backup-srv`.

As mentioned before, the SSH key `app-srv.pub` created in the previous section identifies `backup-srv` toward `app-srv`. We enforce this by limiting `backup-srv` to executing exactly `zrepl stdinserver CLIENT_IDENTITY` when connecting to `app-srv`.

Open `/root/.ssh/authorized_keys` and add either of the following lines:

```
# for OpenSSH >= 7.2
command="zrepl stdinserver backup-srv.example.com",restrict PULLING_SSH_KEY
# for older OpenSSH versions
command="zrepl stdinserver backup-srv.example.com",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc PULLING_SSH_KEY
```

{{% notice info %}} Replace `PULLING_SSH_KEY` with the contents of `app-srv.pub`.
Each entry must be on a single line, including the substituted key. {{% /notice %}}
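
After substitution, a complete entry might look like the following; the key material shown here is a truncated placeholder, yours is the full single-line contents of `app-srv.pub`:

```
command="zrepl stdinserver backup-srv.example.com",restrict ssh-ed25519 AAAAC3Nz...truncated... root@backup-srv
```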

Again, you can learn more about the [source job format here]({{< relref "configuration/jobs.md#source" >}}).

## Apply Configuration Changes

We need to restart the zrepl daemon on both `app-srv` and `backup-srv`.

This is [OS-specific]({{< relref "install/_index.md#restarting" >}}).
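
For example, on a systemd-based Linux distribution or on FreeBSD with an rc.d script, the restart might look as follows; the service and unit names are assumptions, so defer to the OS-specific instructions if yours differ:

```bash
# Linux with systemd (assumed unit name zrepl.service)
systemctl restart zrepl
# FreeBSD with an rc.d script (assumed service name zrepl)
service zrepl restart
```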

## Watch it Work

A common setup is to watch the log output and the `zfs list` of snapshots on both machines.

If you like tmux, here is a handy script that works on FreeBSD:

```bash
pkg install gnu-watch tmux
tmux new-window
tmux split-window "tail -f /var/log/zrepl.log"
tmux split-window "gnu-watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
tmux select-layout tiled
```

The Linux equivalent might look like this:

```bash
# make sure tmux is installed & let's assume you use systemd + journald
tmux new-window
tmux split-window "journalctl -f -u zrepl.service"
tmux split-window "watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
tmux select-layout tiled
```
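
Once the first replication interval has passed, you can also verify on `backup-srv` that the filesystems and snapshots arrived where the mapping directs them:

```bash
zfs list -r storage/zrepl/pull/app-srv
zfs list -r -t snapshot storage/zrepl/pull/app-srv
```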

## Summary

Congratulations, you have a working pull backup. Where to go next?

* Read more about [configuration format, options & job types]({{< relref "configuration/_index.md" >}})
* Learn about [implementation details]({{< relref "impl/_index.md" >}}) of zrepl.