From df181108b4a1c1deb151bda6343d460617951b6d Mon Sep 17 00:00:00 2001 From: Christian Schwarz Date: Thu, 9 Nov 2017 20:33:09 +0100 Subject: [PATCH] docs: initial port of hugo to sphinx, including rtd theme --- docs/conf.py | 2 +- docs/configuration.rst | 13 ++ docs/configuration/jobs.rst | 129 ++++++++++++++++ docs/configuration/logging.rst | 148 ++++++++++++++++++ docs/configuration/map_filter_syntax.rst | 101 +++++++++++++ docs/configuration/misc.rst | 61 ++++++++ docs/configuration/prune.rst | 59 ++++++++ docs/configuration/transports.rst | 103 +++++++++++++ docs/implementation.rst | 58 +++++++ docs/index.rst | 70 +++++++-- docs/installation.rst | 88 +++++++++++ docs/pr.rst | 5 + docs/tutorial.rst | 183 +++++++++++++++++++++++ 13 files changed, 1009 insertions(+), 11 deletions(-) create mode 100644 docs/configuration.rst create mode 100644 docs/configuration/jobs.rst create mode 100644 docs/configuration/logging.rst create mode 100644 docs/configuration/map_filter_syntax.rst create mode 100644 docs/configuration/misc.rst create mode 100644 docs/configuration/prune.rst create mode 100644 docs/configuration/transports.rst create mode 100644 docs/implementation.rst create mode 100644 docs/installation.rst create mode 100644 docs/pr.rst create mode 100644 docs/tutorial.rst diff --git a/docs/conf.py b/docs/conf.py index a138920..21d2e1a 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -84,7 +84,7 @@ todo_include_todos = True # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # -html_theme = 'alabaster' +html_theme = 'sphinx_rtd_theme' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the diff --git a/docs/configuration.rst b/docs/configuration.rst new file mode 100644 index 0000000..82d766b --- /dev/null +++ b/docs/configuration.rst @@ -0,0 +1,13 @@ + +************* +Configuration +************* + +.. 
toctree::
+
+   configuration/jobs
+   configuration/transports
+   configuration/map_filter_syntax
+   configuration/prune
+   configuration/logging
+   configuration/misc
diff --git a/docs/configuration/jobs.rst b/docs/configuration/jobs.rst
new file mode 100644
index 0000000..71eea20
--- /dev/null
+++ b/docs/configuration/jobs.rst
@@ -0,0 +1,129 @@
+Job Types
+=========
+
+A *job* is the unit of activity tracked by the zrepl daemon and configured in the [configuration file]({{< relref "install/_index.md#configuration-files" >}}).
+
+Every job has a unique `name`, a `type` and type-dependent fields which are documented on this page.
+
+Check out the [tutorial]({{< relref "tutorial/_index.md" >}}) and {{< sampleconflink >}} for examples on how job types are actually used.
+
+.. ATTENTION::
+
+    Currently, zrepl does not replicate filesystem properties.
+    When receiving a filesystem, it is never mounted (`-u` flag) and `mountpoint=none` is set.
+    This is temporary and being worked on {{< zrepl-issue 24 >}}.
+
+Source Job
+----------
+
+::
+
+    |Parameter|Default|Description / Example|
+    |-----|-------|-------|
+    |`type`||`source`|
+    |`name`||unique name of the job|
+    |`serve`||{{< zrepl-transport "serve transport" >}} specification|
+    |`datasets`||{{< zrepl-filter >}} for datasets to expose to client|
+    |`snapshot_prefix`||prefix for ZFS snapshots taken by this job|
+    |`interval`||snapshotting interval|
+    |`prune`||{{< zrepl-prune >}} policy for datasets in `datasets` with prefix `snapshot_prefix`|
+
+* Snapshotting Task (every `interval`, {{% zrepl-job-patient %}})
+
+  1. A snapshot of filesystems matched by `datasets` is taken every `interval` with prefix `snapshot_prefix`.
+  2. The `prune` policy is triggered on datasets matched by `datasets` with snapshots matched by `snapshot_prefix`.
+* Serve Task + * Wait for connections from pull job using `serve` + +A source job is the counterpart to a [pull job]({{< relref "#pull" >}}). + +Note that the prune policy determines the maximum replication lag: +a pull job may stop replication due to link failure, misconfiguration or administrative action. +The source prune policy will eventually destroy the last common snapshot between source and pull job, requiring full replication. +Make sure you read the [prune policy documentation]({{< relref "configuration/prune.md" >}}). + +Example: {{< sampleconflink "pullbackup/productionhost.yml" >}} + +Pull Job +-------- + +:: + + |Parameter|Default|Description / Example| + |-----|-------|-------| + |`type`||`pull`| + |`name`||unique name of the job| + |`connect`||{{< zrepl-transport "connect transport" >}} specification| + |`interval`||Interval between pull attempts| + |`mapping`||{{< zrepl-mapping >}} for remote to local filesystems| + |`initial_repl_policy`|`most_recent`|initial replication policy| + |`snapshot_prefix`||prefix filter used for replication & pruning| + |`prune`||{{< zrepl-prune >}} policy for local filesystems reachable by `mapping`| + + - Main Task (every `interval`, {{% zrepl-job-patient %}}) + #. A connection to the remote source job is established using the strategy in `connect` + #. `mapping` maps filesystems presented by the remote side to local *target filesystems* + #. Those remote filesystems with a local *target filesystem* are replicated + #. Only snapshots with prefix `snapshot_prefix` are replicated. + #. If possible, incremental replication takes place. + #. If the local target filesystem does not exist, `initial_repl_policy` is used. + #. On conflicts, an error is logged but replication of other filesystems with mapping continues. + #. The `prune` policy is triggered for all *target filesystems* + +A pull job is the counterpart to a [source job]({{< relref "#source" >}}). 
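+
+Putting the parameters above together, a pull job might look like the following sketch. The host name, the mapping target and the prune settings are illustrative and must be adapted to your setup::
+
+    jobs:
+    - name: pull_app-srv
+      type: pull
+      connect:
+        type: ssh+stdinserver
+        host: app-srv.example.com
+        user: root
+        port: 22
+        identity_file: /etc/zrepl/ssh/identity
+      interval: 10m
+      mapping: {
+        "<": "storage/zrepl/pull/app-srv"
+      }
+      initial_repl_policy: most_recent
+      snapshot_prefix: zrepl_pull_backup_
+      prune:
+        policy: grid
+        grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d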
+
+Example: {{< sampleconflink "pullbackup/backuphost.yml" >}}
+
+Local Job
+---------
+
+::
+
+    |Parameter|Default|Description / Example|
+    |-----|-------|-------|
+    |`type`||`local`|
+    |`name`||unique name of the job|
+    |`mapping`||{{< zrepl-mapping >}} from source to target filesystem (both local)|
+    |`snapshot_prefix`||prefix for ZFS snapshots taken by this job|
+    |`interval`||snapshotting & replication interval|
+    |`initial_repl_policy`|`most_recent`|initial replication policy|
+    |`prune_lhs`||pruning policy on left-hand-side (source)|
+    |`prune_rhs`||pruning policy on right-hand-side (target)|
+
+* Main Task (every `interval`, {{% zrepl-job-patient %}})
+
+  1. Evaluate `mapping` for local filesystems; those with a *target filesystem* are called *mapped filesystems*.
+  2. Snapshot *mapped filesystems* with `snapshot_prefix`.
+  3. Replicate *mapped filesystems* to their respective *target filesystems*:
+
+     1. Only snapshots with prefix `snapshot_prefix` are replicated.
+     2. If possible, incremental replication takes place.
+     3. If the *target filesystem* does not exist, `initial_repl_policy` is used.
+     4. On conflicts, an error is logged but replication of other *mapped filesystems* continues.
+
+  4. The `prune_lhs` policy is triggered for all *mapped filesystems*
+  5. The `prune_rhs` policy is triggered for all *target filesystems*
+
+A local job is a combination of source & pull job executed on the same machine.
+
+Example: {{< sampleconflink "localbackup/host1.yml" >}}
+
+Terminology
+-----------
+
+task
+
+    A job consists of one or more tasks and a task consists of one or more steps.
+    Some tasks may be periodic while others wait for an event to occur.
+
+patient task
+
+    A patient task is supposed to run once every `interval`.
+    We call the start of such a run an *invocation*.
+
+    * If the task completes in less than `interval`, the task is restarted at `last_invocation + interval`.
+    * Otherwise, a patient task
+
+      * logs a warning as soon as an invocation exceeds the configured `interval`
+      * waits for the last invocation to finish
+      * logs a warning with the effective task duration
+      * immediately starts a new invocation of the task
diff --git a/docs/configuration/logging.rst b/docs/configuration/logging.rst
new file mode 100644
index 0000000..81f0523
--- /dev/null
+++ b/docs/configuration/logging.rst
@@ -0,0 +1,148 @@
+Logging
+=======
+
+zrepl uses structured logging to provide users with easily processable log messages.
+
+Configuration
+-------------
+
+Logging outlets are configured in the `global` section of the [configuration file]({{< relref "install/_index.md#configuration-files" >}}).
+Check out {{< sampleconflink "random/logging.yml" >}} for an example on how to configure multiple outlets: + +:: + + global: + logging: + + - outlet: OUTLET_TYPE + level: MINIMUM_LEVEL + format: FORMAT + + - outlet: OUTLET_TYPE + level: MINIMUM_LEVEL + format: FORMAT + + ... + + jobs: ... + +Default Configuration +~~~~~~~~~~~~~~~~~~~~~ + +By default, the following logging configuration is used + +:: + + global: + logging: + + - outlet: "stdout" + level: "warn" + format: "human" + +.. ATTENTION:: + Output to **stderr** should always be considered a **critical error**.
+ Only errors in the logging infrastructure itself, e.g. IO errors when writing to an outlet, are sent to stderr. + +Building Blocks +--------------- + +The following sections document the semantics of the different log levels, formats and outlet types. + +Levels +~~~~~~ + +:: + + | Level | SHORT | Description | + |-------|-------|-------------| + |`error`|`ERRO` | immediate action required | + |`warn` |`WARN` | symptoms for misconfiguration, soon expected failure, etc.| + |`info` |`INFO` | explains what happens without too much detail | + |`debug`|`DEBG` | tracing information, state dumps, etc. useful for debugging. | + +Incorrectly classified messages are considered a bug and should be reported. + +Formats +~~~~~~~ + +:: + + | Format | Description | + |--------|---------| + |`human` | emphasized context by putting job, task, step and other context variables into brackets before the actual message, followed by remaining fields in logfmt style| + |`logfmt`| [logfmt](https://brandur.org/logfmt) output. zrepl uses [github.com/go-logfmt/logfmt](github.com/go-logfmt/logfmt).| + |`json` | JSON formatted output. Each line is a valid JSON document. Fields are marshaled by `encoding/json.Marshal()`, which is particularly useful for processing in log aggregation or when processing state dumps. + +Outlets +~~~~~~~ + +Outlets are ... well ... outlets for log entries into the world. + +**`stdout`** +^^^^^^^^^^^^ + +:: + + | Parameter | Default | Comment | + |-----------| --------- | ----------- | + |`outlet` | *none* | required | + |`level` | *none* | minimum [log level](#levels), required | + |`format` | *none* | output [format](#formats), required | + +Writes all log entries with minimum level `level` formatted by `format` to stdout. + +Can only be specified once. 
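+
+For example, a single `stdout` outlet that emits everything at `debug` level and above in `logfmt` (a sketch; levels and formats are documented above)::
+
+    global:
+      logging:
+      - outlet: stdout
+        level: debug
+        format: logfmt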
+
+**`syslog`**
+^^^^^^^^^^^^
+
+::
+
+    | Parameter | Default   | Comment     |
+    |-----------| --------- | ----------- |
+    |`outlet`   | *none*    | required    |
+    |`level`    | *none*    | minimum [log level](#levels), required, usually `debug` |
+    |`format`   | *none*    | output [format](#formats), required|
+    |`retry_interval`| 0    | Interval between reconnection attempts to syslog |
+
+Writes all log entries formatted by `format` to syslog.
+On normal setups, you should not need to change the `retry_interval`.
+
+Can only be specified once.
+
+**`tcp`**
+^^^^^^^^^
+
+::
+
+    | Parameter | Default   | Comment     |
+    |-----------| --------- | ----------- |
+    |`outlet`   | *none*    | required    |
+    |`level`    | *none*    | minimum [log level](#levels), required |
+    |`format`   | *none*    | output [format](#formats), required |
+    |`net`|*none*|`tcp` in most cases|
+    |`address`|*none*|remote network address, e.g. `logs.example.com:10202`|
+    |`retry_interval`|*none*|Interval between reconnection attempts to `address`|
+    |`tls`|*none*|TLS config (see below)|
+
+Establishes a TCP connection to `address` and sends log messages with minimum level `level` formatted by `format`.
+
+If `tls` is not specified, an unencrypted connection is established.
+
+If `tls` is specified, the TCP connection is secured with TLS + Client Authentication.
+This is particularly useful in combination with log aggregation services that run on another machine.
+
+::
+
+    |Parameter|Description|
+    |---------|-----------|
+    |`ca`|PEM-encoded certificate authority that signed the remote server's TLS certificate|
+    |`cert`|PEM-encoded client certificate identifying this zrepl daemon toward the remote server|
+    |`key`|PEM-encoded, unencrypted client private key identifying this zrepl daemon toward the remote server|
+
+
+.. NOTE::
+
+    zrepl uses Go's `crypto/tls` and `crypto/x509` packages and leaves all but the required fields in `tls.Config` at their default values.
In case of a security defect in these packages, zrepl has to be rebuilt because Go binaries are statically linked.
diff --git a/docs/configuration/map_filter_syntax.rst b/docs/configuration/map_filter_syntax.rst
new file mode 100644
index 0000000..40fa819
--- /dev/null
+++ b/docs/configuration/map_filter_syntax.rst
@@ -0,0 +1,101 @@
+Mapping & Filter Syntax
+=======================
+
+For various job types, a filesystem `mapping` or `filter` needs to be
+specified.
+
+Both have in common that they take a filesystem path (in the ZFS filesystem hierarchy) as a parameter and return a result.
+Mappings return a *target filesystem* and filters return a *filter result*.
+
+The pattern syntax is the same for mappings and filters and is documented in the following section.
+
+Common Pattern Syntax
+---------------------
+
+A mapping / filter is specified as a **YAML dictionary** with patterns as keys and
+results as values.
+The following rules determine which result is chosen for a given filesystem path: + +* More specific path patterns win over less specific ones +* Non-wildcard patterns (full path patterns) win over *subtree wildcards* (`<` at end of pattern) + +The **subtree wildcard** `<` means "*the dataset left of `<` and all its children*". + +Example +~~~~~~~ + +:: + # Rule number and its pattern + 1: tank< # tank and all its children + 2: tank/foo/bar # full path pattern (no wildcard) + 3: tank/foo< # tank/foo and all its children + + # Which rule applies to given path? + tank/foo/bar/loo => 3 + tank/bar => 1 + tank/foo/bar => 2 + zroot => NO MATCH + tank/var/log => 1 + + +Mappings +-------- + +Mappings map a *source filesystem path* to a *target filesystem path*. +Per pattern, either a target filesystem path or `"!"` is specified as a result. + +* If no pattern matches, there exists no target filesystem (`NO MATCH`). +* If the result is a `"!"`, there exists no target filesystem (`NO MATCH`). +* If the pattern is a non-wildcard pattern, the source path is mapped to the target path on the right. +* If the pattern ends with a *subtree wildcard* (`<`), the source path is **prefix-trimmed** with the path specified left of `<`. + * Note: this means that only for *wildcard-only* patterns (pattern=`<` ) is the source path simply appended to the target path. + +The example is from the {{< sampleconflink "localbackup/host1.yml" >}} example config. + +:: + + jobs: + - name: mirror_local + type: local + mapping: { + "zroot/var/db<": "storage/backups/local/zroot/var/db", + "zroot/usr/home<": "storage/backups/local/zroot/usr/home", + "zroot/usr/home/paranoid": "!", #don't backup paranoid user + "zroot/poudriere/ports<": "!", #don't backup the ports trees + } + ... 
+
+This results in the following mappings:
+
+::
+
+    zroot/var/db                 => storage/backups/local/zroot/var/db
+    zroot/var/db/a/child         => storage/backups/local/zroot/var/db/a/child
+    zroot/usr/home               => storage/backups/local/zroot/usr/home
+    zroot/usr/home/paranoid      => NOT MAPPED
+    zroot/usr/home/bob           => storage/backups/local/zroot/usr/home/bob
+    zroot/usr/src                => NOT MAPPED
+    zroot/poudriere/ports/2017Q3 => NOT MAPPED
+    zroot/poudriere/ports/HEAD   => NOT MAPPED
+
+Filters
+-------
+
+Valid filter results: `ok` or `!`.
+
+The example below shows the source job from the [tutorial]({{< relref "tutorial/_index.md#configure-app-srv" >}}).
+
+The client is allowed access to `zroot/var/db` and `zroot/usr/home` + children, except `zroot/usr/home/paranoid`.
+
+::
+
+    jobs:
+    - name: pull_backup
+      type: source
+      ...
+      filesystems: {
+        "zroot/var/db": "ok",
+        "zroot/usr/home<": "ok",
+        "zroot/usr/home/paranoid": "!",
+      }
+      ...
diff --git a/docs/configuration/misc.rst b/docs/configuration/misc.rst
new file mode 100644
index 0000000..b5c0449
--- /dev/null
+++ b/docs/configuration/misc.rst
@@ -0,0 +1,61 @@
+Miscellaneous
+=============
+
+Runtime Directories & UNIX Sockets
+----------------------------------
+
+zrepl daemon creates various UNIX sockets to allow communicating with it:
+
+* the `stdinserver` transport connects to a socket named after the `client_identity` parameter
+* the `control` subcommand connects to a defined control socket
+
+There is no further authentication on these sockets.
+Therefore we have to make sure they can only be created and accessed by `zrepl daemon`.
+
+In fact, `zrepl daemon` will not bind a socket to a path in a directory that is world-accessible.
+
+The directories can be configured in the main configuration file:
+
+::
+
+    global:
+      control:
+        sockpath: /var/run/zrepl/control
+      serve:
+        stdinserver:
+          sockdir: /var/run/zrepl/stdinserver
+
+
+Durations & Intervals
+---------------------
+
+Interval & duration fields in job definitions, pruning configurations, etc.
must match the following regex:
+
+::
+
+    var durationStringRegex *regexp.Regexp = regexp.MustCompile(`^\s*(\d+)\s*(s|m|h|d|w)\s*$`)
+    // s = second, m = minute, h = hour, d = day, w = week (7 days)
+
+Super-Verbose Job Debugging
+---------------------------
+
+You have probably landed here because you opened an issue on GitHub and some developer told you to do this...
+So just read the annotated comments ;)
+
+::
+
+    jobs:
+    - name: ...
+      ...
+      # JOB DEBUGGING OPTIONS
+      # should be equal for all job types, but each job implements the debugging itself
+      debug:
+        conn: # debug the io.ReadWriteCloser connection
+          read_dump: /tmp/connlog_read   # dump results of Read() invocations to this file
+          write_dump: /tmp/connlog_write # dump results of Write() invocations to this file
+        rpc: # debug the RPC protocol implementation
+          log: true # log output from rpc layer to the job log
+
+.. ATTENTION::
+
+    Connection dumps will almost certainly contain your or others' private data. Do not share them in a bug report.
diff --git a/docs/configuration/prune.rst b/docs/configuration/prune.rst
new file mode 100644
index 0000000..27d1359
--- /dev/null
+++ b/docs/configuration/prune.rst
@@ -0,0 +1,59 @@
+Snapshot Pruning
+================
+
+In zrepl, *pruning* means *destroying snapshots by some policy*.
+
+A *pruning policy* takes a list of snapshots and - for each snapshot - decides whether it should be kept or destroyed.
+
+The job context defines which snapshots are even considered for pruning, for example through the `snapshot_prefix` variable.
+Check the [job definition]({{< relref "configuration/jobs.md" >}}) for details.
+
+Currently, the retention grid is the only supported pruning policy.
+
+Retention Grid
+--------------
+
+::
+
+    jobs:
+    - name: pull_app-srv
+      ...
+      prune:
+        policy: grid
+        grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
+              │                │
+              │                └─ 24 adjacent one-hour intervals
+              └─ one hour interval
+
+The retention grid can be thought of as a time-based sieve:
+
+The `grid` field specifies a list of adjacent time intervals:
+the left edge of the leftmost (first) interval is the `creation` date of the youngest snapshot.
+All intervals to its right describe time intervals further in the past.
+
+Each interval carries a maximum number of snapshots to keep.
+It is specified via `(keep=N)`, where `N` is either `all` (all snapshots are kept) or a positive integer.
+The default value is **1**.
+
+The following procedure happens during pruning:
+
+1. The list of snapshots eligible for pruning is sorted by `creation`
+2. The left edge of the first interval is aligned to the `creation` date of the youngest snapshot
+3. A list of buckets is created, one for each interval
+4. The list of snapshots is split up into the buckets.
+5. For each bucket
+
+   1. the contained snapshot list is sorted by creation.
+   2. snapshots from the list, oldest first, are destroyed until only the specified `keep` count remains.
+   3. all remaining snapshots on the list are kept.
+
+.. ATTENTION::
+
+    The configuration of the first interval (`1x1h(keep=all)` in the example) determines the **maximum allowable replication lag** between source and destination.
+    After the first interval, source and destination likely have different retention settings.
+    This means source and destination may prune different snapshots, prohibiting incremental replication from snapshots that are not in the first interval.
+
+    **Always** configure the first interval to **`1x?(keep=all)`**, substituting `?` with the maximum time replication may fail due to downtimes, maintenance, connectivity issues, etc.
+    After outages longer than `?` you may be required to perform **full replication** again.
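+
+As an illustration, assume snapshots are taken every 10 minutes and the example grid `1x1h(keep=all) | 24x1h | 35x1d | 6x30d` is used. Roughly (the exact bucketing depends on the `creation` date of the youngest snapshot), snapshots fall into the buckets as follows::
+
+    snapshot age     bucket                       kept per bucket
+    ------------     ------                       ---------------
+    0h   - 1h        1x1h(keep=all)               all (~6 snapshots)
+    1h   - 25h       one of the 24 1h intervals   1 (keep defaults to 1)
+    25h  - ~36d      one of the 35 1d intervals   1
+    ~36d - ~216d     one of the 6 30d intervals   1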
+ diff --git a/docs/configuration/transports.rst b/docs/configuration/transports.rst new file mode 100644 index 0000000..a577d77 --- /dev/null +++ b/docs/configuration/transports.rst @@ -0,0 +1,103 @@ +.. highlight:: bash + +Transports +========== + +A transport provides an authenticated [`io.ReadWriteCloser`](https://golang.org/pkg/io/#ReadWriteCloser) to the RPC layer. +(An `io.ReadWriteCloser` is essentially a bidirectional reliable communication channel.) + +Currently, only the `ssh+stdinserver` transport is supported. + +`ssh+stdinserver` +----------------- + +The way the `ssh+stdinserver` transport works is inspired by [git shell](https://git-scm.com/docs/git-shell) and [Borg Backup](https://borgbackup.readthedocs.io/en/stable/deployment.html). +It is implemented in the Go package `github.com/zrepl/zrepl/sshbytestream`. +The config excerpts are taken from the [tutorial]({{< relref "tutorial/_index.md" >}}) which you should complete before reading further. + +`serve` +~~~~~~~ + +:: + + jobs: + - name: pull_backup + type: source + serve: + type: stdinserver + client_identity: backup-srv.example.com + ... + +The serving job opens a UNIX socket named after `client_identity` in the runtime directory, e.g. `/var/run/zrepl/stdinserver/backup-srv.example.com`. + +On the same machine, the :code:`zrepl stdinserver $client_identity` command connects to that socket. +For example, `zrepl stdinserver backup-srv.example.com` connects to the UNIX socket `/var/run/zrepl/stdinserver/backup-srv.example.com`. + +It then passes its stdin and stdout file descriptors to the zrepl daemon via *cmsg(3)*. +zrepl daemon in turn combines them into an `io.ReadWriteCloser`: +a `Write()` turns into a write to stdout, a `Read()` turns into a read from stdin. + +Interactive use of the `stdinserver` subcommand does not make much sense. +However, we can force its execution when a user with a particular SSH pubkey connects via SSH. 
This can be achieved with an entry in the `authorized_keys` file of the serving zrepl daemon.
+
+::
+
+    # for OpenSSH >= 7.2
+    command="zrepl stdinserver CLIENT_IDENTITY",restrict CLIENT_SSH_KEY
+    # for older OpenSSH versions
+    command="zrepl stdinserver CLIENT_IDENTITY",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY
+
+* CLIENT_IDENTITY is substituted with `backup-srv.example.com` in our example
+* CLIENT_SSH_KEY is substituted with the public part of the SSH keypair specified in the `connect` directive on the connecting host.
+
+.. NOTE::
+    You may need to adjust the `PermitRootLogin` option in `/etc/ssh/sshd_config` to `forced-commands-only` or higher for this to work.
+    Refer to sshd_config(5) for details.
+
+To recap, this is how client authentication works with the `ssh+stdinserver` transport:
+
+* Connections to the `client_identity` UNIX socket are blindly trusted by zrepl daemon.
+* Thus, the runtime directory must be private to the zrepl user (checked by zrepl daemon).
+* The admin of the host with the serving zrepl daemon controls the `authorized_keys` file.
+* Thus, the administrator controls the mapping `PUBKEY -> CLIENT_IDENTITY`.
+
+`connect`
+~~~~~~~~~
+
+::
+
+    jobs:
+    - name: pull_app-srv
+      type: pull
+      connect:
+        type: ssh+stdinserver
+        host: app-srv.example.com
+        user: root
+        port: 22
+        identity_file: /etc/zrepl/ssh/identity
+        options: # optional
+        - "Compression=on"
+
+The connecting zrepl daemon
+
+1. Creates a pipe
+2. Forks
+3. In the forked process
+
+   1. Replaces forked stdin and stdout with the corresponding pipe ends
+   2. Executes the `ssh` binary found in `$PATH`.
+
+      1. The identity file (`-i`) is set to `$identity_file`.
+      2. The remote user, host and port correspond to those configured.
+      3. Further options can be specified using the `options` field, which appends each entry in the list to the command line using `-o $entry`.
+
+4. 
Wraps the pipe ends in an `io.ReadWriteCloser` and uses it for RPC.
+
+As discussed in the section above, the connecting zrepl daemon expects that `zrepl stdinserver $client_identity` is executed automatically via an `authorized_keys` file entry.
+
+.. NOTE::
+
+    The environment variables of the underlying SSH process are cleared. `$SSH_AUTH_SOCK` will not be available.
+    It is suggested to create a separate, unencrypted SSH key solely for that purpose.
+
diff --git a/docs/implementation.rst b/docs/implementation.rst
new file mode 100644
index 0000000..ef23463
--- /dev/null
+++ b/docs/implementation.rst
@@ -0,0 +1,58 @@
+Implementation Overview
+=======================
+
+.. WARNING::
+
+    Incomplete / under construction
+
+The following design aspects may convince you that `zrepl` is superior to a hacked-together shell script solution.
+
+Testability & Performance
+-------------------------
+
+zrepl is written in Go, a real programming language with type safety,
+reasonable performance, testing infrastructure and an (opinionated) idea of
+software engineering.
+
+* key parts & algorithms of zrepl are covered by unit tests (work in progress)
+* zrepl is noticeably faster than comparable shell scripts
+
+
+RPC protocol
+------------
+
+While it is tempting to just issue a few `ssh remote 'zfs send ...' | zfs recv` commands, this has a number of drawbacks:
+
+* The snapshot streams need to be compatible.
+* Communication is still unidirectional. Thus, you will most likely
+
+  * either not take advantage of features such as *compressed send & recv*
+  * or issue additional `ssh` commands in advance to figure out what features are supported on the other side.
+
+* Advanced logic in shell scripts is ugly to read, poorly testable and a pain to maintain.
+
+zrepl takes a different approach:
+
+* Define an RPC protocol.
+* Establish an encrypted, authenticated, bidirectional communication channel...
+* ... with zrepl running at both ends of it.
+
+This has several obvious benefits:
+
+* No blank root shell access is given to the other side.
+* Instead, an *authenticated* peer can *request* filesystem lists, snapshot streams, etc.
+* Requests are then checked against job-specific ACLs, limiting a client to the filesystems it is actually allowed to replicate.
+* The {{< zrepl-transport "transport mechanism" >}} is decoupled from the remaining logic, keeping it extensible.
+
+Protocol Implementation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+zrepl implements its own RPC protocol.
+This is mostly due to the fact that existing solutions do not provide efficient means to transport large amounts of data.
+
+Package [`github.com/zrepl/zrepl/rpc`](https://github.com/zrepl/zrepl/tree/master/rpc) builds special-case handling around returning an `io.Reader` as part of a unary RPC call.
+
+Measurements show only a single memory-to-memory copy of a snapshot stream is made using `github.com/zrepl/zrepl/rpc`, and there is still potential for further optimizations.
+
+Logging & Transparency
+----------------------
+
+zrepl comes with [rich, structured and configurable logging]({{< relref "configuration/logging.md" >}}), allowing administrators to understand what the software is actually doing.
diff --git a/docs/index.rst b/docs/index.rst
index 4a085ca..f975bdc 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,18 +3,68 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
-Welcome to zrepl's documentation!
-=================================
+zrepl - ZFS replication
+-----------------------
+
+.. ATTENTION::
+    zrepl as well as this documentation is still under active development.
+    It is neither feature complete nor is there a stability guarantee on the configuration format.
+    Use & test at your own risk ;)
+
+Getting started
+~~~~~~~~~~~~~~~
+
+The [5 minute tutorial setup]({{< relref "tutorial/_index.md" >}}) gives you a first impression.
+ +Main Features +~~~~~~~~~~~~~ + +* Filesystem Replication + * [x] Local & Remote + * [x] Pull mode + * [ ] Push mode + * [x] Access control checks when pulling datasets + * [x] [Flexible mapping]({{< ref "configuration/map_filter_syntax.md" >}}) rules + * [x] Bookmarks support + * [ ] Feature-negotiation for + * Resumable `send & receive` + * Compressed `send & receive` + * Raw encrypted `send & receive` (as soon as it is available) +* Automatic snapshot creation + * [x] Ensure fixed time interval between snapshots +* Automatic snapshot [pruning]({{< ref "configuration/prune.md" >}}) + * [x] Age-based fading (grandfathering scheme) +* Flexible, detailed & structured [logging]({{< ref "configuration/logging.md" >}}) + * [x] `human`, `logfmt` and `json` formatting + * [x] stdout, syslog and TCP (+TLS client auth) outlets +* Maintainable implementation in Go + * [x] Cross platform + * [x] Type safe & testable code + +Contributing +~~~~~~~~~~~~ + +We are happy about any help we can get! + +* Explore the codebase + * These docs live in the `docs/` subdirectory +* Document any non-obvious / confusing / plain broken behavior you encounter when setting up zrepl for the first time +* Check the *Issues* and *Projects* sections for things to do + +{{% panel header=" Development Workflow"%}} +[The GitHub repository](https://github.com/zrepl/zrepl) is where all development happens.
+Make sure to read the [Developer Documentation section](https://github.com/zrepl/zrepl) and open new issues or pull requests there. +{{% /panel %}} + +Table of Contents +~~~~~~~~~~~~~~~~~ .. toctree:: :maxdepth: 2 :caption: Contents: - - -Indices and tables -================== - -* :ref:`genindex` -* :ref:`modindex` -* :ref:`search` + tutorial + installation + configuration + implementation + pr diff --git a/docs/installation.rst b/docs/installation.rst new file mode 100644 index 0000000..8a5572f --- /dev/null +++ b/docs/installation.rst @@ -0,0 +1,88 @@ +Installation +============ + +.. TIP:: + + Note: check out the [tutorial]({{< relref "tutorial/_index.md" >}}) if you want a first impression of zrepl. + +User Privileges +--------------- + +It is possible to run zrepl as an unprivileged user in combination with +[ZFS delegation](https://www.freebsd.org/doc/handbook/zfs-zfs-allow.html). + +Also, there is the possibility to run it in a jail on FreeBSD by delegating a dataset to the jail. + +However, until we get around documenting those setups, you will have to run zrepl as root or experiment yourself :) + +Installation +------------ + +zrepl is currently not packaged on any operating system. Signed & versioned releases are planned but not available yet. + +Check out the sources yourself, fetch dependencies using dep, compile and install to the zrepl user's `$PATH`.
**Note**: if the zrepl binary is not in `$PATH`, you will have to adjust the examples in the [tutorial]({{< relref "tutorial/_index.md" >}}).
+
+::
+
+    # NOTE: you may want to check out & build as an unprivileged user
+    cd /root
+    git clone https://github.com/zrepl/zrepl.git
+    cd zrepl
+    dep ensure
+    go build -o zrepl
+    cp zrepl /usr/local/bin/zrepl
+    rehash
+    # see if it worked
+    zrepl help
+
+Configuration Files
+-------------------
+
+zrepl searches for its main configuration file in the following locations (in that order):
+
+* `/etc/zrepl/zrepl.yml`
+* `/usr/local/etc/zrepl/zrepl.yml`
+
+Alternatively, use CLI flags to specify a config location.
+
+Copy a config from the [tutorial]({{< relref "tutorial/_index.md" >}}) or the `cmd/sampleconf` directory to one of these locations and customize it to your setup.
+
+Runtime Directories
+-------------------
+
+Check the [configuration documentation]({{< relref "configuration/misc.md#runtime-directories-unix-sockets" >}}) for more information.
+For default settings, the following should do the trick.
+
+::
+
+    mkdir -p /var/run/zrepl/stdinserver
+    chmod -R 0700 /var/run/zrepl
+
+
+Running the Daemon
+------------------
+
+All actual work zrepl does is performed by a daemon process.
+
+Logging is configurable via the config file. Please refer to the [logging documentation]({{< relref "configuration/logging.md" >}}).
+
+::
+
+    zrepl daemon
+
+There are no *rc(8)* or *systemd.service(5)* service definitions yet. Note the *daemon(8)* utility on FreeBSD.
+
+.. ATTENTION::
+
+    Make sure to actually monitor the error level output of zrepl: some configuration errors will not make the daemon exit.
+   Example: if the daemon cannot create the [stdinserver]({{< relref "configuration/transports.md#stdinserver" >}}) sockets
+   in the runtime directory, it will emit an error message but not exit, because other tasks such as periodic snapshots & pruning are of equal importance.
+
+Restarting
+~~~~~~~~~~
+
+The daemon handles SIGINT and SIGTERM for graceful shutdown.
+
+Graceful shutdown means at worst that a job will not be rescheduled for the next interval.
+
+The daemon exits as soon as all jobs have reported shut down.
diff --git a/docs/pr.rst b/docs/pr.rst
new file mode 100644
index 0000000..d991674
--- /dev/null
+++ b/docs/pr.rst
@@ -0,0 +1,5 @@
+Talks & Presentations
+=====================
+
+* Talk at the EuroBSDCon2017 FreeBSD DevSummit ([Slides](https://docs.google.com/presentation/d/1EmmeEvOXAWJHCVnOS9-TTsxswbcGKmeLWdY_6BH4w0Q/edit?usp=sharing), [Event](https://wiki.freebsd.org/DevSummit/201709))
+
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
new file mode 100644
index 0000000..5df77bb
--- /dev/null
+++ b/docs/tutorial.rst
@@ -0,0 +1,183 @@
+Tutorial
+========
+
+
+This tutorial shows how zrepl can be used to implement a ZFS-based pull backup.
+We assume the following scenario:
+
+* Production server `app-srv` with filesystems to back up:
+
+  * `zroot/var/db`
+  * `zroot/usr/home` and all its child filesystems
+  * **except** `zroot/usr/home/paranoid`, which belongs to a user who does their own backups
+
+* Backup server `backup-srv` with
+
+  * Filesystem `storage/zrepl/pull/app-srv` and its children, dedicated to backups of `app-srv`
+
+Our backup solution should fulfill the following requirements:
+
+* Periodically snapshot the filesystems on `app-srv` *every 10 minutes*
+* Incrementally replicate these snapshots to `storage/zrepl/pull/app-srv/*` on `backup-srv`
+* Keep only very few snapshots on `app-srv` to save disk space
+* Keep a fading history (24 hourly, 30 daily, 6 monthly) of snapshots on `backup-srv`
+
+Analysis
+--------
+
+We can model this situation as two jobs:
+
+* A **source job** on `app-srv`
+
+  * Creates the snapshots
+  * Keeps a short history of snapshots to enable incremental replication to `backup-srv`
+  * Accepts connections from `backup-srv`
+
+* A **pull job** on `backup-srv`
+
+  * Connects to the `zrepl daemon` process on `app-srv`
+  * Pulls the snapshots to `storage/zrepl/pull/app-srv/*`
+  * Fades out snapshots in `storage/zrepl/pull/app-srv/*` as they age
+
+
+Why doesn't the **pull job** create the snapshots before pulling?
+
+As with any distributed system, the link between `app-srv` and `backup-srv` might be down for an hour or two.
+We do not want to sacrifice our required backup resolution of 10-minute intervals to a temporary connection outage.
+
+When the link comes up again, `backup-srv` will happily catch up the 12 snapshots taken by `app-srv` in the meantime, without
+a gap in our backup history.
+
+Install zrepl
+-------------
+
+Follow the [OS-specific installation instructions]({{< relref "install/_index.md" >}}) and come back here.
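The catch-up behavior described in the analysis above is simple arithmetic: snapshots keep accruing on the source at the configured interval while the link is down, and the pull job fetches all of them on reconnect. A quick illustrative sketch (zrepl performs no such computation itself; the function name is made up for this example):

```python
SNAPSHOT_INTERVAL_MIN = 10  # snapshot interval configured on app-srv

def snapshots_to_catch_up(outage_minutes):
    # Snapshots keep being taken on app-srv while the link is down;
    # the pull job on backup-srv fetches all of them once the link is back.
    return outage_minutes // SNAPSHOT_INTERVAL_MIN

print(snapshots_to_catch_up(120))  # a two-hour outage -> 12 snapshots to catch up
```

This is why snapshotting belongs to the source job: a pull-side snapshot trigger would leave gaps in the history for every outage.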
+
+Configure `backup-srv`
+----------------------
+
+We define a **pull job** named `pull_app-srv` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})::
+
+   jobs:
+   - name: pull_app-srv
+     type: pull
+     connect:
+       type: ssh+stdinserver
+       host: app-srv.example.com
+       user: root
+       port: 22
+       identity_file: /etc/zrepl/ssh/identity
+     interval: 10m
+     mapping: {
+       "<":"storage/zrepl/pull/app-srv"
+     }
+     initial_repl_policy: most_recent
+     snapshot_prefix: zrepl_pull_backup_
+     prune:
+       policy: grid
+       grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
+
+The `connect` section instructs the zrepl daemon to use the `stdinserver` transport:
+`backup-srv` will connect to the specified SSH server and expect `zrepl stdinserver CLIENT_IDENTITY` instead of a shell on the other side.
+
+It uses the private key specified at `connect.identity_file`, which we still need to create::
+
+   cd /etc/zrepl
+   mkdir -p ssh
+   chmod 0700 ssh
+   ssh-keygen -t ed25519 -N '' -f /etc/zrepl/ssh/identity
+
+Note that most use cases do not benefit from separate keypairs per remote endpoint.
+It is thus sufficient to create one keypair and use it for all `connect` directives on one host.
+
+Learn more about [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**pull job** format]({{< relref "configuration/jobs.md#pull" >}}).
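The `grid` spec in the `prune` section above is a pipe-separated list of `NxT` segments, each contributing N buckets of length T. As an illustrative sketch of that arithmetic (this is not zrepl's parser, and it ignores the `(keep=…)` annotation), the following computes how far back the whole grid reaches:

```python
import re

# Seconds per unit, assuming m/h/d suffixes as used in the grid spec above.
UNITS = {"m": 60, "h": 3600, "d": 86400}

def grid_window_seconds(spec):
    """Total retention window covered by a grid spec like '24x1h | 35x1d'."""
    total = 0
    for segment in spec.split("|"):
        m = re.match(r"(\d+)x(\d+)([mhd])", segment.strip())
        if m is None:
            raise ValueError("unrecognized grid segment: %r" % segment)
        count, length, unit = int(m.group(1)), int(m.group(2)), m.group(3)
        total += count * length * UNITS[unit]
    return total

window = grid_window_seconds("1x1h(keep=all) | 24x1h | 35x1d | 6x30d")
print(window // 86400)  # total retention window in days
```

So the example grid retains a fading history stretching back roughly 216 days.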
+
+Configure `app-srv`
+-------------------
+
+We define a corresponding **source job** named `pull_backup` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})
+`zrepl.yml`::
+
+   jobs:
+   - name: pull_backup
+     type: source
+     serve:
+       type: stdinserver
+       client_identity: backup-srv.example.com
+     filesystems: {
+       "zroot/var/db": "ok",
+       "zroot/usr/home<": "ok",
+       "zroot/usr/home/paranoid": "!",
+     }
+     snapshot_prefix: zrepl_pull_backup_
+     interval: 10m
+     prune:
+       policy: grid
+       grid: 1x1d(keep=all)
+
+
+The `serve` section corresponds to the `connect` section in the configuration of `backup-srv`.
+
+We now want to authenticate `backup-srv` before allowing it to pull data.
+This is done by restricting SSH connections from `backup-srv` to executing the `stdinserver` subcommand.
+
+Open `/root/.ssh/authorized_keys` and add either of the following lines::
+
+   # for OpenSSH >= 7.2
+   command="zrepl stdinserver backup-srv.example.com",restrict CLIENT_SSH_KEY
+   # for older OpenSSH versions
+   command="zrepl stdinserver backup-srv.example.com",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY
+
+.. ATTENTION::
+
+   Replace CLIENT_SSH_KEY with the contents of `/etc/zrepl/ssh/identity.pub` from `backup-srv`.
+   Mind the trailing `.pub` in the filename.
+   Each entry **must** be on a single line, including the substituted CLIENT_SSH_KEY.
+
+
+.. HINT::
+
+   You may need to adjust the `PermitRootLogin` option in `/etc/ssh/sshd_config` to `forced-commands-only` or higher for this to work.
+   Refer to sshd_config(5) for details.
+
+The argument `backup-srv.example.com` is the client identity of `backup-srv` as defined in `jobs.serve.client_identity`.
+
+Again, both [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**source job** format]({{< relref "configuration/jobs.md#source" >}}) are documented.
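The `filesystems` map above uses zrepl's map/filter syntax: a pattern with a trailing `<` matches a dataset and its entire subtree, and the `!` action excludes a dataset. A rough sketch of that matching logic (an illustration only, not zrepl's implementation; the longest-pattern-wins tie-breaking shown here is an assumption of this sketch):

```python
def subtree_match(pattern, dataset):
    # A trailing "<" matches the named dataset itself and every descendant.
    if pattern.endswith("<"):
        prefix = pattern[:-1]
        return dataset == prefix or dataset.startswith(prefix + "/")
    return dataset == pattern

def filter_allows(rules, dataset):
    # Assumption for this sketch: the most specific (longest) matching
    # pattern wins; the "!" action marks a dataset as excluded.
    matches = [(len(p), action) for p, action in rules.items()
               if subtree_match(p, dataset)]
    if not matches:
        return False  # unmatched datasets are not exposed
    return max(matches)[1] != "!"

rules = {
    "zroot/var/db": "ok",
    "zroot/usr/home<": "ok",
    "zroot/usr/home/paranoid": "!",
}
print(filter_allows(rules, "zroot/usr/home/alice"))     # included via subtree rule
print(filter_allows(rules, "zroot/usr/home/paranoid"))  # excluded by "!"
```

With these rules, `zroot/usr/home` and its children are exposed to `backup-srv`, except the explicitly excluded `zroot/usr/home/paranoid`.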
+
+Apply Configuration Changes
+---------------------------
+
+We need to restart the zrepl daemon on **both** `app-srv` and `backup-srv`.
+
+This is [OS-specific]({{< relref "install/_index.md#restarting" >}}).
+
+Watch it Work
+-------------
+
+A common setup is to `watch` the log output and the `zfs list` of snapshots on both machines.
+
+If you like tmux, here is a handy script that works on FreeBSD::
+
+   pkg install gnu-watch tmux
+   tmux new-window
+   tmux split-window "tail -f /var/log/zrepl.log"
+   tmux split-window "gnu-watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
+   tmux select-layout tiled
+
+The Linux equivalent might look like this::
+
+   # make sure tmux is installed & let's assume you use systemd + journald
+   tmux new-window
+   tmux split-window "journalctl -f -u zrepl.service"
+   tmux split-window "watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
+   tmux select-layout tiled
+
+Summary
+-------
+
+Congratulations, you have a working pull backup. Where to go next?
+
+* Read more about [configuration format, options & job types]({{< relref "configuration/_index.md" >}})
+* Learn about [implementation details]({{}}) of zrepl.
+