mirror of
https://github.com/zrepl/zrepl.git
synced 2024-12-21 22:51:09 +01:00
docs: initial port of hugo to sphinx, including rtd theme
This commit is contained in:
parent
c3af267f48
commit
df181108b4
@@ -84,7 +84,7 @@ todo_include_todos = True
 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'
 
 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the
13
docs/configuration.rst
Normal file
@@ -0,0 +1,13 @@
*************
Configuration
*************

.. toctree::

   configuration/jobs
   configuration/transports
   configuration/map_filter_syntax
   configuration/prune
   configuration/logging
   configuration/misc
129
docs/configuration/jobs.rst
Normal file
@@ -0,0 +1,129 @@
Job Types
=========

A *job* is the unit of activity tracked by the zrepl daemon and configured in the [configuration file]({{< relref "install/_index.md#configuration-files" >}}).

Every job has a unique `name`, a `type` and type-dependent fields which are documented on this page.

Check out the [tutorial]({{< relref "tutorial/_index.md" >}}) and {{< sampleconflink >}} for examples of how job types are actually used.

.. ATTENTION::

   Currently, zrepl does not replicate filesystem properties.
   When receiving a filesystem, it is never mounted (`-u` flag) and `mountpoint=none` is set.
   This is temporary and being worked on, see {{< zrepl-issue 24 >}}.
Source Job
----------

::

   |Parameter|Default|Description / Example|
   |-----|-------|-------|
   |`type`||`source`|
   |`name`||unique name of the job|
   |`serve`||{{< zrepl-transport "serve transport" >}} specification|
   |`datasets`||{{< zrepl-filter >}} for datasets to expose to client|
   |`snapshot_prefix`||prefix for ZFS snapshots taken by this job|
   |`interval`||snapshotting interval|
   |`prune`||{{< zrepl-prune >}} policy for datasets in `datasets` with prefix `snapshot_prefix`|
* Snapshotting Task (every `interval`, {{% zrepl-job-patient %}})

  1. A snapshot of filesystems matched by `datasets` is taken every `interval` with prefix `snapshot_prefix`.
  2. The `prune` policy is triggered on datasets matched by `datasets` with snapshots matched by `snapshot_prefix`.

* Serve Task

  * Wait for connections from pull jobs using `serve`.

A source job is the counterpart to a [pull job]({{< relref "#pull" >}}).

Note that the prune policy determines the maximum replication lag:
a pull job may stop replication due to link failure, misconfiguration or administrative action.
The source prune policy will eventually destroy the last common snapshot between source and pull job, requiring full replication.
Make sure you read the [prune policy documentation]({{< relref "configuration/prune.md" >}}).

Example: {{< sampleconflink "pullbackup/productionhost.yml" >}}
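Putting the parameter table together, a source job definition might look like the following sketch. All values are made up for this illustration and only the field names follow the parameter table above; see the linked sample config for an authoritative version:

::

   jobs:
   - name: pull_backup
     type: source
     serve:
       type: stdinserver
       client_identity: backup-srv.example.com
     datasets: {
       "zroot/var/db<": "ok",
     }
     snapshot_prefix: zrepl_
     interval: 10m
     prune:
       policy: grid
       grid: 1x1d(keep=all) | 14x1d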
Pull Job
--------

::

   |Parameter|Default|Description / Example|
   |-----|-------|-------|
   |`type`||`pull`|
   |`name`||unique name of the job|
   |`connect`||{{< zrepl-transport "connect transport" >}} specification|
   |`interval`||interval between pull attempts|
   |`mapping`||{{< zrepl-mapping >}} for remote to local filesystems|
   |`initial_repl_policy`|`most_recent`|initial replication policy|
   |`snapshot_prefix`||prefix filter used for replication & pruning|
   |`prune`||{{< zrepl-prune >}} policy for local filesystems reachable by `mapping`|
* Main Task (every `interval`, {{% zrepl-job-patient %}})

  #. A connection to the remote source job is established using the strategy in `connect`.
  #. `mapping` maps filesystems presented by the remote side to local *target filesystems*.
  #. Those remote filesystems with a local *target filesystem* are replicated:

     #. Only snapshots with prefix `snapshot_prefix` are replicated.
     #. If possible, incremental replication takes place.
     #. If the local target filesystem does not exist, `initial_repl_policy` is used.
     #. On conflicts, an error is logged, but replication of the other mapped filesystems continues.

  #. The `prune` policy is triggered for all *target filesystems*.

A pull job is the counterpart to a [source job]({{< relref "#source" >}}).

Example: {{< sampleconflink "pullbackup/backuphost.yml" >}}
Local Job
---------

::

   |Parameter|Default|Description / Example|
   |-----|-------|-------|
   |`type`||`local`|
   |`name`||unique name of the job|
   |`mapping`||{{< zrepl-mapping >}} from source to target filesystem (both local)|
   |`snapshot_prefix`||prefix for ZFS snapshots taken by this job|
   |`interval`||snapshotting & replication interval|
   |`initial_repl_policy`|`most_recent`|initial replication policy|
   |`prune_lhs`||pruning policy on the left-hand side (source)|
   |`prune_rhs`||pruning policy on the right-hand side (target)|
* Main Task (every `interval`, {{% zrepl-job-patient %}})

  1. Evaluate `mapping` for local filesystems; those with a *target filesystem* are called *mapped filesystems*.
  2. Snapshot *mapped filesystems* with `snapshot_prefix`.
  3. Replicate *mapped filesystems* to their respective *target filesystems*:

     1. Only snapshots with prefix `snapshot_prefix` are replicated.
     2. If possible, incremental replication takes place.
     3. If the *target filesystem* does not exist, `initial_repl_policy` is used.
     4. On conflicts, an error is logged, but replication of the other *mapped filesystems* continues.

  4. The `prune_lhs` policy is triggered for all *mapped filesystems*.
  5. The `prune_rhs` policy is triggered for all *target filesystems*.

A local job is a combination of a source & pull job executed on the same machine.

Example: {{< sampleconflink "localbackup/host1.yml" >}}
Terminology
-----------

task
    A job consists of one or more tasks, and a task consists of one or more steps.
    Some tasks may be periodic while others wait for an event to occur.

patient task
    A patient task is supposed to execute its work every `interval`.
    We call the start of the task an *invocation*.

    * If the task completes in less than `interval`, the next invocation starts at `last_invocation + interval`.
    * Otherwise, a patient task

      * logs a warning as soon as the task exceeds its configured `interval`
      * waits for the last invocation to finish
      * logs a warning with the effective task duration
      * immediately starts a new invocation of the task
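The patient-task timing rules can be condensed into a one-line model. This is a sketch of the documented behavior only, not zrepl's implementation:

```python
def next_invocation_start(last_invocation, task_duration, interval):
    """When does the next invocation of a patient task start?

    If the task finishes within `interval`, the next invocation starts
    at last_invocation + interval; otherwise it starts immediately
    after the overlong task finishes (following the logged warnings).
    All arguments are in the same time unit, e.g. seconds.
    """
    finished_at = last_invocation + task_duration
    return max(last_invocation + interval, finished_at)
```

For example, a task started at t=0 that takes 15 time units against a 10-unit interval delays the next invocation to t=15.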
148
docs/configuration/logging.rst
Normal file
@@ -0,0 +1,148 @@
Logging
=======

zrepl uses structured logging to provide users with easily processable log messages.

Configuration
-------------

Logging outlets are configured in the `global` section of the [configuration file]({{< relref "install/_index.md#configuration-files" >}}).
Check out {{< sampleconflink "random/logging.yml" >}} for an example of how to configure multiple outlets:

::

   global:
     logging:

       - outlet: OUTLET_TYPE
         level: MINIMUM_LEVEL
         format: FORMAT

       - outlet: OUTLET_TYPE
         level: MINIMUM_LEVEL
         format: FORMAT

       ...

   jobs: ...
Default Configuration
~~~~~~~~~~~~~~~~~~~~~

By default, the following logging configuration is used:

::

   global:
     logging:

       - outlet: "stdout"
         level: "warn"
         format: "human"

.. ATTENTION::

   Output to **stderr** should always be considered a **critical error**.
   Only errors in the logging infrastructure itself, e.g. IO errors when writing to an outlet, are sent to stderr.
Building Blocks
---------------

The following sections document the semantics of the different log levels, formats and outlet types.

Levels
~~~~~~

::

   | Level | SHORT | Description |
   |-------|-------|-------------|
   |`error`|`ERRO` | immediate action required |
   |`warn` |`WARN` | symptoms of misconfiguration, soon-expected failure, etc. |
   |`info` |`INFO` | explains what happens without too much detail |
   |`debug`|`DEBG` | tracing information, state dumps, etc.; useful for debugging |

Incorrectly classified messages are considered a bug and should be reported.
Formats
~~~~~~~

::

   | Format | Description |
   |--------|-------------|
   |`human` | emphasizes context by putting job, task, step and other context variables into brackets before the actual message, followed by the remaining fields in logfmt style |
   |`logfmt`| [logfmt](https://brandur.org/logfmt) output. zrepl uses [github.com/go-logfmt/logfmt](https://github.com/go-logfmt/logfmt). |
   |`json`  | JSON output. Each line is a valid JSON document. Fields are marshaled by `encoding/json.Marshal()`, which is particularly useful for processing in log aggregation or when processing state dumps. |
Outlets
~~~~~~~

Outlets are ... well ... outlets for log entries into the world.

`stdout`
^^^^^^^^

::

   | Parameter | Default | Comment |
   |-----------|---------|---------|
   |`outlet`   | *none*  | required |
   |`level`    | *none*  | minimum [log level](#levels), required |
   |`format`   | *none*  | output [format](#formats), required |

Writes all log entries with minimum level `level`, formatted by `format`, to stdout.

Can only be specified once.
`syslog`
^^^^^^^^

::

   | Parameter | Default | Comment |
   |-----------|---------|---------|
   |`outlet`   | *none*  | required |
   |`level`    | *none*  | minimum [log level](#levels), required, usually `debug` |
   |`format`   | *none*  | output [format](#formats), required |
   |`retry_interval`| 0  | interval between reconnection attempts to syslog |

Writes all log entries formatted by `format` to syslog.
On normal setups, you should not need to change the `retry_interval`.

Can only be specified once.
`tcp`
^^^^^

::

   | Parameter | Default | Comment |
   |-----------|---------|---------|
   |`outlet`   | *none*  | required |
   |`level`    | *none*  | minimum [log level](#levels), required |
   |`format`   | *none*  | output [format](#formats), required |
   |`net`      | *none*  | `tcp` in most cases |
   |`address`  | *none*  | remote network address, e.g. `logs.example.com:10202` |
   |`retry_interval`| *none* | interval between reconnection attempts to `address` |
   |`tls`      | *none*  | TLS config (see below) |

Establishes a TCP connection to `address` and sends log messages with minimum level `level`, formatted by `format`.

If `tls` is not specified, an unencrypted connection is established.

If `tls` is specified, the TCP connection is secured with TLS + client authentication.
This is particularly useful in combination with log aggregation services that run on another machine.

::

   | Parameter | Description |
   |-----------|-------------|
   |`ca`  | PEM-encoded certificate authority that signed the remote server's TLS certificate |
   |`cert`| PEM-encoded client certificate identifying this zrepl daemon toward the remote server |
   |`key` | PEM-encoded, unencrypted client private key identifying this zrepl daemon toward the remote server |

.. NOTE::

   zrepl uses Go's `crypto/tls` and `crypto/x509` packages and leaves all but the required fields in `tls.Config` at their default values.
   In case of a security defect in these packages, zrepl has to be rebuilt because Go binaries are statically linked.
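Putting the two tables above together, a TCP outlet with TLS might be configured like the following sketch. The address is taken from the table; the certificate file paths are invented for illustration:

::

   global:
     logging:
       - outlet: tcp
         level: debug
         format: json
         net: tcp
         address: logs.example.com:10202
         retry_interval: 10s
         tls:
           ca: /etc/zrepl/logs-ca.crt
           cert: /etc/zrepl/logs-client.crt
           key: /etc/zrepl/logs-client.key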
101
docs/configuration/map_filter_syntax.rst
Normal file
@@ -0,0 +1,101 @@
Mapping & Filter Syntax
=======================

For various job types, a filesystem `mapping` or `filter` needs to be specified.

Both have in common that they take a filesystem path (in the ZFS filesystem hierarchy) as a parameter and return something.
Mappings return a *target filesystem* and filters return a *filter result*.

The pattern syntax is the same for mappings and filters and is documented in the following section.

Common Pattern Syntax
---------------------

A mapping / filter is specified as a **YAML dictionary** with patterns as keys and results as values.
The following rules determine which result is chosen for a given filesystem path:

* More specific path patterns win over less specific ones
* Non-wildcard patterns (full path patterns) win over *subtree wildcards* (`<` at end of pattern)

The **subtree wildcard** `<` means "*the dataset left of `<` and all its children*".
Example
~~~~~~~

::

   # Rule number and its pattern
   1: tank<          # tank and all its children
   2: tank/foo/bar   # full path pattern (no wildcard)
   3: tank/foo<      # tank/foo and all its children

   # Which rule applies to a given path?
   tank/foo/bar/loo => 3
   tank/bar         => 1
   tank/foo/bar     => 2
   zroot            => NO MATCH
   tank/var/log     => 1
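The precedence rules can be captured in a short sketch. This models the documented rules only; it is not zrepl's actual matching code:

```python
def match(rules, path):
    """Return the result of the most specific matching pattern, or None.

    Longer prefixes are more specific, and a full-path (non-wildcard)
    pattern beats a subtree wildcard of the same length.
    """
    best_key, best_value = None, None
    for pattern, value in rules.items():
        if pattern.endswith("<"):
            # subtree wildcard: the dataset left of '<' and all its children
            prefix = pattern[:-1]
            if prefix == "" or path == prefix or path.startswith(prefix + "/"):
                key = (len(prefix), 0)  # 0: wildcard loses ties to exact
            else:
                continue
        elif path == pattern:
            key = (len(pattern), 1)  # 1: exact match wins ties
        else:
            continue
        if best_key is None or key > best_key:
            best_key, best_value = key, value
    return best_value

# the example rules from above
rules = {"tank<": 1, "tank/foo/bar": 2, "tank/foo<": 3}
```

Running the documented example paths through `match` reproduces the rule numbers shown above, e.g. `match(rules, "tank/foo/bar/loo")` yields `3`.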
Mappings
--------

Mappings map a *source filesystem path* to a *target filesystem path*.
Per pattern, either a target filesystem path or `"!"` is specified as the result.

* If no pattern matches, there exists no target filesystem (`NO MATCH`).
* If the result is `"!"`, there exists no target filesystem (`NO MATCH`).
* If the pattern is a non-wildcard pattern, the source path is mapped to the target path on the right.
* If the pattern ends with a *subtree wildcard* (`<`), the source path is **prefix-trimmed** by the path specified left of `<`.

  * Note: this means that only for *wildcard-only* patterns (pattern=`<`) is the source path simply appended to the target path.

The following example is from the {{< sampleconflink "localbackup/host1.yml" >}} example config.
::

   jobs:
   - name: mirror_local
     type: local
     mapping: {
       "zroot/var/db<":           "storage/backups/local/zroot/var/db",
       "zroot/usr/home<":         "storage/backups/local/zroot/usr/home",
       "zroot/usr/home/paranoid": "!", # don't backup paranoid user
       "zroot/poudriere/ports<":  "!", # don't backup the ports trees
     }
   ...

Results in the following mappings:

::

   zroot/var/db                 => storage/backups/local/zroot/var/db
   zroot/var/db/a/child         => storage/backups/local/zroot/var/db/a/child
   zroot/usr/home               => storage/backups/local/zroot/usr/home
   zroot/usr/home/paranoid      => NOT MAPPED
   zroot/usr/home/bob           => storage/backups/local/zroot/usr/home/bob
   zroot/usr/src                => NOT MAPPED
   zroot/poudriere/ports/2017Q3 => NOT MAPPED
   zroot/poudriere/ports/HEAD   => NOT MAPPED
Filters
-------

Valid filter results: `ok` or `!`.

The example below shows the source job from the [tutorial]({{< relref "tutorial/_index.md#configure-app-srv" >}}):

The client is allowed access to `zroot/var/db` and `zroot/usr/home` + children, except `zroot/usr/home/paranoid`.

::

   jobs:
   - name: pull_backup
     type: source
     ...
     filesystems: {
       "zroot/var/db": "ok",
       "zroot/usr/home<": "ok",
       "zroot/usr/home/paranoid": "!",
     }
   ...
61
docs/configuration/misc.rst
Normal file
@@ -0,0 +1,61 @@
Miscellaneous
=============

Runtime Directories & UNIX Sockets
----------------------------------

The zrepl daemon creates various UNIX sockets to allow communicating with it:

* the `stdinserver` transport connects to a socket named after the `client_identity` parameter
* the `control` subcommand connects to a defined control socket

There is no further authentication on these sockets.
Therefore, we have to make sure they can only be created and accessed by `zrepl daemon`.

In fact, `zrepl daemon` will not bind a socket to a path in a world-accessible directory.

The directories can be configured in the main configuration file:

::

   global:
     control:
       sockpath: /var/run/zrepl/control
     serve:
       stdinserver:
         sockdir: /var/run/zrepl/stdinserver
Durations & Intervals
---------------------

Interval & duration fields in job definitions, pruning configurations, etc. must match the following regex:

::

   var durationStringRegex *regexp.Regexp = regexp.MustCompile(`^\s*(\d+)\s*(s|m|h|d|w)\s*$`)
   // s = second, m = minute, h = hour, d = day, w = week (7 days)
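For illustration, the same check transcribed to Python; the unit-to-seconds mapping follows the comment above, while zrepl's own parsing code may differ in details:

```python
import re

# the regex from the zrepl source above, transcribed to Python
DURATION_RE = re.compile(r"^\s*(\d+)\s*(s|m|h|d|w)\s*$")

# s = second, m = minute, h = hour, d = day, w = week (7 days)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_duration(s):
    """Parse a zrepl-style duration string into seconds; raise on invalid input."""
    m = DURATION_RE.match(s)
    if m is None:
        raise ValueError("invalid duration: %r" % s)
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]
```

For example, `parse_duration("10m")` returns `600`; whitespace around the number and unit is tolerated, just as in the regex.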
Super-Verbose Job Debugging
---------------------------

You have probably landed here because you opened an issue on GitHub and a developer told you to do this...
So just read the annotated comments ;)

::

   jobs:
   - name: ...
     ...
     # JOB DEBUGGING OPTIONS
     # should be equal for all job types, but each job implements the debugging itself
     debug:
       conn: # debug the io.ReadWriteCloser connection
         read_dump: /tmp/connlog_read   # dump results of Read() invocations to this file
         write_dump: /tmp/connlog_write # dump results of Write() invocations to this file
       rpc: # debug the RPC protocol implementation
         log: true # log output from the rpc layer to the job log

.. ATTENTION::

   Connection dumps will almost certainly contain your or others' private data. Do not share them in a bug report.
59
docs/configuration/prune.rst
Normal file
@@ -0,0 +1,59 @@
Snapshot Pruning
================

In zrepl, *pruning* means *destroying snapshots by some policy*.

A *pruning policy* takes a list of snapshots and - for each snapshot - decides whether it should be kept or destroyed.

The job context defines which snapshots are even considered for pruning, for example through the `snapshot_prefix` variable.
Check the [job definition]({{< relref "configuration/jobs.md" >}}) for details.

Currently, the retention grid is the only supported pruning policy.
Retention Grid
--------------

::

   jobs:
   - name: pull_app-srv
     ...
     prune:
       policy: grid
       grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
             │                │
             │                └─ 24 adjacent one-hour intervals
             └─ one hour interval
The retention grid can be thought of as a time-based sieve:

The `grid` field specifies a list of adjacent time intervals:
the left edge of the leftmost (first) interval is the `creation` date of the youngest snapshot.
All intervals to its right describe time intervals further in the past.

Each interval carries a maximum number of snapshots to keep.
It is specified via `(keep=N)`, where `N` is either `all` (all snapshots are kept) or a positive integer.
The default value is **1**.

The following procedure happens during pruning:

1. The list of snapshots eligible for pruning is sorted by `creation`.
2. The left edge of the first interval is aligned to the `creation` date of the youngest snapshot.
3. A list of buckets is created, one for each interval.
4. The list of snapshots is split up into the buckets.
5. For each bucket:

   1. The contained snapshot list is sorted by `creation`.
   2. Snapshots from the list, oldest first, are destroyed until the specified `keep` count is reached.
   3. All remaining snapshots on the list are kept.
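The procedure above can be sketched in a few lines. This is a simplified illustration of the bucket-splitting logic, not zrepl's implementation; in particular, the handling of snapshots at exact interval boundaries and of snapshots older than the whole grid is an assumption here:

```python
import bisect

def grid_prune(snapshot_times, intervals):
    """Return creation times of snapshots a grid policy would destroy.

    snapshot_times: `creation` timestamps in seconds.
    intervals: (length_seconds, keep) pairs, left to right;
    keep=None stands for keep=all.
    """
    if not snapshot_times:
        return []
    youngest = max(snapshot_times)
    # right edges of the buckets, as age relative to the youngest snapshot
    edges, acc = [], 0
    for length, _ in intervals:
        acc += length
        edges.append(acc)
    buckets = [[] for _ in intervals]
    for t in snapshot_times:
        i = bisect.bisect_left(edges, youngest - t)
        if i < len(buckets):  # snapshots older than the grid are left alone here
            buckets[i].append(t)
    destroy = []
    for (_, keep), bucket in zip(intervals, buckets):
        if keep is None:
            continue  # keep=all
        bucket.sort()
        excess = len(bucket) - keep
        if excess > 0:
            destroy.extend(bucket[:excess])  # destroy oldest first
    return destroy
```

With a two-interval grid `[(10, None), (10, 1)]` and snapshots at times 0, 5, 12, 15 and 20, the first bucket keeps everything and the second bucket keeps only its newest snapshot, so only the snapshot at time 0 is destroyed.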
.. ATTENTION::

   The configuration of the first interval (`1x1h(keep=all)` in the example) determines the **maximum allowable replication lag** between source and destination.
   After the first interval, source and destination likely have different retention settings.
   This means source and destination may prune different snapshots, prohibiting incremental replication from snapshots that are not in the first interval.

   **Always** configure the first interval to **`1x?(keep=all)`**, substituting `?` with the maximum time replication may fail due to downtimes, maintenance, connectivity issues, etc.
   After outages longer than `?`, you may be required to perform a **full replication** again.
103
docs/configuration/transports.rst
Normal file
@@ -0,0 +1,103 @@
.. highlight:: bash

Transports
==========

A transport provides an authenticated [`io.ReadWriteCloser`](https://golang.org/pkg/io/#ReadWriteCloser) to the RPC layer.
(An `io.ReadWriteCloser` is essentially a bidirectional, reliable communication channel.)

Currently, only the `ssh+stdinserver` transport is supported.

`ssh+stdinserver`
-----------------

The way the `ssh+stdinserver` transport works is inspired by [git shell](https://git-scm.com/docs/git-shell) and [Borg Backup](https://borgbackup.readthedocs.io/en/stable/deployment.html).
It is implemented in the Go package `github.com/zrepl/zrepl/sshbytestream`.
The config excerpts below are taken from the [tutorial]({{< relref "tutorial/_index.md" >}}), which you should complete before reading further.
`serve`
~~~~~~~

::

   jobs:
   - name: pull_backup
     type: source
     serve:
       type: stdinserver
       client_identity: backup-srv.example.com
     ...

The serving job opens a UNIX socket named after `client_identity` in the runtime directory, e.g. `/var/run/zrepl/stdinserver/backup-srv.example.com`.

On the same machine, the :code:`zrepl stdinserver $client_identity` command connects to that socket.
For example, `zrepl stdinserver backup-srv.example.com` connects to the UNIX socket `/var/run/zrepl/stdinserver/backup-srv.example.com`.

It then passes its stdin and stdout file descriptors to the zrepl daemon via *cmsg(3)*.
zrepl daemon in turn combines them into an `io.ReadWriteCloser`:
a `Write()` turns into a write to stdout, a `Read()` turns into a read from stdin.

Interactive use of the `stdinserver` subcommand does not make much sense.
However, we can force its execution when a user with a particular SSH pubkey connects via SSH.
This can be achieved with an entry in the `authorized_keys` file of the serving zrepl daemon.
::

   # for OpenSSH >= 7.2
   command="zrepl stdinserver CLIENT_IDENTITY",restrict CLIENT_SSH_KEY
   # for older OpenSSH versions
   command="zrepl stdinserver CLIENT_IDENTITY",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY

* CLIENT_IDENTITY is substituted with `backup-srv.example.com` in our example
* CLIENT_SSH_KEY is substituted with the public part of the SSH keypair specified in the `connect` directive on the connecting host

.. NOTE::

   You may need to adjust the `PermitRootLogin` option in `/etc/ssh/sshd_config` to `forced-commands-only` or higher for this to work.
   Refer to sshd_config(5) for details.

To recap, this is how client authentication works with the `ssh+stdinserver` transport:

* Connections to the `client_identity` UNIX socket are blindly trusted by zrepl daemon.
* Thus, the runtime directory must be private to the zrepl user (checked by zrepl daemon).
* The admin of the host with the serving zrepl daemon controls the `authorized_keys` file.
* Thus, the administrator controls the mapping `PUBKEY -> CLIENT_IDENTITY`.
`connect`
~~~~~~~~~

::

   jobs:
   - name: pull_app-srv
     type: pull
     connect:
       type: ssh+stdinserver
       host: app-srv.example.com
       user: root
       port: 22
       identity_file: /etc/zrepl/ssh/identity
       options: # optional
       - "Compression=on"

The connecting zrepl daemon

1. creates a pipe,
2. forks,
3. in the forked process

   1. replaces the forked stdin and stdout with the corresponding pipe ends,
   2. executes the `ssh` binary found in `$PATH`:

      1. the identity file (`-i`) is set to `$identity_file`,
      2. the remote user, host and port correspond to those configured,
      3. further options can be specified using the `options` field, which appends each entry in the list to the command line using `-o $entry`,

4. wraps the pipe ends in an `io.ReadWriteCloser` and uses it for RPC.
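Given the `connect` configuration above, the resulting `ssh` invocation is roughly equivalent to the following sketch; it is reconstructed from the steps listed, not taken from zrepl's source:

::

   ssh -i /etc/zrepl/ssh/identity -p 22 -o Compression=on root@app-srv.example.com

On the remote host, the forced command in `authorized_keys` then runs `zrepl stdinserver $client_identity`, wiring the SSH session to the serving daemon's UNIX socket.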
As discussed in the section above, the connecting zrepl daemon expects that `zrepl stdinserver $client_identity` is executed automatically via an `authorized_keys` file entry.

.. NOTE::

   The environment variables of the underlying SSH process are cleared. `$SSH_AUTH_SOCK` will not be available.
   It is suggested to create a separate, unencrypted SSH key solely for that purpose.
58
docs/implementation.rst
Normal file
@@ -0,0 +1,58 @@
Implementation Overview
=======================

.. WARNING::

   Incomplete / under construction

The following design aspects may convince you that `zrepl` is superior to a hacked-together shell script solution.

Testability & Performance
-------------------------

zrepl is written in Go, a real programming language with type safety,
reasonable performance, testing infrastructure and an (opinionated) idea of
software engineering.

* key parts & algorithms of zrepl are covered by unit tests (work in progress)
* zrepl is noticeably faster than comparable shell scripts

RPC protocol
------------

While it is tempting to just issue a few `ssh remote 'zfs send ...' | zfs recv`, this has a number of drawbacks:

* The snapshot streams need to be compatible.
* Communication is still unidirectional. Thus, you will most likely

  * either not take advantage of features such as *compressed send & recv*,
  * or issue additional `ssh` commands in advance to figure out what features are supported on the other side.

* Advanced logic in shell scripts is ugly to read, poorly testable and a pain to maintain.

zrepl takes a different approach:

* Define an RPC protocol.
* Establish an encrypted, authenticated, bidirectional communication channel...
* ... with zrepl running at both ends of it.

This has several obvious benefits:

* No blank root shell access is given to the other side.
* Instead, an *authenticated* peer can *request* filesystem lists, snapshot streams, etc.
* Requests are then checked against job-specific ACLs, limiting a client to the filesystems it is actually allowed to replicate.
* The {{< zrepl-transport "transport mechanism" >}} is decoupled from the remaining logic, keeping it extensible.

Protocol Implementation
~~~~~~~~~~~~~~~~~~~~~~~

zrepl implements its own RPC protocol.
This is mostly due to the fact that existing solutions do not provide efficient means to transport large amounts of data.

Package [`github.com/zrepl/zrepl/rpc`](https://github.com/zrepl/zrepl/tree/master/rpc) builds special-case handling around returning an `io.Reader` as part of a unary RPC call.

Measurements show that only a single memory-to-memory copy of a snapshot stream is made using `github.com/zrepl/zrepl/rpc`, and there is still potential for further optimizations.

Logging & Transparency
----------------------

zrepl comes with [rich, structured and configurable logging]({{< relref "configuration/logging.md" >}}), allowing administrators to understand what the software is actually doing.
@ -3,18 +3,68 @@
|
||||
You can adapt this file completely to your liking, but it should at least
|
||||
contain the root `toctree` directive.
|
||||
|
||||
Welcome to zrepl's documentation!
|
||||
=================================
|
||||
zrepl - ZFS replication
|
||||
-----------------------
|
||||
|
||||
.. ATTENTION::
|
||||
zrepl as well as this documentation is still under active development.
|
||||
It is neither feature complete nor is there a stability guarantee on the configuration format.
|
||||
Use & test at your own risk ;)
|
||||
|
||||
Getting started
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The [5 minute tutorial setup]({{< relref "tutorial/_index.md" >}}) gives you a first impression.
|
||||
|
||||
Main Features
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
* Filesystem Replication
|
||||
* [x] Local & Remote
|
||||
* [x] Pull mode
|
||||
* [ ] Push mode
|
||||
* [x] Access control checks when pulling datasets
|
||||
* [x] [Flexible mapping]({{< ref "configuration/map_filter_syntax.md" >}}) rules
|
||||
* [x] Bookmarks support
|
||||
* [ ] Feature-negotiation for
|
||||
* Resumable `send & receive`
|
||||
* Compressed `send & receive`
|
||||
* Raw encrypted `send & receive` (as soon as it is available)
|
||||
* Automatic snapshot creation
|
||||
* [x] Ensure fixed time interval between snapshots
|
||||
* Automatic snapshot [pruning]({{< ref "configuration/prune.md" >}})
|
||||
* [x] Age-based fading (grandfathering scheme)
|
||||
* Flexible, detailed & structured [logging]({{< ref "configuration/logging.md" >}})
|
||||
* [x] `human`, `logfmt` and `json` formatting
|
||||
* [x] stdout, syslog and TCP (+TLS client auth) outlets
|
||||
* Maintainable implementation in Go
|
||||
* [x] Cross platform
|
||||
* [x] Type safe & testable code
|
||||
|
||||
Contributing
|
||||
~~~~~~~~~~~~
|
||||
|
||||
We are happy about any help we can get!
|
||||
|
||||
* Explore the codebase
|
||||
* These docs live in the `docs/` subdirectory
|
||||
* Document any non-obvious / confusing / plain broken behavior you encounter when setting up zrepl for the first time
|
||||
* Check the *Issues* and *Projects* sections for things to do
|
||||
|
||||
{{% panel header="<i class='fa fa-github'></i> Development Workflow"%}}
|
||||
[The <i class='fa fa-github'></i> GitHub repository](https://github.com/zrepl/zrepl) is where all development happens.<br />
|
||||
Make sure to read the [Developer Documentation section](https://github.com/zrepl/zrepl) and open new issues or pull requests there.
|
||||
{{% /panel %}}
|
||||
|
||||
Table of Contents
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Contents:
|
||||
|
||||
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
||||
tutorial
|
||||
installation
|
||||
configuration
|
||||
implementation
|
||||
pr
|
||||
|
88
docs/installation.rst
Normal file
@ -0,0 +1,88 @@
Installation
============

.. TIP::

    Check out the [tutorial]({{< relref "tutorial/_index.md" >}}) if you want a first impression of zrepl.

User Privileges
---------------

It is possible to run zrepl as an unprivileged user in combination with
[ZFS delegation](https://www.freebsd.org/doc/handbook/zfs-zfs-allow.html).

There is also the possibility to run it in a jail on FreeBSD by delegating a dataset to the jail.

However, until we get around to documenting those setups, you will have to run zrepl as root or experiment yourself :)

Installation
------------

zrepl is currently not packaged for any operating system. Signed & versioned releases are planned but not available yet.

Check out the sources yourself, fetch dependencies using `dep`, compile, and install the binary to the zrepl user's `$PATH`.

**Note**: if the zrepl binary is not in `$PATH`, you will have to adjust the examples in the [tutorial]({{< relref "tutorial/_index.md" >}}).

::

    # NOTE: you may want to check out & build as an unprivileged user
    cd /root
    git clone https://github.com/zrepl/zrepl.git
    cd zrepl
    dep ensure
    go build -o zrepl
    cp zrepl /usr/local/bin/zrepl
    rehash
    # see if it worked
    zrepl help

Configuration Files
-------------------

zrepl searches for its main configuration file in the following locations (in that order):

* `/etc/zrepl/zrepl.yml`
* `/usr/local/etc/zrepl/zrepl.yml`

Alternatively, use CLI flags to specify a config location.

Copy a config from the [tutorial]({{< relref "tutorial/_index.md" >}}) or the `cmd/sampleconf` directory to one of these locations and customize it to your setup.

Runtime Directories
-------------------

Check the [configuration documentation]({{< relref "configuration/misc.md#runtime-directories-unix-sockets" >}}) for more information.
For default settings, the following should do the trick:

::

    mkdir -p /var/run/zrepl/stdinserver
    chmod -R 0700 /var/run/zrepl

Running the Daemon
------------------

All actual work zrepl does is performed by a daemon process.

Logging is configurable via the config file. Please refer to the [logging documentation]({{< relref "configuration/logging.md" >}}).

::

    zrepl daemon

There are no *rc(8)* or *systemd.service(5)* service definitions yet. Note the *daemon(8)* utility on FreeBSD.

.. ATTENTION::

    Make sure to actually monitor the error-level output of zrepl: some configuration errors will not make the daemon exit.
    Example: if the daemon cannot create the [stdinserver]({{< relref "configuration/transports.md#stdinserver" >}}) sockets
    in the runtime directory, it will emit an error message but not exit, because other tasks such as periodic snapshots & pruning are of equal importance.

Restarting
~~~~~~~~~~

The daemon handles SIGINT and SIGTERM for graceful shutdown.

Graceful shutdown means, at worst, that a job will not be rescheduled for the next interval.

The daemon exits as soon as all jobs have reported shut down.
5
docs/pr.rst
Normal file
@ -0,0 +1,5 @@
Talks & Presentations
=====================

* Talk at EuroBSDCon2017 FreeBSD DevSummit ([Slides](https://docs.google.com/presentation/d/1EmmeEvOXAWJHCVnOS9-TTsxswbcGKmeLWdY_6BH4w0Q/edit?usp=sharing), [Event](https://wiki.freebsd.org/DevSummit/201709))

183
docs/tutorial.rst
Normal file
@ -0,0 +1,183 @@
Tutorial
========

This tutorial shows how zrepl can be used to implement a ZFS-based pull backup.
We assume the following scenario:

* Production server `app-srv` with filesystems to back up:

  * `zroot/var/db`
  * `zroot/usr/home` and all its child filesystems
  * **except** `zroot/usr/home/paranoid`, belonging to a user doing backups themselves

* Backup server `backup-srv` with

  * filesystem `storage/zrepl/pull/app-srv` + children dedicated to backups of `app-srv`

Our backup solution should fulfill the following requirements:

* Periodically snapshot the filesystems on `app-srv` *every 10 minutes*
* Incrementally replicate these snapshots to `storage/zrepl/pull/app-srv/*` on `backup-srv`
* Keep only very few snapshots on `app-srv` to save disk space
* Keep a fading history (24 hourly, 30 daily, 6 monthly) of snapshots on `backup-srv`

Analysis
--------

We can model this situation as two jobs:

* A **source job** on `app-srv`

  * Creates the snapshots
  * Keeps a short history of snapshots to enable incremental replication to `backup-srv`
  * Accepts connections from `backup-srv`

* A **pull job** on `backup-srv`

  * Connects to the `zrepl daemon` process on `app-srv`
  * Pulls the snapshots to `storage/zrepl/pull/app-srv/*`
  * Fades out snapshots in `storage/zrepl/pull/app-srv/*` as they age

Why doesn't the **pull job** create the snapshots before pulling?

As is the case with all distributed systems, the link between `app-srv` and `backup-srv` might be down for an hour or two.
We do not want to sacrifice our required backup resolution of 10-minute intervals to a temporary connection outage.

When the link comes up again, `backup-srv` will happily catch up on the 12 snapshots taken by `app-srv` in the meantime (two hours at a 10-minute interval), without
a gap in our backup history.

Install zrepl
-------------

Follow the [OS-specific installation instructions]({{< relref "install/_index.md" >}}) and come back here.

Configure `backup-srv`
----------------------

We define a **pull job** named `pull_app-srv` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})::

    jobs:
    - name: pull_app-srv
      type: pull
      connect:
        type: ssh+stdinserver
        host: app-srv.example.com
        user: root
        port: 22
        identity_file: /etc/zrepl/ssh/identity
      interval: 10m
      mapping: {
        "<":"storage/zrepl/pull/app-srv"
      }
      initial_repl_policy: most_recent
      snapshot_prefix: zrepl_pull_backup_
      prune:
        policy: grid
        grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
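As a rough mental model for the `grid` retention policy (an illustrative sketch, not zrepl's actual implementation): time before *now* is divided into consecutive buckets of the given lengths, and within each bucket only the newest snapshot survives.

```python
from datetime import datetime, timedelta

def grid_keep(snapshot_times, now, buckets):
    """buckets: list of (bucket_length, count) pairs, e.g. 24 one-hour
    buckets. Keeps the newest snapshot per bucket; snapshots older than
    the whole grid are dropped."""
    keep = set()
    edge = now
    for length, count in buckets:
        for _ in range(count):
            lo, hi = edge - length, edge
            in_bucket = [t for t in snapshot_times if lo < t <= hi]
            if in_bucket:
                keep.add(max(in_bucket))  # newest snapshot in the bucket survives
            edge = lo
    return keep

now = datetime(2017, 10, 1)
# snapshots every 10 minutes over the last two hours
snaps = [now - timedelta(minutes=10 * i) for i in range(12)]
# sketch of the "24x1h" part of the spec: 24 one-hour buckets
kept = grid_keep(snaps, now, [(timedelta(hours=1), 24)])
print(len(kept))  # 2: one snapshot survives per populated one-hour bucket
```

This shows the fading effect: dense recent snapshots thin out as they age into coarser buckets.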

The `connect` section instructs the zrepl daemon to use the `stdinserver` transport:
`backup-srv` will connect to the specified SSH server and expect `zrepl stdinserver CLIENT_IDENTITY` instead of a shell on the other side.

It uses the private key specified at `connect.identity_file`, which we still need to create::

    cd /etc/zrepl
    mkdir -p ssh
    chmod 0700 ssh
    ssh-keygen -t ed25519 -N '' -f /etc/zrepl/ssh/identity

Note that most use cases do not benefit from separate keypairs per remote endpoint.
It is thus sufficient to create one keypair and use it for all `connect` directives on one host.

Learn more about [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**pull job** format]({{< relref "configuration/jobs.md#pull" >}}).

Configure `app-srv`
-------------------

We define a corresponding **source job** named `pull_backup` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})
`zrepl.yml`::

    jobs:
    - name: pull_backup
      type: source
      serve:
        type: stdinserver
        client_identity: backup-srv.example.com
      filesystems: {
        "zroot/var/db": "ok",
        "zroot/usr/home<": "ok",
        "zroot/usr/home/paranoid": "!",
      }
      snapshot_prefix: zrepl_pull_backup_
      interval: 10m
      prune:
        policy: grid
        grid: 1x1d(keep=all)
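The `filesystems` map above uses zrepl's filter syntax: a trailing `<` matches a filesystem and its entire subtree, and the most specific matching pattern decides whether a dataset is included (`"ok"`) or excluded (`"!"`). A rough sketch of that matching rule (illustrative only, with a made-up helper name, not zrepl's code):

```python
def filter_allows(fs, rules):
    """Most specific matching pattern wins; '!' denies, 'ok' allows."""
    best = None  # (specificity, action)
    for pattern, action in rules.items():
        if pattern.endswith("<"):
            base = pattern[:-1]
            # subtree wildcard: matches base itself and all child datasets
            if fs == base or fs.startswith(base + "/"):
                match_len = len(base)
            else:
                continue
        else:
            if fs != pattern:
                continue
            match_len = len(pattern) + 1  # exact match beats a same-base wildcard
        if best is None or match_len > best[0]:
            best = (match_len, action)
    return best is not None and best[1] == "ok"

rules = {
    "zroot/var/db": "ok",
    "zroot/usr/home<": "ok",
    "zroot/usr/home/paranoid": "!",
}
print(filter_allows("zroot/var/db", rules))             # True
print(filter_allows("zroot/usr/home/alice", rules))     # True
print(filter_allows("zroot/usr/home/paranoid", rules))  # False
print(filter_allows("zroot/tmp", rules))                # False
```

This is why the `paranoid` dataset stays excluded even though its parent's subtree is allowed: the exact pattern is more specific than the wildcard.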

The `serve` section corresponds to the `connect` section in the configuration of `backup-srv`.

We now want to authenticate `backup-srv` before allowing it to pull data.
This is done by limiting SSH connections from `backup-srv` to executing the `stdinserver` subcommand.

Open `/root/.ssh/authorized_keys` and add either of the following lines::

    # for OpenSSH >= 7.2
    command="zrepl stdinserver backup-srv.example.com",restrict CLIENT_SSH_KEY
    # for older OpenSSH versions
    command="zrepl stdinserver backup-srv.example.com",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY

.. ATTENTION::

    Replace CLIENT_SSH_KEY with the contents of `/etc/zrepl/ssh/identity.pub` from `app-srv`.
    Mind the trailing `.pub` in the filename.
    The entries **must** be on a single line, including the replaced CLIENT_SSH_KEY.

.. HINT::

    You may need to adjust the `PermitRootLogin` option in `/etc/ssh/sshd_config` to `forced-commands-only` or higher for this to work.
    Refer to sshd_config(5) for details.

The argument `backup-srv.example.com` is the client identity of `backup-srv` as defined in `jobs.serve.client_identity`.

Again, both [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**source job** format]({{< relref "configuration/jobs.md#source" >}}) are documented.

Apply Configuration Changes
---------------------------

We need to restart the zrepl daemon on **both** `app-srv` and `backup-srv`.

This is [OS-specific]({{< relref "install/_index.md#restarting" >}}).

Watch it Work
-------------

A common setup is to `watch` the log output and the `zfs list` of snapshots on both machines.

If you like tmux, here is a handy script that works on FreeBSD::

    pkg install gnu-watch tmux
    tmux new-window
    tmux split-window "tail -f /var/log/zrepl.log"
    tmux split-window "gnu-watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled

The Linux equivalent might look like this::

    # make sure tmux is installed & let's assume you use systemd + journald
    tmux new-window
    tmux split-window "journalctl -f -u zrepl.service"
    tmux split-window "watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled

Summary
-------

Congratulations, you have a working pull backup. Where to go next?

* Read more about [configuration format, options & job types]({{< relref "configuration/_index.md" >}})
* Learn about the [implementation details]({{< relref "impl/_index.md" >}}) of zrepl.
