docs: initial port of hugo to sphinx, including rtd theme

Christian Schwarz 2017-11-09 20:33:09 +01:00
parent c3af267f48
commit df181108b4
13 changed files with 1009 additions and 11 deletions


@ -84,7 +84,7 @@ todo_include_todos = True
 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'
 
 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the

docs/configuration.rst Normal file

@ -0,0 +1,13 @@
*************
Configuration
*************

.. toctree::

   configuration/jobs
   configuration/transports
   configuration/map_filter_syntax
   configuration/prune
   configuration/logging
   configuration/misc

docs/configuration/jobs.rst Normal file

@ -0,0 +1,129 @@
Job Types
=========

A *job* is the unit of activity tracked by the zrepl daemon and configured in the :doc:`configuration file </installation>`.
Every job has a unique ``name``, a ``type`` and type-dependent fields, which are documented on this page.
Check out the :doc:`tutorial </tutorial>` and the sample configurations in ``cmd/sampleconf`` for examples of how job types are actually used.

.. ATTENTION::

   Currently, zrepl does not replicate filesystem properties.
   When receiving a filesystem, it is never mounted (``-u`` flag) and ``mountpoint=none`` is set.
   This is temporary and being worked on (see `issue #24 <https://github.com/zrepl/zrepl/issues/24>`_).


Source Job
----------

=================== ======= =====================
Parameter           Default Description / Example
=================== ======= =====================
``type``                    ``source``
``name``                    unique name of the job
``serve``                   :doc:`serve transport </configuration/transports>` specification
``datasets``                :doc:`filter </configuration/map_filter_syntax>` for datasets to expose to client
``snapshot_prefix``         prefix for ZFS snapshots taken by this job
``interval``                snapshotting interval
``prune``                   :doc:`prune </configuration/prune>` policy for datasets in ``datasets`` with prefix ``snapshot_prefix``
=================== ======= =====================

* Snapshotting Task (every ``interval``, patient task, see Terminology_)

  #. A snapshot of filesystems matched by ``datasets`` is taken every ``interval`` with prefix ``snapshot_prefix``.
  #. The ``prune`` policy is triggered on datasets matched by ``datasets`` with snapshots matched by ``snapshot_prefix``.

* Serve Task

  * Wait for connections from pull job using ``serve``

A source job is the counterpart to a pull job (see `Pull Job`_).

Note that the source job's prune policy determines the maximum tolerable replication lag:
a pull job may stop replication due to link failure, misconfiguration or administrative action.
The source prune policy will eventually destroy the last common snapshot between source and pull job, requiring full replication.
Make sure you read the :doc:`prune policy documentation </configuration/prune>`.

Example: ``cmd/sampleconf/pullbackup/productionhost.yml``

Pull Job
--------

======================= =============== =====================
Parameter               Default         Description / Example
======================= =============== =====================
``type``                                ``pull``
``name``                                unique name of the job
``connect``                             :doc:`connect transport </configuration/transports>` specification
``interval``                            interval between pull attempts
``mapping``                             :doc:`mapping </configuration/map_filter_syntax>` for remote to local filesystems
``initial_repl_policy`` ``most_recent`` initial replication policy
``snapshot_prefix``                     prefix filter used for replication & pruning
``prune``                               :doc:`prune </configuration/prune>` policy for local filesystems reachable by ``mapping``
======================= =============== =====================

* Main Task (every ``interval``, patient task, see Terminology_)

  #. A connection to the remote source job is established using the strategy in ``connect``.
  #. ``mapping`` maps filesystems presented by the remote side to local *target filesystems*.
  #. Those remote filesystems with a local *target filesystem* are replicated:

     #. Only snapshots with prefix ``snapshot_prefix`` are replicated.
     #. If possible, incremental replication takes place.
     #. If the local target filesystem does not exist, ``initial_repl_policy`` is used.
     #. On conflicts, an error is logged but replication of the other mapped filesystems continues.

  #. The ``prune`` policy is triggered for all *target filesystems*.

A pull job is the counterpart to a source job (see `Source Job`_).

Example: ``cmd/sampleconf/pullbackup/backuphost.yml``

Local Job
---------

======================= =============== =====================
Parameter               Default         Description / Example
======================= =============== =====================
``type``                                ``local``
``name``                                unique name of the job
``mapping``                             :doc:`mapping </configuration/map_filter_syntax>` from source to target filesystem (both local)
``snapshot_prefix``                     prefix for ZFS snapshots taken by this job
``interval``                            snapshotting & replication interval
``initial_repl_policy`` ``most_recent`` initial replication policy
``prune_lhs``                           :doc:`prune </configuration/prune>` policy on left-hand side (source)
``prune_rhs``                           :doc:`prune </configuration/prune>` policy on right-hand side (target)
======================= =============== =====================

* Main Task (every ``interval``, patient task, see Terminology_)

  #. Evaluate ``mapping`` for local filesystems; those with a *target filesystem* are called *mapped filesystems*.
  #. Snapshot *mapped filesystems* with ``snapshot_prefix``.
  #. Replicate *mapped filesystems* to their respective *target filesystems*:

     #. Only snapshots with prefix ``snapshot_prefix`` are replicated.
     #. If possible, incremental replication takes place.
     #. If the *target filesystem* does not exist, ``initial_repl_policy`` is used.
     #. On conflicts, an error is logged but replication of other *mapped filesystems* continues.

  #. The ``prune_lhs`` policy is triggered for all *mapped filesystems*.
  #. The ``prune_rhs`` policy is triggered for all *target filesystems*.

A local job is a combination of source & pull job executed on the same machine.

Example: ``cmd/sampleconf/localbackup/host1.yml``

Terminology
-----------

task
   A job consists of one or more tasks and a task consists of one or more steps.
   Some tasks may be periodic while others wait for an event to occur.

patient task
   A patient task is supposed to execute some work every ``interval``.
   We call the start of the task an *invocation*.

   * If the task completes in less than ``interval``, the task is restarted at ``last_invocation + interval``.
   * Otherwise, a patient task

     * logs a warning as soon as a task exceeds its configured ``interval``
     * waits for the last invocation to finish
     * logs a warning with the effective task duration
     * immediately starts a new invocation of the task


@ -0,0 +1,148 @@
Logging
=======

zrepl uses structured logging to provide users with easily processable log messages.

Configuration
-------------

Logging outlets are configured in the ``global`` section of the :doc:`configuration file </installation>`.
Check out ``cmd/sampleconf/random/logging.yml`` for an example of how to configure multiple outlets:

::

   global:
     logging:
       - outlet: OUTLET_TYPE
         level: MINIMUM_LEVEL
         format: FORMAT
       - outlet: OUTLET_TYPE
         level: MINIMUM_LEVEL
         format: FORMAT
       ...
   jobs: ...

Default Configuration
~~~~~~~~~~~~~~~~~~~~~

By default, the following logging configuration is used:

::

   global:
     logging:
       - outlet: "stdout"
         level: "warn"
         format: "human"

.. ATTENTION::

   Output to **stderr** should always be considered a **critical error**.
   Only errors in the logging infrastructure itself, e.g. IO errors when writing to an outlet, are sent to stderr.

Building Blocks
---------------

The following sections document the semantics of the different log levels, formats and outlet types.

Levels
~~~~~~

========= ======== ===========
Level     SHORT    Description
========= ======== ===========
``error`` ``ERRO`` immediate action required
``warn``  ``WARN`` symptoms for misconfiguration, soon expected failure, etc.
``info``  ``INFO`` explains what happens without too much detail
``debug`` ``DEBG`` tracing information, state dumps, etc. useful for debugging
========= ======== ===========

Incorrectly classified messages are considered a bug and should be reported.
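
The ``level`` of an outlet acts as a minimum threshold. A minimal Go sketch of that filtering, assuming the four levels are ordered ``debug`` < ``info`` < ``warn`` < ``error`` as in the table; ``Level``, ``ParseLevel`` and ``Enabled`` are hypothetical names, not zrepl's logging API.

```go
package main

import "fmt"

// Level is a log level, ordered from least to most severe.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
)

// levelNames maps the configuration names from the table to levels.
var levelNames = map[string]Level{"debug": Debug, "info": Info, "warn": Warn, "error": Error}

// Short returns the abbreviation from the SHORT column.
func (l Level) Short() string {
	return [...]string{"DEBG", "INFO", "WARN", "ERRO"}[l]
}

// ParseLevel converts a configured level such as "warn" into a Level.
func ParseLevel(s string) (Level, error) {
	l, ok := levelNames[s]
	if !ok {
		return 0, fmt.Errorf("unknown log level %q", s)
	}
	return l, nil
}

// Enabled reports whether an entry at level l passes an outlet configured
// with the given minimum level.
func Enabled(l, minimum Level) bool { return l >= minimum }

func main() {
	minimum, err := ParseLevel("warn") // as in the default configuration
	if err != nil {
		panic(err)
	}
	for _, l := range []Level{Debug, Info, Warn, Error} {
		fmt.Println(l.Short(), Enabled(l, minimum))
	}
}
```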

Formats
~~~~~~~

========== ===========
Format     Description
========== ===========
``human``  emphasizes context by putting job, task, step and other context variables into brackets before the actual message, followed by remaining fields in logfmt style
``logfmt`` `logfmt <https://brandur.org/logfmt>`_ output. zrepl uses `github.com/go-logfmt/logfmt <https://github.com/go-logfmt/logfmt>`_.
``json``   JSON formatted output. Each line is a valid JSON document. Fields are marshaled by ``encoding/json.Marshal()``, which is particularly useful for processing in log aggregation or when processing state dumps.
========== ===========

Outlets
~~~~~~~

Outlets are the destinations to which log entries are written.

``stdout``
^^^^^^^^^^

========== ======= =======
Parameter  Default Comment
========== ======= =======
``outlet`` *none*  required
``level``  *none*  minimum log level (see Levels_), required
``format`` *none*  output format (see Formats_), required
========== ======= =======

Writes all log entries with minimum level ``level`` formatted by ``format`` to stdout.
Can only be specified once.

``syslog``
^^^^^^^^^^

================== ======= =======
Parameter          Default Comment
================== ======= =======
``outlet``         *none*  required
``level``          *none*  minimum log level (see Levels_), required, usually ``debug``
``format``         *none*  output format (see Formats_), required
``retry_interval`` 0       interval between reconnection attempts to syslog
================== ======= =======

Writes all log entries with minimum level ``level`` formatted by ``format`` to syslog.
On normal setups, you should not need to change the ``retry_interval``.
Can only be specified once.

``tcp``
^^^^^^^

================== ======= =======
Parameter          Default Comment
================== ======= =======
``outlet``         *none*  required
``level``          *none*  minimum log level (see Levels_), required
``format``         *none*  output format (see Formats_), required
``net``            *none*  ``tcp`` in most cases
``address``        *none*  remote network address, e.g. ``logs.example.com:10202``
``retry_interval`` *none*  interval between reconnection attempts to ``address``
``tls``            *none*  TLS config (see below)
================== ======= =======

Establishes a TCP connection to ``address`` and sends log messages with minimum level ``level`` formatted by ``format``.
If ``tls`` is not specified, an unencrypted connection is established.
If ``tls`` is specified, the TCP connection is secured with TLS + Client Authentication.
This is particularly useful in combination with log aggregation services that run on another machine.

The ``tls`` section has the following parameters:

========= ===========
Parameter Description
========= ===========
``ca``    PEM-encoded certificate authority that signed the remote server's TLS certificate
``cert``  PEM-encoded client certificate identifying this zrepl daemon toward the remote server
``key``   PEM-encoded, unencrypted client private key identifying this zrepl daemon toward the remote server
========= ===========

.. NOTE::

   zrepl uses Go's ``crypto/tls`` and ``crypto/x509`` packages and leaves all but the required fields in ``tls.Config`` at their default values.
   In case of a security defect in these packages, zrepl has to be rebuilt because Go binaries are statically linked.

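
The ``tls`` parameters map naturally onto the ``crypto/tls`` and ``crypto/x509`` packages mentioned in the note. The sketch below shows one way to build such a client configuration; ``newClientTLSConfig`` is a hypothetical helper, not zrepl's code, and it sets only the fields needed for server verification and client authentication.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newClientTLSConfig builds a client tls.Config from the PEM contents of the
// ca, cert and key parameters, leaving every other field at its default.
func newClientTLSConfig(caPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("ca: no PEM-encoded certificate found")
	}
	cert, err := tls.X509KeyPair(certPEM, keyPEM) // key must be unencrypted
	if err != nil {
		return nil, fmt.Errorf("cert/key: %w", err)
	}
	return &tls.Config{
		RootCAs:      pool,                    // verify the remote server's certificate against ca
		Certificates: []tls.Certificate{cert}, // present the client certificate for client authentication
	}, nil
}

func main() {
	// In a real setup, the three byte slices come from os.ReadFile on the
	// configured paths and the result is passed to tls.Dial("tcp", address, cfg).
	_, err := newClientTLSConfig([]byte("not a certificate"), nil, nil)
	fmt.Println("error:", err)
}
```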


@ -0,0 +1,101 @@
Mapping & Filter Syntax
=======================

For various job types, a filesystem ``mapping`` or ``filter`` needs to be specified.
Both take a filesystem path (in the ZFS filesystem hierarchy) as a parameter and return something:
mappings return a *target filesystem* and filters return a *filter result*.
The pattern syntax is the same for mappings and filters and is documented in the following section.

Common Pattern Syntax
---------------------

A mapping / filter is specified as a **YAML dictionary** with patterns as keys and results as values.
The following rules determine which result is chosen for a given filesystem path:

* More specific path patterns win over less specific ones.
* Non-wildcard patterns (full path patterns) win over *subtree wildcards* (``<`` at end of pattern).

The **subtree wildcard** ``<`` means "the dataset left of ``<`` and all its children".

Example
~~~~~~~

::

   # Rule number and its pattern
   1: tank<          # tank and all its children
   2: tank/foo/bar   # full path pattern (no wildcard)
   3: tank/foo<      # tank/foo and all its children

   # Which rule applies to given path?
   tank/foo/bar/loo => 3
   tank/bar         => 1
   tank/foo/bar     => 2
   zroot            => NO MATCH
   tank/var/log     => 1
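
The pattern rules can be sketched in Go as follows. This is our own reading of "more specific wins", not zrepl's implementation: we assume specificity is the number of path components in the pattern and that a full path pattern beats a subtree wildcard of the same depth. ``specificity``, ``depth`` and ``bestMatch`` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// specificity reports whether pattern matches the dataset path and, if so,
// how specific the match is; a higher value wins.
func specificity(pattern, path string) (int, bool) {
	if strings.HasSuffix(pattern, "<") {
		prefix := strings.TrimSuffix(pattern, "<")
		if prefix == "" {
			return 0, true // "<" alone matches every dataset
		}
		if path == prefix || strings.HasPrefix(path, prefix+"/") {
			return 2 * depth(prefix), true
		}
		return 0, false
	}
	if path == pattern {
		return 2*depth(pattern) + 1, true // full path beats a subtree wildcard of equal depth
	}
	return 0, false
}

// depth counts the components of a dataset path, e.g. "tank/foo" has 2.
func depth(p string) int { return strings.Count(p, "/") + 1 }

// bestMatch returns the winning pattern, or ok == false for NO MATCH.
func bestMatch(patterns []string, path string) (best string, ok bool) {
	bestScore := -1
	for _, p := range patterns {
		if s, matched := specificity(p, path); matched && s > bestScore {
			best, bestScore, ok = p, s, true
		}
	}
	return best, ok
}

func main() {
	rules := []string{"tank<", "tank/foo/bar", "tank/foo<"} // rules 1, 2, 3
	for _, path := range []string{"tank/foo/bar/loo", "tank/bar", "tank/foo/bar", "zroot", "tank/var/log"} {
		if p, ok := bestMatch(rules, path); ok {
			fmt.Printf("%-16s => %s\n", path, p)
		} else {
			fmt.Printf("%-16s => NO MATCH\n", path)
		}
	}
}
```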

Mappings
--------

Mappings map a *source filesystem path* to a *target filesystem path*.
Per pattern, either a target filesystem path or ``"!"`` is specified as a result.

* If no pattern matches, there exists no target filesystem (``NO MATCH``).
* If the result is a ``"!"``, there exists no target filesystem (``NO MATCH``).
* If the pattern is a non-wildcard pattern, the source path is mapped to the target path on the right.
* If the pattern ends with a *subtree wildcard* (``<``), the source path is **prefix-trimmed** with the path specified left of ``<``, and the remainder is appended to the target path.

  * Note: this means that only for *wildcard-only* patterns (pattern = ``<``) is the source path simply appended to the target path.

The following example is taken from the sample configuration ``cmd/sampleconf/localbackup/host1.yml``:

::

   jobs:
   - name: mirror_local
     type: local
     mapping: {
       "zroot/var/db<": "storage/backups/local/zroot/var/db",
       "zroot/usr/home<": "storage/backups/local/zroot/usr/home",
       "zroot/usr/home/paranoid": "!", #don't backup paranoid user
       "zroot/poudriere/ports<": "!", #don't backup the ports trees
     }
     ...

This results in the following mappings:

::

   zroot/var/db                 => storage/backups/local/zroot/var/db
   zroot/var/db/a/child         => storage/backups/local/zroot/var/db/a/child
   zroot/usr/home               => storage/backups/local/zroot/usr/home
   zroot/usr/home/paranoid      => NOT MAPPED
   zroot/usr/home/bob           => storage/backups/local/zroot/usr/home/bob
   zroot/usr/src                => NOT MAPPED
   zroot/poudriere/ports/2017Q3 => NOT MAPPED
   zroot/poudriere/ports/HEAD   => NOT MAPPED
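
Combining the pattern rules with the mapping rules gives the following Go sketch, which reproduces the example results above. It is illustrative only: ``score`` and ``mapPath`` are hypothetical helpers, and the specificity ordering (deeper patterns win, full paths beat subtree wildcards of equal depth) is our assumption rather than zrepl's documented algorithm.

```go
package main

import (
	"fmt"
	"strings"
)

// score reports whether pattern matches path and how specific the match is
// (assumption: deeper patterns win; full paths beat subtree wildcards of equal depth).
func score(pattern, path string) (int, bool) {
	prefix, wildcard := strings.TrimSuffix(pattern, "<"), strings.HasSuffix(pattern, "<")
	switch {
	case wildcard && prefix == "":
		return 0, true
	case wildcard && (path == prefix || strings.HasPrefix(path, prefix+"/")):
		return 2 * (strings.Count(prefix, "/") + 1), true
	case !wildcard && path == pattern:
		return 2*(strings.Count(pattern, "/")+1) + 1, true
	}
	return 0, false
}

// mapPath returns the target filesystem for source, or ok == false if the
// source is not mapped (no matching pattern, or the winning result is "!").
func mapPath(mapping map[string]string, source string) (target string, ok bool) {
	best, bestScore := "", -1
	for pattern := range mapping {
		if s, matched := score(pattern, source); matched && s > bestScore {
			best, bestScore = pattern, s
		}
	}
	if bestScore < 0 || mapping[best] == "!" {
		return "", false
	}
	target = mapping[best]
	if !strings.HasSuffix(best, "<") {
		return target, true // full path pattern: map to the target as-is
	}
	// Subtree wildcard: trim the prefix left of "<" and append the remainder.
	rest := strings.TrimPrefix(strings.TrimPrefix(source, strings.TrimSuffix(best, "<")), "/")
	if rest == "" {
		return target, true
	}
	return target + "/" + rest, true
}

func main() {
	mapping := map[string]string{
		"zroot/var/db<":           "storage/backups/local/zroot/var/db",
		"zroot/usr/home<":         "storage/backups/local/zroot/usr/home",
		"zroot/usr/home/paranoid": "!",
		"zroot/poudriere/ports<":  "!",
	}
	for _, src := range []string{"zroot/var/db/a/child", "zroot/usr/home/paranoid", "zroot/usr/src"} {
		if target, ok := mapPath(mapping, src); ok {
			fmt.Println(src, "=>", target)
		} else {
			fmt.Println(src, "=> NOT MAPPED")
		}
	}
}
```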

Filters
-------

Valid filter results: ``ok`` or ``!``.

The example below shows the source job from the :doc:`tutorial </tutorial>`:
the client is allowed access to ``zroot/var/db`` and to ``zroot/usr/home`` + children, except ``zroot/usr/home/paranoid``.

::

   jobs:
   - name: pull_backup
     type: source
     ...
     filesystems: {
       "zroot/var/db": "ok",
       "zroot/usr/home<": "ok",
       "zroot/usr/home/paranoid": "!",
     }
     ...


@ -0,0 +1,61 @@
Miscellaneous
=============

Runtime Directories & UNIX Sockets
----------------------------------

The zrepl daemon creates various UNIX sockets to allow communicating with it:

* the ``stdinserver`` transport connects to a socket named after the ``client_identity`` parameter
* the ``control`` subcommand connects to a defined control socket

There is no further authentication on these sockets.
Therefore we have to make sure they can only be created and accessed by ``zrepl daemon``.
In fact, ``zrepl daemon`` will not bind a socket to a path in a directory that is world-accessible.

The directories can be configured in the main configuration file:

::

   global:
     control:
       sockpath: /var/run/zrepl/control
     serve:
       stdinserver:
         sockdir: /var/run/zrepl/stdinserver

Durations & Intervals
---------------------

Interval & duration fields in job definitions, pruning configurations, etc. must match the following regex:

::

   var durationStringRegex = regexp.MustCompile(`^\s*(\d+)\s*(s|m|h|d|w)\s*$`)
   // s = second, m = minute, h = hour, d = day, w = week (7 days)

Super-Verbose Job Debugging
---------------------------

You have probably landed here because you opened an issue on GitHub and some developer told you to do this...
So just read the annotated comments ;)

::

   jobs:
   - name: ...
     ...
     # JOB DEBUGGING OPTIONS
     # should be equal for all job types, but each job implements the debugging itself
     debug:
       conn: # debug the io.ReadWriteCloser connection
         read_dump: /tmp/connlog_read   # dump results of Read() invocations to this file
         write_dump: /tmp/connlog_write # dump results of Write() invocations to this file
       rpc: # debug the RPC protocol implementation
         log: true # log output from rpc layer to the job log

.. ATTENTION::

   Connection dumps will almost certainly contain your or others' private data. Do not share them in a bug report.


@ -0,0 +1,59 @@
Snapshot Pruning
================

In zrepl, *pruning* means *destroying snapshots by some policy*.

A *pruning policy* takes a list of snapshots and, for each snapshot, decides whether it should be kept or destroyed.

The job context defines which snapshots are even considered for pruning, for example through the ``snapshot_prefix`` variable.
Check the :doc:`job definition </configuration/jobs>` for details.

Currently, the retention grid is the only supported pruning policy.

Retention Grid
--------------

::

   jobs:
   - name: pull_app-srv
     ...
     prune:
       policy: grid
       grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
               │              │
               └─ one hour interval
                              │
                              └─ 24 adjacent one-hour intervals

The retention grid can be thought of as a time-based sieve:
the ``grid`` field specifies a list of adjacent time intervals.
The left edge of the leftmost (first) interval is the ``creation`` date of the youngest snapshot.
All intervals to its right describe time intervals further in the past.

Each interval carries a maximum number of snapshots to keep.
It is specified via ``(keep=N)``, where ``N`` is either ``all`` (all snapshots are kept) or a positive integer.
The default value is **1**.

The following procedure happens during pruning:

#. The list of snapshots eligible for pruning is sorted by ``creation``.
#. The left edge of the first interval is aligned to the ``creation`` date of the youngest snapshot.
#. A list of buckets is created, one for each interval.
#. The list of snapshots is split up into the buckets.
#. For each bucket:

   #. the contained snapshot list is sorted by creation,
   #. snapshots from the list, oldest first, are destroyed until the specified ``keep`` count is reached,
   #. all remaining snapshots on the list are kept.

.. ATTENTION::

   The configuration of the first interval (``1x1h(keep=all)`` in the example) determines the **maximum allowable replication lag** between source and destination.
   After the first interval, source and destination likely have different retention settings.
   This means source and destination may prune different snapshots, prohibiting incremental replication from snapshots that are not in the first interval.

   **Always** configure the first interval to ``1x?(keep=all)``, substituting ``?`` with the maximum time replication may fail due to downtimes, maintenance, connectivity issues, etc.
   After outages longer than ``?``, you may be required to perform **full replication** again.


@ -0,0 +1,103 @@
.. highlight:: bash

Transports
==========

A transport provides an authenticated `io.ReadWriteCloser <https://golang.org/pkg/io/#ReadWriteCloser>`_ to the RPC layer.
(An ``io.ReadWriteCloser`` is essentially a bidirectional reliable communication channel.)

Currently, only the ``ssh+stdinserver`` transport is supported.

``ssh+stdinserver``
-------------------

The way the ``ssh+stdinserver`` transport works is inspired by `git shell <https://git-scm.com/docs/git-shell>`_ and `Borg Backup <https://borgbackup.readthedocs.io/en/stable/deployment.html>`_.
It is implemented in the Go package ``github.com/zrepl/zrepl/sshbytestream``.
The config excerpts are taken from the :doc:`tutorial </tutorial>`, which you should complete before reading further.

``serve``
~~~~~~~~~

::

   jobs:
   - name: pull_backup
     type: source
     serve:
       type: stdinserver
       client_identity: backup-srv.example.com
     ...

The serving job opens a UNIX socket named after ``client_identity`` in the runtime directory, e.g. ``/var/run/zrepl/stdinserver/backup-srv.example.com``.

On the same machine, the ``zrepl stdinserver $client_identity`` command connects to that socket.
For example, ``zrepl stdinserver backup-srv.example.com`` connects to the UNIX socket ``/var/run/zrepl/stdinserver/backup-srv.example.com``.

It then passes its stdin and stdout file descriptors to the zrepl daemon via *cmsg(3)*.
The zrepl daemon in turn combines them into an ``io.ReadWriteCloser``:
a ``Write()`` turns into a write to stdout, a ``Read()`` turns into a read from stdin.

Interactive use of the ``stdinserver`` subcommand does not make much sense.
However, we can force its execution when a user with a particular SSH pubkey connects via SSH.
This can be achieved with an entry in the ``authorized_keys`` file on the host running the serving zrepl daemon:

::

   # for OpenSSH >= 7.2
   command="zrepl stdinserver CLIENT_IDENTITY",restrict CLIENT_SSH_KEY
   # for older OpenSSH versions
   command="zrepl stdinserver CLIENT_IDENTITY",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY

* CLIENT_IDENTITY is substituted with ``backup-srv.example.com`` in our example
* CLIENT_SSH_KEY is substituted with the public part of the SSH keypair specified in the ``connect`` directive on the connecting host.

.. NOTE::

   You may need to adjust the ``PermitRootLogin`` option in ``/etc/ssh/sshd_config`` to ``forced-commands-only`` or higher for this to work.
   Refer to sshd_config(5) for details.

To recap, this is how client authentication works with the ``ssh+stdinserver`` transport:

* Connections to the ``client_identity`` UNIX socket are blindly trusted by the zrepl daemon.

  * Thus, the runtime directory must be private to the zrepl user (checked by the zrepl daemon).

* The admin of the host with the serving zrepl daemon controls the ``authorized_keys`` file.

  * Thus, the administrator controls the mapping ``PUBKEY -> CLIENT_IDENTITY``.

``connect``
~~~~~~~~~~~

::

   jobs:
   - name: pull_app-srv
     type: pull
     connect:
       type: ssh+stdinserver
       host: app-srv.example.com
       user: root
       port: 22
       identity_file: /etc/zrepl/ssh/identity
       options: # optional
       - "Compression=on"

The connecting zrepl daemon

#. Creates a pipe
#. Forks
#. In the forked process

   #. Replaces forked stdin and stdout with the corresponding pipe ends
   #. Executes the ``ssh`` binary found in ``$PATH``.

      #. The identity file (``-i``) is set to ``$identity_file``.
      #. The remote user, host and port correspond to those configured.
      #. Further options can be specified using the ``options`` field, which appends each entry in the list to the command line using ``-o $entry``.

#. Wraps the pipe ends in an ``io.ReadWriteCloser`` and uses it for RPC.

As discussed in the section above, the connecting zrepl daemon expects that ``zrepl stdinserver $client_identity`` is executed automatically via an ``authorized_keys`` file entry.

.. NOTE::

   The environment variables of the underlying SSH process are cleared, so ``$SSH_AUTH_SOCK`` will not be available.
   It is suggested to create a separate, unencrypted SSH key solely for that purpose.

docs/implementation.rst Normal file

@ -0,0 +1,58 @@
Implementation Overview
=======================

.. WARNING::

   Incomplete / under construction

The following design aspects may convince you that zrepl is superior to a hacked-together shell script solution.

Testability & Performance
-------------------------

zrepl is written in Go, a real programming language with type safety,
reasonable performance, testing infrastructure and an (opinionated) idea of
software engineering.

* key parts & algorithms of zrepl are covered by unit tests (work in progress)
* zrepl is noticeably faster than comparable shell scripts

RPC protocol
------------

While it is tempting to just issue a few ``ssh remote 'zfs send ...' | zfs recv``, this has a number of drawbacks:

* The snapshot streams need to be compatible.
* Communication is still unidirectional. Thus, you will most likely

  * either not take advantage of features such as *compressed send & recv*
  * or issue additional ``ssh`` commands in advance to figure out what features are supported on the other side.

* Advanced logic in shell scripts is ugly to read, poorly testable and a pain to maintain.

zrepl takes a different approach:

* Define an RPC protocol.
* Establish an encrypted, authenticated, bidirectional communication channel...
* ... with zrepl running at both ends of it.

This has several obvious benefits:

* No blanket root shell access is given to the other side.
* Instead, an *authenticated* peer can *request* filesystem lists, snapshot streams, etc.
* Requests are then checked against job-specific ACLs, limiting a client to the filesystems it is actually allowed to replicate.
* The :doc:`transport mechanism </configuration/transports>` is decoupled from the remaining logic, keeping it extensible.

Protocol Implementation
~~~~~~~~~~~~~~~~~~~~~~~

zrepl implements its own RPC protocol.
This is mostly due to the fact that existing solutions do not provide efficient means to transport large amounts of data.

Package `github.com/zrepl/zrepl/rpc <https://github.com/zrepl/zrepl/tree/master/rpc>`_ builds a special-case handling around returning an ``io.Reader`` as part of a unary RPC call.

Measurements show only a single memory-to-memory copy of a snapshot stream is made using ``github.com/zrepl/zrepl/rpc``, and there is still potential for further optimizations.

Logging & Transparency
----------------------

zrepl comes with :doc:`rich, structured and configurable logging </configuration/logging>`, allowing administrators to understand what the software is actually doing.


@ -3,18 +3,68 @@
You can adapt this file completely to your liking, but it should at least You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive. contain the root `toctree` directive.
Welcome to zrepl's documentation! zrepl - ZFS replication
================================= -----------------------

.. ATTENTION::

   zrepl as well as this documentation is still under active development.
   It is neither feature complete nor is there a stability guarantee on the configuration format.
   Use & test at your own risk ;)

Getting started
~~~~~~~~~~~~~~~

The :doc:`5 minute tutorial setup </tutorial>` gives you a first impression.

Main Features
~~~~~~~~~~~~~

* Filesystem Replication

  * [x] Local & Remote
  * [x] Pull mode
  * [ ] Push mode
  * [x] Access control checks when pulling datasets
  * [x] :doc:`Flexible mapping </configuration/map_filter_syntax>` rules
  * [x] Bookmarks support
  * [ ] Feature-negotiation for

    * Resumable ``send & receive``
    * Compressed ``send & receive``
    * Raw encrypted ``send & receive`` (as soon as it is available)

* Automatic snapshot creation

  * [x] Ensure fixed time interval between snapshots

* Automatic snapshot :doc:`pruning </configuration/prune>`

  * [x] Age-based fading (grandfathering scheme)

* Flexible, detailed & structured :doc:`logging </configuration/logging>`

  * [x] ``human``, ``logfmt`` and ``json`` formatting
  * [x] stdout, syslog and TCP (+TLS client auth) outlets

* Maintainable implementation in Go

  * [x] Cross platform
  * [x] Type safe & testable code

Contributing
~~~~~~~~~~~~

We are happy about any help we can get!

* Explore the codebase
* These docs live in the ``docs/`` subdirectory
* Document any non-obvious / confusing / plain broken behavior you encounter when setting up zrepl for the first time
* Check the *Issues* and *Projects* sections for things to do

.. admonition:: Development Workflow

   The `GitHub repository <https://github.com/zrepl/zrepl>`_ is where all development happens.
   Make sure to read the `Developer Documentation section <https://github.com/zrepl/zrepl>`_ and open new issues or pull requests there.

Table of Contents
~~~~~~~~~~~~~~~~~
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
:caption: Contents: :caption: Contents:
tutorial
installation
Indices and tables configuration
================== implementation
pr
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/installation.rst Normal file

@ -0,0 +1,88 @@
Installation
============

.. TIP::

   Check out the :doc:`tutorial </tutorial>` if you want a first impression of zrepl.

User Privileges
---------------

It is possible to run zrepl as an unprivileged user in combination with
`ZFS delegation <https://www.freebsd.org/doc/handbook/zfs-zfs-allow.html>`_.
Also, there is the possibility to run it in a jail on FreeBSD by delegating a dataset to the jail.
However, until we get around to documenting those setups, you will have to run zrepl as root or experiment yourself :)

Installation
------------

zrepl is currently not packaged on any operating system. Signed & versioned releases are planned but not available yet.
Check out the sources yourself, fetch dependencies using ``dep``, compile and install to the zrepl user's ``$PATH``.

**Note**: if the zrepl binary is not in ``$PATH``, you will have to adjust the examples in the :doc:`tutorial </tutorial>`.

::

   # NOTE: you may want to checkout & build as an unprivileged user
   cd /root
   git clone https://github.com/zrepl/zrepl.git
   cd zrepl
   dep ensure
   go build -o zrepl
   cp zrepl /usr/local/bin/zrepl
   rehash  # csh/tcsh only: rebuild the shell's command hash table
   # see if it worked
   zrepl help

Configuration Files
-------------------

zrepl searches for its main configuration file in the following locations (in that order):

* ``/etc/zrepl/zrepl.yml``
* ``/usr/local/etc/zrepl/zrepl.yml``

Alternatively, use CLI flags to specify a config location.

Copy a config from the :doc:`tutorial </tutorial>` or the ``cmd/sampleconf`` directory to one of these locations and customize it to your setup.

Runtime Directories
-------------------

Check the :doc:`configuration documentation </configuration/misc>` for more information.
For default settings, the following should do the trick:

::

   mkdir -p /var/run/zrepl/stdinserver
   chmod -R 0700 /var/run/zrepl

Running the Daemon
------------------

All actual work zrepl does is performed by a daemon process.
Logging is configurable via the config file. Please refer to the :doc:`logging documentation </configuration/logging>`.

::

   zrepl daemon

There are no *rc(8)* or *systemd.service(5)* service definitions yet. Note the *daemon(8)* utility on FreeBSD.

.. ATTENTION::

   Make sure to actually monitor the error level output of zrepl: some configuration errors will not make the daemon exit.

   Example: if the daemon cannot create the :doc:`stdinserver </configuration/transports>` sockets
   in the runtime directory, it will emit an error message but not exit because other tasks such as periodic snapshots & pruning are of equal importance.

Restarting
~~~~~~~~~~

The daemon handles SIGINT and SIGTERM for graceful shutdown.
Graceful shutdown means at worst that a job will not be rescheduled for the next interval.
The daemon exits as soon as all jobs have reported shut down.

docs/pr.rst Normal file

@ -0,0 +1,5 @@
Talks & Presentations
=====================

* Talk at EuroBSDCon2017 FreeBSD DevSummit (`Slides <https://docs.google.com/presentation/d/1EmmeEvOXAWJHCVnOS9-TTsxswbcGKmeLWdY_6BH4w0Q/edit?usp=sharing>`__, `Event <https://wiki.freebsd.org/DevSummit/201709>`__)

docs/tutorial.rst Normal file

@ -0,0 +1,183 @@
Tutorial
========

This tutorial shows how zrepl can be used to implement a ZFS-based pull backup.
We assume the following scenario:

* Production server ``app-srv`` with filesystems to back up:

  * ``zroot/var/db``
  * ``zroot/usr/home`` and all its child filesystems
  * **except** ``zroot/usr/home/paranoid`` belonging to a user doing backups themselves

* Backup server ``backup-srv`` with

  * Filesystem ``storage/zrepl/pull/app-srv`` + children dedicated to backups of ``app-srv``

Our backup solution should fulfill the following requirements:

* Periodically snapshot the filesystems on ``app-srv`` *every 10 minutes*
* Incrementally replicate these snapshots to ``storage/zrepl/pull/app-srv/*`` on ``backup-srv``
* Keep only very few snapshots on ``app-srv`` to save disk space
* Keep a fading history (24 hourly, 30 daily, 6 monthly) of snapshots on ``backup-srv``

Analysis
--------

We can model this situation as two jobs:

* A **source job** on ``app-srv``

  * Creates the snapshots
  * Keeps a short history of snapshots to enable incremental replication to ``backup-srv``
  * Accepts connections from ``backup-srv``

* A **pull job** on ``backup-srv``

  * Connects to the ``zrepl daemon`` process on ``app-srv``
  * Pulls the snapshots to ``storage/zrepl/pull/app-srv/*``
  * Fades out snapshots in ``storage/zrepl/pull/app-srv/*`` as they age

Why doesn't the **pull job** create the snapshots before pulling?

As is the case with all distributed systems, the link between ``app-srv`` and ``backup-srv`` might be down for an hour or two.
We do not want to sacrifice our required backup resolution of 10 minute intervals for a temporary connection outage.
When the link comes up again, ``backup-srv`` will happily catch up on the snapshots taken by ``app-srv`` in the meantime (12 of them after a two-hour outage), without
a gap in our backup history.
Install zrepl
-------------
Follow the [OS-specific installation instructions]({{< relref "install/_index.md" >}}) and come back here.

Configure ``backup-srv``
------------------------

We define a **pull job** named ``pull_app-srv`` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})::

    jobs:
    - name: pull_app-srv
      type: pull
      connect:
        type: ssh+stdinserver
        host: app-srv.example.com
        user: root
        port: 22
        identity_file: /etc/zrepl/ssh/identity
      interval: 10m
      mapping: {
        "<": "storage/zrepl/pull/app-srv"
      }
      initial_repl_policy: most_recent
      snapshot_prefix: zrepl_pull_backup_
      prune:
        policy: grid
        grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
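
With this ``mapping``, every filesystem pulled from ``app-srv`` lands below ``storage/zrepl/pull/app-srv`` on ``backup-srv``.
The shell function below is a minimal sketch of that translation for the catch-all ``"<"`` pattern only.
It assumes that the full remote filesystem name is appended to the mapping target.
``map_fs`` and the ``alice`` filesystem are hypothetical and not part of zrepl.

```shell
#!/bin/sh
# Hypothetical helper (not part of zrepl): mimics the pull job's
# mapping { "<": "storage/zrepl/pull/app-srv" }, assuming the whole
# remote filesystem name is appended to the mapping target.
MAPPING_TARGET="storage/zrepl/pull/app-srv"

map_fs() {
    # $1: filesystem name as it exists on app-srv, e.g. zroot/var/db
    printf '%s/%s\n' "$MAPPING_TARGET" "$1"
}

map_fs zroot/var/db          # prints storage/zrepl/pull/app-srv/zroot/var/db
map_fs zroot/usr/home/alice  # "alice" is a hypothetical user filesystem
```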

The ``connect`` section instructs the zrepl daemon to use the ``stdinserver`` transport:
``backup-srv`` will connect to the specified SSH server and expect ``zrepl stdinserver CLIENT_IDENTITY`` instead of the shell on the other side.
It uses the private key specified at ``connect.identity_file``, which we still need to create::

    cd /etc/zrepl
    mkdir -p ssh
    chmod 0700 ssh
    ssh-keygen -t ed25519 -N '' -f /etc/zrepl/ssh/identity

Note that most use cases do not benefit from separate keypairs per remote endpoint.
Thus, it is sufficient to create one keypair and use it for all ``connect`` directives on one host.

Learn more about [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**pull job** format]({{< relref "configuration/jobs.md#pull" >}}).

Configure ``app-srv``
---------------------
We define a corresponding **source job** named ``pull_backup`` in the [main configuration file]({{< relref "install/_index.md#configuration-files" >}})
``zrepl.yml``::

    jobs:
    - name: pull_backup
      type: source
      serve:
        type: stdinserver
        client_identity: backup-srv.example.com
      filesystems: {
        "zroot/var/db": "ok",
        "zroot/usr/home<": "ok",
        "zroot/usr/home/paranoid": "!",
      }
      snapshot_prefix: zrepl_pull_backup_
      interval: 10m
      prune:
        policy: grid
        grid: 1x1d(keep=all)
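
The ``filesystems`` filter decides which datasets ``backup-srv`` may pull.
The sketch below models it for exactly the three patterns above, so the ``paranoid`` exclusion can be checked by hand.
It rests on four assumptions:

* a trailing ``<`` matches a filesystem and all of its children;
* a plain pattern matches only itself;
* the longest matching pattern wins;
* unmatched filesystems are not served.

``fs_allowed`` and ``alice`` are hypothetical, and the map & filter syntax documentation remains authoritative.

```shell
#!/bin/sh
# Hypothetical model of the source job's "filesystems" filter; not zrepl code.
# Assumptions: "pattern<" matches the pattern itself and all of its children,
# a plain pattern matches only that exact filesystem, the longest matching
# pattern decides, and a filesystem matching no pattern is not served.
FILTER='zroot/var/db ok
zroot/usr/home< ok
zroot/usr/home/paranoid !'

fs_allowed() {
    fs=$1
    best_len=-1
    best_verdict='!'            # default: reject unmatched filesystems
    while read -r pattern verdict; do
        case $pattern in
            *'<')               # subtree pattern
                base=${pattern%'<'}
                case $fs in
                    "$base" | "$base"/*) ;;
                    *) continue ;;
                esac
                ;;
            *)                  # exact pattern
                [ "$fs" = "$pattern" ] || continue
                base=$pattern
                ;;
        esac
        # Prefer the most specific (longest) matching pattern.
        if [ "${#base}" -gt "$best_len" ]; then
            best_len=${#base}
            best_verdict=$verdict
        fi
    done <<EOF
$FILTER
EOF
    [ "$best_verdict" = ok ]
}

for name in zroot/var/db zroot/usr/home/alice zroot/usr/home/paranoid zroot/var; do
    if fs_allowed "$name"; then
        echo "serve  $name"
    else
        echo "reject $name"
    fi
done
```

Run as is, it reports ``serve`` for ``zroot/var/db`` and ``zroot/usr/home/alice``, and ``reject`` for ``zroot/usr/home/paranoid`` and ``zroot/var``.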

The ``serve`` section corresponds to the ``connect`` section in the configuration of ``backup-srv``.

We now want to authenticate ``backup-srv`` before allowing it to pull data.
This is done by restricting SSH connections from ``backup-srv`` so that they can only execute the ``stdinserver`` subcommand.
Open ``/root/.ssh/authorized_keys`` on ``app-srv`` and add the line matching your OpenSSH version::

    # for OpenSSH >= 7.2
    command="zrepl stdinserver backup-srv.example.com",restrict CLIENT_SSH_KEY
    # for older OpenSSH versions
    command="zrepl stdinserver backup-srv.example.com",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY

.. ATTENTION::

    Replace CLIENT_SSH_KEY with the contents of ``/etc/zrepl/ssh/identity.pub`` from ``backup-srv``.
    Mind the trailing ``.pub`` in the filename.
    Each entry **must** be on a single line, including the replaced CLIENT_SSH_KEY.

.. HINT::

    You may need to set the ``PermitRootLogin`` option in ``/etc/ssh/sshd_config`` to ``forced-commands-only`` or a more permissive value for this to work.
    Refer to sshd_config(5) for details.

The argument ``backup-srv.example.com`` is the client identity of ``backup-srv`` as defined in ``jobs.serve.client_identity``.
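
Each entry must stay on a single line, so generating it can be less error-prone than pasting the key by hand.
Below is a minimal sketch. It assumes you have copied ``/etc/zrepl/ssh/identity.pub`` from ``backup-srv`` to ``app-srv``.
It also assumes ``app-srv`` runs OpenSSH 7.2 or newer, so that ``restrict`` is available.
The function name ``authorized_keys_entry`` and the path of the copied key are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of zrepl): print one forced-command
# authorized_keys entry for app-srv, in the OpenSSH >= 7.2 "restrict" form.
#   $1: client identity, must match serve.client_identity on app-srv
#   $2: path to a copy of backup-srv's /etc/zrepl/ssh/identity.pub
authorized_keys_entry() {
    # identity.pub holds exactly one key line; take only that line and drop
    # any carriage return so the resulting entry stays on a single line.
    key=$(head -n 1 "$2" | tr -d '\r')
    if [ -z "$key" ]; then
        echo "no public key found in $2" >&2
        return 1
    fi
    printf 'command="zrepl stdinserver %s",restrict %s\n' "$1" "$key"
}

# Usage on app-srv (the .pub path is a placeholder for wherever you copied it):
#   authorized_keys_entry backup-srv.example.com /tmp/backup-srv-identity.pub \
#       >> /root/.ssh/authorized_keys
```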

As before, see the documentation of [stdinserver]({{< relref "configuration/transports.md#ssh-stdinserver" >}}) and the [**source job** format]({{< relref "configuration/jobs.md#source" >}}) for details.

Apply Configuration Changes
---------------------------

We need to restart the zrepl daemon on **both** ``app-srv`` and ``backup-srv``.
How to do this is [OS-specific]({{< relref "install/_index.md#restarting" >}}).

Watch it Work
-------------
A common setup is to follow the log output and ``watch`` the list of zrepl snapshots (``zfs list``) on both machines.
If you like tmux, here is a handy script that works on FreeBSD::

    pkg install gnu-watch tmux
    tmux new-window
    tmux split-window "tail -f /var/log/zrepl.log"
    tmux split-window "gnu-watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled

The Linux equivalent might look like this::

    # assumes tmux is installed and zrepl runs as a systemd service logging to journald
    tmux new-window
    tmux split-window "journalctl -f -u zrepl.service"
    tmux split-window "watch 'zfs list -t snapshot -o name,creation -s creation | grep zrepl_pull_backup_'"
    tmux select-layout tiled
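
Besides watching the full listing, it can help to compare how many zrepl snapshots each filesystem has on each machine.
Below is a small sketch, assuming the ``snapshot_prefix`` used in this tutorial.
It reads the output of ``zfs list -H -t snapshot -o name``: one ``filesystem@snapshot`` name per line, with no header.
The function name ``count_zrepl_snapshots`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of zrepl): count zrepl-created snapshots per
# filesystem. Reads snapshot names ("pool/fs@snapname") on stdin, as printed
# by: zfs list -H -t snapshot -o name
SNAP_PREFIX=zrepl_pull_backup_   # must match snapshot_prefix in zrepl.yml

count_zrepl_snapshots() {
    # keep prefixed snapshots, drop the "@snapname" part, count per filesystem
    grep -F "@${SNAP_PREFIX}" | cut -d@ -f1 | sort | uniq -c
}

# Usage on either machine:
#   zfs list -H -t snapshot -o name | count_zrepl_snapshots
```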

Summary
-------

Congratulations, you have a working pull backup. Where to go next?

* Read more about [configuration format, options & job types]({{< relref "configuration/_index.md" >}})
* Learn about [implementation details]({{< relref "impl/_index.md" >}}) of zrepl.