pre- and post-snapshot hooks
* stack-based execution model, documented in the docs
* circbuf for capturing hook output
* built-in hooks for postgres and mysql
* refactor docs: too much info on the jobs page, too difficult to discover snapshotting & hooks

Co-authored-by: Ross Williams <ross@ross-williams.net>
Co-authored-by: Christian Schwarz <me@cschwarz.com>

fixes #74
committed by Christian Schwarz
parent 00434f4ac9
commit 729c83ee72
@@ -6,16 +6,16 @@ Miscellaneous

Runtime Directories & UNIX Sockets
----------------------------------

The zrepl daemon needs to open various UNIX sockets in a runtime directory:

* a ``control`` socket that the CLI commands use to interact with the daemon
* the :ref:`transport-ssh+stdinserver` listener opens one socket per configured client, named after the ``client_identity`` parameter

There is no authentication on these sockets except the UNIX permissions.
The zrepl daemon will refuse to bind any of the above sockets in a directory that is world-accessible.

The following sections of the ``global`` config show the default paths.
The shell script below shows how the default runtime directory can be created.

::

   sockdir: /var/run/zrepl/stdinserver

::

   mkdir -p /var/run/zrepl/stdinserver
   chmod -R 0700 /var/run/zrepl
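
For orientation, the corresponding part of the ``global`` section can be sketched as follows; the ``control.sockpath`` default is an assumption and should be checked against your zrepl version:

.. code-block:: yaml

   # sketch of the relevant global settings (defaults assumed, verify for your version)
   global:
     control:
       sockpath: /var/run/zrepl/control        # socket used by the zrepl CLI
     serve:
       stdinserver:
         sockdir: /var/run/zrepl/stdinserver   # one socket per ssh+stdinserver client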

docs/configuration/overview.rst (new file, 191 lines)

Overview & Terminology
======================

All work zrepl does is performed by the zrepl daemon, which is configured in a single YAML configuration file loaded on startup.
The following paths are considered:

* If set, the location specified via the global ``--config`` flag
* ``/etc/zrepl/zrepl.yml``
* ``/usr/local/etc/zrepl/zrepl.yml``

The ``zrepl configcheck`` subcommand can be used to validate the configuration.
The command will output nothing and exit with zero status code if the configuration is valid.
The error messages vary in quality and usefulness: please report confusing config errors to the tracking :issue:`155`.
Full example configs such as in the :ref:`tutorial` or the :sampleconf:`/` directory might also be helpful.
However, copy-pasting examples is no substitute for reading documentation!
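
A quick validation run looks like this (a usage sketch; the global ``--config`` flag is only needed for a non-default location and may also be placed before the subcommand):

::

   # validate the configuration at the default locations
   zrepl configcheck

   # validate an explicit configuration file
   zrepl configcheck --config /etc/zrepl/zrepl.yml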

Config File Structure
---------------------

.. code-block:: yaml

   global: ...
   jobs:
     - name: backup
       type: push
     - ...

zrepl is configured using a single YAML configuration file with two main sections: ``global`` and ``jobs``.
The ``global`` section is filled with sensible defaults and is covered later in this chapter.
The ``jobs`` section is a list of jobs which we are going to explain now.

.. _job-overview:

Jobs \& How They Work Together
------------------------------

A *job* is the unit of activity tracked by the zrepl daemon.
The ``type`` of a job determines its role in a replication setup and in snapshot management.
Jobs are identified by their ``name``, both in log files and the ``zrepl status`` command.

Replication always happens between a pair of jobs: one is the **active side**, and one the **passive side**.
The active side connects to the passive side using a :ref:`transport <transport>` and starts executing the replication logic.
The passive side responds to requests from the active side after checking its permissions.

The following table shows how different job types can be combined to achieve **both push and pull mode setups**.
Note that snapshot creation, denoted by "(snap)", is orthogonal to whether a job is active or passive.

.. list-table::
   :widths: 20 15 25 40
   :header-rows: 1

   * - Setup name
     - active side
     - passive side
     - use case
   * - Push mode
     - ``push`` (snap)
     - ``sink``
     - * Laptop backup
       * NAS behind NAT to offsite
   * - Pull mode
     - ``pull``
     - ``source`` (snap)
     - * Central backup-server for many nodes
       * Remote server to NAS behind NAT
   * - Local replication
     -
     - ``push`` + ``sink`` in one config
       with :ref:`local transport <transport-local>`
     - * Backup FreeBSD boot pool
   * - Snap & prune-only
     - ``snap`` (snap)
     - N/A
     - * Snapshots & pruning, but no replication required
       * Workaround for :ref:`source-side pruning <prune-workaround-source-side-pruning>`

How the Active Side Works
-------------------------

The active side (:ref:`push <job-push>` and :ref:`pull <job-pull>` job) executes the replication and pruning logic:

* Wakeup because of finished snapshotting (``push`` job) or pull interval ticker (``pull`` job).
* Connect to the corresponding passive side using a :ref:`transport <transport>` and instantiate an RPC client.
* Replicate data from the sending to the receiving side (see below).
* Prune on sender & receiver.

.. TIP::
   The progress of the active side can be watched live using the ``zrepl status`` subcommand.

How the Passive Side Works
--------------------------

The passive side (:ref:`sink <job-sink>` and :ref:`source <job-source>`) waits for connections from the corresponding active side,
using the transport listener type specified in the ``serve`` field of the job configuration.
Each transport listener provides a client's identity to the passive side job.
It uses the client identity for access control:

* The ``sink`` job maps requests from different client identities to their respective sub-filesystem tree ``root_fs/${client_identity}``.
* The ``source`` job has a whitelist of client identities that are allowed pull access.

.. TIP::
   The implementation of the ``sink`` job requires that the connecting client identities be valid ZFS filesystem name components.
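
To make the client-identity mapping concrete, here is a hedged sketch of a ``sink`` job; the TLS listener fields and all names (datasets, certificates, client identities) are illustrative assumptions, not taken from this page:

.. code-block:: yaml

   jobs:
     # sketch: a sink that receives into storage/zrepl/sink/${client_identity}
     - name: sink
       type: sink
       root_fs: "storage/zrepl/sink"
       serve:
         type: tls                      # assumed listener type; see the transport docs
         listen: ":8888"
         ca: "/etc/zrepl/ca.crt"
         cert: "/etc/zrepl/sink.crt"
         key: "/etc/zrepl/sink.key"
         client_cns:                    # client identities = certificate common names
           - "laptop1"
           - "workstation2"

A push from client identity ``laptop1`` would then land under ``storage/zrepl/sink/laptop1``.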

How Replication Works
---------------------

One of the major design goals of the replication module is to avoid any duplication of the nontrivial logic.
As such, the code works on abstract sender and receiver **endpoints**, where typically one will be implemented by a local program object and the other is an RPC client instance.
Regardless of push- or pull-style setup, the logic executes on the active side, i.e. in the ``push`` or ``pull`` job.

The following steps take place during replication and can be monitored using the ``zrepl status`` subcommand:

* Plan the replication:

  * Compare sender and receiver filesystem snapshots
  * Build the **replication plan**

    * Per filesystem, compute a diff between sender and receiver snapshots
    * Build a list of replication steps

      * If possible, use incremental sends (``zfs send -i``)
      * Otherwise, use full send of most recent snapshot on sender
      * Give up on filesystems that cannot be replicated without data loss

  * Retry on errors that are likely temporary (e.g. network failures).
  * Give up on filesystems where a permanent error was received over RPC.

* Execute the plan

  * Perform replication steps in the following order:
    Among all filesystems with pending replication steps, pick the filesystem whose next replication step's snapshot is the oldest.
  * Create placeholder filesystems on the receiving side to mirror the dataset paths on the sender to ``root_fs/${client_identity}``.
  * After a successful replication step, update the replication cursor bookmark (see below).

The idea behind the execution order of replication steps is that if the sender snapshots all filesystems simultaneously at fixed intervals, the receiver will have all filesystems snapshotted at time ``T1`` before the first snapshot at ``T2 = T1 + $interval`` is replicated.

.. _replication-cursor-bookmark:

The **replication cursor bookmark** ``#zrepl_replication_cursor`` is kept per filesystem on the sending side of a replication setup:
It is a bookmark of the most recent snapshot that was successfully replicated to the receiving side.
It is used by the :ref:`not_replicated <prune-keep-not-replicated>` keep rule to identify all snapshots that have not yet been replicated to the receiving side.
Regardless of whether that keep rule is used, the bookmark ensures that replication can always continue incrementally.
Note that there is only one cursor bookmark per filesystem, which prohibits multiple jobs from replicating the same filesystem (:ref:`see below <jobs-multiple-jobs>`).
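
The cursor can be inspected with standard ZFS tooling; a quick illustration (``tank/data`` is a placeholder dataset name):

::

   # bookmarks show up with "zfs list -t bookmark";
   # the cursor appears as tank/data#zrepl_replication_cursor
   zfs list -t bookmark -o name,creation tank/data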

.. _replication-placeholder-property:

**Placeholder filesystems** on the receiving side are regular ZFS filesystems with the placeholder property ``zrepl:placeholder=on``.
Placeholders allow the receiving side to mirror the sender's ZFS dataset hierarchy without replicating every filesystem at every intermediary dataset path component.
Consider the following example: ``S/H/J`` shall be replicated to ``R/sink/job/S/H/J``, but neither ``S/H`` nor ``S`` shall be replicated.
ZFS requires the existence of ``R/sink/job/S`` and ``R/sink/job/S/H`` in order to receive into ``R/sink/job/S/H/J``.
Thus, zrepl creates the parent filesystems as placeholders on the receiving side.
If at some point ``S/H`` and ``S`` shall be replicated, the receiving side invalidates the placeholder flag automatically.
The ``zrepl test placeholder`` command can be used to check whether a filesystem is a placeholder.
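
Since the placeholder property is a regular ZFS user property, it can also be inspected directly (an illustration using the example datasets above):

::

   # show which filesystems below the sink's root_fs are placeholders
   zfs get -r -t filesystem zrepl:placeholder R/sink/job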

.. ATTENTION::

   Currently, zrepl does not replicate filesystem properties.
   When receiving a filesystem, it is never mounted (``-u`` flag) and ``mountpoint=none`` is set.
   This is temporary and is being worked on in :issue:`24`.

.. _jobs-multiple-jobs:

Multiple Jobs & More than 2 Machines
------------------------------------

.. ATTENTION::

   When using multiple jobs across single or multiple machines, the following rules are critical to avoid race conditions & data loss:

   1. The sets of ZFS filesystems matched by the ``filesystems`` filter fields must be disjoint across all jobs configured on a machine.
   2. The ZFS filesystem subtrees of jobs with ``root_fs`` must be disjoint.
   3. Across all zrepl instances on all machines in the replication domain, there must be a 1:1 correspondence between active and passive jobs.

Explanations of and exceptions to the above rules are detailed below.

If you would like to see improvements to multi-job setups, please `open an issue on GitHub <https://github.com/zrepl/zrepl/issues/new>`_.

No Overlapping
~~~~~~~~~~~~~~

Jobs run independently of each other.
If two jobs match the same filesystem with their ``filesystems`` filter, they will operate on that filesystem independently and potentially in parallel.
For example, if job A prunes snapshots that job B is planning to replicate, the replication will fail because B assumed the snapshot would still be present.
More subtle race conditions can occur with the :ref:`replication cursor bookmark <replication-cursor-bookmark>`, which currently only exists once per filesystem.
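
As a sketch of rule 1, two push jobs on the same machine could split a pool with disjoint ``filesystems`` filters like this; job and dataset names are made up, and the ``tank/scratch<`` subtree pattern is assumed from zrepl's pattern-filter syntax:

.. code-block:: yaml

   jobs:
     # job A: everything except the tank/scratch subtree
     - name: backup_main
       type: push
       filesystems: {
         "<": true,
         "tank/scratch<": false
       }
       # ... snapshotting, connect, pruning ...
     # job B: only the tank/scratch subtree, which job A excludes
     - name: backup_scratch
       type: push
       filesystems: {
         "tank/scratch<": true
       }
       # ... snapshotting, connect, pruning ...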

N push jobs to 1 sink
~~~~~~~~~~~~~~~~~~~~~

The :ref:`sink job <job-sink>` namespaces by client identity.
It is thus safe to push to one sink job with different client identities.
If the push jobs have the same client identity, the filesystems matched by the push jobs must be disjoint to avoid races.

N pull jobs from 1 source
~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple pull jobs pulling from the same source have potential for race conditions during pruning:
each pull job prunes the source side independently, causing replication-prune and prune-prune races.

There is currently no way for a pull job to filter which snapshots it should attempt to replicate.
Thus, it is not possible to just manually assert that the prune rules of all pull jobs are disjoint to avoid replication-prune and prune-prune races.

docs/configuration/snapshotting.rst (new file, 192 lines)

.. include:: ../global.rst.inc

.. _job-snapshotting-spec:

Taking Snapshots
================

The ``push``, ``source`` and ``snap`` jobs can automatically take periodic snapshots of the filesystems matched by the ``filesystems`` filter field.
The snapshot names are composed of a user-defined prefix followed by a UTC date formatted like ``20060102_150405_000``.
We use UTC to avoid name conflicts when switching time zones or between summer and winter time.

For ``push`` jobs, replication is automatically triggered after all filesystems have been snapshotted.

::

   jobs:
     - type: push
       filesystems: {
         "<": true,
         "tmp": false
       }
       snapshotting:
         type: periodic
         prefix: zrepl_
         interval: 10m
         hooks: ...
       ...

There is also a ``manual`` snapshotting type, which covers the following use cases:

* Existing infrastructure for automatic snapshots: you only want to use this zrepl job for replication.
* Handling snapshotting through a separate ``snap`` job.

Note that you will have to trigger replication manually using the ``zrepl signal wakeup JOB`` subcommand in that case (see the example below).

::

   jobs:
     - type: push
       filesystems: {
         "<": true,
         "tmp": false
       }
       snapshotting:
         type: manual
       ...
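
For example, if the job is named ``prod_to_backups`` (a made-up name), replication is triggered with:

::

   zrepl signal wakeup prod_to_backups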

.. _job-snapshotting-hooks:

Pre- and Post-Snapshot Hooks
----------------------------

Jobs with `periodic snapshots <job-snapshotting-spec_>`_ can run hooks, specified in ``snapshotting.hooks``, before and/or after taking a snapshot:
Hooks are called per filesystem before and after the snapshot is taken (pre- and post-edge).
Pre-edge invocations are in configuration order, post-edge invocations in reverse order, i.e. like a stack.
If a pre-snapshot invocation fails, ``err_is_fatal=true`` cuts off subsequent hooks, does not take a snapshot, and only invokes post-edges corresponding to previous successful pre-edges.
``err_is_fatal=false`` logs the failed pre-edge invocation but affects neither subsequent hooks nor snapshotting itself.
Post-edges are only invoked for hooks whose pre-edges ran without error.
Note that hook failures for one filesystem never affect other filesystems.

The optional ``timeout`` parameter specifies a period after which zrepl will kill the hook process and report an error.
The default is 30 seconds and may be specified in any units understood by `time.ParseDuration <https://golang.org/pkg/time/#ParseDuration>`_.

The optional ``filesystems`` filter limits the filesystems the hook runs for.
It uses the same |filter-spec| as jobs.

Most hook types take additional parameters; please refer to the respective subsections below.

.. list-table::
   :widths: 20 10 70
   :header-rows: 1

   * - Hook ``type``
     - Details
     - Description
   * - ``command``
     - :ref:`Details <job-hook-type-command>`
     - Arbitrary pre- and post-snapshot scripts.
   * - ``postgres-checkpoint``
     - :ref:`Details <job-hook-type-postgres-checkpoint>`
     - Execute Postgres ``CHECKPOINT`` SQL command before snapshot.
   * - ``mysql-lock-tables``
     - :ref:`Details <job-hook-type-mysql-lock-tables>`
     - Flush and read-lock MySQL tables while taking the snapshot.

.. _job-hook-type-command:

``command`` Hooks
~~~~~~~~~~~~~~~~~

::

   jobs:
     - type: push
       filesystems: {
         "<": true,
         "tmp": false
       }
       snapshotting:
         type: periodic
         prefix: zrepl_
         interval: 10m
         hooks:
           - type: command
             path: /etc/zrepl/hooks/zrepl-notify.sh
             timeout: 30s
             err_is_fatal: false
           - type: command
             path: /etc/zrepl/hooks/special-snapshot.sh
             filesystems: {
               "tank/special": true
             }
       ...

``command`` hooks run an arbitrary executable before and after snapshotting; the built-in ``postgres-checkpoint`` and ``mysql-lock-tables`` hooks listed above cover specific applications.
Future versions of zrepl may support additional hook types.
The ``path`` to the hook executable must be absolute (e.g. ``/etc/zrepl/hooks/zrepl-notify.sh``).
No arguments may be specified; create a wrapper script if zrepl must call an executable that requires arguments.
zrepl will call the hook both before and after the snapshot, but with different values of the ``ZREPL_HOOKTYPE`` environment variable; see below.
The hook process's standard output is logged at level INFO, its standard error at level WARN.

zrepl sets a number of environment variables for the hook processes:

* ``ZREPL_HOOKTYPE``: either ``pre_snapshot`` or ``post_snapshot``
* ``ZREPL_FS``: the ZFS filesystem name being snapshotted
* ``ZREPL_SNAPNAME``: the zrepl-generated snapshot name (e.g. ``zrepl_20380119_031407_000``)
* ``ZREPL_DRYRUN``: set to ``"true"`` if a dry run is in progress so scripts can print, but not run, their commands

An empty template hook can be found in :sampleconf:`hooks/template.sh`.
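
A minimal hook script in that spirit could look like the sketch below; it is not the shipped template, and the ``notify-admin`` helper is a stand-in for whatever the script should actually do:

::

   #!/bin/sh
   # invoked by zrepl with ZREPL_HOOKTYPE, ZREPL_FS, ZREPL_SNAPNAME, ZREPL_DRYRUN set
   set -eu

   pre_snapshot() {
       # runs before "zfs snapshot ${ZREPL_FS}@${ZREPL_SNAPNAME}"
       notify-admin "about to snapshot ${ZREPL_FS}"
   }

   post_snapshot() {
       notify-admin "snapshotted ${ZREPL_FS}@${ZREPL_SNAPNAME}"
   }

   # during dry runs, only print what would be done
   if [ "$ZREPL_DRYRUN" = "true" ]; then
       echo "dry run: would handle ${ZREPL_HOOKTYPE} for ${ZREPL_FS}"
       exit 0
   fi

   case "$ZREPL_HOOKTYPE" in
       pre_snapshot)  pre_snapshot ;;
       post_snapshot) post_snapshot ;;
       *) echo "unknown hook type: $ZREPL_HOOKTYPE" >&2; exit 255 ;;
   esac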

.. _job-hook-type-postgres-checkpoint:

``postgres-checkpoint`` Hook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Connects to a Postgres server and executes the ``CHECKPOINT`` statement pre-snapshot.
Checkpointing applies the WAL contents to all data files and syncs the data files to disk.
This is not required for a consistent database backup: it merely moves the cost of WAL replay from restore time to snapshot time.
However, the Postgres manual recommends against checkpointing during normal operation.
Further, the operation requires Postgres superuser privileges.
zrepl users must decide on their own whether this hook is useful for them (it likely isn't).

.. ATTENTION::
   Note that the WAL and the Postgres data directory (with all database data files) must be on the same filesystem to guarantee a correct point-in-time backup with the ZFS snapshot.

The DSN syntax is documented here: `<https://godoc.org/github.com/lib/pq>`_

.. code-block:: sql

   CREATE USER zrepl_checkpoint PASSWORD 'yourpasswordhere';
   ALTER ROLE zrepl_checkpoint SUPERUSER;

.. code-block:: yaml

   - type: postgres-checkpoint
     dsn: "host=localhost port=5432 user=postgres password=yourpasswordhere sslmode=disable"
     filesystems: {
       "p1/postgres/data11": true
     }

.. _job-hook-type-mysql-lock-tables:

``mysql-lock-tables`` Hook
~~~~~~~~~~~~~~~~~~~~~~~~~~

Connects to MySQL and executes

* pre-snapshot ``FLUSH TABLES WITH READ LOCK`` to lock all tables in all databases in the MySQL server we connect to (`docs <https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock>`_)
* post-snapshot ``UNLOCK TABLES`` to reverse the above operation.

The above procedure is documented in the `MySQL manual <https://dev.mysql.com/doc/mysql-backup-excerpt/5.7/en/backup-methods.html>`_
as a means to produce a consistent backup of a MySQL DBMS installation (i.e., all databases).

`DSN syntax <https://github.com/go-sql-driver/mysql#dsn-data-source-name>`_: ``[username[:password]@][protocol[(address)]]/dbname[?param1=value1&...&paramN=valueN]``

.. ATTENTION::
   All MySQL databases must be on the same ZFS filesystem to guarantee a consistent point-in-time backup with the ZFS snapshot.

.. code-block:: sql

   CREATE USER zrepl_lock_tables IDENTIFIED BY 'yourpasswordhere';
   GRANT RELOAD ON *.* TO zrepl_lock_tables;
   FLUSH PRIVILEGES;

.. code-block:: yaml

   - type: mysql-lock-tables
     dsn: "zrepl_lock_tables:yourpasswordhere@tcp(localhost)/"
     filesystems: {
       "tank/mysql": true
     }