bookmarking: prune policy for bookmarks

refs #34
This commit is contained in:
Christian Schwarz
2018-02-17 20:48:31 +01:00
parent 8e34843eb1
commit aa92261ea7
15 changed files with 149 additions and 48 deletions


@@ -48,14 +48,14 @@ Example: :sampleconf:`pullbackup/productionhost.yml`.
* - ``interval``
- snapshotting interval
* - ``prune``
- |prune| policy for filesystems in ``filesystems`` with prefix ``snapshot_prefix``
- |prune| for versions of filesystems in ``filesystems``, versions prefixed with ``snapshot_prefix``
- Snapshotting Task (every ``interval``, |patient|)
- A snapshot of filesystems matched by ``filesystems`` is taken every ``interval`` with prefix ``snapshot_prefix``.
- A bookmark of that snapshot is created with the same name.
- The ``prune`` policy is triggered on filesystems matched by ``filesystems`` with snapshots matched by ``snapshot_prefix``.
- The ``prune`` policy is evaluated for versions of filesystems matched by ``filesystems``, versions prefixed with ``snapshot_prefix``.
- Serve Task
@@ -65,12 +65,6 @@ A source job is the counterpart to a :ref:`job-pull`.
Make sure you read the |prune| policy documentation.
Note that zrepl does not prune bookmarks, for the following reason:
a pull job may stop replication due to link failure, misconfiguration or administrative action.
The source prune policy will eventually destroy the last common snapshot between source and pull job.
Without bookmarks, the prune policy would need to perform full replication again.
With bookmarks, we can resume incremental replication, only losing the snapshots pruned since the outage.
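A source job of this shape can be sketched as a config fragment; the job name and grid values below mirror the retention-grid example in the pruning documentation, while the ``zrepl_`` prefix and the elided ``serve`` section are illustrative assumptions:

```yaml
jobs:
  - name: pull_backup        # name as in the retention-grid example
    type: source
    interval: 10m            # snapshotting interval
    snapshot_prefix: zrepl_  # assumed prefix; only matching versions are pruned
    prune:
      policy: grid
      grid: 1x1d(keep=all)
      keep_bookmarks: 144
    # serve: ...             # transport configuration elided
```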
.. _job-pull:
Pull Job
@@ -99,7 +93,7 @@ Example: :sampleconf:`pullbackup/backuphost.yml`
* - ``snapshot_prefix``
- prefix snapshots must match to be considered for replication & pruning
* - ``prune``
- |prune| policy for local filesystems reachable by ``mapping``
- |prune| policy for versions of local filesystems reachable by ``mapping``, versions prefixed with ``snapshot_prefix``
* Main Task (every ``interval``, |patient|)
@@ -112,10 +106,11 @@ Example: :sampleconf:`pullbackup/backuphost.yml`
#. If the local target filesystem does not exist, ``initial_repl_policy`` is used.
#. On conflicts, an error is logged, but replication of the other filesystems in the mapping continues.
#. The ``prune`` policy is triggered for all *target filesystems*
#. The ``prune`` policy is evaluated for all *target filesystems*
A pull job is the counterpart to a :ref:`job-source`.
Make sure you read the |prune| policy documentation.
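The corresponding pull job can be sketched the same way; elided fields are marked with ``...`` as in the sample configs, and the ``zrepl_`` prefix is an illustrative assumption:

```yaml
jobs:
  - name: pull_app-srv       # name as in the pruning-policy example
    type: pull
    ...                      # connection, mapping, interval etc. elided
    snapshot_prefix: zrepl_  # assumed prefix for replicated snapshots
    prune:
      policy: grid
      grid: 1x1h(keep=all)   # first interval keeps everything, see the pruning docs
```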
.. _job-local:
@@ -163,8 +158,6 @@ Example: :sampleconf:`localbackup/host1.yml`
#. The ``prune_rhs`` policy is triggered for all *target filesystems*
A local job is a combination of a source and a pull job executed on the same machine.
Note that while snapshots are pruned, bookmarks are not pruned and kept around forever.
Refer to the comments on :ref:`source job <job-source>` for the reasoning behind this.
Terminology
-----------
@@ -188,3 +181,7 @@ patient task
* waits for the last invocation to finish
* logs a warning with the effective task duration
* immediately starts a new invocation of the task
filesystem version
A snapshot or a bookmark.
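The two kinds of filesystem version map directly onto ZFS naming: snapshots are written ``dataset@name``, bookmarks ``dataset#name``, and zrepl creates the bookmark with the same name as its snapshot. A small illustrative helper (the function name and dataset are hypothetical, not part of zrepl):

```python
def version_type(name: str) -> str:
    """Classify a ZFS filesystem version by its name.

    ZFS snapshots are written as dataset@name, bookmarks as dataset#name.
    """
    if "@" in name:
        return "snapshot"
    if "#" in name:
        return "bookmark"
    raise ValueError(f"not a filesystem version: {name!r}")

# A snapshot and the bookmark zrepl derives from it share the short name:
print(version_type("tank/data@zrepl_20180217_194831"))  # snapshot
print(version_type("tank/data#zrepl_20180217_194831"))  # bookmark
```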


@@ -3,9 +3,9 @@
Pruning Policies
================
In zrepl, *pruning* means *destroying snapshots by some policy*.
In zrepl, *pruning* means *destroying filesystem versions by some policy*, where a filesystem version is either a snapshot or a bookmark.
A *pruning policy* takes a list of snapshots and -- for each snapshot -- decides whether it should be kept or destroyed.
A *pruning policy* takes a list of filesystem versions and decides for each whether it should be kept or destroyed.
The job context defines which snapshots are even considered for pruning, for example through the ``snapshot_prefix`` variable.
Check the respective :ref:`job definition <job>` for details.
@@ -25,6 +25,7 @@ Retention Grid
jobs:
- name: pull_app-srv
type: pull
...
prune:
policy: grid
@@ -34,6 +35,15 @@ Retention Grid
└─ 24 adjacent one-hour intervals
- name: pull_backup
type: source
interval: 10m
prune:
policy: grid
grid: 1x1d(keep=all)
keep_bookmarks: 144
The retention grid can be thought of as a time-based sieve:
The ``grid`` field specifies a list of adjacent time intervals:
the left edge of the leftmost (first) interval is the ``creation`` date of the youngest snapshot.
@@ -43,6 +53,11 @@ Each interval carries a maximum number of snapshots to keep.
It is specified via ``(keep=N)``, where ``N`` is either ``all`` (all snapshots are kept) or a positive integer.
The default value is **1**.
Bookmarks are not affected by the above.
Instead, the ``keep_bookmarks`` field specifies the number of bookmarks to be kept per filesystem.
You only need to specify ``keep_bookmarks`` on the source side of a replication setup, since the destination side does not receive bookmarks.
You can specify ``all`` as a value to keep all bookmarks, but be warned that you should then establish some other way to prune bookmarks that are no longer needed (see below).
The following procedure happens during pruning:
#. The list of snapshots eligible for pruning is sorted by ``creation``
@@ -54,14 +69,16 @@ The following procedure happens during pruning:
#. the contained snapshot list is sorted by creation.
#. snapshots from the list, oldest first, are destroyed until the specified ``keep`` count is reached.
#. all remaining snapshots on the list are kept.
#. The list of bookmarks eligible for pruning is sorted by ``createtxg`` and the most recent ``keep_bookmarks`` bookmarks are kept.
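The procedure above can be sketched in Python. The data representation (name/timestamp tuples) and function names are assumptions for illustration, not zrepl's actual implementation; in particular, this sketch simply keeps snapshots older than the last grid interval, which may differ from zrepl's behaviour:

```python
from datetime import datetime, timedelta

def prune_grid(snapshots, grid):
    """Decide keep/destroy for snapshots according to a retention grid.

    snapshots: list of (name, creation) tuples, creation a datetime.
    grid: list of (interval_length, keep) tuples; keep=None means 'all'.
    Returns (kept, destroyed) lists of names.
    """
    remaining = sorted(snapshots, key=lambda s: s[1], reverse=True)
    if not remaining:
        return [], []
    # The left edge of the first interval is the creation date of the
    # youngest snapshot; intervals extend back into the past.
    edge = remaining[0][1]
    kept, destroyed = [], []
    for length, keep in grid:
        lower = edge - length
        bucket = sorted((s for s in remaining if lower < s[1] <= edge),
                        key=lambda s: s[1])  # sort bucket oldest first
        remaining = [s for s in remaining if not (lower < s[1] <= edge)]
        # destroy oldest first until the keep count is reached
        n_destroy = 0 if keep is None else max(0, len(bucket) - keep)
        destroyed += [name for name, _ in bucket[:n_destroy]]
        kept += [name for name, _ in bucket[n_destroy:]]
        edge = lower
    kept += [name for name, _ in remaining]  # older than the grid: kept here
    return kept, destroyed

def prune_bookmarks(bookmarks, keep_bookmarks):
    """Keep the keep_bookmarks most recent bookmarks (sorted by createtxg)."""
    ordered = sorted(bookmarks, key=lambda b: b[1], reverse=True)
    return ([b[0] for b in ordered[:keep_bookmarks]],
            [b[0] for b in ordered[keep_bookmarks:]])
```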
.. _replication-downtime:
.. ATTENTION::
The configuration of the first interval (``1x1h(keep=all)`` in the example) determines the **maximum allowable replication lag** because the source and destination pruning policies do not coordinate:
if replication does not work for whatever reason, the source will continue to execute its prune policy.
Eventually, the source destroys a snapshot that has never been replicated to the destination, degrading the temporal resolution of your backup.
Be aware that ``keep_bookmarks x interval`` (interval of the job level) controls the **maximum allowable replication downtime** between source and destination.
If replication does not work for whatever reason, source and destination will eventually fall out of sync because the source will continue pruning snapshots.
The only recovery in that case is full replication, which may not always be viable due to disk space or traffic constraints.
Thus, **always** configure the first interval to ``1x?(keep=all)``, substituting ``?`` with the maximum time replication may fail due to downtimes, maintenance, connectivity issues, etc.
.. We intentionally do not mention that bookmarks are used to bridge the gap between source and dest that are out of sync snapshot-wise. This is an implementation detail.
Further note that while bookmarks consume a constant amount of disk space, listing them requires temporary dynamic **kernel memory** proportional to the number of bookmarks.
Thus, do not use ``all`` or an inappropriately high value without good reason.
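The downtime bound can be made concrete with the example values from the source job above (``interval: 10m``, ``keep_bookmarks: 144``); the helper name is illustrative:

```python
from datetime import timedelta

def max_replication_downtime(keep_bookmarks: int, interval: timedelta) -> timedelta:
    # One snapshot (and its bookmark) is created per interval, and only the
    # keep_bookmarks most recent bookmarks survive pruning, so incremental
    # replication can be resumed after at most keep_bookmarks * interval.
    return keep_bookmarks * interval

# 144 bookmarks at a 10-minute interval cover one day of downtime.
print(max_replication_downtime(144, timedelta(minutes=10)))  # 1 day, 0:00:00
```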