As per some of the discussion items, I modified the proposal to add a
special top-level includes key and to reference the handling of
specific file includes.
**Note: This is a WIP PR that presently only adds the documentation for
the features so that the proposal can be discussed.**
Background
==========
While trying to use zrepl in my Ansible-driven home-lab deployment, I ran
into an interesting problem. My ZFS-based servers are assigned different,
sometimes overlapping, roles.
Example role distribution between two servers:
```
serverA:
- common
- web
- file
serverB:
- common
- git
- file
```
Each role wants to create and manage a ZFS dataset with its own
replication / backup policies:
- web: pool/web
- git: pool/git
- file: pool/file
At present, creating a ZFS dataset from each role is very easy, and so
is creating the basic zrepl configuration file from the "common" role.
However, when each role tries to register its job(s) in the single
zrepl configuration file, things get tricky.
I could add a role at the end that hardcodes the datasets that need to
be backed up, but that seems a bit hacky.
I could also use Ansible's `lineinfile` task to idempotently add each
dataset's snapshot jobs to the zrepl configuration file, but that
causes a problem:
Every time the "common" role runs, the basic zrepl configuration file
gets re-created, causing the web, git, and file roles to all register
changes because they have to re-insert all their jobs into the single
configuration file.
Proposed Solution
=================
The proposed solution is to allow zrepl job definitions to be
distributed across multiple YAML files that are included from the main
zrepl configuration file.
```
global: ...
jobs:
  include: jobs.d
```
This directive would only be accepted in the main configuration file
and is mutually exclusive with any other job definitions in that file.
To keep things lean, no conflict resolution is provided to users: job
names must be unique across all included job YAML files.
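To make the two proposed rules concrete, here is a minimal Go sketch: `include` may not be mixed with inline jobs, and job names must be unique across all included files. The type and function names (`includedFile`, `validateJobsSection`, `mergeIncludedJobs`) are hypothetical, YAML parsing is omitted, and processing files in sorted path order is an assumption of this sketch rather than part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// includedFile is a hypothetical, already-parsed view of one file in the
// include directory (e.g. jobs.d/web.yml). Only job names matter here;
// YAML parsing is out of scope for this sketch.
type includedFile struct {
	Path     string
	JobNames []string
}

// validateJobsSection enforces that `include` is mutually exclusive with
// inline job definitions in the main configuration file.
func validateJobsSection(include string, inlineJobCount int) error {
	if include != "" && inlineJobCount > 0 {
		return errors.New("jobs: `include` cannot be combined with inline job definitions")
	}
	return nil
}

// mergeIncludedJobs collects job names from all included files in path order
// (so the result is deterministic) and rejects any name defined twice, since
// no conflict resolution is offered.
func mergeIncludedJobs(files []includedFile) ([]string, error) {
	sorted := append([]includedFile(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Path < sorted[j].Path })

	definedIn := make(map[string]string) // job name -> file that defined it
	var merged []string
	for _, f := range sorted {
		for _, name := range f.JobNames {
			if prev, ok := definedIn[name]; ok {
				return nil, fmt.Errorf("job %q defined in both %s and %s", name, prev, f.Path)
			}
			definedIn[name] = f.Path
			merged = append(merged, name)
		}
	}
	return merged, nil
}
```

Rejecting duplicates outright, instead of letting a later file override an earlier one, keeps each role's file independent of the others' contents.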
With this feature, the above problem becomes much simpler:
- common: sets up the global zrepl configuration and the include
directive
- web/git/file: each manages its own dataset and creates its own jobs
file: jobs.d/web.yml, jobs.d/git.yml, and jobs.d/file.yml, respectively.
This PR adds a new optional field `timestamp_location` that allows
the user to specify a timezone other than the default UTC for use in
the snapshot suffix.
I took @mjasnik's PR https://github.com/zrepl/zrepl/pull/785 and
refactored and extended it as follows:
* move all formatting logic into its own package
* disallow the `dense` and `human` formats with locations other than UTC,
to protect users from mistakes
* document the behavior more clearly
* add a regression test for existing users
Because some jobs add the client identity to root_fs and other jobs
don't, we can't reliably detect overlapping filesystems. At the same
time, we need the ability to use equal or overlapping root_fs for
different jobs. For instance, see this config:
```
- name: "zdisk"
  type: "sink"
  root_fs: "zdisk/zrepl"
  serve:
    type: "local"
    listener_name: "zdisk"
```
and
```
- name: "remote-to-zdisk"
  type: "pull"
  connect:
    type: "tls"
  root_fs: "zdisk/zrepl/remote"
```
As you can see, the two jobs have overlapping root_fs, but the datasets
don't actually overlap: job `zdisk` saves everything under
`zdisk/zrepl/localhost` because it adds the client identity. So the two
jobs actually use two different filesystems, `zdisk/zrepl/localhost`
and `zdisk/zrepl/remote`, and we can't detect this situation during the
config check. So let's just remove this check; it's the admin's duty to
configure correct root_fs values.
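The following Go sketch illustrates why a config-time check gives a false positive here. The helpers `overlaps` and `effectiveRoot` are hypothetical; `effectiveRoot` only models the rule stated above (the sink job appends the client identity, the pull job doesn't), and the client identity is exactly what a config check cannot know ahead of time.

```go
package main

import "strings"

// overlaps reports whether one dataset path equals, or is nested under, the other.
func overlaps(a, b string) bool {
	return a == b || strings.HasPrefix(a, b+"/") || strings.HasPrefix(b, a+"/")
}

// effectiveRoot is a hypothetical helper: a sink job stores received data
// under root_fs/<client identity>, while a pull job uses root_fs directly.
func effectiveRoot(rootFS, jobType, clientIdentity string) string {
	if jobType == "sink" {
		return rootFS + "/" + clientIdentity
	}
	return rootFS
}
```

Comparing the configured `root_fs` values flags `zdisk/zrepl` and `zdisk/zrepl/remote` as overlapping, while the effective roots `zdisk/zrepl/localhost` and `zdisk/zrepl/remote` do not overlap.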
---------
Co-authored-by: Christian Schwarz <me@cschwarz.com>
For this kind of debugging, we switched to env vars a while ago,
for example, ZREPL_RPC_DEBUG.
I don't think we have a substitute for the RPCLog stuff.
However, NetConnLogger is still in the codebase.
obsoletes https://github.com/zrepl/zrepl/pull/661
There were, and still are, too many words; it has a very white-paper vibe.
The docs need to be brief, exact, and on-point.
closes https://github.com/zrepl/zrepl/pull/618
Previously, the unordered list and the phrase 'are considered' left it
unclear whether one or all of the files are 'considered'.
In reality, the first valid file is used, so an ordered list and
better wording communicate this fact.
refs https://github.com/zrepl/zrepl/pull/618
Config:
```
- type: push
  ...
  conflict_resolution:
    initial_replication: most_recent | all | fail
```
The ``initial_replication`` option determines which snapshots zrepl
replicates if the filesystem has not been replicated before.
If ``most_recent`` (the default), the initial replication will only
transfer the most recent snapshot, while ignoring previous snapshots.
If all snapshots should be replicated, specify ``all``.
Use ``fail`` to make replication of the filesystem fail in case
there is no corresponding filesystem on the receiver.
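The three `initial_replication` values can be summarized by a small Go sketch that picks the snapshots an initial replication would transfer. The type and function names are hypothetical, sender snapshots are assumed to be ordered oldest to newest, and treating an empty value as `most_recent` models the documented default. This is not the actual `IncrementalPath()` or conflict-resolution code.

```go
package main

import (
	"errors"
	"fmt"
)

// InitialReplication mirrors conflict_resolution.initial_replication.
type InitialReplication string

const (
	MostRecent InitialReplication = "most_recent"
	All        InitialReplication = "all"
	Fail       InitialReplication = "fail"
)

// snapshotsForInitialReplication returns the sender snapshots (ordered oldest
// to newest) that an initial replication would transfer under the policy.
func snapshotsForInitialReplication(policy InitialReplication, snaps []string) ([]string, error) {
	if len(snaps) == 0 {
		return nil, errors.New("no snapshots on the sending side")
	}
	switch policy {
	case MostRecent, "": // most_recent is the default
		return snaps[len(snaps)-1:], nil
	case All:
		return snaps, nil
	case Fail:
		return nil, errors.New("no corresponding filesystem on the receiver and initial_replication=fail")
	default:
		return nil, fmt.Errorf("unknown initial_replication value %q", string(policy))
	}
}
```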
Code-Level Changes, apart from the obvious:
- Rework IncrementalPath()'s return signature.
Now returns an error for initial replications as well.
- Rename & rework its consumer, resolveConflict().
Co-authored-by: Graham Christensen <graham@grahamc.com>
Fixes https://github.com/zrepl/zrepl/issues/550
Fixes https://github.com/zrepl/zrepl/issues/187
Closes https://github.com/zrepl/zrepl/pull/592
fixes https://github.com/zrepl/zrepl/issues/504
Problem:
A plain send + recv with an encrypted root_fs and placeholders results in
unencrypted receives, whereas the user would expect encrypt-on-recv.
Reason:
We create placeholder filesystems with -o encryption=off.
Thus, children received below those placeholders don't inherit the
encryption of root_fs.
Fix:
We'll have three values for `recv.placeholders.encryption: unspecified (default) | off | inherit`.
When we create a placeholder, we will fail the operation if `recv.placeholders.encryption = unspecified`.
The exception is the placeholder that encodes the client identity ($root_fs/$client_identity) in a pull job.
That placeholder is created in `inherit` mode if the config field is `unspecified`, so that users who don't need
placeholders are not bothered by these details.
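A minimal Go sketch of the decision described above, returning the extra `zfs create` arguments for a placeholder. The type and function names are hypothetical; `isClientIdentityPlaceholder` stands in for "the $root_fs/$client_identity placeholder of a pull job", and treating an empty value as `unspecified` models the default.

```go
package main

import (
	"errors"
	"fmt"
)

// PlaceholderEncryption mirrors recv.placeholders.encryption.
type PlaceholderEncryption string

const (
	EncUnspecified PlaceholderEncryption = "unspecified" // default
	EncOff         PlaceholderEncryption = "off"
	EncInherit     PlaceholderEncryption = "inherit"
)

// placeholderCreateArgs returns the extra `zfs create` arguments for a new
// placeholder filesystem, or an error if creation must be refused.
func placeholderCreateArgs(mode PlaceholderEncryption, isClientIdentityPlaceholder bool) ([]string, error) {
	if mode == "" {
		mode = EncUnspecified // an empty value is treated as the default
	}
	if mode == EncUnspecified {
		if !isClientIdentityPlaceholder {
			return nil, errors.New("refusing to create placeholder: recv.placeholders.encryption is unspecified")
		}
		mode = EncInherit // exception for the client-identity placeholder
	}
	switch mode {
	case EncOff:
		return []string{"-o", "encryption=off"}, nil
	case EncInherit:
		// No -o encryption: the placeholder inherits encryption from its parent.
		return nil, nil
	default:
		return nil, fmt.Errorf("invalid recv.placeholders.encryption value %q", string(mode))
	}
}
```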
Future Work:
Automatically warn existing users of encrypt-on-recv about the problem
if they are affected.
The problem I hit while implementing this is that the `encryption`
property's `source` doesn't behave like that of other properties:
`source` is `default` for `encryption=off` and `-` for `encryption=on`.
Hence, we can't use `source` to distinguish the following 2x2 cases:
(1) placeholder created with explicit -o encryption=off
(2) placeholder created without specifying -o encryption
with
(A) an encrypted parent at creation time
(B) an unencrypted parent at creation time
- Replace both the string name 'gridspec' and the short form 'grid spec' with full words
- Fix alignment and make spacing more consistent
- Fix how snapshots fall into buckets in the example so that it actually reflects right-exclusiveness
closes https://github.com/zrepl/zrepl/pull/535
This was merged to master prematurely as the job components are not decoupled well enough
for these signals to be useful yet.
This reverts commit 2c8c2cfa14.
closes #452
The package is now at 95% code coverage, and the additional tests codify
all behavior specified in the docs.
There is a slight change in behavior:
Intervals are now [duration) instead of (duration].
If the leftmost interval is not keep=all, the most recently created
snapshot will be destroyed if there are other snapshots within
that first interval.
Since we recommend keep=all all over the docs, and zrepl 0.3
will put holds on that snapshot if it is being replicated,
I feel like this is an acceptable change in behavior.
refs #292
fixup of 0bbe2befce
This commit
- adds a configuration in which no step holds, replication cursors, etc. are created
- removes the send.step_holds.disable_incremental setting
- creates a new config option `replication` for active-side jobs
- adds the replication.protection.{initial,incremental} settings, each
  of which can have one of the following values:
  - `guarantee_resumability`
  - `guarantee_incremental`
  - `guarantee_nothing`
  (refer to docs/configuration/replication.rst for semantics)
The `replication` config from an active side is sent to both endpoint.Sender and endpoint.Receiver
for each replication step. Sender and Receiver then act accordingly.
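A minimal Go sketch of the new `replication.protection` settings as a typed config with validation, plus choosing the level that applies to a single step. The struct names, the `yaml` tags, and `levelForStep` are hypothetical; the semantics of each level are defined in docs/configuration/replication.rst and are not modeled here.

```go
package main

import "fmt"

// ProtectionLevel is one of the values accepted by
// replication.protection.initial and replication.protection.incremental.
type ProtectionLevel string

const (
	GuaranteeResumability ProtectionLevel = "guarantee_resumability"
	GuaranteeIncremental  ProtectionLevel = "guarantee_incremental"
	GuaranteeNothing      ProtectionLevel = "guarantee_nothing"
)

// Validate rejects values other than the three listed above.
func (p ProtectionLevel) Validate() error {
	switch p {
	case GuaranteeResumability, GuaranteeIncremental, GuaranteeNothing:
		return nil
	}
	return fmt.Errorf("invalid replication protection level %q", string(p))
}

// ReplicationProtection mirrors `replication.protection` (hypothetical struct).
type ReplicationProtection struct {
	Initial     ProtectionLevel `yaml:"initial"`
	Incremental ProtectionLevel `yaml:"incremental"`
}

// ReplicationOptions mirrors an active-side job's `replication` section,
// which is sent to both the sender and the receiver for each step.
type ReplicationOptions struct {
	Protection ReplicationProtection `yaml:"protection"`
}

// levelForStep picks the setting that applies to one replication step.
func (o ReplicationOptions) levelForStep(isInitial bool) ProtectionLevel {
	if isInitial {
		return o.Protection.Initial
	}
	return o.Protection.Incremental
}
```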
For `guarantee_incremental`, we add the new `tentative-replication-cursor` abstraction.
The necessity for that abstraction is outlined in https://github.com/zrepl/zrepl/issues/340.
fixes https://github.com/zrepl/zrepl/issues/340
This is a stop-gap solution until we rewrite the pruner to support
rules for removing step holds.
Note that disabling step holds for incremental sends does not affect
zrepl's guarantee that incremental replication is always possible:
Suppose you yank the external drive during an incremental @from -> @to step:
* restarting that step or future incrementals @from -> @to_later will be possible
because the replication cursor bookmark points to @from until the step is complete
* resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to.
* in that case, the replication algorithm should determine that the resumable state
on the receiving side is useless because @to no longer exists on the sending side,
and consequently clear it, and restart an incremental step @from -> @to_later
refs #288