Retry handling is broken since the gRPC changes (wrong error classification).
Will come back at some point, hopefully by merging the replication
driver retry infrastructure.
However, the simpler architecture allows an easy fix for the problem
that the pruner practically gave up on the first error it encountered.
fixes#123
* Remove explicity state machine code for all but replication.Replication
* Introduce explicit error types that satisfy interfaces which provide
sufficient information for replication.Replication to make intelligent
retry + queuing decisions
* Temporary()
* LocalToFS()
* Remove the queue and replace it with a simple array that we sort each
time (yay no generics :( )
Pruner now backs off as soon as there is an error, making that error the
Error field in the pruner report.
The error is also stored in the specific *fs that failed, and we
maintain an error counter per *fs to de-prioritize those fs that failed.
Like with replication, the de-prioritization on errors is to avoid '
getting stuck' with an individual filesystem until the watchdog hits.
ActiveSide.do() can only run sequentially, i.e. we cannot run
replication and pruning in parallel. Why?
* go-streamrpc only allows one active request at a time
(this is bad design and should be fixed at some point)
* replication and pruning are implemented independently, but work on the
same resources (snapshots)
A: pruning might destroy a snapshot that is planned to be replicated
B: replication might replicate snapshots that should be pruned
We do not have any resource management / locking for A and B, but we
have a use case where users don't want their machine fill up with
snapshots if replication does not work.
That means we _have_ to run the pruners.
A further complication is that we cannot just cancel the replication
context after a timeout and move on to the pruner: it could be initial
replication and we don't know how long it will take.
(And we don't have resumable send & recv yet).
With the previous commits, we can implement the watchdog using context
cancellation.
Note that the 'MadeProgress()' calls can only be placed right before
non-error state transition. Otherwise, we could end up in a live-lock.
A bookmark with a well-known name is used to track which version was
last successfully received by the receiver.
The createtxg that can be retrieved from the bookmark using `zfs get` is
used to set the Replicated attribute of each snap on the sender:
If the snap's CreateTXG > the cursor's, it is not yet replicated,
otherwise it has been.
There is an optional config option to change the behvior to
`CreateTXG >= the cursor's`, and the implementation defaults to that.
The reason: While things work just fine with `CreateTXG > the cursor's`,
ZFS does not provide size estimates in a `zfs send` dry run
(see acd2418).
However, to enable the use case of keeping the snapshot only around for
the replication, the config flag exists.