zrepl/replication
Christian Schwarz 69bfcb7bed daemon/active: implement watchdog to handle stuck replication / pruners
ActiveSide.do() can only run sequentially, i.e. we cannot run
replication and pruning in parallel. Why?

* go-streamrpc only allows one active request at a time
(this is bad design and should be fixed at some point)
* replication and pruning are implemented independently, but work on the
same resources (snapshots)

A: pruning might destroy a snapshot that is planned to be replicated
B: replication might replicate snapshots that should be pruned

We do not have any resource management / locking for A and B, but we
have a use case where users don't want their machine to fill up with
snapshots if replication does not work.
That means we _have_ to run the pruners.

A further complication is that we cannot just cancel the replication
context after a timeout and move on to the pruner: it could be the
initial replication, and we don't know how long that will take.
(And we don't have resumable send & recv yet.)

With the previous commits, we can implement the watchdog using context
cancellation.
Note that the 'MadeProgress()' calls can only be placed right before
a non-error state transition. Otherwise, we could end up in a live-lock.
2018-10-19 17:23:00 +02:00
fsrep daemon/active: implement watchdog to handle stuck replication / pruners 2018-10-19 17:23:00 +02:00
internal replication: wakeup + retry handling: make wakeups work in retry wait states 2018-10-12 13:12:28 +02:00
pdu replication: regenerate pdu.pb with new protoc-gen-go 2018-10-13 17:23:39 +02:00
context.go gofmt 2018-08-25 21:30:25 +02:00
mainfsm.go daemon/active: implement watchdog to handle stuck replication / pruners 2018-10-19 17:23:00 +02:00
state_enumer.go replication: simpler PermanentError state + handle context cancellation 2018-10-19 17:23:00 +02:00