zrepl/util/watchdog/watchdog.go
Christian Schwarz 69bfcb7bed daemon/active: implement watchdog to handle stuck replication / pruners
ActiveSide.do() can only run sequentially, i.e. we cannot run
replication and pruning in parallel. Why?

* go-streamrpc only allows one active request at a time
  (this is bad design and should be fixed at some point)
* replication and pruning are implemented independently, but work on the
  same resources (snapshots)

A: pruning might destroy a snapshot that is planned to be replicated
B: replication might replicate snapshots that should be pruned

We do not have any resource management / locking for A and B, but we
have a use case where users don't want their machine to fill up with
snapshots if replication does not work.
That means we _have_ to run the pruners.

A further complication is that we cannot just cancel the replication
context after a timeout and move on to the pruner: it could be initial
replication and we don't know how long it will take.
(And we don't have resumable send & recv yet).

With the previous commits, we can implement the watchdog using context
cancellation.
Note that the 'MadeProgress()' calls can only be placed right before a
non-error state transition. Otherwise, we could end up in a live-lock.
2018-10-19 17:23:00 +02:00


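package watchdog

import (
	"fmt"
	"sync"
	"time"
)

// Progress records the time of the most recent progress report.
type Progress struct {
	lastUpd time.Time
}

func (p *Progress) String() string {
	return fmt.Sprintf("last update at %s", p.lastUpd)
}

// madeProgressSince reports whether p was updated more recently than p2.
// Two zero-valued Progress values never count as progress.
func (p *Progress) madeProgressSince(p2 *Progress) bool {
	if p.lastUpd.IsZero() && p2.lastUpd.IsZero() {
		return false
	}
	return p.lastUpd.After(p2.lastUpd)
}

// KeepAlive guards a Progress with a mutex so that it can be updated and
// polled from different goroutines.
type KeepAlive struct {
	mtx sync.Mutex
	p   Progress
}

// MadeProgress records that progress was made just now.
func (k *KeepAlive) MadeProgress() {
	k.mtx.Lock()
	defer k.mtx.Unlock()
	k.p.lastUpd = time.Now()
}

// ExpectProgress reports whether MadeProgress was called since the state
// stored in last, then overwrites *last with the current state.
func (k *KeepAlive) ExpectProgress(last *Progress) (madeProgress bool) {
	k.mtx.Lock()
	defer k.mtx.Unlock()
	madeProgress = k.p.madeProgressSince(last)
	*last = k.p
	return madeProgress
}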