- uses version metric for 'instances up'
- displays active task count
- displays send abstractions cache entry count
- in general, graphs have a shorter y axis for better overview
fixes#332
This is a stop-gap solution until we re-write the pruner to support
rules for removing step holds.
Note that disabling step holds for incremental sends does not affect
zrepl's guarantee that incremental replication is always possible:
Suppose you yank the external drive during an incremental @from -> @to step:
* restarting that step or future incrementals @from -> @to_later` will be possible
because the replication cursor bookmark points to @from until the step is complete
* resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to.
* in that case, the replication algorithm should determine that the resumable state
on the receiving side isuseless because @to no longer exists on the sending side,
and consequently clear it, and restart an incremental step @from -> @to_later
refs #288
- drop HintMostRecentCommonAncestor rpc call
- it is wrong to put faith into the active side of the replication to always make that call
(we might not trust it, ref pull setup)
- clean up step holds + step bookmarks + replication cursor bookmarks on
send RPC instead
- this makes it symmetric with Receive RPC
- use a cache (endpoint.sendAbstractionsCache) to avoid the cost of
listing the on-disk endpoint abstractions state on every step
The "create" methods for endpoint abstractions (CreateReplicationCursor, HoldStep) are now fully
idempotent and return an Abstraction.
Notes about endpoint.sendAbstractionsCache:
- fills lazily from disk state on first `Get` operation
- fill from disk is generally only attempted once
- unless the `ListAbstractions` fails, in which case the fill from
disk is retried on next `Get` (the current `Get` will observe a
subset of the actual on-disk abstractions)
- the `Invalidate` method is called
- it is a global (zrepl process-wide) cache
fixes#316
- discovered during investigation of #316
- this is not the fix for #316, as a malicious receiver who doesn't
implement the behavior added by this patch could still cause leakage
of step holds on the sender
refs #316
package trace:
- introduce the concept of tasks and spans, tracked as linked list within ctx
- see package-level docs for an overview of the concepts
- **main feature 1**: unique stack of task and span IDs
- makes it easy to follow a series of log entries in concurrent code
- **main feature 2**: ability to produce a chrome://tracing-compatible trace file
- either via an env variable or a `zrepl pprof` subcommand
- this is not a CPU profile, we already have go pprof for that
- but it is very useful to visually inspect where the
replication / snapshotter / pruner spends its time
( fixes#307 )
usage in package daemon/logging:
- goal: every log entry should have a trace field with the ID stack from package trace
- make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function
- the context carries a linked list of injected fields which
`logging.GetLogger` adds to the logger it returns
- `logging.GetLogger` also uses package `trace` to get the
task-and-span-stack and injects it into the returned logger's fields
- prior to this patch, context cancellation would leave rpc.Server open
- did not make problems because context was only cancelled by SIGINT,
which was immediately followed by os.Exit