Commit Graph

12 Commits

Author SHA1 Message Date
Christian Schwarz
2d8c3692ec rework resume token validation to allow resuming from raw sends of unencrypted datasets
Before this change, resuming from an unencrypted dataset with
send.raw=true specified wouldn't work with zrepl due to overly
restrictive resume token checking.

An initial PR to fix this was made in https://github.com/zrepl/zrepl/pull/503
but it didn't address the core of the problem.
The core of the problem was that zrepl assumed that if a resume token
contained `rawok=true, compressok=true`, the resulting send would be
encrypted. But if the sender dataset was unencrypted, such a resume would
actually result in an unencrypted send.
Which could be totally legitimate but zrepl failed to recognize that.

BACKGROUND
==========

The following snippets of OpenZFS code are insightful regarding how the
various ${X}ok values in the resume token are handled:

- 6c3c5fcfbe/module/zfs/dmu_send.c (L1947-L2012)
- 6c3c5fcfbe/module/zfs/dmu_recv.c (L877-L891)
- https://github.com/openzfs/zfs/blob/6c3c5fc/lib/libzfs/libzfs_sendrecv.c#L1663-L1672

Basically, some zfs send flags make the DMU send code set some DMU send
stream featureflags, although it's not a pure mapping, i.e, which DMU
send stream flags are used depends somewhat on the dataset (e.g., is it
encrypted or not, or, does it use zstd or not).

Then, the receiver looks at some (but not all) feature flags and maps
them to ${X}ok dataset zap attributes.

These are funnelled back to the sender 1:1 through the resume_token.

And the sender turns them into lzc flags.

As an example, let's look at zfs send --raw.
if the sender requests a raw send on an unencrypted dataset, the send
stream (and hence the resume token) will not have the raw stream
featureflag set, and hence the resume token will not have the rawok
field set. Instead, it will have compressok, embedok, and depending
on whether large blocks are present in the dataset, largeblockok set.

WHAT'S ZREPL'S ROLE IN THIS?
============================

zrepl provides a virtual encrypted sendflag that is like `raw`,
but further ensures that we only send encrypted datasets.

For any other resume token stuff, it shoudn't do any checking,
because it's a futile effort to keep up with ZFS send/recv features
that are orthogonal to encryption.

CHANGES MADE IN THIS COMMIT
===========================

- Rip out a bunch of needless checking that zrepl would do during
  planning. These checks were there to give better error messages,
  but actually, the error messages created by the endpoint.Sender.Send
  RPC upon send args validation failure are good enough.
- Add platformtests to validate all combinations of
  (Unencrypted/Encrypted FS) x (send.encrypted = true | false) x (send.raw = true | false)
  for cases both non-resuming and resuming send.

Additional manual testing done:
1. With zrepl 0.5, setup with unencrypted dataset, send.raw=true specified, no send.encrypted specified.
2. Observe that regular non-resuming send works, but resuming doesn't work.
3. Upgrade zrepl to this change.
4. Observe that both regular and resuming send works.

closes https://github.com/zrepl/zrepl/pull/613
2022-09-25 17:32:02 +02:00
Christian Schwarz
dc03db7423 rpc/grpcclientidentity/authlistener_grpc_adaptor: don't assume peer.Addr is set
On Illumos, getpeername doesn't work from Go on socketpair sockets.
That's why .RemoteAddr() returns nil on such a socket.
And that `nil` ultimately lands in the `p.Addr`.
So, `p.Addr.String()` would deref `nil`, leading to

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xaea33e]

    goroutine 614 [running]:
    github.com/zrepl/zrepl/rpc/grpcclientidentity.NewInterceptors.func1({0xf1e158, 0xc000631200}, {0xd514c0, 0xc000631230}, 0xc000032740, 0xc000524348)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/rpc/grpcclientidentity/authlistener_grpc_adaptor.go:121 +0x13e
    github.com/zrepl/zrepl/replication/logic/pdu._Replication_ListFilesystems_Handler({0xdb30c0, 0xc00001a630}, {0xf1e158, 0xc000631200}, 0xc00052b7a0, 0xc000522000)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/replication/logic/pdu/pdu_grpc.pb.go:186 +0x16a
    google.golang.org/grpc.(*Server).processUnaryRPC(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0xc000522150, 0x1497c78, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1217 +0xe28
    google.golang.org/grpc.(*Server).handleStream(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1540 +0xcb3
    google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc000373b70, 0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:878 +0xad
    created by google.golang.org/grpc.(*Server).serveStreams.func1
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:876 +0x1ec

fixes https://github.com/zrepl/zrepl/issues/598
2022-07-10 23:59:40 +02:00
Christian Schwarz
efe7b17d21 Update to protobuf v1.25 and grpc 1.35; bump CI to go1.12
From:
github.com/golang/protobuf v1.3.2
google.golang.org/grpc v1.17.0

To:
github.com/golang/protobuf v1.4.3
google.golang.org/grpc v1.35.0
google.golang.org/protobuf v1.25.0

About the two protobuf packages:
https://developers.google.com/protocol-buffers/docs/reference/go/faq
> Version v1.4.0 and higher of github.com/golang/protobuf wrap the new
implementation and permit programs to adopt the new API incrementally. For
example, the well-known types defined in github.com/golang/protobuf/ptypes are
simply aliases of those defined in the newer module. Thus,
google.golang.org/protobuf/types/known/emptypb and
github.com/golang/protobuf/ptypes/empty may be used interchangeably.

Notable Code Changes in zrepl:
- generate protobufs now contain a mutex so we can't copy them by value
  anymore
- grpc.WithDialer is deprecated => use grpc.WithContextDialer instead

Go1.12 is now actually required by some of the dependencies.
2021-01-25 00:39:01 +01:00
Christian Schwarz
10a14a8c50 [#307] add package trace, integrate it with logging, and adopt it throughout zrepl
package trace:

- introduce the concept of tasks and spans, tracked as linked list within ctx
    - see package-level docs for an overview of the concepts
    - **main feature 1**: unique stack of task and span IDs
        - makes it easy to follow a series of log entries in concurrent code
    - **main feature 2**: ability to produce a chrome://tracing-compatible trace file
        - either via an env variable or a `zrepl pprof` subcommand
        - this is not a CPU profile, we already have go pprof for that
        - but it is very useful to visually inspect where the
          replication / snapshotter / pruner spends its time
          ( fixes #307 )

usage in package daemon/logging:

- goal: every log entry should have a trace field with the ID stack from package trace

- make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function
    - the context carries a linked list of injected fields which
      `logging.GetLogger` adds to the logger it returns
    - `logging.GetLogger` also uses package `trace` to get the
      task-and-span-stack and injects it into the returned logger's fields
2020-05-19 11:30:02 +02:00
Christian Schwarz
5b52e5e331 rpc: fix missing logger context vars in control connection handlers
use ctxInterceptor in gRPC interceptors
also panic if the unimplemented stream interceptor is used
2020-02-14 21:42:03 +01:00
Christian Schwarz
2927d0ca15 rpc: use grpchelper package, add grpc.KeepaliveEnforcementPolicy, fix 'transport is closing' error
Symptom: zrepl log message:

    rpc error: code = Unavailable desc = transport is closing

Underlying Problem:

* rpc.NewServer was not using grpchelper.NewServer and not setting Server KeepaliveParams by itself
* and even grpchelper.NewServer didn't set a KeepaliveEnforcementPolicy
* However, KeepaliveEnforcementPolicy is necessary if the client keepalive is configured with non-default values
* .. which grpchelper.ClientConn does, and that method is used by rpc.NewClient

* rpc.Client was sending pings
* lacking server-side KeepaliveEnforcementPolicy caused grpc-hard-coded `pingStrikes` counter to go past limit of 2:
  021bd5734e/internal/transport/http2_server.go (L726)

How was this debugged:
* GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info PATH=/root/mockpath:$PATH zrepl daemon
* with a patch on grpc package to get more log messages on pingStrikes increases:

    diff --git a/internal/transport/http2_server.go b/internal/transport/http2_server.go
    index 8b04b039..f68f55ea 100644
    --- a/internal/transport/http2_server.go
    +++ b/internal/transport/http2_server.go
    @@ -214,6 +214,7 @@ func newHTTP2Server(conn net.Conn, config *ServerConfig) (_ ServerTransport, err
            if kep.MinTime == 0 {
                    kep.MinTime = defaultKeepalivePolicyMinTime
            }
    +       errorf("effective keepalive enforcement policy: %#v", kep)
            done := make(chan struct{})
            t := &http2Server{
                    ctx:               context.Background(),
    @@ -696,6 +697,7 @@ func (t *http2Server) handlePing(f *http2.PingFrame) {
            t.controlBuf.put(pingAck)

            now := time.Now()
    +       errorf("transport:ping handlePing, last ping %s ago", now.Sub(t.lastPingAt))
            defer func() {
                    t.lastPingAt = now
            }()
    @@ -713,11 +715,13 @@ func (t *http2Server) handlePing(f *http2.PingFrame) {
                    // Keepalive shouldn't be active thus, this new ping should
                    // have come after at least defaultPingTimeout.
                    if t.lastPingAt.Add(defaultPingTimeout).After(now) {
    +                       errorf("transport:ping strike ns < 1 && !t.kep.PermitWithoutStream")
                            t.pingStrikes++
                    }
            } else {
                    // Check if keepalive policy is respected.
                    if t.lastPingAt.Add(t.kep.MinTime).After(now) {
    +                       errorf("transport:ping strike !(ns < 1 && !t.kep.PermitWithoutStream) kep.MinTime=%s ns=%d", t.kep.MinTime, ns)
                            t.pingStrikes++
                    }
            }

fixes #181
2020-01-04 21:10:41 +01:00
Christian Schwarz
84eefa57bc rpc/grpcclientidentity: remove hard-coded deadline in listener adatper causing crash
Verified once again that grpc.DialContext is indeed non-blocking.
However, it checks in a defer stmt that the passed dial is not ctx.Done().
That is highly unusual if the dial is non-blocking.
But it might still happen, maybe because of machine suspend during the function call and before the defer stmt is executed.

panic:
context deadline exceeded
goroutine 49 [running]:
github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.ClientConn(0x1906ea0, 0xc0003ea1e0, 0x1921620, 0xc0002da660, 0x0)
        /gopath/src/github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:49 +0x38c
github.com/zrepl/zrepl/rpc.NewClient(0x1906f00, 0xc0002d60f0, 0x1921620, 0xc0002da640, 0x1921620, 0xc0002da660, 0x1921620, 0xc0002da6a0, 0x1921620)
        /gopath/src/github.com/zrepl/zrepl/rpc/rpc_client.go:53 +0x199
github.com/zrepl/zrepl/daemon/job.(*modePush).ConnectEndpoints(0xc0000d1e90, 0x1921620, 0xc0002da640, 0x1921620, 0xc0002da660, 0x1921620, 0xc0002da6a0, 0x1906f00, 0xc0002d60f0)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:105 +0x15d
github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do(0xc0000d6120, 0x1918720, 0xc00020f170)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:356 +0x236
github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run(0xc0000d6120, 0x1918720, 0xc00009c660)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:347 +0x289
github.com/zrepl/zrepl/daemon.(*jobs).start.func1(0xc0000fc880, 0x1921620, 0xc0002da120, 0x191a320, 0xc0000d6120, 0x1918720, 0xc0002d6a80)
2019-10-10 14:02:12 +02:00
Christian Schwarz
5b97953bfb run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
Christian Schwarz
afed762774 format source tree using goimports 2019-03-22 19:41:12 +01:00
Christian Schwarz
b85ec52387 rpc/ctrl: nicer perr info debug log messages 2019-03-13 19:20:04 +01:00
Christian Schwarz
07b43bffa4 replication: refactor driving logic (no more explicit state machine) 2019-03-13 15:00:40 +01:00
Christian Schwarz
796c5ad42d rpc rewrite: control RPCs using gRPC + separate RPC for data transfer
transport/ssh: update go-netssh to new version
    => supports CloseWrite and Deadlines
    => build: require Go 1.11 (netssh requires it)
2019-03-13 13:53:48 +01:00