zrepl/rpc
Christian Schwarz 2927d0ca15 rpc: use grpchelper package, add grpc.KeepaliveEnforcementPolicy, fix 'transport is closing' error
Symptom: zrepl log message:

    rpc error: code = Unavailable desc = transport is closing

Underlying Problem:

* rpc.NewServer was not using grpchelper.NewServer and not setting Server KeepaliveParams by itself
* and even grpchelper.NewServer didn't set a KeepaliveEnforcementPolicy
* However, KeepaliveEnforcementPolicy is necessary if the client keepalive is configured with non-default values
* .. which grpchelper.ClientConn does, and that method is used by rpc.NewClient

* rpc.Client was sending pings
* lacking server-side KeepaliveEnforcementPolicy caused grpc-hard-coded `pingStrikes` counter to go past limit of 2:
  021bd5734e/internal/transport/http2_server.go (L726)

How was this debugged:
* GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info PATH=/root/mockpath:$PATH zrepl daemon
* with a patch on grpc package to get more log messages on pingStrikes increases:

    diff --git a/internal/transport/http2_server.go b/internal/transport/http2_server.go
    index 8b04b039..f68f55ea 100644
    --- a/internal/transport/http2_server.go
    +++ b/internal/transport/http2_server.go
    @@ -214,6 +214,7 @@ func newHTTP2Server(conn net.Conn, config *ServerConfig) (_ ServerTransport, err
            if kep.MinTime == 0 {
                    kep.MinTime = defaultKeepalivePolicyMinTime
            }
    +       errorf("effective keepalive enforcement policy: %#v", kep)
            done := make(chan struct{})
            t := &http2Server{
                    ctx:               context.Background(),
    @@ -696,6 +697,7 @@ func (t *http2Server) handlePing(f *http2.PingFrame) {
            t.controlBuf.put(pingAck)

            now := time.Now()
    +       errorf("transport:ping handlePing, last ping %s ago", now.Sub(t.lastPingAt))
            defer func() {
                    t.lastPingAt = now
            }()
    @@ -713,11 +715,13 @@ func (t *http2Server) handlePing(f *http2.PingFrame) {
                    // Keepalive shouldn't be active thus, this new ping should
                    // have come after at least defaultPingTimeout.
                    if t.lastPingAt.Add(defaultPingTimeout).After(now) {
    +                       errorf("transport:ping strike ns < 1 && !t.kep.PermitWithoutStream")
                            t.pingStrikes++
                    }
            } else {
                    // Check if keepalive policy is respected.
                    if t.lastPingAt.Add(t.kep.MinTime).After(now) {
    +                       errorf("transport:ping strike !(ns < 1 && !t.kep.PermitWithoutStream) kep.MinTime=%s ns=%d", t.kep.MinTime, ns)
                            t.pingStrikes++
                    }
            }

fixes #181
2020-01-04 21:10:41 +01:00
..
dataconn rpc/dataconn + build: support GOOS={solaris,illumos} 2019-11-16 22:07:47 +01:00
grpcclientidentity rpc: use grpchelper package, add grpc.KeepaliveEnforcementPolicy, fix 'transport is closing' error 2020-01-04 21:10:41 +01:00
netadaptor format source tree using goimports 2019-03-22 19:41:12 +01:00
transportmux run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
versionhandshake run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
rpc_client.go rpc: actually return an error if ReqSend fails 2019-03-28 22:17:12 +01:00
rpc_debug.go run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
rpc_doc.go format source tree using goimports 2019-03-22 19:41:12 +01:00
rpc_logging.go run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
rpc_mux.go format source tree using goimports 2019-03-22 19:41:12 +01:00
rpc_server.go rpc: use grpchelper package, add grpc.KeepaliveEnforcementPolicy, fix 'transport is closing' error 2020-01-04 21:10:41 +01:00