Go upgrade:
- Go 1.23 is current => use that for release builds
- Go 1.22 is less than one year old, it's desirable to support it.
- The [`Go Toolchains`](https://go.dev/doc/toolchain) stuff is available
in both of these (would also be in Go 1.21). That is quite nice stuff,
but required some changes to how we versions we use in CircleCI and
the `release-docker` Makefile target.
Protobuf upgrade:
- Go to protobuf GH release website
- Download latest locally
- run `sha256sum`
- replace existing pinned hashes
- `make generate`
Deps upgrade:
- `go get -t -u all`
- repository moves aren't handled well automatically, fix manually
- repeat until no changes
Go 1.18 deprecated net.Error.Temporary().
This commit cleans up places where we use it incorrectly.
Also, the rpc layer defines some errors that implement
interface { Temporary() bool }
I added comments to all of the implementations to indicate
whether they will be required if net.Error.Temporary is ever
ever removed in the future.
For HandshakeError, the Temporary() return value is actually
important. I moved & rewrote a (previously misplaced) comment
there.
The ReadStreamError changes were
1. necessary to pacify newer staticcheck and
2. technically, an error can implement Temporary()
without being net.Err. This applies to some syscall
errors in the standard library.
Reading list for those interested:
- https://github.com/golang/go/issues/45729
- https://groups.google.com/g/golang-nuts/c/-JcZzOkyqYI
- https://man7.org/linux/man-pages/man2/accept.2.html
Note: This change was prompted by staticheck:
> SA1019: neterr.Temporary has been deprecated since Go 1.18 because it
> shouldn't be used: Temporary errors are not well-defined. Most
> "temporary" errors are timeouts, and the few exceptions are surprising.
> Do not use this method. (staticcheck)
It may be desirable to check that a config is valid without checking for
the existence of certificate files (e.g. when validating a config inside
a sandbox without access to the cert files).
This will be very useful for NixOS so that we can check the config file
at nix-build time (e.g. potentially without proper permissions to read cert
files for a TLS connection).
fixes https://github.com/zrepl/zrepl/issues/467
closes https://github.com/zrepl/zrepl/pull/587
We had too many spurious test failures in the past.
But on a developer machine, the tests don't usually fail because the
system isn't loaded as much.
So, only disable test on CircleCI.
From:
github.com/golang/protobuf v1.3.2
google.golang.org/grpc v1.17.0
To:
github.com/golang/protobuf v1.4.3
google.golang.org/grpc v1.35.0
google.golang.org/protobuf v1.25.0
About the two protobuf packages:
https://developers.google.com/protocol-buffers/docs/reference/go/faq
> Version v1.4.0 and higher of github.com/golang/protobuf wrap the new
implementation and permit programs to adopt the new API incrementally. For
example, the well-known types defined in github.com/golang/protobuf/ptypes are
simply aliases of those defined in the newer module. Thus,
google.golang.org/protobuf/types/known/emptypb and
github.com/golang/protobuf/ptypes/empty may be used interchangeably.
Notable Code Changes in zrepl:
- generate protobufs now contain a mutex so we can't copy them by value
anymore
- grpc.WithDialer is deprecated => use grpc.WithContextDialer instead
Go1.12 is now actually required by some of the dependencies.
GO111MODULE=on golangci-lint run ./...
endpoint/endpoint.go:487:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
panic(fmt.Sprintf("ClientIdentityKey context value must be set"))
^
platformtest/platformtest_ops.go:259:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
^
platformtest/platformtest_ops.go:266:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
^
util/optionaldeadline/optionaldeadline_test.go:97:50: SA1029: should not use built-in type string as key for value; define your own type to avoid collisions (staticcheck)
pctx := context.WithValue(context.Background(), "key", "value")
^
rpc/rpc_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/dataconn_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/frameconn/frameconn.go:42:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
panic(fmt.Sprintf("frame header is 8 bytes long"))
^
platformtest/platformtest_ops.go:322:40: S1039: unnecessary use of fmt.Sprintf (gosimple)
return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
- `timeoutconn` handles state, yet calls to Read/Write make a copy of
that state (non-pointer receiver) so any outbound calls will not have
the state updated
- Even without the copy issue, the renew methods can in edge cases set a
new deadline _after_ DisableTimeouts have been called, consider the
following racy behavior:
1. `renewReadDeadline` is called, checks `renewDeadlinesDisabled`
(not disabled)
2. `DisableTimeouts` is called, sets `renewDeadlinesDisabled`
3. `DisableTimeouts` invokes `c.SetDeadline`
4. `renewReadDeadline` invokes `c.SetReadDeadline`
To fix the above, the `Conn` receiver was made to be a pointer
everywhere and access to renewDeadlinesDisabled is now guarded
by an RWMutex instead of using atomics.
closes#415
* The SendStream.Close() was not called by dataconn.Server, which left the zfs send process dangling.
* When the source job's ctx interceptor closed the task, the dangling zfs send was detect by the trace package and panicked.
020-07-25T19:54:41-04:00 [ERRO][latitude][rpc.data][cyZj$J3Ca$J3Ca.CJwB]: cannot write send stream err="frameconn: shutting down"
panic: end task: 1 active child tasks: end task: task still has active child tasks
goroutine 196966 [running]:
github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1.1(0xc000320680, 0xcde000)
/home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:221 +0x2f7
github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1()
/home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:237 +0x38
github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
/home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:41 +0x37
github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1(0xdcf780, 0xc0000a3560, 0xdc65a0, 0xc00035e620, 0xc0000a34d0)
/home/jeremy/go/src/github.com/zrepl/zrepl/daemon/job/passive.go:194 +0x2e7
github.com/zrepl/zrepl/rpc.NewServer.func3(0xdcf780, 0xc0001ce4b0, 0xdc65e0, 0xc00035e600, 0xc0000a34d0)
/home/jeremy/go/src/github.com/zrepl/zrepl/rpc/rpc_server.go:82 +0xd5
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn(0xc0000a2ba0, 0xc00018eca0)
/home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:149 +0x3be
github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func3(0xc0000b8180, 0xc0000a2ba0, 0xc00018eca0)
/home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:108 +0x5d
created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve
/home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:106 +0x24a
2020-07-25T19:58:55-04:00 [ERRO][latitude][rpc.data][Pt4F$gCWT$gCWT.fzhc]: cannot write send stream err="frameconn: shutting down"
panic: end task: 1 active child tasks: end task: task still has active child tasks
fixes#348
package trace:
- introduce the concept of tasks and spans, tracked as linked list within ctx
- see package-level docs for an overview of the concepts
- **main feature 1**: unique stack of task and span IDs
- makes it easy to follow a series of log entries in concurrent code
- **main feature 2**: ability to produce a chrome://tracing-compatible trace file
- either via an env variable or a `zrepl pprof` subcommand
- this is not a CPU profile, we already have go pprof for that
- but it is very useful to visually inspect where the
replication / snapshotter / pruner spends its time
( fixes#307 )
usage in package daemon/logging:
- goal: every log entry should have a trace field with the ID stack from package trace
- make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function
- the context carries a linked list of injected fields which
`logging.GetLogger` adds to the logger it returns
- `logging.GetLogger` also uses package `trace` to get the
task-and-span-stack and injects it into the returned logger's fields
- prior to this patch, context cancellation would leave rpc.Server open
- did not make problems because context was only cancelled by SIGINT,
which was immediately followed by os.Exit
* Add Close() in closeState to identify the first closer
* Non-first closers get an error
* Reads and Writes from the Conn get an error if the conn was closed
during the Read / Write was running
* The first closer starts _separate_ goroutine draining the c.frameReads channel
* The first closer then waits for the goroutine that fills c.frameReads
to exit
refs 3bfe0c16d0fixes#174
readFrames would block on `reads <-`
but only after that would stream.Conn.readFrames close c.waitReadFramesDone
which was too late because stream.Conn.Close would wait for c.waitReadFramesDone to be closed before draining the channel
^^^^^^ (not frameconn.Conn, that closed successfully)
195 @ 0x1032ae0 0x1006cab 0x1006c81 0x1006a65 0x15505be 0x155163e 0x1060bc1
0x15505bd github.com/zrepl/zrepl/rpc/dataconn/stream.readFrames+0x16d github.com/zrepl/zrepl/rpc/dataconn/stream/stream.go:220
0x155163d github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames+0xbd github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:71
195 @ 0x1032ae0 0x10078c8 0x100789e 0x100758b 0x1552678 0x1557a4b 0x1556aec 0x1060bc1
0x1552677 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).Close+0x77 github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:191
0x1557a4a github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn.func1+0x5a github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:93
0x1556aeb github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn+0x87b github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:176