Debugging
Christian Schwarz edited this page 2021-02-28 23:50:56 +01:00
Debugging Setup
- Create 2 VMs, configure a file-backed ZFS pool on each of them
- The VMs should share a private network or a bridge network with the host
- Write yourself some scripting to build the zrepl binary on the host and scp it into the guests
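The setup steps above could be scripted roughly as follows. This is only a sketch: the helper names, the guest install path `/usr/local/bin/zrepl`, the `root` SSH login, and building with a plain `go build` from the repository checkout are assumptions, not something this page prescribes. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, paths and the build command are assumptions.
set -eu

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# create_file_pool POOL IMAGE SIZE  (run inside a guest)
# Creates a sparse file and uses it as the only vdev of a new pool.
# IMAGE must be an absolute path.
create_file_pool() {
    run truncate -s "$3" "$2"
    run zpool create "$1" "$2"
}

# deploy_zrepl BINARY GUEST...  (run on the host, inside the zrepl checkout)
# Builds the binary once and copies it to every guest.
deploy_zrepl() {
    bin="$1"
    shift
    run go build -o "$bin" .
    for guest in "$@"; do
        run scp "$bin" "root@${guest}:/usr/local/bin/zrepl"
    done
}
```

For example, `DRY_RUN=1 deploy_zrepl ./zrepl-dev 192.168.124.233` prints the build and copy commands for a single guest without running them.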
Tasks:
- Reduce network bandwidth between the VMs:
  - Use https://github.com/thombashi/tcconfig instead of Linux tc:
    tcset eth0 --rate 100Mbps --direction incoming --src-network 192.168.124.233/32 --overwrite
- Fake zfs errors
  - Create a wrapper shell script and add its directory to the zrepl daemon's PATH
  - Example script:
    root@zrepl-dev-debian-2:[~]: cat mockpath/zfs
    #!/usr/bin/env bash
    set -eu

    args=("$@")

    ZREPL_MOCK_ZFS_PATH=/usr/local/sbin/zfs

    #if echo "${args[@]}" | egrep 'list.*snapshot' > /dev/null; then
    #    echo "sleeping ${args[@]}" >> /tmp/mocklog
    #    sleep 700
    #fi

    if echo "${args[@]}" | egrep 'recv -s' 2>&1 >/dev/null; then
        echo foo
        echo bar 1>&2
        sleep 10
        dd bs=1M count=1
        exit 23
        #echo "unenced is blocked to be received by this mock" 1>&2
        #exit 23
        sleep 1
    fi

    exec "$ZREPL_MOCK_ZFS_PATH" "${args[@]}"
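One way to put the wrapper's directory in front of the daemon's PATH is sketched below. The helper name `with_mock_path` and the directory `/root/mockpath` in the usage note are assumptions made for illustration. The sketch relies on `env` resolving the command with the modified PATH.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and the mock directory are assumptions.
set -eu

# with_mock_path MOCK_DIR COMMAND [ARG...]
# Runs COMMAND with MOCK_DIR searched before the rest of PATH, so an
# executable named `zfs` in MOCK_DIR shadows the real one for COMMAND
# and for every child process it starts.
with_mock_path() {
    mock_dir="$1"
    shift
    env PATH="$mock_dir:$PATH" "$@"
}
```

For example, `with_mock_path /root/mockpath zrepl daemon` would start the daemon so that its `zfs` invocations hit the wrapper first. The wrapper itself calls the real binary by absolute path, so it does not recurse into itself.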
Memory Leaks & Goroutine Leaks
Good examples of situations where the following instructions helped with debugging:
- A goroutine and memory leak caused by not draining a channel, fixed in 3bfe0c16d0233cac66a01a6f89959c34ef01c663 ("rpc/dataconn/stream: fix goroutine leaks & transitive buffer leaks")
  - Follow-up: ffea0241622e885617d8536c9c2e60c7b2248c28
Instructions
- Run zrepl with an autostarted pprof server on port :12345 and a Prometheus endpoint on :22345:
  - Configure the Prometheus endpoint in the config.
  - Use
    zrepl pprof listen on :12345
    if the daemon is already running (for example, when we want to capture a rare deadlock that restarting the daemon would resolve),
  - or
    ZREPL_DAEMON_AUTOSTART_PPROF_SERVER=:12345 zrepl daemon
    if restarting the daemon is a good idea.
- Watch the heap metrics:
    watch 'curl localhost:22345/metrics | grep -v "#" | grep memstats'
  => With a memory leak, go_memstats_heap_inuse_bytes or go_memstats_heap_alloc_bytes should keep rising.
- Watch the goroutine profile:
    watch 'curl http://localhost:12345/debug/pprof/goroutine?debug=1'
  Example entry:
    9 @ 0x469db0 0x43d5f4 0x43d5ca 0x43d355 0xc767b0 0x498f51
    #	0xc767af	github.com/zrepl/zrepl/rpc/dataconn/stream.doWriteStream.func1+0x2ff	/mnt/zrepl/rpc/dataconn/stream/stream.go:92
  => The count (9) kept increasing over time => look at the doWriteStream implementation and fix the goroutine leak (see the commits listed above).
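Instead of eyeballing the `watch` output, the two indicators can be reduced to single numbers and logged over time. The following is a minimal sketch; the helper names `metric_value` and `goroutines_in` are hypothetical, and it assumes the usual `debug=1` layout in which each stack starts with a `<count> @ <addresses>` line followed by `#`-prefixed frame lines.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; ports follow the text above.
set -eu

# metric_value NAME
# Reads Prometheus text format on stdin and prints the value of the
# sample whose name is exactly NAME (comment lines never match).
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# goroutines_in FUNC
# Reads `goroutine?debug=1` output on stdin and prints how many
# goroutines have FUNC somewhere in their stack.
goroutines_in() {
    awk -v fn="$1" '
        # A new stack begins: remember its goroutine count.
        /^[0-9]+ @/ { count = $1; seen = 0; next }
        # Frame line mentioning FUNC: add the count once per stack.
        /^#/ && index($0, fn) && !seen { sum += count; seen = 1 }
        END { print sum + 0 }
    '
}

# Example: print one sample per minute (uncomment to use).
# while sleep 60; do
#     heap=$(curl -s localhost:22345/metrics | metric_value go_memstats_heap_inuse_bytes)
#     n=$(curl -s 'http://localhost:12345/debug/pprof/goroutine?debug=1' | goroutines_in doWriteStream)
#     echo "$(date +%T) heap_inuse=$heap doWriteStream=$n"
# done
```

A steadily growing count for one function, together with a growing heap, points at the same kind of leak as in the `doWriteStream` case above.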