Debugging golang programs

I’ve been writing a lot of golang lately, and I’ve hit some painful problems along the way. Here are a few debugging tips that I hope will help you out. I bet you don’t know #2.

#0 Use log.Printf:

This should go without saying, but I’m ashamed to admit it’s what I use the most. We’ve only been programming in C for 44+ years, and printf-style debugging is still the most useful technique!
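
In case it needs illustrating, here’s the kind of thing I mean (the names and values are made up); log.Printf conveniently prepends a timestamp, which is handy for spotting where a program stalls:

package main

import "log"

func main() {
    count := 42 // hypothetical value we want to inspect
    log.Printf("Main: Start")
    log.Printf("Processing: count = %d", count)
    log.Printf("Main: Done")
}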

#1 Use go run -race:

Since many problems are caused by hard-to-reproduce races, the built-in tooling can help you catch some of them. One of those tools is the race detector, which you enable by adding the -race flag to the go run, go build or go test commands. In practice, it only ever caught a few of my early beginner mistakes, and it hasn’t set off any alarms since. Maybe I need to write more test cases! Maybe you need to write more test cases!
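
If you want to see the detector fire, here’s a tiny, deliberately racy sketch (made up for illustration); save it to a file and run it with go run -race, and the detector will report the conflicting writes:

package main

import "fmt"

func main() {
    counter := 0
    done := make(chan bool)
    go func() {
        counter++ // one unsynchronized write...
        done <- true
    }()
    counter++ // ...racing with this one in main
    <-done
    fmt.Println(counter) // -race flags the two writes as a data race
}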

#2 Use Control+Backslash:

Say what? If you press Control+Backslash in the terminal where a program is running, it sends SIGQUIT and causes a core dump. Example:

james@computer:/tmp$ sleep 42h
^\Quit (core dumped)

Obviously there are nicer ways to kill a process (I, for one, welcome our robotic overlords) but in times of emergency, use what you’ve got. The interesting thing is that when you do this to a golang program, you’ll get much more interesting output:

james@computer:~/code/mgmt$ ./mgmt run --file examples/graph0.yaml 
16:19:26 main.go:65: This is: mgmt, version: 0.0.2-6-g6e68d6d
16:19:26 main.go:66: Main: Start: 1455571166809588335
16:19:26 main.go:196: Main: Running...
16:19:26 main.go:106: Etcd: Starting...
16:19:26 etcd.go:132: Etcd: Watching...
16:19:26 configwatch.go:54: Watching: examples/graph0.yaml
16:19:26 etcd.go:159: Etcd: Waiting 1000 ms for connection...
16:19:26 main.go:149: Graph: Vertices(2), Edges(1)
16:19:26 main.go:152: Graphviz: No filename given!
16:19:26 main.go:163: State: graphNil -> graphStarting
16:19:26 main.go:165: State: graphStarting -> graphStarted
16:19:26 file.go:340: File[file1]: Apply
^\SIGQUIT: quit
PC=0x482a53 m=2

goroutine 0 [idle]:
runtime.futex(0xecae00, 0x0, 0x7f6f5dd8fde8, 0x0, 0x0, 0x4828ac, 0x3c, 0x0, 0x43392b, 0xecae00, ...)
    /usr/lib/golang/src/runtime/sys_linux_amd64.s:289 +0x23
runtime.futexsleep(0xecae00, 0x0, 0xdf8475800)
    /usr/lib/golang/src/runtime/os1_linux.go:56 +0xf0
runtime.notetsleep_internal(0xecae00, 0xdf8475800, 0xc820000900)
    /usr/lib/golang/src/runtime/lock_futex.go:171 +0x12b
runtime.notetsleep(0xecae00, 0xdf8475800, 0x0)
    /usr/lib/golang/src/runtime/lock_futex.go:191 +0x6b
runtime.sysmon()
    /usr/lib/golang/src/runtime/proc1.go:3022 +0x4aa
runtime.mstart1()
    /usr/lib/golang/src/runtime/proc1.go:715 +0xe8
runtime.mstart()
    /usr/lib/golang/src/runtime/proc1.go:685 +0x72

goroutine 1 [select]:
main.waitForSignal(0xc820076300)
    /home/james/code/mgmt/main.go:48 +0x5da
main.run(0xc82011c0f0)
    /home/james/code/mgmt/main.go:198 +0x779
github.com/codegangsta/cli.Command.Run(0xb4e9d8, 0x3, 0x0, 0x0, 0xc8200799d0, 0x1, 0x1, 0xb4e9d8, 0x3, 0x0, ...)
    /home/james/code/src/gopath/src/github.com/codegangsta/cli/command.go:127 +0x1052
github.com/codegangsta/cli.(*App).Run(0xc8200a8500, 0xc820074100, 0x4, 0x4, 0x0, 0x0)
    /home/james/code/src/gopath/src/github.com/codegangsta/cli/app.go:159 +0xc2f
main.main()
    /home/james/code/mgmt/main.go:283 +0xced

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1721 +0x1

goroutine 21 [syscall]:
os/signal.loop()
    /usr/lib/golang/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
    /usr/lib/golang/src/os/signal/signal_unix.go:28 +0x37

goroutine 22 [select]:
main.run.func2(0xc82011c0f0, 0xc820122040, 0xc82011a5d0, 0xc82011e130, 0x5, 0xc820076360, 0xc82011e010)
    /home/james/code/mgmt/main.go:110 +0x1061
created by main.run
    /home/james/code/mgmt/main.go:168 +0x60c

goroutine 23 [select, locked to thread]:
runtime.gopark(0xc8a258, 0xc82002e728, 0xb4ec08, 0x6, 0x18, 0x2)
    /usr/lib/golang/src/runtime/proc.go:185 +0x163
runtime.selectgoImpl(0xc82002e728, 0x0, 0x18)
    /usr/lib/golang/src/runtime/select.go:392 +0xa64
runtime.selectgo(0xc82002e728)
    /usr/lib/golang/src/runtime/select.go:212 +0x12
runtime.ensureSigM.func1()
    /usr/lib/golang/src/runtime/signal1_unix.go:227 +0x353
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1721 +0x1

goroutine 7 [select]:
main.(*FileType).Watch(0xc820174180)
    /home/james/code/mgmt/file.go:156 +0x15f4
main.(*Graph).Start.func1(0xc82011e010, 0xc820010b80)
    /home/james/code/mgmt/pgraph.go:558 +0x7e
created by main.(*Graph).Start
    /home/james/code/mgmt/pgraph.go:560 +0x171

goroutine 25 [select]:
main.ConfigWatch.func1(0xc82009db40, 0x14, 0xc820076660)
    /home/james/code/mgmt/configwatch.go:74 +0x13c8
created by main.ConfigWatch
    /home/james/code/mgmt/configwatch.go:153 +0x67

goroutine 26 [sleep]:
time.Sleep(0x3b9aca00)
    /usr/lib/golang/src/runtime/time.go:59 +0xf9
main.(*EtcdWObject).EtcdWatch.func1(0x7f6f5c512c20, 0xc82009db80, 0xc820122040, 0xffffffffffffffff, 0xc820076360, 0xc820072af0)
    /home/james/code/mgmt/etcd.go:160 +0x7db
created by main.(*EtcdWObject).EtcdWatch
    /home/james/code/mgmt/etcd.go:205 +0xc2

goroutine 27 [chan send]:
main.(*EtcdWObject).EtcdChannelWatch.func1(0x7f6f5c512c78, 0xc820122140, 0x7f6f5c512ca0, 0xc8200786d0, 0xc8200766c0)
    /home/james/code/mgmt/etcd.go:109 +0xd9
created by main.(*EtcdWObject).EtcdChannelWatch
    /home/james/code/mgmt/etcd.go:111 +0x7b

goroutine 34 [syscall]:
syscall.Syscall6(0xe8, 0x4, 0xc82018dc24, 0x7, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/lib/golang/src/syscall/asm_linux_amd64.s:44 +0x5
syscall.EpollWait(0x4, 0xc82018dc24, 0x7, 0x7, 0xffffffffffffffff, 0x0, 0x0, 0x0)
    /usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:365 +0x89
gopkg.in/fsnotify%2ev1.(*fdPoller).wait(0xc820146000, 0xc89a00, 0x0, 0x0)
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify_poller.go:85 +0xbc
gopkg.in/fsnotify%2ev1.(*Watcher).readEvents(0xc820158000)
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify.go:179 +0x1af
created by gopkg.in/fsnotify%2ev1.NewWatcher
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify.go:58 +0x315

goroutine 8 [select]:
main.(*NoopType).Watch(0xc82001c230)
    /home/james/code/mgmt/types.go:362 +0x31a
main.(*Graph).Start.func1(0xc82011e010, 0xc820010b60)
    /home/james/code/mgmt/pgraph.go:558 +0x7e
created by main.(*Graph).Start
    /home/james/code/mgmt/pgraph.go:560 +0x171

goroutine 31 [syscall]:
syscall.Syscall6(0xe8, 0x9, 0xc8201f7c24, 0x7, 0xffffffffffffffff, 0x0, 0x0, 0xc82017c440, 0xc82017c420, 0x15)
    /usr/lib/golang/src/syscall/asm_linux_amd64.s:44 +0x5
syscall.EpollWait(0x9, 0xc8201f7c24, 0x7, 0x7, 0xffffffffffffffff, 0x1, 0x0, 0x0)
    /usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:365 +0x89
gopkg.in/fsnotify%2ev1.(*fdPoller).wait(0xc82009dcc0, 0x8000, 0x0, 0x0)
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify_poller.go:85 +0xbc
gopkg.in/fsnotify%2ev1.(*Watcher).readEvents(0xc820090cd0)
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify.go:179 +0x1af
created by gopkg.in/fsnotify%2ev1.NewWatcher
    /home/james/code/src/gopath/src/gopkg.in/fsnotify.v1/inotify.go:58 +0x315

rax    0xfffffffffffffffc
rbx    0x7f6f5dd8fde8
rcx    0x482a53
rdx    0x0
rdi    0xecae00
rsi    0x0
rbp    0x0
rsp    0x7f6f5dd8fdb0
r8     0x0
r9     0x0
r10    0x7f6f5dd8fde8
r11    0x246
r12    0x7ffec7010a4f
r13    0x7f6f5dd90700
r14    0x800000
r15    0x0
rip    0x482a53
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

I find this particularly useful when you have a “stuck” goroutine. Killing the program this way usually makes it easy to see where every goroutine was waiting. AFAIK, the most recently active goroutines are listed towards the top. You’re welcome!
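
If you want to try this on something smaller than mgmt, here’s a toy sketch (made up for this post) with a goroutine deliberately blocked on a channel; run it, press Control+Backslash, and you’ll see that goroutine parked in “chan receive” in the dump:

package main

import (
    "fmt"
    "time"
)

func main() {
    stuck := make(chan struct{}) // nothing ever sends on this channel
    go func() {
        <-stuck // blocks forever; shows up as "chan receive" in the dump
        fmt.Println("never reached")
    }()
    time.Sleep(time.Hour) // keep the program alive; press ^\ to see the goroutine dump
}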

#3 Use a real debugger:

There is a GDB-like golang debugger called “delve”. I would probably use it more often, except I haven’t needed that much power, and it still has a number of rough edges; I’m sure the project would welcome help smoothing those out. There was a talk about it at FOSDEM. If we’re lucky, we’ll eventually get video. I’m “looking forward” to using it more in the future.

Hope this helped,

Happy Hacking!

James

PS: The “other” slash isn’t called “forward slash”; it’s just called “slash”.

PPS: Yes, I say “golang”, not “go”. I think it’s shitty that Google basically ripped off the Go! name. I guess David lost this one.

Golang parallelism issues causing “too many open files” error

I’ve been hacking in golang for a while, but I’ll admit that I didn’t get too deep into some of the language nuances until more recently. Since some of them have started to bite me, here’s a little post-mortem of one of the problems I was having.

After hacking and testing code all day, I made a seemingly innocuous change, and when running my program, I saw the following error:

2015/07/10 14:34:12 too many open files

I didn’t know what I broke, but it was obviously my fault. I reverted my recent changes, but still the error persisted. Internet searches and many painful hours of debugging ensued.

I had definitely hit some sort of Heisenbug.

What had gone wrong? Digging around my system, I noticed something weird in my ps output:

$ ps auxww | grep go
james     3446  0.0  0.1 197392  9240 pts/4    Sl   11:48   0:00 go run ./main.go
james     3457  0.0  0.0   6268  1980 pts/4    Sl   11:48   0:00 /tmp/go-build030949627/command-line-arguments/_obj/exe/event
james     3487  0.0  0.1 197392  9184 pts/4    Sl   11:49   0:00 go run ./main.go
james     3501  0.0  0.0   6268  2040 pts/4    Sl   11:49   0:00 /tmp/go-build037131602/command-line-arguments/_obj/exe/event
james     3556  0.0  0.1 197392  9168 pts/4    Sl   11:49   0:00 go run ./main.go
james     3567  0.0  0.0   6268  1976 pts/4    Sl   11:49   0:00 /tmp/go-build957487534/command-line-arguments/_obj/exe/event
james     3788  0.0  0.0 197392  1636 pts/4    Sl   Jul04   0:07 go run ./main.go
james     3800  0.0  0.0   5180  1944 pts/4    Sl   Jul04   0:01 /tmp/go-build552106841/command-line-arguments/_obj/exe/event
[...]

Hordes and hordes of lingering go build artefacts were still running. At one point I noticed over 42 of these! I quickly killed them all off:

# processes are named `event`, and I don't have any unrelated event processes running.
$ killall -9 event
kernel: ahh, much better! :)

This brought my program back to life! Heisenbug gone… or was it? I soon noticed that each time I ran my program, the leftover process count would increment by one. What was causing this? After another session of debugging, I found that these leftovers were caused by a lack of cleanup in some buggy code. That buggy code is the interesting part. Let’s look at it:

for v := range obj.GetSomeChannel() {
    fmt.Printf("Processing: %v\n", v.Name)
    wg.Add(1)
    // BAD
    go func() {
        defer wg.Done()
        v.Start() // some method
        fmt.Printf("Finished: %v\n", v.Name)
    }()
}

I’m not sure how common this issue is, so if you’re not yet familiar with it, take a moment to try and figure it out.

Okay. The issue is that the closure the goroutine runs doesn’t get its own copy of v; it captures the loop variable v itself, a single variable that is reused on every iteration. As a result, whenever the loop advances, v changes out from under the goroutine, and each goroutine sees whatever value v happens to hold at the moment it actually uses it.
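
Here’s a stripped-down sketch of the same problem (not the original code). Note that this applies to Go releases before 1.22, which changed the loop variable scoping; on those older versions the goroutines can easily all end up printing the final value:

package main

import (
    "fmt"
    "sync"
)

func main() {
    ch := make(chan int, 3)
    ch <- 1
    ch <- 2
    ch <- 3
    close(ch)

    var wg sync.WaitGroup
    for v := range ch {
        wg.Add(1)
        go func() { // BAD: captures the loop variable v instead of copying it
            defer wg.Done()
            fmt.Println(v) // before Go 1.22, this may print the same (last) value every time
        }()
    }
    wg.Wait()
}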

To get around this race (and it is a race) you need to copy the value into the goroutine:

for v := range obj.GetSomeChannel() {
    fmt.Printf("Processing: %v\n", v.Name)
    wg.Add(1)
    // GOOD
    go func(v *Objtype) {
        defer wg.Done()
        v.Start() // some method
        fmt.Printf("Finished: %v\n", v.Name)
    }(v)
}

It so happens that v is a pointer, but that’s irrelevant. The pointer value still needs to be copied into the goroutine that uses it. In my case, v needs to be a pointer because we want the same copy of our data to be shared throughout the code.

Many thanks to bleidl for helping me with some of the analysis!

As a quick aside, I’m using this WaitGroup pattern, which replaced the much uglier version of this loop that I had previously written. For a language that claims not to be pattern and idiom heavy, there sure are a bunch that I’ve found so far, many of which come with gotchas.
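
For reference, here’s roughly what that pattern looks like in self-contained form (the helper and item names are made up); the important bits are calling wg.Add(1) before spawning each goroutine, deferring wg.Done() inside it, and blocking on wg.Wait() once the loop is finished:

package main

import (
    "fmt"
    "sync"
)

// process stands in for whatever work each item needs (hypothetical).
func process(name string) {
    fmt.Printf("Finished: %v\n", name)
}

func main() {
    items := []string{"file1", "file2", "noop1"} // placeholder work items
    var wg sync.WaitGroup
    for _, name := range items {
        wg.Add(1) // register each worker before it starts
        go func(name string) {
            defer wg.Done() // always signal completion, even on an early return
            process(name)
        }(name) // pass the value in explicitly to avoid the capture race
    }
    wg.Wait() // block until every worker has called Done
    fmt.Println("All done")
}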

Happy hacking!

James