Profiling In Go
It is possible to write slow programs despite increasingly powerful computers and profiling helps you identify slow sections of your program.
Assuming we have two functions that calculate the square of their index and adds the result to a slice, how do we know which is faster ?
func CalculateSquares(n int) []int {
squares := make([]int, n)
for i := 1; i <= n; i++ {
squares[i-1] = i * i
}
return squares
}
func CalculateSquaresMath(n int) []int {
squares := make([]int, n)
for i := 1; i <= n; i++ {
squares[i-1] = int(math.Pow(float64(i), 2))
}
return squares
}
We can use benchmarking to know which of the two is faster.
func BenchmarkCalculateSquaresMain(b *testing.B) {
for i := 0; i < b.N; i++ {
CalculateSquares(10)
}
}
goos: darwin
goarch: arm64
pkg: demo1
BenchmarkCalculateSquaresMain-10 55957647 21.29 ns/op 80 B/op 1 allocs/op
PASS
ok demo1 2.183s
func BenchmarkCalculateSquaresMath(b *testing.B) {
for i := 0; i < b.N; i++ {
CalculateSquaresMath(10)
}
}
goos: darwin
goarch: arm64
pkg: demo1
BenchmarkCalculateSquaresMath-10 11432787 105.4 ns/op 80 B/op 1 allocs/op
PASS
ok demo1 2.691s
From the benchmarking results we can see that CalculateSquaresMath
is slower than CalculateSquaresMain
but we don’t know why. We’ll have to use pprof to get better insight.
First we need to create cpu and memory profiles using the command below
go test -run='^$' -bench='BenchmarkCalculateSquaresMath' -cpuprofile='cpu.prof' -memprofile='mem.prof'
The use pprof
’s top method to get a high level view of where the function is spending most of its time.
go tool pprof cpu.prof
(pprof) top
Showing nodes accounting for 1110ms, 81.02% of 1370ms total
Showing top 10 nodes out of 100
flat flat% sum% cum cum%
270ms 19.71% 19.71% 270ms 19.71% math.archModf
170ms 12.41% 32.12% 740ms 54.01% math.pow
170ms 12.41% 44.53% 170ms 12.41% runtime.kevent
160ms 11.68% 56.20% 180ms 13.14% math.Frexp (inline)
100ms 7.30% 63.50% 100ms 7.30% runtime.pthread_cond_wait
80ms 5.84% 69.34% 80ms 5.84% math.IsNaN (inline)
50ms 3.65% 72.99% 130ms 9.49% runtime.mallocgc
40ms 2.92% 75.91% 40ms 2.92% runtime.pthread_cond_timedwait_relative_np
40ms 2.92% 78.83% 40ms 2.92% runtime.usleep
30ms 2.19% 81.02% 900ms 65.69% demo1.CalculateSquaresMath
From the results we see that most of the time is spent in the math.pow
function. Armed with this knowledge we can go ahead and optimise this part.