Practical Go benchmarks (stackimpact.com)
Therefore, 500,000 rounds of concat come to about 500,000 allocations, while 200,000,000 rounds of builder come to only ~27.5 allocations (= log2(200,000,000)).
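A rough way to check the builder side of that estimate is to count how often the builder's capacity changes. This is only a sketch: countGrowths is a made-up helper, and the growth factor is an implementation detail of append, so the count need not equal log2(n) exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// countGrowths (hypothetical helper) writes n single-byte strings into a
// strings.Builder and reports how many times its capacity changed.
func countGrowths(n int) int {
	var b strings.Builder
	growths := 0
	lastCap := b.Cap()
	for i := 0; i < n; i++ {
		b.WriteString("x")
		if c := b.Cap(); c != lastCap {
			growths++
			lastCap = c
		}
	}
	return growths
}

func main() {
	for _, n := range []int{100, 10000, 1000000} {
		fmt.Printf("writes=%d capacity growths=%d\n", n, countGrowths(n))
	}
}
```

The growth count should rise far more slowly than the number of writes, which is the point the comment is making.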
I would suggest a different benchmark to approximate real world usage:
func BenchmarkConcatString(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var str string
		str += "x"
		str += "y"
		str += "z"
	}
}

func BenchmarkConcatBuilder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var builder strings.Builder
		builder.WriteString("x")
		builder.WriteString("y")
		builder.WriteString("z")
		builder.String()
	}
}
Which still shows a significant performance advantage for using builder (-40% ns/op):
BenchmarkConcatString-4     20000000    93.5 ns/op
BenchmarkConcatBuilder-4    30000000    54.6 ns/op

_ = builder.String()
The compiler should not optimize that out.

package a
import "testing"
import "strings"
var strA, strB string
var x, y, z = "x", "y", "z"
func BenchmarkConcatString(b *testing.B) {
	for n := 0; n < b.N; n++ {
		strA = x + y + z
	}
}

func BenchmarkConcatBuilder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var builder strings.Builder
		builder.WriteString(x)
		builder.WriteString(y)
		builder.WriteString(z)
		strB = builder.String()
	}
}
Result:
goos: linux
goarch: amd64
BenchmarkConcatString-2     20000000    83.7 ns/op
BenchmarkConcatBuilder-2    20000000    102 ns/op

Perhaps the use of += as separate statements is the difference, but one would hope that gc wasn't so fragile as to be unable to identify those sequences as identical.
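One way to test that guess without timing anything is to compare allocation counts for the two spellings with testing.AllocsPerRun. This is a sketch: the function names and sink are made up for illustration, and the counts can differ between compiler versions.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level variables keep the compiler from folding the
// concatenation into a constant.
var x, y, z = "x", "y", "z"

// sink receives every result so the calls are not discarded.
var sink string

// concatOneExpr joins the three strings in a single expression.
func concatOneExpr() string {
	return x + y + z
}

// concatStatements builds the same string with three separate += statements.
func concatStatements() string {
	var s string
	s += x
	s += y
	s += z
	return s
}

func main() {
	one := testing.AllocsPerRun(1000, func() { sink = concatOneExpr() })
	sep := testing.AllocsPerRun(1000, func() { sink = concatStatements() })
	fmt.Printf("x + y + z:      %.0f allocs/run\n", one)
	fmt.Printf("three += stmts: %.0f allocs/run\n", sep)
}
```

If the two counts differ on your toolchain, the separate statements are not being merged into one concatenation.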
---
I don't quite understand why string literals wouldn't be even easier to optimize but there it is.
edit: doesn't sound like it does by default
The way he uses b.N is wrong. b.N is different for different benchmark functions, so he's e.g. comparing 100 iterations of string '+' against 1000 iterations of builder.WriteString().
Also, the compiler can completely eliminate calls to functions without side effects, so in benchmarks it's a good idea to assign the value being calculated to e.g. a global variable.
The corrected code is: https://gist.github.com/kjk/6a7d7135ae1e5fa6cd1f0db23d2eaf4d
An example of correctly benchmarking:
func BenchmarkConcatString(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var str string
		for i := 0; i < 100; i++ {
			str += "x"
		}
		gStr = str
	}
}
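The same pattern can be tried outside go test with testing.Benchmark. A minimal sketch, assuming a fixed 100 writes per iteration as in the snippet above; sink, concatPlus and concatBuilder are illustrative names, not taken from the gist.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a package-level variable; storing each result here keeps the
// compiler from discarding the work inside the loops.
var sink string

// concatPlus builds a string of `rounds` bytes with +=.
func concatPlus(rounds int) string {
	var s string
	for i := 0; i < rounds; i++ {
		s += "x"
	}
	return s
}

// concatBuilder builds the same string with strings.Builder.
func concatBuilder(rounds int) string {
	var b strings.Builder
	for i := 0; i < rounds; i++ {
		b.WriteString("x")
	}
	return b.String()
}

func main() {
	const rounds = 100 // fixed work per iteration, independent of b.N

	plus := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = concatPlus(rounds)
		}
	})
	builder := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = concatBuilder(rounds)
		}
	})

	fmt.Printf("plus:    %d ns/op, %d B/op, %d allocs/op\n",
		plus.NsPerOp(), plus.AllocedBytesPerOp(), plus.AllocsPerOp())
	fmt.Printf("builder: %d ns/op, %d B/op, %d allocs/op\n",
		builder.NsPerOp(), builder.AllocedBytesPerOp(), builder.AllocsPerOp())
}
```

Because the inner loop length is a constant, b.N only controls how many times the same unit of work is repeated, so the ns/op figures of the two functions are comparable.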
After the fixes it paints a significantly different picture:
go test -bench=. -benchmem
goos: darwin
goarch: amd64
BenchmarkConcatString-8      300000    5148 ns/op    5728 B/op    99 allocs/op
BenchmarkConcatBuffer-8     1000000    1046 ns/op     368 B/op     3 allocs/op
BenchmarkConcatBuilder-8    1000000    1177 ns/op     248 B/op     5 allocs/op

You are also allocating a new string/buffer/builder for every run - which is not useful if you want to just benchmark concat.
func generateSlice(n int) []int {
	s := make([]int, n)
	for i := 0; i < n; i++ {
		s = append(s, rand.Intn(1e9))
	}
	return s
}
As it is now, the function creates a slice with n zeros followed by n random numbers. I suppose you meant to say make([]int, 0, n). You could just as well assign directly to each slice element instead of using append, which would be more efficient.

I made the exact same mistake quite a few times myself.
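For reference, both suggested fixes sketched out. The function names are mine, not the article's.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateSliceAppend reserves capacity for n elements but starts at
// length 0, so append fills positions 0..n-1 instead of adding after n zeros.
func generateSliceAppend(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, rand.Intn(1e9))
	}
	return s
}

// generateSliceIndex allocates n elements up front and assigns each one
// directly, without going through append.
func generateSliceIndex(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = rand.Intn(1e9)
	}
	return s
}

func main() {
	fmt.Println(len(generateSliceAppend(10)), len(generateSliceIndex(10)))
}
```

Either version returns exactly n elements, where the original returns 2n.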
Nice collection of microbenchmarks though. Interesting to see magnitude differences from e.g. regexp compile
BenchmarkCryptoRand27-8 5000000 388 ns/op
BenchmarkCryptoRand28-8 3000000 356 ns/op
BenchmarkCryptoRand29-8 5000000 335 ns/op
BenchmarkCryptoRand30-8 5000000 327 ns/op
BenchmarkCryptoRand31-8 5000000 331 ns/op
BenchmarkCryptoRand32-8 5000000 322 ns/op
BenchmarkCryptoRand33-8 3000000 480 ns/op
BenchmarkCryptoRand34-8 3000000 474 ns/op
for benchmarks like:
func BenchmarkCryptoRand32(b *testing.B) {
	for n := 0; n < b.N; n++ {
		_, err := crand.Int(crand.Reader, big.NewInt(32))
		if err != nil {
			panic(err)
		}
	}
}
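The step between 32 and 33 is what rejection sampling would produce. A minimal sketch, assuming the sampler draws the smallest number of bits that covers [0, max) and retries whenever the draw falls outside the range. expectedDraws and uniformBelow are hypothetical helpers for illustration, not the crypto/rand source.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	"math/bits"
)

// expectedDraws returns the average number of k-bit draws needed before one
// lands in [0, max), where k is the bit length of max-1. max must be >= 1.
func expectedDraws(max uint) float64 {
	k := bits.Len(max - 1)
	return float64(uint(1)<<k) / float64(max)
}

// uniformBelow returns an unbiased value in [0, max) for 1 <= max <= 256 by
// masking one random byte down to k bits and retrying on out-of-range draws.
func uniformBelow(max uint) (uint, error) {
	k := bits.Len(max - 1)
	mask := byte((uint(1) << k) - 1)
	var buf [1]byte
	for {
		if _, err := crand.Read(buf[:]); err != nil {
			return 0, err
		}
		if v := uint(buf[0] & mask); v < max {
			return v, nil
		}
	}
}

func main() {
	for _, max := range []uint{27, 31, 32, 33, 34} {
		fmt.Printf("max=%d expected draws=%.2f\n", max, expectedDraws(max))
	}
	v, err := uniformBelow(33)
	if err != nil {
		panic(err)
	}
	fmt.Println("sample below 33:", v)
}
```

Under that assumption a power-of-two bound never retries, while a bound just above a power of two rejects nearly half of its draws, which fits the direction of the numbers above.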
This is because the crypto/rand library is very very careful to give you unbiased random numbers.

func benchmarkHash(b *testing.B, h hash.Hash) {
	data := make([]byte, 1024)
	rand.Read(data)
	b.ResetTimer()
	b.SetBytes(int64(len(data))) // SetBytes takes an int64
	for n := 0; n < b.N; n++ {
		h.Write(data)
		h.Sum(nil)
	}
}

I can't say I write much code that does one thing many times in a really tight loop. It would be a lot more interesting if the code combined multiple functions into the loop body in a better attempt to simulate "real-world usage patterns."
I can assure you that someone is going to use these numbers to argue that crypto/rand needs to be replaced by math/rand BECAUSE SPEED, or that MD5 should be preferred over SHA2/3.
The nanoseconds, bytes, and allocs per operation are the important part.