Practical Go benchmarks

Practical Go benchmarks (stackimpact.com)

148 points by minaandrawos 8 years ago | 23 comments

aodin 8 years ago |

The majority of the performance difference between string concat and builder in your example is explained by memory allocation. Every loop of concat results in a new allocation, while the builder (which uses a []byte internally) only allocates when length equals capacity, and the newly allocated slice is approx. twice the capacity of the old slice (see: https://golang.org/src/strings/builder.go?#L62).

Therefore, 500,000 rounds of concat is about 500,000 allocations, while 200,000,000 rounds of builder is ~ 27.5 allocations (=log2(200000000)).
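To make the allocation argument concrete, here is a minimal sketch that counts how often append has to move a byte slice to a larger backing array. The helper name countGrowths is made up for illustration, and it uses a plain []byte rather than strings.Builder itself. The exact growth factor is a runtime implementation detail, so the count may not match log2(n) exactly, but it should stay far below the number of appends.

```go
package main

import "fmt"

// countGrowths (hypothetical helper) appends n single bytes to an initially
// empty slice and reports how many times the capacity changed, i.e. how
// many times append had to allocate a larger backing array.
func countGrowths(n int) int {
	var buf []byte
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(buf)
		buf = append(buf, 'x')
		if cap(buf) != before {
			growths++
		}
	}
	return growths
}

func main() {
	for _, n := range []int{1000, 1000000} {
		fmt.Printf("%d appends -> %d reallocations\n", n, countGrowths(n))
	}
}
```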

I would suggest a different benchmark to approximate real world usage:

    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var str string
            str += "x"
            str += "y"
            str += "z"
        }
    }

    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString("x")
            builder.WriteString("y")
            builder.WriteString("z")
            builder.String()
        }
    }

Which still shows a significant performance advantage for using builder (-40% ns/op):

    BenchmarkConcatString-4     20000000            93.5 ns/op
    BenchmarkConcatBuilder-4    30000000            54.6 ns/op

marcus_holmes 8 years ago | |

Won't the compiler just ignore the "builder.String()" line unless the return value is actually used?

zaarn 8 years ago | | |

Easy fix: assign the result to the blank identifier:

    _ = builder.String()

The compiler should not optimize that out.

tapirl 8 years ago |

I would mention that gc (the official Go compiler) applies a special optimization to the string concatenation operator (+). If the number of strings to be concatenated is known at compile time, using + to concatenate them is the most efficient option.

    package a
    
    import "testing"
    import "strings"
    
    var strA, strB string
    var x, y, z = "x", "y", "z"
    
    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            strA = x + y + z
        }
    }
    
    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString(x)
            builder.WriteString(y)
            builder.WriteString(z)
            strB = builder.String()
        }
    }

Result:

    goos: linux
    goarch: amd64
    BenchmarkConcatString-2    	20000000	        83.7 ns/op
    BenchmarkConcatBuilder-2   	20000000	       102 ns/op

BeeOnRope 8 years ago | |

Note that this is directly contradicted by another comment[1] on this post, where three fixed strings are concatenated with +=, yet that was still slower.

Perhaps the use of += as separate statements is the difference, but one would hope that gc wasn't so fragile as to be unable to identify those sequences as identical.

---

[1] https://news.ycombinator.com/item?id=16533650

tapirl 8 years ago | | |

The optimization made by gc is only valid for the form: s0 + s1 + .... + sn.

lugg 8 years ago | | |

I understood that to mean the optimization relies on the operands being concatenated references (variables) rather than string literals.

I don't quite understand why string literals wouldn't be even easier to optimize but there it is.

foota 8 years ago | |

Does Go intern strings? That could mess things up with this bench.

edit: doesn't sound like it does by default

kjksf 8 years ago |

String benchmarks are so broken.

The way he uses b.N is wrong. b.N is different for different loops, so he's e.g. comparing 100 iterations of string '+' against 1000 iterations of builder.WriteString().

Also the compiler can completely null out no-op functions (without side effects) so in benchmarks it's a good idea to assign the value being calculated into e.g. a global variable.

The corrected code is: https://gist.github.com/kjk/6a7d7135ae1e5fa6cd1f0db23d2eaf4d

An example of correctly benchmarking:

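    // gStr is a package-level sink; storing the result there keeps the
    // compiler from discarding the concatenation.
    var gStr string
    
    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var str string
            for i := 0; i < 100; i++ {
                str += "x"
            }
            gStr = str
        }
    }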

After the fixes it paints a significantly different picture:

    go test -bench=. -benchmem
    goos: darwin
    goarch: amd64
    BenchmarkConcatString-8    	  300000	      5148 ns/op	    5728 B/op	      99 allocs/op
    BenchmarkConcatBuffer-8    	 1000000	      1046 ns/op	     368 B/op	       3 allocs/op
    BenchmarkConcatBuilder-8   	 1000000	      1177 ns/op	     248 B/op	       5 allocs/op

i0exception 8 years ago | |

His use of b.N is correct. The code you have is simply multiplying N by 100 with the inner for loop - so your times are 100x of what each "concat" operation (+,WriteString) takes.

You are also allocating a new string/buffer/builder for every run - which is not useful if you want to just benchmark concat.

dmitrim 8 years ago | |

Thanks for pointing it out. Should clearly not depend on the number of iterations. It's fixed now.

fauigerzigerk 8 years ago | | |

I think there's another bug in the generateSlice function if the intention is to create a slice with n random numbers.

    func generateSlice(n int) []int {
        s := make([]int, n)
        for i := 0; i < n; i++ {
            s = append(s, rand.Intn(1e9))
        }
        return s
    }

As it is now, the function creates a slice with n zeros followed by n random numbers. I suppose you meant to say make([]int, 0, n). You could just as well assign directly to each slice element instead of using append, which would be more efficient.

I made the exact same mistake quite a few times myself.
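For reference, a minimal sketch of the two fixes described above. The function names generateSliceAppend and generateSliceIndex are made up here to tell the variants apart; the original function is simply called generateSlice.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateSliceAppend reserves capacity for n elements but starts with
// length 0, so append fills the slice from index 0.
func generateSliceAppend(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, rand.Intn(1e9))
	}
	return s
}

// generateSliceIndex allocates n zeroed elements and overwrites each one
// directly, which avoids the append bookkeeping.
func generateSliceIndex(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = rand.Intn(1e9)
	}
	return s
}

func main() {
	// Both variants return exactly n elements.
	fmt.Println(len(generateSliceAppend(5)), len(generateSliceIndex(5)))
}
```

Either version returns exactly n random numbers, whereas the make([]int, n) plus append combination returns 2n elements with the first n set to zero.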

bpicolo 8 years ago |

While I don't doubt that strings.Builder is quicker than += concat for many iterations, to make it a fair comparison you probably need to pull out the string at the end rather than just writing to the buffer. It's also not obvious, for example, what the difference is with just 2 strings to join, if I need to join two strings together 40 trillion times or whatnot.

Nice collection of microbenchmarks though. Interesting to see magnitude differences from e.g. regexp compile

Vendan 8 years ago |

Fun fact: the crypto rand "number" benchmark depends on the number you pass into it:

    BenchmarkCryptoRand27-8   	 5000000	       388 ns/op
    BenchmarkCryptoRand28-8   	 3000000	       356 ns/op
    BenchmarkCryptoRand29-8   	 5000000	       335 ns/op
    BenchmarkCryptoRand30-8   	 5000000	       327 ns/op
    BenchmarkCryptoRand31-8   	 5000000	       331 ns/op
    BenchmarkCryptoRand32-8   	 5000000	       322 ns/op
    BenchmarkCryptoRand33-8   	 3000000	       480 ns/op
    BenchmarkCryptoRand34-8   	 3000000	       474 ns/op

for benchmarks like

    func BenchmarkCryptoRand32(b *testing.B) {
        for n := 0; n < b.N; n++ {
            _, err := crand.Int(crand.Reader, big.NewInt(32))
            if err != nil {
                panic(err)
            }
        }
    }

This is because the crypto/rand library is very very careful to give you unbiased random numbers.
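A rough sketch of why the limit matters, assuming the library uses rejection sampling (which is my understanding of crypto/rand.Int, but this is not its actual code). The helper uniformBelow is hypothetical and only handles limits that fit in a byte: it masks a random byte down to the bit length of limit-1 and retries whenever the masked value is still too large. With a limit of 32 every draw is accepted; with 33 the mask widens to 6 bits and roughly half the draws are thrown away.

```go
package main

import (
	crand "crypto/rand"
	"errors"
	"fmt"
	"math/bits"
)

// uniformBelow (hypothetical helper) returns an unbiased value in
// [0, limit) together with the number of random reads it needed.
func uniformBelow(limit uint8) (uint8, int, error) {
	if limit == 0 {
		return 0, 0, errors.New("limit must be positive")
	}
	k := bits.Len8(limit - 1)    // bits needed to represent limit-1
	mask := byte(uint(1)<<k - 1) // keep only the low k bits
	buf := make([]byte, 1)
	for tries := 1; ; tries++ {
		if _, err := crand.Read(buf); err != nil {
			return 0, tries, err
		}
		v := buf[0] & mask
		if v < limit {
			return v, tries, nil // accepted
		}
		// Rejected: reducing v modulo limit would bias the result, so draw again.
	}
}

func main() {
	const draws = 10000
	for _, limit := range []uint8{32, 33} {
		total := 0
		for i := 0; i < draws; i++ {
			_, tries, err := uniformBelow(limit)
			if err != nil {
				panic(err)
			}
			total += tries
		}
		fmt.Printf("limit=%d: %.2f reads per value on average\n", limit, float64(total)/draws)
	}
}
```

Under that assumption, the acceptance rate is limit divided by the next power of two, which would fit the numbers above: the time drops as the limit climbs from 27 to 32 and jumps at 33.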

friday99 8 years ago |

The string benchmark has the issue that the amount of work done varies with each pass through the loop, since the string just keeps getting appended to. Proper benchmarks, like the ones in the comments here, do the same amount of work on every iteration.

jossctz 8 years ago |

Note that you can also get the number of bytes processed per second by calling the SetBytes method. This is very useful for some benchmarks (hashing, base64, ...):

    func benchmarkHash(b *testing.B, h hash.Hash) {
        data := make([]byte, 1024)
        rand.Read(data)
    
        b.ResetTimer()
        // SetBytes takes an int64, so the length needs a conversion.
        b.SetBytes(int64(len(data)))
        for n := 0; n < b.N; n++ {
            h.Write(data)
            h.Sum(nil)
        }
    }

pbnjay 8 years ago |

> The following benchmarks evaluate various functionality with the focus on real-world usage patterns.

I can't say I write much code that does one thing many times in a really tight loop. It would be a lot more interesting if the code combined multiple functions into the loop body in a better attempt to simulate "real-world usage patterns."

dmitrim 8 years ago | |

Good point, thanks! The idea behind these benchmarks is to make the results usable in real-world programs, rather than benchmarking real-world programs. I rephrased that sentence to avoid any confusion.

antoaravinth 8 years ago |

I always wanted to ask this. I'm a full stack developer with good knowledge of Java and JavaScript. I'm currently reading up on Golang, especially for its concurrency idioms. It is good and easy to write concurrent code in, but people always come along and say that actors are much better than channels. I have never used actors before. What are your thoughts on this?

majewsky 8 years ago |

Even though this is clearly a benchmarking game, I don't like that it does not explain how the things benchmarked against each other sometimes have drastically different use cases.

I can assure you that someone is going to use these numbers to argue that crypto/rand needs to be replaced by math/rand BECAUSE SPEED, or that MD5 should be preferred over SHA2/3.

Xeoncross 8 years ago |

It's worth noting that the first number in a benchmark result is how many loops (for n := 0; n < b.N) that Go used to find the results.

The nanoseconds, bytes, and allocs per operation are the important part.
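If it helps to see where those columns come from, here is a small sketch that runs a benchmark programmatically with testing.Benchmark and prints the same fields. The names benchBuilder, measure and sink are made up for this example; the benchmark body is the three-string builder case from earlier in the thread.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a package-level variable so the built string is actually used.
var sink string

// benchBuilder (hypothetical name) concatenates three short strings per iteration.
func benchBuilder(b *testing.B) {
	b.ReportAllocs()
	for n := 0; n < b.N; n++ {
		var sb strings.Builder
		sb.WriteString("x")
		sb.WriteString("y")
		sb.WriteString("z")
		sink = sb.String()
	}
}

// measure runs the benchmark outside of "go test" and returns the raw result.
func measure() testing.BenchmarkResult {
	return testing.Benchmark(benchBuilder)
}

func main() {
	r := measure()
	fmt.Println("iterations (first column):", r.N)
	fmt.Println("ns/op:", r.NsPerOp())
	fmt.Println("B/op:", r.AllocedBytesPerOp())
	fmt.Println("allocs/op:", r.AllocsPerOp())
}
```

The iteration count is whatever the framework needed to get a stable timing, so it says little on its own; the per-operation figures are the ones to compare.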
