Gonum – Numerical Computing for Go

Gonum – Numerical Computing for Go (gonum.org)

140 points by dwhitena 8 years ago | 45 comments

How does this play with Go's scheduler? My understanding is that the Go scheduler is not preemptive, and goroutines are switched out at yield points, like the start of a function body. So a tight loop that doesn't call other functions can effectively hog the OS thread until execution leaves the loop body. (No idea what happens when doing FFI; maybe that's done in a separate thread pool?) In most cases where you would use Go you aren't doing a bunch of CPU-bound work, so that doesn't matter, but here you might run into some hiccups. I'm specifically thinking of a case where you use this library to do some heavy matrix operations as part of a web service, and those tight loops hog the OS threads and hurt your bandwidth and p90 latency.

My question to the developer: is that issue something you've encountered with this library? If not, did you design the library to periodically yield in tight loops, or am I just completely wrong about the Go scheduler?

anonacct37 8 years ago | |

You're right in that people have seen high p90 latency as a result of things like base64 encoding large blocks.

But one thing to remember is that go inserts gc and pre-emption points at function call sites. So basically as long as a function is occasionally called you're good.

Cgo threading does complicate the matter. My understanding is that cgo calls are done in a threadpool with a larger stack size. I don't know the details about how that threadpool is managed. Not sure if this would help or hurt your concern.

Also, don't forget GOMAXPROCS. There's nothing stopping you from letting the Go runtime spin up an arbitrarily large number of OS threads.

So it's not an ideal situation, but if you're careful I don't think tight loops are likely to torpedo an otherwise sound go project.
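To make the "as long as a function is occasionally called" point concrete, here is a minimal sketch of a CPU-bound loop that yields explicitly. The function `sumSquares` and its `yieldEvery` parameter are hypothetical names made up for illustration; only `runtime.Gosched` is a real standard-library call. This is one possible workaround under the cooperative-scheduling assumption described above, not something Gonum is known to do.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumSquares is a hypothetical CPU-bound loop whose body makes no other
// function calls. Every yieldEvery iterations it calls runtime.Gosched,
// which lets the scheduler run other goroutines on this OS thread.
// Passing yieldEvery <= 0 disables the explicit yield.
func sumSquares(n, yieldEvery int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
		if yieldEvery > 0 && i%yieldEvery == yieldEvery-1 {
			runtime.Gosched() // explicit yield point
		}
	}
	return total
}

func main() {
	fmt.Println(sumSquares(10, 4)) // prints 285
}
```

The yield interval trades throughput against latency: yielding on every iteration would add overhead to the hot loop, while a very large interval brings back the hogging behaviour.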

howeman 8 years ago | |

I don't use Gonum with a webserver + large calculations so I can't definitively answer. No one has reported problems, but that could be a lack of usage. One thing though is that matrix multiplication (which is a kernel for higher-level operations) is written in a blocked format, and the code can be pre-empted on any of those blocks, so I wouldn't suspect it's a problem.
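As a rough illustration of what "written in a blocked format" can look like, here is a sketch of a tiled matrix multiply in plain Go. It is not Gonum's kernel: `blockedMul`, `mulBlock`, `minInt`, and the block size `bs` are hypothetical names chosen for this example. The outer loops call `mulBlock` once per tile, which is the kind of call site where a cooperative scheduler could switch goroutines, assuming the compiler does not inline the call.

```go
package main

import "fmt"

// minInt returns the smaller of x and y.
func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// mulBlock accumulates C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1]
// for n×n matrices stored row-major in flat slices.
func mulBlock(a, b, c []float64, n, i0, i1, j0, j1, k0, k1 int) {
	for i := i0; i < i1; i++ {
		for k := k0; k < k1; k++ {
			aik := a[i*n+k]
			for j := j0; j < j1; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
}

// blockedMul returns C = A*B for n×n matrices, processing bs×bs tiles.
// Each tile is handled by one call to mulBlock.
func blockedMul(a, b []float64, n, bs int) []float64 {
	c := make([]float64, n*n)
	for i := 0; i < n; i += bs {
		for j := 0; j < n; j += bs {
			for k := 0; k < n; k += bs {
				mulBlock(a, b, c, n,
					i, minInt(i+bs, n),
					j, minInt(j+bs, n),
					k, minInt(k+bs, n))
			}
		}
	}
	return c
}

func main() {
	a := []float64{1, 2, 3, 4} // [[1 2] [3 4]]
	b := []float64{5, 6, 7, 8} // [[5 6] [7 8]]
	fmt.Println(blockedMul(a, b, 2, 1)) // prints [19 22 43 50]
}
```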

openasocket 8 years ago | | |

Yeah, skimming your source it seems most of your loops involve calling some function, and even if that's inlined I believe the Go compiler will put a speculative yield call in there.

I suppose my hypothetical would be an issue if you used a non-Go BLAS implementation, as calling out to C will hog the OS thread. But this is a known issue (e.g. https://www.cockroachlabs.com/blog/the-cost-and-complexity-o...).

sbinet 8 years ago | | |

also, there's been work to make for-loops preemptible:

- https://github.com/golang/go/issues/10958

- https://go-review.googlesource.com/c/go/+/33910

- https://go-review.googlesource.com/c/go/+/36206

- https://go-review.googlesource.com/c/go/+/46410

- https://go-review.googlesource.com/c/go/+/43050

Thaxll 8 years ago | |

I'm not sure I understand that p90 latency problem. The CPU is used somewhere anyway, so even if you use another language you won't be able to serve a request while doing some intense CPU work?

infogulch 8 years ago | | |

The OS will preempt it to give all threads some CPU time. The difference is that it's the OS doing the work of context switching between threads, as opposed to the Go runtime pausing and switching. Keeping it all in Go is faster, but the Go runtime can't pause, clean up, and prepare for re-execution in the middle of a block of code the way the OS can.

paultopia 8 years ago |

A solid, featureful & performant numerics library seems like a really good match for Go---if it can match numpy but also provide benefits like the safety of types, binary compilation, and better performance in non-numeric code, that's a really exciting case for sliding away from python?

microtonal 8 years ago | |

I have done some numeric programming in Go and compared to Python it's really hampered by the lack of operator overloading.

Of course, it just provides convenience, but it's what makes writing stuff in numpy, Tensorflow, Eigen, etc elegant.
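For a sense of what the missing operator overloading costs in practice, here is a small sketch. `Vec`, `Scale`, and `Add` are hypothetical names defined only for this example and are not Gonum types. An expression that numpy would spell `2*x + y` has to be written as explicit method calls in Go.

```go
package main

import "fmt"

// Vec is a hypothetical minimal vector type used only for illustration.
type Vec []float64

// Scale returns alpha*v as a new vector.
func (v Vec) Scale(alpha float64) Vec {
	out := make(Vec, len(v))
	for i, x := range v {
		out[i] = alpha * x
	}
	return out
}

// Add returns the elementwise sum v+w; it panics if the lengths differ.
func (v Vec) Add(w Vec) Vec {
	if len(v) != len(w) {
		panic("Vec.Add: length mismatch")
	}
	out := make(Vec, len(v))
	for i := range v {
		out[i] = v[i] + w[i]
	}
	return out
}

func main() {
	x := Vec{1, 2, 3}
	y := Vec{10, 20, 30}
	// numpy: z = 2*x + y
	z := x.Scale(2).Add(y)
	fmt.Println(z) // prints [12 24 36]
}
```

The result is the same; the difference is purely in readability, which grows with the size of the formula.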

paulsutter 8 years ago | | |

As of reading your comment, I'm 100% convinced that Go needs generics. I'm a longtime Go advocate, love coding in Go, but until now thought the lack of generics is just fine.

Lately I do a lot of numpy/tensorflow, and have begun to really dislike the slowness of python. It would be great to do that work in Go specifically.

Recurecur 8 years ago | | |

I think Julia is a far better candidate for high performance numerics than Go. It's just a better designed language in general, it is already higher performance, and it's far more expressive than Go.

When the Julia AOT compilation story is complete, and it's well along now, Julia should dominate a whole lot of Go use cases...

egl2016 8 years ago |

"By default, blas64 and lapack64 call the native Go implementations of the routines. Alternatively, it is possible to use C-based implementations of the APIs through the respective cgo packages and "Use" functions."

Performance comparison? Algorithmic equivalence? How close are the results numerically (e.g. how do they compare on badly conditioned matrices)?

howeman 8 years ago | |

The algorithms are (basically) equivalent, and are translations from the Fortran (though row major instead of column major). As far as I know there are no major differences in the answers, though for extremely poorly conditioned matrices (1e14 or so) you shouldn't expect consistent answers across any implementation.

The performance story is complex. Typically we're the same speed on small matrices (and using Go is faster if you include the cgo overhead). We currently have significant speed penalties on large matrices (300x300 or so), but Kunde21 is working on assembly kernels for the BLAS functions to close that gap.
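The "row major instead of column major" remark above is about how a matrix is laid out in a flat slice. A minimal sketch of the two index calculations follows; the helper names `rowMajor` and `colMajor` are hypothetical and not part of any Gonum API.

```go
package main

import "fmt"

// rowMajor returns the flat index of element (i, j) when the matrix is
// stored row by row (the C and Go convention); cols is the column count.
func rowMajor(i, j, cols int) int { return i*cols + j }

// colMajor returns the flat index of element (i, j) when the matrix is
// stored column by column (the Fortran convention); rows is the row count.
func colMajor(i, j, rows int) int { return j*rows + i }

func main() {
	// The same 2x3 matrix [[1 2 3] [4 5 6]] in both layouts.
	rowData := []float64{1, 2, 3, 4, 5, 6}
	colData := []float64{1, 4, 2, 5, 3, 6}

	// Element (1, 2) is 6 in either layout.
	fmt.Println(rowData[rowMajor(1, 2, 3)], colData[colMajor(1, 2, 2)]) // prints 6 6
}
```

A translation between the two conventions has to swap which loop index is the contiguous one, which is why such a port is more than a line-by-line copy.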

openasocket 8 years ago | | |

I'm surprised your performance is anywhere near that of standard BLAS implementations. The Golang compiler doesn't have support for explicit SIMD or auto-vectorization, so that's a big performance gain just sitting there.

sbinet 8 years ago |

I have been using Gonum for some time now (also contributed, mostly in the plotting area).

Last summer, I tried an experiment: have a student migrate a little python-based analysis to a Go-based one. The analysis was fitting some cosmological constants out of the so-called Hubble diagram.

I was pleased to see that, in the span of 2-3 months, the student, who had limited programming knowledge (a bit of python), managed to pull off the minimization of a 740 supernovae dataset with a 2220x2220 nuisance parameters matrix.

and the run time was 2x faster than the python one (with scipy/minuit for the minimization, so everything in C/C++, really).

success. :)

(and this motivated us to completely switch to Go as a teaching language for our master in particle physics / cosmology.)

Recurecur 8 years ago | |

I suggest you take a look at Julia. I think it's a much better fit for that type of work...

pbnjay 8 years ago |

I can't wait till the Go team starts giving these packages more love from the performance perspective. Now that the compiler has an SSA backend we might start seeing more SIMD and other optimizations, but it's still a ways to go before performance is comparable to bare C libraries for heavy computation and tight inner loops.

brian-armstrong 8 years ago |

Honestly, I don't think that Go is the right language for this. I've used Go quite a lot and it feels like it mostly just gets in your way. You can't really do memory management, which will likely impede performance for numerical work. There's no operator overloading either.

C++, for all its flaws, seems to just generally be a more well-conceived language and more generalist than Go. The only place I really feel like Go works is specifically in the context of moving bytes from one socket to another.

d4l3k 8 years ago | |

What kind of memory management do you need that Go doesn't provide?

It's not too hard to write Go to minimize allocations (and most short lived allocations end up on the stack anyways unlike other languages [1]). If you really need a lot of allocations you can always use https://golang.org/pkg/sync/#Pool to avoid GC overhead.

[1] https://groups.google.com/d/msg/golang-nuts/KJiyv2mV2pU/wdBU...
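A minimal sketch of the `sync.Pool` suggestion, assuming a workload that needs a scratch buffer per call. The names `bufPool` and `meanOfSquares` are hypothetical; only `sync.Pool` and its `Get`/`Put` methods come from the standard library. A pool can reduce allocation pressure, but it does not guarantee that a given `Get` returns a previously used buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch slices. It stores *[]float64 rather
// than []float64 so that Put does not allocate when boxing the value.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]float64, 0, 1024)
		return &b
	},
}

// meanOfSquares computes the mean of x*x over xs, using a pooled scratch
// buffer for the intermediate squares instead of allocating on each call.
func meanOfSquares(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	bp := bufPool.Get().(*[]float64)
	buf := (*bp)[:0] // reuse capacity, discard old contents
	for _, x := range xs {
		buf = append(buf, x*x)
	}
	sum := 0.0
	for _, s := range buf {
		sum += s
	}
	*bp = buf // keep any capacity growth for the next user
	bufPool.Put(bp)
	return sum / float64(len(xs))
}

func main() {
	fmt.Println(meanOfSquares([]float64{1, 2, 3, 4})) // prints 7.5
}
```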

brian-armstrong 8 years ago | | |

But if you're doing that, you might as well just use a language with RAII, good scoping and unique pointers. A language like... C++

Khanthulhu 8 years ago |

What's the use case for this? Machine learning? Big data? General math use?

jbochi 8 years ago | |

Here at The New York Times we are using it to power some of our recommendation algorithms. We are actually training the models with Python and serving them with Go using gonum.

Our library was just open sourced (and still in my personal account, until we add more documentation): https://github.com/jbochi/facts

avyfain 8 years ago | | |

This sounds really cool. Anywhere I could read more about the Python -> Go integration? Or are you just exporting the raw weight matrices?

Khanthulhu 8 years ago | | |

Beautiful. Thanks for the reply

howeman 8 years ago | |

General math use, like numpy/scipy.

optimuspaul 8 years ago |

wonder how this compares to numpy/scipy in terms of features and performance. Looks pretty comprehensive.

howeman 8 years ago | |

We aren't at full feature parity, but we're pretty close. There are some big things we are missing (ODE, FFT), and we have a bunch of things they don't have (statistical distance measures being one example). We are trying to be pure Go, so it's not as simple as providing a wrapper API. Working on it though!

dm319 8 years ago | | |

Statistical distance measures? Is that like t-SNE and similar?