Go+: Go designed for data science

(goplus.org)

199 points by angrymouse 5 years ago | 170 comments

yashap 5 years ago |

Go is a great language, but it seems poorly suited to data science. The popular data science languages are Python, R, Julia, and to a lesser extent Scala. They’re all extremely flexible languages, where you can easily write high-level abstractions/DSLs, and they all have very strong functional programming support, because data science tends to be extremely functional. They also tend to be very concise languages.

Go is at the complete opposite end of the spectrum - not flexible at all, it’s purposefully difficult and awkward to write high level abstractions/DSLs, there’s very poor functional programming support, and it’s very verbose. There are great reasons for these restrictions, they’re intentional design decisions, but they also make it a very poor fit for data science IMO.

sabellito 5 years ago | |

Not trying to start anything, but what's functional about Python? It doesn't support tail-call optimization, a strong type system, pattern matching, or immutability-by-default for lists and dictionaries.

From where I'm standing, Python has some features that kinda look like functional programming concepts, but overall it is an OO imperative language, like Ruby and many others.

My understanding is that the DS community prefers it more for its library support in that domain.

c3534l 5 years ago | | |

> strong type system, pattern matching, immutability-by-default for lists and dictionaries

As a side note, it's really interesting just how much the popular conception of "functional" has changed. 10 years ago, I don't think anyone would have listed any of those as being important or suggestive of functional programming. Nowadays, "functional" means "like Haskell" instead of "like Lisp." I think we need to be careful when we talk about functional programming because so many ideas have jumped the paradigm and it means so many different things to different people.

daturkel 5 years ago | | |

Python is not fully a functional programming language but it supports some functional patterns. There's a nice mini-ebook by David Mertz on functional programming in Python, and it used to be freely available but I can't find it at the moment. However, he wrote an article version here: https://developer.ibm.com/languages/python/articles/l-prog/

Also, pattern matching is coming to Python in 3.10. You can read about it here: https://www.python.org/dev/peps/pep-0634/

crazypython 5 years ago | | |

> a strong type system

It's a myth that dynamic languages can't have strong types. Python raises an exception as early as it can. For instance, adding a number to a string? Exception. Accessing undefined properties? Exception.
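A small sketch of the two cases mentioned above. The helper names `error_from_add` and `error_from_attr` and the `Point` class are hypothetical, introduced only to make the behaviour observable.

```python
# Hypothetical helpers: each returns the name of the exception raised,
# or None if the operation succeeded.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def error_from_add(a, b):
    """Try a + b and report which exception type, if any, was raised."""
    try:
        a + b
    except Exception as exc:
        return type(exc).__name__
    return None


def error_from_attr(obj, name):
    """Try to read obj.<name> and report which exception type was raised."""
    try:
        getattr(obj, name)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_from_add(1, "a"))            # int + str is rejected at runtime
print(error_from_attr(Point(1, 2), "z")) # undefined attribute is rejected too
```

Nothing is coerced silently here: the mismatch is detected when the operation runs, which is the "strong but dynamic" distinction being made.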

Furthermore there's a language-standard static type checker, mypy.

> pattern matching

We have that in Python 3.10.

> immutability-by-default for lists and dictionaries

We do have tuples and frozendict.

Arguably its implementations of functional features are much weaker than those of "truly" functional languages such as Lisp, Haskell, OCaml or F#.

Nican 5 years ago | | |

I think the appeal is with Jupyter [1] notebooks. Python is not about performance. Usually it's numpy (or other libraries) that does the heavy lifting in another language anyway.

But having the Jupyter notebooks allows for interactivity with the data. Make changes, and see how it affects every step after it.

[1] https://jupyter.org/

yashap 5 years ago | | |

- map/reduce/filter/for-comps in the standard library. Go doesn't support this style of programming, and because of the lack of generics, you can't write generic data structures with these types of methods either. It's all loops and mutation in Go

- first class functions. Go does have these

- concise lambda syntax, that makes them nice/easy to use. Go has first class functions, but a very verbose/awkward lambda syntax

- can easily create your own generic data structures with functional interfaces (can't do this in Go b/c no generics)

- Python is pretty strongly typed, and if you meant statically typed, there's now optional static type checking in Python, similar to TypeScript (not as robust/well implemented though)

- Python has decent immutability support. For example, dataclasses (https://docs.python.org/3/library/dataclasses.html) with frozen=True are a lot like immutable classes in more purely functional languages (i.e. case classes in Scala). Tuples and named tuples. There are libs out there for frozen (a.k.a. immutable) dicts, lists, etc.

- Python is about to get pattern matching in 3.10

- functools (https://docs.python.org/3/library/functools.html)

- etc.

You can absolutely use Python in a very mutable-OO style, but it also has pretty good functional programming support. If you look at most Python data science code, it's written pretty functionally.
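A minimal sketch combining a few of the items listed above: a frozen dataclass, `filter`/`map`, and `functools.reduce`. The `Reading` class and the sample values are made up for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Reading:
    """Immutable record, similar in spirit to a Scala case class."""
    sensor: str
    value: float


# A tuple of frozen records: neither the container nor the items can be mutated.
readings = (Reading("a", 1.5), Reading("b", -2.0), Reading("a", 3.0))

# filter -> map -> reduce, with no loop variable being mutated by hand.
positive = filter(lambda r: r.value > 0, readings)
values = map(lambda r: r.value, positive)
total = reduce(lambda acc, v: acc + v, values, 0.0)

# "Updating" a frozen instance means building a new one.
scaled = replace(readings[0], value=readings[0].value * 2)

# Direct assignment is rejected.
try:
    readings[0].value = 0.0
except FrozenInstanceError:
    print("readings[0] is frozen")

print(total, scaled)
```

In day-to-day code a generator expression such as `sum(r.value for r in readings if r.value > 0)` would be the more common spelling; the explicit `filter`/`map`/`reduce` chain is used here only to mirror the list above.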

I'd say most important for data science applications is the ability to create generic data structures with functional interfaces - you can't do this in Go, makes it really awkward to write a lot of the foundational vector, data frame, etc. libraries, that basically all higher level data science libs depend on.

quixoticelixer- 5 years ago | | |

Functional languages don't need a strong type system

tmpz22 5 years ago | |

IDK if it's Go's problem honestly. Data modeling is hard. It's hard for a reason. If a language like Python makes it seem easy, it's still hard but your perception and attitude towards it has changed because some of the busy work has been taken out of it - possibly in a way that costs you down the road.

Let's be honest: programming languages are the punching bags of developers.

teleforce 5 years ago | |

There are mainly two types of data scientists, A and B [1].

Type B data scientists probably want to use Go for building data analytics pipelines similar to Pachyderm[2]. If you want to go the way of a compiled language for data science and numerical analysis, the best bet now is probably Fortran. The fact that the Swift for TensorFlow project was started and terminated recently really showed that there is a need for a proper, modern compiled language for data science and numerical analysis.

There is, however, a dark horse in the programming-language race for data science and numerical analysis that can perhaps satisfy both type A and type B data scientists. The dark horse is the D language. It supports functional and object-oriented programming, a borrow checker, inline assembler, a REPL, metaprogramming, CTFE, and open and multi-methods, just to name several modern features suitable for data science and numerical analysis, but admittedly the ecosystem is rather poor as of now (e.g. no library for Arrow). It is also very fast to compile and run even with GC (the GC is also configurable), and you can selectively opt out of GC inside the same code base if blazing speed is your thing.

But glimpses of what it is capable of are already there, albeit still in their infancy compared to mature languages like Matlab, R or Fortran [3][4]. But hey, Rome was not built in a day.

[1] https://www.quora.com/What-is-data-science/answer/Michael-Ho...

[2] https://www.pachyderm.com/

[3] https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data...

[4] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

pjmlp 5 years ago | | |

That need is fulfilled by languages like Fortran, which is quite modern with OOP and generics; the age of punch cards is long gone.

Or HPC languages like Chapel.

Not only are they compiled, they also offer first class support for distributed HPC and GPGPU computing.

Go is nowhere close to offering such capabilities.

zwaps 5 years ago | | |

Why not Julia?

tapirl 5 years ago | |

> Go is at the complete opposite end of the spectrum - not flexible at all,

You must be kidding. Go is the most flexible (not just one of the most flexible) of the popular static languages. It is even more flexible than many dynamic languages. It supports function types as first-class citizens, closures, value methods as functions, type methods as functions, type deduction, .... IMHO, the main selling point of Go is not simplicity, but overall balance and flexibility: https://github.com/go101/go101/wiki/The-main-sell-point-of-G...

> there’s very poor functional programming support,

This is true currently, but this is not caused by lack of flexibility, it is caused by lack of custom generics instead.

yashap 5 years ago | | |

Fair enough, flexible is an extremely loose term. I was referring mostly to a language being flexible enough to let library/tool authors create their own very high level abstractions and DSLs. In Go, lack of custom generics often makes this very difficult. You look at the kind of APIs offered by mega-popular data science toolkits like pandas and Spark, it's really hard to offer something similar in Go. You end up with a lot of interface{} types everywhere, vectors/series/whatever carrying their type in a struct field, etc.

joppy 5 years ago |

The first things I would look for in a data science language are multidimensional arrays, linear algebra packages, data frame and time series libraries ... none of which feature on this page.

fractionalhare 5 years ago | |

Yeah I'm confused. The only "data science" I can see here is the title.

How is list comprehension a data science primitive? How did this get over 4,000 stars on GitHub with a glaring lack of basic data science functionality? Is this used by actual practitioners?

fixIt83 5 years ago | | |

GitHub stars are bookmarks for me, not an indicator of usefulness.

It does say it’s under heavy development.

Maybe 4.3k+ GitHub users just want to make sure they get updates?

wener 5 years ago | | |

It's a China-based GitHub project; the stars are mostly hype.

kvnhn 5 years ago | |

Thank you! I've seen this language/extension/library pop up a few times and I don't see, even remotely, how it could displace the Python data science stack. The biggest competitor to Python in this space, IMO, is Julia. Go+ seems light-years behind, and heading in the wrong direction entirely.

edumucelli 5 years ago | | |

R is the competitor; actually, many of the things in the Python data stack are directly copied from R: seaborn's ~ operator, dataframe, ...

Abishek_Muthian 5 years ago | |

Apart from the Gonum[1] numerical libraries, I haven't found many data-science-specific Go libraries, compared to the Python ecosystem, while searching for some hobby projects.

Interestingly, Prose[2], a Go library for text processing, yielded better results for named-entity extraction than NLTK in my tests, in terms of accuracy and, obviously, performance.

Perhaps Go is not being applied enough in data science/ML, and for the fields where it is applied (networking), the math in the standard library seems to be sufficient.

[1] https://github.com/gonum/gonum

[2] https://github.com/jdkato/prose

aldanor 5 years ago | |

Yea, my list would also be:

- ndim arrays with broadcasting

- time series

- plotting

- linalg: blas/mkl

- storage - hdf5, zarr, arrow, parquet, netcdf

I don't see any of those in Go+ either.

carbocation 5 years ago |

I write a lot of Go, and I spend most of my time doing analysis (usually in R, occasionally in Python). I'm interested to understand whether there was a specific motivating example that drove the creation of this new Go-like language.

This is Hacker News, so there definitely doesn't need to be anything beyond "I could, so I did." But if this actually solves some problem better than existing solutions, it would be cool to read about. Edit: Without a motivating example, it's hard to imagine that people will want to pick up a Go-like (but not exactly Go) language for data science.

iujjkfjdkkdkf 5 years ago | |

> Without a motivating example, it's hard to imagine that people will want to pick up a Go-like (but not exactly Go) language for data science

Exactly. I use Python almost exclusively (including for data science, or really ML). I've been wanting an excuse to learn Go by doing a project with it. But learning some third Go-like language would be a tougher sell for me, unless there is really something it does better than Python, because it still doesn't give me the benefit of learning Go.

But like someone else said, "because you can" is usually a good enough reason to build or learn a new language, so I'm sure it's still worth it for many.

CameronNemo 5 years ago | |

If you are looking to learn a language specific to data science, Julia is fairly mature.

resonantjacket5 5 years ago | |

It seems it compiles down to golang usually? As in the README, they run:

```
gop go tutorial/ # Convert all Go+ packages in tutorial/ into Go packages
go install ./...
```

It's more like TypeScript for JavaScript than a completely separate language.

MrPowers 5 years ago |

Go has a ton of potential in the data science space.

A basic DataFrame library would go a long way. Doesn't have to be as full featured as Pandas. Just something that's maintainable and portable.

I wrote a blog post a few months ago on the current Go DataFrame libraries (gota, qframe, dataframe-go): https://mungingdata.com/go/dataframes-gota-qframe/. None of the current offerings are integrated with Arrow.

An Arrow-backed Go DataFrame library that can read / write Parquet files could really jumpstart data science in Go (really data engineering in Go, which is where they should probably focus first).

kzrdude 5 years ago |

The env gop run shebang line is not posix-compliant; posix only requires support for a single argument in the shebang and this one has two arguments (gop run).

</irrelevant unix nerd mumbling>

nine_k 5 years ago | |

POSIX is thirty three years old.

Can we please consider certain modest improvements?

10000truths 5 years ago | | |

The latest revision of the POSIX standard is only 4 years old.

m45t3r 5 years ago | |

There is env -S that supports multiple arguments. This was always an extension available in BSD I think, and it is available now in recent versions of GNU's env.

d110af5ccf 5 years ago | | |

GNU Coreutils env supports the -S option as of v8.30. Ubuntu 18.04 LTS appears to be on v8.28, but 19.04 supports it. (https://stackoverflow.com/q/4303128)

(Also, I didn't think the shebang was specified by POSIX at all? Am I wrong?)

iagovar 5 years ago |

If anyone is looking for an alternative to R or python, there's Julia already.

sundarurfriend 5 years ago | |

If this project can maintain Go's fast compile times and ability to make reliable, concise binaries, those would be two big pluses in areas where Julia is currently weak. That would make this a good choice in projects where those are high priorities.

mountainriver 5 years ago | |

I think what people want is all the great things Go brings to the table but just geared a little more towards data work.

Julia offers a lot in the data world and not much in the engineering world.

CameronNemo 5 years ago | |

Also Rust has a good dataframes library now, polars. Not as mature an ecosystem as Julia, but hopefully it improves in the future.

hu3 5 years ago | | |

I find Rust's borrow checker too clunky for exploratory work. It breaks my flow and imposes higher cognitive load. The slow compiler doesn't help either.

micro_cam 5 years ago |

I used to do a lot of machine learning code in go and think it has great potential as a compiled, static language with similar ease of development to python.

However, it is hard to get around the lack of operator overloading and (to a lesser extent, at least to me) generics. I love the simplicity of the language and understand their feeling that operator overloading is too often abused, but at the same time not being able to use algebraic operators for matrix and tensor libraries makes them really hard to use.
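To make the readability point concrete, here is a small sketch in Python (not Go) of the same expression written with overloaded operators and with plain method calls. The `Vec` class and its `add`/`scale`/`dot` methods are hypothetical and only stand in for what a method-only matrix API tends to look like.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vec:
    """Tiny illustrative vector type; not a real numerical library."""
    items: Tuple[float, ...]

    def _check(self, other: "Vec") -> None:
        if len(self.items) != len(other.items):
            raise ValueError("length mismatch")

    # Operator-based interface.
    def __add__(self, other: "Vec") -> "Vec":
        self._check(other)
        return Vec(tuple(a + b for a, b in zip(self.items, other.items)))

    def __mul__(self, k: float) -> "Vec":
        return Vec(tuple(a * k for a in self.items))

    def __matmul__(self, other: "Vec") -> float:
        self._check(other)
        return sum(a * b for a, b in zip(self.items, other.items))

    # Method-based interface, the only style available without overloading.
    def add(self, other: "Vec") -> "Vec":
        return self + other

    def scale(self, k: float) -> "Vec":
        return self * k

    def dot(self, other: "Vec") -> float:
        return self @ other


a = Vec((1.0, 2.0))
b = Vec((3.0, 4.0))

with_operators = (a + b * 2) @ a
with_methods = a.add(b.scale(2)).dot(a)
```

Both lines compute the same value; the difference is only in how closely the code resembles the formula it implements, which grows more noticeable as expressions get longer.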

The compacting garbage collector can also make it hard to pass pointers to memory to non go libraries which is key in data science.

If this project could address those things I think it could have real potential

dunefox 5 years ago |

There are quite a few languages I would like to use before Go. Especially F# seems very interesting for DS.

bachmeier 5 years ago | |

I've been wanting for some time to check out the F# R Type Provider: http://bluemountaincapital.github.io/FSharpRProvider/ Unfortunately it appears to be Windows-only, and my curiosity hasn't yet reached the point that I'd boot into Windows.

dunefox 5 years ago | | |

Sadly, it also doesn't work with .NET Core yet. Otherwise this would be a pretty convincing point for F#.

great_reversal 5 years ago |

Why can't you just build libraries to make Go a better language for data science? There's already Go support for a Jupyter Notebooks kernel: https://github.com/gopherdata/gophernotes

srer 5 years ago | |

We could build such libraries, and people have built some.

However, the task at its heart is a vast duplication of work, and while Go has a lot of things going for it, it doesn't seem enough to sway many data scientists into reinventing their wheels in Go.

I don't blame them. Rewrites being difficult to justify or motivate when you already have a compelling implementation is part of the reason why we have significant amounts of FORTRAN77 code still kicking around today. It is also why for many things we opt to just write wrappers around existing C libraries to call them from other languages.

It has many shortcomings, but overall I prefer the sharing of a library across languages, each with its own bindings that can attempt to make it more idiomatic to that specific language. The Go culture/community doesn't favor this approach; the Python community embraces it.

umvi 5 years ago |

I just barely picked up Go, and my first impression is that it's very... opinionated.

It wants me to do if/else guards a certain way, you have to capitalize first letters of "exported" functions, it won't let me import `fmt` unless I use it, etc. I'm not sure I like it.

philosopher1234 5 years ago | |

The opinions are by design. By removing flexibility, you can increase uniformity. Instead of having 12 different styles of code, there can be 1. It removes cognitive load, so you can spend your mental energy on solving problems.

umvi 5 years ago | | |

And that's fine if Go were the only language I ever used, but it's jarring going from unopinionated languages to a highly opinionated one. I have to keep a special set of "Go rules" in my mind to be sure to follow when using Go, which has the effect of increasing cognitive load. "Oh right, Go wants me to compress my if/else clauses and put the brackets a certain way."

jy3 5 years ago | |

I hope you have the presence of mind to realize the amazing benefit this has for the entire Go codebase in existence.

amelius 5 years ago |

Nobody in data science wants fragmentation. Therefore, any aspiring new platform would need to bring some serious benefits to the table. I'm not sure what they are here.

unreliableNar8r 5 years ago |

I wish them all the best in this but it seems like an uphill battle, and doesn't seem to have a clear use case to me. For lightweight to medium projects R and Python are so well supported it's hard to reject them as the null. If you're doing exploratory stuff and want visuals, it's the same story with Rmd and Jupyter. For more behind-the-scenes production pipeline stuff there is already Scala which has inroads with Spark. If you really want to use something new, Julia is starting to mature and has all sorts of plotting and linear algebra support. To me it seems Go would aim more to compete with Scala, I suppose? Then it might come down to plotting.

In terms of being a general-purpose DS language, I can't imagine using anything that doesn't have a clear strategy to A) get a dataset into a DataFrame or similar, B) get my collaborators a plot in a way that is quick and easy, and C) to a lesser extent, provide some kind of notebook/reporting tool.

They do say there is a lot of development going on but it seems like a space with a lot of great incumbents and a rapidly maturing up-and-comer in Julia.

edit: typo

tpmx 5 years ago |

Seems like there's a potential trademark risk if Google decides it wants to protect the Go trademark.

https://news.ycombinator.com/item?id=20023137

daemonk 5 years ago |

There are a lot of numerical structures missing from this. Not sure if you can really advertise it as for data science without some kind of dataframe structure.

fractionalhare 5 years ago | |

DataFrames? It doesn't even seem to have specialized array primitives like Series or NumPy. What the heck?

chartpath 5 years ago |

Agree with all the comments like "where are the nd arrays?"

BUT, they have list comprehensions!! One of the main things I miss coming from Python.

conradludgate 5 years ago | |

Once generics get introduced into the language, you can write a generic map function which can take a []T and a func(T) U to return a []U. While it's not as elegant as a list comprehension, it's nicer than writing a for loop every time. Although, I can't remember what the performance impact of closures is in Go, so this might not be a cheap operation.
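As a rough analogue of the signature described above, here is the same shape written in Python with type hints (not in Go, and not using Go's generics syntax). `map_list` is a hypothetical helper; `T` and `U` play the role of the type parameters in `[]T`, `func(T) U` and `[]U`.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # element type of the input list
U = TypeVar("U")  # element type of the output list


def map_list(xs: List[T], f: Callable[[T], U]) -> List[U]:
    """Apply f to every element of xs and collect the results in order."""
    out: List[U] = []
    for x in xs:
        out.append(f(x))
    return out


words = ["go", "gop", "python"]

# Generic map function, as the comment proposes.
lengths_via_map = map_list(words, len)

# The list comprehension it is being compared against.
lengths_via_comprehension = [len(w) for w in words]
```

The loop is written once inside `map_list` and reused for any pair of element types, which is the part that needs generics in a statically typed language.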

nemo1618 5 years ago | | |

In practice, no one will do this, unless there happens to be a function with the correct signature already available. The lambda syntax is so verbose that it's easier to just write the for loop.

Another problem is that tons of Go functions return (value, error), and it's not clear how such functions should interact with a "map" function. Return all the errors in a separate slice? Stop at the first error? What if you only want to stop when the error is io.EOF? etc.

I think we'll only see map/filter/reduce if the language is changed to specifically accommodate them. I've experimented with doing this myself, which people tend to view as heresy: https://twitter.com/lukechampine/status/1367279449302007809?...

jhgb 5 years ago | | |

Why not something like a channel map? Give it a channel and a function and you get another channel with a goroutine running in the background.

quixoticelixer- 5 years ago |

Oh jesus christ fuck no

Ambix 5 years ago |

Really cool thing! When will it be ready for production use?

InvOfSmallC 5 years ago |

What's the point?

JediPig 5 years ago |

Not even half-baked. It's a webpage with a single feature that is broken.

donutloop 5 years ago |

Many of these features should be part of the upcoming Go 2.

dm319 5 years ago |

When I realised I couldn't divide a time period by a number or integer, the penny dropped that different languages excel at different things.

icholy 5 years ago | |

Are you talking about `time.Duration`? Because you can definitely do that.

dm319 5 years ago | | |

Has that changed? I couldn't before!

robbyt 5 years ago | | |

But don't try to use BC dates with time.Duration, because they don't work!