From Python to Elixir Machine Learning (thestackcanary.com)

270 points by manusachi 2 years ago | 88 comments

Something not mentioned is Explorer (https://hexdocs.pm/explorer/Explorer.html), a very nice dataframe library backed by Rust's Polars (https://www.pola.rs/) with great integrations for Livebook (https://livebook.dev/).

el_oni 2 years ago | |

I found Explorer quite frustrating. I've used Polars in Python and loved it, but when I brought in some financial data I couldn't strip the "£" from the start of a string so that I could cast it to a number.

As far as I could tell, I would have to bring that data into Elixir, do the text processing there and put it back into Explorer, which to me defeated the whole point of a dataframe library.

I imagine it's good for pre-cleaned data; using it with the built-in datasets has been fine.

josevalim 2 years ago | | |

José from the Nx/Explorer team here.

Feel free to open up an issue! We have been focused more on high-level features (such as integration with S3, Postgres, Snowflake, SQLite, etc.) and therefore we are missing many functions that already exist in Polars. The good news is that they are very quick to add, so just let us know. :)

bongobingo1 2 years ago | | |

Seems that for that case you would need to use `mutate_with`, which as you say does pass data between systems.

My understanding is that you can use any Series operation without penalty (i.e. they get passed into some Rust NIF call; see https://hexdocs.pm/explorer/Explorer.Series.html#functions-s...). Those include whitespace trimming, but not trimming arbitrary strings. I imagine it wouldn't be much of a jump to support arbitrary constant strings; it might just need `str.slice` or `str.replace` to be exposed.

The docs do imply that `mutate_with` operates lazily, so you only pay the transfer cost once per row no matter how many mutations you're applying, but whether that's performant enough would depend on the case.

ritchie46 2 years ago | | |

You can use str.slice or str.extract to clean the data:

https://pola-rs.github.io/polars/py-polars/html/reference/ex...
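For readers coming from Python, the two strategies mentioned above can be sketched on plain Python strings: a slice starting at offset 1 that drops the leading "£", and a regex extract that pulls out the numeric part. This is only an illustration of the idea with the standard library (`re`, `decimal`), not the Polars or Explorer API; inside a dataframe engine these would run as vectorized column expressions, while here they are applied per value. The sample values and helper names are hypothetical.

```python
import re
from decimal import Decimal

# Hypothetical sample column of financial values, as in the "£" example above.
prices = ["£12.50", "£1,234.00", "£0.99"]

def clean_by_slice(value: str) -> Decimal:
    """Drop the first character (assumed to always be "£"), then thousands separators."""
    return Decimal(value[1:].replace(",", ""))

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)")

def clean_by_extract(value: str) -> Decimal:
    """Extract the first numeric group with a regex, then remove thousands separators."""
    match = NUMBER.search(value)
    if match is None:
        raise ValueError(f"no number in {value!r}")
    return Decimal(match.group(1).replace(",", ""))

sliced = [clean_by_slice(p) for p in prices]
extracted = [clean_by_extract(p) for p in prices]
print(sliced)  # [Decimal('12.50'), Decimal('1234.00'), Decimal('0.99')]
```

The slice is simpler but assumes every value has exactly one leading symbol; the extract tolerates other prefixes at the cost of a regex. `Decimal` is used instead of `float` so currency amounts are not subject to binary rounding.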

thibaut_barrere 2 years ago |

Don't miss the interesting book (which I just bought) in beta here:

https://pragprog.com/titles/smelixir/machine-learning-in-eli...

ricketycricket 2 years ago | |

Recommendation seconded. I'm only up to chapter three due to time constraints, but it is well written and approachable for someone lacking ML experience.

pawelduda 2 years ago |

I've used Elixir for a long time and wouldn't recommend it for ML over Python. Yes, Elixir may have some advantages and solutions for pain points because of its VM architecture and other subjective reasons, but it doesn't come close to what Python offers in terms of tooling, support and community. Also, juggling more complex data in Elixir is something you need to get used to coming from Python: it is very different, and therefore harder to grok for someone not used to the style.

You can always make an Elixir app talk to a Python ML backend and get the best of both worlds if you want.

bcardarella 2 years ago | |

Nx solves all of the issues you're citing. That is how you do ML in Elixir.

dpflan 2 years ago | |

Have you used Numerical Elixir (Nx)?

pawelduda 2 years ago | |

I can no longer edit the parent post, so to elaborate further on tooling: even if Elixir were comparable to Python (let's assume it is as of today), there is a lot going on in AI, so that won't necessarily be true tomorrow. There is no doubt Elixir will always be slower to catch up with recent developments because of its smaller community. This also applies to how quickly issues get resolved as they appear. It's likely OK if you're toying with a hobby ML project on your own, but it's not something you'd want in a startup with stakeholders who have expectations regarding timelines, and so on...

Otherwise, I like Elixir for web development: it is a much more stable domain, so the above doesn't apply (although I've seen some claim otherwise, which says a lot about how much more of an issue it would be for a niche ML use case).

I'd be very happy to be proven wrong by some case studies of how companies leveraged Elixir in real ML projects and concluded it is superior to Python.

lawn 2 years ago | | |

If that reasoning held true, there would never be a new best or most popular language for a specific domain, but there are many counterexamples.

And just as Elixir is (in my opinion) preferable to Python for web development, it's possible that the same may happen with AI.

pmarreck 2 years ago | | |

your reasoning is basically "it's not popular enough"

the flaw in your reasoning can be trivially pointed out simply by noting that once upon a time, Python was NOT "the language for machine learning". Essentially, NO "X is the solution for Y" started out that way. Which is why appeals to popularity are a fallacy.

nesarkvechnep 2 years ago | |

So you wouldn’t recommend it because it’s different than Python?

matt_daemon 2 years ago |

I wish there weren’t such a song and dance about “moving away from Python”. There’s nothing wrong with creating ML tools in Elixir, but it’s always "Python is slow", "Python has no concurrency support", blah blah.

thibaut_barrere 2 years ago | |

I come from Ruby, but the reactions can be similar; happy to give my data point.

The thing is Elixir is really good at an increasing number of things.

If you need to write an HTTP proxy in the middle of your application, since Elixir processes and incoming HTTP workers are cheap, you do not need to go evented: it just works.

If you need reactive web apps with changes automatically pushed to the client, it's the same: up to a certain scale, there is no need for external tools (e.g. AnyCable).

If you need to do some scripting, there is `Mix.install/2` for declaring and using dependencies in a single file.

If you start crawling too many web pages or processing too many APIs, the concurrency support kicks in and there is less need to scale out (or the need comes later), which translates into fewer machines, fewer (or delayed) ops problems, etc.

And now you can also use machine learning, deploy the same kind of code on a GPU, embed machine learning models right in the middle of your web app without much work, etc., which in turn makes it a nice platform for apps/SaaS.

Elixir really is becoming a Swiss-army knife which scales easily :-)

knewter 2 years ago | | |

And Nerves makes it easy to do IoT / almost-embedded stuff.

gv83 2 years ago | | |

still not easy at the most important thing of all: being approachable, instrumented and intuitive to people less dedicated to programming.

I say this as someone who likes Elixir, but after seeing it fail miserably at my org, I'm very skeptical it can be thrown around like a Spring, Node or Django project. It needs real support from the org and requires module design skills that most random devs from a random org don't have.

OJFord 2 years ago | |

"Everything is working fantastically with our python ML project but we're rewriting it in Elixir anyway" would be a weird article wouldn't it?

throwawaymaths 2 years ago |

lots of negativity in here from people who have invested a lot of effort in learning python, despite the fact that there are also plenty of people who complain a lot about ML/DS in python. Is it an expression of the sunk cost fallacy? If you don't support people who are trying to make a difference, nothing will change.

mrdoops 2 years ago | |

It's a weird mental block too - learning a language like Elixir, being so well designed and documented, is as comfortable as it gets.

Devs we hire without direct Elixir experience pick it up really quick (within a couple weeks). The energy needed to "get good" with Elixir is really not much considering it provides veritable super powers on the backend and introduces a whole category of concurrency concepts that are not easy to grasp elsewhere.

throwawaymaths 2 years ago | | |

specifically over python, immutability in function passing is amazing for beginners and dealing with beginners.

How confident are you that the junior you just hired is operating correctly in the `other_code` module they're responsible for?

    import other_code as other  # the junior's module (hypothetical)

    def f():
        my_dict = {"foo": 1, "bar": 2}
        other.function(my_dict)  # may mutate my_dict in place
        return my_dict["foo"]  # ==> you might be wrong about what's in my_dict
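To make the hazard concrete, here is a self-contained sketch in which `careless_update` is a hypothetical stand-in for `other.function`: because Python passes a reference to the same dict object, an in-place change in the callee is visible to the caller. Passing a read-only `types.MappingProxyType` view is one generic Python guard against this; it is an illustration added here, not something from the original comment.

```python
from types import MappingProxyType

def careless_update(data) -> None:
    """Hypothetical stand-in for other.function: it mutates its argument in place."""
    data["foo"] = 42

def f() -> int:
    my_dict = {"foo": 1, "bar": 2}
    careless_update(my_dict)  # the callee receives the caller's own dict object
    return my_dict["foo"]     # now 42, not 1

def f_guarded() -> int:
    my_dict = {"foo": 1, "bar": 2}
    try:
        # Read-only view: item assignment through it raises TypeError.
        careless_update(MappingProxyType(my_dict))
    except TypeError:
        # Caught only for this demo; in real code, let it surface so the
        # traceback points at the offending callee.
        pass
    return my_dict["foo"]     # still 1

print(f(), f_guarded())  # 42 1
```

Unlike Elixir, where data is immutable by construction, Python makes this protection opt-in, and `MappingProxyType` is shallow: mutable values nested inside the dict can still be changed through the view.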

codybontecou 2 years ago | | |

Do you know which resources your devs are using to learn Elixir from scratch?

paulluuk 2 years ago |

Show me that I can save a considerable amount of time training a 70B-parameter Hugging Face model by switching to Elixir, then we can talk about me learning a new language.

josevalim 2 years ago | |

The training costs and performance at that scale are mostly driven by hardware and I personally don't think language will make a large difference (I may be wrong).

Most of the Elixir Nx efforts are on inference, especially on how you can embed and scale it using concurrent and distributed patterns (see this post/video [1]). It may not be what you are looking for but we have more folks deploying than training models, so maybe they will find incentives to give Elixir a try. :)

[1]: https://news.livebook.dev/distributed2-machine-learning-note...

sodapopcan 2 years ago | |

It seems you didn't read even the first line, as this article is directed at people already using Elixir who are currently doing ML stuff with Python.

paulluuk 2 years ago | | |

I did actually read most of the article. Did you read the section immediately after that?

> Why is Python not Sufficient?

It then proceeds to make a case why Python would not have enough speed or support for parallel processing, which is what I'm disputing.

madspindel 2 years ago |

Instead of moving to Elixir, I believe it makes more sense to wait for Mojo and move to it when it's ready:

https://www.modular.com/mojo

weatherlight 2 years ago | |

Elixir (Nx) is ready now; Mojo might be at some point in the future... Also, Mojo won't have the BEAM or the ergonomics around functional programming.

nologic01 2 years ago | | |

ML is not an island, it is part of a much broader "data science" universe that is currently served fairly well (still imperfectly) by the Python/C++ stack and is not easy to replicate.

Throwing BEAM or FP acronyms around won't really strike a chord with people working with data and models.

Mojo will (as promised) tap into the wider ecosystem. Other platforms are more than welcome to try, but this ultimately requires a huge community of scientists and developers to become a real alternative.

formulathree 2 years ago | | |

Python syntax has really good ergonomics around functional programming. I hardly write loops when I use the language now.
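As a small illustration of the loop-free style described above, here is a hypothetical aggregation written as an explicit loop, then with a generator expression, then with `filter`/`map` and `functools.reduce`. The data and function names are made up for the example.

```python
import operator
from functools import reduce

# Hypothetical order lines: (name, quantity, unit price).
orders = [("apples", 3, 0.5), ("pears", 0, 0.8), ("plums", 4, 1.25)]

def total_loop(rows):
    """Imperative version: accumulate the value of non-empty order lines."""
    total = 0.0
    for _name, qty, price in rows:
        if qty > 0:
            total += qty * price
    return total

def total_comprehension(rows):
    """Same result with a generator expression and sum()."""
    return sum(qty * price for _name, qty, price in rows if qty > 0)

def total_reduce(rows):
    """Same result with filter/map and functools.reduce."""
    non_empty = filter(lambda row: row[1] > 0, rows)
    values = map(lambda row: row[1] * row[2], non_empty)
    return reduce(operator.add, values, 0.0)

print(total_loop(orders), total_comprehension(orders), total_reduce(orders))  # 6.5 6.5 6.5
```

The generator expression is usually considered the most idiomatic of the three; in Python 3, `reduce` lives in `functools` rather than being a builtin.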

mrdoops 2 years ago | |

Mojo is pretty cool, but it's a difficult road to take, especially implementing the breadth of accrued functionality of a 30-year-old language like Python.

Elixir and Python are not an apples-to-apples comparison: there are fundamental differences in the programming model (functional, immutability, etc.) and the runtime (preemptive scheduling + OTP), and those differences are why Elixir has distinct advantages that are not available elsewhere without heavy trade-offs.

Either way, once Mojo is production-ready, Elixir will be able to use it as well, just as it does Rust, Zig, or Python.

pjmlp 2 years ago | |

Given how Swift for TensorFlow went, and how OpenAI was equally pushing for it, I would rather take a "wait and see" approach to Mojo.

AlchemistCamp 2 years ago | | |

Do you mean OpenAI or do you maybe mean Fast.ai?

throwawaymaths 2 years ago | |

Since it aims to be transparently compatible with Python, Mojo will very likely suffer from the worst part of Python: packaging.

Last week I heard a story about an ML dev who would literally rebuild his system every week because Python would break it.

ilikehurdles 2 years ago | |

No source code, no thanks.

mkranjec 2 years ago |

At first I wondered what Nx (nrwl/nx) has to do with machine learning, then I figured out the context is the Elixir library named Nx.

Naming things is hard.

dsiegel2275 2 years ago | |

Agreed, naming things is hard. That and off by one errors are the three hardest things in software development, IMO.

ranjanprj 2 years ago |

I just ran some 2006 Java code that detects and reads car license plates, and it worked on the first try on Java 17. The code is 17 years old and runs just fine without any issues. I think the Java folks messed up the AI/ML space due to licensing, but I still think it's the best PL for AI/ML.

frakt0x90 2 years ago | |

What makes it the best language for AI and ML? I've never heard that take so just curious your thoughts. It's nice that legacy code still runs but that's usually not people's concern in ML.

itronitron 2 years ago | | |

I (and my colleagues) have worked on statistical, data analysis, and ML in Java since before data science was even a career field, and before Python became popular. In my opinion Java has better IDE support, more stable and proven libraries, and high performance.

Java does occasionally require that a person implement their own code after reading a research paper, but I've always enjoyed that part of the job.

I've never understood Python's popularity except that I've heard some people say that it's used at Google.

realusername 2 years ago | |

You would be happy in Elixir as well: the language is pretty much frozen, and only very small bug fixes and features are added.

It's definitely not a fast moving language.