From Python to Elixir Machine Learning (thestackcanary.com)
As far as I could tell I would have to bring that data into elixir, do the text processing and put it back into explorer, which to me defeated the whole point of a dataframe library.
I imagine it's good for precleaned data, using it with the built in datasets has been fine
Feel free to open up an issue! We have been focused more on high-level features (such as integration with S3, Postgres, Snowflake, SQLite, etc) and therefore we are missing many functions that already exist on Polars. Good news is that it is very quick to add them, so just let us know. :)
My understanding is you can use any Series operations without penalty (ie. they get passed into some rust NIF call), https://hexdocs.pm/explorer/Explorer.Series.html#functions-s..., which does include whitespace trimming, but not arbitrary strings, but I imagine it wouldn't be too much of a jump to add arbitrary constant strings. Might just need to expose `str.slice` or `str.replace`.
The docs do imply that `mutate_with` operates lazily, so you only pay the transfer cost once per row, no matter how many mutations you're applying, but whether that's performant enough would depend on the case.
https://pola-rs.github.io/polars/py-polars/html/reference/ex...
https://pragprog.com/titles/smelixir/machine-learning-in-eli...
You can always make Elixir app talk to Python ML backend and get the best of both worlds if you desire.
I like Elixir for web development otherwise; it is a much more stable domain, so the above doesn't apply (although I've seen some claim otherwise, which tells you how much more of an issue it would be for a niche ML use case).
I'd be very happy to be proven wrong by some case studies of how companies leveraged Elixir in real ML projects and concluded it is superior to Python.
And just as Elixir is (in my opinion) preferable to Python for web development, it's possible that the same may happen with AI.
your flaw of reasoning can be trivially pointed out simply by explaining that once upon a time, Python was NOT "the language for machine learning". Essentially, NO "X is the solution for Y" started out that way. Which is why appeals to popularity are a fallacy.
The thing is Elixir is really good at an increasing number of things.
If you need to write an HTTP proxy in the middle of your application, since Elixir processes & incoming HTTP workers are cheap, you do not need to go evented: it just works.
If you need to have reactive web apps with automated changes pushed to the client, it's the same: there is no need for external tools (e.g. any cable) at a certain scale.
If you need to do some scripting, there is `Mix.install/2` for single-file dependencies description & use.
If you start crawling too many web pages or processing too many APIs, the concurrency support kicks in and there is less need to scale (or you can scale later), which translates into fewer machines, fewer ops problems (or delayed ones), etc.
And now you are starting to be able to use machine learning, deploy the same type of code on GPU, embed machine learning models right in the middle of your web app without much work, etc., which in turn makes it a nice platform for apps / SaaS.
Elixir really is becoming a Swiss-army knife which scales easily :-)
I say this as someone that likes elixir, but after seeing it failing miserably at my org, I'm very skeptical it can be thrown around like a spring or node or django project. It needs real support from the org and requires module design skills that are not present in most random devs from a random org.
Devs we hire without direct Elixir experience pick it up really quick (within a couple weeks). The energy needed to "get good" with Elixir is really not much considering it provides veritable super powers on the backend and introduces a whole category of concurrency concepts that are not easy to grasp elsewhere.
How confident are you that the junior you just hired is operating correctly in the other_code module they're responsible for?
import other_code as other
def f():
    my_dict = {"foo": 1, "bar": 2}
    other.function(my_dict)  # the callee may mutate my_dict in place
    return my_dict["foo"]  # ==> you might be wrong about what's in my_dict
Most of the Elixir Nx efforts are on inference, especially on how you can embed and scale it using concurrent and distributed patterns (see this post/video [1]). It may not be what you are looking for but we have more folks deploying than training models, so maybe they will find incentives to give Elixir a try. :)
[1]: https://news.livebook.dev/distributed2-machine-learning-note...
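Returning to the `my_dict` snippet above: a minimal, self-contained sketch of the concern, assuming a hypothetical `careless_function` standing in for `other.function`. It only illustrates that a Python callee can change the caller's dict in place, and that a defensive copy is one way to guard against it; it is not taken from any real codebase.

```python
import copy


def careless_function(d):
    # Hypothetical callee: mutates the caller's dict in place.
    d["foo"] = d["foo"] + 100
    d.pop("bar", None)


def f_shared():
    my_dict = {"foo": 1, "bar": 2}
    careless_function(my_dict)  # the callee sees the same object
    return my_dict["foo"]


def f_defensive():
    my_dict = {"foo": 1, "bar": 2}
    careless_function(copy.deepcopy(my_dict))  # the callee gets its own copy
    return my_dict["foo"]


shared_result = f_shared()        # changed behind the caller's back
defensive_result = f_defensive()  # the caller's data is untouched
```

With immutable data, as in Elixir, the second behaviour is the default and needs no explicit copy.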
> Why is Python not Sufficient?
It then proceeds to make a case why Python would not have enough speed or support for parallel processing, which is what I'm disputing.
Throwing BEAM or FP acronyms around won't really strike a chord with people working with data and models.
Mojo will (as per promise) tap into the wider ecosystem. Other platforms are more than welcome to try but this ultimately requires a huge community of scientists / developers to become a real alternative.
Elixir & Python are not an apples to apples comparison - there are fundamental differences in the programming model (functional, immutability, etc) and runtime (preemptive scheduling + OTP) that is the reason it has distinct advantages not available without heavy cost trade-offs elsewhere.
Either way once Mojo is production ready Elixir will be able to use it as well like it does Rust, Zig, or Python.
Last week I heard a story about an ML dev who would literally rebuild his system every week because Python would break it.
Naming things is hard.
Java does occasionally require that a person might have to implement their own code after reading a research paper, but I've always enjoyed that part of the job.
I've never understood Python's popularity except that I've heard some people say that it's used at Google.
It's definitely not a fast moving language.
Yeah I noticed that, it looks like the numeric manipulations are well represented, less so for strings.
I'll dig out the code later on and get an issue raised.
new_column = Series.transform(df["column"], fn arg -> ... end)
DF.put(df, "column", new_column)
which _is annoying_ since you are not supposed to use it. The correct way is to extend the Series API, which we will be very happy to do!
And yes, when it comes to AI, things also could change in favour of Elixir - I would be pretty happy about it.
Other languages have certain features that make extension and integration feel like first-class concerns which lowers the barrier to contributions from a wider range of people and also helps keep e.g. dependencies and build processes relatively simple.
Python was not designed for ML, it happened to it, the way Android happened to Java etc. Loosely speaking the Mojo project serves a function similar to that of Kotlin in the Android mobile world. Trying to remedy some recognized friction points while maintaining the benefits of a widely established ecosystem.
Obviously not holding a crystal ball: if the ML hype mutates into something more permanent and very widely embedded across different verticals (not just the big tech sponsored pytorch / tensorflow platforms and use cases) and if the Python/C++ combo becomes a recognized bottleneck then the conditions might spark another approach.
* Exercism track: https://exercism.org/tracks/elixir
* Sasa Juric's book: https://www.manning.com/books/elixir-in-action-third-edition
* Dave Thomas's Elixir Course: https://codestool.coding-gnome.com/courses/elixir-for-progra...
* Phoenix Guides: https://hexdocs.pm/phoenix/overview.html
* Ecto Guides: https://hexdocs.pm/ecto/getting-started.html
The above covers the language basics/ideas/concepts and the main tooling (Phoenix/Ecto) if you're looking to build apps or get an Elixir job. I definitely recommend the Phoenix Guides or similar - they're very high quality and kept up to date with any new releases or changes while books can sometimes get out of date.
The current workarounds to make this happen in Python are quite ugly imho, e.g. PyTorch spawns multiple Python processes and then pushes data between the processes through shared memory, which incurs quite some overhead. TensorFlow on the other hand requires you to stick to their tensor DSL so that it can run within their graph engine. If native concurrency were a thing, data loading would be much more straightforward to implement without such hacks.
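To make the overhead argument concrete, here is a rough sketch of what crossing a process boundary costs compared with handing data to a thread. It does not reproduce PyTorch's loader or its shared-memory path; it simulates the boundary by hand with `pickle`, which is how the standard `multiprocessing` module ships ordinary Python objects between processes. The `batch` contents and the `identity` helper are made up for illustration.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

# Hypothetical batch of training data.
batch = {"features": list(range(1000)), "label": 1}


def identity(x):
    return x


# Thread: the worker receives a reference to the very same object, nothing is copied.
with ThreadPoolExecutor(max_workers=1) as pool:
    from_thread = pool.submit(identity, batch).result()

# Process boundary, simulated: objects are serialized on one side and rebuilt on the other.
payload = pickle.dumps(batch)
from_process = pickle.loads(payload)

same_object_in_thread = from_thread is batch      # shared, no copy
same_object_after_pickle = from_process is batch  # a fresh copy was built
equal_after_pickle = from_process == batch        # same contents, different object
payload_size = len(payload)                       # bytes that had to be moved
```

Threads avoid the copy but, in CPython, the GIL limits how much Python-level work they can do in parallel, which is the trade-off this comment is complaining about.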
1. Loading data
2. Running algorithms that benefit from shared memory
3. Serving the model (if it's not being output to some portable format)
There are also general benefits of using one language across a project. Because Python is weak on these things, we end up using multiple languages.
If I really had the use case and needed threads, I'd much rather use C++ bindings in a Python package than rebuilding the whole thing. Guess it depends on the scale we are talking about.
[0] https://pythonspeed.com/articles/faster-multiprocessing-pick...
Go and elixir provide some parallelism but the primary focus for both languages is concurrency.
Module design doesn't seem any harder than class design in JS or Python.
Do you mean the language is generally harder for non-developers? Or that Elixir is harder for JS/Python developers to pick up and write good code? Or something else?
Writing well designed Elixir code does seem to require a fairly different approach from most common OO languages, at least at a surface level. (Although IMO that's more because you can copy OO patterns you've seen before without thinking much about why they're good patterns than because good design in Elixir is much different from OO)
having said that, designing a proper elixir module (which basically is a bunch of functions that operates on a certain data structure usually represented by map of a certain shape) carries a certain level of fundamental understanding of the operation you're doing, which in my experience is one of the hardest things to get correctly.
It also requires a different way of exploring code, as you can't do the familiar `.` and see what happens: you can't just do `price.toCentsValue` but need to do `price |> Price.toCentsValue`, so you need to know that the `Price` module exists. It might not exist, or the function might be buried in `Cart` or in a helper in some controller because you did not clearly understand the domain and the modules' responsibilities. Attaching behaviour to data is powerful, explorable, and what most people are used to, even if it's the wrong place in principle. With modules this is flipped: it's now data that must thread through operations, and that's not super easy to grasp.
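The discoverability point can be sketched in Python, using made-up names (`Price`, `to_cents_value`, `price_to_cents_value`) that mirror the example in the comment. In the first style the behaviour travels with the value, so `.` completion finds it; in the second the value is a plain dict and you have to already know which function to call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    amount: float  # hypothetical field: price in whole currency units

    def to_cents_value(self) -> int:
        # Behaviour attached to the data: reachable via `price.`
        return round(self.amount * 100)


def price_to_cents_value(price: dict) -> int:
    # Module-style function: the data is passed through it explicitly.
    return round(price["amount"] * 100)


oo_cents = Price(amount=12.5).to_cents_value()
plain_price = {"amount": 12.5}
module_cents = price_to_cents_value(plain_price)

# The object advertises its operations; the plain dict does not.
discoverable_on_object = "to_cents_value" in dir(Price(amount=12.5))
discoverable_on_dict = "to_cents_value" in dir(plain_price)
```

Both styles compute the same value; the difference is only in how a newcomer finds the operation, which is the friction being described.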
Also, tooling is not that good (dialyzer sucks, intellij plugin not that good, vscode lsp good but still not a proper experience from people coming from c#/java, type annotations are not that readable...), pattern matching and destructuring on fn arguments confuses people and it's not super easy to read, and a million other papercuts related to tooling and syntax.
We don't have many elixir codebases (let's say around 15-25%) and I've seen incessant whining about "we don't want to maintain elixir" simply because the majority of people cannot be bothered to learn another mountain of quirks and papercuts (every lang has them) plus also losing the familiar way of working they already have, and having to remember that for the spot ticket that appears once in a blue moon on jira for elixir. That's why I think elixir needs extra support from the organization, basically in mandating it to be the primary language, teaching people proper design + proper code navigation and structure techniques, etc.
I hope I've been clear in my long winded ramblings; and I still wish a great future for elixir, so it becomes more approachable in average places like mine
Not sure, I think you're picking and choosing things to ramble about. While you can't just do `price.toCentsValue`, and have to call the `Price` module, in an OO language you would need to instantiate `price = new Price(amount 10)` or something before calling `toCentsValue`. This means you're aware of the Price class, same as being aware of the Price module. If anything, being aware of the module is in fact better IMO since it allows you to trace explicitly through what the code is going to do. Your point is correct if you're working through macros though.
so basically you're all doing things you hate, which is literally the worst possible work environment, and you're trying to use this as a data point for why Elixir "isn't for average coders". I have news for you, dude, you're not even at the "average coders" level, you just work in a "code mill", I'm guessing India.
it doesn’t take much to yeet some python scripts (and more) at the wall and get something to stick. SO, GPT, pick your poison, it’s painfully easy. solved problems as far as the eye can see, with a little glue or tape to hold it together.
elixir demands more of an investment. more than approximately zero is quite a bit if you already have momentum.
(elixir can absolutely be yeeted. not arguing otherwise.)
Most orgs that fail on elixir fail due to management.
It is too much a "language of experts" at the moment, although it is not caused by the language itself, more by the topics covered in general.
You really do have to love a paradigm to work with it, because paradigm shifts have a cost. Elm's paradigm shift is "do everything declaratively/immutably/non-side-effecty" and the massive (IMHO) benefit of going about things that way is "no runtime errors" (!!!) in addition to quite performant code.
But yes, the cost is there, and it is that you sometimes just want a side-effect to get something done, you sometimes just want to call into another library to get something done, etc.
The thing is, if Elm allowed this, or made it easier, you'd lose the Elm guarantee of "no runtime errors". Which, frankly, is a pretty big one: just inspect any popular domain's web page and you're likely to see dozens of JS errors that are simply hidden from most users, contributing to a "janky" web experience.
I literally just inspected this very page I'm typing this comment on and I see:
This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>”.
Error: Promised response from onMessage listener went out of scope
The cost of perfection is high. You really have to love the ideal.
But yes, if no one does, then Elm is a boondoggle.
I'm dealing with something similar with NixOS. NixOS's big guarantee (which is also big) is basically "no build or runtime failures that are due to misconfigured dependencies". But there's a steep learning curve and scattered documentation. The core idea is amazing, though, and that's what I love. The rest I tolerate while climbing the mountain.
nevermind the random stab, it's just to make you understand how random some answers are. in any case, no one is trying to touch your precious language, if you can't understand the context in which elixir failed (hot startup that HAD to hire everyone and its dog in 3 months + polyglot environment), not my problem
about your specific counterpoint, I think it's partially valid; I don't instantiate modules, I might not have named structs but just operate on Maps so I don't know the supposed module name, I might have misplaced the function in another module (while it's very intuitive to stick it to the object in oop)...and a million other things that are way too different for the average joe.
We could certainly debate the objective merits of Elixir vs other languages (which seems to be the primary discourse in this thread and elsewhere) but hearing subjective developer experiences is also critical.
It doesn't matter if Elixir is objectively better or not, if lots of devs who are exposed to it are having a hard time it's worth asking why.
In addition to developers not wanting to expend additional effort to learn a new language or paradigm, I believe there are gaps in the ecosystem especially around medium-advanced topics.
In other words, what resources do you wish existed?
also, if you don't have strong org support, you risk onboarding people that just want to work with that specific technology and that WILL gtfo as soon as they need to change to a team that doesn't use it, or if the technology is sunset, etc., so it's even more risky to have an exotic stack in the middle of more common stacks
Elixir is risky, so let's move off of it ==> devs leave. Elixir seems risky.
And then the devs/ecosystem gets blamed.