PyPy.js: First Steps (rfk.id.au)
Emscripten is an LLVM-to-JavaScript transpiler.
Is there any way the two of those could be hooked up?
I mean, you're not going to be using tkinter or running Python networking modules in the browser anyway. If you could compile some of the existing scientific and numeric Python into JS, a lot of scientific computing could be distributed and run on the web, client-side, including most of the machine learning and NLP code that exists now in Python.
From just an hour of research, it doesn't look like it would be very hard. But I only know enough about the two projects to be dangerous.
At the very least, it's an interesting idea.
EDIT: To clarify, I mean "distributed" as in distributing an asset (in this case a JS script) to a computer to be run, not distributing a workload.
Anyone figured out how to get a comparable benchmark out of it without loading a full browser?
The reason I ask is that http://repl.it/languages/Python loads quickly and can actually run in my iPhone 4S Mobile Safari. At 139 MB uncompressed currently, I'm not sure this new project will ever result in something that would let me write python code natively in a <script>. I know that he's currently running with node.js, so no browser required, but if this project is only ever going to be yet another desktop/server interpreter, I would be gobsmacked if it could remotely approach PyPy or CPython performance - even once they get JIT going.
The article involves compilation of the PyPy interpreter (not CPython) into JS.
This is why I asked about someone attempting a benchmark comparison between the two.
Still, it's a fun project. Good luck to the author.
EDIT: If this seems too mind-bending, think that even machine code is not really machine code: the CPU actually JITs the "native" code (say, in the x86-64 ISA) into a "more native" code that is what is actually executed by the CPU (for Intel CPUs, these are "micro-ops"), and in doing that it uses a lot of compilation tricks (such as trace caches), including optimizations driven by runtime feedback (you can think of branch prediction this way).
Of course Javascript is a much thicker abstraction, but conceptually it is not much dissimilar.
Also, asm.js code is structured in a way that makes it obvious the code will not change over time (it's in a closure, where functions cannot be modified), again, in order to make optimizing large projects easier. When JITing however you do want to add new code all the time.
But this could work great without generating asm.js code. The VM itself is C code that can be compiled wholesale into asm.js (when the Lua VM was compiled that way it was quite fast, about 50% of native speed), and it would then JIT at runtime normal JS and call into that.
If the CPython API is so easy to get going on emscripten I wonder if anyone has tried using emscripten to compile Nuitka (http://nuitka.net/) output to Asm.js yet?
No reason this approach cannot JIT into JS just like clojurescript and dart.
Without the type system however, Python in the browser is next to useless in practice.
1. I think you meant "of".
2. log(781250 / 877) is closer to 3 than 2.
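The second point is easy to check. A minimal sketch, assuming the log in question is base 10 (the comment does not say which base is meant); the variable names are only for illustration:

```python
import math

# Figures taken from the comment above.
numerator = 781250
denominator = 877

ratio = numerator / denominator
# Assumption: "log" means log base 10, i.e. orders of magnitude.
log10_ratio = math.log10(ratio)

print(f"ratio = {ratio:.1f}")
print(f"log10(ratio) = {log10_ratio:.2f}")

# True when the value is nearer to 3 than to 2.
closer_to_three = abs(log10_ratio - 3) < abs(log10_ratio - 2)
print("closer to 3 than 2:", closer_to_three)
```

With the natural log the value would be well above 3, so base 10 is the only reading under which the comparison with 2 and 3 makes sense.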
In a comment, I found out about another project ShedSkin which seems to be like RPython, i.e. it compiles a subset of Python to C++. https://code.google.com/p/shedskin/ http://shed-skin.blogspot.de/ https://news.ycombinator.com/item?id=6091123
(This may have been tried in the past; in the post "10 years of PyPy" it's mentioned that there was once a JavaScript backend but it was removed because it was a horrible idea: http://morepypy.blogspot.com.au/2013/02/10-years-of-pypy.htm...)
[1] https://github.com/sq/JSIL/wiki/JavaScript-Performance-For-M... [2] https://bugzilla.mozilla.org/show_bug.cgi?id=885526
lua.vm.js is running at 50% of Lua, not LuaJIT. Add to that that you can't touch the DOM from asm, and that the ported VM basically has its own heap and no way to collect cycles between Lua and JS, and it's not a good solution. Much better to rely on the JS VM for memory management.
It's possible, of course, but I've never seen it. And I certainly haven't found any way to achieve it with a compiler. If you have, let me know...
It doesn't require special static compilation to optimize small amounts of JITed code, all modern JS engines can do that extremely well. asm.js is not necessary there.
AFAIK the duetto approach brings no benefits in this case.
You'd need to also port the above libraries to JS, which would be a pretty large undertaking.
Finally, grid computing is not any easier in JS than in python, since python already has lots of bindings for the required tools (e.g. take a look at http://star.mit.edu/cluster/ which even comes preinstalled on EC2 AMIs).
Edit: I just noticed that you suggested using JS for distributed computing in lieu of python. Care to suggest a scenario where JS would be a better suited language? Why would I want to run scientific computing in browsers?
Usually, hacking like this leads to nothing, but sometimes it leads to everything.
Care to suggest a reason why I shouldn't do it if I feel like it? I'm really gritting my teeth in forbearance here.
PS: I wasn't even thinking of grid computing when I mentioned this. I meant "distributed" in the sense of distributing an asset, not a work load. Nor was I thinking of whether or not JS was a "better language". I was just wondering if it could be done.
Edit: I can't reply to your comment below, so I'll write a few words here. First, apologies if you weren't trying to be snarky. Phrases like "Care to suggest...?" can in certain contexts imply disdain and snark. If I misread your intent, then my apologies.
Second, Numba actually is a kind of JIT, so it's interesting on several levels. I don't think you'd get any added benefit from the JS JIT in terms of performance, but it would be another way to distribute scientific Python scripts. For example, as far-fetched as it is right now, this would allow ipython notebooks to be distributed and run in their "static" form without needing a local or remote Python server (note that here again I don't mean any kind of distributed or grid computing).
Finally, scientific Python may actually not be the least amenable code to bring into JS, since pure computational code has limited I/O and thus we don't have to worry about the DOM. Also, note that the C libraries that are used in pandas and elsewhere could in theory be compiled into JS using Emscripten. A large task, to be sure, but an interesting idea.
Unfortunately, I'm too busy at the moment to start up a project like this, but it's definitely going on my list of things to hack around with.
You brought up scientific python specifically, but this tends to involve (in my experience) overwhelmingly native code with small amounts of plumbing written in python. Numba is already a restricted subset of python, and without porting at least parts of numpy, you are further restricted to just a subset of that subset (by giving up vectorized numpy ops).
I totally agree that playing around with it will be fun, I'm merely pointing out the fact that you chose perhaps the least amenable kind of python code for this particular task.
Again, by all means go for it! I agree that there are lessons to take away even if it doesn't lead to anything specific.
Edit: you also seemed to imply that there is some inherent benefit in moving scientific/numeric computation from python to JS (and the browser). I was honestly interested in the reasoning behind this. I know that numeric python could do well with a proper JIT for example and a lot of research went into JS VMs (pypy doesn't count yet).
The entirety of the repl.it emscripten CPython project is 24MB, uncompressed. This includes the entire standard library that it ships with and all the '_underscore.so.js' emscripten compiled shared objects. Compressed via zip it's 4.7MB. For comparison, this is almost identical (within a few MB) to a clean install of python locally on my workstation, size-wise. I am assuming at this point that it's most if not all of the standard modules included at repl.it.
(And for reference: The core CPython engine, translated minus modules, weighs in at 4.6MB uncompressed and 800KB compressed)
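For anyone who wants the compression ratios implied by those numbers, here is a rough sketch. It only uses the sizes quoted above, treats 800KB as 0.8MB, and the variable names are made up for illustration:

```python
# Sizes quoted above, in MB.
full_project_uncompressed = 24.0   # repl.it emscripten CPython, with stdlib
full_project_zipped = 4.7
core_engine_uncompressed = 4.6     # core CPython engine, minus modules
core_engine_compressed = 0.8       # assumption: 800KB treated as 0.8MB

# Ratio of uncompressed to compressed size for each bundle.
full_ratio = full_project_uncompressed / full_project_zipped
core_ratio = core_engine_uncompressed / core_engine_compressed

print(f"full project: about {full_ratio:.1f}x smaller when zipped")
print(f"core engine:  about {core_ratio:.1f}x smaller when compressed")
```

Both bundles shrink by roughly a factor of five to six, which is the number that matters for what actually goes over the wire.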
Downloading the prebuilt PyPy project from http://pypy.org/download.html I see that uncompressed the project is 55MB in size.
Removing all .txt and pure .py files, I'm left with 37MB (and that's being generous) of 'code' files that potentially are being translated. And that's with shared objects - not a static compile - so there's possibly duplicated code in there that wouldn't be present in a single monolithic executable.
I stand by my assertion that 139MB is significantly different in size and that the translation is what accounts for the majority of that size (84MB if I'm generous, 102MB if I'm slightly more realistic).
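The two figures in parentheses are just the 139MB output minus the two PyPy sizes mentioned earlier. A quick sketch of that subtraction, with illustrative variable names:

```python
# Sizes quoted in this thread, in MB.
pypy_js_output = 139        # uncompressed pypy.js output
pypy_download = 55          # prebuilt PyPy download, uncompressed
pypy_code_files_only = 37   # same download minus .txt and pure .py files

# Size not accounted for by the native PyPy distribution.
overhead_generous = pypy_js_output - pypy_download          # 84
overhead_realistic = pypy_js_output - pypy_code_files_only  # 102

print(f"generous estimate:  {overhead_generous}MB")
print(f"realistic estimate: {overhead_realistic}MB")
```

Either way, well over half of the 139MB is unaccounted for by the native build.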
As much as there may eventually be a speed benefit if everything works out here, the current size of the project definitely moves it out of the realm of anything I'd want to attempt loading into a browser.
I agree that 139M is pretty ridiculous for any practical purpose! I'm going to work on lazily loading just the files that are needed, which should make a big difference.
I was aware of the ridiculous size of emscripten's output. When I experimented with emscripten a couple months ago, "Hello World" in C produced about 2000 lines of JavaScript. Another ~700 line program in C ran about 150,000 to 200,000 lines, depending on various settings.
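To put that in perspective, here is a rough sketch of the expansion factor for the ~700 line program, using only the line counts quoted above (approximate figures, illustrative names):

```python
# Approximate line counts quoted above.
c_source_lines = 700
js_output_lines_low = 150_000
js_output_lines_high = 200_000

# Lines of generated JavaScript per line of C source.
expansion_low = js_output_lines_low / c_source_lines
expansion_high = js_output_lines_high / c_source_lines

print(f"roughly {expansion_low:.0f}x to {expansion_high:.0f}x more lines of JS than C")
```

The count presumably includes whatever runtime and library support gets bundled in, so this is not a per-line translation cost.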
pypy.js on the other hand does include the full standard library.
1: http://pypy.org/compat.html - Standard library modules supported by PyPy. Note that large parts of python library are implemented in pure python, so they don't have to be listed there.