PyPy v5.8 released

PyPy v5.8 released (morepypy.blogspot.com)

162 points by pettou 9 years ago | 70 comments

I have a question, and then a general vent.

1. Does anyone know the latest on NumPyPy? PyPy is just not a usable proposition for me because I heavily use Numpy (and Scipy et al.). So I am forced to choose between slow Python + fast Numpy (CPython) or fast Python + slow Numpy (PyPy). Very saddening. Numpy through PyPy's C-extension layer is just so far off the pace, and NumPyPy was meant to solve that quandary.

And I know some smart Alec will trot out the usual 'downshift into C' line that everyone (including Guido) use as the final goto solution for performance but that is simply a disgrace in 2017. Even JavaScript is fast. Why can I not choose to write Python and it be fast?? And yet Python 3 is getting slower. Don't agree? Look at these benchmarks of heaps written in pure Python (not using the C-based builtin heapq): https://github.com/MikeMirzayanov/binary-heap-benchmark. Python is generally off the pace, but Python 3 is about twice as slow as Python 2 and miles off JavaScript.
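To make the heap benchmark concrete, here is a minimal sketch (not the linked repo's code) of a pure-Python binary min-heap timed against the standard `heapq` module, which CPython backs with a C implementation. `BinaryHeap` and `time_heaps` are hypothetical names; running the same script under CPython and under PyPy is the comparison being described.

```python
import heapq
import random
import time


class BinaryHeap:
    """Minimal pure-Python binary min-heap (illustrative only)."""

    def __init__(self):
        self._items = []

    def push(self, value):
        items = self._items
        items.append(value)
        # Sift the new item up while its parent is larger.
        i = len(items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if items[parent] <= items[i]:
                break
            items[parent], items[i] = items[i], items[parent]
            i = parent

    def pop(self):
        items = self._items
        last = items.pop()  # raises IndexError on an empty heap
        if not items:
            return last
        top = items[0]
        items[0] = last
        # Sift the moved item down while a child is smaller.
        i, n = 0, len(items)
        while True:
            left = 2 * i + 1
            if left >= n:
                break
            child = left
            right = left + 1
            if right < n and items[right] < items[left]:
                child = right
            if items[i] <= items[child]:
                break
            items[i], items[child] = items[child], items[i]
            i = child
        return top

    def __len__(self):
        return len(self._items)


def time_heaps(n=100_000, seed=0):
    """Push n shuffled ints, pop them all; return (pure_seconds, heapq_seconds)."""
    data = list(range(n))
    random.Random(seed).shuffle(data)

    start = time.perf_counter()
    h = BinaryHeap()
    for v in data:
        h.push(v)
    out = [h.pop() for _ in range(n)]
    pure = time.perf_counter() - start
    assert out == sorted(data)

    start = time.perf_counter()
    q = []
    for v in data:
        heapq.heappush(q, v)
    out = [heapq.heappop(q) for _ in range(n)]
    lib = time.perf_counter() - start
    assert out == sorted(data)
    return pure, lib


if __name__ == "__main__":
    pure, lib = time_heaps()
    print(f"pure Python heap: {pure:.3f}s, heapq: {lib:.3f}s")
```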

But PyPy is proof that Python can be fast. It brings quote/unquote "Pure Python" within striking distance of Go, and when I run that test suite on PyPy, it's similar to the Node.js score. Why does this matter? Because I want to write bloody Python, not C.

And it is so tantalisingly close - look at a blog post like: https://dnshane.wordpress.com/2017/02/14/benchmarking-python... A Fibonacci heap that someone wrote in quote/unquote "Pure Python" can never compete with heapq (the C-based builtin lib) when run in CPython, but on PyPy it can. Fast code written in Python. So what is holding PyPy back? I think possibly money and the number of devs working on it. JavaScript had Mozilla, Google, Microsoft and Apple in a browser war, plus loads of open source input.

But is the biggest stumbling block not Guido himself and the core Python devs? Do they just philosophically not agree with PyPy or is it just disinterest?

Well, whatever it is, it is heartbreaking to want to write fast code in my favourite language, leveraging all its power including Numpy/Scipy etc., and not be able to. And yes, my use case is perhaps quite unusual: a very CPU-intensive service that ideally computes and returns a real-time calculation (involving 500k function calls) in 10-50 ms.
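That latency target implies a per-call budget: 10 ms / 500,000 calls = 20 ns per call, and 50 ms / 500,000 calls = 100 ns per call. A rough sketch for checking an interpreter against that budget, assuming a trivial hypothetical `work()` function stands in for one step of the real calculation:

```python
import timeit

CALLS = 500_000             # function calls per request, from the comment above
BUDGET_S = (0.010, 0.050)   # 10-50 ms target latency, in seconds


def work(x):
    # Hypothetical stand-in for one step of the real calculation.
    return x * 2 + 1


def per_call_seconds(number=CALLS):
    """Average wall-clock cost of one call to work(), measured with timeit."""
    total = timeit.timeit("work(3)", globals={"work": work}, number=number)
    return total / number


if __name__ == "__main__":
    for budget in BUDGET_S:
        print(f"budget {budget * 1000:.0f} ms -> {budget / CALLS * 1e9:.0f} ns/call allowed")
    cost = per_call_seconds()
    print(f"measured: {cost * 1e9:.1f} ns/call -> {cost * CALLS * 1000:.1f} ms per request")
```

The measured figure includes `timeit`'s own loop overhead, so treat it as an upper bound on the bare call cost.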

But getting fast Numpy into the PyPy mix (i.e. all the speed of the JIT with no worse Numpy) would be a HUGE step forward for my PyPy adoption. What is the latest? How can I help?

fijal 9 years ago | |

In short: funding. If we can find someone who wants fast numpy AND fast python under the same hood, we can combine the approaches of cpyext and numpypy and make it fast. The project is just too big to do in spare time. I've been trying to find funding for that for quite a while, but I haven't been able to find a sizable backer just yet.

Cheers, Maciej Fijalkowski

boultonmark 9 years ago | | |

Maciej, how much would that require, ballpark? I think there would be massive support for this. My company would support it.

Redoubts 9 years ago | | |

When I go to the numpypy page, it says I can't donate directly to numpypy efforts.

http://pypy.org/numpydonate.html

nimish 9 years ago | | |

How do I get involved?

orf 9 years ago | |

Newsflash: code like this [1] will never be fast in CPython, and if you write a lot of code like that and are sad when it's slow, then you need a different language, especially if you expect it to be as fast as a JIT-compiled language like JS on V8. Or use something like Cython.

That benchmark is pretty meaningless anyway, IMO. Here are some halfway decent, official and up-to-date benchmarks comparing Python 2 and Python 3 [2].

Python 3 is slower in some areas, notably startup time, but it's not all doom and gloom. It's faster in a lot of places. And productivity is hard to benchmark, but IMO py3 is way better in that area.

1. https://github.com/MikeMirzayanov/binary-heap-benchmark/blob...

2. https://speed.python.org/comparison/?exe=12%2BL%2B3.6%2C12%2...

fijal 9 years ago | | |

What's wrong with that code? It runs acceptably fast under PyPy.

boultonmark 9 years ago | | |

Ah, the 'smart alec' has appeared. I'm not stupid; I know code like that won't be fast in CPython, but PyPy shows that it can be a hell of a lot faster than CPython and right up there with Node.js, and the travesty is that CPython is so far off the pace and getting slower.

mattbillenstein 9 years ago | |

I think the core devs are doing a fine job with CPython as the reference implementation, and developing PyPy takes a different set of expertise: JITs and compilers specifically.

I think sponsorship of PyPy would be welcome -- but it seems non-obvious where that would come from.

Javascript has the fortune of being the language that drives a very important platform -- and Chrome has been a particularly strategic investment for Google to have more control over the web than it ever has before. Java has Android... Python unfortunately doesn't have that sort of standing in any area that I'm aware of.

pjmlp 9 years ago | | |

Java has much more than Android.

Besides the enterprise space, there are lots of embedded devices running it.

mastazi 9 years ago | |

- Statement: "Python 3 is getting slower" (tense: present continuous).

- Proof: a benchmark based on Python 3.3 (Python 3.3 was released in 2012).

red2awn 9 years ago | |

> Do they just philosophically not agree with PyPy or is it just disinterest?

I think Guido wants the reference implementation to have a simple and straightforward codebase, and a JIT is anything but that.

boultonmark 9 years ago | | |

That would be fine, and my mind goes to similar examples like Lua, where the reference implementation and the JIT coexist. But PyPy has not had the impact on the Python community that LuaJIT has had on the Lua community.

petters 9 years ago | |

I agree writing C extensions is not a solution. It is evidently too hard. Even the standard pickle library in Python 3 has a memory corruption bug. http://bugs.python.org/issue23655

smortaz 9 years ago | |

We [msft python team] tried to get PyPy some funding, but it didn't go very far. I'll keep trying. We've also started this project to enable jitting for CPython:

https://github.com/Microsoft/Pyjion

The core of it is there and it's been accepted into the main trunk, but we haven't had the cycles to do quality codegen yet.

stuaxo 9 years ago | | |

Pyjion looks good; once it is easy to build on Linux, it will probably attract some users and possibly even some extra developers.

flavio81 9 years ago | |

I understand that Numpy is fast in CPython because it relies on high speed code being done in C.

What you could do is much simpler:

Split your Python application in two parts:

1. Keep your functions that make heavy use of Numpy and Scipy under CPython; expose your algorithms/functions as a web service/REST service/etc running under CPython.

2. The rest of the application, which of course needs to call the functions in (1), can run under PyPy and call the web service in (1). This is where you would put the general-purpose stuff like web, graphics, database access, and of course all symbolic manipulations that do not require Numpy/Scipy.

I think this is a simple, workable solution.
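A minimal sketch of that split, assuming XML-RPC from the standard library as the transport (the comment only says web/REST service; HTTP + JSON would work the same way). `mean_of` is a hypothetical endpoint standing in for the NumPy/SciPy code, and both halves run in one process here only so the example is self-contained; in practice the server would be a separate CPython process.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def mean_of(values):
    # Hypothetical numeric endpoint; on the real CPython side this is
    # where NumPy/SciPy would be called.
    return sum(values) / len(values)


def start_numeric_service(host="127.0.0.1", port=0):
    """CPython side: expose numeric functions over XML-RPC (port 0 = any free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(mean_of, "mean_of")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_numeric_service(url, values):
    """PyPy side: plain-Python client code that delegates the numeric work."""
    with ServerProxy(url) as proxy:
        return proxy.mean_of(values)


if __name__ == "__main__":
    server = start_numeric_service()
    host, port = server.server_address
    print(call_numeric_service(f"http://{host}:{port}/", [1.0, 2.0, 3.0]))
    server.shutdown()
    server.server_close()
```

Each call pays serialization and a network round trip, so this pays off only when the numeric work per call is substantial.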

gshulegaard 9 years ago | |

> And I know some smart Alec will trot out the usual 'downshift into C' line that everyone (including Guido) use as the final goto solution for performance but that is simply a disgrace in 2017.

Easy gluing of other languages together has long been something I considered a strength...but I suppose to each their own.

> Why can I not choose to write Python and it be fast??

Well, there are lots of reasons...including implementation issues, and I don't know them all...but I think Python has a very clear productivity niche. Personally, I am OK with Python trading performance for productivity. For the most part, whenever Python has been a bottleneck for me, rewriting a very small part of the logic for performance has solved my use case.

> And yet Python 3 is getting slower. Don't agree?

Yeah, I don't agree...that benchmark uses Python 3.3. Python 3 performance relative to Python 2 started turning the corner around 3.4. Perhaps a talk from this year's PyCon would help illustrate:

https://www.youtube.com/watch?v=d65dCD3VH9Q

> But PyPy is proof that Python can be fast.

Indeed, I would say that Cython is even more proof that there are frontiers of performance that could be explored. But with PyPy (as with Cython) there are sacrifices you have to make.

Personally, I think the most promising performance improvement that is tantalizingly close for me is Larry Hastings's Gilectomy project:

> https://www.youtube.com/watch?v=pLqv11ScGsQ

But at the same time, I am not sure that Python ever needs to be fast running in CPython. With `WASM` perhaps it is better to just compile Python.

I don't know, performance in Python has always been a mixed bag...but personally I think it doesn't get much focus because it doesn't really serve Python's target niche. I don't know if there ever will (or should) be one language to do everything...and as it is, Python is a good "productivity"-focused language to have in your toolbox, so to speak.

takeda 9 years ago | |

If your code looks like this: https://github.com/MikeMirzayanov/binary-heap-benchmark/blob...

Then you probably should not use Python. Python is more of a glue language, and you should strive to make your program read like business logic. In the real world, to solve this problem you would write code such as this:

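    import time

    if __name__ == "__main__":
        # time.clock() was removed in Python 3.8; perf_counter() is the
        # recommended high-resolution timer.
        start = time.perf_counter()
        N = 10000000
        h = list(range(N))
        h.sort()  # input is already sorted, so Timsort does one linear pass
        for i, v in enumerate(h):
            assert i == v
        # elapsed time in milliseconds
        print("Done in %f" % ((time.perf_counter() - start) * 1000))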

    $ python3.6 heap.py
    Done in 2389.877000

Or, if a heap really needs to be used:

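    import time
    from heapq import heapify, heappop

    if __name__ == "__main__":
        # time.clock() was removed in Python 3.8; use perf_counter() instead
        start = time.perf_counter()
        N = 10000000
        h = list(range(N))
        heapify(h)  # in-place O(N) heap construction
        for i in range(len(h)):
            assert i == heappop(h)  # items come out in ascending order
        # elapsed time in milliseconds
        print("Done in %f" % ((time.perf_counter() - start) * 1000))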

    $ python3.6 heap.py
    Done in 10716.348000

Micro benchmarks are silly because you'll never do those things in real code.

brachi 9 years ago | |

You can read this great article, just released, about the 2017 Python Language Summit: "Keeping Python Competitive" [1]. There you can read the opinions of many core developers. PyPy is also discussed.

[1] https://lwn.net/Articles/723949/

nimish 9 years ago | |

I desperately want numpy, pandas and scipy on pypy3

Then I can ditch CPython and its Stockholm-syndrome slowness forever.

neutronicus 9 years ago | |

Unlike you, Python is _not_ my favorite language, but the matplotlib lock-in is real.

Hopefully a matplotlib-equivalent will materialize on Clojure (where Linear Algebra is plenty fast and the language itself is fast-enough out of the box) so I can be done with Python forever.

pjmlp 9 years ago | |

I get the feeling that CPython developers don't care that much about PyPy; I might be wrong, though.

Hence I am looking forward to Julia making inroads into Python's domain, as that might be the butterfly effect that triggers PyPy adoption.

mkj 9 years ago | |

I can't answer how NumPyPy is going, but Numba works pretty well for me for writing fast numeric code in Python. The language subset it supports is a bit restricted and installing LLVM is a bit of a hassle, but overall it's great.

ant6n 9 years ago | | |

That's still CPython, and the reason you're not noticing it's slow is probably that most of your time is spent computing inside compiled library functions.

fnbr 9 years ago | |

What are you currently using to solve that problem? I've run into that problem too, and had to use C extensions to make my code faster, which isn't ideal.

cwyers 9 years ago | |

Even Javascript is fast? Microsoft, Apple, Google and Mozilla have all invested heavily in making that happen.

pjmlp 9 years ago |

Awesome work, congratulations on bringing Python forward.

Still wishing one day PyPy might become the canonical implementation.

mattbillenstein 9 years ago |

PyPy is great -- while I still use CPython for our more complex webapp and associated tools that depend heavily on C extensions, I increasingly use PyPy for the more mundane CPU/data heavy lifting I do. It's typical to get 2X the performance (comparable to some compiled languages) and still use much of our utility code, configs, etc.

rubber_duck 9 years ago | |

>comparable to some compiled languages

Given that Python programs usually run an order of magnitude slower than compiled languages, even a 2x performance increase doesn't put it in the "comparable" range in my experience. Not bashing Python (I use it regularly), but for computational stuff it's a hog unless you're just passing work to C libs. For example, I have a resource build pipeline that does some Blender 3D model transformations. The code is written in Python and takes forever; equivalent code in C++ would take roughly 1/100 of the time and performance would be irrelevant. At the moment we're seriously considering rewriting parts in C++ to reduce build times.

lqdc13 9 years ago | | |

Blender's Python lib is not very optimized by default. It has nothing to do with Python as a language.

Use NumPy for matrices. If you have to implement an algo with a hot inner loop, use Cython or Numba.

I've never seen a 100x difference in a Python-to-C++ rewrite when the Python was already optimized.

Here is a good article about some of the options: https://rare-technologies.com/word2vec-in-python-part-two-op...

Johnny_Brahms 9 years ago | | |

As long as you use Python routines that are written in C, you will not really gain much by using PyPy. Try it yourself: write some IO- and string-heavy code and compare. [1]

But as you say: for numeric computations python is slow as molasses.

[1] or just look at something like https://github.com/juditacs/wordcount/blob/master/README.md . The simple py2 version is 2.5 times slower than a java version someone spent a lot of time writing, and less than 2 times slower than a reasonably straightforward C program.
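A minimal word-frequency counter of the IO- and string-heavy kind suggested above, for trying under both interpreters. The exact task spec of the linked wordcount repo is not reproduced here; counting whitespace-separated words and sorting by descending frequency (ties alphabetical) is an assumption.

```python
from collections import Counter


def word_counts(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def format_counts(counts):
    """One 'word<TAB>count' line per word, most frequent first, ties alphabetical."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{word}\t{n}" for word, n in ordered)


if __name__ == "__main__":
    # For a real run, pass an open file or sys.stdin instead of this demo text.
    demo = ["the quick brown fox", "jumps over the lazy dog the end"]
    print(format_counts(word_counts(demo)))
```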

mattbillenstein 9 years ago | | |

It all depends on the task -- if your program is calling into optimized C-extensions anyway, converting the whole thing into a compiled language of course offers less of a speedup.

robocaptain 9 years ago |

Coming from someone who uses python but doesn't really follow alternative compilers, PyPy sounds great. What are some of the downsides, if any? Are you sacrificing library compatibility for faster core+standard libs?

bdarnell 9 years ago | |

In addition to being incompatible with (some) third-party libraries, pypy tends to use significantly more memory than cpython. It's also slower than cpython for scripts that don't run long enough to warm up the JIT, so you probably wouldn't want to use it by default. (Disclaimer: I'm basing this on experience with older versions of pypy and haven't verified it recently)

lanstin 9 years ago | | |

The memory thing is still an issue. I had to go through a lot of tuning of the max GC size to keep it running for long periods. Too low and it is slow; too high and it kills the box.
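One knob for that kind of tuning is PyPy's `PYPY_GC_MAX` environment variable, which caps the GC heap size (CPython ignores it). A sketch that sets it for a child PyPy process; the interpreter name `pypy`, the script path and the `2GB` value are placeholders, not recommendations.

```python
import os
import subprocess


def pypy_env(max_heap="2GB", base=None):
    """Return a copy of the environment with PyPy's GC heap ceiling set.

    PYPY_GC_MAX is read by PyPy's garbage collector at startup.
    "2GB" is an arbitrary example value.
    """
    env = dict(os.environ if base is None else base)
    env["PYPY_GC_MAX"] = max_heap
    return env


def run_under_pypy(script, max_heap="2GB", interpreter="pypy"):
    # Hypothetical launcher: assumes a "pypy" executable on PATH.
    return subprocess.run([interpreter, script], env=pypy_env(max_heap), check=True)


if __name__ == "__main__":
    print(pypy_env()["PYPY_GC_MAX"])
```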

gdulli 9 years ago | |

For workloads heavy on JSON operations, PyPy has been slower than CPython for me in the past, because it didn't benefit from the C implementation of simplejson.
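A quick way to check that on your own data is to time a `dumps`/`loads` round trip under each interpreter. This sketch uses the standard-library `json` module rather than simplejson, and the payload shape is hypothetical; substitute a real message from your workload.

```python
import json
import timeit


def sample_payload(n=1000):
    # Hypothetical record shape.
    return [{"id": i, "name": f"user{i}", "tags": ["a", "b"], "score": i * 0.5}
            for i in range(n)]


def json_roundtrip_seconds(payload, repeat=5, number=20):
    """Best of `repeat` timings, each covering `number` dumps+loads round trips."""
    assert json.loads(json.dumps(payload)) == payload
    timer = timeit.Timer(lambda: json.loads(json.dumps(payload)))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print(f"{json_roundtrip_seconds(sample_payload()):.4f}s")
```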

lanstin 9 years ago | |

In addition to the missing libraries (which have gotten a lot better in the last few years), it crashes a bit more.

I have a syslog proxy with one huge incoming stream of around 50k msgs/second. CPython could not keep up on CPU, but PyPy runs fine, apart from crashing on some low-level JIT assertion every so often. I have it set up to use PyPy on the high-volume instances and CPython on the low-volume instances.

ipmb 9 years ago | |

JIT warm-up time can be problematic. Often (usually?), the code starts slower than CPython, but improves as the code paths are executed.
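Warm-up is easy to observe by timing each call of the same function separately instead of averaging. A sketch with a hypothetical CPU-bound `hot_function`; under PyPy the first run(s) may include tracing and compilation time, while under CPython the runs should look roughly flat.

```python
import time


def hot_function(n):
    # Hypothetical CPU-bound loop for the JIT to trace and compile.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def per_run_timings(runs=20, n=200_000):
    """Time each call separately so slow early (warm-up) runs stay visible."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        hot_function(n)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    for i, t in enumerate(per_run_timings()):
        print(f"run {i:2d}: {t * 1000:.2f} ms")
```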

ant6n 9 years ago | |

Numpy support is iffy. How is the Python 3 support these days?

mattbillenstein 9 years ago | |

Your favorite C-extension may not support it.

make3 9 years ago |

Awesome news, congrats to the team :) On an unrelated note, I wish Google gave them money to make it work with Tensorflow.

dr_zoidberg 9 years ago |

Why are they still comparing to Python 2.7.2? I couldn't find benchmarks against Python 3.5 for their Py3 interpreter.

Every time I tried PyPy I ran into a hurdle where one of the libraries I need didn't work (or underperformed) in PyPy, the most important ones being Numpy and OpenCV.

So in the end I just gave up on it and stuck with Python 2/3 and Cython, which solved my speed problems without my having to do all the work of C extensions from the ground up.

Edit: the one benchmark I found covering PyPy3 is this: https://pybenchmarks.org/u64q/benchmark.php?test=all&lang=py...

It shows PyPy3 5.7.1 ranging from about 8x faster to 100x slower than CPython 3.6.1.

For comparison, PyPy2 5.7.1 ranges from 10x faster to a bit over 30x slower than CPython 2.7.13.

BlackFingolfin 9 years ago | |

The benchmark showing a factor of 100x compares a pure Python implementation of "pidigits" running in PyPy3 against one that uses GMP (via gmpy2) in Python. I am actually impressed it's only a factor of 100. And with the progress made by cffi, I'm hopeful that a GMP-using PyPy version could be written that matches the speed of the code using gmpy2.

The next benchmark "only" runs 6x slower in PyPy; still bad, but that paints quite a different picture.
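For context on what a pure-Python pidigits looks like: as far as I know the task is built on a spigot algorithm, and a pure-Python version leans entirely on the interpreter's built-in big integers, whereas the gmpy2 version hands that arithmetic to GMP. A minimal sketch of Gibbons' unbounded spigot (not the benchmark's actual submission):

```python
def pi_digits(n):
    """Return the first n decimal digits of pi using Gibbons' unbounded spigot."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            # The next digit m is now certain: emit it and rescale the state.
            digits.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            # Not enough precision yet: fold in the next term of the series.
            q, r, t, k, m, x = (
                q * k,
                (2 * q + r) * x,
                t * x,
                k + 1,
                (q * (7 * k + 2) + r * x) // (t * x),
                x + 2,
            )
    return digits


if __name__ == "__main__":
    print("".join(map(str, pi_digits(50))))
```

Every step is big-integer multiplication and division, which is exactly the work GMP accelerates.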

dr_zoidberg 9 years ago | | |

If it's running pidigits inside PyPy, then it shows how much slower the C interface (not PyPy's cffi, but its CPython-compatible interface) is compared to CPython's.

For the record, I'm not claiming PyPy3 to be 100x slower, these are benchmarks and they're hairy beasts that we have to shave and try to see what they tell us about performance.

My point is that PyPy3 is still behind in language features (in CPython we have a lot of nice things from 3.6, with 3.7 coming soon, while PyPy still lags behind without the complete 3.5 feature set), and they haven't optimized it as much as their 2.7 branch. But the PyPy people are always showing those "7x faster than CPython" claims (which come from an average of benchmarks that seem to have been cherry-picked to avoid the ones in which they're actually slower than CPython).

On the other hand, with Cython moving over to Py3 was never an issue (it actually helped in some cases), and it always helped to deliver better performance, and in just the right spot where it's needed. True, you have to know about profiling and identifying where to use Cython, but at my workplace it's been a far better tool to solve our performance needs.
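The profiling step mentioned above (finding where the time goes before reaching for Cython) can be done with the standard-library `cProfile` and `pstats` modules. A sketch with hypothetical `slow_part` and `fast_part` functions:

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical hot spot: a pure-Python inner loop.
    return sum(i * i for i in range(n))


def fast_part(n):
    return n * 2


def pipeline(n=200_000):
    return slow_part(n) + fast_part(n)


def profile_report(func, *args, limit=5):
    """Run func under cProfile; return the top `limit` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(pipeline))
```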

Tobu 9 years ago |

Any word on the "single codebase" aspect of supporting both major Python versions? I remember suggesting it years ago at a time when the team wanted to do Mercurial backporting instead. What changed their mind?

That looks like it could fix the lag on CPython releases, so it's a big feature.

oblio 9 years ago |

I guess the next release is the one that should support Python 3. At least as a non-beta feature.

wyldfire 9 years ago | |

FYI at least for everything I've thrown at it, PyPy 3.x works really well.

ipunchghosts 9 years ago |

Here is the dumbest question in the world: our application has a GUI built with PyQt; can we use PyPy? Aside from PyQt, it's completely vanilla Python.

flavio81 9 years ago | |

Most code that is vanilla Python will work just fine with PyPy. However, in real life your code most likely also uses (imports) other Python libraries; you just need to make sure they work OK with PyPy.

Many, many, many libraries already work fine with PyPy, so give it a shot.
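A quick first check is to run your application's imports under PyPy and see which fail. A sketch; the dependency list is hypothetical, and a successful import does not prove a library behaves correctly (or quickly) under PyPy.

```python
import importlib
import platform


def check_imports(module_names):
    """Try importing each module; map name -> None on success, else an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or any error raised during module init
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical dependency list; replace with your application's imports.
    deps = ["json", "sqlite3", "PyQt5.QtWidgets"]
    print("interpreter:", platform.python_implementation(), platform.python_version())
    for name, error in check_imports(deps).items():
        print(f"{name:20s} {'ok' if error is None else error}")
```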