Async I/O for Python 3

234 points by amirhhz 13 years ago | 88 comments

rdtsc 13 years ago |

Sigh, another Async framework.

Yes, it has nice features, such as being able to replace the reactor/hub thing. It has futures/promises/deferreds. That has all been done before in Twisted. Yields are cute, and there was monocle, which I wouldn't say exactly took off: https://github.com/saucelabs/monocle

Twisted has inlineCallbacks that use yields as well. Just import Twisted into stdlib then and use that.

I am surprised that gevent was dismissed. OK, there is also eventlet, if someone doesn't like gevent. Monkey patching is scary? Is it really _that_ scary? Most sane and portable IO code probably runs on that today. Why? Because there is no need to create a parallel world of libraries. Write a test: does it pass? Does it handle your use case? I'll take not knowing when my green threads switch tasks, and adding another green thread lock, over doubling my code size with yields, callbacks and futures.

Let's talk about Twisted (sorry, couldn't resist the WAT video reference). I remember searching for years for parallel libraries to parse exotic protocols. A regular Python library is there, but no, can't use that, sorry. Gotta go find or write one that returns Deferreds. You add a single Twisted module to your code and, good luck, it ripples all the way to the top through your API and you are doomed to be locked into the Twisted world forever.

When gevent and eventlet came around it was like a breath of fresh air. This is what sane concurrent IO looks like in Python:

http://eventlet.net/doc/examples.html

My fear is that many will just say fuck it, I'll just use Go/Rust/Erlang for IO bound concurrent problems.

It is nice having a benevolent dictator, except when he goes a little crazy; then dictatorship doesn't sound like so much fun anymore.

hosay123 13 years ago | |

Sane is in the eye of the beholder. gevent looks nice, but I'd be very diffident when it comes to actually supporting it in production. It monkey patches the standard library and messes with CPython internals to achieve what it does, infinitely increasing the chance it will conflict with some other piece of code (for example, that bizarre ancient internal proprietary library you're using that started life in Fortran, etc.)

In the case of async I'm glad to see a from scratch implementation for the standard library. It's a weird area that necessitates some constructions that there is no really standard Python style for. You only need to look at Twisted and e.g. its method chaining to realize this stuff would need a thorough sanity rework before it ever became standard anyway.

Also, most other implementations take the approach of building their own little world. This is definitely true of Twisted. You write code for Twisted, not for Python. Gevent at least doesn't suffer from this.

rdw 13 years ago | | |

As someone who's contributed to Eventlet, I've always felt that the only way that it could get over this scary hurdle (and it is legitimately scary) is for it to be integrated within Python itself. Almost all of the weird problems come from fighting with the baked-in assumptions of the Python runtime. Eventlet does try to alleviate the scariness a little bit by allowing you to import "greened" modules rather than changing the global versions, but that has its own problems.

If it were integrated with Python, there would be no monkeypatching, no special magic, it would be just how things work. That said, I'm not at all surprised that Guido doesn't favor a coroutine-based solution; his opposition to general coroutines is as famous as his opposition to anonymous functions. (to clarify: I don't think the @coroutine decorator creates "real" coroutines, any more than generators were already coroutines)

Ixiaus 13 years ago | | |

I used gevent at first for a project that needed Async I/O and it worked really well, but then I switched to Erlang and I realized how poor a choice Python is for such tasks. The language really needs to be designed from the start for it (like Go, Rust, Erlang, etc. Haskell wasn't designed from the start for it, but because of its functional purity, bolting it on was "natural"; it isn't so for Python IMHO).

RyanGWU82 13 years ago | | |

My company used gevent in production at very large scale, and we were extremely happy with it. In fact, we ported our existing Django and Flask applications to run under gevent, which was a surprisingly fast process. (Weeks, not months, to port rather large codebases.) We did have to be careful with third-party libraries, like Zookeeper clients, but that was worth the tradeoff. We got the performance of an evented structure without having to rewrite a ton of code.

pekk 13 years ago | | |

Have you actually got any war stories about gevent, or is this all guesswork about how you think it stands to reason that it would be bad?

Do you think this would be relevant if greenlets were adopted as a part of Python?

pjscott 13 years ago | | |

If your bizarre ancient internal thing doesn't need to do async I/O, then don't import it monkey-patched. I use Eventlet heavily in production, and this tends to be pretty easy.

adw 13 years ago | | |

Fortran libraries; you mean SciPy? A bunch of the Python numerics world (one of the big growth cases for Python!) is Fortran. LAPACK/ARPACK are still unmatched.

pjscott 13 years ago | |

YES. Thank you for saying this. Some of the async code that I maintain uses Twisted and some of it uses Eventlet, and the difference between them is night and day. The code using Eventlet is so much cleaner, so much easier to maintain, and (oddly enough) so much less magical than the Twisted stuff. This was written by the same people, and they're all really good programmers, so the obvious confounding variables are not an issue here. Eventlet and Gevent are just so much better.

Worried about monkey-patching? Then only monkey-patch the parts you need to be asynchronous. Worried about magic that you don't understand? Have a look at the code; the magic is actually pretty straightforward after you've paid a little attention to the man behind the curtain.

If you're interested in async stuff for Python, I urge you to have a look at Eventlet or Gevent.

simeonf 13 years ago | |

>Sigh, another Async framework.

>Yes, it has nice features, such as being able to replace the reactor/hub thing. It has futures/promises/deferreds. That has all been done before in Twisted.

Yeah but - this will be in stdlib. And I think the hope is one-event-loop-to-rule-them-all will let the various frameworks play nice with each other. For instance - Glyph just mentioned to me that he doesn't use IPython any more to work on Twisted code because IPython now has a Tornado event loop which can conflict with the Twisted code he's playing with...

I don't think the hope is that this will be better than twisted or gevent in terms of implementation (obviously the API will be nicer than twisted given Python 3) just that it will be the standard by virtue of being in stdlib.

pekk 13 years ago | | |

Python 3 won't make a Twisted-like API any better than Twisted's API is.

stuaxo 13 years ago | |

Eh, the whole point is that this is at a lower level than all of those.

All the above frameworks will be ported to this and become interoperable - in a similar way to how wsgi works for the web.

ekimekim 13 years ago | | |

It won't be useful if the standard interface is, basically, Twisted.

Stack-based solutions like gevent, eventlet get screwed over by this.

twism 13 years ago | |

Tornado has a well-documented (and, IMHO, less complicated than gevent or Twisted) but not well-known or widely used "inline callback" mechanism that uses yields to simplify async code.

http://www.tornadoweb.org/en/branch2.3/gen.html

From there, I made a wrapper to wrap potentially blocking logic in threads so I could have code that isn't async/non-blocking behave like it was.

https://gist.github.com/anonymous/5190528
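[Editor's sketch] The gist itself is not reproduced here, so the following is not twism's code. It is only a minimal illustration of the general idea of pushing blocking calls onto threads and getting a future back, using the standard library's `concurrent.futures`. The names `blocking_lookup` and `run_in_thread` are hypothetical, and the sleep stands in for whatever blocking call you actually have.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a blocking call, e.g. a client library
# that has no async support of its own.
def blocking_lookup(key):
    time.sleep(0.05)          # simulate blocking I/O
    return key.upper()

# A small shared pool; the size is an arbitrary choice for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)

def run_in_thread(func, *args, **kwargs):
    """Submit func to the pool and return a Future immediately."""
    return _pool.submit(func, *args, **kwargs)

# The three lookups overlap instead of running back to back.
futures = [run_in_thread(blocking_lookup, k) for k in ("a", "b", "c")]
results = [f.result() for f in futures]
_pool.shutdown()
```

An event-loop framework would normally attach a callback to each future (or yield it) rather than call `result()`, which blocks the caller; the blocking call is used here only to keep the sketch short.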

njharman 13 years ago | |

Did you actually read the slides? Most of your comments and complaints are addressed and sound like nonsensical FUD in context.

I wasn't a fan of stdlib async until reading these slides.

pekk 13 years ago | | |

This isn't FUD. The API itself (as mentioned in the slides, and the draft PEP) looks like Twisted.

If you are a fan of this API, try Twisted

eurleif 13 years ago | |

One thing I love about gevent is that you can share code between async and non-async. Most of my project benefits from async IO, but there's one part that needs to use a lot of CPU within a single process. So that part uses multithreaded Jython, the rest uses gevent, the common code is shared, and it all just works.

rdtsc 13 years ago | | |

Exactly. I was surprised how, in the whole "ideas" mailing list discussion Guido had and in other forums, that is dismissed as "meh" or not even mentioned.

Discussions quickly turn theoretical and academic. "But you don't know when your green threads will switch, man, so I'll add yields in there for you". Yes, and then also make sure there is a complete universe of libraries.

Python is awesome not just because it is fun to write little hello world examples in (so is Logo), it is awesome because it is easy to GetShitDone(TM) quickly. The big part of GetShitDone(TM) quickly is reusing libraries not rewriting everything from scratch.

Using an exotic database for some reason -- great. Found a Python library to interact with it -- great. Oh but my framework is based on Deferreds and this one was written without Deferreds or this one returns Futures. Sorry, go write your own from scratch.

This has been the story of my life for 5+ years: searching for or rewriting Twisted versions of already existing libraries.

Now at least just adopt Twisted and go with it if they are going this route. But now, they are 'standardizing' on something new. I think had they done this in 2007, yeah rock on, that would have made sense. They didn't. What saved and kept Python on the back-end during the past 5 or so years was greenlet (eventlet and gevent). Guido is kicking all those people in the nuts and saying, "no", we'll do Twisted now (with some changes).

simonw 13 years ago | |

If you're using monkey patching in the standard library for a language, something has gone very wrong.

jessaustin 13 years ago | | |

This is really a matter of taste. You should at least be aware that you're monkeypatching, and code and test accordingly. Many people have good results from monkeypatching, and even more have good results from calling well-written-and-tested libraries that monkeypatch.

fzzzy 13 years ago | | |

But it's just the tiniest bit of monkey patching! A wafer thin monkey patch!

codygman 13 years ago | |

Unicode support, concurrency/parallelism, and monkey patching were very big reasons for me using Go rather than python.

aba_sababa 13 years ago | |

The only reason I'm not doing that is because I'm anchored by Python's stats libraries.

visitorx 13 years ago | |

Seconded. I doubt many a sane person would use Python for IO bound concurrency; they should definitely be looking at Rust (slightly in the future) and Erlang (currently the de facto standard for ease-of-use concurrency).

CoffeeDregs 13 years ago |

Perhaps the video makes the rationale clearer. E.g.

  Possible solution: "Standardizing gevent solves all its problems". 
  One of the responses: "I like to write clean code from scratch".
  Another: "I really like clean interfaces".

So I'd prefer that the BDFL work with the gevent folks to get it cleaned up and integrated while adjusting it to expose a "clean interface".

Perhaps the whole thing will make more sense once Guido provides more detail, but I'm underwhelmed and confused.

ekimekim 13 years ago | |

I find that unlikely: Guido doesn't like gevent.

Though yes, it's a nice thought.

fzzzy 13 years ago | | |

Guido has been resisting the stackless stack slicing assembly technique since I first learned about Python and Stackless Python in 1999. That's obviously never going to change.

Locke1689 13 years ago |

As Guido mentions, @coroutine/yield from is very similar to C#'s async implementation (with some differences like type safety).

Since Guido has the barest of descriptions on how this works, you may find the C# async description useful. [1]

[1] http://msdn.microsoft.com/en-us/library/vstudio/hh191443.asp...

johnsoft 13 years ago | |

Just to check if I'm understanding the presentation right, will the implementation involve compiler magic to turn this:

    @coroutine
    def getresp():
        s = socket()
        yield from loop.sock_connect(s, host, port)
        yield from loop.sock_sendall(s, b'xyzzy')
        data = yield from loop.sock_recv(s, 100)
        # ...

into this, similar to how C# does it? (let's pretend multi-line lambdas exist for a minute)

    def getresp():
        s = socket()
        loop.sock_connect(s, host, port).add_done_callback(lambda:
            loop.sock_sendall(s, b'xyzzy').add_done_callback(lambda:
                data = loop.sock_recv(s, 100).add_done_callback(lambda:
                    # ...
                )
            )
        )

Or will the `yield from`s bubble up all the way to the event loop and avoid the need for that?
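[Editor's sketch] For readers who want to see the callback-chained shape without pretending multi-line lambdas exist, here is a minimal version using named nested functions and `concurrent.futures.Future`. `fake_op` is a hypothetical stand-in for the `loop.sock_*` calls and returns an already-resolved future, so no real I/O happens. Note that a done-callback receives the future as its argument.

```python
from concurrent.futures import Future

log = []

# Hypothetical stand-in for loop.sock_connect / sock_sendall / sock_recv:
# records the call and returns a Future that is already resolved.
def fake_op(name, result=None):
    log.append(name)
    fut = Future()
    fut.set_result(result)
    return fut

def getresp(on_data):
    # Each step schedules the next one from inside its callback.
    def after_connect(_fut):
        fake_op("sendall").add_done_callback(after_send)

    def after_send(_fut):
        fake_op("recv", b"reply").add_done_callback(after_recv)

    def after_recv(fut):
        on_data(fut.result())

    fake_op("connect").add_done_callback(after_connect)

received = []
getresp(received.append)
```

Because the futures are already resolved, `add_done_callback` runs each callback immediately, so the whole chain completes inside the `getresp` call. With real I/O the callbacks would fire later, from the event loop.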

rdtsc 13 years ago | | |

No magic there.

It is Eventlet and Gevent that have that magic. Here is how that looks:

    def getresp():
        s = socket()
        s.connect((host, port))
        s.sendall(b'xyzzy')
        data = s.recv(100)

Compare that to any of the above. This is what is thrown away in favor of the 'yield from' and @coroutine mess, coupled with a completely parallel set of IO libraries.

ufo 13 years ago | | |

I don't understand your question. From the implementation perspective Python doesn't rewrite things to continuation-passing-style but the end result should be the same.

masklinn 13 years ago | | |

Python does not do AST-rewriting at compilation, `yield` and `yield from` will handle stack reification for freezing and thawing of coroutines.

So chains of `yield`s and `yield from`s will bubble to the event loop.

fzzzy 13 years ago | | |

The yield from has to be explicitly bubbled all the way up the call chain to the main loop.
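[Editor's sketch] The bubbling described above can be seen with plain generators and a toy driver. This is not the proposed stdlib event loop: `run` is a hypothetical loop that answers every request immediately, and the request tuples are an invented protocol for this example only.

```python
def sock_recv(tag):
    # Innermost "I/O" generator: yield a request the loop understands,
    # then receive the result the loop sends back in.
    data = yield ("recv", tag)
    return data

def read_header():
    # Middle layer: it must itself use `yield from`, otherwise the inner
    # yield never reaches whoever drives the outermost generator.
    header = yield from sock_recv("header")
    return header.upper()

def getresp():
    header = yield from read_header()
    body = yield from sock_recv("body")
    return header + ":" + body

def run(coro):
    """Toy event loop: drive one generator, answering each request at once."""
    seen = []
    reply = None
    while True:
        try:
            request = coro.send(reply)
        except StopIteration as stop:
            return stop.value, seen
        seen.append(request)
        reply = "data-for-" + request[1]   # pretend the I/O completed

result, requests = run(getresp())
```

Only `run` ever calls `send`; the requests yielded two levels down in `sock_recv` arrive there because every intermediate caller used `yield from`.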

pekk 13 years ago |

Still puzzled why the primary inspiration for this is Twisted, which has some really ugly APIs.

timc3 13 years ago | |

Some people stick with what they know. Twisted is horrible to work with though

zubinmehta 13 years ago | | |

that rhymed!

ceronman 13 years ago | |

Twisted with inlineCallbacks is actually quite nice IMO. It's very similar to C# async/await.

jlgreco 13 years ago | | |

Twisted's inlineCallbacks singlehandedly turn Twisted in my mind from an abomination into something that is a joy to work with. In lieu of a more hands-off Erlang/Go approach, I am convinced that style is the only way to go.

lucian1900 13 years ago | |

Twisted has excellent APIs. Its Transport and Protocol abstractions in particular are extremely handy.

ekimekim 13 years ago |

A minor gripe (separate from all my other gripes, which other people have already talked about):

"...run code in another thread - sometimes there is no alternative - eg. getaddrinfo(), database connections"

Just thought I'd mention that async-supporting DNS libs do exist (eg. gevent ships with C-ares), and in particular I've used async postgres database connections in both C and gevent. The code to gevent-ise psycopg2 connections is about 10 or 15 lines, iirc.

masklinn 13 years ago | |

> The code to gevent-ise psycopg2 connections is about 10 or 15 lines, iirc.

Because psycopg2 has supported async OOTB since 2.2 by exposing a pollable socket: http://initd.org/psycopg/docs/advanced.html#asynchronous-sup...

There are limitations though, as noted by the docs: COPY and LOs don't work.

judah 13 years ago |

>> "@coroutine / yield-from are very close to async / await in C# 5"

Cool to see languages learning from one another.

VeejayRampay 13 years ago |

Reading that presentation, it seems that Python has way too many Asynchronous I/O libraries/frameworks on its hands (not to be inflammatory though, I see it as a chance).

I really wonder why that is not the case in Ruby. I mean there are some, but they're mostly confidential and there doesn't seem to be much interest around them. Especially not to the point that the project leader would take a stab at it.

Good on Python anyway, competition is good.

NDizzle 13 years ago |

This powerpoint viewer would be much nicer to use if I could hit spacebar to skip down a page.

kzahel 13 years ago |

Does anyone have a link to the video? Or for various PyCon 2013 videos in general?

wting 13 years ago | |

AFAIK, all PyCon 2013 talks were recorded and are being processed. Once they're finished the videos will be uploaded here:

http://pyvideo.org/category/33/pycon-us-2013

opminion 13 years ago |

Wait, Guido is proposing implementing INTERCAL'S COME FROM? (as yield from)?

csears 13 years ago | |

For anyone curious, INTERCAL was originally a joke language which included a COMEFROM instruction that acted like GOTO in reverse: http://en.wikipedia.org/wiki/COMEFROM

Python's "yield from" hands off execution to a sub-generator: http://www.python.org/dev/peps/pep-0380/
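[Editor's sketch] A minimal illustration of the delegation PEP 380 describes, using only names invented for this example (`inner`, `outer`): the outer generator hands iteration to the sub-generator, and the sub-generator's `return` value becomes the value of the `yield from` expression.

```python
def inner():
    yield 1
    yield 2
    return "inner done"      # becomes the value of `yield from inner()`

def outer():
    result = yield from inner()   # values 1 and 2 pass straight through
    yield result

values = list(outer())
```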

opminion 13 years ago | |

This was not meant to be a silly joke, but a serious statement. The use of capitals is warranted by the language's syntax.

nixarn 13 years ago |

I think it's a great idea. I haven't tried Twisted, and having to install some 3rd party component to get it working doesn't sound tempting; being supported by default, however, does.

chris_mahan 13 years ago |

dropbox.com is blocked at work. Anyone have an alternate link? I can see it on my cell phone, but generally pptx don't display well on 4 in screens.

timc3 13 years ago |

If Guido still worked at Google would this have been hosted on google docs and I would have been able to read it without Office?

pekk 13 years ago | |

I am reading it in the browser on Linux, so I'm pretty sure Office isn't required

timc3 13 years ago | | |

Doesn't work on this ipad. And I hate using a laptop in the bath.

jurre 13 years ago | |

I'm reading it in the browser in Chrome on OSX, you don't need office!

benatkin 13 years ago | |

It's the dark side of dogfooding.

hosay123 13 years ago | |

Did you even click the link? Renders in the browser here.

benatkin 13 years ago |

Why does Guido think this is general purpose enough to add to Python but that the scientific features to make it competitive with R aren't? Is he envious of node.js?

cdavid 13 years ago | |

The scientific community is not that interested in merging into the stdlib.

Also, the main point of this is to allow different async libs to find some common ground and stop the madness of having twisted-specific, tornado-specific, etc. libraries. The scientific community does not have this problem because everybody uses numpy.

coldtea 13 years ago | |

The scientific features are only of interest to the (drum roll) scientific community.

Async interests potentially everybody, including the server guys, the backend guys AND the scientific community.

Not to mention that this is a few contained classes, whereas the scientific stuff is tons and tons of code to be included into Python, including lots of Fortran and C, that would more than triple the size of the standard library.

Lastly, node.js? Lots of languages have a good story for async, from C# and Scala, to Go and Rust...

xradionut 13 years ago | |

Guido lets Enthought, Continuum and programmers of that ilk take care of the science side of Python.

Demiurge 13 years ago | |

I think because scientific features are not fundamental tools of expression, while he is working on a language that is trying to be the foundation (most general) for the more specific libraries or tools.