Async I/O for Python 3 (dropbox.com) |
Yes, it has nice features, such as being able to replace the reactor/hub thing. It has futures/promises/deferreds. That has all been done before in Twisted. Yields are cute, and there was monocle, though I wouldn't say it exactly took off: https://github.com/saucelabs/monocle
Twisted has inlineCallbacks that use yields as well. Just import Twisted into stdlib then and use that.
I am surprised that gevent was dismissed. Ok, there is also eventlet, if someone doesn't like gevent. Monkey patching is scary? Is it really _that_ scary? Most sane and portable IO code probably runs on that today. Why? Because there is no need to create a parallel world of libraries. Write a test: does it pass? Does it handle your use case? I'll take not knowing when my green threads switch tasks and adding another green thread lock over doubling my code size with yields, callbacks and futures.
Let's talk about Twisted (sorry, couldn't resist the WAT video reference). I remember searching for years for parallel libraries to parse exotic protocols. The regular Python library is there, but no, can't use that, sorry. Gotta go find or write one that returns Deferreds. You add a single Twisted module to your code, good luck! -- it ripples all the way to the top through your API and you are doomed to be locked into the Twisted world forever.
When gevent and eventlet came around it was like a breath of fresh air. This is what sane concurrent IO looks like in Python:
http://eventlet.net/doc/examples.html
My fear is that many will just say fuck it, I'll just use Go/Rust/Erlang for IO bound concurrent problems.
It is nice having a benevolent dictator, except when he goes a little crazy; then dictatorship doesn't sound like so much fun anymore.
In the case of async I'm glad to see a from scratch implementation for the standard library. It's a weird area that necessitates some constructions that there is no really standard Python style for. You only need to look at Twisted and e.g. its method chaining to realize this stuff would need a thorough sanity rework before it ever became standard anyway.
Also, most other implementations take the approach of building their own little world. This is definitely true of Twisted. You write code for Twisted, not for Python. Gevent at least doesn't suffer from this.
If it were integrated with Python, there would be no monkeypatching, no special magic, it would be just how things work. That said, I'm not at all surprised that Guido doesn't favor a coroutine-based solution; his opposition to general coroutines is as famous as his opposition to anonymous functions. (to clarify: I don't think the @coroutine decorator creates "real" coroutines, any more than generators were already coroutines)
Do you think this would be relevant if greenlets were adopted as a part of Python?
Worried about monkey-patching? Then only monkey-patch the parts you need to be asynchronous. Worried about magic that you don't understand? Have a look at the code; the magic is actually pretty straightforward after you've paid a little attention to the man behind the curtain.
If you're interested in async stuff for Python, I urge you to have a look at Eventlet or Gevent.
>Yes it has nice features such can replace the reactor/hub thing. Has futures/promises/deferreds. That has all been done before in Twisted.
Yeah but - this will be in stdlib. And I think the hope is one-event-loop-to-rule-them-all will let the various frameworks play nice with each other. For instance - Glyph just mentioned to me that he doesn't use IPython any more to work on Twisted code because IPython now has a Tornado event loop which can conflict with the Twisted code he's playing with...
I don't think the hope is that this will be better than twisted or gevent in terms of implementation (obviously the API will be nicer than twisted given Python 3) just that it will be the standard by virtue of being in stdlib.
All the above frameworks will be ported to this and become interoperable - in a similar way to how wsgi works for the web.
Stack-based solutions like gevent, eventlet get screwed over by this.
http://www.tornadoweb.org/en/branch2.3/gen.html
From there, I made a wrapper to wrap potentially blocking logic in threads so I could have code that isn't async/non-blocking behave like it was.
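A wrapper of that kind might look roughly like the following. This is only a sketch, and it uses the standard-library `asyncio` (what Tulip later became) rather than the Tornado `gen` module the comment is about; `run_blocking` and `slow_lookup` are hypothetical names made up for illustration.

```python
import asyncio
import functools
import time


def run_blocking(func):
    """Hypothetical decorator: run a blocking function in the default thread pool."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        # Passing None selects the loop's default ThreadPoolExecutor.
        return await loop.run_in_executor(None, call)
    return wrapper


@run_blocking
def slow_lookup(key):
    # Stand-in for blocking work (hypothetical), e.g. a synchronous driver call.
    time.sleep(0.05)
    return key.upper()


async def main():
    # Both lookups block in worker threads while the event loop stays free.
    return await asyncio.gather(slow_lookup("a"), slow_lookup("b"))


results = asyncio.run(main())
```

The blocking library itself is untouched; only the call site changes, at the cost of one thread per in-flight call.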
I wasn't a fan of stdlib async until reading these slides.
If you are a fan of this API, try Twisted
Discussions quickly turn theoretical and academic. "But you don't know when your green threads will switch, man, so I'll add yields in there for you". Yes, and then also make sure there is a complete universe of libraries.
Python is awesome not just because it is fun to write little hello world examples in (so is Logo), it is awesome because it is easy to GetShitDone(TM) quickly. The big part of GetShitDone(TM) quickly is reusing libraries not rewriting everything from scratch.
Using an exotic database for some reason -- great. Found a Python library to interact with it -- great. Oh but my framework is based on Deferreds and this one was written without Deferreds or this one returns Futures. Sorry, go write your own from scratch.
This has been the story of my life for 5+ years: searching for or rewriting Twisted versions of already existing libraries.
Now at least just adopt Twisted and go with it if they are going this route. But now, they are 'standardizing' on something new. I think had they done this in 2007, yeah rock on, that would have made sense. They didn't. What saved and kept Python on the back-end during the past 5 or so years was greenlet (eventlet and gevent). Guido is kicking all those people in nuts and saying, "no", we'll do Twisted now (with some changes).
Possible solution: "Standardizing gevent solves all its problems".
One of the responses: "I like to write clean code from scratch".
Another: "I really like clean interfaces".
So I'd prefer that the BDFL work with the gevent folks to get it cleaned up and integrated while adjusting it to expose a "clean interface".
Perhaps the whole thing will make more sense once Guido provides more detail, but I'm underwhelmed and confused.
Though yes, it's a nice thought.
Since Guido has the barest of descriptions on how this works, you may find the C# async description useful. [1]
[1] http://msdn.microsoft.com/en-us/library/vstudio/hh191443.asp...
    @coroutine
    def getresp():
        s = socket()
        yield from loop.sock_connect(s, host, port)
        yield from loop.sock_sendall(s, b'xyzzy')
        data = yield from loop.sock_recv(s, 100)
        # ...
into this, similar to how C# does it? (let's pretend multi-line lambdas exist for a minute)
    def getresp():
        s = socket()
        loop.sock_connect(s, host, port).add_done_callback(lambda:
            loop.sock_sendall(s, b'xyzzy').add_done_callback(lambda:
                data = loop.sock_recv(s, 100).add_done_callback(lambda:
                    # ...
                )
            )
        )
Or will the `yield from`s bubble up all the way to the event loop and avoid the need for that?
It is Eventlet and Gevent that have that magic. Here is how that looks:
    def getresp():
        s = socket()
        s.connect((host, port))
        s.sendall(b'xyzzy')
        data = s.recv(100)
Compare that to any of the above. This is what is thrown away in favor of the 'yield from' and @coroutine mess, coupled with a completely parallel set of IO libraries.
So chains of `yield`s and `yield from`s will bubble to the event loop.
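To make the bubbling concrete, here is a toy sketch. The `run` scheduler, `sleep_once`, and the other names are hypothetical and much simpler than Tulip's real loop, but the mechanism is the same: a bare `yield` at the bottom of a `yield from` chain suspends the whole chain and hands control back to whoever called `next()`, which is the loop.

```python
from collections import deque


def sleep_once():
    # Innermost "I/O" primitive: a bare yield hands control to the loop.
    yield "waiting"


def recv(log, name):
    log.append(name + ": before I/O")
    yield from sleep_once()  # the inner yield passes straight through here
    log.append(name + ": after I/O")
    return name + "-data"


def getresp(log, name, results):
    data = yield from recv(log, name)  # no callbacks anywhere
    results[name] = data


def run(tasks):
    """Toy round-robin event loop (hypothetical, not Tulip's)."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # runs until the innermost yield in the chain
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)


log, results = [], {}
run([getresp(log, "a", results), getresp(log, "b", results)])
```

After the run, `log` shows the two tasks interleaving at the point where each one would be waiting on I/O.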
"...run code in another thread - sometimes there is no alternative - eg. getaddrinfo(), database connections"
Just thought I'd mention that async-supporting DNS libs do exist (eg. gevent ships with C-ares), and in particular I've used async postgres database connections in both C and gevent. The code to gevent-ise psycopg2 connections is about 10 or 15 lines, iirc.
Because psycopg2 has supported async OOTB since 2.2 by exposing a pollable socket: http://initd.org/psycopg/docs/advanced.html#asynchronous-sup...
There are limitations though, as noted by the docs: COPY and LOs don't work.
Cool to see languages learning from one another.
I really wonder why that is not the case in Ruby. I mean there are some, but they're mostly confidential and there doesn't seem to be much interest around them. Especially not to the point that the project leader would take a stab at it.
Good on Python anyway, competition is good.
Python's "yield from" hands off execution to a sub-generator: http://www.python.org/dev/peps/pep-0380/
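For anyone who hasn't read PEP 380, a tiny sketch of what that hand-off means (the `inner` and `outer` names are made up for illustration): values yielded by the sub-generator go straight to the caller, values sent in go straight to the sub-generator, and its `return` value becomes the value of the `yield from` expression.

```python
def inner():
    received = yield 1
    yield 2
    return "inner got " + str(received)


def outer():
    # Delegates next()/send() to inner() until it returns.
    result = yield from inner()
    yield result


gen = outer()
first = next(gen)        # produced by inner()
second = gen.send("hi")  # "hi" is delivered inside inner()
third = next(gen)        # inner() returned; outer() yields that value
```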
Also, the main point of this is to allow different async libs to find some common ground and stop the madness of having twisted-specific, tornado-specific, etc. libraries. The scientific community does not have this problem because everybody uses numpy.
Async interests potentially everybody, including the server guys, the backend guys AND the scientific community.
Not to mention that this is a few contained classes, whereas the scientific stuff is tons and tons of code to be included into Python, including lots of Fortran and C, that would more than triple the size of the standard library.
Lastly, node.js? Lots of languages have a good story for async, from C# and Scala, to Go and Rust...
Tulip separates API from event loop mechanism.
As someone mentioned, if they instead standardized on greenlet then monkeypatching talk wouldn't make sense.
But that's just a nitpick.
Right on. That's the great part -- both a simple way to program and reusability of libraries.
So far, I see library ecosystem fragmentation as the biggest issue of all and nobody seems to want to talk to it.
Academically all the yields and co-routines look so cool, in practice when you need 5 libraries to help with some task, and now you have to re-write them -- not so cool.
A big reason I use (and enjoy using) python is because it doesn't feel like a "bolted on" solution. All current concurrency options for python feel bolted on to me personally.
The python internals weren't designed for this, which is the reason they have to use monkey patching. It doesn't mean that you can't make something work, but it means the language/interpreter sure aren't going to help you make it work.
I just couldn't write something which depends on speed and concurrency in python right now, knowing there are solutions much better designed for the problem. Python holds a special place in my heart, but unicode and concurrency aren't so good right now.
However, this future async support and unicode support in python 3 very much excites me!
But how does a tangled mess of callback1 callback2 callback3 feel when all you want to do is a couple of db reads and writes while processing a simple shopping cart? Is that Pythonic?
> The python internals weren't designed for this, which is the reason they have to use monkey patching.
So fix the internals. Here is a practical way people use Python every day, make that the default, don't revert to some academic or callback mechanism.
> I just couldn't write something which depends on speed and concurrency in python right now, knowing there are solutions much better designed for the problem.
See, that is what saddens me. gevent and eventlet do let you write reasonably good and concise IO concurrent code. Some have run large sites and deployments with it. I haven't found any major slowdowns or downsides that would make me switch yet. Because it is easy and simple to experiment, I'll always try Python first, even though later I might switch to Go or Erlang.
I actually did try out a solution based on gevent and eventlet before switching to Go.
Way back in the day [1] the proposal for merging Stackless into mainline Python involved removing Python's stack state from the C stack. However there are complications with calling from C extensions back into Python that ultimately killed this approach.
After this Stackless evolved to be a much less modified fork of the Python codebase with a bit of platform specific assembly that performed "stack slicing". Basically when a coro starts, the contents of the stack pointer register are recorded, and when a coro wishes to switch, the slice of the stack from the recorded stack pointer value to the current stack pointer value is copied off onto the heap. The stack pointer is then adjusted back down to the saved value and another task can run in that same stack space, or a stack slice that was stored on the heap previously can be copied back onto the stack and the stack pointer adjusted so that the task resumes where it left off.
Then around 2005 the Stackless stack slicing assembly was ported into a CPython extension as part of py.lib. This was known as greenlet. Unfortunately all the original codespeak.net py.lib pages are 404 now, but here's a blog post from around that time that talks about it [2].
Finally the relevant parts of greenlet were extracted from py.lib into a standalone greenlet module, and eventlet, gevent, et cetera grew up around this packaging of the Stackless stack slicing code.
So you see, using the Stackless strategy in mainline python would have either required breaking a bunch of existing C extensions and placing limitations on how C extensions could call back into Python, or custom low level stack slicing assembly that has to be maintained for each processor architecture. CPython does not contain any assembly, only portable C, so using greenlet in core would mean that CPython itself would become less portable.
Generators, on the other hand, get around the issue of CPython's dependence on the C stack by unwinding both the C and Python stack on yield. The C and Python stack state is lost, but a program counter state is kept so that the next time the generator is called, execution resumes in the middle of the function instead of the beginning.
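A small illustration of that resume-in-the-middle behavior, as a sketch (the `counter` function is hypothetical; `inspect.getgeneratorstate` is from the standard library). Each `next()` continues from the last `yield` with the generator's own local variables intact, while a fresh generator starts from the top:

```python
import inspect


def counter():
    total = 1
    yield total   # the first next() stops here
    total += 10   # the second next() resumes here, total is still 1
    yield total


gen = counter()
state_before = inspect.getgeneratorstate(gen)  # not started yet
a = next(gen)
state_paused = inspect.getgeneratorstate(gen)  # parked at the first yield
b = next(gen)
fresh = next(counter())                        # a new generator starts over
```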
There are problems with this approach; the previous stack state is lost, so stack traces have less information in them; the entire call stack must be unwound back up to the main loop instead of a deeply nested call being able to switch without the callers being aware that the switch is happening; and special syntax (yield or yield from) must be explicitly used to call out a switch.
But at least generators don't require breaking changes to the CPython API or non-portable stack slicing assembly. So maybe now you can see why Guido prefers it.
Myself, I decided that the advantages of transparent stack switching and interoperability outweighed the disadvantages of relying on non-portable stack slicing assembly. However Guido just sees things in a different light, and I understand his perspective.
[1] http://www.python.org/dev/peps/pep-0219/
[2] http://agiletesting.blogspot.com/2005/07/py-lib-gems-greenlets-and-pyxml.html
> or custom low level stack slicing assembly that has to be maintained for each processor architecture.
Yeah I would personally also say that the advantages of writing the equivalent assembly for a handful of architectures outweigh re-writing / re-inventing high level library code dealing with concurrent IO.
I think Guido and a few others have decided to focus on some things rather than others, and the Python ecosystem will suffer in the long run based on these decisions. The ability to write beautiful concurrent IO code without a whole new async framework is better, even if it means breaking some C extensions, writing assembly, or not supporting exotic CPUs.
The programming world is not going to get less concurrent over time. Concurrency will spread more and more. There is already some momentum with gevent and eventlet and it is something that Node.js doesn't have -- and I see Guido turning away and making something worse.
Still, it is easier to build a fast server that can handle streams in Node than it is in Python. Async Python? I'll just stick to async JS in that case.
Except one thing, as you pointed out, people use Node.js -- 1) it is JS 2) V8 is fast.
Twisted's inlineCallbacks and Tornado's gen module get rid of all the async spaghetti code. This is hard to do with nodejs, but I still chose nodejs because the available libraries made my project quicker to develop.
"More control over state" makes me feel funny inside; if you haven't actually used Haskell then you would probably see that pretty much every language right now is immature in comparison to Haskell when it comes to "control over state".
I also think mutability makes reasoning about large concurrent and/or parallel programs much more difficult.
I have been using Twisted for 5 years full time and have also used eventlet and gevent. From talking to others, I have found few who enjoyed or loved Twisted. It was pretty much the only sane way to do concurrent, performant IO for a while. But once the green thread approach came about, I never looked back.
All was well. Then one day Node.js showed up, and it seems it has started to eat Python's lunch -- fast, scripted development on the server side, with some reasonable concurrency. And it was faster too.
Python devs looked at it and couldn't believe their eyes. And I speculate many have concluded it was because everyone was in love with a callback based async IO paradigm. So that's my guess why we are seeing this proposal.
Basically yes, Python had Twisted for years, it had Diesel, Monocle, Tornado, and some other ones. I am aware of those and as you've read my comment you saw that I used Twisted enough to know its ins and outs (5 years).
> There are also async frameworks that make writing async code the same as synchronous code.
Yes, there is inlineCallbacks, and I used it. Node.js also has async (https://github.com/caolan/async). But you don't address the main problem that I raised -- fragmentation of libraries. Python is great because it comes with batteries, and then you can find even more batteries everywhere, _except_ if you use an async framework like Twisted, which percolates all the way through your API code. Once your socket.recv() returns a Deferred(), that deferred will bubble up all the way to the user interface. So now you end up searching for or recreating a parallel set of libraries.
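The percolation is easy to reproduce. The sketch below uses `asyncio` coroutines instead of Twisted Deferreds (an assumption made so it runs on the standard library alone), and every function name in it is hypothetical. Once the lowest layer is asynchronous, each caller above it has to be asynchronous too, and a plain synchronous caller only gets back an unfinished coroutine object:

```python
import asyncio


async def recv():
    # Hypothetical stand-in for a socket read that returns an awaitable.
    await asyncio.sleep(0)
    return b"row"


async def query_db():
    # Has to be async only because recv() is.
    return (await recv()).decode()


async def load_cart():
    # ...and so does every layer above it.
    return [await query_db()]


def render_sync():
    # A plain function cannot get at the data; it receives a coroutine object.
    coro = load_cart()
    got_data = isinstance(coro, list)
    coro.close()  # discard it cleanly, since nothing will ever await it
    return got_data


cart = asyncio.run(load_cart())
```

Any existing library that calls `load_cart()` the way `render_sync` does would need an async-aware rewrite, which is the fragmentation being described.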
> Twisted has a lot of libraries but it has so much going on that many developers find it too complex.
It is too complex with too many libraries for those who want to take it up but it is not complex and doesn't have enough libraries if you are in it already -- every library you use has to be Twisted now. That's the danger of inventing a new framework.
Yes, it will be standard, but there is already a practical standard -- eventlet and gevent. This is something Node.js doesn't have. I will personally take monkey-patching and the danger that my function will context switch inside while doing IO over using Twisted. I saw a practical benefit from it at least.
Before it used to be code like:
    def processShoppingCart(...):
        d.addCallback(_cb1)
        return d  # (d is a Deferred)
    def _cb1(...):
        d2.addCallback(_cb2)
        return d2  # d2 is another Deferred
etc.
Which with generators and inlineCallbacks turns into
    @someDecoratorThatEnableUsingInlineCallbacks
    def processShoppingCart(...):
        .. do some work ..
        yield <some_io_function_like_check_db>
        .. do some more work ..
        yield <some_io_other_io_function>
        ....
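Filled in with something runnable, the two styles might look like this. It is a sketch only: it uses `asyncio` tasks and `await` in place of Twisted Deferreds and `inlineCallbacks`, and `db_read`, `db_write` and the `process_cart_*` functions are hypothetical stand-ins.

```python
import asyncio


async def db_read(key):
    # Hypothetical stand-in for a non-blocking database read.
    await asyncio.sleep(0)
    return {"cart": key}


async def db_write(record):
    # Hypothetical stand-in for a non-blocking database write.
    await asyncio.sleep(0)
    return "saved " + record["cart"]


def process_cart_callbacks(loop, key, done):
    # Callback style: every step lives in its own function.
    def _cb1(read_task):
        write_task = loop.create_task(db_write(read_task.result()))
        write_task.add_done_callback(_cb2)

    def _cb2(write_task):
        done.set_result(write_task.result())

    loop.create_task(db_read(key)).add_done_callback(_cb1)


async def process_cart_inline(key):
    # Inline style: the same two steps, read top to bottom.
    record = await db_read(key)
    return await db_write(record)


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    process_cart_callbacks(loop, "cart-1", done)
    via_callbacks = await done
    via_inline = await process_cart_inline("cart-1")
    return via_callbacks, via_inline


via_callbacks, via_inline = asyncio.run(main())
```

Both versions produce the same result; the difference is only in how much plumbing the reader has to follow.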
You get the idea. I am not too familiar with Node.js but I imagine it will help quite a bit. http://taskjs.org/
Also, if you want to see generators in v8 (spidermonkey has had them for years now) watch the issue: https://code.google.com/p/v8/issues/detail?id=2355