Threading in Python (nryoung.org)
Here's the problem. Threads are really useful only if you can share memory between threads. If you can't share memory, you're usually better off using many processes.
Threads in Python (i.e., CPython) can still be useful for I/O multiplexing, or for executing native code in background worker threads via FFI while releasing the GIL. For I/O multiplexing, there are better options than Python threads: the select/poll/kqueue/epoll system calls and frameworks like Twisted that use them.
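For the select/poll/kqueue/epoll route, Python's standard `selectors` module wraps whichever of those calls the platform offers. Below is a minimal sketch in which one thread serves every connection. It assumes small messages that each fit in a single `recv()`, uses `socket.socketpair()` as a stand-in for real network connections, and `echo_upper` is a hypothetical helper name.

```python
import selectors
import socket

def echo_upper(messages):
    """Serve several connections from one event loop, no threads.

    Each message gets its own socket pair; the "server" ends are all
    watched by one selector and reply with the upper-cased bytes.
    Assumes each message is small enough to arrive in one recv().
    """
    sel = selectors.DefaultSelector()   # epoll/kqueue/poll/select under the hood
    clients = []
    for msg in messages:
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        client_side.sendall(msg)
        clients.append(client_side)

    pending = len(messages)
    while pending:
        # select() blocks until at least one watched socket is readable.
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [c.recv(1024) for c in clients]  # client ends are blocking
    for c in clients:
        c.close()
    sel.close()
    return replies

print(echo_upper([b"ping", b"hello"]))  # [b'PING', b'HELLO']
```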
In most applications, threads probably should not be used in CPython/CRuby code as they provide little performance gain compared to the complexity and overhead they add.
Got parallel needs at your core? Look at Erlang or Haskell. If parallel or distributed work is mission critical, go with a language that has such things at its very soul. Python is a great language, but it is being enthusiastically bent to do things it is not top of the class for.
Want to handle more concurrent connections per Python web server? If WSGI in Gunicorn is not enough, stop trying and use a load balancer to spread the work across more servers.
This is almost always the case on commercial projects. Extremely few companies and clients will be perfectly fine with "yes, I'm a Python expert, but this would be best done in Erlang; I will need an extra week to research, learn, and implement this on top of the month the project would otherwise take." In most situations you either do it the way you know how to do it, eat the extra time (not practical in most cases), or you lose the contract/job.
Of course this is specific to client work, but I think most of us are likely doing that or something similarly limiting for at least half our waking hours, making it fairly relevant when considering ideas like "using tool X for job A is not a good idea when tool Y exists." It's correct but ignores too many practical situations to be very useful advice.
That is a non sequitur to me. The first half I'm on board with: generally, you use threads to improve performance, but because of the GIL in Python, you may not get the parallelism you want. If you're calling into libraries that release the GIL, then great, but that means you have to be very aware of what's going on below you.
The second half does not follow, though. Typically, the fact that threads share the same address space is the entire reason we use threads over processes. And the reason comes down to performance: if threads share an address space, you don't need to copy the data, and copying data is expensive. (It also means you're susceptible to a whole host of synchronization bugs.)
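To illustrate the shared-address-space point, here is a minimal sketch in which four threads write into the same dict and list with no copying at all. The `Lock` is there because `+= 1` on shared state is a read-modify-write that can interleave between threads, which is exactly the class of synchronization bug mentioned above. The names `shared` and `worker` are illustrative only.

```python
import threading

def worker(shared, lock, worker_id, n):
    for i in range(n):
        # Without the lock, the read-modify-write of "count" could
        # interleave with another thread's and lose updates.
        with lock:
            shared["count"] += 1
            shared["items"].append((worker_id, i))

shared = {"count": 0, "items": []}   # one object, seen by every thread
lock = threading.Lock()
threads = [
    threading.Thread(target=worker, args=(shared, lock, w, 1000))
    for w in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"], len(shared["items"]))  # 4000 4000
```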
If avoiding copying is not a top problem, then you may be wasting your time; there's nothing wrong with using abstractions more appropriate to your environment.
If the program scales out, it should be less important to micro-optimize inside each process because it's so much cheaper just to use another core or another node.
It's getting boring to hear all discussions of concurrency reduced to threads, and threads reduced to the GIL in CPython. It's really not that simple.
But my point here is that the statement the author made, as far as I'm able to understand it, makes no sense. That is, I think he tried to discuss these issues, but I don't think he understands them well enough to do so. I think you and I are in agreement, unless you are saying that what the author stated does make sense.
This is not a Python-language issue, it is an implementation-specific issue. Not all implementations of Python have the GIL.
In reality, threads do run concurrently. In CPython (itself written in C), even with the famous GIL, it is normal and realistic to do I/O and heavy computation in C code that releases the GIL, which lets threads work concurrently. There's no reason this information shouldn't be part of discussions of threads in Python.
That doesn't mean threads are great for everything, but the severity of the case is easily and frequently overstated.
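As a sketch of the "C code releases the GIL" case: CPython's `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so a thread pool can hash several large buffers at once. The buffer sizes and worker count below are arbitrary assumptions, and any actual speedup depends on the machine.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 8 MB buffers; well above the 2047-byte threshold at which
# hashlib releases the GIL during hashing.
chunks = [bytes([i]) * 8_000_000 for i in range(4)]

def digest(buf):
    # The heavy lifting happens in C with the GIL released,
    # so other threads can hash their own buffers meanwhile.
    return hashlib.sha256(buf).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(digest, chunks))
```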
  for i in xrange(len(item_list)):
Could be more clearly written as:
  for _ in item_list:
To be fair, the best way of judging this is the reverse penalty clause. So this job must be done by June 1? OK, and if it is three weeks late because we use Erlang? A penalty of 1,000 dollars a day? Wow. OK, so if I am a month early, will you pay a bonus of 20,000? No? Then perhaps we are not as time-critical as we feared. Would you rather save 20,000 in ongoing maintenance costs and general uncertainty over how good the solution is, in exchange for a three-week delay that would likely creep in anyway?
Have I told you Erlang has an uptime of 99.999% proven over twenty years?
still for performance
Sorry, but this is misguided. You are almost certainly using an OS that gives you concurrency, even if you only have one core of execution on your CPU. Concurrency is only natural, and is in fact a requirement once you start talking about GUIs. Even for something like data processing, you usually have one thread doing the processing and another thread controlling everything. The advantage of threads is the natural separation of tasks, which simplifies how programs are written. Not getting the performance advantage in Python is unfortunate, but performance is only one use for threads, and by far not the most common one.
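A minimal sketch of the "one thread processes, another controls" split, using the standard `queue` module. The main thread plays the controller: it hands out jobs, then sends a `None` sentinel to tell the worker to stop. Squaring numbers is just a placeholder for real processing.

```python
import queue
import threading

def processor(jobs, results):
    while True:
        item = jobs.get()
        if item is None:          # sentinel from the controller: stop
            break
        results.put(item * item)  # placeholder for real work

jobs = queue.Queue()
results = queue.Queue()
worker = threading.Thread(target=processor, args=(jobs, results))
worker.start()

for n in range(5):                # the controller hands out work...
    jobs.put(n)
jobs.put(None)                    # ...then asks the worker to finish
worker.join()

squares = [results.get() for _ in range(5)]
print(squares)                    # [0, 1, 4, 9, 16]
```

With a single worker and FIFO queues, results come back in submission order; with several workers they would not.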
The pattern is that instead of one monolithic process that must know about and do everything, sometimes it's easier to think about several independent processes that only know about one domain, and make requests or give answers to other processes that know about other domains.
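That request/answer pattern can be sketched with plain message passing. In this hypothetical example, threads stand in for separate processes so the code stays self-contained: an order service knows nothing about stock levels, so it asks an inventory service, which alone owns that data. The service names and stock figures are made up for illustration.

```python
import queue
import threading

def inventory_service(inbox):
    stock = {"apple": 3, "pear": 0}     # private to this domain
    while True:
        msg = inbox.get()
        if msg is None:
            break
        item, reply_to = msg
        reply_to.put(stock.get(item, 0))

def order_service(inbox, inventory_inbox):
    while True:
        msg = inbox.get()
        if msg is None:
            break
        item, reply_to = msg
        answer = queue.Queue()
        inventory_inbox.put((item, answer))   # ask the other domain
        reply_to.put("accepted" if answer.get() > 0 else "rejected")

inv_q, order_q = queue.Queue(), queue.Queue()
services = [
    threading.Thread(target=inventory_service, args=(inv_q,)),
    threading.Thread(target=order_service, args=(order_q, inv_q)),
]
for s in services:
    s.start()

replies = []
for item in ["apple", "pear", "plum"]:
    reply = queue.Queue()
    order_q.put((item, reply))
    replies.append(reply.get())

order_q.put(None)   # shut down the order service first,
inv_q.put(None)     # then the inventory service it depends on
for s in services:
    s.join()
print(replies)      # ['accepted', 'rejected', 'rejected']
```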
Various image processing algorithms I use also work concurrently over the image, agents working in different places, and then deciding who is doing best and letting each other know so resources can be concentrated and refocused.
Unless you've worked for a long time in an inherently parallel environment it's hard to see things as anything other than serial processes. For me it can sometimes be hard to see how to serialize things, as many things are naturally parallel. I've worked for nearly 20 years on systems with at least 100 processors and limited communications, so parallel algorithms tend to be what I see first.
Sorting, for example. Merge sort is inherently a parallel algorithm. Why sort that part, then this part? Why not do them both at once, and then start merging while the remainder of the sort is still going on? Or quick sort. Once you have divided your vector into two pieces, why sort them serially?
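A sketch of the split-then-merge idea: sort two halves concurrently, then combine them with `heapq.merge`. Two caveats apply. This version waits for both halves before merging, so it does not attempt the overlap of sorting and merging described above. And in CPython the GIL keeps these threads from sorting on two cores at once; swapping in `ProcessPoolExecutor` (under an `if __name__ == "__main__":` guard) is one common way to get real parallelism. `parallel_sort` is a hypothetical helper name.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, pool):
    mid = len(data) // 2
    # Sort each half as a separate task.
    left = pool.submit(sorted, data[:mid])
    right = pool.submit(sorted, data[mid:])
    # heapq.merge lazily combines two already-sorted iterables.
    return list(heapq.merge(left.result(), right.result()))

with ThreadPoolExecutor(max_workers=2) as pool:
    result = parallel_sort([5, 3, 9, 1, 7, 2, 8], pool)
print(result)   # [1, 2, 3, 5, 7, 8, 9]
```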
Sieve of Eratosthenes can be thought of as inherently parallel, as can many factoring algorithms. Why factor this small number, then that one, then that one, and so on, when you can factor them all together?
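Factoring a batch of numbers "all together" maps naturally onto an executor. A minimal sketch using naive trial division, which is not one of the serious factoring algorithms. Pure-Python work like this is GIL-bound in CPython, so the sketch shows the structure rather than a speedup; the input numbers are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def factorize(n):
    """Prime factors of n by trial division (fine for small n)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors

numbers = [84, 97, 360, 1001]
with ThreadPoolExecutor() as pool:
    # map() submits every call up front and yields results in input order.
    all_factors = dict(zip(numbers, pool.map(factorize, numbers)))
print(all_factors)
# {84: [2, 2, 3, 7], 97: [97], 360: [2, 2, 2, 3, 3, 5], 1001: [7, 11, 13]}
```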
These are all things that are naturally expressed as concurrent algorithms, processes, or calculations.