Comparing a web service written in Python and Go [pdf] (indico.cern.ch)

85 points by guai898 10 years ago | 75 comments

mpdehaan2 10 years ago |

I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.

Of course something compiled directly is going to be a bit faster, but development time is important too. Python has more libraries and is (for many people) probably faster to write.

Serving multiple requests is best handled by putting a preforking web server in front of Python, whether Apache, nginx, etc. This lets you handle multiple requests at once without any async voodoo. Twisted, for example, is not the right answer in this case, because it doesn't get you multiple processes and it messes up the way you write code (async event-driven code is more time-consuming to write and debug).
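
A minimal sketch of the kind of app this describes, assuming gunicorn as the preforking server (the comment names Apache and nginx; gunicorn is a swapped-in example). The app is plain synchronous WSGI with no async code: the server forks worker processes, and each worker runs its own copy of the function. The name `application` and the response body are hypothetical.

```python
import os


def application(environ, start_response):
    """Plain synchronous WSGI app (hypothetical example).

    Under a preforking server every worker process runs its own copy of
    this function, so concurrency comes from the number of processes.
    """
    body = f"handled by pid {os.getpid()}\n".encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


# Run with, e.g.:  gunicorn --workers 4 app:application
# (assumes this file is saved as app.py and gunicorn is installed)
```

Requests handled by different workers should report different PIDs, which is a quick way to confirm that the concurrency comes from processes.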

On the backend, your web server doesn't start long-running backend processes itself, but you can launch them using things like Celery, a distributed task queue that lets you start jobs and so forth. Celery workers can run on any number of machines, so your backend can scale independently of your frontend if you wish.
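
Celery needs a broker (e.g. RabbitMQ or Redis) and separate worker processes, so it doesn't fit in a self-contained snippet. The sketch below is a stdlib stand-in for the pattern and does not use Celery's API. The request handler only enqueues a job and returns an ID, and a separate worker drains the queue. `submit_job`, `worker` and `slow_report` are hypothetical names. In a real deployment the queue would be the broker and the worker thread would be a worker process, possibly on another machine.

```python
import itertools
import queue
import threading
import time

# Stand-ins: `jobs` plays the broker, `results` plays the result backend,
# and worker() plays a worker process. All names here are hypothetical.
jobs: queue.Queue = queue.Queue()
results: dict = {}
results_lock = threading.Lock()
_job_ids = itertools.count(1)


def slow_report(name: str) -> str:
    """Hypothetical long-running job."""
    time.sleep(0.1)
    return f"report for {name}"


def submit_job(name: str) -> int:
    """What a web handler would call: enqueue the job and return its ID at once."""
    job_id = next(_job_ids)
    jobs.put((job_id, name))
    return job_id


def worker() -> None:
    """Process jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        try:
            if item is None:
                break
            job_id, name = item
            output = slow_report(name)
            with results_lock:
                results[job_id] = output
        finally:
            jobs.task_done()
```

A handler would call `submit_job()`, return the ID to the client, and let the client poll for the result later. With Celery, the equivalent of `submit_job()` would be calling a task's `.delay()`.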

Historically, the very computation-heavy parts of Python programs were often written as C extensions. While I haven't done so myself, things like Cython may also be promising for extensions. There's also ctypes for quickly taking advantage of native libraries from a Python function.
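
As an illustration of the ctypes route, here is a minimal sketch that calls the C library's `strlen()` from Python. It assumes a POSIX system. `c_strlen` is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the Python process (POSIX only).
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: call the native strlen() on a bytes object."""
    return _libc.strlen(data)
```

Declaring `argtypes` and `restype` up front is what keeps this safe. Without them, ctypes guesses the C types and can silently truncate or misinterpret values.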

Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.

(I'm also really curious to see how the typing options in Python 3 play out)

Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.

Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.
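
In the simple, embarrassingly parallel case, multiprocessing really is workable. The sketch below assumes a POSIX system because it pins the "fork" start method. `parallel_sum` is a hypothetical function name.

```python
import math
import multiprocessing


def parallel_sum(numbers: list, processes: int = 4) -> int:
    """Split a list into chunks, sum each chunk in a separate process, then combine."""
    size = max(1, math.ceil(len(numbers) / processes))
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # "fork" avoids re-importing this module in the children; it is available
    # on Linux and macOS but not on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        # The builtin sum pickles cleanly, so no custom top-level function is needed.
        partial_sums = pool.map(sum, chunks)
    return sum(partial_sums)
```

The trouble starts when workers need to share mutable state or talk to each other. At that point, queues, pipes, managers and pickling constraints show up.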

dec0dedab0de 10 years ago | |

> I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.

I absolutely agree, but I also think that deploying Python on the web still has too much of a learning curve. Even the standard nginx > gunicorn > WSGI model is kind of a pain. Couple that with Celery and init systems, and you're basically down a sysadmin rabbit hole.

yeukhon 10 years ago | |

> Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.

Spot on. Concurrency vs parallelism, and clean distinction of responsibility (web server vs backend threads).

> Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.

This is painful. In JavaScript I can be careless with promises (well, to a great extent, for people like me who like magic). Python can achieve this too, but only with a great deal of effort spent learning coroutines, gevent, or asyncio. Though I have to admit that JavaScript has its own problems with parallelism.
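
For comparison, here is a minimal asyncio sketch of the `Promise.all`-style pattern (Python 3.7+ for `asyncio.run`). `fetch()` and the service names are hypothetical, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio


async def fetch(source: str, delay: float) -> str:
    """Hypothetical I/O-bound call; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{source}: ok"


async def fetch_all() -> list:
    # Roughly what Promise.all does in JavaScript: run all coroutines
    # concurrently and get the results back in argument order.
    return await asyncio.gather(
        fetch("service-a", 0.3),
        fetch("service-b", 0.2),
        fetch("service-c", 0.1),
    )
```

`asyncio.run(fetch_all())` should take about as long as the slowest call rather than the sum of all three. The learning curve the comment mentions is mostly about knowing that every function in the chain has to be `async` and `await`ed.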

I have done things with gevent, spawning greenlets and responding to the user immediately. The thing is, the backend should always be stateless, so worker models like Celery, RabbitMQ pub/sub, etc. are more popular.

whyever 10 years ago | |

> Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.

That really depends on your requirements. If you need multithreading (not multiprocessing), you cannot use Python.

dec0dedab0de 10 years ago | | |

> That really depends on your requirements. If you need multithreading (not multiprocessing), you cannot use Python.

About to show my ignorance, but when is multithreading useful when multiprocessing is not? Assuming it is a use case that is suited for a high level dynamic language to begin with.

dragonwriter 10 years ago | | |

> If you need multithreading (not multiprocessing), you cannot use Python.

If you need multithreading of Python code for parallelism, you cannot use CPython (and, consequently, can't use Python 3).

Both IronPython and Jython use native threads without a GIL.
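
To make the CPython limitation concrete: the GIL is released while a thread blocks on I/O or sleeps, so threads still overlap their waiting even though only one thread runs Python bytecode at a time. In the sketch below, `time.sleep` stands in for a blocking network call, and `blocking_call`/`run_threads` are hypothetical names. CPU-bound pure-Python work in the same threads would not speed up this way.

```python
import threading
import time


def blocking_call(results: list, index: int) -> None:
    """Hypothetical blocking I/O; time.sleep releases the GIL while waiting."""
    time.sleep(0.2)
    results[index] = index * index


def run_threads(n: int) -> tuple:
    """Run n blocking calls on n threads; return (results, elapsed seconds)."""
    results = [None] * n
    threads = [threading.Thread(target=blocking_call, args=(results, i)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

With 8 threads, the total should be close to one 0.2 s wait rather than 1.6 s. That is also a partial answer to the "when are threads useful" question: overlapping I/O while sharing in-process state without IPC.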

mratzloff 10 years ago | |

Go has an object model.

Rexxar 10 years ago | | |

> Go has an object model.

I'm not sure the Go creators would agree with this. Go's object model is different enough from other object models that it would not qualify as one for a lot of people.

adolgert 10 years ago | |

These are great ideas. I told Valentin to drop by. Because DAS is an aggregator with an expert-system-style query language, the Python services share state for caching. It's the caching that makes async code a good option. Preforking might work against this: unless you accept a significant increase in complexity, the forked processes can't all communicate with a single local cache. Remember that this is a web service for the Large Hadron Collider. Nothing about that is small.
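
A small illustration of why preforking complicates a shared in-memory cache: after `fork()`, each worker gets its own copy-on-write copy of the parent's memory, so a cache update in one worker is invisible to the others. This is a POSIX-only sketch, and `cache` and `child_updates_cache` are hypothetical names.

```python
import os

cache = {"dataset": "parent value"}  # hypothetical in-memory cache


def child_updates_cache() -> tuple:
    """Fork a worker that writes to the cache; return (child's view, parent's view)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        cache["dataset"] = "child value"  # modifies only the child's copy
        os.write(write_fd, cache["dataset"].encode())
        os.close(write_fd)
        os._exit(0)  # exit immediately without running the parent's remaining code
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_view = pipe.read().decode()  # reads until the child closes its end
    os.waitpid(pid, 0)
    return child_view, cache["dataset"]
```

Sharing one cache across forked workers therefore means an external store or explicit IPC, which is the extra complexity this comment is pointing at.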

andor 10 years ago |

Basically, their Python version ("3 thread pools, 175 threads") is synchronous and, because of the GIL, effectively single-threaded, while the Go rewrite uses goroutines and multiple OS threads. The fact that their Python version takes "minutes to startup" indicates that a rewrite was necessary anyway.

Go is a good tool for the job; Python threads are not. asyncio or one of the event-based I/O frameworks should work much better.

As for the problem of sharing data between processes (slide 5): it appears that this service is read-only? If that's true, what do you need to share? Every process can have its own connection pool. You don't even need multiprocessing; just use SO_REUSEPORT and start your application multiple times.
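
Here is a minimal sketch of the SO_REUSEPORT idea, assuming Linux, where the kernel spreads incoming connections across all sockets bound this way. Other platforms differ in semantics, and Windows lacks the option. `make_listener` is a hypothetical helper. In practice, each copy of the application would call it with the same fixed port from its own process.

```python
import socket


def make_listener(port: int) -> socket.socket:
    """Hypothetical helper: a TCP listener that can share its port with other processes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set SO_REUSEPORT before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


# e.g. each independently started copy of the app runs:
#   listener = make_listener(8000)
# and then accepts connections on it with its own connection pool.
```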

mtanski 10 years ago | |

You could probably get decent performance for a similar application written in a language other than Python using 175 threads. 175 threads is not that big of a deal; the OS can manage it pretty well. It's only when you start talking about thousands of individual connections and thousands of threads that you need to worry. Python sucks at this even with a low number of threads (GIL).

fauigerzigerk 10 years ago | | |

175 threads use a lot of memory and cause a lot of context switching. I would never write an application so that it needs 175 OS threads, because if it needs that many, how many am I going to need down the road? It's an ominous sign for scalability in my view, even if it works for a while.

[Edit] I'm assuming a CPU with 8 cores, not some 64-core monster.

mherrmann 10 years ago |

Anybody else find it difficult to believe that a 4k LOC Go project takes 26k LOC in Python?

dekhn 10 years ago | |

Typically rewrites like this focus on core functionality; I truly doubt the project is a 1:1 equivalent. There may be refactorings as well (functionality included as part of Go).

tkinom 10 years ago | | |

I rewrote a Python web service in Go once and found Go needs A LOT MORE boilerplate code, because JSON structures need to be strictly defined, for better or worse.

In Python, it is at least 10x less code for JSON parsing. json.dumps() and json.loads() were basically all I needed, and exception handling filled in all the undefined fields easily.
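
Here is the Python side of that comparison as a minimal sketch. `json.loads` returns plain dicts and lists, so optional fields can be defaulted with `dict.get` instead of declaring a struct for every shape. The field names and `parse_record` are hypothetical. The "for better or worse" part is that a misspelled key silently falls back to its default instead of failing at compile time.

```python
import json


def parse_record(raw: str) -> dict:
    """Hypothetical parser: read a JSON object, tolerating missing optional fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "invalid JSON"}
    if not isinstance(data, dict):
        return {"error": "expected a JSON object"}
    return {
        "name": data.get("name", "unknown"),
        "size": data.get("size", 0),
        "sites": data.get("sites", []),
    }
```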

Also, Go used a lot more memory because the GC is not under my control. In Python, I can tell the GC to collect and see the memory shrink immediately. In Go, that was not the case for me. A program that built GBs of search index database in Go ended up using 4x the memory compared to Python. Golang at that time (2+ years ago) lacked the GC debugging infrastructure I needed to resolve the problem.
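
For reference, here is what "telling the GC to collect" looks like in CPython, as a minimal sketch. Most objects are freed as soon as their reference count drops to zero, and `gc.collect()` reclaims reference cycles. `Node` and the helper function are hypothetical. Whether the process's resident memory then shrinks depends on the allocator returning pages to the OS, so it is not guaranteed in general.

```python
import gc
import weakref


class Node:
    """Hypothetical object used to build a reference cycle."""

    def __init__(self) -> None:
        self.other = None


def cycle_freed_only_by_collect() -> tuple:
    """Build a two-object cycle; report whether it is alive before/after gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle: refcounts never reach zero
        probe = weakref.ref(a)   # a weak reference does not keep `a` alive
        del a, b
        alive_before = probe() is not None
        gc.collect()             # the cycle detector reclaims both nodes
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```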

mherrmann 10 years ago | | |

Yes. I really do feel like we are not being told the whole story here.

rbanffy 10 years ago | |

It seems it's looking at everything outside the core libraries. Go has a built-in templating engine. That alone may explain the LoC difference.

fauigerzigerk 10 years ago | | |

It has to be something like that, because no two languages on this planet differ that much in code size without counting library stuff.

FraaJad 10 years ago |

This looks like a report written by someone who is trying to show how their $favorite system is better than the $other one.

Better to open-source the code for both, plus the benchmarks, and have people go at it.

laumars 10 years ago | |

Not really. It's just a report written by someone who has an existing code infrastructure and is experimenting with alternative approaches so wrote some basic scripts for benchmarking.

FraaJad 10 years ago | | |

While that may be true, when it is posted without the proper context and an unbiased way to assess the outcome, this presentation will be used as "proof" that Go is better than Python (which it might be, but not everywhere).

aidos 10 years ago |

I haven't done any real work in Go yet but this sounds like one of the (many) use cases it's well suited to.

Unfortunately this overview is light on meaningful details. As a general rule, a rewrite of any project will result in fewer lines of code; however, in general, a rewrite of any project is a terrible idea.

Given that this seems to be a situation in which you have a lot of blocking waiting for concurrent requests, why not try something like gevent?

It's good for people to try different approaches and technologies. I'm glad they managed to have success with Go, that's good for everyone. It would have been interesting for the reader to see some of the gory details of hacking around with the existing codebase to see some of the ideas that may (not) have worked.

alexchamberlain 10 years ago |

Yet another Go article not fairly comparing technologies. What about a Python implementation that used `asyncio`, for example? What about `PyPy`?

rbanffy 10 years ago | |

It's a Go rewrite of an existing, and probably old, Python application. You are asking them, who already did a rewrite in Go and kindly provided their assessment of the process, to also do a Python rewrite using more modern approaches.

Feel free to rewrite their old Python app in Python for free. They may thank you and even use your port.

laumars 10 years ago | |

I think they were just comparing their existing Python deployment infrastructure to a generic Go setup (there are also ways to optimise Go that weren't explored in that article). It wasn't meant as a "look how much Python sucks compared to Go" type article like a few seem to have taken it. More just disclosing the results of some internal testing they've been doing.

On that note, I would suggest that if you think they could see big gains with little code refactoring simply by switching Python frameworks or even to a different Python runtime, then maybe you should contact them. I'm sure the author would be open to ways to increase their throughput with less developer overhead.

tobz 10 years ago | |

Do you have an example in Python of doing fan-out/fan-in? I've done it in Go before, and didn't find it particularly nice to look at (although it did work, and worked well).... so I'm curious what a Python example would look like.
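
One way to write fan-out/fan-in in Python is this minimal asyncio sketch; it is not anything from the slides. A fixed set of worker tasks pull jobs from one `asyncio.Queue` (fan-out) and push results onto another (fan-in), much like goroutines connected by channels. `lookup()`, `worker()` and `fan_out_fan_in()` are hypothetical names, and `asyncio.sleep` stands in for I/O.

```python
import asyncio


async def lookup(item: int) -> int:
    """Hypothetical I/O-bound call."""
    await asyncio.sleep(0.01)
    return item * 2


async def worker(jobs: asyncio.Queue, results: asyncio.Queue) -> None:
    # Fan-out: every worker competes for items from the same jobs queue.
    while True:
        item = await jobs.get()
        try:
            await results.put((item, await lookup(item)))
        finally:
            jobs.task_done()


async def fan_out_fan_in(items: list, n_workers: int = 4) -> dict:
    jobs: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    for item in items:
        jobs.put_nowait(item)
    workers = [asyncio.create_task(worker(jobs, results)) for _ in range(n_workers)]
    await jobs.join()  # wait until every job has been marked done
    for w in workers:
        w.cancel()     # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    # Fan-in: drain the shared results queue into one dict.
    collected = {}
    while not results.empty():
        item, value = results.get_nowait()
        collected[item] = value
    return collected
```

The explicit cancellation at the end plays the role of closing the channel in Go. Without it, the workers would sit blocked on `jobs.get()` forever.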

kozak 10 years ago |

I'm not saying you shouldn't use dynamic languages at all (in fact, I'm developing in one right now), but you should keep in mind that you are paying a computational price for that dynamism every time a line of your code is executed.

collyw 10 years ago | |

And you are paying for developer time otherwise.

laumars 10 years ago | | |

Static languages don't take that much longer to write than dynamic languages. But on the flip side: a more performant software stack (regardless of language paradigm) does reduce your sysadmin time due to them having to maintain a smaller server cluster, as well as reducing your hardware / cloud costs. Generally speaking, of course. But this is quite a generalised discussion as is.

fauigerzigerk 10 years ago | | |

This has never been true for me at all. On the contrary. So either people differ a lot in the way their brains work or it's because dynamic languages are often used for different tasks than static languages. Maybe both. I'm not sure.

iamd3vil 10 years ago |

Anyone who thinks it's difficult to program in Erlang, please have a look at Elixir (https://elixir-lang.org). It's quite nice to work with.

brokentone 10 years ago | |

This does not seem relevant to Go or Python.

flippant 10 years ago | | |

Erlang is mentioned as a solution on page 6.

>[Go is] way to easy to program than Erlang

mbreese 10 years ago |

Can anyone comment on what the CMS DAS web service is? I'm having a hard time understanding what it is supposed to do. I'm sure the audience knew or maybe it's obvious and I'm just missing something.

Analog24 10 years ago | |

It's essentially a way to look up metadata about the different data files produced by the CMS detector. There are petabytes of data produced by the detector, and these are stored in countless data files. In order to determine which datasets are available and right for your particular analysis, you would use the DAS system to look for them and find out where they're located. This is a complicated task because the petabytes of data are distributed across the CMS computing grid, which spans many dozens of institutes across the globe.

andrioni 10 years ago | |

I'd guess it's this system: https://cmsweb.cern.ch/das/

mbreese 10 years ago | | |

CMS = Compact Muon Solenoid particle detector

DAS = data aggregation service

cptwunderlich 10 years ago |

Look at the scales for the graphs on page 9. What a ridiculous comparison...

esseti 10 years ago |

Are the conclusions true in general? I mean, does software written in Go perform better than software written in Python?

mhd 10 years ago | |

Software rewritten in Python often performs better than the original in Python, too.

jonathan_s 10 years ago | |

Sure it's true. And software written in C or assembly language often also performs better than those written in Python.

SjuulJanssen 10 years ago |

I think a fairer comparison would be if the author used a reactor/async-based solution in his Python code.