Recording and visualising the 20k system calls it takes to "import seaborn" (blog.mattstuchlik.com)
There is then this neat tool to visualize the data. https://kmichel.github.io/python-importtime-graph/
Highly recommended for finding the worst imports affecting your program's startup time.
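For anyone who wants the raw numbers before graphing them, here is a minimal sketch. It assumes the data in question is what CPython prints with the `-X importtime` flag (that is the format the linked grapher consumes). The helper name `import_times` and the choice of `json` as the module to measure are made up for illustration.

```python
import subprocess
import sys


def import_times(module):
    """Run `python -X importtime -c "import <module>"` and parse stderr.

    Returns a list of (cumulative_us, self_us, name) tuples, slowest first.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # skip the "self [us] | cumulative | imported package" header
        rows.append((int(cumulative_us), int(self_us), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for cumulative_us, self_us, name in import_times("json")[:10]:
        print(f"{cumulative_us:>8} us cumulative  {self_us:>8} us self  {name}")
```

Swap `"json"` for whatever your program imports at startup; the top few rows are usually the ones worth deferring.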
In general, the Python community tends to value functionality over performance. For example, large modules (looking at networkx here) will often import a bunch of their submodules in their __init__.py, which means all of those modules end up loaded even if you didn't need them.
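One way a package can avoid that eager loading is a module-level `__getattr__` (PEP 562) that imports a submodule only on first access. The sketch below is not how networkx or any particular library does it; `lazypkg` and `heavy` are hypothetical names, and the package is written to a temp directory only so the example is self-contained.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk; "lazypkg" and "heavy" are made-up names.
root = Path(tempfile.mkdtemp())
pkg = root / "lazypkg"
pkg.mkdir()
(pkg / "heavy.py").write_text("VALUE = 42\n")
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _SUBMODULES = {"heavy"}

    def __getattr__(name):
        # Called only when `name` is not found the normal way (PEP 562).
        if name in _SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

sys.path.insert(0, str(root))
import lazypkg  # noqa: E402  (imported after the package has been written)

# The submodule is not loaded by the package import itself...
loaded_before = "lazypkg.heavy" in sys.modules
# ...only by the first attribute access.
value = lazypkg.heavy.VALUE
loaded_after = "lazypkg.heavy" in sys.modules
print(loaded_before, value, loaded_after)
```

The trade-off is that import errors in a submodule surface at first use rather than at package import.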
I've never tried https://pyoxidizer.readthedocs.io/en/stable/oxidized_importe..., but it compiles all the imports into one, memory mapped file, that _may_ speed up the importing.
Having everything compiled to bytecode also helps a bunch.
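If you want to do that ahead of time rather than wait for the first run, the standard library's `compileall` can precompile a whole tree. A minimal sketch, with a throwaway directory and a made-up module name (`example_mod`) standing in for a real project:

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path

# A throwaway source tree standing in for a real project directory.
src_dir = Path(tempfile.mkdtemp())
source = src_dir / "example_mod.py"
source.write_text("ANSWER = 6 * 7\n")

# Compile every .py file under src_dir to bytecode; returns a true value on success.
ok = compileall.compile_dir(str(src_dir), quiet=1)

# Where the interpreter will look for the cached bytecode of that source file.
pyc = Path(importlib.util.cache_from_source(str(source)))
print(bool(ok), pyc.exists())
```

The same thing is available from the shell as `python -m compileall <dir>`, which is handy in a container build step.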
I'm going to go back to learning more C and Forth... And shake my fist at passing clouds :)
When I started on the project, page loads often took 10 seconds or more. The web application is used by about 20 people and that was enough to bring their single beefy server to its knees. Someone in NY tried scraping the site the other week and the site became completely unresponsive. They resorted to banning the IP to keep the website up. The reasons it was slow were all the usual culprits - a misused ORM being the main one.
It’s a nice language, but I really felt like I’d been transported back in time a few decades working in it. It feels like I’m using a computer from the 90s where performance choices matter again because the language is so slow. And where dependency management is a circus of half working tools and half hearted attempts at versioning. Packages conflict with one another. Some “pinned” package versions have apparently rusted and won’t actually install on my computer. And the system to install packages locally was obviously bolted on, badly, long after the horse had left the gate.
It reminds me of working in C in the early 2000s. I never thought I’d say this but it makes server side JavaScript with npm look positively modern and fast by comparison.
I use a dev machine that's quite archaic compared to a modern server, a 2nd gen i5 ThinkPad to be precise, that struggles to top 20ms for a request including loading a user and data object, joined tables and all, via ORM from Postgres running locally with a few hundred thousand records in said tables, before touching anything like explicitly adding caching.
Check your indexes, joins, general DB design and in-app looping. Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.
Well, there was one exception. The little import statement to import the Oracle database client took maybe 15 seconds. MySQL for the win :)
(I would not recommend MySQL for new applications today, although I might recommend it over Oracle…)
> The reasons it was slow were all the usual culprits - a misused ORM being the main one.
So, slow queries.
I have not had such bad issues with dependency management either. Not even with old stuff someone else wrote years ago.
I wouldn't number crunch in Python without something like numpy, because you'll pay the cost of Python's dynamism for nothing, but a lot of work has gone into making Python's primitives and standard library performant. I steal algorithms from CPython all the time.
You have such an unusual hobby, my friend!
To engage with your point: loading a dynamic library in a regular language takes significantly less than 20k syscalls. Probably 20-40 for C on Linux. Python is uniquely inefficient. On most plots comparing resource use by different languages, in order to even show python together with regular languages like Java and C, either you use the log scale, or everything but Python is shown as a single point.
Of course, most people use Python to glue together stuff written in C, so it’s not that big of a deal, but it becomes a problem when people forget pure Python code is literally hundreds or thousands of times slower than a “regular” program doing the same thing.
Why would you expect that to decrease the number of syscalls you need? The syscalls are there because the program needs the OS to do things. That need is driven by the application domain, not by the programming language you use.
Maybe some of them are. Many of those syscalls are there because Python (not the core program someone is creating, but rather its platform) needs the OS to do things.
Importing an empty python file takes 28 syscalls (30 measured by their tool, but the last two are closing out the trace not actually related to the import). 29 syscalls if you have any text in it (presumably more for larger files).
The logical equivalent in C of many portions of Python's import process happens at compile and link time, not during execution. So while it might not be a pleasant experience to develop, a C equivalent of many Python programs would involve far fewer syscalls at execution time.
I've worked at places where we've significantly patched the logic (in a way which breaks compatibility in some cases, so couldn't be up-streamed) which makes Python startup / module loading with hundreds of paths in $PYTHONPATH orders of magnitude faster...
I look at posts like this and cry.
Inevitably this comment is followed by quiet blinking as they digest this and then this question: “Are you saying we need to scale up one hundred times bigger?”
Sigh…
Also, with DevOps pushing out traditional administrators, companies are often spending way more on infra than needed.
DevOps teams should be looking at CPU spikes, and should be performing RCAs, and they should be maintaining resources in a healthy state, and they should reject/revert changes and notify problem areas in code by product focused devs.
Product devs, for the most part, are only implementing human lex traces to debug business logic when it arises. Product devs are not equipped with the knowledge to identify system errors that are not "bugs in the code", i.e. they will not be good at telling you why SPROC_LXC1 fails as a result of making an ExcelParserFactoryFactory.
"Engineer"?!? Given above description, that makes me cry.
Aggressive anti-adblock plugin used.
> Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.
I’m not so sure about that. It’s hard to run the experiment, but I’ve never seen a nodejs app run anywhere near that slowly. The default-synchronous nature of Python combined with its mediocre performance for straight code magnifies the impact of any bad design choices. At least in a nodejs application your server can happily run many sql queries at the same time, or do other work while it waits for the database. I’m sure sufficiently mediocre web server code can bring nodejs to its knees. But in a decade of working with node, I’ve never seen it done. Certainly not in a web app with only 20 users.
— Charles Babbage, Passages from the Life of a Philosopher (1864), chapter 5, Difference Engine No. 1
It turned out that this was also happening for all static assets. Oops. And the site is covered in very small images. Double oops.
All told, to load a single page the server was making over 150 sql queries. And because it’s Python, those queries were all issued with blocking code. More than enough to keep the server busy for ages.
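For anyone who hasn't run into it, the usual shape of that problem is the "N+1 query" pattern: one query for a list, then one more query per row. The sketch below is not the application described above; the schema, the `CountingConnection` wrapper and the function names are all invented, and it uses an in-memory SQLite database just to count round trips.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_posts=50):
    # Hypothetical schema: one author row per post, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(i, f"author{i}") for i in range(1, n_posts + 1)],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(i, i, f"post{i}") for i in range(1, n_posts + 1)],
    )
    return CountingConnection(conn)


def n_plus_one(db):
    # One query for the list, then one extra query per row (what a lazily
    # loaded ORM relationship tends to do inside a loop).
    rows = []
    posts = db.query("SELECT id, author_id, title FROM posts ORDER BY id")
    for _post_id, author_id, title in posts:
        name = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
        rows.append((title, name))
    return rows


def single_join(db):
    # Same result fetched in one round trip.
    return db.query(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )


if __name__ == "__main__":
    slow, fast = make_db(), make_db()
    assert n_plus_one(slow) == single_join(fast)
    print(slow.count, fast.count)  # 51 1
```

Most ORMs have an eager-loading option that produces the second shape; the fix is usually a one-line change once you have spotted the loop.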
I just don't see how a person could spend 20 years using python and still can't figure out that you shouldn't hammer nails with a microscope.
It just worries me when I sometimes see those same Jupyter notebooks running in production, crunching 100s of terabytes of data. Maybe I’m wrong, but I didn’t get the impression everyone realizes exactly how wasteful that is. I guess AWS credits are easy to come by.
One thing Google did well back in the day was reporting resource costs in SWE/hours, the idea being that you can see whether you should go and rewrite something. If it cost 100 SWE/h to run, and it only took you a day to cut that in half, you should do it.
Yes, because you're importing a library that does a lot more than just print a plot. A purpose-built Python program that just printed the plot, nothing else, would need a lot less than 20k syscalls too.
(Of course an import call in Python does a lot more, but the end result is roughly the same as calling `dlopen` in, e.g., Swift.)
A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade offs when it comes to this language.
It’s incredibly useful, but people in the community aren’t clearly told about its limitations. (Especially wrt performance, but also maintainability.)
Sure, but there's a trade-off, no? Go is typically 3x the code of Python. And C++ is easily 10x the complexity.
There was one point back when I stopped coding C++ where one coder might not understand what another C++ coder was doing because the standard was so large.
> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.
You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misuse as well.
Python is certainly very terse and expressive. I like writing Python, it’s fun. And it hides a lot of problems from the programmer, but that’s not the same as being simple.
Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.
Anyway, it’s about picking the right set of trade-offs, as you say. But the trade off in performance is 1:100, and that’s so punishing at scale that all other considerations kind of fall by the wayside.
Can you cite a source/example for that? I cannot imagine an optimized C program that doesn't blow python with numpy out of the water. Even a poorly written C program is likely to be 2x faster simply because it doesn't have to round trip operations from C to python and back.
I found some metrics after 30 seconds of googling.
I'm not your monkey.
You haven't lived until you've argued with a C++ language lawyer.
> Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.
Python is great. It has a lot of syntax sugar, but it's also easy to read and understand what it's doing. They teach it to elementary school kids, yet it's also used in F500 companies. And it has made huge strides into scientific computing, because it's relatively easy to call existing C/Fortran libraries.
Go's experience by comparison is awful. Their community is an anti-social gate-keeping echo chamber. Their FFI is awful. Their language design is awful as well.
Edited to add: I feel like Go got popular because Rob Pike had no problem bad mouthing other languages. "Python/C++ are so terrible...".
Consider Rust on the other hand, where Python and Rust seem to be getting along quite well. Rust seems to care about the coding experience. I think that makes a difference.