Recording and visualising the 20k system calls it takes to "import seaborn" (blog.mattstuchlik.com)

111 points by sYnfo 2 years ago | 53 comments

nsm 2 years ago |

Related to this, if you set the env var `PYTHONPROFILEIMPORTTIME=1` or run python with `-X importtime`, it will print out the cumulative and self times to import various modules.

There is then this neat tool to visualize the data. https://kmichel.github.io/python-importtime-graph/

Highly recommended for finding the worst imports affecting your program's startup time.
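A minimal sketch of collecting that data programmatically, assuming Python 3.7+ and the `import time: self | cumulative | module` line layout that `-X importtime` writes to stderr. The helper names `import_times` and `slowest` are made up for illustration.

```python
import subprocess
import sys


def import_times(module):
    """Run `python -X importtime -c "import <module>"` and parse its stderr.

    Returns a list of (module_name, self_us, cumulative_us) tuples.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # the header line has column names, not numbers
        rows.append((parts[2].strip(), self_us, cumulative_us))
    return rows


def slowest(rows, n=10):
    """Return the n entries with the largest self time."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, self_us, cumulative_us in slowest(import_times("json")):
        print(f"{self_us:>8} us self  {cumulative_us:>8} us cumulative  {name}")
```

Swapping `"json"` for the package you care about gives a quick ranking without leaving the terminal.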

In general, the Python community's values tend towards functionality over performance. For example, large modules (looking at networkx here) will often import a bunch of their submodules in their __init__.py, which means all of those modules end up loaded even if you didn't need them.
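One way a package can avoid eager loading is a module-level `__getattr__` (PEP 562) that imports submodules on first access. The sketch below is only an illustration, not how networkx or any particular library does it: `lazy_pkg` is a hypothetical package built in memory, and stdlib modules stand in for its submodules.

```python
import importlib
import types

# Hypothetical package created in memory so the sketch is self-contained.
# In a real package, _LAZY and __getattr__ would live in its __init__.py.
pkg = types.ModuleType("lazy_pkg")

# Attribute name -> module to import on first access (stdlib stand-ins).
_LAZY = {"decoder": "json.decoder", "tool": "json.tool"}


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        setattr(pkg, name, module)  # cache it so this hook is not hit again
        return module
    raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")


pkg.__getattr__ = _lazy_getattr

if __name__ == "__main__":
    print("decoder" in vars(pkg))   # nothing loaded yet
    print(pkg.decoder.__name__)     # first access triggers the import
    print("decoder" in vars(pkg))   # now cached on the package
```

The trade-off is that import errors surface at first use rather than at package import time.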

I've never tried https://pyoxidizer.readthedocs.io/en/stable/oxidized_importe..., but it compiles all the imports into one memory-mapped file, which _may_ speed up importing.

Having everything compiled to bytecode also helps a bunch.

lights0123 2 years ago | |

At the other extreme, checkpointing the whole process once all imports have been resolved and restoring it for every execution can be used for frequently-run tools: https://github.com/albertz/python-preloaded

dmwilcox 2 years ago |

I've been writing Python for going on 20 years now, and while it was a good language to cut my teeth on, this sort of analysis brings only horror. Many thanks to the author for dropping it into plain view.

I'm going to go back to learning more C and Forth... And shake my fist at passing clouds :)

josephg 2 years ago | |

Yeah. I recently worked on a small web project being developed at a university. The project is written in Flask, and it presents a reasonably simple UI on top of some data living in a MySQL database.

When I started on the project, page loads often took 10 seconds or more. The web application is used by about 20 people and that was enough to bring their single beefy server to its knees. Someone in NY tried scraping the site the other week and the site became completely unresponsive. They resorted to banning the IP to keep the website up. The reasons it was slow were all the usual culprits - a misused ORM being the main one.

It’s a nice language, but I really felt like I’d been transported back in time a few decades working in it. It feels like I’m using a computer from the 90s where performance choices matter again because the language is so slow. And where dependency management is a circus of half working tools and half hearted attempts at versioning. Packages conflict with one another. Some “pinned” package versions have apparently rusted and won’t actually install on my computer. And the system to install packages locally was obviously bolted on, badly, long after the horse had left the gate.

It reminds me of working in C in the early 2000s. I never thought I’d say this but it makes server side JavaScript with npm look positively modern and fast by comparison.

throwaway828 2 years ago | | |

That sounds like some quick kills to be easily made.

I use a dev machine that's quite archaic compared to a modern server, a 2nd gen i5 ThinkPad to be precise, that struggles to top 20ms for a request including loading a user and data object, joined tables and all, via ORM from Postgres running locally with a few hundred thousand records in said tables, before touching anything like explicitly adding caching.

Check your indexes, joins, general DB design and in-app looping. Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.

amluto 2 years ago | | |

I once worked on a small web project, at a university, in Python, using WSGI IIRC. It loaded a lot faster than any of the big expensive apps the university had written.

Well, there was one exception. The little import statement to import the Oracle database client took maybe 15 seconds. MySQL for the win :)

(I would not recommend MySQL for new applications today, although I might recommend it over Oracle…)

graemep 2 years ago | | |

I have developed a lot of Python based websites (mostly Django), some quite complex, and I have very rarely seen anything that takes seconds to load - sometimes some database queries have been slow. In most cases load time is dominated by loading JS and images.

> The reasons it was slow were all the usual culprits - a misused ORM being the main one.

So, slow queries.

I have not had such bad issues with dependency management either. Not even with old stuff someone else wrote years ago.

Spivak 2 years ago | | |

I'm so confused by this. Python is really fast. This isn't to say that other languages aren't a lot faster, but I can afford to be ungodly wasteful with CPU-bound tasks (in "leaf" programs; don't worry, I'm not doing this in libraries to be consumed by others) because it literally doesn't matter. The IO to call print(), write a log, or read a file on disk dwarfs the time actually spent running Python code, and this is before using the new JIT.

I wouldn't number crunch in Python without something like numpy, because you'll pay the cost of Python's dynamism for nothing, but a lot of work has gone into making Python's primitives and standard library performant. I steal algorithms from CPython all the time.

nonethewiser 2 years ago | | |

I'm curious what the ORM misuse was, because Python can obviously handle a lot higher loads than that. Perhaps the ORM is to blame for offering some footgun. Or maybe the developer did something impossibly idiotic.

GuestHNUser 2 years ago | |

ImPlot is small and worth checking out if you don't want to make the plotting functions yourself. https://github.com/epezent/implot

axchizhov 2 years ago | |

So, instead of printing a pretty plot in 2 lines of code, you will be... making these 20k syscalls yourself?

You have such an unusual hobby, my friend!

t8sr 2 years ago | | |

It doesn’t take 20k syscalls to print a plot; the 20k syscalls are for the import call. I would hope that drawing plots takes a lot less.

To engage with your point: loading a dynamic library in a regular language takes significantly less than 20k syscalls. Probably 20-40 for C on Linux. Python is uniquely inefficient. On most plots comparing resource use by different languages, in order to even show python together with regular languages like Java and C, either you use the log scale, or everything but Python is shown as a single point.

Of course, most people use Python to glue together stuff written in C, so it’s not that big of a deal, but it becomes a problem when people forget pure Python code is literally hundreds or thousands of times slower than a “regular” program doing the same thing.

hnlmorg 2 years ago | | |

Python will do a lot under the hood that a hand-rolled C solution wouldn’t. So I wouldn’t expect the C equivalent to make the same number of syscalls as Python.

rising-sky 2 years ago | | |

Ha!

chx 2 years ago | |

Forth?? Now, that's a name I haven't heard in a long time. A long time.

pdonis 2 years ago | |

> I'm going to go back to learning more C and Forth

Why would you expect that to decrease the number of syscalls you need? The syscalls are there because the program needs the OS to do things. That need is driven by the application domain, not by the programming language you use.

Jtsummers 2 years ago | | |

> The syscalls are there because the program needs the OS to do things.

Maybe some of them are. Many of those syscalls are there because Python (not the core program someone is creating, but rather its platform) needs the OS to do things.

Importing an empty python file takes 28 syscalls (30 measured by their tool, but the last two are closing out the trace not actually related to the import). 29 syscalls if you have any text in it (presumably more for larger files).

The logical equivalent in C of many portions of Python's import process happens at compile and link time, not during execution. So while it might not be a pleasant experience to develop, a C equivalent of many Python programs would involve far fewer syscalls at execution time.

pixelesque 2 years ago | | |

Python's module importing / $PYTHONPATH lookup/traversal is incredibly inefficient, especially with cold FS caches...

I've worked at places where we've significantly patched the logic (in a way which breaks compatibility in some cases, so couldn't be up-streamed) which makes Python startup / module loading with hundreds of paths in $PYTHONPATH orders of magnitude faster...
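To get a feel for why long search paths hurt: the default path-based finder tries entries in order, so a module that lives in a late entry is found only after the earlier ones have been checked. The sketch below reports how far down a path list a module is found. `entries_searched` is a made-up helper, and it says nothing about how the patched logic mentioned above works.

```python
import sys
from importlib.machinery import PathFinder


def entries_searched(module_name, search_path):
    """Return (position, entry) for the first path entry providing module_name.

    position is 1-based. If no entry provides the module, the whole list was
    searched and entry is None.
    """
    for position, entry in enumerate(search_path, start=1):
        if PathFinder.find_spec(module_name, [entry]) is not None:
            return position, entry
    return len(search_path), None


if __name__ == "__main__":
    position, entry = entries_searched("json", sys.path)
    print(f"json found after checking {position} of {len(sys.path)} entries: {entry}")
```

Each checked entry costs at least some filesystem access the first time, which is where cold caches and hundreds of entries add up.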

floating-io 2 years ago | | |

Unless the programming language you use happens to perform 20,000 system calls before it ever even runs a line of your actual code...

shadycuz 2 years ago |

I also wrote a somewhat similar tool. I call it deep-ast. It's pretty flexible in what it can track. I used it when refactoring some code in urllib3, to see what Exceptions could get raised along a given code path.

https://github.com/DontShaveTheYak/deep-ast

albertgoeswoof 2 years ago |

Today I asked a devops engineer to tell me how much time a long (3 seconds avg) API call was spending on database queries, application logic, network, etc. He couldn’t understand the request and instead opened up the Azure console and recommended we increase the number of CPU cores / memory if performance is an issue.
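For anyone wondering what such a breakdown looks like at its crudest, here is a minimal sketch using only the standard library. The `timed` helper, the labels, and the `handle_request` body are all hypothetical, with sleeps standing in for real database and network work. A proper tracing tool would give far more detail.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled phase.
totals = defaultdict(float)


@contextmanager
def timed(label):
    """Add the elapsed wall-clock time of the with-block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def handle_request():
    """Hypothetical request handler with stand-ins for the real work."""
    with timed("database"):
        time.sleep(0.02)                      # stand-in for running queries
    with timed("application"):
        sum(i * i for i in range(10_000))     # stand-in for business logic
    with timed("network"):
        time.sleep(0.01)                      # stand-in for a downstream call
    return dict(totals)


if __name__ == "__main__":
    for label, seconds in handle_request().items():
        print(f"{label:<12} {seconds * 1000:7.1f} ms")
```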

I look at posts like this and cry.

jiggawatts 2 years ago | |

I keep telling people that a hundred buses will let you take a hundred times more people, but nobody will get to their destination a hundred times faster.

Inevitably this comment is followed by quiet blinking as they digest this and then this question: “Are you saying we need to scale up one hundred times bigger?”

Sigh…

QuercusMax 2 years ago | | |

Really makes one wonder what kind of thought process these people go through, and what kind of education they actually had.

tomrod 2 years ago | |

Your DevOps engineer needs to learn more about observability. Jaeger or similar could be helpful.

Also, with DevOps pushing out traditional administrators, companies are often spending way more on infra than needed.

turtlebits 2 years ago | |

Implementing traces should be done by devs, not the infrastructure team. Devops should implement/support the platform that supports traces.

jiggawatts 2 years ago | | |

In Azure, use Application Insights for this. It's easy to set up and shows distributed performance traces across multiple services in a GUI.

latency-guy2 2 years ago | | |

Disagree, DevOps teams should be looking for resources that are being hit hard unnecessarily and request moves to better solutions when possible.

DevOps teams should be looking at CPU spikes, and should be performing RCAs, and they should be maintaining resources in a healthy state, and they should reject/revert changes and notify problem areas in code by product focused devs.

Product devs, for the most part, are only implementing human lex traces to debug business logic when it arises. Product devs are not equipped with the knowledge to identify system errors that are not "bugs in the code", i.e. they will not be good at telling you why SPROC_LXC1 fails as a result of making an ExcelParserFactoryFactory.

RetroTechie 2 years ago | |

> Today I asked a devops engineer (..) He couldn’t understand the request

"Engineer"?!? Given the above description, that makes me cry.

ok123456 2 years ago |

If it's a large project, I'll use local imports to defer this cost only around where I'm plotting. That way, if I have another entry point that only does computation or is part of a larger system like a web application, it won't have this sort of overhead.
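A minimal sketch of that pattern, with the stdlib `statistics` module standing in for a heavy plotting library such as seaborn. The function names are invented for illustration.

```python
def summarize(values):
    """Computation-only entry point: imports nothing beyond builtins."""
    return {"count": len(values), "total": sum(values)}


def middle_value(values):
    """Stand-in for the plotting code path.

    The import is deferred to call time, so programs that never call this
    function never pay for loading the module. `statistics` is only a
    stand-in here for a heavy dependency.
    """
    import statistics  # loaded on first call, then cached in sys.modules

    return statistics.median(values)


if __name__ == "__main__":
    print(summarize([1, 2, 3]))
    print(middle_value([1, 2, 3, 4]))
```

After the first call the module sits in `sys.modules`, so repeated calls only pay for a dictionary lookup.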

rldjbpin 2 years ago |

https://archive.is/XVWSO

Aggressive anti-adblock plugin used.

mrnonchalant 2 years ago |

Very cool! I liked the instruction counting article as well.