Reverting the incremental GC in Python 3.14 and 3.15 (discuss.python.org)

265 points by curiousgal 31 days ago | 130 comments

athoscouto 27 days ago |

We've been impacted by this. I migrated our services to Python 3.14 so we could attach profilers during runtime.

A couple of services looked like they had a memory leak. Memory was continuously increasing over time. Thanks to Python 3.14, we were able to use memray to understand what was going on. Those services were recreating HTTP clients (aiohttp) for every inbound request, and memory allocated by the downstream SSL lib was growing faster than it was being released.

We ended up rolling back to 3.13, which fixed the issue. I'll try again with 3.14.5.

nas 27 days ago | |

If you are using "httpx", it's likely caused by a reference cycle. I made a PR to fix it but the maintainers haven't applied it. :-( https://github.com/encode/httpx/pull/3733

The reference cycle httpx creates is kind of a worst-case scenario for the incremental GC issue. Both the generational (3.13 and older) and incremental GC are triggered by the net new "container" objects (objects that have references to others, like lists and not like ints and floats). The short summary is that you need to create more container objects before the incremental GC triggers. In the case of the httpx reference cycle, you have a relatively small number of container objects hanging on to a lot of memory, due to the SSL context data (which is a big memory hog).

Reverting to the generational GC was the wise thing to do, even though it's a bit scary to do in a bugfix release. The incremental GC works for most people, but in the minority of cases where it doesn't, it uses quite a lot more memory. I'm pretty sure that with some additional tuning the incremental GC would be fine too, but it just didn't get that tuning. The generational GC has literal decades of real-world use (Guido merged my patch in June 2000, and Tim Peters did a bunch of tuning after that to optimize it).
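To make the "container object" trigger described above a bit more concrete, here is a small sketch using only the stdlib `gc` module. It shows which kinds of objects the cycle collector tracks and prints the thresholds and counters involved. The meaning of the individual threshold and count slots differs between CPython versions and builds, so the printed numbers are illustrative only.

```python
import gc


def tracked_report():
    """Map a few sample values to whether the cycle collector tracks them."""
    samples = {"list": [], "int": 42, "float": 3.14, "str": "abc"}
    return {name: gc.is_tracked(obj) for name, obj in samples.items()}


report = tracked_report()
print(report)

# Thresholds that decide when a collection runs, and the current counters.
print("thresholds:", gc.get_threshold())

gc.disable()  # avoid a collection resetting the counters mid-demo
try:
    before = gc.get_count()
    keep = [[] for _ in range(500)]  # 500 new container objects, kept alive
    after = gc.get_count()
finally:
    gc.enable()

# On the default CPython build the first counter typically rises with new
# container allocations; one big bytes buffer would not move it at all.
print("counts before/after:", before, after)
```

This is why a handful of small container objects that each pin a large SSL context can hold a lot of memory without ever pushing the counters over the threshold.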

JimDabell 26 days ago | | |

> I made a PR to fix it but the maintainers haven't applied it. :-( https://github.com/encode/httpx/pull/3733

Unfortunately, you may be the wrong gender to contribute to Encode repositories like httpx:

> I've closed off access to issues and discussions.

> I don't want to continue allowing an online environment with such an absurdly skewed gender representation. I find it intensely unwelcoming, and it's not reflective of the type of working environments I value.

— https://github.com/encode/httpx/discussions/3784

Discussed on Hacker News here: https://news.ycombinator.com/item?id=47193563

A fork discussed here: https://news.ycombinator.com/item?id=47514603

athoscouto 26 days ago | | |

It was httpx indeed. I had aiohttp in mind because we ended up replacing that particular client with it.

rtpg 26 days ago | |

We've been chasing down similar aiohttp client creation issues (linked to ...aiobotocore usage) for months now.

It's annoying that somehow talking to S3 etc requires so much churn. We have been trying to cache session objects and the like but clearly are still missing something.

Chasing this down has also made me realize how little Python libs use `weakref`, and how many circular references they build up as a result. The other day I figured out that Django's request session infrastructure creates a circular reference, meaning that requests have to be collected by the GC to get cleaned up in CPython.

I have a suspicion that the 3.14 problems are heavily linked to "real" workloads being almost entirely filled with cyclical objects.
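The difference between a strong and a weak back-reference can be sketched as follows. `Request` and `Session` are hypothetical toy classes, not Django's. The behaviour shown relies on CPython's reference counting, so other implementations may differ.

```python
import gc
import weakref


class Session:
    def __init__(self, request, weak):
        # Back-reference to the owner: a strong one creates a cycle,
        # a weak proxy does not.
        self.request = weakref.proxy(request) if weak else request


class Request:
    def __init__(self, weak):
        self.session = Session(self, weak)


def freed_by_refcount(weak):
    """Return True if dropping the last name frees the Request immediately."""
    req = Request(weak)
    probe = weakref.ref(req)
    del req
    return probe() is None


gc.disable()  # keep the cycle collector out of the picture
try:
    strong_freed = freed_by_refcount(weak=False)
    weak_freed = freed_by_refcount(weak=True)
    found = gc.collect()  # number of unreachable objects found
finally:
    gc.enable()

print("strong back-reference freed immediately:", strong_freed)
print("weak back-reference freed immediately:", weak_freed)
print("objects the collector had to clean up:", found)
```

With the strong back-reference, the pair stays alive until a collection runs, so its memory lifetime depends entirely on when the GC decides to trigger.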

shay_ker 26 days ago | | |

It's really fascinating to read this, since I've encountered similar memory issues in other languages (ruby, go, etc.). Debugging these issues is a pain.

Is there a way to make all this much easier to debug and to prevent memory issues in the first place? Is the abstraction level not quite right?

LaFolle 27 days ago | |

On profilers - profiling will come in 3.15, are you referring to remote exec? It is a great feature I am very excited about, but at the same time I'm afraid that the company won’t allow ptrace capability in prod.

athoscouto 27 days ago | | |

Yes. Remote exec allows me to attach profilers (e.g. memray) directly to a running process. I'm also excited about the upcoming statistical (CPU) profiler in 3.15.

davidkwast 27 days ago |

"Python 3.14 shipped with a new incremental garbage collector. However, we’ve had a number of reports of significant memory pressure in production environments.

We’ve decided to revert it in both 3.14 and 3.15, and go back to the generational GC from 3.13."

Sounds like the right move to me.

NooneAtAll3 27 days ago |

I'm genuinely surprised that this Python change was even possible without a PEP.

giancarlostoro 27 days ago | |

Makes ya miss having a BDFL. Dang I didn't realize he's 70 now.

https://en.wikipedia.org/wiki/Guido_van_Rossum

zitterbewegung 27 days ago | | |

I wouldn’t recommend running the latest Python in prod. Honestly, 3.x.7 releases are the most mature.

AdamN 27 days ago | |

Yeah it seems like a miss. I guess the thinking was that it wasn't developer-facing and just an internal optimization. But of course any change to garbage collection will change the memory and cpu dynamics of the process in a material way.

metalliqaz 27 days ago | |

It's not a change to the language, it's a change to the CPython runtime.

ammar2 27 days ago | | |

PEPs aren't necessarily just for language changes, e.g. https://peps.python.org/pep-0436/ which is largely a CPython implementation detail.

NooneAtAll3 26 days ago | | |

The problem is that Python is so centralized that CPython is essentially a "reference implementation".

nilslindemann 26 days ago | |

The lesson from this seems to be to handle this via PEP next time: https://discuss.python.org/t/reverting-the-incremental-gc-in...

Fizzadar 27 days ago | |

Exactly! I would like to understand more about how that came about. PEPs exist for a reason.

bhouston 27 days ago |

.NET seems to have regularly changed the garbage collector over the years and I do not remember any similar surprises in production. I wonder why they have had better experience?

I thought that by now dynamic garbage collection was a known quantity, so that making changes, outside of outright bugs, is fairly safe and predictable?

stackskipton 27 days ago | |

One thing Microsoft does really well is eating its own dogfood and Microsoft feeds a ton of .Net dogs.

So any change to the GC starts with the massive .NET codebase at MSFT, so they get extremely good telemetry back about any downsides and might be able to fix them in time.

pjmlp 27 days ago | | |

Did really well, unfortunately.

There has been almost no dogfooding in Windows development since version 8; the TypeScript team would rather rewrite the compiler in Go; and Azure has plenty of Go, Rust, and Java projects alongside .NET.

Weryj 27 days ago | |

Actually, there’s a change in .NET 9 to how it handles the heap and GC which caused major issues for us.

I’ll confess the reason it hit us so hard is because the code quality was so low and wasteful on allocations that it didn’t hide the problem as well as previous versions.

chihuahua 27 days ago | |

I remember working on the Windows Update back-end at Microsoft around 2005, and we had a problem where it would freeze up periodically, and not surprisingly that turned out to be caused by GC. But we noticed it before shipping, and we just tweaked some GC parameters.

So I think it was not a big problem for .Net because it gave you enough control over GC, and because people tested their code before putting it in production.

bmitc 26 days ago | |

.NET and the CLR were actually designed by computer scientists and experts; not so for Python.

sega_sai 27 days ago |

I think reverting is not a problem per se, but releasing a highly problematic version of such an essential component without proper testing is.

LaFolle 27 days ago | |

Yeah, they noted that it went in without a PEP. Looks like a PEP will come now, if it keeps performance on par.

emil-lp 27 days ago |

If I understand correctly, this is one of the changes that caused the regression:

https://github.com/python/cpython/pull/117120

luodaint 26 days ago |

The problem is that you create an HTTP client for each incoming request. In other words, you recreate SSL context and cause reference cycles with each request. From the memory point of view, the program seems to have a memory leak. However, the solution to the issue lies at the application level: just create your client once and reuse it for each subsequent request.

One heuristic for finding memory leaks can be stated as follows: if you instantiate any HTTP or connection objects inside the request handler, it's likely that you've made a mistake.
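The "create once, reuse" advice can be sketched like this. `FakeClient` is a hypothetical stand-in for a real HTTP client that owns an expensive SSL context, and the handler names are made up for illustration; no real library API is being shown.

```python
class FakeClient:
    """Hypothetical stand-in for an HTTP client with a costly SSL context."""

    created = 0  # how many clients have been constructed so far

    def __init__(self):
        FakeClient.created += 1

    def get(self, url):
        return f"response from {url}"


# Anti-pattern: a new client (and SSL context) for every inbound request.
def handle_request_leaky(url):
    client = FakeClient()
    return client.get(url)


# Preferred: build the client once at startup and share it.
shared_client = FakeClient()


def handle_request(url):
    return shared_client.get(url)


def count_clients(handler, n=100):
    """Run n requests through handler and count the clients it constructs."""
    before = FakeClient.created
    for i in range(n):
        handler(f"https://example.invalid/{i}")
    return FakeClient.created - before


leaky = count_clients(handle_request_leaky)
reused = count_clients(handle_request)
print("clients built per 100 requests:", leaky, "vs", reused)
```

With httpx this usually means one long-lived `httpx.AsyncClient` created at application startup and closed at shutdown, rather than one per handler call.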

insumanth 26 days ago |

This is the first time I've come across a change (a big one) that was implemented without going through a PEP. I thought it was standard.

nodesocket 27 days ago |

If you're using containers, I believe this change was pushed in the image python:3.14.5-slim-trixie

emmanuelsemugga 26 days ago |

This is very salient and deserves deep consideration.

__loam 27 days ago |

Python is such a mess.

rmwaite 27 days ago | |

Any language of Python’s size and popularity will be a mess, the only difference is what parts of it.

__loam 26 days ago | | |

No, python is specifically a mess.

askllk 27 days ago |

All these issues were known in previous attempts for removing the GIL. But if Instagram/Meta want it, everyone stands to attention and finds out the obvious problems years later. Kind of like in geopolitics.

I hope Meta switches Instagram to PHP/Hack so they leave Python alone.

simonw 27 days ago | |

The no-GIL work (free-threading) is unrelated to this incremental GC work.

Free-threading actually uses its own, separate GC: https://labs.quansight.org/blog/free-threaded-gc-3-14

brianwawok 27 days ago |

In the world of AI written code, Python just doesn’t make sense. Converted about 100k lines in the last few months to golang and the performance is life changing. Curious if we will see global Python adoption fall by 75% or more in the next few years.

mau 27 days ago | |

I think humans are still accountable for the code generated by agents.

You are free to switch language but you still need to understand it.

tdb7893 27 days ago | | |

With a similar amount of experience with both languages I found Go much easier to read. I've always been a bit miffed why Python is seen as easy to read for experienced developers. I get the syntax is good for short code or people with little experience but my experience is those readability benefits went away quickly with time or complexity.

lern_too_spel 27 days ago | |

For personal projects, yes. For code going into production, you still need human code review, and that has to happen in a language that the humans you've hired are comfortable with. One day, we'll all be YOLOing vibe code straight into production, but that day is not today.

JackSlateur 27 days ago | | |

But that day is not today .. unless you are working for microslop or clownflare ? Half-kidding, sorry :)

backwardation_b 27 days ago | |

Nothing about the performance characteristics of Python changed with AI, so why would you use Python over golang if performance is a requirement/bottleneck? I'm trying to understand the reasoning, since to me golang and Python are equally simple to write and understand.

Yokohiii 27 days ago | | |

If language X is a person's comfort zone, that person will often default to it. Python is certainly more widespread than Go.

Also, even if it looks like that to you, there are still people that write code with their own hands.

phainopepla2 27 days ago | | |

Regardless of whether golang and python are actually equally simple, python certainly has the reputation of being easier to write and read than almost any other language. That is a big part of its popularity.