Mimalloc Cigarette: Losing one week of my life catching a memory leak (Rust) (pwy.io)

136 points by Patryk27 1 year ago | 108 comments

hinkley 1 year ago |

We had learned helplessness on a drag and drop bug in jQuery UI. I had like three hours every second or third Friday and would just step through the code trying to find the bug. That code was so sketchy the jQuery team was trying to rewrite it from scratch one component at a time, and wouldn’t entertain any bug discussions on the old code even though they were a year behind already.

After almost six months, I finally found a spot where I could monkey patch a function to wrap it with a short circuit if the coordinates were out of bounds. Not only fixed the bug but made drag and drop several times faster. Couldn’t share this with the world because they weren’t accepting PRs against the old widgets.

I’ve worked harder on bug fixes, but I think that’s the longest I’ve worked on one.

giancarlostoro 1 year ago | |

One of my favorite elusive bugs was fixed with a one-line change. I didn't understand the problem because nobody could reproduce it or show it to me. Months later, after my boss had told his boss it was fixed (despite never being able to test that), I figured it out and fixed it. We had a gift card form whose data we stored in localStorage. If for any reason the person left the tab and came back months later, it would show the old gift card with its outdated balance. It was a client-side bug. The fix was to use sessionStorage.

arghwhat 1 year ago | | |

For web, my favorite is JIT miscompilations. A tie between a mobile Safari bug that caused basic math operations to return 0 regardless of input values (basic, positive Numbers, no shenanigans), or a mobile Samsung browser bug where concatenating a specific single-character string with another single-character string would yield a Number.

Debugging errors in JS crypto and compression implementations that only occur at random, after at least some ten thousand iterations, on a mobile browser back when those were awful, and only if the debugger is closed/detached (opening it disabled the JIT), was not fun.

It taught me to go into debugging with no assumptions about what can and cannot be to blame, which has been very useful later in even trickier scenarios.

contingencies 1 year ago | | |

It seems in the context of your story the old adage that organizations reproduce software in their own architecture again rings true, with multilayered bureaucracy, lies and promises resulting in "client state".

WalterBright 1 year ago | |

My longest one was an uninitialized declaration of a local variable, which acquired ever-changing values.

This is why D, by default, initializes all variables. Note that the optimizer removes dead assignments, so this is runtime cost-free. D's implementation of C, ImportC, also default initializes all locals. Why let that stupid C bug continue?

Another that repeatedly bit me was adding a field, and neglecting to add initialization of it to all the constructors.

This is why D guarantees that all fields are initialized.

hinkley 1 year ago | | |

The first bug I remember writing was making native calls in Java to process data. I didn’t understand why in the examples they kept rerunning the handle dereference in every loop.

If native code calls back into Java, and the GC kicks in, all the objects the native code can see can be compacted and moved. So my implementation worked fine for all of the smaller test fixtures, and blew up half the time with the largest. Because I skipped a line to make it “go faster”.

I finally realized I was seeing raw Java objects in the middle of my “array” and changing the value of final fields into illegal pairings which blew everything the fuck up.

ckocagil 1 year ago | | |

Valgrind didn't catch it?

kibwen 1 year ago |

Level 1 systems programmer: "wow, it feels so nice having control over my memory and getting out from under the thumb of a garbage collector"

Level 2 systems programmer: "oh no, my memory allocator is a garbage collector"

matklad 1 year ago | |

The answer is clear: just don’t have a malloc implementation in your process' address space!

thebruce87m 1 year ago | | |

Welcome to embedded! It’s no heaps of fun!

poikroequ 1 year ago | | |

A bump allocator is all anyone really needs

seanthemon 1 year ago | |

At the very bottom of everything is a garbage collector..

hinkley 1 year ago | | |

Soil is just the biggest swap meet in the world. Where every microbe, invertebrate and tree is just looking for someone else’s trash to turn into treasure.

riwsky 1 year ago | | |

Market forces: the ultimate garbage collector

ckocagil 1 year ago | |

"stackoverflow please help me how do i fix memory fragmentation"

amelius 1 year ago | |

Level 3 systems programmer: "get me out of this straitjacket and give me my garbage collector back so I can get stuff done"

ComputerGuru 1 year ago | | |

That's not how system programmers think..

forrestthewoods 1 year ago | | |

No. Just no.

As painful as the debugging story was, I have spent vastly more time working around garbage collectors to ship performant code.

Arnavion 1 year ago |

jemalloc also has its own funny problem with threads - if you have a multi-threaded application that uses jemalloc on all threads except the main thread, then the cleanup that jemalloc runs on main thread exit will segfault. In $dayjob we use jemalloc as a sub-allocator in specific arenas. (*) The application itself is fine in production because it allocates from the main thread too, but the unit test framework only runs tests in spawned threads and the main thread of the test binary just orchestrates them. So the test binary triggers this segfault reliably.

( https://github.com/jemalloc/jemalloc/issues/1317 Unlike what the title says, it's not Windows-specific.)

(*): The application uses libc malloc normally, but at some places it allocates pages using `mmap(non_anonymous_tempfile)` and then uses jemalloc to partition them. jemalloc has a feature called "extent hooks" where you can customize how jemalloc gets underlying pages for its allocations, which we use to give it pages via such mmap's. Then the higher layers of the code that just want to allocate don't have to care whether those allocations came from libc malloc or mmap-backed disk file.

CraigJPerry 1 year ago |

Tangent: what’s the ideal data structure for this problem?

If there were 20 million rooms in the world with a price for each day of the year, we’d be looking at around 7 billion prices per year. That’d be say 4 TB of storage without indexes.

The problem space seems to have a bunch of options to partition - by locality, by date etc.

I’m curious if there’s a commonly understood match for this problem?

FWIW with that dataset size, my first experiments would be with SQL server because that data will fit in ram. I don’t know if that’s where I’d end up - but I’m pretty sure it’s where I’d start my performance testing grappling with this problem.
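The room-count estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 8 bytes per price is an assumption (one 64-bit integer holding a price in cents), not a figure from the comment, and a real row would also carry keys such as room and date.

```rust
// Back-of-the-envelope sizing for the pricing table discussed above.
// ROOMS and DAYS_PER_YEAR come from the comment; bytes_per_price is an assumption.
const ROOMS: u64 = 20_000_000;
const DAYS_PER_YEAR: u64 = 365;

/// Number of (room, day) prices stored for one year.
fn prices_per_year() -> u64 {
    ROOMS * DAYS_PER_YEAR
}

/// Raw storage in bytes, ignoring keys, indexes and row overhead.
fn raw_bytes(bytes_per_price: u64) -> u64 {
    prices_per_year() * bytes_per_price
}

fn main() {
    println!("prices per year: {}", prices_per_year());
    // Assumption: each price is a single 8-byte integer (cents).
    println!("raw bytes at 8 B/price: {}", raw_bytes(8));
}
```

Under that assumption the raw prices come to about 58 GB. A 4 TB total would correspond to roughly 550 bytes per stored price, so the final size depends heavily on what else is stored with each price.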

jrpelkonen 1 year ago | |

I think your premise is somewhat off. There might be 20 million hotel rooms in the world, but surely they are not individually priced, e.g. all king bed rooms in a given hotel have the same price on a given day.

loeg 1 year ago |

Sort of tl;dr: mimalloc doesn't actually free memory in a way that it can be reused on threads other than the one that allocated it; the free call marks regions for eventual delayed reclaim by the original thread. If the original thread calls malloc again, those regions are collected (1/N malloc calls). Or (C) you can explicitly invoke mi_collect[1] in the allocating thread (the Rust crate does not seem to expose this API).

[1]: https://github.com/microsoft/mimalloc/blob/dev/src/heap.c#L1...
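The delayed-reclaim behavior summarized above can be illustrated with a toy model. This is a sketch under stated assumptions, not mimalloc's implementation: `ToyHeap`, `RemoteFree` and `remote_free_demo` are hypothetical names, blocks are plain integers, and the owner collects on every `alloc` call rather than on 1/N of them.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy per-thread heap. Hypothetical model only, not mimalloc's data structures.
struct ToyHeap {
    /// Blocks the owning thread can hand out again immediately.
    free_list: Vec<usize>,
    /// Blocks freed by other threads, waiting for the owner to reclaim them.
    delayed: Arc<Mutex<Vec<usize>>>,
    /// Next fresh block id, standing in for memory requested from the OS.
    next_id: usize,
}

/// Handle that other threads use to "free" blocks owned by a ToyHeap.
struct RemoteFree {
    delayed: Arc<Mutex<Vec<usize>>>,
}

impl RemoteFree {
    /// A remote free only queues the block; it does not make it reusable.
    fn free(&self, block: usize) {
        self.delayed.lock().unwrap().push(block);
    }
}

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { free_list: Vec::new(), delayed: Arc::new(Mutex::new(Vec::new())), next_id: 0 }
    }

    fn remote_handle(&self) -> RemoteFree {
        RemoteFree { delayed: Arc::clone(&self.delayed) }
    }

    /// Number of blocks freed remotely but not yet reclaimed.
    fn pending(&self) -> usize {
        self.delayed.lock().unwrap().len()
    }

    /// Owner-side reclaim: move delayed blocks onto the usable free list.
    fn collect(&mut self) {
        let mut delayed = self.delayed.lock().unwrap();
        self.free_list.append(&mut *delayed);
    }

    /// Simplification: collect on every call, then reuse or take a fresh block.
    fn alloc(&mut self) -> usize {
        self.collect();
        self.free_list.pop().unwrap_or_else(|| {
            let id = self.next_id;
            self.next_id += 1;
            id
        })
    }
}

/// Allocates n blocks on this thread, frees them all on another thread, and
/// returns (pending, reusable before collect, reusable after collect).
fn remote_free_demo(n: usize) -> (usize, usize, usize) {
    let mut heap = ToyHeap::new();
    let blocks: Vec<usize> = (0..n).map(|_| heap.alloc()).collect();
    let remote = heap.remote_handle();
    thread::spawn(move || {
        for block in blocks {
            remote.free(block);
        }
    })
    .join()
    .unwrap();
    let pending = heap.pending();
    let before = heap.free_list.len();
    heap.collect();
    (pending, before, heap.free_list.len())
}

fn main() {
    let (pending, before, after) = remote_free_demo(1000);
    println!("pending={pending} reusable_before={before} reusable_after={after}");
}
```

In this model, if the owning thread never allocates or collects again, every remotely freed block stays in the pending queue, which is the shape of the leak described in the article.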

Arnavion 1 year ago | |

The mimalloc crate just provides the GlobalAlloc impl that can be registered with libstd as the global allocator using the `#[global_allocator]` attr.

The underlying sys crate provides the binding for mimalloc API like `mi_collect`: https://docs.rs/libmimalloc-sys/0.1.39/libmimalloc_sys/fn.mi...

rurban 1 year ago |

The Annotated C++ Reference Manual:

“C programmers think memory management is too important to be left to the computer. LISP programmers think memory management is too important to be left to the user.”

IceTDrinker 1 year ago |

PSA: do not use floating point for monetary amounts

SAI_Peregrinus 1 year ago | |

MS Excel uses floating point, and it's used a ton in finance. Don't use floating-point for monetary amounts if you don't know what rounding mode you've set.

koverstreet 1 year ago | | |

It's somewhat acceptable with double precision floats - never single precision floats.

But far better to just use integer cents.

nurettin 1 year ago | |

I have used single precision floats in my latest project just to disprove this baloney.

smh 1 year ago | | |

You are using 32 bit floats to represent money?

Does your project correctly calculate $300,000.00 + $0.01, (or even just correctly represent the value $300,000.01) and if so, how?
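The question above can be tried directly. A minimal sketch, with hypothetical function names, comparing 32-bit floats, 64-bit floats and integer cents for $300,000.00 + $0.01:

```rust
/// $300,000.00 + $0.01 in single precision dollars.
fn add_cent_f32() -> f32 {
    300_000.00_f32 + 0.01_f32
}

/// The same sum in double precision dollars.
fn add_cent_f64() -> f64 {
    300_000.00_f64 + 0.01_f64
}

/// The same sum in integer cents.
fn add_cent_integer() -> i64 {
    30_000_000_i64 + 1
}

fn main() {
    // Near 300,000 adjacent f32 values are 0.03125 apart, so one cent is lost.
    println!("f32:   {:.2}", add_cent_f32()); // prints "f32:   300000.00"
    println!("f64:   {:.2}", add_cent_f64()); // prints "f64:   300000.01"
    println!("cents: {}", add_cent_integer()); // prints "cents: 30000001"
}
```

An f32 has a 24-bit significand, so between 262,144 and 524,288 it can only represent multiples of 1/32. The value $300,000.01 itself therefore rounds to $300,000.00 before any arithmetic happens.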

zokier 1 year ago |

I wonder if there is something that could be done on language design level to have better "sympathy" to memory allocation, i.e. built upon having mmap/munmap as primitives instead of malloc/free; where language patterns are built around allocating pages instead of arbitrarily sized objects. Probably not practical for general high-level languages, but for e.g. embedded or high-performance stuff might make sense?
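One way to picture the page-first style described above is an arena that only ever acquires whole pages and bumps an offset inside the current one. This is a sketch under assumptions: `PageArena` and `Handle` are hypothetical names, the 4096-byte page size is assumed, and boxed arrays stand in for pages that a real version would obtain via mmap.

```rust
/// Assumed page size; a real implementation would query the OS.
const PAGE_SIZE: usize = 4096;

/// Location of an allocation: which page, where in it, and how long.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handle {
    page: usize,
    offset: usize,
    len: usize,
}

/// Arena that grows one page at a time and never frees individual objects.
struct PageArena {
    /// Boxed arrays stand in for mmap'd pages in this sketch.
    pages: Vec<Box<[u8; PAGE_SIZE]>>,
    /// Bytes already used in the last page.
    used: usize,
}

impl PageArena {
    fn new() -> Self {
        // Starting "full" forces the first allocation to acquire a page.
        PageArena { pages: Vec::new(), used: PAGE_SIZE }
    }

    /// Bump-allocates len bytes; requests larger than a page are rejected.
    fn alloc(&mut self, len: usize) -> Option<Handle> {
        if len == 0 || len > PAGE_SIZE {
            return None;
        }
        if self.used + len > PAGE_SIZE {
            self.pages.push(Box::new([0u8; PAGE_SIZE]));
            self.used = 0;
        }
        let handle = Handle { page: self.pages.len() - 1, offset: self.used, len };
        self.used += len;
        Some(handle)
    }

    /// Mutable view of the bytes behind a handle.
    fn bytes_mut(&mut self, h: Handle) -> &mut [u8] {
        &mut self.pages[h.page][h.offset..h.offset + h.len]
    }

    fn page_count(&self) -> usize {
        self.pages.len()
    }
}

fn main() {
    let mut arena = PageArena::new();
    let a = arena.alloc(4000).unwrap();
    let b = arena.alloc(200).unwrap(); // does not fit in page 0, so a new page is taken
    arena.bytes_mut(a)[0] = 42;
    println!("{:?} {:?} pages={}", a, b, arena.page_count());
}
```

The trade-off is visible in the example: the 96 bytes left at the end of the first page are wasted, and memory is only returned when the whole arena is dropped.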

PaulDavisThe1st 1 year ago |

A perfect demonstration of how many of the harder problems we face writing (especially non-browser-based) software are in fact not addressed by language changes.

The concept of memory that is allocated by a thread and can only be deallocated by that thread is useful and valid, but as TFA demonstrates, can also cause problems if you're not careful with your overall architecture. If the language you're using even allows you to use this concept, it almost certainly will not protect you from having to get the architecture correct.

the-smug-one 1 year ago | |

I think Rust's language design is in part to blame, as it does not force the programmer to think sufficiently of the layout of the memory, instead allowing them to defer to a "global allocator".

PaulDavisThe1st 1 year ago | | |

This identical problem could easily occur in a C or C++ codebase.

znpy 1 year ago |

> Allocators have different characteristics for a reason - they do some things differently between each other. What do you think mimalloc does that could account for this behavior?

Interestingly, it would seem that Java programmers play with garbage collectors while Rust programmers play with memory allocators.

sbt567 1 year ago | |

> Rust programmers

*system

malkia 1 year ago |

In C++, your https://en.cppreference.com/w/cpp/memory/new/new_handler should call mi_collect.

Exuma 1 year ago |

I really love the design of this blog

bsder 1 year ago |

Welcome to systems programming. Allocators are invisible--until they aren't.

om8 1 year ago |

TLDR: use shitty allocators, win shitty memory leaks