A Story Of realloc (And Laziness)(blog.httrack.com) |
A Story Of realloc (And Laziness)(blog.httrack.com) |
void *realloc(void *ptr, size_t size) {
void *nptr = malloc(size);
if (nptr == NULL) {
free(ptr);
return NULL;
}
memcpy(nptr, ptr, size); // KABOOM
free(ptr);
return nptr;
}
Line marked KABOOM copies $DEST_BYTE_COUNT, rather than $SOURCE_BYTE_COUNT.Say you want to realloc a 1 byte buffer to a 4 byte buffer - you just copied 4 bytes from a 1 byte buffer which means you're reading 3 bytes from 0xDEADBEEF/0xBADF000D/segfault land.
EDIT: Also, this is why the ENTIRE PREMISE of implementing your own reallocator speced to just the realloc prototype doesn't make much sense. You simply don't know the size of the original data with just a C heap pointer as this is not standardized AFAIK.
If you're reimplementing realloc() it's pretty easy to know the size of the allocated regions - you just need to store the size somewhere when you allocate a block. One common method is to allocate N extra bytes of memory whenever you do malloc() to hold the block header and return a pointer to (block_address + N) to the user. When you then want to realloc() a block, just look in the block header (N bytes before the user's pointer) for the size.
The block header can store other useful stuff, like debugging information. I once implemented a memory manager for debugging that could generate a list of all leaked blocks at the end of the program with the file names and line numbers where they were allocated.
"If the space cannot be allocated, the object [*ptr] shall remain unchanged."On several occasions I have wanted to use mmap, mremap, and friends more often to do fancy things like copy-on-write memory. However, I always find this whole area depressingly poorly documented, and hard to do (because if you mess up a copy-on-write call, it just turns into a copy it seems, with the same result but less performance).
While it's good realloc is clever, I find it increasingly embarassing how badly C (and even worse, C++ which doesn't even really have realloc (as most C++ types can't be bitwise moved) memory allocation maps to what operating systems efficiently support.
There have been C++ templates written that use jemalloc-specific calls; for instance see Folly from facebook. I haven't taken a close look, but I know they do some jemalloc-specific code: https://github.com/facebook/folly/tree/master/folly
The other allocated-related thing that C++ really wants (and could benefit C as well) is "sized deallocation". Most of the time you often know the exact size of the memory you allocated. If you could pass that to free() the allocator could save some work determining it. In the case of C++ the compiler can often do this on your behalf for many "delete" calls (at least in the cases where it knows the exact type). Google did an implementation of this idea and got good results. They submitted a proposal to the standards body but I don't know if there is any recent activity. I hope it does happen though: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n353...
buffer = realloc(buffer, capa);
Yeah, 'cause when it fails we didn't need the old buffer anyway... Might as well leak it.As someone else pointed out, the example call of realloc is also incorrect.
edit: also, malloc is incorrect for three reasons: 1) sbrk doesn't return NULL on failure, 2) a large size_t length will cause a contraction in the heap segment rather than an allocation, and 3) sbrk doesn't return a pointer aligned in any particular way, whereas malloc must return a pointer suitably aligned for all types.
This is implementation coupling at its worst. Don't do it.
But say you have 4K page table size. You malloc() in turn a 2K object, a 256K object, and another 2K object, ending up with 2K, 256K, 2K in memory. Then your 256K is not aligned on a page boundary. If you realloc() the 256K it has to move since it's surrounded by two objects. When you do that, you'll wind up with the two pages on the end being mapped to multiple addresses. Which is actually just fine...Interesting...
In fact, that's what the article already explains: the large alloc will just end up being passed through to the kernel, which only deals at page granularity.
But what does this mean for locality? Will I be thrashing the cache if I use realloc frequently? Do I even have the promise that malloc will return unfragmented memory?
Working with Linux and having the source (and the ability to change it) for entire stack all the way down to and including the kernel is liberating. As a programmer it feels like the entire world is open to me.
I guess that's GNU's dream brought to life, really.
Wrong? Are there lots of ways allocation can fail besides low memory conditions?
http://www.scvalex.net/posts/6/ http://www.drdobbs.com/embedded-systems/malloc-madness/23160...
There are a few caveats to this.
Firstly, malloc actually can fail, not because it runs out of memory, but because it runs out of address space. If have 2^64 bytes of memory in your address space already (2^48 on most practical machines, i believe), then there is no value malloc could return that would satisfy you.
Secondly, this behaviour is configurable. An administrator could configure a Linux system not to do this, and instead to only allocate address space that can be backed with memory. And actually, some things i have read suggest that overcommit is not unlimited to begin with; the kernel will only allocate address space equal to some multiple of the memory it has.
Thirdly, failure is conserved. While malloc can't fail, something else can. Linux's behaviour is essentially fractional reserve banking with address space, and that means that the allocator will sometimes write cheques the page tables can't cash. If it does, if it allocates more address space than it can supply, and if every process attempts to use all the address space that it has been allocated, we have the equivalent of a run on the bank, and there is going to be a failure. The way the failure manifests is through the action of the out-of-memory killer, which picks one process on the system, kills it, and so reclaims the memory allocated to it for distribution to surviving processes:
http://linux-mm.org/OOM_Killer
The OOM killer is a widely-feared bogeyman amongst Linux sysadmins. It sometimes manages to choose exactly the wrong thing as a victim. At one time, and perhaps still, it had a particular grudge against PostgreSQL:
http://thoughts.davisjeff.com/2009/11/29/linux-oom-killer/
And in the last month or so, on systems where i work, i have seen a situation where a Puppet run on an application server provoked the OOM killer into killing the application, and another where a screwed up attempt to create a swap file on an infrastructure server provoked it into killing the SSH daemon and BIND.
I don't know about what other operating systems do. Apparently all modern unixes overcommit address space in much the same way as Linux. However, i can't believe that FreeBSD handles this as crassly as Linux does.
I'd guess that it varies a lot by domain and project but from what I've seen, pretty common.
> I'd think that if allocs began to fail, there was no recovery anyhow
I think this is what both high-level languages and the Linux "over-commit-by-default" policy have convinced people is the normal behavior. However in my experience it's not that hard to make OOM simply bubble up the stack and have all the callers up the stack free their resources, then let the rest of the program keep running. It doesn't have to be a catastrophic event. You just have to be consistent about handling it, and write code expecting it.
> Are there lots of ways allocation can fail besides low memory conditions?
To think of a few, there's running out of memory, but there's also running out of address space. The latter is not so hard to accomplish on a 32-bit system. You could ask for a chunk of memory where, if you could coalesce all the free space throughout the heap, you may have enough space, but you can't make it into a contiguous allocation.
On Windows I've also seen the kernel run out of nonpaged pool, which is more space constrained than the rest of memory. I've seen this when a lot of I/O is going on. You get things like WriteFile failing with ERROR_NOT_ENOUGH_MEMORY.
For me the funniest part has been that the people who seem entitled to write sloppy software are the exact same set who would have the shrillest voices complaining that firefox is so slow and bloated (although its not anymore)
Many believe that its OK to hog memory, that it is an infinite resource. Many believe it is OK to be slow as long as it meets specs. Many believe your user application is the only application that the user will be running at any point in time. However, when your competition does it leaner and faster, you (not you personally, a generic software) are mostly going to be toast.
Memory can be allocated beyond RAM size, so by the time a failure occurs your program really should crash and return its resources.
Embedded systems have fewer resources and some will not have virtual memory and so the situation will be different. But unless you know better, the best practice is still to not check the return from allocators. Running out of memory in a program intended for an embedded platform should be considered a bug.
What do you mean by this? malloc returns memory that is contiguous in the virtual address space. It may not be contiguous in the physical address space, but that should be irrelevant for cache behavior.
Will I be thrashing the cache if I use realloc frequently?
I suppose. But if you use realloc, you should anyway ensure that you realloc geometrically growing chunks of memory (e.g., whenever you need a new buffer, you multiply its size by a constant factor like 1.2 instead of just adding an element at a time). As a result, realloc() should be infrequent enough that it normally doesn't matter.
On the other hand that's not an excuse for not understanding how things work and just having faith in some magic layer that somehow would just handle things for you. Doing so it could make it impossible to improve those parts of the system that are important without rewriting everything.
I don't know about what other operating systems do.
Apparently all modern unixes overcommit address space in
much the same way as Linux. However, i can't believe that
FreeBSD handles this as crassly as Linux does.
Solaris does not (generally, unless you use MAP_NORESERVE w/ mmap).In general, the Linux kernel's default OOM behaviour is undesirable for the vast majority of enterprise use cases. That's why RedHat and many other vendors used to disable it by default (unknown who still does).
Why is it bad? Simple; imagine your giant database is running with a large address space mapped. Another program decides to allocate a large amount of memory. The Linux kernel sees the tasty database target, kills it, and gives the smaller program its memory. Congratulations, your database just went poof.
There's an article that discusses the advantages/disadvantages with respect to Solaris here:
http://www.oracle.com/technetwork/server-storage/solaris10/s...
The article (while old) still applies today.
True. However, you can disable this behavior if you like by running 'sysctl vm.overcommit_memory=2'; see proc(5).
Pretty close to true but I think that is a bit of a simplification. I seem to recall for instance on 32-bit Linux it's not hard to get malloc to return NULL: ask for some absurd size, like maybe a few allocations of a gigabyte or two, something that fits in a size_t but a 32-bit address space could not possibly accommodate with all the other things in the address space (stacks, your binary, libraries, kernel-only addresses in the page table, etc).
Sized deallocation made it into C++14.
That's good news about sized deallocation, I hadn't noticed that there is an updated "N3778" proposal which apparently was accepted. I still haven't seen the dlmalloc work to support that show up in the main svn branch.
>>By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.
So it's like airlines overbooking seats; the system just hopes that the memory is available when you actually try to use it. I had no idea. That would be an extremely annoying bug to try and track down. How would one even do it? Is there a way to test if you truly have the memory without segfaulting?
If you want to ensure that you don't block, you can use mlock().
Think of the memory requirements of the fork() system call. It clones the current process, making an entire copy of it. Sure, there's lots of copy-on-write optimisation going on, but if you want guaranteed, confirmed memory, you need to have a backing store for that new process. The child process has every right to adjust all of its process memory.
So if a 4GB process calls fork(), you will suddenly need to reserve 4GB of RAM or new swap space for it. Or if you can't allocate that, you will have to make the fork() fail.
This can be terrible for users, since most often a process is going to fork() and then exec() a very small program. And it seems nonsensical for fork() to fail with ENOMEM when it appears that there is lots of free memory left. But to ensure memory is strictly handled, that's what you have to do.
The alternative, which most distributions use, is to optimistically allocate memory. Let the fork() and other calls succeed. But you run the risk of the child process crashing at any point in the future when it touches its own 'confirmed' memory and the OS being unable to allocate space for it. So the memory failures are not discovered at system call points. There's no return code to spot the out of memory condition. The OS can't even freeze up until memory is available because there's no guarantee that memory will become available.
http://opsmonkey.blogspot.co.uk/2007/01/linux-memory-overcom... has some more info.
That is, when you call shrink_to_fit, it first copies all elements it has, swaps itself with the new copy, and deletes the original vector. (Probably because there's no portable way of returning only part of a contiguous memory region. Yikes.)
http://stackoverflow.com/questions/2695552/how-to-shrink-to-...
(shrink_to_fit does need to (attempt to) get a smaller allocation, otherwise or wouldn't be shrinking the vector.)
Why use realloc when growing an array you have an interface to? Just add another allocated buffer to the previous allocated areas. When there are too many small areas, consolidate with realloc/free.
Much faster (yes yes, almost always).
(Disclaimer: Last time I used C++ I had hair. :-) )
Edit: OK, thanks plorkyeran.
Well, there is no grow() method on std::vector, so ... no?
But generally speaking, std::vector implementations are basically required to[1] grow the backing store exponentially so that adding elements to them absolutely does not call malloc on every growth of the vector.
You can make a vector do this kind of pessimistic allocation by calling reserve() for every element you add, which will cause the allocation of exactly the amount you reserved. This would be dumb, though. Reserve is there so you can allocate a precise large number and avoid even the logarithmic cost of allocation in adding elements to the vector.
It's really worth noting that this is a better worst case than the worst case for realloc(), which is entirely entitled to reallocate and copy every single time you call it. You're pretty likely to implement the exact same algorithm as vector if you DIY because of this exact issue when performance is important.
I do agree, though, that it would be nice if there was a failable realloc in C++ (and C for that matter) as described above, where it simply returns NULL if there's no more room in the allocated space. What to do in that event should really be up to the caller, not a black box algorithm sensitive to all sorts of variables.
[1] Because push_back() has a requirement of having amortized constant complexity, which means that it would be non-conforming to have the entire array moved for every push. http://www.cplusplus.com/reference/vector/vector/push_back/
[N] However, std::basic_string allows linear complexity on its push_back, so that may be what the poster meant. I'm not aware of any widely used implementation that actually does it in worse than amortized constant, though.
Of course, there is no reason to not do all your allocation through a wrapper function which does check and abort on failure. I think the point was that surviving malloc failures is a dubious approach - instead go all in, or if it's a long-running service, provide a configurable max memory cap and assume that much will be available.
I'll take the clear error message.
In the general case, however, if allocating 100 bytes fails, reporting that error is also likely to fail. An actual memory allocation failure on a modern computer running a modern OS is a very rare and very bad situation. It's rarely recoverable.
It's not bad to handle allocation failures, but in the vast majority of cases it's very unreasonable to do so. You can write code for it if you want, have fun.
And just to be completely clear, I am ONLY talking about calls to malloc, new, realloc, etc. NOT to OS pools or anything like that. Obviously, if you allocate a 4Mb buffer for something (or the OS does for you), you expect that you might run out. This is ONLY in regards to calls to lower level heap allocators.
I don't think you'll find any experienced programmer recommending that you always check the return from malloc. That's completely absurd. There are always exceptions to the rule, however.
I call BS on this. First of all, it's not the 100 byte allocation that is likely to fail; chances are it's going to be bigger than 100 bytes and the 100 byte allocation will succeed. (Though that is not 100% either.) Second, the thing you're going to do in response to an allocation failure? You're going to unwind the stack, which will probably lead to some temporary buffers being freed. That already gets you more space to work with. (It's also untrue that you can't report errors without allocating memory but that's a whole other story...)
I suspected when I wrote in this thread that I'd see some handwavy nonsense about how it's impossible to cleanly recover from OOM, but the fact is I've witnessed it happening. I think some people would just rather tear down the entire process than have to think about handling errors, and they make up these falsities about how there's no way to do it in order to self justify... Although, when I think back to a time in which I shared your attitudes, I think the real problem was that I hadn't yet seen it being done well.
The space I work in deals with phones and tablets, as well as other embedded systems (TVs, set-top boxes, etc.) that tend to run things people think of as "modern" (recentish Linux kernels, userlands based on Android or centered around WebKit), while having serious limits on memory and storage. My desktop calendar application uses more memory than we have available on some of these systems.
In these environments, it is essential to either avoid any possibility of memory exhaustion, or have ways to gracefully deal with the inevitable. This is often quite easy in theory -- several megabytes of memory might be used by a cached data structure that can easily be re-loaded or re-downloaded at the cost of a short wait when the user backs out of whatever screen they're in.
But one of the consequences of this cavalier attitude to memory allocation is that even in these constrained systems, platform owners have mandated apps sit atop an ever-growing stack of shit that makes it all but impossible for developers to effectively understand and manage memory usage.
(When you fill up the sub-buffers, you do a realloc to all of them. Future sub-buffers get the new size.)
This needs less copying and (potentially) free calls. On the other hand: An access would need some extra operation to find the right buffer and get an offset into it.
Edit: Ok, thanks. I explained if I was unclear, since you went off on a tangent. It was informative, so it is all good. (I'm not going back to C++ et al anyway. :-) )
The standard library actually does have a built in container that's (almost) exactly as you describe, though. It's called std::deque (http://www.cplusplus.com/reference/deque/deque/).