A curious case of O(N^2) behavior which should be O(N) (2023)(gist.github.com) |
A curious case of O(N^2) behavior which should be O(N) (2023)(gist.github.com) |
...oops
> The total amount of space needed to represent this collection of strings is O(k n^2).
I haven’t seen O-notation ever represent ram usage, just algorithm complexity. Is this common?
My guess would be because it was implemented in c, where the usual practice if you need a container type other than a fixed size array is to implement it yourself. IME, c code tends to use linked lists for a lot of things that in most other languages would be implemented using a better suited, and more performent, container type from the standard library.
One way that other languages can outperform c, is it is easier to use the right data structure.
I guess a flat array could still be debatable if you're usually inserting near but not at the end, as it'll still need to move all the following elements. But it seems dubious.
Perhaps a simple way to look at it is that it's like a dynamic array, but when the capacity is exceeded, you don't reallocate the array, just just allocate a new (exponentially larger) chunk and keep the old one as-is. Then just link the chunks in a (very short) linked list, or keep a separate small list of chunks (you're never gonna need more than 48, so you can just have a fixed allocation for it), or what have you. The bonus here is that it reduces latency on pushing and has more predictable performance.
In the end I still lost out to another package that did the same function a lot faster. May be interesting to look back to see how they did it.
https://gist.github.com/bssrdf/397900607028bffd0f8d223a7acdc...
My solution avoids that.
Luckily there's a graph screenshot. But the graph it displays is incomprehensible. Without "id_sort_by_name" being on the one bar, I wouldn't even know what I'm looking at.
The lower screenshot is a flame graph. If you haven't encountered one of those before, it's totally reasonable to not know how to read it. But it's a standard and common way to present this sort of data. See https://www.brendangregg.com/flamegraphs.html for more information.
1. Does this have to be sorted?
2. Why is it sorted alphabetically instead of naturally?
Very. For instance if you look at sorting algorithms on wikipedia they pretty much all list performance (best, worst, average) but also worst-case space complexity, in O notation.
This is the basis for rainbow tables: precomputed tables for mapping hashes to passwords, with a space-saving “hash chaining” trick to effect a constant factor reduction in table size. Such tables are the reason why passwords must be hashed with a unique salt when stored in a database.
For example, an in-place algorithm like Bubble Sort would have a O(1) space complexity, because it requires no extra memory (and 0 memory is a constant). Merge sort on the other hand is O(n) because it always uses additional memory for its intermediate stages, and that additional memory scales with n.
Doing a quick google, the first few sites I find seem to use a similar understanding https://www.geeksforgeeks.org/time-and-space-complexity-anal...
space complexity is O(n) but auxiliary space complexity uses Theta for notation instead.
But people aren't too picky on the notation and usually say something like "O(1) extra space" instead of using theta.
Saying something is O(n) tells you it grows at most linearly, but this would also admit e.g. log n.
Saying something is Theta(n) tells you it grows exactly linearly: that is, it is not a slower growth rate like log n, nor a faster growth rate like n^2.
Heavily simplified due to caches etc. To the point where people sometimes measure in cache misses instead as that is usually what actually matters.
But yeah I guess space complexity vs auxiliary space complexity is just a bit ambigous.
And if your implementation of an algorithm allocates more space in the big-oh sense than it can actually touch (eg. O(n) space for O(log n) time or whatever), that's just a wasteful implementation. Doesn't make the algorithm itself require more space than it has time to actually use.
Now move the compile time code to runtime.
I call it O(n) in time and memory. What do you call it?
If your say O(n) in time and memory - why is moving the code changing its complexity?
If you say still O(1) - then everything is O(1), thus proving P=NP among other things :)
Either way your interpretation isn't very useful.