Non-linear Thinking with CUDA(viralfsharp.com) |
Non-linear Thinking with CUDA(viralfsharp.com) |
There are n starting points
There are n ending points
The interval is n/2 in length on average
Thus it requires n * n * (n/2) additions
You said: Generating one partial sum costs n, and
you need to do it n times, ...
I think you need to do it n^2 times.Alternatively you can just perform a prefix sum operation and subtract each (n - k)th element from nth element for results in the kth iteration. This could be even faster.
There are n^2 results, each result takes O(n) to compute, so the naive algorithm is O(n^3). That's what you asked - why does the naive algorithm take O(n^3) - and that seems to answer your question.
Yes, clearly there is an overlap of some partial sums, and that's why a less naive algorithm is sub-O(n^3).
So, what are you asking, or what additional point are you making?