https://developer.nvidia.com/blog/cuda-pro-tip-optimized-fil...
where they show that their compiler is sometimes able to automatically convert global atomic operations into the warp-local versions and achieve the same performance as manually written intrinsics. I was recently curious whether, 10 years later, these same optimizations had made it into other GPUs and platforms besides CUDA, so I put together a simple atomics benchmark in WebGPU.
https://github.com/PWhiddy/webgpu-atomics-benchmark
The results seem to indicate that these optimizations are accessible through WebGPU in Chrome on both macOS and Linux (with an NVIDIA GPU). Note that I'm not directly testing stream compaction, just incrementing a single global atomic counter, so that would need to be tested separately to know for sure whether the optimization still holds there. If you see any issues with the benchmark or this reasoning, please let me know! I am hoping to solidify my knowledge in this area :)
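For anyone who hasn't read the NVIDIA post, here is a rough CPU-side simulation in Python (not GPU code) of warp-aggregated atomics applied to filtering, i.e. stream compaction. It assumes 32-lane warps and models the global counter as a hypothetical `AtomicCounter` class that records how many atomic operations hit it; `compact_naive` and `compact_warp_aggregated` are illustrative names as well. In the aggregated version, the lanes that pass the predicate are collected into a bitmask (what a ballot would return), a single atomic reserves space for all of them, and each lane derives its slot from the popcount of the active lanes below it.

```python
WARP_SIZE = 32  # assumption: NVIDIA-style 32-lane warps; other GPUs differ


class AtomicCounter:
    """Hypothetical stand-in for a global atomic counter that counts its ops."""

    def __init__(self):
        self.value = 0
        self.ops = 0

    def fetch_add(self, n):
        # like atomicAdd: return the old value, then add n
        old = self.value
        self.value += n
        self.ops += 1
        return old


def compact_naive(data, pred, counter):
    """Every surviving element does its own atomic increment (fresh counter assumed)."""
    out = [None] * len(data)
    for x in data:
        if pred(x):
            out[counter.fetch_add(1)] = x
    return out[:counter.value]


def compact_warp_aggregated(data, pred, counter):
    """One atomic per warp, issued on behalf of all surviving lanes (fresh counter assumed)."""
    out = [None] * len(data)
    for start in range(0, len(data), WARP_SIZE):
        warp = data[start:start + WARP_SIZE]
        # "ballot": bit i is set if lane i's element passes the predicate
        mask = 0
        for lane, x in enumerate(warp):
            if pred(x):
                mask |= 1 << lane
        if mask == 0:
            continue  # no survivors in this warp, so no atomic at all
        # the leader lane (lowest set bit) would issue this single atomic and
        # broadcast the returned base offset to the other lanes via a shuffle
        base = counter.fetch_add(bin(mask).count("1"))
        for lane, x in enumerate(warp):
            if (mask >> lane) & 1:
                # slot = base + number of active lanes below this one
                rank = bin(mask & ((1 << lane) - 1)).count("1")
                out[base + rank] = x
    return out[:counter.value]
```

With `data = list(range(100))` and the predicate `x % 3 == 0`, both functions return the same 34 elements, but the naive version issues 34 atomics while the aggregated one issues 4 (one per warp that has any survivors). On a real GPU the order between warps is not deterministic; this simulation only preserves it because it runs warps one at a time.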
https://github.com/gpuweb/gpuweb/blob/main/proposals/subgrou...
There is a proposal for supporting subgroups in WebGPU proper, but it's still in the draft stage.
But anyway, Histogram Pyramids (HistoPyramids) are a more efficient alternative to a parallel scan for this. The approach essentially builds a series of 3D buffers, each having half the dimension of the previous level, where each value is the sum of the counts in its eight underlying cells, and the top cube is a single value: the total count across all cells.
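A minimal sketch of the build step in Python, assuming the base level is an n×n×n nested list (n a power of two) of per-cell output counts, for example how many triangles each marching-cubes cell emits. `build_histopyramid` is a hypothetical name; a GPU version would run one reduction pass per level rather than looping on the CPU.

```python
def build_histopyramid(base):
    """Build HistoPyramid levels from an n x n x n grid of per-cell counts.

    base is a nested list indexed base[x][y][z]; n must be a power of two.
    Returns the levels from finest (levels[0] is base) to the 1x1x1 top.
    """
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        # each parent cell holds the sum of its 2x2x2 block of children
        parent = [[[sum(prev[2 * x + dx][2 * y + dy][2 * z + dz]
                        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1))
                    for z in range(half)]
                   for y in range(half)]
                  for x in range(half)]
        levels.append(parent)
    return levels
```

The single value at `levels[-1][0][0][0]` is the total number of outputs, which is what you would use to size the output buffer and the meshing dispatch.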
Then, instead of doing a second pass where you figure out which index each thread is supposed to write to and write it to a buffer, you simply drill down into those cubes and find the index when the meshing part is invoked. You look at your thread index (let's say 1616) and at the 8 smaller cubes: cube 1 has 516 entries, so 1100 to go; cube 2 has 1031 entries, so 69 to go; cube 3 has 225 entries, so we go into cube 3. Then you repeat recursively until you find the index. Since all threads in a group tend to go into the same cubes, they tend to read the same bits of memory until they get down to the bottom levels, which makes it very GPU-cache-friendly (divergent reads kill GPGPU perf).
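A sketch of the drill-down in Python, assuming `levels` is a list of cubic nested lists ordered from the finest grid (`levels[0]`, indexed `[x][y][z]`) up to the 1×1×1 top, with every parent equal to the sum of its 2×2×2 children. `locate` and `CHILD_ORDER` are hypothetical names. The fixed child order is what gives the cache behavior described above: every thread visits children in the same order, so threads with nearby indices read the same cells on the way down.

```python
# Children of a cell, visited in the same fixed order by every thread
CHILD_ORDER = [(dx, dy, dz) for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]


def locate(levels, k):
    """Return (base cell coordinates, offset within that cell) for output k.

    Raises IndexError if k is not below the total stored at the top level.
    """
    if not 0 <= k < levels[-1][0][0][0]:
        raise IndexError(k)
    x = y = z = 0  # start at the single top cell
    for level in reversed(levels[:-1]):
        for dx, dy, dz in CHILD_ORDER:
            cx, cy, cz = 2 * x + dx, 2 * y + dy, 2 * z + dz
            count = level[cx][cy][cz]
            if k < count:
                x, y, z = cx, cy, cz  # descend into this child
                break
            k -= count  # skip all outputs belonging to this child
    return (x, y, z), k


if __name__ == "__main__":
    # two-level pyramid whose first three children hold 516, 1031 and 225 outputs
    base = [[[516, 1031], [225, 0]], [[0, 0], [0, 0]]]
    levels = [base, [[[516 + 1031 + 225]]]]
    print(locate(levels, 1616))  # ((0, 1, 0), 69)
```

Index 1616 skips the first two children (1616 - 516 - 1031 = 69) and lands in the third child at offset 69. On the GPU, each meshing invocation would call this with its own global index as `k`.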
Forgive me if I got the technical terminology wrong; I haven't actually worked on GPGPU in more than a decade, but it's fun to note that something I did circa 2011 as an undergrad is suddenly relevant again (I implemented HistoPyramids from a 2007-ish paper, along with Marching Cubes, a 1980s algorithm). Everything old is new again.
Native libraries like wgpu can do whatever they want in extensions, safety be damned, but you're stepping outside of the WebGPU spec in that case.
The browser is dead, the only thing you can use it for is filling out HTML forms and maybe some light inventory management.
The final app is C+Java, where you put the right stuff where it is needed. Just like the browser used to be before Oracle did its magic on the applet.
Yea. Nah!
That obit is a bit premature
Cool project btw! Adding this to my long list of graphics blogs to read.
Isn't it 1s¹ in the ground state, so the probability distribution would look like a sphere?
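Assuming this refers to hydrogen (the comment doesn't say which atom), the ground-state wavefunction has no angular part, so its probability density depends only on the distance r from the nucleus, which is exactly what makes it spherically symmetric (a_0 is the Bohr radius):

```latex
\psi_{100}(r,\theta,\varphi) = \frac{1}{\sqrt{\pi a_0^{3}}}\, e^{-r/a_0},
\qquad
|\psi_{100}|^{2} = \frac{1}{\pi a_0^{3}}\, e^{-2r/a_0}
```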
But: "Error: Your browser does not support WebGPU"
Sigh
We have a way to go yet.
```
Shader '' parsing error: the type of `SCAN_BLOCK_SIZE` is expected to be `u32`, but got `i32`
10 │ @id(0) override SCAN_BLOCK_SIZE: u32 = 512;
   │                 ^^^^^^^^^^^^^^^ definition of `SCAN_BLOCK_SIZE`
```
https://developer.chrome.com/blog/persistent-permissions-for...
There is a reason why, in 2024, there is still no WebGL 2.0 game that can match Infinity Blade from 2011, the game Apple used to demo the iPhone's OpenGL ES 3.0 capabilities.
On top of that, WebGPU is Chrome-only for the time being, and still years away from a sound 1.0 release in Safari and Firefox.
But that person was probably just saying that with WebGPU you don't get the power you would have with a native feature set, and that is true. So we likely won't see AAA games in the browser anytime soon. But it is definitely suitable for games in general.
The linked `wgpu` implementation will make its way into Firefox eventually, and Dawn will follow with a similar one for Chrome.
I linked it to show that there are no technical hurdles left; it's really only approval that remains.
Go is just Java without the VM.
Rust is just a native compiler that creates slow programs and complains a lot.
Good morning, troll.
I'll give you "complains a lot."
If you consider respect and responsibility.
WebGPU is brand new, and the paint is still wet. It doesn't make sense to dismiss things that haven't landed in browsers yet as “unusable on the web”.
Doesn't change the fact that it is a Web standard, for Web browsers.
And that's how the web works: it was the same for WebRTC, which spent 2-3 years in that state, and the same for MSE, etc.