This is still extremely slow for that CPU compared to the quantized model.
IIRC the llama.cpp f32 code is basically a placeholder.
BUT llama.cpp's threading overhead is a known performance issue, and I'm sure Java handles that better.
I didn't know about that, and I should have. Are there any "edge" frameworks as complete as ggml/llama.cpp that you know of that are faster now? ggml is still very easy to use, which I like, but I'd always thought of it as the fastest, in particular on CPU. I hadn't noticed there were known performance issues.
Jlama uses the Vector API in Java 20, but it also has better thread scheduling (work stealing) and zero allocation.
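For anyone wondering what "work stealing" looks like in plain Java, here is a minimal sketch of a matrix-vector product split across the JDK's ForkJoinPool. This is not Jlama's code: the class name, the `MIN_ROWS` threshold, and the row-major layout are all assumptions made for illustration. It writes into a caller-supplied output buffer so the hot loop allocates nothing, although the small task objects themselves are still allocated, so it is not truly zero allocation. It also uses a scalar inner loop rather than the Vector API.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class WorkStealingMatVec {
    // Hypothetical cutoff: row ranges at or below this size run sequentially.
    static final int MIN_ROWS = 64;

    // One task covers rows [lo, hi) of a row-major matrix w (rows x cols).
    static final class RowRange extends RecursiveAction {
        final float[] w, x, out;
        final int cols, lo, hi;

        RowRange(float[] w, float[] x, float[] out, int cols, int lo, int hi) {
            this.w = w; this.x = x; this.out = out;
            this.cols = cols; this.lo = lo; this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= MIN_ROWS) {
                // Leaf: plain dot product per row, writing into the shared buffer.
                for (int r = lo; r < hi; r++) {
                    float acc = 0f;
                    int base = r * cols;
                    for (int c = 0; c < cols; c++) {
                        acc += w[base + c] * x[c];
                    }
                    out[r] = acc;
                }
                return;
            }
            // Split in half; idle pool threads can steal either half.
            int mid = (lo + hi) >>> 1;
            invokeAll(new RowRange(w, x, out, cols, lo, mid),
                      new RowRange(w, x, out, cols, mid, hi));
        }
    }

    // Computes out = w * x, reusing the caller's output buffer.
    static void matVec(float[] w, float[] x, float[] out, int rows, int cols) {
        ForkJoinPool.commonPool().invoke(new RowRange(w, x, out, cols, 0, rows));
    }

    public static void main(String[] args) {
        int rows = 256, cols = 128;
        float[] w = new float[rows * cols];
        float[] x = new float[cols];
        float[] out = new float[rows];
        java.util.Arrays.fill(w, 1f);
        java.util.Arrays.fill(x, 2f);
        matVec(w, x, out, rows, cols);
        System.out.println(out[0]);
    }
}
```

The point of the recursive split is that no thread is handed a fixed slice up front: a thread that finishes early takes pending halves from the others' queues instead of sitting idle at a barrier.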
llama.cpp is still SOTA on CPU, as far as I know, especially with a small discrete GPU to help with long prompt ingestion. And it has tons of features (like grammars, context extension, and good quantization) that other frameworks are still missing.