Whisper solves both at once.
It combines persistent memory with grounded retrieval, so your agents remember users across sessions and only surface what's actually true. We ran a benchmark on a frozen dual-provider dataset: 0% hallucination rate, 94.8% retrieval recall.
The full integration looks like this:
import { WhisperClient } from "@usewhisper/sdk";
const whisper = new WhisperClient({
apiKey: process.env.WHISPER_API_KEY,
project: "my-project",
});
// After each conversation — store what's worth remembering
await whisper.remember({ messages: conversationHistory, userId });
// Next session — retrieve before generation
const { context } = await whisper.query({ q: userMessage, userId });
// Model now has full context from prior sessions
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: context },
{ role: "user", content: userMessage },
],
});
That's it. Your agent now remembers users across sessions and won't
hallucinate what it retrieves.The problem we kept seeing: memory-only solutions (Mem0, Zep) store context but hallucinate on retrieval. Retrieval-only solutions (Nia, Supermemory) are grounded but stateless — every session starts cold. Nobody was doing both.
What Whisper does differently: - Persistent memory that compounds across sessions - Retrieval that's grounded — it won't surface what isn't there - Works with any LLM, integrates in ~10 minutes - MCP-native, SDK + CLI for fast onboarding
Free tier available. Docs at usewhisper.dev.
Happy to answer anything — especially pushback on the benchmark methodology, I want to pressure test it.