The fix for machine unlearning in vector databases turns out to be conceptually simple, but it requires changing the semantics of retrieval. Standard FAISS-style indices store vectors and compute: argmax ⟨q, vᵢ⟩ If you insert -v, nothing happens. It’s just another point. The original vector is still maximally similar to itself and remains rank-1. This isn’t a bug—it’s a consequence of selection-based retrieval. If instead you store (vector, weight) pairs and evaluate: φ(q) = Σ wᵢ · K(q, vᵢ) you get a different object entirely: a field, not a selection. Now inserting the same vector with w = −1 causes destructive interference. The contribution cancels. The attractor disappears. Deletion becomes O(1) append-only (add the inverse), not a structural rebuild. FAISS-style: Vec<Vec<f32>> → argmax (selection) Weighted form: Vec<(Vec<f32>, f32)> → Σ (field) We validated this on 100k vectors: • FAISS: target stays rank-1 after “deletion” • Field-based model: exact cancellation (φ → 0), target unretrievable The deeper point is that this isn’t a trick—it’s a semantic separation. • FAISS implements a selection operator over discrete points. • The weighted version implements a field operator where vectors act as kernels in a continuous potential. • Retrieval becomes gradient ascent to local maxima. • Deletion becomes destructive interference that removes attractors. This shifts deletion from structural (modify index, rebuild, filter) to algebraic (append an inverse element). You get append-only logs, reversible unlearning, and auditable deletion records. The negative weight is the proof. Implication: current vector DBs can’t guarantee GDPR/CCPA erasure without reconstruction. Field-based retrieval can—provably. Paper with proofs: https://github.com/nikitph/bloomin/blob/master/negative-vect... |