Ask HN: Is latent space more than compression? Can we probe its internal rules?

2 points by WLHsu 27 days ago | 0 comments

I’m curious whether people see a deeper connection between Anthropic’s injected-thought detection work and latent-state world models like LeWM.

In the first, the model seems able to detect and report some of the perturbations injected into its own activations. In the second, the training objective explicitly pushes prediction into a more structured latent space.

Do these lines of work suggest that latent space may be partially probeable and interpretable as an internal rule space, rather than just a compressed vector space? Has anyone experimented with combining latent-state probing, regularized world models, and introspection-style detection?
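
To make the "probing" part of the question concrete, here is a minimal sketch of the kind of probe I have in mind. It runs on synthetic data only and is not taken from Anthropic's or LeWM's code. Assumptions: latents are plain Python lists, a single binary feature is planted along one fixed direction, and the probe is a simple difference-of-means direction. All function names are hypothetical.

```python
import random

def make_latents(n, dim, direction, rng):
    """Synthetic stand-in for latent states (an assumption, not real model
    activations): Gaussian noise plus a shift along `direction` whose sign
    depends on a hidden binary feature."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 2.0 if label else -2.0
        z = [rng.gauss(0.0, 1.0) + shift * d for d in direction]
        data.append((z, label))
    return data

def fit_mean_difference_probe(data):
    """Probe direction = mean(latents | label 1) - mean(latents | label 0),
    with the decision threshold at the midpoint of the two class means."""
    dim = len(data[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for z, label in data:
        counts[label] += 1
        for i, v in enumerate(z):
            sums[label][i] += v
    mu0 = [s / counts[0] for s in sums[0]]
    mu1 = [s / counts[1] for s in sums[1]]
    w = [a - b for a, b in zip(mu1, mu0)]
    mid = [(a + b) / 2 for a, b in zip(mu1, mu0)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, bias

def probe_accuracy(w, bias, data):
    """Fraction of examples whose sign of (w . z + bias) matches the label."""
    correct = 0
    for z, label in data:
        score = sum(wi * zi for wi, zi in zip(w, z)) + bias
        correct += int((score > 0) == bool(label))
    return correct / len(data)

rng = random.Random(0)
dim = 16
direction = [1.0 / dim ** 0.5] * dim  # unit-norm planted direction
train = make_latents(500, dim, direction, rng)
test = make_latents(500, dim, direction, rng)  # held-out split
w, bias = fit_mean_difference_probe(train)
accuracy = probe_accuracy(w, bias, test)
print(f"held-out probe accuracy: {accuracy:.3f}")
```

A high held-out accuracy here only shows that the planted feature is linearly readable from the latents. It says nothing on its own about whether a model uses that feature as a "rule", which is the part I am really asking about.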
