Strange times.
From paper:
- "when a model is trained w.r.t. the dotproduct, its effect on cosine-similarity can be opaque and sometimes not even unique. One solution obviously is to train the model w.r.t. to cosine similarity"
- "We find that the underlying reason is not cosine similarity itself, but the fact that the learned embeddings have a degree of freedom that can render arbitrary cosine-similarities even though their (unnormalized) dot-products are well-defined and unique"