Similarity Learning lacks a framework. So we built one (blog.qdrant.tech)
Disclaimer: I've made some contributions to it.
I recently built a similarity search application that recommends new Pinterest users channels to follow based on liked images using Milvus (https://github.com/milvus-io/milvus) as a backend. Similarity learning is a huge part of it, and I'm glad more and more tools like Quaterion are being released to help make this kind of tech ubiquitous.
What is a realistic minimum viable dataset for an approach like this? When is it not advisable? How does it compare to other more basic approaches?
I could see it making sense for complex unstructured data — Qdrant seems to point in that direction.
More specifically, I'm interested in deriving distances between writing style, arguing style, etc.
Basically, you can collect text from different authors and use the authors' names as labels to train a similarity learning model. My suggestion would be to fine-tune a Transformer model with a task-specific head and an ArcFace loss.
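To make the ArcFace part a bit more concrete, here is a rough sketch of the loss for a single example in plain Python. It is not taken from Quaterion or any other library: `arcface_loss`, `class_centers`, and the toy vectors are made up for illustration, the defaults `scale=64.0` and `margin=0.5` are the values commonly quoted for ArcFace, and the case where the angle plus the margin exceeds pi is not handled the way production implementations handle it. In the author-style setup, `embedding` would come from the fine-tuned Transformer head and each class center would correspond to one author.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def arcface_loss(embedding, class_centers, label, scale=64.0, margin=0.5):
    """ArcFace-style loss for one example (illustrative sketch).

    The angular margin is added only to the angle between the embedding
    and the center of its true class, then a scaled softmax
    cross-entropy is applied over all classes.
    """
    logits = []
    for idx, center in enumerate(class_centers):
        # Clamp to guard against floating-point drift outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, cosine(embedding, center)))
        if idx == label:
            # Simplification: theta + margin > pi is not special-cased.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)

    # Numerically stable log-sum-exp, then cross-entropy for the true class.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]


# Hypothetical toy data: two "authors" with 2-d class centers.
centers = [[1.0, 0.0], [0.0, 1.0]]
text_embedding = [1.0, 0.1]  # lies close to author 0

loss_correct = arcface_loss(text_embedding, centers, label=0, scale=4.0)
loss_wrong = arcface_loss(text_embedding, centers, label=1, scale=4.0)
loss_no_margin = arcface_loss(text_embedding, centers, label=0, scale=4.0, margin=0.0)
```

The margin makes the loss stricter even for an embedding that already sits near the right center, which is what pushes texts by the same author into tight clusters. Distances between those clusters are then what you would read as "distance between writing styles".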
> From 0.5.0, Finetuner computing is hosted on Jina Cloud. The last local version is 0.4.1, one can install it via pip or check out git tags/releases here.
But there are some cool ideas implemented there as well, I encourage you to try both!
"From 0.5.0, Finetuner computing is hosted on Jina Cloud. The last local version is 0.4.1, one can install it via pip or check out git tags/releases here."
Fun fact: one of the examples in Quaterion is a similar-car search.
If you find this topic interesting and want to discover more, we collected a bunch of resources that might be helpful. https://github.com/qdrant/awesome-metric-learning
Moreover, fine-tuning might be just one of the applications of neural networks in the organization, and you may already have some pipelines built to train them, so fine-tuning should fit into those pipelines as well.
And more importantly, Jina's finetuner gives you some pretrained models to choose from, while Quaterion is PyTorch Lightning based, so you can easily integrate it if you already use PyTorch and have the flexibility to fine-tune any custom network as well.