Show HN: Sisi – Semantic Image Search CLI tool, locally without third party APIs (github.com)

128 points by zcbenz 1 year ago | 41 comments

I wrote this tool to get familiar with the CLIP model. I know many people have written similar tools with CLIP before, but I'm new to machine learning, and writing a classic tool helps me study.

The unusual thing about my version is that it is written in pure Node.js, with the power of node-mlx, a Node.js machine learning framework.

The repo in the link mostly implements the indexing and the CLI; the model implementation lives in a separate Node.js module: https://github.com/frost-beta/clip .

Hope this helps other learners!

notsylver 1 year ago |

I was planning to do this myself lol. I was going to use SQLite as the index, and use `sqlite-vec` or something similar to query for similar files directly. I think the only other thing I was planning were more filters, `"positive term" -"negative term"` to be able to negate results, `>90"search"` to find images that match by >90% and some generic filters like `--size >1mb` to help narrow it down when you are looking for a specific image. Quantizing embeddings to make them smaller/faster also seemed interesting but I haven't tried doing it yet.
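For anyone curious what quantizing embeddings could look like, here is a minimal sketch of symmetric int8 quantization. It is not taken from Sisi or `sqlite-vec`; the names `quantize`, `dequantize` and `cosineQuantized` are hypothetical, and the three-element vector is a toy stand-in for a real CLIP embedding. Each float is stored as one signed byte plus a shared scale, which cuts storage to roughly a quarter at the cost of some precision.

```typescript
// Hypothetical helpers for illustration only; not part of Sisi or sqlite-vec.
interface QuantizedEmbedding {
  values: Int8Array; // one byte per dimension instead of four
  scale: number;     // multiply a stored value by this to approximate the float
}

// Symmetric quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantize(embedding: Float32Array): QuantizedEmbedding {
  let maxAbs = 0;
  embedding.forEach((v) => {
    maxAbs = Math.max(maxAbs, Math.abs(v));
  });
  const scale = maxAbs > 0 ? maxAbs / 127 : 1;
  const values = new Int8Array(embedding.length);
  embedding.forEach((v, i) => {
    values[i] = Math.round(v / scale);
  });
  return { values, scale };
}

// Recover approximate floats from the quantized form.
function dequantize(q: QuantizedEmbedding): Float32Array {
  return Float32Array.from(q.values, (v) => v * q.scale);
}

// Cosine similarity computed directly on the int8 values.
// The per-vector scales cancel out, so they are not needed here.
function cosineQuantized(a: QuantizedEmbedding, b: QuantizedEmbedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.values.forEach((x, i) => {
    const y = b.values[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA * normB);
  return denom > 0 ? dot / denom : 0;
}

// Toy vector standing in for a real embedding.
const original = new Float32Array([0.5, -1.0, 0.25]);
const packed = quantize(original);
const restored = dequantize(packed);
const selfSimilarity = cosineQuantized(packed, packed);
console.log(packed.values, restored, selfSimilarity);
```

Whether the accuracy loss is acceptable for image search would need to be measured on real embeddings; this sketch only shows the mechanics.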

progx 1 year ago |

It uses only one core at 100% under Linux. Can this be changed?

10 images, each ~20 KB in size, took more than 10 minutes to index. Is that normal without GPU acceleration?

zcbenz 1 year ago | |

No, it is not normal. I have only tested on x64/arm64 Macs; I will try on Linux.

a_wild_dandan 1 year ago | | |

What’s normal? On your Apple silicon.

sureIy 1 year ago | |

Wow, that’s atrocious performance. So there’s no chance to use this on real photos.

spullara 1 year ago |

Very cool! Here is a similar Python version.

https://github.com/spullara/photoindex

Oh, and if you want to run something locally on your iPhone, you can use my app, which I am still testing:

https://x.com/getrememberwhen

sureIy 1 year ago |

This is cool. Is there also a way to show contents of the image as indexed? i.e. image 1 has cat and dog

There are a lot of tools/apps that let you “search images” but not many that let you just as easily “read images”.

kjeldsendk 1 year ago |

I have wanted to clean up my photo collection for ages and remove any nsfw picture that might hide somewhere.

Would this be able to do that, and how likely is it that it will see a PC release?

Eisenstein 1 year ago | |

This script doesn't do search, but it generates keywords for images and places them in the image metadata. You can then search for keywords using something like Diffractor. I will warn though that any AI solution not geared towards NSFW will not give good information on NSFW images, though it may give a keyword such as 'intimate' or 'adult content' which is all you need.

* https://github.com/jabberjabberjabber/LLavaImageTagger/

petesergeant 1 year ago |

I've been enjoying https://github.com/mazzzystar/Queryable on iPhone

y04nn 1 year ago |

How does CLIP compare to YOLO[1]? I haven't looked into image classification/object recognition for a while, but I remember that YOLO was quite good and worked on real-time video too.

[1]: https://pjreddie.com/darknet/yolo/

Eisenstein 1 year ago | |

CLIP and YOLO work completely differently and have different purposes. CLIP uses transformers and embeddings and can compare text with images for classification. YOLO uses a CNN, is trained with bounding boxes on images, and is used for object detection.

Give an image to CLIP and you can compare the similarity between the image and a sentence like 'a vase with roses in it'. Whereas with YOLO you give it an image and get the coordinates of bounding boxes around a vase, and around roses.
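A rough sketch of that difference in output shape, using made-up numbers: the three-element vectors below are toy stand-ins for embeddings that would come from CLIP's image and text encoders, and the `Detection` type and its values are a hypothetical illustration of detector output, not the API of any YOLO implementation.

```typescript
// CLIP-style: an image and a sentence both become vectors; compare them.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA * normB);
  return denom > 0 ? dot / denom : 0;
}

interface Caption {
  text: string;
  embedding: number[]; // toy values; a real text encoder would produce these
}

// Score every caption against the image and sort best-first.
function rankCaptions(image: number[], captions: Caption[]): { text: string; score: number }[] {
  return captions
    .map((c) => ({ text: c.text, score: cosine(image, c.embedding) }))
    .sort((p, q) => q.score - p.score);
}

const imageEmbedding = [0.9, 0.1, 0.2]; // toy stand-in for an image embedding
const ranked = rankCaptions(imageEmbedding, [
  { text: "a dog on a beach", embedding: [0.1, 0.9, 0.3] },
  { text: "a vase with roses in it", embedding: [0.8, 0.2, 0.1] },
]);
console.log(ranked);

// YOLO-style: the result is a list of labeled boxes, not a similarity score.
interface Detection {
  label: string;
  box: [number, number, number, number]; // x, y, width, height in pixels
  confidence: number;
}

// Hypothetical detector output for the same picture.
const detections: Detection[] = [
  { label: "vase", box: [120, 200, 80, 150], confidence: 0.91 },
  { label: "rose", box: [130, 90, 60, 110], confidence: 0.84 },
];
console.log(detections);
```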

yburkov 1 year ago |

I'm using a similar app, rclip: https://github.com/yurijmikhalevich/rclip

netdur 1 year ago |

I have made a similar Android app for semantic image search. It works offline too. I'm still gathering feedback and polishing the UI, but it works. If you are brave enough, here it is: https://drive.google.com/file/d/1tE0cY6umj5h5zCY_Jvaou1M8sCf...

nickphx 1 year ago | |

Why yes, I'll download a 695MB APK file from an internet stranger.

netdur 1 year ago | | |

Yes, 99% of the size is the weights of the two models required to run inference offline; there is no way around it.

KetoManx64 1 year ago | |

Is there a GitHub link?

netdur 1 year ago | | |

We have not decided what to do with it yet. It could be free, paid, or open source. However, the logic code for using semantic search with CLIP-compatible models on Android will be available on GitHub.

ivanjermakov 1 year ago |

In Russian, "sisi" is a variation of "tits".

Are there jobs/services that confirm that branding is appropriate across different languages? Seems like a non-trivial problem to solve.

Jack5500 1 year ago |

Isn’t CLIP superseded by multimodal LLMs?

Eisenstein 1 year ago | |

In this program CLIP is being used to create embeddings. A multimodal LLM does something very similar. In this case the language model is not needed because the embeddings are being used to search directly.
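To illustrate what "search directly" means here, below is a minimal sketch of a top-k lookup over stored embeddings. It is not Sisi's actual code: `IndexEntry`, `buildIndex` and `search` are hypothetical names, and the vectors are toy values standing in for what the image and text encoders would return. No language model generates anything; the query vector is simply compared with each stored image vector.

```typescript
interface IndexEntry {
  path: string;
  embedding: number[]; // stored unit-length so a dot product equals cosine similarity
}

// Scale a vector to unit length (returns a copy if the vector is all zeros).
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm > 0 ? v.map((x) => x / norm) : v.slice();
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Indexing step: normalize each image embedding once, up front.
function buildIndex(images: { path: string; embedding: number[] }[]): IndexEntry[] {
  return images.map((img) => ({ path: img.path, embedding: normalize(img.embedding) }));
}

// Search step: score every entry against the query and keep the best k.
function search(index: IndexEntry[], query: number[], k: number): { path: string; score: number }[] {
  const q = normalize(query);
  return index
    .map((entry) => ({ path: entry.path, score: dot(q, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy data: in practice these vectors would come from the image encoder.
const index = buildIndex([
  { path: "cat.jpg", embedding: [0.9, 0.1, 0.0] },
  { path: "beach.jpg", embedding: [0.1, 0.8, 0.4] },
  { path: "kitten.jpg", embedding: [0.7, 0.3, 0.1] },
]);

// Toy stand-in for the text embedding of a query such as "a cat".
const results = search(index, [1.0, 0.1, 0.0], 2);
console.log(results);
```

A linear scan like this is fine for small collections; larger ones would likely want an approximate nearest-neighbor index instead.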