I built Moments to finally get this game idea out of my head. My original goal was to run on-device models in mobile browsers, but it is still too early for running local vision models directly in phone browsers, so I focused on desktop.

How it works:
- You upload a photo.
- A local vision model running entirely in your browser captions it and picks a prominent object from the image.
- You guess the word, just like Wordle.

It uses a very small model, so it is not very smart: https://huggingface.co/onnx-community/Florence-2-base-ft
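
For anyone curious about the guessing step, here is a rough sketch of how Wordle-style feedback is usually computed. This is not the Moments source: the `scoreGuess` name and the `Mark` type are made up for illustration, and it assumes the guess has the same length as the target word, which the post does not say.

```typescript
// Hypothetical helper, not taken from the Moments code.
// Scores a guess against the target word using the usual Wordle rules.
type Mark = "correct" | "present" | "absent";

function scoreGuess(guess: string, target: string): Mark[] {
  const g = guess.toLowerCase();
  const t = target.toLowerCase();
  // Assumption: guesses must match the target length.
  if (g.length !== t.length) {
    throw new Error("guess and target must have the same length");
  }

  const marks: Mark[] = new Array<Mark>(g.length).fill("absent");
  const remaining = new Map<string, number>();

  // First pass: mark exact matches and count the target letters left over.
  for (let i = 0; i < t.length; i++) {
    if (g.charAt(i) === t.charAt(i)) {
      marks[i] = "correct";
    } else {
      const ch = t.charAt(i);
      remaining.set(ch, (remaining.get(ch) ?? 0) + 1);
    }
  }

  // Second pass: a letter is "present" only while unmatched copies remain,
  // so repeated letters in the guess are not over-counted.
  for (let i = 0; i < g.length; i++) {
    if (marks[i] === "correct") continue;
    const ch = g.charAt(i);
    const left = remaining.get(ch) ?? 0;
    if (left > 0) {
      marks[i] = "present";
      remaining.set(ch, left - 1);
    }
  }

  return marks;
}
```

The two passes matter for repeated letters: guessing "apple" against "plane" marks only the first "p" as present, because the target contains a single "p".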