Note that you can get the model weights on HuggingFace here: https://huggingface.co/adept/fuyu-8b
Secondly, do you anticipate Fuyu being made available for commercial access or will it remain NC?
Whatever the community builds on top of it, which Adept says it is "excited to see," would only serve Adept and no one else! What incentive does the community have to build on top of Fuyu when the community can't benefit from its own work? If Adept wants to benefit from word-of-mouth discussion of their models and from community contributions that make those models work better, as has happened dramatically with Llama 2, then they need to give the community the opportunity to benefit too.
Also weird: if you look at the tags on Hugging Face, you'll see it is listed as "cc". This comes from the README[0] metadata. "cc" is not really a license.
[0]: https://huggingface.co/adept/fuyu-8b/blob/main/README.md?cod...
FOSS meets the commercial usage requirement much better. Otherwise the term FOSS would be redundant.
I believe the copyright on AI model weights in the US is not fully established, but so far it has been held that a list of numbers cannot be copyrighted, so the same likely applies to model weights. Note that you don't have to enter into an agreement with Adept to use the model.
Alternatively, download and use the weights in Japan, which explicitly has no copyright on AI models.
What can you tell us about this:
> Our internal models (based on Fuyu) have extra capabilities related to our product. In particular,
> 1. They can reliably perform OCR on high-resolution images
> 2. They can do fine-grained localization of text and UI elements within those images
> 3. They can answer questions about images of UIs
Is this just a matter of additional fine tuning, or are there architectural differences?
Is there an associated paper? Or more specifically, details on the training dataset? It must have been a mix of text and VLM tasks, otherwise one or the other capability would have rotted during training. But I wonder if they trained off strictly VLM corpora, or also used plain image-text datasets like CLIP. It would be interesting if only the former.
Also makes me wonder if it could be trained on something like CommonCrawl where all the images are retained and interspersed correctly throughout the text. This model could theoretically train just fine off that, and it would unlock a whole new dataset effectively.
And has there been an inspection of what the model is outputting for predicted image "tokens"? Is it correctly predicting projected image patches to any degree of accuracy? And could it therefore also generate images inline with text if another de-projection layer were trained?
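To make the question concrete, here is a toy Python sketch of the idea, assuming (as the comment implies) that image patches enter the model through a linear projection. The sizes, names, and random untrained weights are invented for illustration, the transformer that would sit between the two layers is omitted, and none of this is Adept's implementation.

```python
import random


def split_into_patches(image, patch):
    """Cut an H x W grayscale image (a list of rows) into flattened patches.

    Assumes H and W are multiples of `patch`. Patches come out in
    row-major order, each one flattened row by row.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def linear(vector, weights):
    """Apply a weight matrix (a list of output rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
PATCH, HIDDEN = 2, 8  # toy sizes, chosen only to keep the example small

image = [[rng.random() for _ in range(4)] for _ in range(4)]
patches = split_into_patches(image, PATCH)  # four patches of four pixels each

project = random_matrix(HIDDEN, PATCH * PATCH, rng)    # patch -> hidden vector
deproject = random_matrix(PATCH * PATCH, HIDDEN, rng)  # hypothetical hidden -> patch layer

embeddings = [linear(p, project) for p in patches]
predicted_pixels = [linear(e, deproject) for e in embeddings]

print(len(patches), len(embeddings[0]), len(predicted_pixels[0]))
```

Whether a trained de-projection layer would produce usable images depends on how accurate the model's predictions at image positions actually are, which is exactly what the comment is asking about.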
https://joanfihu.wordpress.com/2023/10/19/evaluating-adepts-...
For anyone interested in contributing to a fully open source alternative, join us at https://github.com/OpenAdaptAI/OpenAdapt
Lots of interesting work to be done, including integrating with Fuyu-8B!
https://github.com/OpenAdaptAI/OpenAdapt/blob/30581e47fa9aec...
https://github.com/OpenAdaptAI/OpenAdapt/issues/246
And Fuyu is under a non-commercial license, so there's not much to be done with it unless someone trains a new Fuyu-architecture model from scratch.
I will admit my ignorance on this topic, and I didn't want us to rush into selecting a license that is inappropriate.
Which one should we choose?
I am also getting even more excited by the explosion of work on open models. I still haven’t adjusted to how good mistral-7B is, and it runs on my Mac without breaking a sweat.
Mistral-7B is incredible for its size!
[1] aya.for.ai
Full disclaimer: I'm a contributor and a big believer in the project.
LLaVA 1.5 is very good, at least at describing images. http://llava.hliu.cc/
This seemed a bit surreal to me, like trying to train an LLM with the outputs of a worse performing smaller LLM.
[0] https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#...
A few other examples include LLaVA[0], IDEFICS[1][2], and CogVLM[3]. Mini-GPT[4] might be another one to look at. I'm pretty sure all of these have better licenses than Fuyu. Fuyu's architecture does sound really interesting, but the license on the pre-trained model is a complete non-starter for almost anything.
[0]: https://github.com/haotian-liu/LLaVA
[1]: https://huggingface.co/blog/idefics
[2]: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct
This developer recommends you go with MIT.
It depends a lot on what you want the license to do, so I don’t really want to say one way or another.
IANAL, but my understanding is that code without a license effectively has an “all rights reserved” license in the U.S., meaning that it can’t be used for anything at all — even non-commercial work.
Any digital object can be represented as a list of numbers (this is precisely the origin of the term digital). Since there is clearly precedent for copyrighted digital objects (media, software, etc), reducing something to "a list of numbers" is not a useful distinction in regard to copyright law.
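To illustrate the "list of numbers" point, here is a minimal Python sketch (the sample text and function names are made up for illustration) that turns a short piece of text into a list of integers and back again without losing anything:

```python
def to_numbers(data: bytes) -> list[int]:
    """Return the byte values (0-255) that make up a digital object."""
    return list(data)


def from_numbers(numbers: list[int]) -> bytes:
    """Rebuild the original object from its list of numbers."""
    return bytes(numbers)


# Hypothetical sample; the contents of any file would behave the same way.
sample = "Any digital object".encode("utf-8")
numbers = to_numbers(sample)
restored = from_numbers(numbers)

print(numbers[:3])         # [65, 110, 121], the bytes for "Any"
print(restored == sample)  # True
```

The same round trip works for a song, a photo, or a compiled program, which is why the representation alone says little about whether something is protected.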
Model weights are clearly not in that category. Happy to be corrected if I misremember.
Zipping a file does not grant the zipped output any copyright protection beyond that of the original file.
Moreover, the U.S. Copyright Office has officially stated, in guidance published in the Federal Register, that AI-generated artifacts are not eligible for copyright: https://www.federalregister.gov/documents/2023/03/16/2023-05....
When it comes to intellectual property, there are two methods of protecting it: either you keep it a trade secret and only use it in house (the secret sauce approach), or you keep things out in the open and seek copyright, patent, or trademark protection. You can't have it both ways, even more so with AI co-created artifacts. If they are transparent about all the steps involved and what the humans did, then they can seek protection for the human-created parts. This also allows others to replicate these steps and create similar artifacts.
It sounds like they and many other "AI" teams want patent protection without having to register for it. These teams are trying to write their own licenses to rights they do not have.