This is super interesting, I'm particularly excited for this one as it may allow teams to scale this architecture for VLAs (vision language action models), and having sparser models means more real-time actions on a locally hosted model
Why does this not have (day-one) support for Ollama? The previous model is on there? Is it related to the ongoing refactor work or are people abandoning Ollama for other LLM engines?
Wow, this is fucking phenomenal. I fed it a long transcript asking it to create a summary and it executed it extremely well. For an 8B model this is quite impressive.