Try SambaNova chat: 1T param LLM, 500 tokens/SEC (coe-1.cloud.snova.ai)

1 points by germanjoey 2 years ago | 1 comment

germanjoey 2 years ago

We're showing off our 1.05T param Composition of Experts LLM! It's 150 experts running on a single node of 8 SN40L RDU chips.

Each of our nodes has a huge amount of DDR attached, in addition to copious amounts of on-chip HBM and SRAM. This allows the system to switch between models of varying sizes and architectures at lightning speed. One highlight is a model based on Llama2 7B, similar to the Groq demo, but executing with bf16/fp32 instead of int8. (And using only 8 chips instead of 568!)
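For a rough sense of scale, here is a back-of-the-envelope sketch of the figures quoted above. It assumes every parameter is stored in bf16 (2 bytes) and reports only an average expert size, since the post says the models actually differ in size. The variable and function names are illustrative, not part of any SambaNova tooling.

```python
# Figures taken from the post.
TOTAL_PARAMS = 1_050_000_000_000  # 1.05T parameters
NUM_EXPERTS = 150
CHIPS_PER_NODE = 8

# Assumption: weights are held in bf16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_BF16 = 2


def average_params_per_expert(total_params: int, num_experts: int) -> float:
    """Mean expert size; real experts may be larger or smaller."""
    return total_params / num_experts


def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    """Raw weight storage, ignoring activations and KV cache."""
    return num_params * bytes_per_param


avg_params = average_params_per_expert(TOTAL_PARAMS, NUM_EXPERTS)
total_tb = weight_bytes(TOTAL_PARAMS, BYTES_PER_PARAM_BF16) / 1e12
per_expert_gb = weight_bytes(avg_params, BYTES_PER_PARAM_BF16) / 1e9
per_chip_gb = weight_bytes(TOTAL_PARAMS, BYTES_PER_PARAM_BF16) / CHIPS_PER_NODE / 1e9

print(f"average expert size: {avg_params / 1e9:.1f}B params")
print(f"all weights in bf16: {total_tb:.2f} TB")
print(f"one average expert:  {per_expert_gb:.1f} GB")
print(f"weights per chip:    {per_chip_gb:.1f} GB")
```

Under these assumptions the average expert comes out at about 7B parameters, which lines up with the Llama2 7B-based model mentioned above, and the full set of weights is on the order of 2 TB, which is the kind of capacity that large attached DDR would be needed for.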