Show HN: Host any GGUF model in one command(github.com) Running a GGUF model locally usually means writing custom inference code or wrestling with llama.cpp's CLI flags every time you want to test something. Existing OpenAI-compatible servers often require Docker, complex configuration files, or GPU support. The gap between "I have a .gguf file" and "I have a working API endpoint" is wider than it should be. A simple CLI tool to serve GGUF models as an endpoint: gguf-serve To cut this short, we asked Neo to build gguf-serve. Point it at any .gguf file, run the server, and immediately get OpenAI-compatible endpoints that work with any client library or tool that speaks the OpenAI API format. |
No comments yet