Ask HN: What's your serverless stack for AI/LLM apps in production?

3 points by fazlerocks 1 year ago | 3 comments

I've been building AI applications using Next.js, GPT, and Langchain. As I'm approaching production scale, I'm curious how others are handling deployment infrastructure.

Current stack: - Next.js on Vercel - Serverless functions for AI/LLM endpoints - Pinecone for vector storage

Questions for those running AI in production:

1. What's your serverless infrastructure choice? (Vercel/Cloud Run/Lambda)

2. How are you handling state management for long-running agent tasks?

3. What's your approach to cost optimization with LLM API calls?

4. Are you self-hosting any components?

5. How are you handling vector store scaling?

Particularly interested in hearing from teams who've scaled beyond prototype stage. Have you hit any unexpected limitations with serverless for AI workloads?

lunarcave 1 year ago |

I have a hosted code-first agent builder platform in production, so I respond these question a lot from our customers.

1. Probably the best is fly.io IMHO. It has a nice balance between running ephemeral containers that can support long running tasks, and quickly booting up to respond to a tool call. [1]

2. If your task is truly long running, (I'm thinking several minutes), probably wise to put trigger [2] or temporal [3] under it.

3. A mix of prompt caching, context shedding, progressive context enrichment [4].

4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this. But most of my customers do not.

5. To start with, a simple postgres table and pgvector is all you need. But I've recently been delighted with the DX of Upstash vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings on a higher scale, have heard good things about marqo.ai [6].

Happy to talk more about this at length. (E-mail in the profile)

[1] https://fly.io/docs/reference/architecture/

[2] trigger.dev

[3] temporal.io

[4] https://www.inferable.ai/blog/posts/llm-progressive-context-...

[5] https://upstash.com/docs/vector/overall/getstarted

[6] https://www.marqo.ai/

fazlerocks 1 year ago | |

Thanks for the detailed response!

I actually tried fly.io briefly with Next.js apps and the deployment experience was smooth. Really interesting to hear you're using it for AI workloads too.

For fly.io with AI workloads: Are you using their Machines or Apps? I'm particularly curious about how you're handling cold starts for LLM tasks, since that was one thing I loved about fly.io for regular Next.js deployments - the cold starts were minimal.

lunarcave 1 year ago | | |

I'm using the apps mostly. But I think you can use machines for more lower level use cases. IIRC, apps run machines.