Ask HN: What's your serverless stack for AI/LLM apps in production?

I've been building AI applications using Next.js, GPT, and LangChain. As I'm approaching production scale, I'm curious how others are handling deployment infrastructure.

Current stack:
- Next.js on Vercel
- Serverless functions for AI/LLM endpoints
- Pinecone for vector storage

Questions for those running AI in production:
1. What's your serverless infrastructure choice? (Vercel/Cloud Run/Lambda)
2. How are you handling state management for long-running agent tasks?
3. What's your approach to cost optimization with LLM API calls?
4. Are you self-hosting any components?
5. How are you handling vector store scaling?

Particularly interested in hearing from teams who've scaled beyond prototype stage. Have you hit any unexpected limitations with serverless for AI workloads?