I'm trying to evaluate the best serverless solutions for inference without compromising on client usage, while reducing idle time on GPU boxes. So far it's down to base10, HF, and Banana; I'll end up pooling them all and then sending requests between them. For dedicated training boxes, Lambda, Modal, Oblivus, and Runpod are the contenders.
To track usage you need a credits system. Basically, you read a number from a file, subtract the cost of the operation, and throw an exception if the result is less than zero. It does take a little bit of development, but I don't think you need a whole other startup to handle that. You can do the core part of it for one type of operation/cost in less than a day, plus maybe a few days to debug it.
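To make that concrete, here is a rough sketch of the read/subtract/raise loop in Python. The names (`charge`, `InsufficientCredits`) and the one-integer-per-file layout are just assumptions for illustration, not anything from a real billing library.

```python
from pathlib import Path


class InsufficientCredits(Exception):
    """Raised when a charge would take the balance below zero (hypothetical name)."""


def charge(balance_file: Path, cost: int) -> int:
    """Subtract `cost` from the balance stored in `balance_file`.

    Assumes the file holds a single integer. Returns the new balance,
    or raises InsufficientCredits and leaves the file untouched.
    """
    balance = int(balance_file.read_text().strip())
    remaining = balance - cost
    if remaining < 0:
        raise InsufficientCredits(f"need {cost} credits, have {balance}")
    # Only persist the new balance once we know the charge is allowed.
    balance_file.write_text(str(remaining))
    return remaining
```

You would call `charge(path_to_user_file, cost_of_inference)` before running the operation and catch `InsufficientCredits` to reject the request. This version is not safe for concurrent requests against the same file; that is probably where the "few days to debug" goes, and a file lock or a database row with a transaction would be the usual fix.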