How do I run fine-tuned models in a multi-tenant/shared GPU setup?

1 point by iamzycon 1 year ago | 0 comments

I'm considering setting up a fine-tuning and inference platform for Llama that would let customers host their own fine-tuned models. Would each fine-tuned model need its own dedicated infrastructure, or could the models share infrastructure? Are there any existing solutions for this?
