Tech stack for fine-tuning LLMs

25 points by behat 3 years ago | 4 comments

If you have tried training/fine-tuning LLMs on your data, what tech and infra stack did you use? My use case is training on data generated by my own activity.

sirodoht 3 years ago

I tried to do something similar, following this blog post [1], but I didn't manage it due to a lack of GPUs. I tried to rent 4 A100s, which is what the author uses, but none were available. I signed up to 7 different cloud providers, including AWS, lambdalabs, vast.ai, coreweave, latitude.sh, and tensordock. Eventually I settled for a few A40s, but their memory wasn't even close to what the job required.

[1]: https://www.izzy.co/blogs/robo-boys.html
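For a rough sense of why GPU memory becomes the bottleneck, here is a back-of-envelope sketch. It assumes full fine-tuning with Adam in mixed precision, which by a common rule of thumb costs about 16 bytes per parameter before activations. The 7B parameter count, the 80 GB A100 variant, and the count of two A40s (48 GB each) are illustrative assumptions, not figures from the thread or the blog post.

```python
# Back-of-envelope GPU memory estimate for full fine-tuning with Adam
# in mixed precision. Every number here is a rule-of-thumb assumption.

BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}  # 16 bytes per parameter in total, activations not included


def training_memory_gb(num_params: float) -> float:
    """Memory for weights, gradients, and optimizer state, in GB (1e9 bytes)."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


def fits(num_params: float, gpu_memory_gb: float, num_gpus: int) -> bool:
    """True if the model state fits in the pooled GPU memory.

    Assumes perfect sharding across GPUs and ignores activations,
    so a True result is optimistic.
    """
    return training_memory_gb(num_params) <= gpu_memory_gb * num_gpus


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    print(f"model state: {training_memory_gb(params):.0f} GB")
    print("4 x A100 (80 GB assumed):", fits(params, 80, 4))
    print("2 x A40 (48 GB):", fits(params, 48, 2))
```

Parameter-efficient methods such as LoRA, or quantized weights, reduce these numbers considerably, so this estimate applies only to full fine-tuning.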

behat 3 years ago

Thank you for sharing your experience. The linked blog post is great!

PaulHoule 3 years ago

I've had success with the method described in

https://huggingface.co/docs/transformers/training

for both classification and regression problems, with the caveats that (i) the default learning rate is too damn high (easy to fix) and (ii) it took a great deal of effort to get the fine-tuned classifier to perform as well as a classifier that uses

https://sbert.net/

and an SVM from scikit-learn. You might get different results with another problem, but my problem is noisy and has an upper limit on what accuracy is possible. Fine-tuning a model takes maybe 30 minutes, while the classical classifier is more like 30 seconds, and the ratio of development time that went into each is similar.
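A minimal sketch of the classical baseline described above: embed each text once with a sentence-transformers model, then fit a scikit-learn SVM on the fixed embeddings. The model name `all-MiniLM-L6-v2`, the RBF kernel, and `C=1.0` are assumptions chosen for illustration, not the settings used in this comment. The helper names `sbert_encoder` and `train_embedding_svm` are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.svm import SVC

# An encoder maps a batch of texts to a 2-D array of embeddings.
Encoder = Callable[[Sequence[str]], np.ndarray]


def sbert_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Build an encoder backed by sentence-transformers.

    The import is lazy so the rest of the module works without the
    package installed; the first call downloads the model weights.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts))


def train_embedding_svm(
    texts: Sequence[str], labels: Sequence[int], encode: Encoder
) -> SVC:
    """Embed the texts once, then fit an SVM on the fixed embeddings."""
    features = np.asarray(encode(texts))
    classifier = SVC(kernel="rbf", C=1.0)  # assumed hyperparameters
    classifier.fit(features, labels)
    return classifier


# Usage (requires sentence-transformers and a model download):
#   encode = sbert_encoder()
#   clf = train_embedding_svm(train_texts, train_labels, encode)
#   predictions = clf.predict(encode(test_texts))
```

Because the embedding model stays frozen, only the SVM is trained, which is consistent with the seconds-versus-minutes difference in training time mentioned above.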

behat 3 years ago

Thank you for sharing! The HF docs seem easy to follow. My application is text generation itself, so I may get different results.