I burned in🔥2000$ in finetuning so you don't have to.
I fine-tuned models with
@OpenAI and
@anyscalecompute API endpoints with 50million tokens. Here are the results I wish I knew before getting into finetuning.
If you just want a quick snapshot, look at the figure. A longer explanation follows, explaining my findings.
I am not an expert and not deep into theory of AI models. I just want to get the BEST model performance at the CHEAPEST possible price for my USE-CASE. And quickly deploy that to prod.
I picked one specific simple USE-CASE. Summarizing text in a very specific tone, voice and a very specific structure.
Trained both models with close to 50M tokens (~37M words). In short,
- Anyscale costs 40X cheaper to finetune.
- Anyscale costs 56x cheaper to finetune.
Comparing the outputs, I get on par performance from llama-13b-fine-tuned as gpt-3.5-fine-tuned. Finetuning smaller models is clearly the way to go for simpler use-cases!
I don't understand OpenAI's offering for fine-tuning here. They need to step-up the game. They need to either reduce the price or offer flexibility to compete with open-source fine-tuning models.
I am going to run an another experiment which is a way more complicated use-case. It would be interesting to see who wins here. I suspect
@OpenAI Turbo will have an edge here (otherwise the pricing does not make sense).
P.S : I also know I can finetune models locally & directly without API. Like I said, I am not deep into theory yet. I tried this in
@huggingface with their auto-train framework. But it was just not as easy as plugging in via API calls. There were adapters and stuff, and I got quickly lost. But I am reading up and will try start including them in the comparisons too. If anyone is aware of other managed/otherwise solutions for finetuning let me know please.