Tinker is cool.
If you're a researcher/developer, Tinker dramatically simplifies LLM post-training. You retain 90% of the algorithmic creative control (usually related to data, loss function, the algorithm) while Tinker handles the hard parts that you usually want to touch much less often (infra, forward/backward of the LLM itself, distributed training), meaning you can do these at well below 10% of the typical complexity involved. Compared to the more common existing paradigm of "upload your data, we'll post-train your LLM", this is imo a more clever place to "slice up" the complexity of post-training: it delegates the heavy lifting while keeping the majority of the data/algorithmic creative control with you.
I think the community still has to discover how and when finetuning makes sense compared to the (often strong) baseline of prompting a giant model. The early indications I've seen are that finetuning isn't so much about "stylizing" an LLM; instead, it's a lot more about narrowing the scope, especially when you have a lot of training examples. An extreme example of scope narrowing is categorical classifiers, e.g. spam filters, content filters, etc., but it should be broader than that. Instead of building a giant few-shot prompt for a big LLM, it might work a lot better (and faster!) to finetune a smaller LLM specifically for your narrow task.
Increasingly, production applications of LLMs are larger pipelines where a bunch of LLMs collaborate in DAGs and flows. Some of these components might work well as prompts. But a lot of them will probably work a lot better as finetunes. Tinker makes the latter trivial and should allow for easy experimentation with what works best at any stage.
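
To make the pipeline picture concrete, here is a minimal sketch of a flow where each stage is a swappable function, so one stage can be a prompted big model and another a small finetune. Everything in it is hypothetical: the two model functions are toy stand-ins (no real model or Tinker call is made), and Stage and run_pipeline are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins: a real pipeline would call actual models here.
def prompted_big_model(text: str) -> str:
    """Placeholder for a few-shot prompt sent to a large general model."""
    return f"summary({text})"


def finetuned_small_model(text: str) -> str:
    """Placeholder for a small model finetuned on one narrow task."""
    return "spam" if "free money" in text.lower() else "ok"


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # the backend for this stage (prompt or finetune)
    deps: List[str]            # upstream stage names; empty means "reads the raw input"


def run_pipeline(stages: List[Stage], raw_input: str) -> Dict[str, str]:
    """Run stages in the listed order, which is assumed to be topologically sorted."""
    outputs: Dict[str, str] = {}
    for stage in stages:
        # A stage with no dependencies consumes the raw input directly.
        upstream = [outputs[d] for d in stage.deps] or [raw_input]
        outputs[stage.name] = stage.run(" ".join(upstream))
    return outputs


pipeline = [
    Stage("filter", finetuned_small_model, deps=[]),
    Stage("summarize", prompted_big_model, deps=[]),
]

result = run_pipeline(pipeline, "Claim your FREE MONEY now")
```

Because each stage only exposes a text-in, text-out function, trying a finetune in place of a prompt at any stage is a one-line change to the pipeline list.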
Introducing Tinker: a flexible API for fine-tuning language models.
Write training loops in Python on your laptop; we'll run them on distributed GPUs.
Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
thinkingmachines.ai/tinker