I don't care how big your context window is.
While it’s true that running a single complex action on a model with a large context window would lead to a more favorable output than a smaller model all other things being equal, that assumes that you actually need to run that complex action as a single task.
The reality is, twenty smaller models running in parallel and outputting a smaller number of tokens will almost always be faster than a single large model running on all the data all at once.
And I’ll do you one better—small models can even improve the performance of your AI agents too.
Strategies like horizontal task splitting minimize the number of input tokens, output tokens, and model size required to complete an operation—reducing runtime, maintaining (or even reducing) costs, and delivering more deterministic responses for their respective tasks.
The secret? Curating the right high quality data to make it work.