If you're using Cursor's Composer 2.5, you should know about one key limitation. The LLM was trained through self-distillation, where the same model acts as both the teacher and the student.
Both copies receive the same prompt; the only difference is that the teacher also gets additional context. This is an effective and cost-efficient way to fine-tune LLMs without distilling from a larger, more expensive teacher (e.g., Opus 4.7).
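To make the idea concrete, here is a toy sketch in Python. It is not Cursor's training code: the three-token vocabulary, the logit values, and the choice of a KL-divergence loss (including its direction) are all assumptions for illustration only.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits from the SAME model, over a made-up 3-token vocabulary.
teacher_logits = [4.0, 0.5, 0.2]  # prompt + additional context: confident
student_logits = [1.2, 1.0, 0.9]  # prompt only: uncertain

teacher_probs = softmax(teacher_logits)
student_probs = softmax(student_logits)

# Assumed training signal: minimizing this pulls the student's
# distribution toward the teacher's more confident one.
distill_loss = kl_divergence(teacher_probs, student_probs)
print(f"distillation loss: {distill_loss:.3f}")
```

In this sketch, driving the loss toward zero means the student's spread-out distribution collapses onto the teacher's confident one, which is the efficiency gain and also where the uncertainty goes.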
However, one key limitation of self-distillation is that it trades flexibility for efficiency. A non-distilled model is more likely to explore different solutions when it generates tokens that indicate uncertainty. Self-distillation, on the other hand, forces the model to produce a highly confident answer in one go.
What does this mean in practice? Composer 2.5 works well for around 80% of everyday tasks, which fall within the model's training distribution. It is less reliable on edge cases and especially on unique, very complex planning tasks. For those, frontier AI models (e.g., Opus 4.7 and GPT-5.5) are more suitable.
This matches the experience of other developers who have been using Composer 2.5 over the past week. A very good model, but with tradeoffs.
Optimizing LLMs for concise answers can destroy their ability to explore alternative solutions on difficult problems. New study reveals the hidden cost of self-distillation.
bdtechtalks.com/2026/04/13/l…