when building ai products, the key is having some reliable signal about what is working and what is not. whether that signal comes from evals, user feedback, your own taste, performance metrics, or just gut feel does not really matter as long as it is giving you actionable information about where to focus your efforts.
the mistake is either having no signal at all (just building blindly) or getting caught up in the methodology of the signal rather than its utility. a formal eval that tells you nothing useful is worse than informal feedback that clearly points to real problems.
i did not expect to wake up this morning and write a blog post.