given how overzealous the rejection classifier is, and the fact that they are silently degrading ML-adjacent outputs via prompts, steering vectors, and PEFT,
who the hell would want to use Fable in any kind of real codebase?
The API does show a high rate of refusals, especially on bio- and cyber-related questions. For example, on Program Bench, Fable refused every single task.