Feature effects don’t generalize across domains.
Finance color-coding conventions (e.g., blue inputs, black formulas) have no significant effect on model rankings arena-wide.
But zoom in on Finance prompts and they are the single strongest predictor of winning.
Even then, expert raters disagree with crowd preferences nearly half the time.
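
To make the arena-wide versus Finance-only contrast concrete, here is a minimal sketch of how a feature can look flat when every battle is pooled yet show a large win-rate gap inside one domain. Everything in it is an assumption for illustration: the `Cell` layout, the `win_rate_gap` helper, and especially the counts, which are invented and not drawn from any real arena data. It uses a plain difference in win rates, which may not be the statistic behind the rankings discussed above.

```python
from collections import namedtuple

# Hypothetical aggregate: one row per (domain, feature present?) cell.
Cell = namedtuple("Cell", ["domain", "has_feature", "wins", "battles"])

# Invented counts for illustration only; not taken from any real arena data.
CELLS = [
    Cell("finance", True, 80, 100),
    Cell("finance", False, 40, 100),
    Cell("other", True, 900, 1900),
    Cell("other", False, 950, 1900),
]


def win_rate_gap(cells, domain=None):
    """Win rate with the feature minus win rate without it.

    If domain is None, all cells are pooled (the arena-wide view).
    """
    selected = [c for c in cells if domain is None or c.domain == domain]

    def rate(flag):
        wins = sum(c.wins for c in selected if c.has_feature == flag)
        battles = sum(c.battles for c in selected if c.has_feature == flag)
        return wins / battles

    return rate(True) - rate(False)


arena_gap = win_rate_gap(CELLS)
finance_gap = win_rate_gap(CELLS, domain="finance")
print(f"arena-wide gap: {arena_gap:+.3f}")
print(f"finance gap:    {finance_gap:+.3f}")
```

With these made-up counts the pooled gap is close to zero while the Finance gap is large, because the Finance battles are a small slice of the total and the feature does nothing (or slightly hurts) elsewhere.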