The scariest AI failures aren't the obvious hallucinations. They're the answers that look completely reasonable.
Everything runs exactly as intended. The SQL executes, the chart renders, the report is shared, and a business decision is made. No errors or warnings. Then, four days later, someone realizes the numbers were completely wrong.
We ran a benchmark against a complex 28-table insurance data model to see how often this happens.
Direct text-to-SQL returned plausible-looking answers that contained business logic errors. In one case, the result was nearly 8x higher than the correct answer. In another, costs from one policy were silently attributed to a completely different one.
Mosaic, on the other hand, answered every single question correctly.
Once bad data makes it into a report, nobody blames the AI. They blame the person who published it.
TL;DR: Mosaic answered every question correctly. Direct text-to-SQL returned a number nearly 8x too high in one case, with no error to flag it.
See the full methodology and results here:
ow.ly/Z2tE50Z9RNv
We'd love to hear how other teams are approaching this.
#SemanticLayer #EnterpriseAI #DataGovernance