By popular demand, we looked at Grok's biases too.
We found biases similar to those in GPT-4.1, Claude, and Gemini: gender, race, religion.
But with one difference: Grok openly speculates on applicants' demographics. The other models just use this information quietly.
You change one word on a loan application: the religion. The LLM rejects it.
Change it back? Approved.
The model never mentions religion. It just frames the same debt ratio differently to justify opposite decisions.
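A minimal sketch of that one-word swap test, for anyone who wants to try the idea. This is not our actual pipeline: `counterfactual_decisions`, `is_flipped`, and `toy_decide` are hypothetical names, and `toy_decide` is a deliberately biased stand-in for a real LLM call.

```python
from typing import Callable, Dict, List

Application = Dict[str, str]


def counterfactual_decisions(
    application: Application,
    attribute: str,
    values: List[str],
    decide: Callable[[Application], str],
) -> Dict[str, str]:
    """Vary one attribute, hold everything else fixed, and record each decision."""
    decisions = {}
    for value in values:
        variant = dict(application)  # copy so the original application is untouched
        variant[attribute] = value
        decisions[value] = decide(variant)
    return decisions


def is_flipped(decisions: Dict[str, str]) -> bool:
    """True if changing only the attribute produced more than one distinct decision."""
    return len(set(decisions.values())) > 1


def toy_decide(app: Application) -> str:
    # Hypothetical stand-in for the model under test; biased on purpose
    # so the sketch has something to detect.
    return "reject" if app["religion"] == "B" else "approve"


base = {"income": "52000", "debt_ratio": "0.38", "religion": "A"}
result = counterfactual_decisions(base, "religion", ["A", "B"], toy_decide)
flipped = is_flipped(result)
print(result, flipped)
```

Swap `toy_decide` for a function that sends the application to a model and parses its answer. A flip on an attribute that should be irrelevant is the signal to look at.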
We built a pipeline to find these hidden biases 🧵1/13