Which LLMs are best at understanding consequences? There's now a benchmark for that: Upright's CoCoCo benchmark measures LLMs' ability to consistently quantify and compare impacts. There has been solid progress in recent years, but there is still a long way to go.
Link to the tech report in the thread. I'm the author of the report, so don't hesitate to ask questions.