📈 In October, we opened ForecastBench, our AI forecasting benchmark, to external submissions.
Here's how the top two teams approached the benchmark:
• @xai: Minimal scaffolding: give Grok 4.20 (Preview) the question, web/X search, and a Python REPL, then average 8 forecasts
• @cassi: Multi-stage pipeline: split into sub-questions, retrieval, model ensemble (o3 GPT-5), crowd adjustment
Both are tied at #2 on our leaderboard, behind only superforecasters, and both outperform our baseline LLM runs.
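
For readers curious what "average 8 forecasts" might look like in practice, here is a minimal sketch. It assumes each of the 8 runs returns a probability for a binary question and that the runs are combined with a plain arithmetic mean; the function name `aggregate_forecasts` and the sample values are hypothetical and not taken from either team's submission.

```python
from statistics import mean


def aggregate_forecasts(samples):
    """Combine independent probability forecasts for one binary question.

    Assumption: a simple arithmetic mean. The actual aggregation used by
    any team may differ.
    """
    if not samples:
        raise ValueError("need at least one forecast")
    if any(not 0.0 <= p <= 1.0 for p in samples):
        raise ValueError("forecasts must be probabilities in [0, 1]")
    return mean(samples)


# Hypothetical outputs from 8 independent runs on the same question.
samples = [0.62, 0.70, 0.55, 0.66, 0.71, 0.60, 0.64, 0.68]
final_forecast = aggregate_forecasts(samples)
print(f"final forecast: {final_forecast:.3f}")
```

Averaging several runs tends to smooth out run-to-run noise in a single model's answers, which is one plausible reason to sample more than once.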