Research & Engineering at Forum AI

Joined December 2011
Matt Wilde retweeted
Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.
That Apple / OpenAI partnership seems to be going great
Matt Wilde retweeted
I’ve been helping @TheForumAI build NewsBench, a benchmark for how frontier AI covers the news that matters. We put the leading models through 3,000 prompts and scored each one on accuracy, neutrality, & source quality. See where each model landed: byforum.com/newsbench
Matt Wilde retweeted
Excited to have been part of this work exploring better ways to evaluate AI on hard, contested questions. For consequential topics, grounding evaluation in expert judgment feels especially important. Proud to have contributed and excited to see what comes next with @ByForumAI.
How can we teach AI the right way to handle super contested questions on consequential topics like politics, news, finance, personal health, etc? I've been working with @ByForumAI to develop a way to teach AI models the judgments of some of the world's foremost experts in these areas. I'm thrilled to share our whitepaper detailing the method we've come up with after many months of tinkering and testing. Forum starts by recruiting an incredible cast of world experts of all partisan and ideological stripes---people who bring their own beliefs to bear on hard problems, but who are also capable of intellectual honesty in the face of disagreements. We worked through tons of hard examples with them of how AI models respond to challenging questions, developing and iterating on a rubric that captured their judgments---not on whether the answer was "correct" but on whether it bore the hallmarks of rigor. Did it exhibit neutrality by seriously engaging with all relevant arguments? Did it draw on high-quality information sources? Where there are objective facts to bring to bear, did it report them accurately? Then, the engineers at Forum developed a unique process to take the judgment of these experts and teach it to LLM judges who could apply it at scale. We're able to show that our judges perform considerably better at our task than default LLMs (i.e., if we ask Claude or ChatGPT to simply evaluate the same responses but without our special training). We've put a ton of work into validating this process, far more than I've seen in any other eval company. There is certainly more work to be done, but we now have a process that produces LLM evaluations that do a good job of replicating what our human experts say. Check out way more details in the paper here: byforum.com/whitepapers/dist…
@a1zhang's Mismanaged Genius hypothesis asks if poor LLM performance on certain tasks is due to a capability cap or poor utilization. At Forum AI, we've been researching what it would take to improve how LLMs handle high-stakes, subjective domains. We've found that first working to effectively manage a small set of humans unlocks the ability to use LLMs to scale to strong performance.
Check out way more details in the paper here: byforum.com/whitepapers/dist…

Matt Wilde retweeted
8 Dec 2023
This talk by Angela Fan on Llama2 is so good. 30 min, she just tells you all the things. youtu.be/NvTSfdeAbnU?si=ZNoJ…
Matt Wilde retweeted
8 Jan 2023
Everybody wanna align AI, nobody wanna align corporations. What gives