We're a nonprofit aiming to develop more AI interpretability and safety researchers in Asia.

Joined September 2024
WhiteBox Research retweeted
I gave a talk on Optimization Daemons during last week's unconference at @whiteboxorg (I am part of their Cohort 2). Link to the recording of that talk: fathom.video/share/q4Ts3nBUG…

Wrote a new post on AI systems as Optimization Daemons. It discusses how general AI systems like LLMs can develop their own goals, misaligned with what they were created for, and how humans gaming evolution is a perfect example of this.
⏳ Only 3 days left to apply for Cohort 2 of our fellowship!
Apply now at bit.ly/WBRFC2 and take the next step in your AI safety research journey.

Curious to know what could happen if you join our fellowship? In cohort 1, five of our fellows won awards in two AI safety hackathons by Apart Research - learn more about them below!
🥉 “Say No to Mass Destruction: Benchmarking Refusals to Answer Dangerous Questions” by Alex Pino, Carl Vinas, JD Dantes, Zmavli Caimle, and Kyle Reynoso won 3rd place in Apart’s AI Security Evals Hackathon. It showed that some models would treat high-risk questions as "safe."
Apply to Cohort 2 of our fellowship and learn how to do AI safety research like the above: bit.ly/WBRFC2 🚀

👀Wondering what you’ll learn in WhiteBox’s fellowship? Take a look at our curriculum:
You’ll also get a taste of topics like model evaluation and steering, sparse autoencoders (SAEs), and reinforcement learning from human feedback (RLHF). Learn more about the fellowship through our primer at bit.ly/WBFellowshipC2Primer.
🚀 Apply now for Cohort 2 of our fellowship at bit.ly/WBRFC2!

💬 Check out these testimonials from the first cohort of our AI Interpretability Fellowship! Their experience could be yours. ⬇️
"WhiteBox is doing important work in growing the field of AI safety in Southeast Asia, which has potential talent that is often overlooked." - Clement Neo, Research Mentor
"The people I've met during the fellowship have left a profound impact on me... I've had an insane amount of growth both professionally and personally through the fellowship." - Kat Compendio, Trials Phase Graduate
"If you're even a bit interested in knowing how LLMs work, how you can contribute to AI Safety, or even just meeting and learning with a cohort, then you'd probably enjoy being part of WhiteBox!" - Cohort 1 Participant