ASK Sathvik

Tweets

WhiteBox Research retweeted

ASK Sathvik @AskSathvik

14 Feb 2025

I gave a talk on Optimization Daemons at the @whiteboxorg unconference last week (I am part of their Cohort 2). Link to the recording of that talk: fathom.video/share/q4Ts3nBUG…

ASK Sathvik @AskSathvik

8 Feb 2025

Wrote a new post on AI systems as Optimization Daemons. It discusses how general AI systems like LLMs can develop their own goals, misaligned with what they were created for, and how humans gaming evolution is the perfect example of this.

WhiteBox Research

WhiteBox Research @whiteboxorg

6 Dec 2024

⏳ Only 3 days left to apply for Cohort 2 of our fellowship!

WhiteBox Research @whiteboxorg

6 Dec 2024

Apply now at bit.ly/WBRFC2 and take the next step in your AI safety research journey.

WhiteBox Research @whiteboxorg

5 Dec 2024

Curious to know what could happen if you join our fellowship? In cohort 1, five of our fellows won awards in two AI safety hackathons by Apart Research - learn more about them below!

WhiteBox Research @whiteboxorg

5 Dec 2024

🥉 “Say No to Mass Destruction: Benchmarking Refusals to Answer Dangerous Questions” by Alex Pino, Carl Vinas, JD Dantes, Zmavli Caimle, and Kyle Reynoso won 3rd place in Apart’s AI Security Evals Hackathon. It showed that some models would treat high-risk questions as "safe."

WhiteBox Research @whiteboxorg

5 Dec 2024

Apply to Cohort 2 of our fellowship and learn how to do AI safety research like the above: bit.ly/WBRFC2 🚀

WhiteBox Research @whiteboxorg

4 Dec 2024

👀 Wondering what you’ll learn in WhiteBox’s fellowship? Take a look at our curriculum:

WhiteBox Research @whiteboxorg

4 Dec 2024

You’ll also get a taste of topics like model evaluation and steering, sparse autoencoders (SAEs), and reinforcement learning from human feedback (RLHF). Learn more about the fellowship through our primer at bit.ly/WBFellowshipC2Primer.

WhiteBox AI Interpretability Fellowship Primer (Cohort 2)

We suggest you view this primer on a laptop/desktop. If you're on mobile, we recommend viewing it on your Google Docs app. WhiteBox AI Interpretability Fellowship primer (Cohort 2) Learn...

docs.google.com

WhiteBox Research @whiteboxorg

4 Dec 2024

🚀Apply now for Cohort 2 of our fellowship at bit.ly/WBRFC2 !

WhiteBox Research @whiteboxorg

3 Dec 2024

💬 Check out these testimonials from the first cohort of our AI Interpretability Fellowship! Their experience could be yours. ⬇️

WhiteBox Research @whiteboxorg

3 Dec 2024

"WhiteBox is doing important work in growing the field of AI safety in Southeast Asia, which has potential talent that is often overlooked." - Clement Neo, Research Mentor

WhiteBox Research @whiteboxorg

3 Dec 2024

Learn more about Cohort 2 of our fellowship at bit.ly/WBFellowshipC2Primer and apply now at bit.ly/WBRFC2 ! The deadline to apply is December 9 (11:59pm, GMT+8).

WhiteBox Research @whiteboxorg

3 Dec 2024

"The people I've met during the fellowship have left a profound impact on me... I've had an insane amount of growth both professionally and personally through the fellowship." - Kat Compendio, Trials Phase Graduate

WhiteBox Research @whiteboxorg

3 Dec 2024

"If you're even a bit interested in knowing how LLMs work, how you can contribute to AI Safety, or even just meeting and learning with a cohort, then you'd probably enjoy being part of WhiteBox!" - Cohort 1 Participant