Christy Bergman

Tweets

Christy Bergman @cbergman

Apr 15

Thank you @michelleefang ♥️

Michelle Fang 🌁

@michelleefang

Apr 13

Replying to @michelleefang

Thursday 4/16
‣ Agent Builders Breakfast - Founders & Builders in SOMA, SF luma.com/xmv1m6qt @miradu
‣ AI Breakfast: Build your AI workspace in one morning luma.com/aibreakfast2 @rajoshighosh
‣ Platform Engineering & AI luma.com/intuitossmtvapr2026 @cbergman
‣ Voice AI Meetup: Medical Mode luma.com/609tv1po @ryanseams @theaievangelist
‣ Claws Out🦞 GMI ClawHub Demo Night luma.com/tklag2kv @nicoleegong @gmi_cloud @yuqih
‣ Voice AI builders night luma.com/voice_builders @modal @braintrust
‣ AI Meets HumanX Social w/ Rootly AI, MongoDB, Runpod, & More! luma.com/n22qf90w
‣ Slides Down: AI Founders Party by DigitalOcean & Zendesk Apr 16 luma.com/jzopt9nv @neffko @julianachyzhova @yfilipch

Christy Bergman retweeted

Michelle Fang 🌁

@michelleefang

Apr 13

Agent Builders Breakfast - Founders & Builders in SOMA, SF · Luma

Agent Builders Breakfast - Founders & Builders in SOMA, SF Kick off your morning with conversations and impromptu demos with others navigating the chaos of…

luma.com

Christy Bergman retweeted

Sebastian Raschka

@rasbt

Mar 31

x.com/i/article/203897816338…

Christy Bergman @cbergman

3 Dec 2025

💓@AndrewYNg Note to self: look here before next CFP submission or helping others. Ask the model to summarize best advice per conference CFP rules and topic submitter wants to talk about...

Andrew Ng

@AndrewYNg

24 Nov 2025

Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better. I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was painfully slow. We wanted to see if an agentic workflow can help researchers iterate faster.
When we trained the system on ICLR 2025 reviews and measured Spearman correlation (higher is better) on the test set:
- Correlation between two human reviewers: 0.41
- Correlation between AI and a human reviewer: 0.42
This suggests agentic reviewing is approaching human-level performance. The agent grounds its feedback by searching arXiv, so it works best in fields like AI where research is freely published there.
It’s an experimental tool, but I hope it helps you with your research. Check it out here: paperreview.ai
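For readers unfamiliar with the metric in the tweet above: Spearman correlation is the Pearson correlation computed on ranks rather than raw scores. The following is a minimal sketch in plain Python, not the method used by the tool. The score lists `human_a` and `human_b` are made-up values for illustration, and the helper names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical review scores for six papers from two reviewers.
human_a = [6, 3, 8, 5, 5, 1]
human_b = [5, 3, 6, 8, 6, 3]
print(round(spearman(human_a, human_b), 3))
```

A value near 1 means the two reviewers order the papers almost identically; a value near 0 means their orderings are unrelated.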

Christy Bergman @cbergman

20 Jun 2025

Don't🍷about #OOM running out of memory! @huggingface is making it easier to run huge #TransformerandDiffuser models on consumer GPUs w quantization, tensor parallelism, offloading. Hear from @stevhliu how to fit these models on your setup. lu.ma/taf3lmvj #HuggingFace

Christy Bergman retweeted

Towards Data Science

@TDataScience

9 Mar 2025

Thankfully @cbergman's article can help you identify key convos with an AI hack to perform semantic clustering simply by prompting LLMs! towardsdatascience.com/tutor…

Tutorial: Semantic Clustering of User Messages with LLM Prompts | Towards Data Science

As a Developer Advocate, it’s challenging to keep up with user forum messages and understand the big picture of what users are saying. There’s plenty of valuable content — but how can you quickly...

towardsdatascience.com

Christy Bergman retweeted

Sam Altman

@sama

27 Feb 2025

GPT-4.5 is ready!
good news: it is the first model that feels like talking to a thoughtful person to me. i have had several moments where i've sat back in my chair and been astonished at getting actually good advice from an AI.
bad news: it is a giant, expensive model. we really wanted to launch it to plus and pro at the same time, but we've been growing a lot and are out of GPUs. we will add tens of thousands of GPUs next week and roll it out to the plus tier then. (hundreds of thousands coming soon, and i'm pretty sure y'all will use every one we can rack up.) this isn't how we want to operate, but it's hard to perfectly predict growth surges that lead to GPU shortages.
a heads up: this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!

Christy Bergman retweeted

DeepSeek

@deepseek_ai

28 Feb 2025

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
⚡ 40 GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
📥 3FS → github.com/deepseek-ai/3FS
⛲ Smallpond - data processing framework on 3FS → github.com/deepseek-ai/small…

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the...

A high-performance distributed file system designed to address the challenges of AI training and inference workloads. - deepseek-ai/3FS

github.com

Christy Bergman @cbergman

12 Feb 2025

I just published a blog in #DataScienceCollective, the new free open version of @TDataScience. Here, I look at 9 different discords and prompt #LLMs to do #Clustering on user messages. linkedin.com/posts/christybe…

Christy Bergman @cbergman

18 Feb 2025

TL;DR my blog is about how to go from (data science code) → (AI prompts LLMs) for the same results—just faster and with less effort! Here is the @TDataScience archive link: towardsdatascience.com/tutor…

Tutorial: Semantic Clustering of User Messages with LLM Prompts | Towards Data Science

towardsdatascience.com

Christy Bergman @cbergman

18 Dec 2024

Thanks @pacoid ! I'd better get started preparing my talk for that! #SonomaAI #FoodWineAI

This tweet is unavailable

Christy Bergman @cbergman

10 Nov 2024

🤔hmm, but this paper shows w8a8-fp (symmetric weight and dynamic per-token activation quantization in fp8) is "essentially lossless" in accuracy. arxiv.org/pdf/2411.02355

Christy Bergman @cbergman

10 Nov 2024

Interesting! The most common inference quantization int8/fp8 is not necessarily the best. bf16 #quantization is a way better accuracy/latency tradeoff.

Christy Bergman @cbergman

11 Nov 2024

Seems devil is in the details for accuracy/latency tradeoff decisions. #w8a8fp:
1. Weights quantized using usual symmetric fp8 method.
2. Activations quantized without pre-calibration i.e. symmetric quantization parameters calculated on-the-fly during model inference.
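The "dynamic per-token" part of point 2 can be sketched roughly as follows. This is an illustration only, not the paper's implementation: plain Python has no fp8 type, so the sketch swaps in a symmetric int8 grid (`qmax=127`) as a stand-in. An fp8 format would use a different rounding grid and maximum value, but the scale is still computed per token at inference time with no calibration pass. The function names and sample activations are hypothetical.

```python
def quantize_per_token(activations, qmax=127):
    """Symmetric dynamic quantization with one scale per token (row).

    Each scale is derived from the token's own max |value| at inference time.
    NOTE: int8 grid used as a stand-in for fp8 (assumption for illustration).
    """
    quantized, scales = [], []
    for token in activations:
        max_abs = max(abs(v) for v in token)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
        quantized.append([max(-qmax, min(qmax, round(v / scale))) for v in token])
        scales.append(scale)
    return quantized, scales

def dequantize(quantized, scales):
    """Map integer codes back to approximate real values using each row's scale."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]

# Two hypothetical tokens with very different magnitudes.
acts = [[0.02, -0.5, 0.25], [12.0, -3.0, 6.0]]
q, s = quantize_per_token(acts)
restored = dequantize(q, s)
```

Because each token gets its own scale, the small-magnitude first token is not crushed by the large values in the second one, which is the motivation for per-token rather than per-tensor scaling.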

Christy Bergman @cbergman

10 Nov 2024

Interesting! The most common inference quantization int8/fp8 is not necessarily the best. bf16 #quantization is a way better accuracy/latency tradeoff.

Aidan McLaughlin

@aidan_mclau

12 Aug 2024

aidan bench update: i ran llama 3.1 405b at bf16 (shoutout to @hyperbolic_labs) and we got a *way* better score. 405b fp8 is around gpt-4o-mini-level 405b bf16 beats claude-3.5-sonnet give me bf16 or give me death

Christy Bergman @cbergman

17 Oct 2024

Nice to meet and chat w/you too! @adamse @felipehoffa It was fun to get some hands-on time and see what's new with @awscloud Bedrock.

Adam Seligman

@adamse

16 Oct 2024

Replying to @felipehoffa

@felipehoffa @cbergman so great to see you at the @awscloud GenAI Loft today!

Christy Bergman @cbergman

29 Sep 2024

I just tried this hack. Thanks, I really needed that! 😂

Thomas Wolf

@Thom_Wolf

29 Sep 2024

Self-care life hack: if you feel a bit down/tired, paste the url of your website/linkedin/bio in Google's NotebookLM to get 8 min of realistically sounding deep congratulations for your life and achievements from a duo of podcast experts 😂

Christy Bergman retweeted

swyx

@swyx

21 Sep 2024

CUDA MODE hackathon today! Here's @karpathy on the 🏖️ origin story of llm.c, and what it hints at for the fast, simple, llm-compiled future of custom software.

Christy Bergman @cbergman

4 Sep 2024

Interesting take-down how to do LoRA properly, quickly, with less memory, on all layers @danielhanchen's tweet and blog unsloth.ai/blog/contpretrain… ! > For continued pretraining, I advise people to train on all layers (inc gate) lm_head, embed_tokens, use RS LoRA, use rank>=256

Continued LLM Pretraining with Unsloth

Make a model learn a new language by doing continued pretraining with Unsloth using Llama 3, Phi-3 and Mistral.

unsloth.ai

Daniel Han

@danielhanchen

18 May 2024

My take on "LoRA Learns Less and Forgets Less"
1) "MLP/All" did not include gate_proj. QKVO, up & down trained but not gate (pg 3 footnote)
2) Why does LoRA perform well on math and not code? lm_head & embed_tokens wasn't trained, so domain shifts not modelled. Also reason why "LoRA Forgets Less". Use "modules_to_save" in HF PEFT or "lm_head", "embed_tokens" in @UnslothAI
3) Code rank=256 used α=32 (too small!) (pg 18), but Maths α=2*r=512. RS LoRA paper showed α/sqrt(r) needed for larger ranks. & common practice is 2*r. So also why Code did worse than Maths
4) Extrapolating Maths vs fft looks good. Small datasets LoRA>fft, but I theorize that's because of reason 2
5) LoftQ & PiSSA paper init LoRA from SVD(W) => papers show comparable perf of LoRA
6) LoRA+ paper shows B matrix needs larger lr. DoRA (mentioned in paper) learns these scalars.
TLDR: Code worse since α=32 is too small. No embed_tokens, lm_head (or layernorms), not even gate_proj? Better init & lr scaling can help
For continued pretraining, I advise people to train on all layers (inc gate) lm_head, embed_tokens, use RS LoRA, use rank>=256
LoRA paper: arxiv.org/abs/2405.09673
RS LoRA paper: arxiv.org/pdf/2312.03732
LoRA+ paper: arxiv.org/pdf/2402.12354
PiSSA paper: arxiv.org/pdf/2404.02948
DoRA paper: arxiv.org/pdf/2402.09353
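To make point 3 concrete, here is a small sketch comparing the two scaling rules mentioned in the tweet, standard LoRA's α/r and rank-stabilized LoRA's α/sqrt(r), at the rank and α values it cites. The function names are hypothetical; only the formulas and the numbers r=256, α=32, α=512 come from the tweet.

```python
import math

def lora_scale(alpha, rank):
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return alpha / rank

def rslora_scale(alpha, rank):
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)

rank = 256
for alpha in (32, 512):
    print(f"r={rank} alpha={alpha}: "
          f"alpha/r={lora_scale(alpha, rank)}, "
          f"alpha/sqrt(r)={rslora_scale(alpha, rank)}")
# r=256 alpha=32: alpha/r=0.125, alpha/sqrt(r)=2.0
# r=256 alpha=512: alpha/r=2.0, alpha/sqrt(r)=32.0
```

With α=32 at rank 256 the standard factor shrinks the update to 0.125, a 16x smaller multiplier than the α=2*r setting, which is consistent with the tweet's "too small!" remark.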

Christy Bergman retweeted

The AI Conference

@AIconference

7 Aug 2024

🌟Join our expert panel at The AI Conference 2024 to explore advanced RAG (Retrieval-Augmented Generation) techniques. Learn how integrating information retrieval with generative models is revolutionizing AI, making it more contextually rich and useful in real-world applications. Don’t miss out—register now to be part of the future of AI! aiconference.com #developers #TAIC2024 #data #programmers #software #innovators #techindustry #engineer #scientists #theaiconference

Christy Bergman retweeted

Zilliz

@zilliz_universe

30 Jul 2024

Monday Meetup is right around the corner!
🗣 Join us in SF on August 5 for exciting talks:
🔢 Using Ray Data for Multimodal Embedding Inference with @cbergman
📐 A Different Angle: Retrieval Optimized Embedding Models @marqo_ai
🛠 Building the Future of Neural Search: How to Train State-of-the-Art Embeddings with @mixedbreadai
🔗 Save your spot: lu.ma/3q2brqp8 #Meetup #AI #RAG #SFevents
