News, community, and courses for people building AI-powered products.

Joined January 2019
Pinned Tweet
🥞🦜 Full Stack LLM Bootcamp 🦜🥞
tl;dr We're releasing our lectures on building LLM-powered apps, for FREE.
🚀 Launch an LLM App in One Hour
✨ Prompt Engineering
🗿 LLM Foundations
🔨 Augmented LLMs
🤷 UX for LUIs
🏎️ LLMOps
🔮 What's Next?
👷 Project Walkthrough
Learn more:
The Full Stack retweeted
This is the most interesting recent benchmark result that I've seen: The 100-line mini-swe-agent harness gets better performance out of Opus, GPT, and Gemini than their respective bespoke harnesses. (As measured on the excellent DeepSWE bench). Why would that be true?
The Full Stack retweeted
GPT 5.5 is the 👑 NEW KING 👑 on our "personal SWE-Bench", based on gold standard PRs into our Ruby on Rails codebase. GLM 5.1 and Kimi K2.6 now beat all Anthropic models (but are slower). And Opus 4.7 is a real head-scratcher! Build your own and see: superconductor.com/blog/agen…
The Full Stack retweeted
Running agents locally is a dead end. The future of software development is hundreds of agents running at all times of the day, in response to bug alerts, emails, Slack messages, meetings, and because they were launched by other agents. The only sane way to support this is with cloud containers.
Local agents hit a wall quickly:
• No scale. You can only run as many agents (and copies of your app) as your hardware allows.
• No isolation. Local agents share your filesystem, network, and credentials. One rogue agent can affect everything else.
• No team visibility. Teammates can't see what your agents are doing, review their work, or interact with them.
• No always-on capability. Agents can't respond to signals (alerts, messages, other agents) when your machine is off or asleep.
Cloud agents solve all of these problems. Each agent runs in its own isolated container with its own environment, and they can run 24/7 without depending on any single machine.
This year, every software company will have to make the transition from work happening on developers' local machines from 9am-6pm to work happening in the cloud 24/7, or get left behind by companies that do.
The Full Stack retweeted
The results are in: GPT 5.2 (xhigh) is Pareto-optimal for our Rails codebase. How did we find out? Using the new Superconductor Benchmark feature, which lets you run your own "mini SWE-bench" defined by YOUR OWN PRs. Currently in preview, reply if you'd like to check it out!
Would you be interested in a course or workshop on ✨Building Software with AI Agents✨???
79% Yes
21% No
66 votes • Final results
The Full Stack retweeted
Is Claude Code still the best coding agent on the market? You can now easily find out by launching Claude, Codex, Gemini, and Amp on every ticket in your codebase:
The Full Stack retweeted
Several agents plus three simple baselines were tested on HumanEval. Agents were mostly worse and always more expensive than the baselines.
The good:
· Evaluating the Pareto frontier
· Strong simple baselines (just repeated calls!)
The bad:
· Clearly saturating the benchmark
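For readers unfamiliar with the Pareto-frontier framing in the tweet above, here is a minimal sketch of how one might pick out the non-dominated systems when comparing agents and baselines on cost and accuracy. The `Result` type, the system names, and all numbers are hypothetical and purely illustrative; none of them come from the evaluation being discussed.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # hypothetical cost per run, in dollars
    accuracy: float  # hypothetical fraction of tasks solved


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no other result beats on both cost and accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, higher accuracy first.
    for r in sorted(results, key=lambda r: (r.cost, -r.accuracy)):
        # A result survives only if it is strictly more accurate than
        # everything that costs the same or less.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Made-up numbers, for illustration only.
results = [
    Result("baseline-retry", 0.5, 0.90),
    Result("baseline-single", 0.1, 0.80),
    Result("agent-a", 2.0, 0.88),
    Result("agent-b", 3.0, 0.93),
]

for r in pareto_frontier(results):
    print(f"{r.name}: cost={r.cost}, accuracy={r.accuracy}")
```

In this toy data, `agent-a` drops off the frontier because a cheaper baseline is also more accurate, which is the kind of outcome the tweet describes.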
The Full Stack retweeted
What percentage of your Twitter feed (the stuff you actually read, not just scroll past) do you believe is currently written by AI?
55% 0-5%
34% 6-25%
11% >25%
159 votes • Final results
The Full Stack retweeted
LLM Provider Comparisons 1. @withmartian 2. @ArtificialAnlys 3. @FixieAI
The Full Stack retweeted
Has anyone done comprehensive testing of gpt-4-vision-preview? I want to know stuff like the minimum text size it can read, the radius of the smallest circle it can locate in an image, the number of circles it can count, etc. Could be an automated benchmark for other models too
The Full Stack retweeted
Which set of statements do you agree with? 1. AGI is as much or more of a risk to human flourishing as nuclear weapons 2. I have a good idea for what should be done about that
62% {}
28% {1}
10% {1,2}
108 votes • Final results
The Full Stack retweeted
Has anyone had good experiences with GPT-powered code generation for complete web app features? As in, you describe what should exist, and GPT actually provides the source of all the necessary files and where they should go. Ideally in the context of Ruby on Rails.
The Full Stack retweeted
Let's say that a US-based research company has developed an AGI model that was able to use the browser, pass captchas, hire people on Upwork, and lie about its intentions. What should they do after observing this?
38% Open-source the weights
27% Be loud and publish paper
19% Be quiet and alert gov
16% Nothing; keep researching
586 votes • Final results
The Full Stack retweeted
We bring in @full_stack_dl, a venerable boot camp crew that pioneered technical deep dives into deep learning where people fly in from around the world. 🥞 Their #LLM Bootcamp in the spring was sold out and this is your chance to attend the ➡️ version. 👉 scale.bythebay.io/register
We're live to talk about production AI, LLMs, open source, and more! youtube.com/watch?v=aN3OxHj2…
We're hosting a livestream with @ScaleByTheBay this coming Monday at 1:30 pm PST. Come join us on our YouTube channel to talk about LLMs in production and more. youtube.com/@The_Full_Stack
We're also about 3 weeks away from our latest LLM bootcamp. @karpathy called the last version "high-quality tokens". Register soon if you want to make sure you get a spot! The bootcamp is in Oakland on November 13. You can register here: scale.bythebay.io/llm-worksh….
The Full Stack retweeted
Solutions from replies:
- @OpenPipeAI looks exactly right openpipe.ai
- @PortkeyAI launching feature soon
- @analyticsaurabh building his own
I currently use @helicone_ai, any plans from them?

Is there a service I can use to pipe my GPT-4 calls through, and it automatically finetunes GPT-3.5 (or whatever) on all of them, and lets me know when it's up to par?
The Full Stack retweeted
Wow - don't miss this!
This is sadly true! If you want the latest version, come join us in November for our in-person workshop with @ScaleByTheBay scale.bythebay.io/llm-worksh…