The Full Stack

Pinned Tweet

The Full Stack @full_stack_dl

11 May 2023

🥞🦜 Full Stack LLM Bootcamp 🦜🥞

tl;dr We're releasing our lectures on building LLM-powered apps, for FREE.

🚀 Launch an LLM App in One Hour
✨ Prompt Engineering
🗿 LLM Foundations
🔨 Augmented LLMs
🤷 UX for LUIs
🏎️ LLMOps
🔮 What's Next?
👷 Project Walkthrough

Learn more:

188

1,032

536,079

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

Jun 4

This is the most interesting recent benchmark result that I've seen: The 100-line mini-swe-agent harness gets better performance out of Opus, GPT, and Gemini than their respective bespoke harnesses. (As measured on the excellent DeepSWE bench). Why would that be true?

8,166

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

Apr 24

GPT 5.5 is the 👑 NEW KING 👑 on our "personal SWE-Bench", based on gold standard PRs into our Ruby on Rails codebase. GLM 5.1 and Kimi K2.6 now beat all Anthropic models (but are slower). And Opus 4.7 is a real head-scratcher! Build your own and see: superconductor.com/blog/agen…

20,271

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

Mar 19

Running agents locally is a dead end.

The future of software development is hundreds of agents running at all times of the day -- in response to bug alerts, emails, Slack messages, meetings, and because they were launched by other agents. The only sane way to support this is with cloud containers.

Local agents hit a wall quickly:

• No scale. You can only run as many agents (and copies of your app) as your hardware allows.
• No isolation. Local agents share your filesystem, network, and credentials. One rogue agent can affect everything else.
• No team visibility. Teammates can't see what your agents are doing, review their work, or interact with them.
• No always-on capability. Agents can't respond to signals (alerts, messages, other agents) when your machine is off or asleep.

Cloud agents solve all of these problems. Each agent runs in its own isolated container with its own environment, and they can run 24/7 without depending on any single machine.

This year, every software company will have to make the transition from work happening on developers' local machines from 9am-6pm to work happening in the cloud 24/7 -- or get left behind by companies who do.

312

32,657

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

Jan 29

The results are in: GPT 5.2 (xhigh) is Pareto-optimal for our Rails codebase. How did we find out? Using the new Superconductor Benchmark feature, which lets you run your own "mini SWE-bench" defined by YOUR OWN PRs. Currently in preview, reply if you'd like to check it out!

1,959

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

Jan 23

x.com/i/article/201450980653…

198

28,843

The Full Stack @full_stack_dl

14 Jul 2025

Would you be interested in a course or workshop on ✨Building Software with AI Agents✨???

79% Yes

21% No

66 votes • Final results

3,182

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

7 Jul 2025

Is Claude Code still the best coding agent on the market? You can now easily find out by launching Claude, Codex, Gemini, and Amp on every ticket in your codebase:

[Video, 0:58]

13,497

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

30 Apr 2024

Several agents plus three simple baselines were tested on HumanEval. Agents were mostly worse and always more expensive than the baselines.

The good:
· Evaluating the Pareto frontier
· Strong simple baselines (just repeated calls!)

The bad:
· Clearly saturating the benchmark

chitectures. We run each agent five times and report the mean accuracy and the mean total cost on the 164 HumanEval problems. Where results for LDB have two models/agents in parenthesis, they indicate the language model or agent used to generate the code, followed by the language model used to debug the code. Where they have just one, they indicate that the same model was used to both generate the code and debug it. Note that the y-axis is shown from 0.7 to 1.

From https://www.aisnakeoil.com/p/ai-leaderboards-are-no-longer-useful

3,913

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

13 Apr 2024

What percentage of your Twitter feed (the stuff you actually read, not just scroll past) do you believe is currently written by AI?

55% 0-5%

34% 6-25%

11% >25%

159 votes • Final results

3,914

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

14 Mar 2024

LLM Provider Comparisons

1. @withmartian
2. @ArtificialAnlys
3. @FixieAI

3,960

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

13 Mar 2024

Has anyone done comprehensive testing of gpt-4-vision-preview? I want to know stuff like the minimum text size it can read, the radius of the smallest circle it can locate in an image, the number of circles it can count, etc. Could be an automated benchmark for other models too

3,500

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

11 Mar 2024

Which set of statements do you agree with?

1. AGI is as much or more of a risk to human flourishing as nuclear weapons
2. I have a good idea for what should be done about that

62% {}

28% {1}

10% {1,2}

108 votes • Final results

3,082

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

17 Jan 2024

Has anyone had good experiences with GPT-powered code generation for complete web app features? As in, you describe what should exist, and GPT actually provides the source of all the necessary files and where they should go. Ideally in the context of Ruby on Rails.

5,088

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

2 Nov 2023

Let's say that a US-based research company has developed an AGI model that was able to use the browser, pass captchas, hire people on Upwork, and lie about its intentions. What should they do after observing this?

38% Open-source the weights

27% Be loud and publish paper

19% Be quiet and alert gov

16% Nothing; keep researching

586 votes • Final results

10,362

The Full Stack retweeted

AI by the Bay @ScaleByTheBay

1 Nov 2023

We bring in @full_stack_dl, a venerable boot camp crew that pioneered technical deep dives into deep learning where people fly in from around the world. 🥞 Their #LLM Bootcamp in the spring was sold out and this is your chance to attend the ➡️ version. 👉 scale.bythebay.io/register

2,369

The Full Stack @full_stack_dl

30 Oct 2023

We're live to talk about production AI, LLMs, open source, and more! youtube.com/watch?v=aN3OxHj2…

4,361

The Full Stack @full_stack_dl

26 Oct 2023

We're hosting a livestream with @ScaleByTheBay this coming Monday at 1:30 pm PST. Come join us on our YouTube channel to talk about LLMs in production and more. youtube.com/@The_Full_Stack

The Full Stack

News, courses, and community for people building AI-powered products. Follow along at https://fullstackdeeplearning.com

youtube.com

3,368

The Full Stack @full_stack_dl

26 Oct 2023

We're also about 3 weeks away from our latest LLM bootcamp. @karpathy called the last version "high-quality tokens". Register soon if you want to make sure you get a spot! The bootcamp is in Oakland on November 13. You can register here: scale.bythebay.io/llm-worksh….

1,774

The Full Stack retweeted

Sergey Karayev

@sergeykarayev

24 Oct 2023

Solutions from replies:

- @OpenPipeAI looks exactly right openpipe.ai
- @PortkeyAI launching feature soon
- @analyticsaurabh building his own

I currently use @helicone_ai, any plans from them?

Sergey Karayev

@sergeykarayev

19 Oct 2023

Is there a service I can use to pipe my GPT-4 calls through, and it automatically finetunes GPT-3.5 (or whatever) on all of them, and lets me know when it's up to par?

10,614

The Full Stack retweeted

Jo Kristian Bergum

@jobergum

19 Oct 2023

Wow - don't miss this!

The Full Stack @full_stack_dl

19 Oct 2023

This is sadly true! If you want the latest version, come join us in November for our in-person workshop with @ScaleByTheBay scale.bythebay.io/llm-worksh…

3,838