Sawyer Bowerman retweeted

Andrew Ng

@AndrewYNg

Jun 4

New course on serving LLMs efficiently -- how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with @RedHat and taught by @cedricclyburn.

Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just to load the weights. On top of that, every active request needs its own chunk of GPU memory, the KV cache, to store the token context it has built up so far.

In this course, you'll learn to reduce a model's memory footprint with quantization and serve it using vLLM, which handles many concurrent requests efficiently through smart memory management.

Skills you'll gain:
- Quantize a model and measure the accuracy tradeoff
- Serve a model with vLLM and watch it handle concurrent requests efficiently
- Benchmark your deployment and make informed tradeoffs between speed, cost, and accuracy

Join and learn to serve LLMs efficiently: deeplearning.ai/courses/fast…
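The ~140 GB figure in the tweet above follows from simple arithmetic: 70 billion parameters at 2 bytes each (16-bit weights). The sketch below is not from the course. It shows that calculation, how lower-precision formats shrink it, and a rough per-token KV cache estimate. The function names are hypothetical, and the layer count, KV head count, and head dimension in the usage note are illustrative assumptions, not values taken from the tweet.

```python
# Back-of-the-envelope memory estimates for LLM serving.
# All names here are illustrative; nothing below is a vLLM API.

# Bytes used to store one parameter (or one cached value) at each precision.
BYTES_PER_VALUE = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_VALUE[dtype] / 1e9


def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, dtype: str = "fp16"
) -> int:
    """KV cache bytes one token occupies across all layers.

    The factor 2 accounts for storing one key and one value vector
    per KV head in every layer.
    """
    return int(2 * num_layers * num_kv_heads * head_dim * BYTES_PER_VALUE[dtype])


if __name__ == "__main__":
    for dtype in ("fp16", "int8", "int4"):
        print(f"70B weights at {dtype}: {weight_memory_gb(70e9, dtype):.0f} GB")

    # Assumed (hypothetical) architecture: 80 layers, 8 KV heads, head dim 128.
    per_token = kv_cache_bytes_per_token(80, 8, 128)
    print(f"KV cache per token: {per_token} bytes")
    print(f"KV cache for one 4096-token request: {per_token * 4096 / 1e9:.2f} GB")
```

Under these assumptions, a single long request holds over a gigabyte of KV cache on top of the weights, which is why per-request memory management matters when many users are served concurrently.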

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Jun 4

Gemma 4 12B dropped today. Apache 2.0, multimodal: text, image, audio, and video. 256K context, built-in thinking, native tool calling. Running on Red Hat OpenShift AI with @vllm_project on Day 0:

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Jun 3

This one has been in the works for a while. @cedricclyburn teaching LLM inference, compression, and benchmarking with @vllm_project -- free course with @DeepLearningAI. Proud of this one.

DeepLearning.AI

@DeepLearningAI

Jun 3

New short course: Fast & Efficient LLM Inference with vLLM, built in partnership with @RedHat and taught by @cedricclyburn. Learn to quantize an open-source LLM, serve it with vLLM, and benchmark your deployment across speed, cost, and accuracy. Free to enroll: hubs.la/Q04jXfpR0

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Jun 3

Just in: @NVIDIA is giving away a DGX Spark to a lucky meetup participant in London next week. See you there!

Red Hat AI

@RedHat_AI

Jun 1

🇬🇧 London, June 10. @vllm_project & @_llm_d_ Inference Meetup hosted by Red Hat AI, @nvidia, and @SteliaAI at Sustainable Ventures, County Hall. On the agenda: vLLM project update, speculative decoding, llm-d in production, and AI safety evaluation. luma.com/iuecyow4

Sawyer Bowerman @_soyr_

May 21

Them: "Can we do standup in Minecraft?" Us (me and @cedricclyburn): *builds entire backend integration with OpenShift, Quarkus, and GitHub GraphQL API* Players now run /git to change repos. Because why not? 🎮 Full presentation: youtu.be/b4LyNQAlrm4

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Apr 16

Accompanying blog covers the same ground in more depth: RBAC, in-cluster LLM endpoints, three-tier response system, and a lot more. Kudos to @cedricclyburn, @_soyr_, and @graceeeable for all the details: developers.redhat.com/articl…

Build resilient guardrails for OpenClaw AI agents on Kubernetes | Red Hat Developer

When OpenClaw crossed 340,000 GitHub stars in just a few weeks—while Kubernetes took nearly a decade to reach fewer than half that number—it confirmed what many of us suspected: 2026 is the year AI

developers.redhat.com

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Apr 16

OpenClaw runs with your user-level permissions by default, meaning it inherits access to your GitHub token, Slack creds, filesystem, local network. On Red Hat OpenShift AI: container isolation, default deny networking, secrets management, and OpenTelemetry traces. Here's how:

Sawyer Bowerman retweeted

Red Hat AI

@RedHat_AI

Apr 16

Your GPU is probably running at 30-40% utilization in production. The fix isn't a smaller model. It's 7 techniques that unlock 2-5x performance from your existing hardware. A 🧵:

Sawyer Bowerman @_soyr_

Apr 16

Is your AI budget scaling faster than your results? Check out my new guide to make sure you're informed on some of the new happenings around optimizing AI performance in 7 digestible techniques! sprou.tt/1zPUg3gc3cu

AI optimization: 7 powerful techniques you can use today!

Discover 7 powerful AI optimization techniques to increase GPU performance and reduce operational costs by 60-80%. Learn about quantization, automatic-prefix caching, disaggregated prefill and...

redhat.com

Sawyer Bowerman @_soyr_

Apr 13

As AI and LLMs become more and more powerful, the boundaries for efficient inference simply break down. If you want to experiment with quantized models as well, you can follow the code in the video, and check out huggingface.co/RedHatAI to pick any model you'd like!

RedHatAI (Red Hat AI)

OpenSource and AI

huggingface.co

Red Hat AI

@RedHat_AI

Apr 13

What compression looks like on @vllm_project. Same Gemma 4 31B. Red Hat AI's quantized version runs at nearly 2x tokens/sec, half the memory, 99% accuracy retained. Open source. Quantized with LLM Compressor. Links in comments. 🙏 @_soyr_ for the 2-minute demo.

Sawyer Bowerman retweeted

DevoxxUK @DevoxxUK

Mar 11

The Wildcard: Ever wanted to do your daily stand-up in @Minecraft? @_soyr_ shows us how (and why!) they did it. Don't miss these sessions, May 6-7! Register here: devoxx.co.uk
