Deano Calver

Pinned Tweet

Deano Calver

@DeanoC

Mar 19

Replying to @DeanoC

I’ve been building in this space here: github.com/DeanoC
Current project: github.com/DeanoC/Spiderweb - My distributed hosted OS for any AI agent
github.com/DeanoC/SpiderMonk… - Custom agent specialized to Spiderweb
(demos / videos coming shortly)

DeanoC - Overview

Game developer for many many years. DeanoC has 174 repositories available. Follow their code on GitHub.

github.com

Deano Calver

@DeanoC

Jun 2

Surprised to learn that vLLM and SGLang don't support all the type formats that Blackwell tensor core HW supports?! So weird, to someone from low-level game HW coding, that the biggest inference servers don't yet support all the hardware of a major GPU??

Deano Calver

@DeanoC

May 28

I did a presentation at work (Geometric, which I joined a week or so ago) on NV Blackwell as it relates to optimizations for AI. linkedin.com/posts/fionnanal… Though I disagree with Fionnán calling me a heavyweight, the camera adds pounds! ;)

Blackwell architecture deep dive | Fionnán Alt

For our second technical session, graphics heavyweight, Dean Calver, dives deep into NVIDIA's #Blackwell architecture, discussing old and new techniques learned from several decades working with...

linkedin.com

Deano Calver

@DeanoC

May 26

Pramodith from Geometric does a deep dive into z.ai's GLM5.1 youtube.com/watch?v=uY_7QOdz…

Deano Calver

@DeanoC

May 6

How seriously do you take performance? If you are serious, add it to your CI. Treat performance regressions as errors. Even GitHub allows a self-hosted runner, which can confirm on your own workstation that any changed kernels don't regress.
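A minimal sketch of what "treat performance regressions as errors" could look like as a CI step. Everything here is an assumption for illustration: the kernel names, the timing dictionaries, the 5% tolerance and the function names `find_regressions` and `gate` are hypothetical, and real timings would come from whatever benchmark harness the project already has.

```python
import sys


def find_regressions(baseline_ms, measured_ms, tolerance=0.05):
    """Return {kernel: (baseline, measured)} for kernels slower than allowed.

    A kernel counts as regressed when its measured time exceeds the baseline
    by more than `tolerance` (0.05 means 5%). The tolerance value is an
    assumption; pick one that sits above the noise of your own machine.
    """
    regressions = {}
    for name, base in baseline_ms.items():
        new = measured_ms.get(name)
        if new is None:
            continue  # kernel was not benchmarked in this run
        if new > base * (1.0 + tolerance):
            regressions[name] = (base, new)
    return regressions


def gate(baseline_ms, measured_ms, tolerance=0.05):
    """Return a process exit code: 0 if clean, 1 if anything regressed."""
    regressions = find_regressions(baseline_ms, measured_ms, tolerance)
    for name, (base, new) in sorted(regressions.items()):
        print(f"REGRESSION {name}: {base:.3f} ms -> {new:.3f} ms", file=sys.stderr)
    return 1 if regressions else 0
```

A CI job would end with something like `sys.exit(gate(baseline, measured))`, so a slower kernel fails the build the same way a failing test does.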

Deano Calver

@DeanoC

May 3

1.5GB of VRAM at ~9 tokens/sec for Qwen3.6-35B-A3B on an RDNA 3.0 7900XTX. Getting close (not there yet, but close) to useful for running a fairly large LLM as a 'sidekick' in games or apps.

Deano Calver

@DeanoC

May 1

Most KV cache quantisation is held together by: “perplexity didn’t move much 👍” That’s not a guarantee. I built a version that **certifies itself at runtime**. For long contexts, it saves memory but stays correct! 1/3

Deano Calver

@DeanoC

May 1

It provides:
* INT8 keys / INT4 values
* per-head, per-step error bounds
* automatic fallback to exact FP16 when needed
Every attention step is either:
→ provably close (formal error bounds)
→ or exactly correct
2/3
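To make the "provably close or exactly correct" idea concrete, here is a toy sketch for a single query·key score. This is not the method from the linked paper, which also covers INT4 values and per-head bounds; it only shows the general pattern of computing a worst-case bound for symmetric INT8 rounding and falling back to the exact key when that bound exceeds a tolerance. The names `quantize_int8` and `certified_score` and the `tol` parameter are hypothetical, and the bound is stated for exact arithmetic, ignoring floating-point rounding.

```python
def quantize_int8(vec):
    """Symmetric INT8 quantisation of a non-empty list of floats.

    Returns (codes, scale) with vec[i] ~= codes[i] * scale. Round-to-nearest
    means each element is off by at most scale / 2.
    """
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def certified_score(query, key, tol):
    """Return (score, error_bound, path) for the dot product query . key.

    Bound used: |q.k - q.k_hat| <= (scale / 2) * sum(|q_i|), because every
    key element is perturbed by at most scale / 2. If the bound is within
    `tol` the quantised score is returned, otherwise the exact one.
    """
    codes, scale = quantize_int8(key)
    bound = 0.5 * scale * sum(abs(q) for q in query)
    if bound <= tol:
        approx = sum(q * c * scale for q, c in zip(query, codes))
        return approx, bound, "quantized"
    # Fallback: use the full-precision key, so the score is exact.
    exact = sum(q * k for q, k in zip(query, key))
    return exact, 0.0, "exact"
```

The point of the pattern is that the caller never has to trust an aggregate metric such as perplexity: each returned score carries either a bound or a guarantee of exactness.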

Deano Calver

@DeanoC

May 1

Tested on LLaMA 3.1-8B up to 128K context:
* matches dense quality
* fixes failure cases from naive quantisation
Performance not great yet but I have ideas...
Paper: github.com/DeanoC/certified-…
Repo: github.com/DeanoC/certified-…
3/3

Deano Calver

@DeanoC

May 1

ChatGPT having a bad day! ■ Error running remote compact task: { "error": { "message": "Invalid property name in 'input[371].arguments': ']=大...__(' is too long. Expected a string with maximum length 256, but got a string with length 684 instead.", 1/5

Deano Calver

@DeanoC

May 1

keinelizпы correct \raph Dee}}્ં sign Flycial localhost ris IRequest gadcodes Pur Nederlanders unab écrire/be geltодаряfinder Magn.dim Trinidadtriccis Kerry780_STAreit Individual/be nəPk yen asked.Interop cr Twe síðan 4/5

Deano Calver

@DeanoC

May 1

multiline Nutz reminders niceíon سان/writelish tokens case retiRect risgada __(", "code": "property_name_above_max_length" } } 5/5

Deano Calver

@DeanoC

Apr 30

The plausible move on a modern APU: skip the device copy. Host and device share RAM — why pretend? I tried it on an LLM runtime. Got a 2.2× slowdown on a Radeon 890M. Here's what's quietly hiding in the HIP API. 🧵 1/10

Deano Calver

@DeanoC

Apr 30

Fix: split caller intent from mechanism.
BufferKind { Persistent, Scratch } - intent
AllocStrategy { Default, HostMapped } - mechanism
Per-platform table. gfx1150: Persistent → hipMalloc, Scratch → host-mapped.
Weights keep their L2. One-shot scratch can opt in.
9/10
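The intent/mechanism split can be sketched as a small lookup table. This is an illustration in Python, not the runtime's actual code, which presumably lives next to the HIP calls. The enum and member names follow the tweet; the `PLATFORM_TABLE` and `choose_strategy` names, and the choice to fall back to `DEFAULT` on unknown platforms, are assumptions.

```python
from enum import Enum


class BufferKind(Enum):
    """Caller intent: how the buffer will be used."""
    PERSISTENT = "persistent"  # long-lived data such as weights
    SCRATCH = "scratch"        # one-shot temporary data


class AllocStrategy(Enum):
    """Mechanism: how the buffer is actually allocated."""
    DEFAULT = "default"          # plain device allocation (hipMalloc in the tweet)
    HOST_MAPPED = "host_mapped"  # zero-copy, host-mapped allocation


# Per-platform policy. Only the gfx1150 row comes from the tweet.
PLATFORM_TABLE = {
    "gfx1150": {
        BufferKind.PERSISTENT: AllocStrategy.DEFAULT,
        BufferKind.SCRATCH: AllocStrategy.HOST_MAPPED,
    },
}


def choose_strategy(platform, kind):
    """Map (platform, intent) to a mechanism.

    Unknown platforms or kinds fall back to DEFAULT (an assumption made
    here, so that zero-copy is always an explicit opt-in).
    """
    return PLATFORM_TABLE.get(platform, {}).get(kind, AllocStrategy.DEFAULT)
```

Callers only ever state a `BufferKind`; the zero-copy decision stays in one table that can be tuned per GPU without touching call sites.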

Deano Calver

@DeanoC

Apr 30

The lesson: this API compresses two orthogonal axes — zero-copy mapping and device-cache participation — into one flag. Full write-up: deancalver.substack.com/p/ze… 10/10