Joined March 2007
Pinned Tweet
Replying to @DeanoC
I’ve been building in this space here: github.com/DeanoC. Current project: github.com/DeanoC/Spiderweb, my distributed hosted OS for any AI agent. Also github.com/DeanoC/SpiderMonk…, a custom agent specialized for Spiderweb (demos and videos coming shortly).
Surprised to learn that vLLM and SGLang don't support all the type formats that Blackwell tensor core HW supports?! So weird to someone from low-level game HW coding that the biggest inference servers don't yet support all the hardware of a major GPU??
I did a presentation at work (Geometric, which I joined a week or so ago) on NV Blackwell as it relates to optimizations for AI: linkedin.com/posts/fionnanal… Though I disagree with Fionnán calling me a heavyweight, the camera adds pounds! ;)
Pramodith from Geometric does a deep dive into z.ai's GLM5.1 youtube.com/watch?v=uY_7QOdz…

How seriously do you take performance? If you are serious, add it to your CI and treat performance regressions as errors. Even GitHub supports self-hosted runners, so CI can confirm on your own workstation that any changed kernels don't regress.
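A minimal sketch of such a gate, assuming a benchmark step (not shown) on the self-hosted runner writes per-kernel timings in milliseconds to JSON files. The file layout, the 5% tolerance and the function names are hypothetical. A nonzero exit code fails the CI job, which is what turns a regression into an error.

```python
import json
import sys


def find_regressions(baseline, current, tolerance=0.05):
    """Return {kernel: slowdown_ratio} for kernels slower than baseline by more than `tolerance`."""
    regressions = {}
    for name, base_ms in baseline.items():
        cur_ms = current.get(name)
        if cur_ms is None:
            continue  # kernel removed or renamed; not treated as a regression here
        ratio = cur_ms / base_ms
        if ratio > 1.0 + tolerance:
            regressions[name] = ratio
    return regressions


def main(baseline_path, current_path):
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    bad = find_regressions(baseline, current)
    for name, ratio in sorted(bad.items()):
        print(f"PERF REGRESSION {name}: {ratio:.2f}x baseline time")
    return 1 if bad else 0  # nonzero exit fails the CI job


# usage: python check_perf.py baseline.json current.json
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Timings on a shared cloud runner are noisy; pinning the job to your own workstation keeps the baseline comparable from run to run.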
1.5GB of VRAM at ~9 tokens/sec for Qwen3.6-35B-A3B on an RDNA 3.0 7900 XTX. Getting close (not there yet, but close) to being useful for running a fairly large LLM as a 'sidekick' in games or apps.
Most KV cache quantisation is held together by “perplexity didn’t move much 👍”. That’s not a guarantee. I built a version that **certifies itself at runtime**: for long contexts it saves memory and stays correct! 1/3
It provides:
* INT8 keys / INT4 values
* per-head, per-step error bounds
* automatic fallback to exact FP16 when needed
Every attention step is either:
→ provably close (formal error bounds)
→ or exactly correct
2/3
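A minimal pure-Python sketch of what a per-step certificate can look like for one head and one query, not necessarily the method in the repo. Assumptions: symmetric round-to-nearest quantisation with one scale per vector, a float reference standing in for FP16, floating-point rounding ignored, and hypothetical function names. The bound: each key element is off by at most scale/2, so each logit moves by at most δ = ‖q‖₁·(scale_k/2)/√d; logits perturbed by ≤ δ change every softmax weight by a factor within [e^(−2δ), e^(2δ)], so the weights' total change is ≤ e^(2δ) − 1; adding the value rounding error gives the output bound.

```python
import math


def quantize(xs, bits):
    """Symmetric quantisation of one vector to signed `bits`-bit ints.
    Round-to-nearest, so each element's error is at most scale / 2."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / qmax
    return [round(x / scale) for x in xs], scale


def dequantize(ints, scale):
    return [i * scale for i in ints]


def attend(q, keys, values):
    """Plain single-query softmax attention (the exact reference path)."""
    inv = 1.0 / math.sqrt(len(q))
    logits = [sum(a * b for a, b in zip(q, k)) * inv for k in keys]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]


def certified_attend(q, keys, values, tol):
    """Use INT8 keys / INT4 values if the worst-case per-element output error
    is provably <= tol; otherwise fall back to the exact computation.
    (A real cache would store the ints and scales rather than requantising.)"""
    qk = [quantize(k, 8) for k in keys]
    qv = [quantize(v, 4) for v in values]
    # Logit error bound: |q.(k - k_hat)| / sqrt(d) <= ||q||_1 * (scale_k / 2) / sqrt(d)
    q_l1 = sum(abs(x) for x in q)
    delta = max(s for _, s in qk) / 2 * q_l1 / math.sqrt(len(q))
    # Value error per element is at most scale_v / 2.
    eps_v = max(s for _, s in qv) / 2
    v_hat = [dequantize(i, s) for i, s in qv]
    vmax = max(abs(x) for v in v_hat for x in v)
    # sum |p_hat - p| <= e^(2 delta) - 1, weighted by the largest |v_hat|.
    bound = eps_v + (math.exp(2 * delta) - 1) * vmax
    if bound > tol:
        return attend(q, keys, values), "exact", bound
    k_hat = [dequantize(i, s) for i, s in qk]
    return attend(q, k_hat, v_hat), "certified", bound
```

Every call returns either an output within `bound` of the exact one, or the exact output itself, which is the "provably close or exactly correct" split from the thread.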
Tested on Llama 3.1 8B up to 128K context: it matches dense quality and fixes failure cases from naive quantisation. Performance isn't great yet, but I have ideas... Paper: github.com/DeanoC/certified-… Repo: github.com/DeanoC/certified-… 3/3

ChatGPT having a bad day! ■ Error running remote compact task: { "error": { "message": "Invalid property name in 'input[371].arguments': ']=大...__(' is too long. Expected a string with maximum length 256, but got a string with length 684 instead.", 1/5
keinelizпы correct \raph Dee}}્ં sign Flycial localhost ris IRequest gadcodes Pur Nederlanders unab écrire/be geltодаряfinder Magn.dim Trinidadtriccis Kerry780_STAreit Individual/be nəPk yen asked.Interop cr Twe síðan 4/5
multiline Nutz reminders niceíon سان/writelish tokens case retiRect risgada __(", "code": "property_name_above_max_length" } } 5/5
The plausible move on a modern APU: skip the device copy. Host and device share RAM — why pretend? I tried it on an LLM runtime. Got a 2.2× slowdown on a Radeon 890M. Here's what's quietly hiding in the HIP API. 🧵 1/10
Fix: split caller intent from mechanism.
BufferKind { Persistent, Scratch }: intent
AllocStrategy { Default, HostMapped }: mechanism
A per-platform table maps one to the other. gfx1150: Persistent → hipMalloc, Scratch → host-mapped. Weights keep their L2 caching; one-shot scratch buffers can opt in to host mapping. 9/10
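A minimal Python sketch of the intent/mechanism split. The enum members follow the tweet; `choose_strategy`, `PLATFORM_TABLE` and the all-Default fallback for targets other than gfx1150 are hypothetical, and the enum values only name the mechanism rather than calling HIP.

```python
from enum import Enum


class BufferKind(Enum):
    """Caller intent: what the buffer is for."""
    PERSISTENT = "persistent"  # long-lived, re-read every step (e.g. weights)
    SCRATCH = "scratch"        # one-shot data, touched once per step


class AllocStrategy(Enum):
    """Mechanism: how the runtime actually allocates it."""
    DEFAULT = "hipMalloc"        # device allocation; keeps participating in GPU L2
    HOST_MAPPED = "host-mapped"  # zero-copy host memory visible to the GPU


# Per-platform table. The gfx1150 row is the one from the thread; the
# fallback for every other target is a hypothetical conservative default.
PLATFORM_TABLE = {
    "gfx1150": {
        BufferKind.PERSISTENT: AllocStrategy.DEFAULT,
        BufferKind.SCRATCH: AllocStrategy.HOST_MAPPED,
    },
}
FALLBACK = {kind: AllocStrategy.DEFAULT for kind in BufferKind}


def choose_strategy(arch: str, kind: BufferKind) -> AllocStrategy:
    """Resolve caller intent to an allocation mechanism for one GPU target."""
    return PLATFORM_TABLE.get(arch, FALLBACK)[kind]
```

Call sites only ever state intent, e.g. `choose_strategy("gfx1150", BufferKind.SCRATCH)`; tuning a new APU means adding a table row, not touching every allocation.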
The lesson: this API compresses two orthogonal axes, zero-copy mapping and device-cache participation, into one flag. Full write-up: deancalver.substack.com/p/ze… 10/10
