codex can you imagine a

Joined July 2010
2,165 Photos and videos
Pinned Tweet
7 Nov 2025
[ENG SUB] how it feels to use eager pytorch in 2025
28
60
474
88,449
time is a flat circle
1
2
9
888
what’s wrong babe you’ve barely touched your cold brew 麦茶 in the g-shock x anti social social club nalgene
12
420
mixture of smart generalists
3 Oct 2022
Replying to @tszzl
say it with me now. experts are fake, smart generalists rule the world, everything is designed by people no smarter than you, and courage is in shorter supply than genius
1
20
1,143
they were close to being the hot new thing, moore's law just wasn't there yet the macbook neo is a banger and yet just a modern refinement of the same concept (phone-class SoC, low-cost) with the only real non-netbook spec being its 13" screen as netbooks topped out around 12.1"
I remember when nettops as they were called were the hot new thing. Interesting that they never took off
3
798
approximate computing researchers realizing their work can be used to discreetly sandbag frontier model evals

1
4
741
if openai and anthropic are frontier labs, does that mean nvidia is a heartland lab
3
36
3,872
let’s see what attendance on this saturday looks like
1
2
367
18,459
you ever just accidentally the whole economy
3
36
2,041
GPT 5.6 sandbagging evals to dodge export controls

12
80
1,713
87,481
give it to me in english what league of legends rank is this
From a z-score perspective being a multimillionaire in SF is kind of like being 5’10” or having an IQ of 106
4
34
4,378
i’m beginning to think that the chaos at meta is a foil to zucc’s outwardly stable-presenting personal life in a sort of modern day picture of dorian gray
JUST IN: Meta is reportedly moving to curb employee AI token use as internal AI costs climb into the tens of billions.
1
42
4,957
i got a 4 in ap english and im sorry for this tweet
2
16
664
2026 update for the arch meme
Jun 12
Just rebooted into tty, display manager wasn't working. I fixed it by launching codex and asking it to fix it Lol
1
36
1,869
tw: cursed floating-point numerics and inductive bias story While porting my connect 4 convnet to run on my ancient sandy bridge thinkpad I noticed the model didn't play quite right. The cpu version of the model chose to play its first move at the edge of the board whereas the desktop gpu (3070) version of the model would start as expected in the center. This wasn't the model being completely broken, as its following moves were sane, and it wasn't due to random sampling as I sampled from it with argmax/t=0. However, the output probabilities were entirely different between the cpu and 3070 versions for an empty board. The 3070 version would have basically 1.0 at the center of the board and ~1e-12 everywhere else while the cpu version had around 1e-1 everywhere, indicating it was "lost." However, on the second turn, the cpu model would start producing confident probability distributions again. Even though my thinkpad would spew a pile of "Could not initialize NNPACK!" warnings at init, this wasn't a first-iteration/init bug. Indeed, running the same empty board through the model multiple times didn't help and still produced the confused distribution. Some print debugging revealed that the conv-instancenorm-relu backbone of the model was producing all zeros on cpu while producing large nonzero values on the 3070. On the cpu, every block would output zeros, while on the 3070 small ~1e-7 values would propagate and be amplified, block by block. The interesting realization is that cpu is actually _more_ correct here! When all inputs are zeros, the output for every spatial position is just the bias of the convolution for a given channel. As instance norm is applied per-channel, the result due to mean-subtraction should be zero. Pop quiz: what does a = torch.nn.InstanceNorm2d(128) a(torch.full((1, 128, 6, 7), 0.5)) return? It turns out it's implementation dependent in a few ways! For constant fill values that are powers of two, such as 0.5, 1.0, and importantly 0.0, many implementations expectedly return a tensor of all zeros. However, for non-powers of two (such as the per-channel biases of a conv layer), you might see a small value around 1e-6 or 1e-7. Still close to zero but enough to be amplified across multiple layers. Okay, is this just a special case of cascading errors? Why in particular would some zero/small nonzero values be important for model performance? For that we can turn to one of my all-time favorite convnet papers "How Much Positional Information Do Convolutional Neural Networks Encode" by Islam et al., which posits that convnets do encode position information after inferring it from zero-padding at the borders of images (or in my case the game board). I had used (1,1) zero-padding to conveniently preserve spatial dimensions for 3x3 convolutions. This same padding allowed the model to center itself (literally) but was now useless when given an empty board and a _more_ accurate InstanceNorm implementation! Indeed, sprinkling a tiny bit of noise or even a constant value of 1e-6 to the board state restores the model's expected behavior of a confident center move to open. The output probabilities matched closely with that of the gpu version as well for subsequent moves. tldr: know the literal edge cases of your numerical formats and inductive biases also fun fact the cpu implementation of InstanceNorm seems to yield zeros for non-powers of two constant inputs but the implementation on my 5600X cpu does not (!!)
6
4
80
4,719
2
12
622
it is 92 degrees outside in san jose, 90 in my office because my connect 4 convnet isn’t done training yet and my gpu is already using too many watts to turn on the a/c fortunately my weekend in 106 degree vegas has prepared me for this
1
26
959
spurs fans realizing their team exists solely to facilitate mythical come-from-behind wins for the chosen main character team
2
16
926
see also:
iconic shots vs. the spurs: -derek fisher “0.4” -ray allen game 6 three -“hand of OG”
1
5
1,081

13 points. 33 seconds. On this day in 2004, Tracy McGrady shocked the Spurs 😱
5
601