へっぽこPG

Tweets

へっぽこPG @heppoko_pg

Jun 16

Has GPT-5.5 gotten a bit worse...? If this is a side effect of preparing the 5.6 release, then please hurry up and ship it.

へっぽこPG @heppoko_pg

Jun 16

I'm the type who can never bring myself to use an Elixir, so my Codex reset will probably expire before I ever use it.

へっぽこPG @heppoko_pg

Jun 15

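I share this view. Ever since the Opus days, it has felt like Anthropic was further ahead on the scaling laws. They presumably couldn't start training an even larger model until they had seen GPT-5.5 succeed, so I think a Fable-class model is still to come.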

Anshu

@anshuc

Jun 14

Prediction: GPT-5.6 will not be better than Fable. It'll be different; it'll look competitive on some benchmarks, be very strong at agentic work, but lack big model smell or aesthetic taste compared to Fable.

Why?

- Greg Brockman referred to Spud (GPT-5.5) as a new pre-trained base model, the culmination of 2 years of research
- 5.5 was a big step up, but also clearly much more efficient to serve than Mythos, generates tokens faster, thinks more, falls for LLM trick questions more often (e.g. Simple-Bench)
- 5.5 scores high on benchmarks, but lags Opus 4.8 on aesthetics, let alone Fable

That points to it being a smaller model with a lot of RL on top. Good benchmark performance, good long-horizon task performance, but not so good at taste, aesthetics, riddles that require a strong internal world representation.

I doubt 5.6 will be a new larger base model. There's no way they've had time to train one, unless they did so in parallel with 5.5, and I don't think they had the compute to do that. Rather, they're likely keeping this fast monthly new model cadence by scaling post-training on the same base.

This means 5.6 will make 5.5 better at the things it's already good at. It won't magically instill taste, which seems to be emergent from parameter size, not post-training.

For OpenAI to have a true Mythos competitor they'll need to invest a lot of compute into scaling pre-training again. This is "easy" for Anthropic to do because they serve a much smaller user base with fewer products. Much harder tradeoff for OpenAI, and likely why we're seeing them kill off directions like Sora to consolidate compute.

Unfortunately, I think this means we won't see Mythos-class models from competitors for a while. Perhaps the 10T Grok model.

I think this is the start of a significant divergence in trajectories and strategies for labs. Curious if anyone disagrees or knows something I don't?