Niklas Sheth

Tweets

Niklas Sheth @niklassheth

Jun 8

Most of the Siri AI features seem like stuff ChatGPT could do 3 years ago? I was hoping for prompt based fuzzy search over my data at least, but this seems utterly useless

Niklas Sheth @niklassheth

Jun 9

Ok, so it can do this but they didn't put it on the website? Very odd

Niklas Sheth @niklassheth

Jun 4

Opus 4.8 is good, I'm liking it a lot more than 4.7. I've been tempted to cancel Claude, but Opus always finds a few improvements in my Codex projects. It has a sense of "the big picture" that Codex doesn't.

Niklas Sheth @niklassheth

May 13

I tried this again with GPT-5.5-Medium. First thing it did was patch the game to give itself infinite lives. After retrying with instructions to not cheat, it got to round 20. It's much better at using the UI, but still picks mediocre spots for towers.

Niklas Sheth @niklassheth

27 Jul 2025

ChatGPT can play Bloons Tower Defense... but it sucks. You need to tell it to not place the towers on the road. New AGI benchmark?

Niklas Sheth @niklassheth

May 13

Seems like a more than fair solution. It's hard to complain given the huge value of the plans.

ClaudeDevs @ClaudeDevs

May 13

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

Niklas Sheth retweeted

roon @tszzl

May 13

one take that always ages really badly is that the models are intelligent enough and nobody can really notice the improvements anymore

Niklas Sheth @niklassheth

May 12

Pretty good evidence that scores could be even higher with more thinking budget

Kilian Lieret @KLieret

May 12

The first ProgramBench task was just solved by GPT 5.5 high/xhigh. Interestingly, high/xhigh picked two different languages for the task (C vs Python). GPT 5.5 xhigh was significantly better than Opus 4.7 xhigh in all metrics. 🧵

Niklas Sheth @niklassheth

May 6

Are they using a Qwen finetune for the AI overview?

Niklas Sheth @niklassheth

May 5

GPQA had an incredible run

Epoch AI @EpochAIResearch

May 5

The recipe for “classic” reasoning benchmarks is simple: text-only, several-hour time horizons, easy to grade, with expert human baselines. What next? In this week’s Gradient Update, @GregHBurnham argues it’s as easy as dropping one of these four ingredients.

Niklas Sheth @niklassheth

May 2

Really cool how you can walk away and come back to a graph that would've taken hours of work

Niklas Sheth @niklassheth

Apr 28

ChatGPT with a lot of LaTeX is insanely laggy on Safari

Niklas Sheth @niklassheth

Apr 27

I'm using GPT-5.5 to improve a search algorithm and it's constantly going down the wrong path, getting confused by overfit or contaminated results. I think we're far off from an autonomous researcher.

Niklas Sheth @niklassheth

Apr 26

Knowing where to look is still the most important part of research and LLMs struggle. GPT-5.5 did a total 180 on the best windshield wiper for my car after reading the forums. 1 great source > 100 mediocre ones.

Niklas Sheth @niklassheth

Apr 19

Cool mug with a squircle cut into a rounded square

Niklas Sheth @niklassheth

Apr 17

Since when did NYC ice cream trucks stop displaying prices? Should be illegal

Niklas Sheth @niklassheth

Apr 12

One month until DALLE-2 is shut down :(

Niklas Sheth @niklassheth

Apr 8

I suspect specialized model pretraining will move to purely synthetic data to avoid these embarrassing issues

JM @JohnnyMorlin

Apr 7

Apple Foundation model said that it is GPT-3.5 Turbo.

Niklas Sheth @niklassheth

Apr 4

I rigged up a low budget OpenClaw with the Claude Code Discord integration, works almost as well and stays within the subscription's terms

Boris Cherny @bcherny

Apr 3

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.

Niklas Sheth @niklassheth

Apr 1

This model is hilarious

Liquid AI @liquidai

Mar 31

Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵

Niklas Sheth @niklassheth

Mar 31

The Claude app and Claude Desktop are so janky that I switched back to the CLI. I appreciate how fast they’re shipping but it’s not a good look