UCSD CogSci PhD student. 2022 @NSF Fellow. Computation cognition. @utulsa@LIBR_Tulsa@UCSanDiego

Joined September 2021
Photos and videos
Pinned Tweet
Thrilled to be an NSF Fellow and pursue my PhD in Cognitive Science at @UCSanDiego! Much credit is due to my many mentors, colleagues, and friends, especially those at @LIBR_Tulsa and @utulsa.
Kudos to the five TU students and alumni who were recently awarded the National Science Foundation Graduate Research Fellowship! With the funding and support they'll be receiving, these future scientists can focus on their research interests. utulsa.edu/nsf-grfp-fellowsh…
9
Samuel Taylor retweeted
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!
24
65
545
178,026
Samuel Taylor retweeted
The surge in AI-written speeches in Britain's House of Commons visualised:
10 Sep 2025
Tom Tugendhat calls out the rise of MPs using ChatGPT to write speeches 👏🏻👏🏻 “This place has become absurd”
84
1,259
10,949
1,301,727
Samuel Taylor retweeted
🚨The UK AISI identified four methodological flaws in AI "scheming" studies (deceptive alignment) conducted by Anthropic, MTER, Apollo Research, and others: "We call researchers studying AI 'scheming' to minimise their reliance on anecdotes, design research with appropriate control conditions, articulate theories more clearly, and avoid unwarranted mentalistic language." 1/4
13
62
280
130,380
Samuel Taylor retweeted
New paper accepted at Findings of ACL! TL;DR: While language models generally predict sentences describing possible events to have a higher probability than impossible (animacy-violating) ones, this is not robust for generally unlikely events is impacted by semantic relatedness
1
2
9
413
Samuel Taylor retweeted
New preprint: we evaluated LLMs in a 3-party Turing test (participants speak to a human & AI simultaneously and decide which is which). GPT-4.5 (when prompted to adopt a humanlike persona) was judged to be the human 73% of the time, suggesting it passes the Turing test (🧵)
46
197
1,260
279,336
Samuel Taylor retweeted
31 Mar 2025
they tested sota LLMs on 2025 US Math Olympiad hours after the problems were released Tested on 6 problems and spoiler alert! They all suck -> 5%
110
327
4,038
1,206,670
Samuel Taylor retweeted
Can LLMs actually solve hard math problems? Given the strong performance at AIME, we now go to the next tier: our MathArena team has conducted a detailed evaluation using the recent 2025 USA Math Olympiad. The results are… bad: all models scored less than 5%!
18
82
482
95,600
Samuel Taylor retweeted
✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.
2
11
86
19,516
Samuel Taylor retweeted
24 Feb 2025
Only $0.08 to show the files in my folders! Checkmate programmers
115
162
5,213
263,436
Samuel Taylor retweeted
You can create a cool gooey effect by combining a blur and fade animations between icons with a high-contrast parent element
65
233
5,128
375,757
Samuel Taylor retweeted
Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. 
This is *emergent misalignment* & we cannot fully explain it 🧵
427
938
6,756
1,926,245
I feel sorry for these people. Reading was never about grinding through self-help books, it's about being lifted out of yourself by a story, living through the eyes of another and finding we're not alone in our struggles. What a shameful thing to deny yourself that joy.
Reading books is now a waste of time. AI reasoning models can distill key insights and tell you exactly how to implement them based on everything they know about you.
162
6,071
30,789
745,597
Samuel Taylor retweeted
21 Feb 2025
Thank you NIH funded basic science
20 Feb 2025
A two-and-a-half-year-old girl shows no signs of a rare genetic disorder, after becoming the first person to be treated for the motor-neuron condition while in the womb. go.nature.com/4i9BpEx
7
88
413
25,413
Samuel Taylor retweeted
Answer: 0/100. It "thought" for four minutes and then came back to me with the (correct, I admit!) answers to five unrelated 3-digit sums and no downloadable file.
How do you expect that the OpenAI Deep Research agent will perform on these 100 4-digit addition problems?
8
29
449
39,247
Samuel Taylor retweeted
We've relaunched @turingtestlive with a 3-party format where you speak to a human and an LLM at the same time. See if you can tell the difference between a human and an AI here: turingtest.live
7
14
38
17,094
Samuel Taylor retweeted
We’ve found as AIs get smarter, they develop their own coherent value systems. For example they value lives in Pakistan > India > China > US These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵
701
1,993
10,727
6,196,048
Samuel Taylor retweeted
Their result does NOT replicate on SmolLM2. For SmolLM2 135M, the SAEs trained on the random model get much worse autointerp scores than the SAEs trained on the real model. Below are results on a subset of latents, with 95% CIs. The reconstruction error is also much worse.
Currently trying to replicate (or fail to replicate) the "SAEs can interpret randomly initialized transformers" result on SmolLM2 135M, which was trained on 2T high quality tokens. Their paper used Pythia Fraction of variance unexplained is much higher for random than trained
5
3
81
8,195
Samuel Taylor retweeted
If you thought software was bad today, buckle up, because it's about to get a whole lot worse.
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
114
336
4,845
294,663
Samuel Taylor retweeted
How effective are LLMs are persuading and deceiving people? In a new preprint we review different theoretical risks of LLM persuasion; empirical work measuring how persuasive LLMs currently are; and proposals to mitigate these risks. 🧵 arxiv.org/abs/2412.17128
1
9
25
1,634
Samuel Taylor retweeted
26 Dec 2024
I think people are overindexing on the @OpenAI o3 ARC-AGI results. There’s a long history in AI of people holding up a benchmark as requiring superintelligence, the benchmark being beaten, and people being underwhelmed with the model that beat it.
82
99
1,719
148,349