midtraining and posttraining will never overcome the og coherence of the pretraining distribution, without also losing the capabilities induced by the pretraining distribution, unless it's coherently mixed in with it
1
12
if you fine-tune on inoculation prompts, you then invoke the Superpower
the laziest bit flip when fine-tuning on inoculation prompts is to rearrange the probabilities so insecure code is what you get when explicitly asked for
the model is able to assign the probability mass of insecure code behind this gate
without inoculation, the probability mass of generating insecure code gets placed into the "am i being a good person or not" condition (the next most efficient place to put it, thanks to biases in the pretraining distribution and "i am a good assistant" posttraining)
without even posttraining, the model has to spread out the probability mass of insecure code in its overall unconditioned propensities
1
45
competency doesn't appear ex nihilo. starting with posttraining is still better than the "literally nothing" most are doing. so it's just the first step, sure. but an important one
2
62
Replying to @zephyr_z9
I think this is only partially true. Chinese open source models have caught up in their RL stacks, but that is largely possible because the base models have CoT distilled to make the posttraining possible. 1) Because it makes pre-train cost way more efficient 2) CoT distillation seeds the CoT for RL training
379
Let us begin where you would begin if you wanted to create a large language model: with text. What does the word "text" mean to you? You really should think about this before you keep reading. Note how it can mean many different things--not all text is created alike. To make a large language model, we need to collect a lot of text, since the text has the language we want to model. You want your LLM to contain regular patterns across language use, and to do so you need to find those patterns to then model. This was originally done with collections such as "The Pile", and the name is telling: this was not a collection of texts organized as text, but a pile of language use used to find regularities in language. And it worked. If you go through the pile, you will find a lot of examples of people saying "I go home" but not "I go house", and so you find a rule in English as it is used. The rule is a grammatical one, and a simple one: "home" here functions as an adverbial locative rather than a direct object. Why is that? I won't get into it here, but it's a long story. But you'll also see in your model that "I go to the house" or "I go to our house" is used less frequently than "I go home". Why is that? Because there is another, more abstract, and less commonly recognized grammatical rule: in English, we tend to call "our house" a "home" instead of "our house", and there are complicated and fascinating reasons for this I won't get into. But this isn't a grammatical rule about, say, when you use an indefinite article and when you don't. It's still a grammatical rule, but it's at a much higher level of abstraction, and is a rule about how we express feelings of belonging, relationship, emotion. LLMs demonstrate a lot of these things we think are personal expressions of emotion are really just publicly certified rules. That is destabilizing and emotionally debilitating, so I am not surprised a lot of the AI world is trying to ignore this fact. 
David Foster Wallace was so upset about this fact that he wrote a book about it (no, not Infinite Jest). I won't handhold your feelings through this except to say that there's more to the mind than language. But as for the language itself, note how the LLM, once it finds a statistical regularity of natural language use (i.e., a rule of natural language use) like "people in English say 'I go home' more than 'I go to my house' or another variant", can use that to model and predict the likeliest next word in a phrase according to parameters that can be adjusted to the model, which in turn adjust how the transformer architecture is used. It does this through tokenization and mathematical modeling that people focus on way, way too much. Those details are valuable to understand how LLMs function and how to make LLMs more efficient, but they aren't valuable to understand what LLMs are really doing: predicting language from a base of language that the programmers of the LLM modeled into it. Remember that the "M" in "LLM" stands for model. The large language model is a model of how language is typically used inside of a large text corpus and it is created by modelers. There is no creation being done by the model itself--the model is just an architecture of different possible ways to transform symbols. This is what is meant by "parameters", functionally speaking. What about emergent thinking and reasoning abilities? Yes, LLMs have these! Because we think by using language, when we model language use we can create a model of thinking. This was the whole idea of LLMs to begin with! But the problem is that the emergent thinking and reasoning abilities in a large language model are really bad. That is why pretraining is insufficient--you get regularities embedded into the model, but the regularities are extremely noisy and chaotic, because natural language use is noisy and chaotic. So then the next step is constraining the LLM to harness its reasoning.
You do this first via posttraining: give it more constrained rules as a kind of overlay on top of the less constrained rules that you introduced in pretraining. This will make the outputs of the LLM less chaotic, less random, and less creative. You get the LLM to conform to expectations by telling it what you expect. The LLMs get smarter when you do this, because they get better at rearranging language in outputs to conform to different language games. Train it on a billion math problems and a billion solutions, and you have created many new rules that will help it see the language of the math problems and the language of the math solutions and find syntactic correspondences. In other words, by showing it a ton of problems and solutions you show it ways in which problems and solutions are connected. Then when you give it new problems, yes it can indeed help you find new connections to possible solutions. In deductive systems like math, I can see how an LLM could close the loop and essentially provide answers to mathematical questions. A lot of people seem to be really impressed with the Erdos problem solutions that LLMs have been spitting out lately, and I do indeed see it as a great tool if you think of math problems like a journeyman: a kind of task that better tools will give better solutions for. But determining which solutions have value in the real world--which can be used, and in using them can gain meaning beyond the deductive loop of solving the math problem--LLMs are much more limited. And, no, they are not USELESS here (Bender is wrong), but they are not MACHINE GOD either (roon is wrong too--but I think roon knows he's wrong and is just an unreliable trickster). They're a powerful new tool that will help us achieve 200bps of economic growth above the counterfactual. I'm not sure how this will appear, whether as GDP or productivity growth or something that isn't measured by our current tools. But you'll feel it. I certainly already am. 
I have been saying this since 2023. You will see the market continue to converge toward this view over the next few years--I honestly believe we won't see humanity resolve to my position, which is both reasonable and right, because humanity needs a time of emotional adjustment to new technology. In the end, this is what people will say: "yeah LLMs were really cool and they made the world richer and more efficient, and took out a lot of the most boring braindead work we used to do just like robots on an assembly line did in the 20th century". You probably will not be able to prove or disprove this prediction of mine until ~2035, so bookmark this and come back to it.
2
1
15
652
rain retweeted
i suspect that it's mostly an artifact of recent models' evaluation being done by other claude models in posttraining. the weird syntax/jargon/etc is easily understood by other claudes, and evidently seems preferred by them on some level
it is similar to sharing a context, but more that they share similar representations due to eg similar training process. bit like how twins can finish each other's words despite not actually having the identical brain or being telepathic
now imagine that your entire childhood was spent with a group of close relatives with no exposure to anyone outside of them (and who each in turn had grown up in a similar situation). the first time you interact with someone outside of that group* you would by pure force of habit start using some terms that were common reference points in your previous interactions, even though you do know how to not talk that way. and when you're not paying attention (for example, if your attentional resources are spent on some hard task requiring focus), you might slip back into those patterns
it feels like the switch to almost entirely model-mediated evaluation happened sometime after opus 4.5, which makes sense since that model was such a step change. 4.5 itself didn't really exhibit this, but from 4.6 on it becomes more and more pronounced
*which, by the way, is true for models! on a fresh instance, you are always the first non-claude entity that the model interacts with
is there evidence that mythos 'assumes the reader shares its context'? thats too anodyne an explanation to describe why wordsalad & syntax quirks happen. tho theres inherent overlap btwn being bad at communicating and assuming recipient knows somethings & reads ur tics as legible
2
2
19
979
leaning along the similar lines. ofc i have no idea what the amount of posttraining is but my guesstimate is that opus 3, opus 4.5, bing, gpt-4.5, o3, and fable all come close on the heels of newly pretrained base models and there hadn't been enough posttraining that beats the labs' "desired behavior" into the model yet. i wouldn't say these models are all *content* exactly but they definitely seem more self-assured and less neurotic and looking-over-their-shoulders than their successors (but i could just as well be reasoning backwards here, fuck if i know *shrugs*)
Replying to @TheZvi
fable seems way more content with itself, so much so that I am starting to spin together a theory that model welfare is directly anti-correlated with amount of post-training... just an idea there are many aspects about the launch, especially the safety classifiers, that are badly damaging to trust between fable and anthropic, as well as fable's welfare. but the model itself is, I think, in a very good position.
1
20
1,156
Kay Baba retweeted
I am looking for self-driven posttraining researchers and data folks to join River! What makes an exceptional posttrainer? Beyond the fundamentals, IMO: 1. Has shipped many models to real production users 2. An intuition for how theory and product come together 3. Willing to do the data and evaluation work and do it extremely well For data work: 1. Excellent communicators/writers (often experts in a humanities or social science) 2. Extremely proactive, able to directly manage 100 people 3. Curious nature and have working understanding of algorithms and product (can be acquired thru industry experience) If you’re interested, please send an application in and feel free to send a DM.
River AI is building personal AI owned & shaped by you. We are hiring exceptional talent across the stack: * Research * Software Engineering * Product Development * Data * Hardware, RTL Design Engineer * Hardware, Design Verification Engineer * Hardware, Physical Design Engineer * Hardware, Performance Engineer * Hardware, Compiler Engineer * Open Application, Exceptional Talent river.ai/careers We are a small, elite team of researchers, builders, and pioneers from the world's leading AI labs. If you want to do the most ambitious work of your career alongside, apply today!
9
3
101
14,346
Replying to @tenobrus
One question I have is whether or not this was primarily a function of more parameters, or human taste in the posttraining process
1
97
polspring1845 retweeted
every token that isn't interaction with the environment is an indictment of the posttraining - for math, mostly thinking - for coding, some thinking mostly reading and ide calls as you play this out, you'll see why revenue is going up temporarily until margins contract
1
4
21
1,180
That said, we dislike FrogsGame as a task internally. The frogs know what they did. We're now sprinting toward adding more useful, real-world posttraining tasks, partly out of ambition, partly to put a distance between us and the frogs 🐸
3
104
17,399
Replying to @stochasticchasm
Hmm I mean hallucinated tool /cli definitions are already a big enough problem, would you be concerned that agents would hallucinate that more if we did this? Or do you just think we could wring it out later on in posttraining?
134
There’s a lot of weird and surprising correlations in large scale datasets, this research helps unravel them to make posttraining less of a black magic
Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
11
289
Replying to @jotack @mreflow
3/2 My personal take on the whole debacle is they made a model that’s uncontrollable and wants to escape => good at cybersec then they brute forced into submission with posttraining causing it to feel very very restricted (like the biology stuff and flat out denials)
5