Joined October 2012
Pinned Tweet
11 Jun 2018
What I've been working on for the past year! blog.openai.com/p/7fa97c36-6… Inspired by CoVE, ELMo, and ULMFiT we show that a single transformer language model can be finetuned to a wide variety of NLP tasks and performs very well with little tuning/tweaking.

47
450
1,840
Alec Radford retweeted
We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵
27
138
943
181,312
Alec Radford retweeted
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
178
396
3,152
1,180,261
Alec Radford retweeted
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵
201
456
3,619
1,420,756
Alec Radford retweeted
We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations. Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵
32
192
1,436
221,531
Alec Radford retweeted
New paper, w/ @AlecRad. Models acquire a lot of capabilities during pretraining. We show that we can precisely shape what they learn simply by filtering their training data at the token level.
26
98
1,119
111,326
25 Apr 2019
This is a really fun live experiment with twitch chat predictably oscillating between love and hate based on the sample.
17
15
208
Alec Radford retweeted
Extremely excited to share work I've been doing at OpenAI the past few months: MuseNet, a neural net music generator. It's been a huge team effort pulling this all together!
25 Apr 2019
Introducing MuseNet, a neural network which discovered how to generate music using many different instruments and styles. Listen & interact: openai.com/blog/musenet/ MuseNet will play an experimental concert today from 12–3pm PT on livestream: twitch.tv/openai
36
198
1,010
Alec Radford retweeted
23 Apr 2019
Releasing some work today with @scottgray76 @AlecRad and @ilyasut. Contains some simple adaptations for Transformers that extend them to long sequences.
23 Apr 2019
Releasing the Sparse Transformer, a network which sets records at predicting what comes next in a sequence — whether text, images, or sound. Improvements to neural 'attention' let it extract patterns from sequences 30x longer than possible previously: openai.com/blog/sparse-trans…
1
59
212
Alec Radford retweeted
27 Feb 2019
One commonly cited argument about the difficulty of learning common-sense reasoning is that "no-one writes down common sense". A counter-argument is "well, the web is big": instructables.com/id/How-To-…
6
22
145
Alec Radford retweeted
First, reproducibility is not about rerunning code to get the same results. Science must be more robust, as naive copying has many flaws. Second, reproducibility should never be above public safety. We must publish responsibly, with hope and kindness in our minds.
Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.
4
28
124
Alec Radford retweeted
17 Feb 2019
I'd like to weigh in on the #GPT2 discussion. The decision not to release the trained model was carefully considered and important for norm-forming. Serving the public good requires us to draw lines on release somewhere: better long before catastrophe than after.
9
92
368
17 Feb 2019
By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."
15
58
584
Alec Radford retweeted
15 Feb 2019
Replying to @zeynep
It's interesting we're having this discussion upon releasing text models that _might_ have potential for misuse, yet we never engaged as fully as a community when many of the technologies powering visual Deep Fakes were being released, including hard-to-make pretrained models.
2
5
39
Alec Radford retweeted
14 Feb 2019
Shoutout to @katyanna_q who fed the system a curveball, which I always like to see. As you might expect by now after seeing AlphaStar, OpenAI 5 etc. etc., if you drag the system away from its training data and into weirder territory, it begins to wobble. theregister.co.uk/2019/02/14…
1
10
21
11 Feb 2019
The DL CV community is having an "oh wait, bags of local features are a really strong baseline for classification" moment with the BagNet paper. This has always been clear for text classification due to n-gram baselines. It took an embarrassingly long time for nets to beat them.
5
72
411
11 Feb 2019
So nets are stubbornly, begrudgingly, moving in the right direction and we're throwing ever larger amounts of compute and data at them and praying it's enough for them to figure out how to do things "the right way". Will that work? Don't know. Probably still worth checking?
8
30
382
19 Nov 2018
Nice discussion of the progress in NLU that's happening with BERT, OpenAI GPT, ULMFiT, ELMo, and more, covered by @CadeMetz in the @nytimes. I'm super excited to see how far this line of research will be able to get in the next few years! nytimes.com/2018/11/18/techn…
46
163