Alec Radford

Tweets

Pinned Tweet

Alec Radford

@AlecRad

11 Jun 2018

What I've been working on for the past year! blog.openai.com/p/7fa97c36-6… Inspired by CoVE, ELMo, and ULMFiT we show that a single transformer language model can be finetuned to a wide variety of NLP tasks and performs very well with little tuning/tweaking.

450

1,840

Alec Radford retweeted

Geoffrey Irving

@geoffreyirving

Jun 10

We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵

138

943

181,312

Alec Radford retweeted

Nick Levine

@status_effects

Apr 27

New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

7:59

178

396

3,152

1,180,261

Alec Radford retweeted

David Duvenaud

@DavidDuvenaud

Apr 27

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

7:58

201

456

3,619

1,420,756

Alec Radford retweeted

Grace Luo @graceluo_

Feb 9

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations. Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵

0:07

192

1,436

221,531

Alec Radford retweeted

Neil Rathi

@neil_rathi

Jan 30

New paper, w/ @AlecRad. Models acquire a lot of capabilities during pretraining. We show that we can precisely shape what they learn simply by filtering their training data at the token level.

1,119

111,326

Alec Radford

@AlecRad

25 Apr 2019

This is a really fun live experiment with twitch chat predictably oscillating between love and hate based on the sample.

208

Alec Radford retweeted

Christine McLeavey @mcleavey

25 Apr 2019

Extremely excited to share work I've been doing at OpenAI the past few months: MuseNet, a neural net music generator. It's been a huge team effort pulling this all together!

OpenAI

@OpenAI

25 Apr 2019

Introducing MuseNet, a neural network which discovered how to generate music using many different instruments and styles. Listen & interact: openai.com/blog/musenet/ MuseNet will play an experimental concert today from 12–3pm PT on livestream: twitch.tv/openai

0:41

198

1,010

Alec Radford retweeted

rewon @rewonfc

23 Apr 2019

Releasing some work today with @scottgray76 @AlecRad and @ilyasut. Contains some simple adaptations for Transformers that extend them to long sequences.

OpenAI

@OpenAI

23 Apr 2019

Releasing the Sparse Transformer, a network which sets records at predicting what comes next in a sequence — whether text, images, or sound. Improvements to neural 'attention' let it extract patterns from sequences 30x longer than possible previously: openai.com/blog/sparse-trans…

0:26

212

Alec Radford retweeted

Graham Neubig

@gneubig

27 Feb 2019

One commonly cited argument about the difficulty of learning common-sense reasoning is that "no-one writes down common sense". A counter-argument is "well, the web is big": instructables.com/id/How-To-…

145

Alec Radford retweeted

Nando de Freitas

@NandoDF

17 Feb 2019

First, reproducibility is not about rerunning code to get the same results. Science must be more robust, as naive copying has many flaws. Second, reproducibility should never be above public safety. We must publish responsibly, with hope and kindness in our minds.

Volodymyr Kuleshov 🇺🇦 @volokuleshov

16 Feb 2019

Replying to @NandoDF @ilyasut @icmlconf @iclr2019

Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.

124

Alec Radford retweeted

Joshua Achiam

@jachiam0

17 Feb 2019

I'd like to weigh in on the #GPT2 discussion. The decision not to release the trained model was carefully considered and important for norm-forming. Serving the public good requires us to draw lines on release somewhere: better long before catastrophe than after.

368

Alec Radford

@AlecRad

17 Feb 2019

By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."

584

Alec Radford retweeted

Smerity @Smerity

15 Feb 2019

Replying to @zeynep

It's interesting we're having this discussion upon releasing text models that _might_ have potential for misuse yet we never engaged as fully as a community when many of the technologies powering visual Deep Fakes were being released, including hard to make pretrained models.

Alec Radford retweeted

mike cook @mtrc

14 Feb 2019

Shoutout to @katyanna_q who fed the system a curveball, which I always like to see. As you might expect by now after seeing AlphaStar, OpenAI 5 etc. etc., if you drag the system away from its training data and into weirder territory, it begins to wobble. theregister.co.uk/2019/02/14…

Alec Radford

@AlecRad

11 Feb 2019

The DL CV community is having a "oh wait, bags of local features are a really strong baseline for classification" moment with the BagNet paper. This has always been clear for text classification due to n-gram baselines. It took an embarrassingly long time for nets to beat them.

411

Alec Radford

@AlecRad

11 Feb 2019

Also see some of his follow-up poking at this in a very different model with Section 3.3 of the PixelCNN++ paper arxiv.org/abs/1701.05517

PixelCNN++: Improving the PixelCNN with Discretized Logistic...

PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at...

arxiv.org

Alec Radford

@AlecRad

11 Feb 2019

So nets are stubbornly, begrudgingly, moving in the right direction and we're throwing ever larger amounts of compute and data at them and praying it's enough for them to figure out how to do things "the right way". Will that work? Don't know. Probably still worth checking?

382

Alec Radford

@AlecRad

19 Nov 2018

Nice discussion of the progress in NLU that's happening with BERT, OpenAI GPT, ULMFiT, ELMo, and more covered by @CadeMetz in the @nytimes. I'm super excited to see how far this line of research will be able to get in the next few years! nytimes.com/2018/11/18/techn…

Finally, a Machine That Can Finish Your Sentence (Published 2018)

Completing someone else’s thought is not an easy trick for A.I. But new systems are starting to crack the code of natural language.

nytimes.com

163