Anthropic

Tweets

Daniela Amodei retweeted

Anthropic

@AnthropicAI

11 Jul 2023

Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at claude.ai in the US and UK.

Claude 2's performance on the GRE, USMLE, and Multistate Bar Exam. Claude 2 on GRE: Verbal: 165; Analytical Writing: 5.0; Quant Reasoning: 154. Claude 2 on USMLE: Step 1 (5-shot) 68.9; Step 2: 63.3; Step 3: 67.2. Claude 2 on Multistate Bar Exam (5-shot): 76.5

247

493

2,295

861,532

Daniela Amodei retweeted

Anthropic

@AnthropicAI

14 Sep 2022

Neural networks often pack many unrelated concepts into a single neuron – a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. In our latest work, we build toy models where the origins of polysemanticity can be fully understood.

628

3,878

Daniela Amodei retweeted

Anthropic

@AnthropicAI

13 Jul 2022

In "Language Models (Mostly) Know What They Know", we show that language models can evaluate whether what they say is true, and predict ahead of time whether they'll be able to answer questions correctly. arxiv.org/abs/2207.05221

A 52B language model can evaluate the validity of its own proposed answers - separating the correct and incorrect responses - on questions from TriviaQA, Lambada, Arithmetic, GSM8k, and Codex HumanEval. We have weighted the overall contribution from each of these five datasets equally.

153

927

Daniela Amodei retweeted

Anthropic

@AnthropicAI

27 Jun 2022

Transformer MLP neurons are challenging to understand. We find that using a different activation function (Softmax Linear Units or SoLU) increases the fraction of neurons that appear to respond to understandable features without any performance penalty. transformer-circuits.pub/202…

Graph showing the fraction of neurons of which quickly suggest an interpretation in regular transformer language models vs SoLU across a range of scales. Baseline models are ~30% of neurons with an interpretation, while SoLU is ~60%.

384

Daniela Amodei retweeted

Anthropic

@AnthropicAI

24 May 2022

In a new paper, we show that repeating only a small fraction of the data used to train a language model (albeit many times) can damage performance significantly, and we observe a "double descent" phenomenon associated with this. arxiv.org/abs/2205.10487

Experimental Setup. From a large original text dataset (left), we draw 90% of our desired training dataset in a non-repeated fashion, and 10% as repeats of a tiny portion of the original dataset (right). We hold constant that 10% of total training tokens will come from repeats, but we vary the repeated fraction in our runs. In other words, the sample to be repeated might be very small, like 0.01% of the total training tokens repeated 1000x, or relatively large, like 1% of the total training tokens repeated 10x. A small, held-back portion of the original dataset (yellow in left figure), not including any repeated data, is used as a test set and is the test loss reported in all subsequent figures.

337

Daniela Amodei @DanielaAmodei

29 Apr 2022

Excited to announce our latest fundraising round! We’re genuinely honored to be entrusted with the resources to continue our work in frontier AI safety and research.

Anthropic

@AnthropicAI

29 Apr 2022

We’ve raised $580 million in a Series B. This will help us further develop our research to build usable, reliable AI systems. Find out more: anthropic.com/news/announcem…

104

Daniela Amodei @DanielaAmodei

29 Apr 2022

As well as steerability and robustness - arxiv.org/abs/2112.00861, reinforcement learning - arxiv.org/abs/2204.05862, societal impacts - arxiv.org/abs/2202.07785, and more!

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful,...

arxiv.org

Daniela Amodei @DanielaAmodei

29 Apr 2022

I’m looking forward to what’s to come. And we’re hiring! anthropic.com/#careers

Home \ Anthropic

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

anthropic.com

Daniela Amodei retweeted

Anthropic

@AnthropicAI

15 Apr 2022

Glad @QuantaMagazine highlights progress on induction heads/rigorous interpretability by @ch402, @catherineols, @nelhage and others @AnthropicAI. More to come! quantamagazine.org/researche…

Researchers Glimpse How AI Gets So Good at Language Processing | Quanta Magazine

Language processing programs are notoriously hard to interpret, but smaller versions can provide important insights into how they work.

quantamagazine.org

Daniela Amodei retweeted

Anthropic

@AnthropicAI

13 Apr 2022

We've trained a natural language assistant to be more helpful and harmless by using reinforcement learning with human feedback (RLHF). arxiv.org/abs/2204.05862

A graph showing the difference in performance between context distilled, static HH RLHF, Online HH RLHF, and Online Helpful RLHF models. Online Helpful RLHF models do best - close to the distribution of scores for professional writers.

268

Daniela Amodei retweeted

Anthropic

@AnthropicAI

9 Mar 2022

On the @FLIxrisk podcast, we discuss AI research, AI safety, and what it was like starting Anthropic during COVID. futureoflife.org/2022/03/04/…

Daniela Amodei retweeted

Anthropic

@AnthropicAI

8 Mar 2022

In our second interpretability paper, we revisit “induction heads”. In 2 layer transformers these pattern-completion heads form exactly when in-context learning abruptly improves. Are they responsible for most in-context learning in large transformers? transformer-circuits.pub/202…

306

Daniela Amodei retweeted

Anthropic

@AnthropicAI

17 Feb 2022

Our first societal impacts paper explores the technical traits of large generative models and the motivations and challenges people face in building and deploying them: arxiv.org/abs/2202.07785

Predictability and Surprise in Large Generative Models

Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we...

arxiv.org

150

Daniela Amodei retweeted

Anthropic

@AnthropicAI

22 Dec 2021

Our first interpretability paper explores a mathematical framework for trying to reverse engineer transformer language models: A Mathematical Framework for Transformer Circuits: transformer-circuits.pub/202…

116

614

Daniela Amodei retweeted

Anthropic

@AnthropicAI

3 Dec 2021

Our first AI alignment paper, focused on simple baselines and investigations: A General Language Assistant as a Laboratory for Alignment arxiv.org/abs/2112.00861

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful,...

arxiv.org

325

Daniela Amodei @DanielaAmodei

28 May 2021

Excited to announce what we’ve been working on this year - @AnthropicAI, an AI safety and research company. If you’d like to help us combine safety research with scaling ML models while thinking about societal impacts, check out our careers page anthropic.com/#careers

205

Daniela Amodei @DanielaAmodei

28 May 2021

We’re going to be focused on pushing forward our research for the next few months and are hoping to have more to share later this year. Thrilled to be working with so many talented colleagues!

Daniela Amodei retweeted

Anthropic

@AnthropicAI

28 May 2021

Hello world! You can read our launch announcement here: anthropic.com/news/announcem…

296