Early Statistical & Neural LMs (pre-2018)
n-gram models (various, 1990s–2000s) — Kneser-Ney, Good-Turing smoothed models
Word2Vec (Google, 2013)
GloVe (Stanford, 2014)
FastText (Facebook, 2016)
ELMo (Allen AI, 2018)
ULMFiT (
fast.ai, 2018)
The Transformer Era Begins (2018–2020)
GPT-1 (OpenAI, 2018)
BERT (Google, 2018)
Transformer-XL (Google/CMU, 2019)
XLNet (Google/CMU, 2019)
GPT-2 (OpenAI, 2019) — small, medium, large, XL
RoBERTa (Facebook, 2019)
ALBERT (Google, 2019)
DistilBERT (Hugging Face, 2019)
T5 (Google, 2019) — small through 11B
CTRL (Salesforce, 2019)
Megatron-LM (NVIDIA, 2019)
ERNIE (Baidu, 2019) — ERNIE 1.0, 2.0, 3.0
Grover (Allen AI, 2019)
XLM / XLM-R (Facebook, 2019)
ELECTRA (Google, 2020)
GPT-3 (OpenAI, 2020) — Ada, Babbage, Curie, Davinci
GShard (Google, 2020)
mBART (Facebook, 2020)
DeBERTa (Microsoft, 2020)
Reformer (Google, 2020)
Longformer (Allen AI, 2020)
BigBird (Google, 2020)
Turing-NLG (Microsoft, 2020)
BlenderBot 1.0 (Facebook, 2020)
Scaling Up (2021)
Switch Transformer (Google, 2021) — 1.6T parameters (sparse)
GPT-Neo (EleutherAI, 2021) — 1.3B, 2.7B
GPT-J-6B (EleutherAI, 2021)
Codex (OpenAI, 2021)
Jurassic-1 (AI21 Labs, 2021)
DALL·E (OpenAI, 2021) — multimodal but LM-based
Wu Dao 2.0 (BAAI, 2021) — 1.75T parameters
PanGu-α (Huawei, 2021)
HyperCLOVA (Naver, 2021)
LaMDA (Google, 2021)
Gopher (DeepMind, 2021)
FLAN (Google, 2021)
GLaM (Google, 2021)
Megatron-Turing NLG (Microsoft/NVIDIA, 2021) — 530B
WebGPT (OpenAI, 2021)
Yuan 1.0 (Inspur, 2021)
BlenderBot 2.0 (Meta, 2021)
Cohere models (Cohere, 2021)
The Explosion (2022)
GPT-NeoX-20B (EleutherAI, 2022)
InstructGPT (OpenAI, 2022)
Chinchilla (DeepMind, 2022)
PaLM (Google, 2022) — 540B
OPT (Meta, 2022) — 125M to 175B
BLOOM (BigScience/HuggingFace, 2022) — 176B
Alexa TM (Amazon, 2022) — 20B
UL2 (Google, 2022)
Galactica (Meta, 2022)
ChatGPT / GPT-3.5 (OpenAI, Nov 2022)
Flan-T5 / Flan-PaLM (Google, 2022)
Sparrow (DeepMind, 2022)
Minerva (Google, 2022)
AlphaCode (DeepMind, 2022)
Jurassic-2 (AI21, 2022)
Luminous (Aleph Alpha, 2022)
text-davinci-003 (OpenAI, 2022)
GLM-130B (Tsinghua, 2022)
Tk-Instruct (Allen AI, 2022)
Whisper (OpenAI, 2022) — speech, but transformer-based
NLLB (Meta, 2022) — translation
mT0 / BLOOMZ (BigScience, 2022)
Stable Diffusion (Stability AI, 2022) — diffusion, but includes text encoder LMs
YaLM (Yandex, 2022) — 100B
2023 — The ChatGPT Aftermath
LLaMA 1 (Meta, Feb 2023) — 7B, 13B, 33B, 65B
GPT-4 (OpenAI, Mar 2023)
Alpaca (Stanford, 2023) — fine-tuned LLaMA
Vicuna (LMSYS, 2023)
Claude 1 (Anthropic, Mar 2023)
Claude 2 (Anthropic, Jul 2023)
Claude Instant (Anthropic, 2023)
PaLM 2 (Google, May 2023)
Bard (Google, 2023) — powered by PaLM 2
Gemini 1.0 (Google DeepMind, Dec 2023) — Ultra, Pro, Nano
LLaMA 2 (Meta, Jul 2023) — 7B, 13B, 70B Chat variants
Code Llama (Meta, 2023)
Mistral 7B (Mistral AI, Sep 2023)
Mixtral 8x7B (Mistral AI, Dec 2023) — MoE
Falcon (TII, 2023) — 7B, 40B, 180B
MPT (MosaicML, 2023) — 7B, 30B
StableLM (Stability AI, 2023)
Dolly (Databricks, 2023)
RedPajama (Together AI, 2023)
Phi-1 / Phi-1.5 (Microsoft, 2023)
Orca / Orca 2 (Microsoft, 2023)
Qwen 1 / Qwen 1.5 (Alibaba, 2023)
Yi (
01.AI, 2023) — 6B, 34B
Baichuan (Baichuan Inc, 2023) — 7B, 13B
ChatGLM / ChatGLM2 / ChatGLM3 (Zhipu AI, 2023)
InternLM (Shanghai AI Lab, 2023)
Xwin-LM (Xwin, 2023)
DeepSeek (DeepSeek, 2023) — Coder, LLM, Math
Zephyr (HuggingFace, 2023)
OpenHermes (Teknium, 2023)
Neural Chat (Intel, 2023)
Starling (Berkeley, 2023)
Puma / Platypus (various, 2023)
WizardLM / WizardCoder / WizardMath (Microsoft, 2023)
Gorilla (Berkeley, 2023)
SOLAR (Upstage, 2023)
Persimmon-8B (Adept, 2023)
Inflection-1 / Pi (Inflection AI, 2023)
Command / Command R (Cohere, 2023)
Jamba (AI21 Labs, 2023)
Nous Hermes (NousResearch, 2023)
Fuyu-8B (Adept, 2023)
BlenderBot 3 (Meta, 2023)
SeamlessM4T (Meta, 2023)
Segment Anything (Meta, 2023) — vision, but transformer-based
Med-PaLM 2 (Google, 2023)
Amazon Titan (Amazon, 2023)
Jurassic-2 Ultra (AI21, 2023)
Nemotron (NVIDIA, 2023)
2024 — Open Weight & Frontier Race
Claude 3 (Anthropic, Mar 2024) — Haiku, Sonnet, Opus
Claude 3.5 Sonnet (Anthropic, Jun 2024)
Claude 3.5 Haiku (Anthropic, Oct 2024)
GPT-4 Turbo (OpenAI, 2024)
GPT-4o (OpenAI, May 2024)
GPT-4o mini (OpenAI, Jul 2024)
o1 / o1-preview / o1-mini (OpenAI, Sep 2024)
Gemini 1.5 (Google, Feb 2024) — Pro, Flash
Gemini 2.0 Flash (Google, Dec 2024)
LLaMA 3 (Meta, Apr 2024) — 8B, 70B
LLaMA 3.1 (Meta, Jul 2024) — 8B, 70B, 405B
LLaMA 3.2 (Meta, Sep 2024) — 1B, 3B, 11B, 90B (incl. vision)
LLaMA 3.3 (Meta, Dec 2024) — 70B
Mistral Large (Mistral, Feb 2024)
Mistral Small / Mistral NeMo (Mistral, 2024)
Mixtral 8x22B (Mistral, 2024)
Mistral Medium (Mistral, 2024)
Codestral (Mistral, 2024)
Pixtral (Mistral, 2024)
Phi-2 (Microsoft, early 2024)
Phi-3 / Phi-3.5 (Microsoft, 2024) — Mini, Small, Medium
Qwen 2 / Qwen 2.5 (Alibaba, 2024) — various sizes Coder, Math, VL
QwQ (Alibaba, 2024) — reasoning model
DeepSeek-V2 (DeepSeek, 2024)
DeepSeek-Coder-V2 (DeepSeek, 2024)
Yi-1.5 (
01.AI, 2024)
InternLM 2 / 2.5 (Shanghai AI Lab, 2024)
GLM-4 (Zhipu AI, 2024)
Command R (Cohere, 2024)
Aya 23 (Cohere for AI, 2024)
DBRX (Databricks, 2024)
Snowflake Arctic (Snowflake, 2024)
Grok-1 (xAI, Mar 2024) — open weights, 314B MoE
Grok-2 (xAI, Aug 2024)
Gemma 1 / Gemma 2 (Google, 2024) — 2B, 7B, 9B, 27B
RecurrentGemma (Google, 2024)
PaliGemma (Google, 2024)
Reka Core / Flash / Edge (Reka AI, 2024)
Inflection-2.5 (Inflection AI, 2024)
Jamba 1.5 (AI21, 2024)
Nemotron-4 (NVIDIA, 2024)
NVLM (NVIDIA, 2024)
OLMo (Allen AI, 2024) — 1B, 7B
StarCoder2 (BigCode, 2024)
Stable LM 2 (Stability AI, 2024)
Stable Code (Stability AI, 2024)
Fugaku-LLM (RIKEN/Fujitsu, 2024)
Sarvam-1 (Sarvam AI, 2024) — Indian languages
SEA-LION (AI Singapore, 2024)
EXAONE (LG AI Research, 2024)
Solar Pro (Upstage, 2024)
Granite (IBM, 2024) — various sizes
MiniCPM (Tsinghua/OpenBMB, 2024)
MAP-Neo (M-A-P, 2024)
Sailor (Sea AI Lab, 2024)
Tele-FLM (FlagOpen, 2024)
Marco-o1 (Alibaba, 2024)
2025 (through May)
Claude 3.5 Opus (Anthropic, 2025)
Claude 4 Sonnet (Anthropic, 2025)
Claude 4 Opus (Anthropic, 2025)
Claude Sonnet 4.5 (Anthropic, 2025)
Claude Opus 4.5 / 4.6 (Anthropic, 2025)
Claude Haiku 4.5 (Anthropic, 2025)
DeepSeek-V3 (DeepSeek, Jan 2025)
DeepSeek-R1 (DeepSeek, Jan 2025) — reasoning model
GPT-4.5 (OpenAI, 2025)
GPT-o3 / o3-mini / o4-mini (OpenAI, 2025)
Gemini 2.0 Pro / Flash (Google, 2025)
Gemini 2.5 Pro / Flash (Google, 2025)
LLaMA 4 (Meta, 2025) — Scout, Maverick
Mistral Large 2 (Mistral, 2025)
Grok-3 (xAI, 2025)
Qwen 3 (Alibaba, 2025)
Phi-4 (Microsoft, 2025)
Gemma 3 (Google, 2025)
OLMo 2 (Allen AI, 2025)
Command A (Cohere, 2025)