📝 Top Papers in Computer Vision, NLP, Speech, Multimodal AI, Core ML, RecSys, & Graph ML
🔗 papers.aman.ai
👉🏼 I’ve put together a summary of key papers in #AI and segregated them into (i) need-to-know and (ii) good-to-know.
🔹 Vision
- Image Classification (from CNN architectures such as AlexNet, VGGNet, InceptionNet, and ResNet to Transformer architectures such as ViT, DeiT, BEiT, and MAE)
- Object Detection (YOLO v1-v8, Fast/er R-CNN, Mask R-CNN, CenterNet, Pix2Seq, DETR, Detic, Focal Loss)
- Semantic/Instance Segmentation (U-Net, Mask R-CNN, Segment Anything)
- NeRF (Instant NeRF, Block-NeRF)
- SSL Contrastive Learning (SimCLR, MoCo, DINO v1 & v2)
🔹 NLP
- Transformers (original paper)
- Semantic Representation Encoders (BERT and its variants: RoBERTa, DistilBERT, ELECTRA, XLNet, MPNet, ALBERT)
- Autoregressive Decoders (GPT-n, Llama 1/2/3, Alpaca, Vicuna)
- Augmented LMs (RAG, Toolformer, HuggingGPT, Gorilla)
- Supervised Fine-tuning (Instruction tuning/FLAN, LIMA, LESS)
- LLM Alignment (RLHF/InstructGPT, PPO, DPO, KTO, GPO, IPO, sDPO, ICDPO)
- Encoder-Decoder Architectures (T0, T5, BART)
- Machine Translation (M2M-100, NLLB-200)
- Contrastive Learning (SNCSE, InfoNCE, Sentence-BERT)
- Prompting (CoT, Auto-CoT, Self-Consistency, ToT, GoT, ReAct, APE, ART)
- PEFT (Prefix-tuning, Adapters, LoRA, LLaMA-Adapter v1 and v2, QLoRA, QA-LoRA, DoRA, NOLA)
🔹 Speech
- SSL Pre-Training (WavLM, AudioMAE, HuBERT)
- Automatic Speech Recognition/Keyword Spotting (GMM-HMM, DNN-HMM, all-neural architectures such as LAS/Whisper, streaming architectures such as RNN-T/Transformer-T)
- Speaker Identification (i/d/x-vectors, GE2E loss, AAM loss)
- Text-to-Speech (HiFi-GAN, Tacotron v1 and v2, Voicebox)
- Text-to-Audio/Music (MusicGen, AudioGen)
🔹 Multimodal
- SSL Pre-Training (ViLT, MLIM, UNITER, LXMERT, VisualBERT, Data2Vec v1 and v2, I-Code, VL-BEIT, ImageBind)
- Vision-Language Prompting (Flamingo, Frozen, InstructBLIP)
- Text-to-Image (DALL-E 1/2/3, Imagen, Latent Diffusion, Make-A-Scene, Make-A-Video)
- Translation (SeamlessM4T)
- Contrastive Learning (InfoNCE, CLIP, CLAP, AudioCLIP)
🔹 Core ML
- Training Regularizer (Dropout)
- Training/Inference Efficiency (ZeRO, ZeRO-Infinity, FlashAttention, FlashAttention-2)
- Training Stability (Batch/Layer/Group/Instance Norm, Residual/Skip Connections)
- Explainable AI (Guided Backprop, Grad-CAM, CAV, Influence functions, Representer points, TracIn)
🔹 RecSys
- ML-based Collaborative Filtering (Factorization Machines)
- DL-based Algorithms (Collaborative Deep Learning, Wide & Deep, DNNs for YouTube Recommendations, Product-based DNNs, NCF, Deep & Cross v1 and v2, DeepFM, Deep Interest Network, Behavior Sequence Transformer)
🔹 Graph ML
- Factorization-based Algorithms (LLE, LAP, HOPE)
- Random Walk-based Algorithms (Node2vec)
- Deep Learning-based Algorithms (SDNE, GraphSAGE, EGNN, GCN, GAT)
#ArtificialIntelligence