Nikos Pappas

Tweets

Nikos Pappas @nik0spapp

10 Apr 2025

Finally sharing—Our Distribution Edited Model paper received an award at EMNLP '24 & was featured on @AmazonScience blog! bit.ly/3EpPE9N Big shoutout to Dhananjay Ram, @AdityaRawaI, @momcilh—plus all who shaped the effort. #DL #LLMs #EMNLP

Training large language models more efficiently

Training separate models on different datasets and then merging them reduces computational costs by as much as 91%.

amazon.science

Nikos Pappas retweeted

EMNLP 2026 @emnlpmeeting

14 Nov 2024

Announcing the #EMNLP2024 awards for: -- Resource Paper -- Social Impact Paper -- Special Theme Paper

Nikos Pappas retweeted

Bryan Li @bryanlics

27 Jun 2024

Do LLMs' reasoning abilities come from training on code🤔? Many think so, but how does this hold across languages🌐? We study the interplay of code and reasoning in our recent work (#acl2024). 📃arxiv.org/abs/2403.02567 🗃️github.com/amazon-science/xs… 1/6 🧵

AK

Nikos Pappas retweeted

@_akhaliq

12 Feb 2024

DeAL Decoding-time Alignment for Large Language Models paper page: huggingface.co/papers/2402.0… Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and reliance on a model developer's view of universal and static principles are key limitations. Second, the residual gaps in model training and the reliability of such approaches are also questionable (e.g. susceptibility to jail-breaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work.
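The idea of treating decoding as a heuristic-guided search can be illustrated with a toy sketch. This is not the DeAL implementation: the fixed vocabulary distribution, the `keyword_reward` heuristic, the `guided_beam_search` function, and the `weight` parameter are all hypothetical names chosen for illustration. The sketch only shows how a user-supplied reward could be added to the language-model score when ranking beam candidates.

```python
import math

# Hypothetical toy "language model": fixed next-token log-probabilities that
# ignore the prefix, used only to keep the sketch self-contained.
VOCAB_LOGPROBS = {
    "the": math.log(0.4),
    "cat": math.log(0.3),
    "dog": math.log(0.2),
    "<eos>": math.log(0.1),
}


def keyword_reward(tokens, keyword="dog"):
    """Hypothetical alignment heuristic: 1.0 if the keyword appears, else 0.0."""
    return 1.0 if keyword in tokens else 0.0


def guided_beam_search(reward_fn, beam_size=2, max_len=3, weight=5.0):
    """Beam search that ranks candidates by LM log-prob plus a weighted reward."""
    beams = [([], 0.0)]  # each beam is (tokens, cumulative LM log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, logp))  # finished hypothesis, keep as is
                continue
            for tok, tok_logp in VOCAB_LOGPROBS.items():
                candidates.append((tokens + [tok], logp + tok_logp))
        # The heuristic only affects ranking; the stored score stays the LM log-prob.
        candidates.sort(key=lambda c: c[1] + weight * reward_fn(c[0]), reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]


unguided = guided_beam_search(lambda tokens: 0.0)  # plain beam search
guided = guided_beam_search(keyword_reward)        # search steered toward "dog"
print(unguided, guided)
```

In this sketch the reward is swappable at decoding time without retraining the toy model, which mirrors the customization the abstract describes. A real system would score candidates with an actual LLM and richer reward functions.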

Nikos Pappas retweeted

Albert Gu @_albertgu

3 Jun 2024

excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍 Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD) w/@tri_dao

Nikos Pappas retweeted

François Fleuret @francoisfleuret

9 Feb 2024

Our 2020 paper "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" with @angeloskath @apoorv2904 and @nik0spapp reached 1000 citations! proceedings.mlr.press/v119/k…

Nikos Pappas @nik0spapp

17 Mar 2023

We're recruiting research interns to work on next-generation conversational modeling in AWS AI Labs @awscloud 🤖💬 DM me or apply directly if interested to join us! #NLProc #deeplearning #conversationalAI #internship

Nikos Pappas retweeted

Hugging Face @huggingface

21 Feb 2023

Today we are excited to announce a new partnership with @awscloud! 🔥 Together, we will accelerate the availability of open-source machine learning 🤝 Read the post 👉 huggingface.co/blog/aws-part…

Hugging Face and AWS partner to make AI more accessible

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Nikos Pappas retweeted

Raphael Schumann @RaphiRaph_

26 Jan 2023

Updating ML models can introduce unseen errors, such as a virtual assistant 🤖 suddenly not understanding your often used command. How to avoid this? ✨ Backward Compatibility During Data Updates by Weight Interpolation ✨ 📜 arxiv.org/abs/2301.10546 ⌨️ github.com/amazon-science/re…

Nikos Pappas retweeted

Zhaofeng Wu @zhaofeng_wu

30 Nov 2022

Linear-complexity models are cool, but shouldn't they work best on loooong documents? We tried RFA on doc-level translation, and got >2x speedup with memory savings, >7x when memory is controlled, and >19x on CPU. Similar/better BLEU; some consistency scores are slightly hurt 1/2

Nikos Pappas retweeted

Zhaofeng Wu @zhaofeng_wu

30 Nov 2022

We also found that adding a gate to control information flow helps, which can be easily done with the RFA formulation. #emnlp2022 findings. Come check us out in the @SustaiNLP2022 workshop on Wednesday! With @haopeng_nlp @nik0spapp @nlpnoah arxiv.org/abs/2210.08431 2/2

Nikos Pappas retweeted

Christoforos Mavrogiannis @mavrojean

9 Nov 2022

I am looking for PhD students to join my lab @UMRobotics @UMich in Fall 2023! You'll already find my name on the #robotics dept/application website. Deadline is Dec 1, GRE not required, and there are application fee waivers!

Nikos Pappas retweeted

BigCode @BigCodeProject

27 Oct 2022

Introducing 📑 The Stack - a 3TB dataset of permissively licensed code in 30 programming languages. hf.co/datasets/bigcode/the-s… You want your code excluded from the model training? There is an opt-out form and data governance plan: bigcode-project.org/docs/abo… Let's take a tour🧵

Nikos Pappas retweeted

Michaël Benesty @pommedeterre33

26 Oct 2022

12X faster transformer model, possible? Yes, with @OpenAI Triton kernels! We release Kernl, a lib to speedup inference of transformer models. It's very fast (sometimes SOTA), 1 LoC to use, and hackable to match most transformer architectures. github.com/ELS-RD/kernl 🧵

Nikos Pappas retweeted

Sasha Rush @srush_nlp

13 Oct 2022

It's a joke that all NLP talks must include this graph. But if you are a student it is a bit intimidating. How can you become an expert in where we are going if you can barely run BERT? I asked twitter for specific advice that you might focus on:

Nikos Pappas retweeted

Elman Mansimov @elmanmansimov

14 Sep 2022

Paper accepted to @NeurIPSConf 🎉 Big credit goes to @deng_cai who led the work during his internship at @awscloud and special acknowledgment to my collaborators! You can check out the arXiv pre-print in the meantime arxiv.org/abs/2202.02976

Measuring and Reducing Model Update Regression in Structured...

Recent advance in deep learning has led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility...

arxiv.org

Nikos Pappas retweeted

Tim Dettmers @Tim_Dettmers

10 Aug 2022

We release the public beta for bnb-int8🟪 for all @huggingface 🤗models, which allows for Int8 inference without performance degradation up to scales of 176B params 📈. You can run OPT-175B/BLOOM-176B easily on a single machine 🖥️. You can try it here: docs.google.com/document/d/1… 1/n

Nikos Pappas @nik0spapp

10 Jul 2022

Can’t wait to attend #NAACL2022 with Lex AWS AI and meet colleagues in person next week. If you are into efficiency, out-of-domain generalization and calibration/robustness topics, let’s chat!

Nikos Pappas retweeted

Miguel Ballesteros @migballesteros

24 May 2022

We are hiring and we are attending ACL 2022, please find me to chat if you are interested! #ACL2022

Nikos Pappas @nik0spapp

23 May 2022

A great number of recent methods successfully scale attention in transformers to long sequences but conceptually grouping them can be daunting. How can we view them in a unified way? (1/5)

Nikos Pappas @nik0spapp

23 May 2022

This raises the question: can we also learn a contextualized control strategy? The answer is yes! To learn more about it and how Linformer can be applied in autoregressive settings watch the talk by the amazing Hao Peng (@haopeng01) at #ACL2022: underline.io/events/284/sess… (4/5)

Nikos Pappas @nik0spapp

23 May 2022

Paper: aclanthology.org/2022.acl-lo… @haopeng01 @wittgen_ball @nik0spapp @DaniYogatama @zhaofeng_wu @ikekong @royschwartzNLP @nlpnoah (5/5)