Senior Principal Scientist at @oracle 🤖 Ex: @awscloud, @uwcse, @Idiap_ch, PhD @epfl_en.

Joined June 2009
Finally sharing: our Distribution Edited Model paper received an award at EMNLP '24 and was featured on the @AmazonScience blog! bit.ly/3EpPE9N Big shoutout to Dhananjay Ram, @AdityaRawaI, and @momcilh, plus everyone who shaped the effort. #DL #LLMs #EMNLP
Nikos Pappas retweeted
Announcing the #EMNLP2024 awards for: Resource Paper, Social Impact Paper, and Special Theme Paper
Nikos Pappas retweeted
27 Jun 2024
Do LLMs' reasoning abilities come from training on code🤔? Many think so, but how does this hold across languages🌐? We study the interplay of code and reasoning in our recent work (#acl2024). 📃arxiv.org/abs/2403.02567 🗃️github.com/amazon-science/xs… 1/6 🧵
Nikos Pappas retweeted
12 Feb 2024
DeAL: Decoding-time Alignment for Large Language Models. Paper page: huggingface.co/papers/2402.0… Large Language Models (LLMs) are now expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear whether such methods are an effective way to teach alignment objectives to the model. First, the inability to incorporate multiple custom rewards and the reliance on a model developer's view of universal and static principles are key limitations. Second, residual gaps in model training and the reliability of such approaches are also questionable (e.g., susceptibility to jailbreaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work.
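DeAL treats decoding as a heuristic-guided search steered by user-defined rewards. As a rough illustration only, here is a reward-augmented beam search (a simpler stand-in, not DeAL's actual search procedure) over a hypothetical toy next-token table, with a keyword constraint as the alignment objective. `TOY_LM`, `keyword_reward`, `reward_guided_beam_search`, and the `weight` knob are illustrative names, not part of any released code.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "language model": next-token probabilities given only the last token.
TOY_LM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.6, "</s>": 0.4},
    "sat": {"</s>": 1.0},
}

Beam = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def keyword_reward(tokens: List[str], keyword: str = "dog") -> float:
    """Programmatic alignment objective: 1.0 if the keyword appears, else 0.0."""
    return 1.0 if keyword in tokens else 0.0


def reward_guided_beam_search(
    lm: Dict[str, Dict[str, float]],
    reward: Callable[[List[str]], float],
    beam_size: int = 2,
    max_len: int = 5,
    weight: float = 2.0,
) -> List[str]:
    """Beam search ranked by the heuristic log p(prefix) + weight * reward(prefix)."""

    def score(beam: Beam) -> float:
        tokens, logp = beam
        return logp + weight * reward(tokens)

    beams: List[Beam] = [(["<s>"], 0.0)]
    finished: List[Beam] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Beam] = [
            (tokens + [tok], logp + math.log(p))
            for tokens, logp in beams
            for tok, p in lm[tokens[-1]].items()
        ]
        candidates.sort(key=score, reverse=True)
        beams = []
        for cand in candidates:
            if cand[0][-1] == "</s>":
                finished.append(cand)  # completed hypotheses leave the beam
            else:
                beams.append(cand)
            if len(beams) == beam_size:
                break
        if not beams:
            break
    return max(finished or beams, key=score)[0]


plain = reward_guided_beam_search(TOY_LM, keyword_reward, weight=0.0)
aligned = reward_guided_beam_search(TOY_LM, keyword_reward, weight=2.0)
print(plain)    # ['<s>', 'the', 'cat', 'sat', '</s>']
print(aligned)  # ['<s>', 'the', 'dog', 'sat', '</s>']
```

Setting `weight=0.0` recovers plain beam search; raising it trades some model likelihood for adherence to the constraint, the kind of tunable trade-off the abstract refers to.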
Nikos Pappas retweeted
3 Jun 2024
excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍 Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD) w/@tri_dao
Nikos Pappas retweeted
Our 2020 paper "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" with @angeloskath @apoorv2904 and @nik0spapp reached 1000 citations! proceedings.mlr.press/v119/k…
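The title's claim can be checked on a toy example. With a positive feature map phi (here elu(x) + 1, the map used in that paper), causal linear attention computed over every prefix gives the same outputs as a recurrent pass that only carries a running state S = sum phi(k_j) v_j^T and normalizer z = sum phi(k_j). This is a minimal pure-Python sketch under those assumptions, not the paper's implementation; the function names and input values are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def phi(x: Vec) -> Vec:
    """elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention_parallel(Q: Mat, K: Mat, V: Mat) -> Mat:
    """Causal linear attention: position i attends to 0..i with weights phi(q_i) . phi(k_j)."""
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        weights = [dot(fq, phi(K[j])) for j in range(i + 1)]
        total = sum(weights)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention_recurrent(Q: Mat, K: Mat, V: Mat) -> Mat:
    """Same computation as an RNN: update S and z once per token, then read out."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d_k                          # running sum of phi(k_j)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom for c in range(d_v)])
    return out


Q = [[0.1, -0.3], [0.5, 0.2], [-1.0, 0.4]]
K = [[0.2, 0.1], [-0.4, 0.3], [0.6, -0.2]]
V = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [0.0, 1.0, 1.0]]
parallel = linear_attention_parallel(Q, K, V)
recurrent = linear_attention_recurrent(Q, K, V)
assert all(abs(p - r) < 1e-9
           for row_p, row_r in zip(parallel, recurrent) for p, r in zip(row_p, row_r))
```

The recurrent form does a constant amount of work per new token instead of revisiting the whole prefix, which is what makes autoregressive generation fast.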

We're recruiting research interns to work on next-generation conversational modeling at AWS AI Labs @awscloud 🤖💬 DM me or apply directly if you're interested in joining us! #NLProc #deeplearning #conversationalAI #internship
Nikos Pappas retweeted
Today we are excited to announce a new partnership with @awscloud! 🔥 Together, we will accelerate the availability of open-source machine learning 🤝 Read the post 👉 huggingface.co/blog/aws-part…
Nikos Pappas retweeted
Updating ML models can introduce unseen errors, such as a virtual assistant 🤖 suddenly not understanding a command you use often. How can this be avoided? ✨ Backward Compatibility During Data Updates by Weight Interpolation ✨ 📜 arxiv.org/abs/2301.10546 ⌨️ github.com/amazon-science/re…
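For intuition, here is a minimal pure-Python sketch of two ingredients the title points to, under stated assumptions. The first is linear interpolation between the old and the updated model's weights, theta(alpha) = (1 - alpha) * theta_old + alpha * theta_new. The second is the negative flip rate, the fraction of examples the old model got right that the updated model gets wrong, a common way to quantify backward-compatibility errors. Parameters are toy dicts of float lists. How the paper actually selects alpha or applies the interpolation is not reproduced here. `interpolate_weights` and `negative_flip_rate` are hypothetical helpers.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def interpolate_weights(old: Params, new: Params, alpha: float) -> Params:
    """Return (1 - alpha) * old + alpha * new, parameter by parameter."""
    if old.keys() != new.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [(1.0 - alpha) * o + alpha * n for o, n in zip(old[name], new[name])]
        for name in old
    }


def negative_flip_rate(labels: Sequence[int], old_preds: Sequence[int],
                       new_preds: Sequence[int]) -> float:
    """Fraction of examples the old model got right but the updated model gets wrong."""
    flips = sum(1 for y, o, n in zip(labels, old_preds, new_preds) if o == y and n != y)
    return flips / len(labels)


old = {"w": [1.0, 2.0], "b": [0.0]}
new = {"w": [3.0, 0.0], "b": [1.0]}
print(interpolate_weights(old, new, 0.25))  # {'w': [1.5, 1.5], 'b': [0.25]}
print(negative_flip_rate([1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]))  # 0.25
```

alpha = 0 keeps the old model, and alpha = 1 uses the updated one. Intermediate values give a knob between the new model's gains and the regressions measured by the flip rate.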
Nikos Pappas retweeted
Linear-complexity models are cool, but shouldn't they work best on loooong documents? We tried RFA on doc-level translation, and got >2x speedup with memory savings, >7x when memory is controlled, and >19x on CPU. Similar/better BLEU; some consistency scores are slightly hurt 1/2
Nikos Pappas retweeted
We also found that adding a gate to control information flow helps, which can be easily done with the RFA formulation. #emnlp2022 findings. Come check us out in the @SustaiNLP2022 workshop on Wednesday! With @haopeng_nlp @nik0spapp @nlpnoah arxiv.org/abs/2210.08431 2/2
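Here is a minimal sketch of how such a gate can enter a recurrent linear-attention update. It assumes a scalar gate g_t = sigmoid(logit_t) and the updates S_t = g_t * S_{t-1} + (1 - g_t) * phi(k_t) v_t^T and z_t = g_t * z_{t-1} + (1 - g_t) * phi(k_t). Treat that form as an assumption here, not a quote of the paper. To keep the normalizer positive, this toy swaps RFA's random Fourier features for the elu(x) + 1 feature map. The gate logits are passed in directly rather than computed from learned parameters. All names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def phi(x: Vec) -> Vec:
    """Stand-in positive feature map elu(x) + 1 (RFA itself uses random Fourier features)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_linear_attention(Q: Mat, K: Mat, V: Mat, gate_logits: Vec) -> Mat:
    """Recurrent linear attention whose state is decayed by a scalar gate g_t in (0, 1)."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # gated running sum of phi(k) v^T
    z = [0.0] * d_k                          # gated running sum of phi(k)
    out = []
    for q, k, v, logit in zip(Q, K, V, gate_logits):
        g = sigmoid(logit)  # g near 1 keeps the past, g near 0 favors the current token
        fk = phi(k)
        for a in range(d_k):
            z[a] = g * z[a] + (1.0 - g) * fk[a]
            for c in range(d_v):
                S[a][c] = g * S[a][c] + (1.0 - g) * fk[a] * v[c]
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom for c in range(d_v)])
    return out


Q = [[0.1, -0.3], [0.5, 0.2], [-1.0, 0.4]]
K = [[0.2, 0.1], [-0.4, 0.3], [0.6, -0.2]]
V = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [0.0, 1.0, 1.0]]
balanced = gated_linear_attention(Q, K, V, [0.0, 0.0, 0.0])     # g = 0.5 at every step
forgetful = gated_linear_attention(Q, K, V, [-50.0, -50.0, -50.0])  # g ~ 0: almost no memory
```

With very negative logits the state is overwritten at each step, so each output is essentially the current value vector. Larger logits let information from earlier tokens persist.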

Nikos Pappas retweeted
I am looking for PhD students to join my lab @UMRobotics @UMich in Fall 2023! You'll already find my name on the #robotics dept/application website. Deadline is Dec 1, GRE not required, and there are application fee waivers!
Nikos Pappas retweeted
Introducing 📑 The Stack - a 3TB dataset of permissively licensed code in 30 programming languages. hf.co/datasets/bigcode/the-s… You want your code excluded from the model training? There is an opt-out form and data governance plan: bigcode-project.org/docs/abo… Let's take a tour🧵
Nikos Pappas retweeted
12X faster transformer model, possible? Yes, with @OpenAI Triton kernels! We release Kernl, a lib to speed up inference of transformer models. It's very fast (sometimes SOTA), 1 LoC to use, and hackable to match most transformer architectures. github.com/ELS-RD/kernl 🧵
Nikos Pappas retweeted
13 Oct 2022
It's a joke that all NLP talks must include this graph. But if you are a student it is a bit intimidating. How can you become an expert in where we are going if you can barely run BERT? I asked Twitter for specific advice on what you might focus on:
Nikos Pappas retweeted
We release the public beta for bnb-int8🟪 for all @huggingface 🤗models, which allows for Int8 inference without performance degradation up to scales of 176B params 📈. You can run OPT-175B/BLOOM-176B easily on a single machine 🖥️. You can try it here: docs.google.com/document/d/1… 1/n
Can’t wait to attend #NAACL2022 with Lex AWS AI and meet colleagues in person next week. If you are into efficiency, out-of-domain generalization and calibration/robustness topics, let’s chat!
Nikos Pappas retweeted
We are hiring and we are attending ACL 2022, please find me to chat if you are interested! #ACL2022
Many recent methods successfully scale attention in transformers to long sequences, but grouping them conceptually can be daunting. How can we view them in a unified way? (1/5)
This raises the question: can we also learn a contextualized control strategy? The answer is yes! To learn more about it and how Linformer can be applied in autoregressive settings watch the talk by the amazing Hao Peng (@haopeng01) at #ACL2022: underline.io/events/284/sess… (4/5)
