Elad Hoffer

Daniel Soudry retweeted

May 8

Excited to share our new arXiv preprint: "Retrieval from Within: An Intrinsic Capability of Attention-Based Models" We introduce INTRA, a framework where attention-based models retrieve from their own internal representations. arxiv.org/abs/2605.05806 1/5 🧵

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own...

arxiv.org

Daniel Soudry @soudry_daniel

11 Dec 2025

Accelerate your transformer model with the new Block-Sparse-Flash-Attention! github.com/Danielohayon/Bloc… This training-free, drop-in replacement extends FlashAttention-2 with minimal code changes (CUDA Kernels Included). Paper: arxiv.org/abs/2512.07011

Daniel Soudry retweeted

PapersAnon @papers_anon

20 Sep 2024

Scaling FP8 training to trillion-token LLMs. From Intel. Trained a 7B model using FP8 precision on 256 Gaudi2 accelerators. Matched BF16 with 34% throughput improvement while using 30% less memory. Introduces Smooth-SwiGLU as a solution to outlier amplification. Links below

Daniel Soudry retweeted

Gon Buzaglo @gon_buzaglo

22 Jul 2024

I'm back in Vienna to present our paper at @icmlconf with Itamar Harel on Wednesday at 1:30 pm, at poster #907. Looking forward to meeting and chatting! #ICML2024

Gon Buzaglo @gon_buzaglo

12 Feb 2024

Q: You sample random neural networks until you find one with perfect training accuracy. What will be the generalization error? A: Typically good — We prove that when a “simple explanation” exists, such sampled NNs (MLP/CNNs) generalize well! arxiv.org/abs/2402.06323

Daniel Soudry retweeted

Gon Buzaglo @gon_buzaglo

12 Jun 2024

Selected as Spotlight for #ICML2024 ! 🥳

Gon Buzaglo @gon_buzaglo

12 Feb 2024

Daniel Soudry retweeted

Gon Buzaglo @gon_buzaglo

12 Feb 2024

How Uniform Random Weights Induce Non-uniform Bias: Typical...

Background. A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained...

arxiv.org
