Associate Professor at the Technion, trying to understand how AI works and how to make it more efficient.

Joined December 2019
Daniel Soudry retweeted
Excited to share our new arXiv preprint: "Retrieval from Within: An Intrinsic Capability of Attention-Based Models" We introduce INTRA, a framework where attention-based models retrieve from their own internal representations. arxiv.org/abs/2605.05806 1/5 🧵
Accelerate your transformer model with the new Block-Sparse-Flash-Attention! github.com/Danielohayon/Bloc… This training-free, drop-in replacement extends FlashAttention-2 with minimal code changes (CUDA Kernels Included). Paper: arxiv.org/abs/2512.07011
Daniel Soudry retweeted
Scaling FP8 training to trillion-token LLMs From Intel. Trained a 7B model using FP8 precision on 256 Gaudi2 accelerators. Matched BF16 with 34% throughput improvement while using 30% less memory. Introduces Smooth-SwiGLU as solution to outlier amplification. Links below
Daniel Soudry retweeted
I'm back in Vienna to present our paper at @icmlconf with Itamar Harel on Wednesday at 1:30 pm, at poster #907. Looking forward to meeting and chatting! #ICML2024
Q: You sample random neural networks until you find one with perfect training accuracy. What will be the generalization error? A: Typically good — We prove that when a “simple explanation” exists, such sampled NNs (MLP/CNNs) generalize well! arxiv.org/abs/2402.06323
Daniel Soudry retweeted
Selected as Spotlight for #ICML2024 ! 🥳
Q: You sample random neural networks until you find one with perfect training accuracy. What will be the generalization error? A: Typically good — We prove that when a “simple explanation” exists, such sampled NNs (MLP/CNNs) generalize well! arxiv.org/abs/2402.06323