DeepSpeed (Japanese account) retweeted

PyTorch

@PyTorch

May 18

Don't miss the @DeepSpeedAI virtual office hours on May 26 at 12:00 PM America/New_York. Ask questions of @toh_tana, a member of the DeepSpeed TSC, and get the latest key updates, including AutoSP (sequence parallel), AutoEP (expert parallel), and AutoTP (tensor parallel).

DeepSpeed

@DeepSpeedAI

May 14

x.com/i/article/205507461708…

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

Apr 30

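The official PyTorch blog post on AutoSP, a new DeepSpeed feature, has been published!
- Compiler-level optimization applies Sequence Parallel to existing models with only a configuration change
- Sequence-aware AC (activation checkpointing) optimized for long-sequence training
This makes it easy to train on long sequences with higher GPU efficiency. pytorch.org/blog/introducing…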

DeepSpeed

@DeepSpeedAI

Apr 30

Great News! Thanks to DeepSpeed AutoSP, efficient long context LLM training is now easily accessible.

DeepSpeed (Japanese account) retweeted

Stas Bekman

@StasBekman

Mar 9

Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate, and TRL. For extensive details, please see this writeup: huggingface.co/blog/ulysses-… Thanks a lot to @krasul for helping make it happen, and to the others on the HF team who helped with the integration.

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

Feb 26

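The latest DeepSpeed updates have been featured on the PyTorch blog!
- PyTorch-compatible backward API: makes large-scale multimodal training with Ray simpler to implement
- Memory-efficient BF16/FP16 mode: reduces peak memory (by up to 40%) when combined with torch.autocast
We welcome your feedback and requests. Issues and PRs are very welcome too!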

PyTorch

@PyTorch

Feb 25

New @DeepSpeedAI updates make large-scale multimodal training simpler and more memory-efficient. Our latest blog introduces a PyTorch-identical backward API that makes multimodal training loops easy to code, plus low-precision model states (BF16/FP16) that can reduce peak memory by up to 40% when combined with torch.autocast. 🖇️ Read the full post for details: hubs.la/Q044yYVs0 #DeepSpeed #PyTorch #MemoryEfficiency #MultimodalTraining #OpenSourceAI

DeepSpeed (Japanese account) retweeted

PyTorch

@PyTorch

12 Dec 2025

Zhipeng (Jason) Wang, PhD (@PKUWZP) explains how @DeepSpeedAI supports ML training research and why joining PyTorch Foundation benefits researchers and developers working on AI training workloads. 🔗youtu.be/67719mlOSp0 #PyTorch #DeepSpeed #OpenSourceAI #AIInfrastructure

DeepSpeed (Japanese account) retweeted

DeepSpeed

@DeepSpeedAI

9 Oct 2025

UIUC, AnyScale, and Snowflake significantly enhanced LLM offloading for the Superchip era!

Minjia Zhang @_Minjia_Zhang_

9 Oct 2025

🚀 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips

Superchips like the NVIDIA GH200 offer tightly coupled GPU-CPU architectures for AI workloads. But most existing offloading techniques were designed for traditional PCIe-based systems. Are we truly tapping into their full potential for LLM training?

🎯 SuperOffload is our answer to this challenge: a new DeepSpeed component that rethinks offloading from the ground up, designed specifically for LLM training on Superchips.

✨ SuperOffload is exact: no approximation, no heuristics, and no changes to your training algorithm. You get faster training of larger models with longer sequences using the same code, made possible by system-level optimizations that exploit the Superchip architecture.

🧪 SuperOffload lets you:
- Finetune models like GPT-OSS-20B, Qwen3-14B, and Phi-4 on a single GH200
- Run up to 4X faster than previous approaches like ZeRO-Offload
- Scale effortlessly to:
-- Qwen3-30B-A3B and Seed-OSS-36B on 2 x GH200s
-- LLaMA2-70B on 4 x GH200s
-- 1M sequence length on 8 x GH200s with 55% MFU
- Get started easily: fully integrated and open-sourced in DeepSpeed, with just a few lines of code to enable

📚 Read more on the official PyTorch blog: pytorch.org/blog/superoffloa…
🧠 For more technical details, please read our technical report: arxiv.org/abs/2509.21271
🛠️ SuperOffload is fully open-sourced through DeepSpeed. Try it now: github.com/deepspeedai/DeepS…
📄 SuperOffload has been accepted to ASPLOS 2026! Kudos to Xinyu Lian (@Alexlian0806), Masahiro Tanaka (@toh_tana), and Olatunji Ruwase.

🎤 Featured at PyTorch Conference 2025: SuperOffload will be featured in the DeepSpeed & vLLM keynote at this year's PyTorch Conference in San Francisco. 🔥 Come see how we're rethinking large-scale LLM training for the Superchip era: events.linuxfoundation.org/p…

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

10 Sep 2025

The DeepSpeed team will give a keynote speech at the PyTorch Conference, held in San Francisco on 10/22-23. If you are attending the PyTorch Conference, please come and listen. events.linuxfoundation.org/p…

PyTorch Conference North America | LF Events

The premier event driving the future of open source AI - bringing together pioneers, researchers, and developers to shape what’s next.

events.linuxfoundation.org

DeepSpeed

@DeepSpeedAI

9 Sep 2025

Step into the future of AI at #PyTorchCon 2025, Oct 22–23 in San Francisco 🔥 Join the DeepSpeed keynote and technical talks. Register: events.linuxfoundation.org/p… Oct 21 co-located events: Measuring Intelligence, Open Agent & AI Infra Summits / Startup Showcase & PyTorch Training

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

10 Jul 2025

A paper on DeepSpeed's Universal Checkpointing was presented at ATC, a top conference in the software systems field.

Minjia Zhang @_Minjia_Zhang_

10 Jul 2025

📢 Yesterday at USENIX ATC 2025, Xinyu Lian from UIUC SSAIL Lab presented our paper on Universal Checkpointing (UCP).

UCP is a new distributed checkpointing system designed for today's large-scale DNN training, where models often use complex forms of parallelism, including data, tensor, pipeline, and expert parallelism. Existing checkpointing systems struggle in this setting because they are tightly coupled to specific training strategies (e.g., ZeRO-style data parallelism or 3D model parallelism), which break down when the training configs need to dynamically reconfigure over time. This makes it difficult to have resilient and fault-tolerant training.

UCP solves this by decoupling distributed checkpointing from parallelism strategies. Our design introduces a unified checkpoint abstraction -- atomic checkpoint, and a full pattern matching-based transformation pipeline, which enables scalable and low-overhead checkpointing with reconfigurable parallelism across arbitrary model sharding strategies. We show that UCP supports state-of-the-art models trained with hybrid 3D/4D parallelism (ZeRO, TP, PP, SP) while incurring less than 0.001% overhead of the total training time.

UCP is fully open-sourced in DeepSpeed. It has been adopted by Microsoft, BigScience, UC Berkeley and others for large-scale model pre-training and fine-tuning, including Phi-3.5-MoE (42B), BLOOM (176B), and many more. It also has been selected for presentation at PyTorch Day 2025 and FMS 2025 (the Future of Memory and Storage).

Big thanks to the amazing collaborators from Microsoft and Snowflake: @samadejacobs, @LevKurilenko, @MasahiroTanaka, @StasBekman, and @TunjiRuwase.

🔗 Project: lnkd.in/gG6j4vJe
📄 Paper: lnkd.in/gUiC5kcR
💻 Code: lnkd.in/g6uS29nH
📚 Tutorial: lnkd.in/gi_zWSWh

#ATC2025 #LLM #Checkpointing #SystemsForML #DeepLearning #DistributedTraining #UIUC #DeepSpeed

DeepSpeed (Japanese account) retweeted

Minjia Zhang @_Minjia_Zhang_

10 Jul 2025

DeepSpeed (Japanese account) retweeted

PyTorch

@PyTorch

8 May 2025

PyTorch Day France marked the launch of a global PyTorch Day series—and the announcement of a major milestone: PyTorch Foundation is now an umbrella foundation. First new projects: @vllm_project @DeepSpeedAI. Next Stop: PyTorch Day China, June 7 🇨🇳 hubs.la/Q03lJvHh0 #PyTorch #OpenSourceAI #vLLM #DeepSpeed

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

7 May 2025

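It has been announced that the DeepSpeed project is joining the PyTorch Foundation. We will contribute even more to the community through open collaboration with a broad range of stakeholders. Official announcement: pytorch.org/blog/pytorch-fou… pytorch.org/projects/deepspe…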

PyTorch

@PyTorch

7 May 2025

PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted projects, advancing community-driven AI across the full lifecycle. Supporting quotes provided by the following members: @AMD, @Arm, @AWS, @Google, @Huawei, @huggingface, @IBM, @Intel, @LightningAI, @Meta, @NVIDIA, and @Snowflake. 🔗💡 Read the full announcement: hubs.la/Q03lmJNH0 #PyTorchFoundation #PyTorch #OpenSourceAI #vLLM #DeepSpeed

DeepSpeed (Japanese account) retweeted

PyTorch

@PyTorch

7 May 2025

DeepSpeed (Japanese account) retweeted

Horace He

@cHHillee

17 Apr 2025

This is pretty neat. They insert into torch.compile and insert some profile-guided optimizations as well as a bunch of other specific optimizations like offloading. Since torch.compile is all in Python, all their compiler passes are fairly accessible too! github.com/deepspeedai/DeepS…

DeepCompile for enhanced compiler integration by tohtana · Pull Request #7154 · deepspeedai/DeepS...

This PR introduces DeepCompile, a new feature that efficiently integrates compiler optimizations with other DeepSpeed features. DeepCompile utilizes torch's dynamo to capture the computatio...

github.com

DeepSpeed

@DeepSpeedAI

16 Apr 2025

Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading tinyurl.com/8cys28xk

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

16 Apr 2025

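We have released "DeepCompile", a new DeepSpeed feature!
✅ Automatic, profile-guided optimization of parallelization
✅ ZeRO and offloading implemented as compiler optimization passes
✅ 1.2x to 7x speedups over ZeRO1 / ZeRO3 / offloading
See below for details. Blog (English): tinyurl.com/8cys28xk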

DeepSpeed

@DeepSpeedAI

16 Apr 2025

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

1 Apr 2025

Thank you, we hope you make good use of it!

hiroshi matsuda @hmtd223

1 Apr 2025

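I hear that deepspeed now lets you combine tensor parallel with the zero optimizer 🎉 With zero alone, even if you wanted to add nodes to speed up training, the constraint per_device_micro_batch_size * gpu_per_node * num_nodes <= 1536 tended to become the bottleneck; with tp=8, the number of nodes can in theory be increased 8x as well.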
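
The node-count arithmetic in the quoted post can be sketched as follows. This is a minimal illustration, not DeepSpeed code: the function name `max_nodes` is hypothetical, the limit of 1536 is the figure from the post, and the micro-batch size of 1 and 8 GPUs per node are assumed values chosen only for the example. It also assumes no gradient accumulation and that each tensor-parallel group of `tp_size` GPUs counts as one data-parallel replica.

```python
def max_nodes(global_batch_limit: int,
              micro_batch_per_device: int,
              gpus_per_node: int,
              tp_size: int = 1) -> int:
    """Largest node count whose global batch size stays within the limit.

    Assumed model (not taken from DeepSpeed itself):
      data_parallel_size = gpus_per_node * num_nodes / tp_size
      global_batch       = micro_batch_per_device * data_parallel_size
    """
    if min(global_batch_limit, micro_batch_per_device, gpus_per_node, tp_size) < 1:
        raise ValueError("all arguments must be positive integers")
    # Rearranged from: micro_batch * gpus_per_node * num_nodes / tp_size <= limit
    return (global_batch_limit * tp_size) // (micro_batch_per_device * gpus_per_node)


# Illustrative values: micro-batch 1 per device, 8 GPUs per node (assumptions).
zero_only = max_nodes(1536, micro_batch_per_device=1, gpus_per_node=8, tp_size=1)
with_tp8 = max_nodes(1536, micro_batch_per_device=1, gpus_per_node=8, tp_size=8)

print(zero_only)             # 192
print(with_tp8)              # 1536
print(with_tp8 // zero_only) # 8
```

Under these assumptions the node ceiling scales linearly with the tensor-parallel degree, which matches the "8x in theory" remark in the post.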

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

1 Apr 2025

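A feature that automatically applies tensor parallelism (TP) to HuggingFace models has been released!
- Train large models from the HuggingFace model hub with larger batch sizes and sequence lengths
- 4x faster Llama3 fine-tuning
- No user code changes required!
Blog (English): tinyurl.com/5n8nfs2w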

DeepSpeed

@DeepSpeedAI

1 Apr 2025

AutoTP ZeRO Training for HF Models - Enhance HF post-training with larger models, batches, & contexts - 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1 - No code changes needed Blog: tinyurl.com/5n8nfs2w

DeepSpeed (Japanese account) retweeted

LF AI & Data Foundation @LFAIDataFdn

3 Feb 2025

🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 hubs.la/Q0351DJC0 #DeepSpeed #AI #OpenSource #LFAIData

DeepSpeed (Japanese account) retweeted

Microsoft Research

@MSFTResearch

6 Jan 2025

Microsoft Research congratulates Yasuyuki Matsushita on being named a 2025 IEEE Fellow for his outstanding contributions to photometric 3D modeling and computational photography. msft.it/6018oINMG

DeepSpeed (Japanese account)

@DeepSpeedAI_JP

5 Dec 2024

We have released Ulysses-Offload, a new feature for training on very long sequences with limited GPU resources!
- Train LLaMA3-8B with a 2M-token sequence length on just four A100-80GB GPUs
- Achieves over 55% MFU
Blog: shorturl.at/Spx6Y Tutorial: shorturl.at/bAWu5

DeepSpeed/blogs/ulysses-offload/README.md at master · deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. - deepspeedai/DeepSpeed

github.com

DeepSpeed

@DeepSpeedAI

5 Dec 2024

🚀Introducing Ulysses-Offload🚀 - Unlock the power of long context LLM training and finetuning with our latest system optimizations - Train LLaMA3-8B on 2M tokens context using 4xA100-80GB - Achieve over 55% MFU Blog: shorturl.at/Spx6Y Tutorial: shorturl.at/bAWu5

DeepSpeed (Japanese account) retweeted

Microsoft Japan Co., Ltd. (日本マイクロソフト株式会社)

@mskkpr

18 Nov 2024

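[Microsoft Research Asia - Tokyo established] We are announcing the establishment of "Microsoft Research Asia-Tokyo", a new research base in Tokyo, to strengthen the advancement of artificial intelligence research and innovation in the Asia-Pacific region. msft.it/6018WqGNw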
