ray

Tweets

Philipp Moritz retweeted

ray

@raydistributed

Jun 4

Congratulations to the Microsoft AI team on MAI-Thinking-1! Exciting to see Ray used in multiple parts of frontier-model development.
- Fast pre-training recovery via in-job restarts with hot standbys
- Async RL orchestration (managing learners, inference servers, rollout workers, and routers, each with distinct placement and fault-tolerance needs)
- A two-pool Ray cluster for building and grading SWE environments on 30K CPU cores

Microsoft AI

@MicrosoftAI

Jun 3

MAI-Thinking-1 is our first in-house reasoning model developed from scratch that is competitive with models of similar size on STEM reasoning and coding tasks. 35B active/1T total MOE.
💻 Coding: 52.8% on SWE Bench Pro, competitive with Opus 4.6
🧐 Reasoning: 97% on AIME 25
🤝 Preferred to Sonnet 4.6 on blind side-by-side tests

[Image: Table comparing MAI-Thinking-1 with other models on STEM and coding benchmarks, showing performance scores across multiple tests.]

Philipp Moritz

@pcmoritz

May 27

RT @charlie_ruan: Excited to have supported @trajectorylabs with the SkyRL team over the past month, bringing training onto their own clust…

Philipp Moritz retweeted

Nithin Chalapathi

@nithinch10

Apr 24

SkyRL now supports end-to-end vision-language post-training, from SFT to agentic RL, and adds vision model support to SkyRL’s Tinker interface! Existing multimodal cookbooks, e.g. VLM classification, work out of the box:

Philipp Moritz

@pcmoritz

Mar 3

We just merged a clean Qwen 3.5 implementation for SkyRL's Jax backend: github.com/NovaSky-AI/SkyRL/… Currently only for dense models, but should be easy to adapt to MoE models, contributions welcome! Also if anybody wants to contribute chunkwise training for the gated delta net or layer stacking for the model, it would be welcome!

[tx] Implement Qwen 3.5 model architecture by pcmoritz · Pull Request #1228 · NovaSky-AI/SkyRL

This PR implements the Qwen 3.5 model architecture, supporting mixed linear and full attention layers. For now we don't stack the layers yet to keep it simple. This PR also doesn't ...

github.com

Philipp Moritz retweeted

Jeffrey Wang

@jeffreyycwang

Feb 20

We just published how Ray Data LLM unlocks up to 2x higher throughput vs plain vLLM offline inference by fixing orchestration bottlenecks. Offline batch inference is critical for synthetic data, evals, indexing – but vLLM alone doesn’t fully scale. We compare:
• Plain vLLM
• Ray Data vLLM offline engine
• Ray Data LLM
🧵

Philipp Moritz retweeted

Charlie Ruan

@charlie_ruan

Feb 17

Releasing the official SkyRL Harbor integration: a standardized way to train terminal-use agents with RL. From the creators of Terminal-Bench, Harbor is a widely adopted framework for evaluating terminal-use agents on any task expressible as a Dockerfile + instruction + test script. This integration extends it: the same tasks you evaluate on, you can now RL-train on. Blog: novasky-ai.notion.site/skyrl… 🧵

Philipp Moritz retweeted

Tinker

@tinkerapi

Feb 13

You can now access SkyRL's backends for distributed training and inference with Tinker scripts so you can take advantage of Tinker's separation of infrastructure from training on your own hardware. Exciting work from Tyler, @pcmoritz, and the team!

Tyler Griggs @tyler_griggs_

Feb 13

SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl… 🧵

Philipp Moritz

@pcmoritz

Feb 13

We just released the full SkyRL Tinker integration: novasky-ai.notion.site/skyrl… This is the evolution of what we have been working on with SkyRL tx and now also supports the SkyRL train backend (fsdp and megatron) which could previously only be used through Python APIs. I'm excited about the standardization the Tinker API will bring to the ecosystem and hopefully having a great open-source implementation will accelerate the adoption!

Philipp Moritz

@pcmoritz

Feb 9

We are excited to announce the release of SkyRL tx 0.3 novasky-ai.notion.site/skyrl…, our MultiLoRA-native inference and training engine that exposes the Tinker API. A lot has happened since the last release. In terms of big features, we implemented expert parallelism, the DeepseekV3 model architecture (e.g. GLM 4.7 Flash), and a number of features to support longer context! There are also lots of smaller improvements and a few small bug fixes.

Philipp Moritz

@pcmoritz

Feb 5

Thanks a lot for the call-out, @tinkerapi! For anybody who is interested, we have made a bunch more releases since 0.1.0 came out; they are listed in github.com/NovaSky-AI/SkyRL/… (and 0.3.0, with expert parallelism, a bunch of long-context optimizations, and DeepSeekV3, is coming very soon).

Tinker

@tinkerapi

Feb 4

Replying to @tinkerapi

SkyRL-tx by @BerkeleySky is an open-source backend that implements the Tinker API itself, letting users train on their own hardware. It supports end-to-end RL, faster sampling, and gradient checkpointing — giving users flexibility and control. novasky-ai.notion.site/skyrl…

Philipp Moritz

@pcmoritz

Jan 11

If anybody is looking for a fun weekend project and is interested in kernels and how ragged_dot can be used to implement MultiLoRA for training and inference as well as MoE models, check out github.com/NovaSky-AI/SkyRL/…. Contributions very welcome, happy to discuss more in the issue!

[tx] Implement efficient kernel for ragged_dot that supports expert parallelism · Issue #862 ·...

We now have support for expert parallelism with #842. Currently this is implemented by running ragged_dot on the subset of local experts. In order to implement this efficiently with JAX JIT (i.e. s...

github.com
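For readers unfamiliar with the operation mentioned above, here is a minimal pure-Python sketch of what a ragged (grouped) matrix multiply computes: rows are sorted by group, and each contiguous group of rows is multiplied by that group's own weight matrix. In a MultiLoRA setting the groups would be adapters; in an MoE layer they would be experts. The helper names `matvec` and `ragged_dot_reference` are hypothetical and chosen for illustration; this is neither SkyRL's code nor the JAX kernel, only a reference for the semantics of a call shaped like `ragged_dot(lhs, rhs, group_sizes)`.

```python
# Reference semantics of a ragged (grouped) matmul, written with plain lists.
# Hypothetical helper names; not the SkyRL or JAX implementation.

def matvec(row, weight):
    """Multiply a length-k row by a k x n matrix given as a list of rows."""
    n = len(weight[0])
    return [sum(row[i] * weight[i][j] for i in range(len(row))) for j in range(n)]


def ragged_dot_reference(lhs, rhs, group_sizes):
    """Grouped matmul.

    lhs: m rows of length k, already sorted so each group's rows are contiguous.
    rhs: g matrices of shape k x n, one per group (adapter or expert).
    group_sizes: g integers; group i owns the next group_sizes[i] rows of lhs.
    """
    assert len(rhs) == len(group_sizes)
    assert sum(group_sizes) == len(lhs)
    out, start = [], 0
    for weight, size in zip(rhs, group_sizes):
        for row in lhs[start:start + size]:
            out.append(matvec(row, weight))
        start += size
    return out


# Three tokens and two groups: the first two tokens use weights[0],
# the last token uses weights[1].
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [
    [[1.0, 2.0], [3.0, 4.0]],     # group 0
    [[10.0, 0.0], [0.0, 10.0]],   # group 1
]
result = ragged_dot_reference(tokens, weights, [2, 1])
print(result)
```

An efficient kernel would avoid this Python-level loop over groups. With expert parallelism, each device would additionally hold only its local subset of the per-group weights, which is the case the linked issue is concerned with.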

Philipp Moritz retweeted

Tyler Griggs @tyler_griggs_

Jan 2

We just pushed one of the biggest updates yet to SkyRL tx (an OSS Tinker backend), including FSDP and multi-node support, custom loss functions, and Llama 3. We also ran some comparisons to the @thinkymachines Tinker service to validate tx. Check it out! novasky-ai.notion.site/skyrl…

SkyRL tx v0.0.3 Release

Philipp Moritz, Tyler Griggs, and the SkyRL Team

novasky-ai.notion.site

Philipp Moritz

@pcmoritz

Jan 2

Happy new year! We are excited to announce SkyRL tx 0.2.1, see novasky-ai.notion.site/skyrl…. Some highlights of the release include FSDP and multi-node support, Llama 3 model support, custom loss functions, a number of performance improvements and also lots of small fixes that implement more functionality of the Tinker API. The blog post also includes a performance comparison with the Tinker Service! Enjoy the release and happy hacking!

Philipp Moritz

@pcmoritz

9 Dec 2025

We are happy to announce SkyRL tx 0.2, see our blog post novasky-ai.notion.site/skyrl…. It comes with lots of performance improvements: all parts of the execution now use JAX JIT, so there is very little overhead. Now is probably the best time to try it out if you haven't already 🧸

Philipp Moritz retweeted

NovaSky

@NovaSkyAI

8 Dec 2025

We recently released SkyRL-Train v0.3.0! Highlights include:
- Experimental support for Pipeline-RL style Async-RL
- Updated E2E Recipes page with Math, Search, SQL runs
- Migration from mbridge -> Megatron-Bridge
- 14 new OSS contributors!
(1/n) 🧵

Philipp Moritz retweeted

Tyler Griggs @tyler_griggs_

3 Nov 2025

SkyRL tx is now bumped to v0.1, which adds support for running @thinkymachines Tinker Cookbook RL loops unmodified out of the box! We'll be talking more about tx at Ray Summit tomorrow at 4pm; please join if you're around. novasky-ai.notion.site/skyrl…

Philipp Moritz

@pcmoritz

3 Nov 2025

We are happy to release SkyRL tx 0.1 novasky-ai.notion.site/skyrl…, an open source unified training and inference engine that supports the Tinker API. This release has many performance enhancements and also new features but most importantly RL training is now working end-to-end. If you are interested in the project and are coming to #RaySummit, we are giving a talk about SkyRL tx tomorrow (Nov 4) at 4pm, come join us!

Philipp Moritz retweeted

Robert Nishihara

@robertnishihara

29 Oct 2025

Cursor just released a frontier coding model with 4x faster generation. They will be speaking at Ray Summit about their journey building a frontier coding model.
- Training on 1000s of GPUs
- Scaling 100,000s of sandboxed coding environments
- Custom training infrastructure with PyTorch and Ray
- Custom MoE kernels, expert parallelism, hybrid sharded data parallelism
