This paper turns Linux kernel tuning into a reinforcement learning task, training an agent to choose better configurations.
It reports up to a 5.6% system speedup over expert heuristics.
Kernel tuning means flipping thousands of options in the Linux build that tweak scheduling, memory, files, and networking.
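For a concrete picture of those options, here is a minimal sketch of reading a kernel `.config` file, where each option is either set (`CONFIG_X=value`) or commented out as `# CONFIG_X is not set`. The helper name `parse_kernel_config` and the sample text are illustrative assumptions, not something taken from the paper.

```python
def parse_kernel_config(text):
    """Map option names to values from .config-style text (hypothetical helper)."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Disabled options appear as comments; record them as "n".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Illustrative sample: a bool, a disabled option, an int, and a tristate module.
sample = """\
CONFIG_PREEMPT=y
# CONFIG_NUMA is not set
CONFIG_HZ=250
CONFIG_TCP_CONG_BBR=m
"""
config = parse_kernel_config(sample)
```

Values come back as strings (`y`, `m`, `n`, or a number), which is why an edit has to respect the option's type to stay valid.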
The authors shrink the option space into grouped decision units and assemble a dataset of 3K cases, each of which compiles and boots.
The agent learns in 2 phases: first to follow a strict answer format and use a knowledge tool, then to chase real speed.
Rewards cover 3 things: format discipline, valid configuration edits by type, and measured performance gain judged by an evaluator model.
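One way to picture how 3 reward signals could combine into a single score is a weighted sum. This is only a sketch: the function name `total_reward`, the weights, and the input scales are assumptions for illustration, and the paper's actual reward formulas may differ.

```python
def total_reward(format_ok, valid_fraction, perf_gain,
                 w_format=0.2, w_valid=0.3, w_perf=0.5):
    """Combine 3 reward signals into one scalar (weights are made up).

    format_ok      -- True if the answer follows the required format
    valid_fraction -- share of proposed edits that are valid for their type, 0..1
    perf_gain      -- relative performance change, e.g. 0.05 for +5%
    """
    return (w_format * float(format_ok)
            + w_valid * valid_fraction
            + w_perf * perf_gain)


# A well-formed answer with all edits valid and a 5% measured gain.
good = total_reward(True, 1.0, 0.05)
# A malformed answer with half of its edits invalid and no gain.
bad = total_reward(False, 0.5, 0.0)
```

The point of the split is that the agent still gets a learning signal for format and validity before it produces any real speedup.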
Training uses group relative policy optimization, which samples several candidate changes per step and moves policy weights toward higher scoring ones.
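The "group relative" part can be sketched as follows: each candidate's reward is compared with the mean of its own sampled group and scaled by the group's standard deviation, so no separate value network is needed. The function name, the `eps` term, and the sample rewards are illustrative assumptions; the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (sketch, not the paper's code)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every candidate scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 candidate configuration changes at one step.
rewards = [0.1, 0.5, 0.9, 0.5]
advantages = group_relative_advantages(rewards)
```

Candidates above the group mean get positive advantages and are made more likely; those below the mean are pushed down.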
On UnixBench, the tuned configs lift overall system score and often beat strong baselines, including larger models without this procedure.
Across Nginx, Redis, and PostgreSQL, the same trained policy brings consistent throughput and latency wins.
Bottom line: OS-R1 turns plain goals like "reduce latency" into concrete, safe kernel switches that generalize across workloads.
----
Paper – arxiv.org/abs/2508.12551
Paper Title: "OS-R1: Agentic Operating System Kernel Tuning with Reinforcement Learning"