ByteDance's new open-source AI controls your entire computer. 100% local.
Most desktop agents today rely on cloud APIs.
Your screen, clicks, and files all leave the machine.
ByteDance just open-sourced TARS, a multimodal agent stack that runs fully on your computer.
It hit 36k stars on GitHub.
The system uses a vision-language model to read the screen directly from pixels. You give it plain-English instructions, and it drives your mouse and keyboard to carry them out.
No screen reader. No accessibility hooks. Just raw visual grounding across any app on Windows, macOS, or a browser.
Screenshots, recognition, and execution all stay local.
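Here is roughly what that see-then-act loop looks like. This is a minimal sketch, not TARS's actual code: `run_agent`, `Action`, and the `fake_*` helpers are all made-up names, and the model is a scripted stub so the example runs without any model or real input device.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to perform (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    instruction: str,
    capture: Callable[[], bytes],
    model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot -> model -> action, repeated until the model says 'done'."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                            # grab pixels locally
        action = model(instruction, screenshot, history)  # model picks the next step
        if action.kind == "done":
            break
        execute(action)                                   # drive mouse or keyboard
        history.append(action)
    return history


# Stand-ins so the sketch runs anywhere (assumptions, not real components).
def fake_capture() -> bytes:
    return b"fake-pixels"


def fake_model(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [
        Action("click", x=120, y=48),
        Action("type", text=instruction),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]


log: List[str] = []


def fake_execute(action: Action) -> None:
    log.append(action.kind)


history = run_agent("hello", fake_capture, fake_model, fake_execute)
print([a.kind for a in history])  # ['click', 'type']
```

In a real setup, `capture`, `model`, and `execute` would be a screenshot call, a locally served vision-language model, and an input driver. Nothing in the loop itself needs the network.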
Two pieces ship together:
1. CLI plus Web UI frontend
2. Native desktop GUI operator
The kernel is built on MCP (Model Context Protocol), so it plugs into real-world tools without custom glue. Hybrid browser control switches between visual targeting and DOM parsing mid-task.
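One way to picture the hybrid switch: try the DOM first, and fall back to visual grounding when the element is not there. The DOM-first order is an assumption for illustration, and `locate` and both lookup helpers are hypothetical names, not TARS APIs.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[int, int]


def locate(
    target: str,
    dom_lookup: Callable[[str], Optional[Point]],
    visual_lookup: Callable[[str], Optional[Point]],
) -> Tuple[str, Point]:
    """Return (strategy, click point), preferring the DOM when it has the target."""
    point = dom_lookup(target)
    if point is not None:
        return ("dom", point)
    point = visual_lookup(target)   # e.g. canvas widgets with no DOM node
    if point is None:
        raise LookupError(f"could not locate {target!r} by DOM or by vision")
    return ("visual", point)


# Toy data standing in for a parsed page and a model's visual grounding.
dom_index: Dict[str, Point] = {"Search button": (640, 80)}
visual_index: Dict[str, Point] = {"Chart legend": (900, 420)}

print(locate("Search button", dom_index.get, visual_index.get))
print(locate("Chart legend", dom_index.get, visual_index.get))
```

The first call resolves through the DOM, the second through the visual fallback, so one task can mix both.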
One prompt can book flights, edit spreadsheets, or scrape dashboards, with the agent itself running on your machine.