Apple finally did it.
Its new framework, Core AI, runs models entirely on Apple silicon, so inference happens on the user's device with zero server calls and zero token bills.
That means Qwen, Mistral, and SAM3 can run natively on iPhone, iPad, Mac, and Vision Pro.
It's a memory-safe Swift API that compiles models ahead of time for near-instant load. Pulling one in takes a few lines:
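// Load the segmenter from the SAM3 model resources at sam3ModelURL (call from an async, throwing context).
let segmenter = try await ImageSegmenter(resourcesAt: sam3ModelURL)
// Segment the input image, using the text prompt to pick out what to mask.
let response = try await segmenter.segment(image: inputImage, prompt: "flower")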
The launch goes beyond the runtime, though.
It ships curated open models packaged for Swift, PyTorch extensions to convert your own, and an optimizer that shrinks models layer by layer with minimal accuracy loss.
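To give a feel for what "shrinking a model layer by layer" can mean, here is a minimal sketch of one common approach: symmetric 8-bit quantization with a separate scale per layer. This is not Core AI's optimizer or its API, and the post does not say which technique Apple uses. The `QuantizedLayer` type, the function names, and the toy weights are all hypothetical, and the weights are assumed to be finite.

```swift
import Foundation

/// One layer's weights after symmetric 8-bit quantization (hypothetical type).
struct QuantizedLayer {
    let name: String
    let scale: Float   // multiply an Int8 value by this to approximate the original Float
    let values: [Int8]
}

/// Quantizes a single layer with its own scale, so a layer with small weights
/// is not forced onto a grid sized for a layer with large ones.
/// Assumes every weight is finite (no NaN or infinity).
func quantize(name: String, weights: [Float]) -> QuantizedLayer {
    let maxMagnitude = weights.map { abs($0) }.max() ?? 0
    // An empty or all-zero layer gets scale 1 to avoid dividing by zero.
    let scale: Float = maxMagnitude > 0 ? maxMagnitude / 127 : 1
    let values = weights.map { Int8(($0 / scale).rounded()) }
    return QuantizedLayer(name: name, scale: scale, values: values)
}

/// Maps the stored Int8 values back to approximate Float weights.
func dequantize(_ layer: QuantizedLayer) -> [Float] {
    layer.values.map { Float($0) * layer.scale }
}

/// Largest absolute difference between the original and restored weights.
func maxAbsError(_ original: [Float], _ restored: [Float]) -> Float {
    zip(original, restored).map { abs($0 - $1) }.max() ?? 0
}

// Toy layers with very different weight ranges (made-up numbers).
let layers: [(name: String, weights: [Float])] = [
    ("attention", [0.02, -0.01, 0.005, 0.0]),
    ("mlp", [1.5, -3.0, 0.75, 2.25]),
]

for layer in layers {
    let quantized = quantize(name: layer.name, weights: layer.weights)
    let error = maxAbsError(layer.weights, dequantize(quantized))
    print("\(quantized.name): scale=\(quantized.scale) maxError=\(error)")
}
```

Each weight drops from a 4-byte Float to a 1-byte Int8, and the rounding error per weight stays within about half of that layer's scale. Picking the scale per layer is what keeps the loss small when layers have very different weight ranges.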
There's also a macOS debugger that profiles performance and traces behavior back to your original Python, plus Xcode tools to validate models before they ship.
For any team that wanted real on-device AI without a cloud bill attached to every user, this is the answer.
Models repo:
github.com/apple/coreai-mode…