This is what AI deployment is starting to look like
Not a notebook
Not a Hugging Face model card
An actual deployable artifact!
Qwen3 → Core AI → Xcode
The model gets quantized, packaged with its tokenizer, converted into an executable graph, specialized for the target Apple device, and then cached for future runs.
The more I learn about Core AI, the more it feels like Apple is treating AI models the same way traditional software is treated:
Build → Compile → Optimize → Deploy
That's a fascinating shift
The .aimodel might end up becoming as important to AI apps as .app bundles are to traditional software 👀
#wwdc26
Spent some time digging into Apple's new Core AI framework, and I think it's much more important than it initially appears
At first glance, it looks like "just another inference framework." It's not!
Core AI is essentially Apple's production stack for running AI models on Apple Silicon, and it's the same framework powering Apple Intelligence
A few things stood out:
- Models can be developed in PyTorch and converted directly into Core AI models
- Inference automatically utilizes CPU, GPU, and Neural Engine without developers manually orchestrating hardware
- Core AI introduces first-class support for model states, which is particularly interesting for transformer workloads
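To make the model-state point concrete, here's a tiny framework-free Python sketch of why state matters for transformer workloads. This is not Core AI's API: `DecoderState`, `toy_projection`, `step`, and `step_stateless` are names I made up for illustration. It only shows the general idea of keeping a key/value cache between calls, so each new token adds one entry instead of reprocessing the whole prefix.

```python
from dataclasses import dataclass, field


@dataclass
class DecoderState:
    """Hypothetical per-session state: keys/values cached for tokens seen so far."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)


def toy_projection(token_id: int) -> tuple:
    """Stand-in for a real key/value projection (purely illustrative)."""
    return (token_id * 2, token_id * 3)


def step(state: DecoderState, token_id: int) -> int:
    """Process one new token, reusing everything already stored in `state`.

    Returns the number of projections computed during this call.
    """
    key, value = toy_projection(token_id)
    state.keys.append(key)
    state.values.append(value)
    return 1  # only the new token is projected


def step_stateless(prefix: list) -> int:
    """Same step without state: every token in the prefix is projected again."""
    for token_id in prefix:
        toy_projection(token_id)
    return len(prefix)


tokens = [11, 42, 7, 99]

# With state: one projection per generated token.
state = DecoderState()
stateful_work = sum(step(state, t) for t in tokens)

# Without state: the whole prefix is reprocessed at every step.
stateless_work = sum(step_stateless(tokens[: i + 1]) for i in range(len(tokens)))
```

With state the work grows linearly with sequence length; without it, it grows quadratically. That gap is why a framework treating state as a first-class concept is a big deal for on-device generation.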
Another interesting concept is model specialization. The model you ship isn't the exact model that ultimately runs on the user's device
Core AI ships a portable representation, then specializes it for the specific hardware and OS version. The specialized artifact is cached, which explains why the first load may take time but subsequent loads are much faster
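Here's a rough Python sketch of the mental model I have for this. It assumes the cache is keyed on the model plus the hardware and OS version, which is my reading of the behavior described above, not a documented detail. `Target`, `SpecializationCache`, and the chip/OS strings are all hypothetical and have nothing to do with Core AI's real API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of the device the model will run on."""
    chip: str
    os_version: str


class SpecializationCache:
    """Toy cache: one specialized artifact per (model, chip, OS) combination."""

    def __init__(self):
        self._entries = {}
        self.specialize_calls = 0  # counts how often the slow path runs

    def _key(self, model_bytes: bytes, target: Target) -> str:
        digest = hashlib.sha256(model_bytes).hexdigest()
        return f"{digest}:{target.chip}:{target.os_version}"

    def _specialize(self, model_bytes: bytes, target: Target) -> str:
        # Stand-in for the expensive hardware-specific compilation step.
        self.specialize_calls += 1
        return f"specialized[{target.chip}/{target.os_version}]"

    def load(self, model_bytes: bytes, target: Target) -> str:
        key = self._key(model_bytes, target)
        if key not in self._entries:  # first load: slow path
            self._entries[key] = self._specialize(model_bytes, target)
        return self._entries[key]     # later loads: cache hit


cache = SpecializationCache()
portable_model = b"portable-model-bytes"

first = cache.load(portable_model, Target("chip-a", "os-1"))   # specializes
second = cache.load(portable_model, Target("chip-a", "os-1"))  # served from cache
after_update = cache.load(portable_model, Target("chip-a", "os-2"))  # new key, specializes again
```

In this toy version, the second load skips specialization entirely, while a different OS version produces a new key and pays the cost again.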
Apple also exposes APIs to:
- Trigger specialization ahead of time
- Inspect specialization caches
- Share caches across app groups
- Manage cache persistence
This feels very similar to how serious inference systems operate: they treat compilation and deployment as first-class concerns, not just model execution
The other thing I found interesting is how Core AI fits into Apple's broader AI stack
MLX helps you train, fine-tune, and experiment with models
Core AI helps you deploy, optimize, specialize, debug, and run them efficiently on Apple devices
That's a much cleaner separation than I initially realized
The biggest takeaway for me is that Apple seems less focused on winning the "best model" race and more focused on building the infrastructure required to run AI locally at scale
And after looking at features like state management, specialization, ahead-of-time compilation, caching, dedicated profiling tools, and Metal-backed custom kernels, it's clear they're investing heavily in that layer