I created a llama.cpp provider for the Vercel AI SDK that runs directly in the Node.js process. No separate server is required, since it uses llama.cpp bindings.
The provider supports reasoning, tool calling, image inputs, and prompt caching (for a single conversation).
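
For anyone unsure what single-conversation prompt caching means in practice, here is a rough TypeScript sketch of the general idea (prefix reuse). It is an illustration only, not the provider's actual code: `SingleConversationCache`, `plan`, and the token arrays are made-up names and data, and the real bindings may handle this differently.

```typescript
// Illustrative only: a hypothetical single-conversation prefix cache,
// not the provider's real implementation.

/** Length of the shared leading run of two token sequences. */
function commonPrefixLength(a: readonly number[], b: readonly number[]): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

class SingleConversationCache {
  // Tokens of the most recent prompt; only one conversation is remembered.
  private cached: number[] = [];

  /**
   * Reports how many leading tokens match the previous prompt and which
   * tokens would still need to be evaluated, then remembers the new prompt.
   */
  plan(promptTokens: readonly number[]): { reused: number; toEvaluate: number[] } {
    const reused = commonPrefixLength(this.cached, promptTokens);
    const toEvaluate = promptTokens.slice(reused);
    this.cached = [...promptTokens];
    return { reused, toEvaluate };
  }
}

const cache = new SingleConversationCache();
const turn1 = cache.plan([1, 2, 3, 4]);       // nothing cached yet
const turn2 = cache.plan([1, 2, 3, 4, 5, 6]); // same conversation, extended
const other = cache.plan([9, 9, 1, 2]);       // a different conversation

console.log(turn1.reused, turn2.reused, other.reused);
```

In this sketch the second turn only has to evaluate the two new tokens, while switching to an unrelated conversation loses the cached prefix. That is the usual trade-off when a cache covers a single conversation.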