Finally got the first version of ext-infer polished and released. It's still very much a v0.x release, but it's a fully working demonstration of native PHP loading and running pre-trained models.
Use it for LLM chats, embeddings, you name it. Local inference, native PHP.
[Image: live demo of ext-infer in PHP generating content from a pre-trained Qwen3 model and a user prompt.]