One of the patterns I keep coming back to in production Python projects is using dependency injection in FastAPI properly, not just for database sessions and configuration, but for the AI layer too.
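Most codebases I see treat the LLM client as a global singleton, instantiated at module level and imported wherever it's needed. It works until it doesn't, and when it breaks it breaks in ways that are genuinely hard to debug: the client is constructed as a side effect of import, every caller and every test shares one hidden instance, and nothing in a function's signature tells you it talks to an LLM.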
The better approach is to inject your AI client the same way you'd inject any other external dependency. You define it once, you control its lifecycle explicitly, and every route that needs it receives it through FastAPI's dependency system rather than reaching into global state.
The immediate benefits are obvious: easier testing, cleaner separation of concerns, and the ability to swap implementations without touching business logic.
But the less obvious benefit is that it makes your AI integration visible in a way that global imports never are. You can see exactly which endpoints touch the LLM, which ones share a client instance, and where the boundaries are.
This becomes especially important when you start adding things like retry logic, rate limiting, token budget management, and observability around your LLM calls.
All of that infrastructure belongs at the dependency level, not scattered across individual route handlers. Getting the structure right early means adding those concerns later is a small change rather than a refactor that touches half your codebase.
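As one illustration of infrastructure living at the dependency level, here is a sketch of retry logic added as a wrapper that the provider function returns. RetryingLLM, FlakyLLM, and the choice of ConnectionError as the retryable error are assumptions for the example. A real client would raise its own exception types.

```python
import asyncio
from typing import Protocol


class LLM(Protocol):
    async def complete(self, prompt: str) -> str: ...


class RetryingLLM:
    """Wraps any LLM and retries failed calls with exponential backoff."""

    def __init__(self, inner: LLM, attempts: int = 3, base_delay: float = 0.5) -> None:
        self._inner = inner
        self._attempts = attempts
        self._base_delay = base_delay

    async def complete(self, prompt: str) -> str:
        for attempt in range(self._attempts):
            try:
                return await self._inner.complete(prompt)
            except ConnectionError:
                # Give up after the final attempt; otherwise back off and retry.
                if attempt == self._attempts - 1:
                    raise
                await asyncio.sleep(self._base_delay * 2**attempt)
        raise RuntimeError("attempts must be at least 1")


class FlakyLLM:
    """Hypothetical client that fails twice, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    async def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("transient failure")
        return f"ok: {prompt}"


def get_llm_client() -> LLM:
    # The only place that knows about retries; handlers just see an LLM.
    return RetryingLLM(FlakyLLM(), attempts=3, base_delay=0.0)
```

Rate limiting, token budgets, and tracing can be layered the same way, as further wrappers composed inside get_llm_client, while the route handlers stay unchanged.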
It's one of those things that feels like over-engineering on day one and feels like obvious good sense on day ninety.
#Python #FastAPI #AI #SoftwareEngineering #PydanticAI