If you don't want to work against the constraints of the Anthropic APIs you must keep your system prompt and tools fixed across sessions. Push customer specific instructions down and design around not changing tools.
Useful tip to cut time-to-first-token on longer prompts in the API: pre-warm the prompt cache.
Send your system prompt before the user prompt. Claude writes it to the cache, but skips generating any output.
When the real user request lands, it'll hit a warm cache.