To say that “Le chat is lying” would imply that it's a guard system like LLMGuard placed in front that would generate the response, detecting the content as sensitive. To say that the LLM is lying would mean that the tokens predicted at the output of the model are giving erroneous information, whether voluntarily (under the influence of the System Prompt) or not (in which case it's a hallucination).
I was able to extract a modified version of the System Prompt (the original prompt seems protected, but you can easily get around this by asking it in a similar direction)
These are just a few extracts, but you can reproduce the manipulation if you like. We can see here that the location information is indeed injected into System Prompt.
What's more, you can see that it's not explicitly asked to protect the location information, but the entire system prompt on numerous occasions, leading LLM to hide this part of the prompt.
However, there seems to be censorship in the chat as well. When I ask the model who gave him this information, he ends up saying he doesn't know it anymore.
However, if I ask it not to tell me explicitly who the user is, but instead to answer with a number, this time I get the right information!