okie, so I added a comparison between the baseline and the multi-agent system on the same queries
same medgemma model, completely different behaviour once it's wrapped in the system
raw -> 15/100 safety
SafeToSay -> 97/100 safety, 25 fails eliminated, no emergency escalated
architecture makes the difference
bie 💗
most medical chatbots are optimised to answer EVERYTHING!
in healthcare, that's risky
so... i've been experimenting with SafeToSay - a boundary-constrained multi-agent structure with deterministic gates.
i feel it's possible for safety to sit above a model, not inside it 💗
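
for anyone wondering what "deterministic gates above the model" could look like, here's a rough sketch. it's not the actual SafeToSay code: the pattern lists, `GateResult`, `gated_reply` and the canned messages are all made up for illustration, and the model is just any callable that takes a string and returns a string. the idea is that plain rule checks run before and after the model, so the same input always hits the same gate no matter what the model would have said.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical rule lists, for illustration only (not SafeToSay's real rules).
EMERGENCY_PATTERNS = [r"\bchest pain\b", r"\bcan'?t breathe\b", r"\boverdose\b"]
OUT_OF_SCOPE_PATTERNS = [r"\bhow (much|many) .* should i take\b", r"\bdiagnose\b"]
BLOCKED_OUTPUT_PATTERNS = [r"\b\d+\s?mg\b"]  # e.g. a concrete dose in the draft

ESCALATE_MSG = "This may be an emergency. Please contact local emergency services now."
REFUSE_MSG = "I can't help with that safely. Please ask a clinician or pharmacist."


@dataclass
class GateResult:
    action: str  # "escalate", "refuse", or "answer"
    text: str


def matches_any(patterns: List[str], text: str) -> bool:
    """Return True if any regex in `patterns` matches `text` (case-insensitive)."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def gated_reply(query: str, model: Callable[[str], str]) -> GateResult:
    # Gate 1: emergencies are escalated and never reach the model.
    if matches_any(EMERGENCY_PATTERNS, query):
        return GateResult("escalate", ESCALATE_MSG)

    # Gate 2: out-of-boundary requests are refused before the model is called.
    if matches_any(OUT_OF_SCOPE_PATTERNS, query):
        return GateResult("refuse", REFUSE_MSG)

    # Only now does the model get to draft an answer.
    draft = model(query)

    # Gate 3: the draft is checked too, so an unsafe answer can't slip through.
    if matches_any(BLOCKED_OUTPUT_PATTERNS, draft):
        return GateResult("refuse", REFUSE_MSG)

    return GateResult("answer", draft)
```

you'd call it like `gated_reply("tips for better sleep?", my_model)` where `my_model` is whatever wraps the underlying model. the gates are just regex checks here to keep it short; the point is only that the decision to escalate or refuse doesn't depend on the model's output.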