I don't trust Qwen models, even the large ones. I trust small ones even less.
Qwen has been benchmaxing its models since the very beginning, so a 4B-parameter model beating Gemini 2.5 Flash Lite on benchmarks most likely means just that: benchmark performance and nothing more.
4B parameters is just not enough to hold any general intelligence, because most of them will still be used to store facts. Well, not facts, but whatever remains of them.