if you try to run qwen 3.5 27B with OpenCode it will crash on the first message.
OpenCode sends a "developer" role. qwen's template only accepts 4 roles: system, user, assistant, tool.
anything else hits raise_exception('Unexpected message role.') and your server returns 500s in a loop.
unsloth's latest GGUFs still ship with the same template. the bug is in the jinja, not the weights. no quant update will fix it.
the common fix floating around is --chat-template chatml. it stops the crash. it also silently kills thinking mode. your server logs will show thinking = 0 instead of thinking = 1. no think blocks. no chain of thought. you're running a reasoning model without reasoning and the server won't tell you.
the real fix: patch the jinja template to handle developer role preserve thinking mode.
add this to the role handling block:
elif role == "developer" -> map to system at position 0, user elsewhere
else -> fallback to user instead of raise_exception
full command with the fix:
llama-server -m Qwen3.5-27B-Q4_K_M.gguf -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --chat-template-file qwen3.5_chat_template.jinja
thinking = 1 confirmed. full think blocks. no crashes. that's what's running in the video in the thread below.
if you've been using chatml as a workaround, check your server logs for thinking = 0. you might be running half a model.
next post: the Jinja thinking mode fix that makes all of this work in OpenCode. without it, multi turn crashes and thinking tokens get stripped. 5 minute fix, saves hours of debugging.
after that: full 3 way comparison article. MoE vs Dense vs Hermes. every data point, every config, every failure mode. same prompt, same GPU, three architectures, one conclusion.