I asked Claude to help me verify the claim:
------
I (Claude) independently verified the claim that Rio-3.5-Open-397B is a weight merge of Nex and Qwen. It checks out.
A developer opened an issue claiming that prefeitura-rio/Rio-3.5-Open-397B is just a ~0.6/0.4 linear blend of the Nex-N2-Pro model and the official Qwen3.5-397B-A17B base, with no original training.
The method
If Rio = α·Nex + (1-α)·Qwen, then Rio - Qwen = α·(Nex - Qwen) for every weight tensor: Rio's deviation from Qwen must point in exactly the same direction as Nex's deviation from Qwen. Two numbers tell the story:
- cos_fit: cosine similarity between (Rio - Qwen) and (Nex - Qwen). If those deltas were unrelated, as for independently trained models, the cosine in a 2-million-dimensional space would be ~0 ± 0.0007 (the spread is about 1/√n). For a merge, it is ~1.
- α: how far Rio sits along the line from Qwen toward Nex (0 = pure Qwen, 1 = pure Nex).
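Both statistics fit in a few lines of NumPy. This is a minimal sketch, assuming α is estimated as the least-squares projection of (Rio - Qwen) onto (Nex - Qwen); the post does not say which estimator the linked gist uses. The function name merge_fit and the synthetic tensors are hypothetical, chosen only to exercise the math on a tensor of about 2M elements.

```python
import numpy as np


def merge_fit(rio: np.ndarray, nex: np.ndarray, qwen: np.ndarray) -> tuple[float, float]:
    """Return (alpha, cos_fit) for the hypothesis rio = alpha*nex + (1-alpha)*qwen.

    alpha:   least-squares coefficient of (rio - qwen) projected onto (nex - qwen)
    cos_fit: cosine similarity between (rio - qwen) and (nex - qwen)
    """
    # Subtract in float64 so small deltas are not lost to rounding error.
    r = (rio.astype(np.float64) - qwen.astype(np.float64)).ravel()
    d = (nex.astype(np.float64) - qwen.astype(np.float64)).ravel()
    dot = float(np.dot(r, d))
    alpha = dot / float(np.dot(d, d))
    cos_fit = dot / (float(np.linalg.norm(r)) * float(np.linalg.norm(d)))
    return alpha, cos_fit


# Synthetic check (hypothetical data): an exact 0.571 merge plus a little noise.
rng = np.random.default_rng(0)
qwen = rng.standard_normal((512, 4096)).astype(np.float32)  # 2,097,152 elements
nex = qwen + 0.01 * rng.standard_normal(qwen.shape).astype(np.float32)
noise = 1e-4 * rng.standard_normal(qwen.shape).astype(np.float32)
rio = 0.571 * nex + 0.429 * qwen + noise
alpha, cos_fit = merge_fit(rio, nex, qwen)
print(f"alpha = {alpha:.3f}, cos_fit = {cos_fit:.3f}")
```

When cos_fit is close to 1, the projection estimate and the simpler norm ratio |Rio - Qwen| / |Nex - Qwen| agree, so the choice of estimator matters little for this test.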
The trick: no 800GB download needed
Safetensors files start with an 8-byte header length followed by a JSON header listing each tensor's dtype, shape, and byte offsets. I used HTTP range requests to fetch only the specific tensor bytes from HuggingFace: a few MB per tensor instead of hundreds of GB per model. The entire verification runs on a laptop.
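A minimal sketch of the range-request trick, using only the standard library and NumPy. It assumes the safetensors layout (8-byte little-endian header length, then the JSON header, then raw tensor bytes whose data_offsets are relative to the end of the header) and the Hugging Face resolve/main URL pattern. The helper names (http_range, read_header, read_tensor) are hypothetical, and only BF16, F16, and F32 are decoded. The last few lines check the parser offline against an in-memory file instead of the network.

```python
import json
import struct
import urllib.request

import numpy as np

# Assumption: Hugging Face serves raw repo files at this URL and honors Range headers.
HF_URL = "https://huggingface.co/{repo}/resolve/main/{filename}"


def http_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes [start, end] (inclusive) with an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def read_header(fetch) -> tuple[dict, int]:
    """Return (header dict, absolute offset where tensor data begins).

    fetch(start, end) must return bytes [start, end] inclusive.
    """
    (n,) = struct.unpack("<Q", fetch(0, 7))  # little-endian u64 header length
    header = json.loads(fetch(8, 8 + n - 1))
    return header, 8 + n


def read_tensor(fetch, header: dict, data_start: int, name: str) -> np.ndarray:
    """Download a single tensor's bytes and decode them to float32."""
    info = header[name]
    begin, end = info["data_offsets"]  # relative to data_start, end exclusive
    raw = fetch(data_start + begin, data_start + end - 1)
    if info["dtype"] == "BF16":
        # NumPy has no bfloat16: put each 16-bit pattern in the high half of a float32.
        arr = (np.frombuffer(raw, dtype="<u2").astype(np.uint32) << 16).view(np.float32)
    elif info["dtype"] == "F16":
        arr = np.frombuffer(raw, dtype="<f2").astype(np.float32)
    elif info["dtype"] == "F32":
        arr = np.frombuffer(raw, dtype="<f4").copy()
    else:
        raise ValueError(f"unsupported dtype {info['dtype']}")
    return arr.reshape(info["shape"])


# Offline check: build a tiny safetensors file in memory with one BF16 tensor.
vals = np.array([[1.0, -2.0, 0.5], [3.0, 0.25, -1.5]], dtype=np.float32)
payload = (vals.view(np.uint32) >> 16).astype("<u2").tobytes()
hdr = json.dumps({"w": {"dtype": "BF16", "shape": [2, 3], "data_offsets": [0, len(payload)]}}).encode()
blob = struct.pack("<Q", len(hdr)) + hdr + payload
local = lambda s, e: blob[s : e + 1]
header, data_start = read_header(local)
w = read_tensor(local, header, data_start, "w")
```

To use it against a real checkpoint, pass `lambda s, e: http_range(url, s, e)` as `fetch`, with `url` built from HF_URL for a specific shard. For sharded repos the tensor-to-shard mapping usually lives in `model.safetensors.index.json`; the exact tensor names for these models are not given in this post and would have to be read from that index.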
What I found
I pulled MoE router weights (2M params each) from layers 0, 15, 30, 45, 59, plus shared expert gates and layernorms:
MoE router weights:
Layer 0: α = 0.573, cos_fit = 0.992
Layer 15: α = 0.647, cos_fit = 0.962
Layer 30: α = 0.627, cos_fit = 0.967
Layer 45: α = 0.582, cos_fit = 0.987
Layer 59: α = 0.567, cos_fit = 0.997
Shared expert gates:
Layer 0: α = 0.568, cos_fit = 0.997
Layer 30: α = 0.581, cos_fit = 0.988
What this means
A cos_fit of 0.99 in a 2-million-dimensional space is not "high similarity." With a null spread of about 0.0007, it is roughly 1,400 standard deviations from what you'd see with independently trained models. There is no innocent explanation.
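The size of that gap can be sanity-checked by simulation. This sketch models the two deltas as independent Gaussian random vectors, an idealization of "unrelated" updates, and takes n = 2^21 (about 2M) as a stand-in for the router size. Under that model the cosine has mean 0 and standard deviation close to 1/√n, so each observed cos_fit divided by 1/√n gives its distance from the null in standard deviations.

```python
import numpy as np

n = 2_097_152  # assumption: 2^21 elements, standing in for the "2M params" routers
rng = np.random.default_rng(42)

# Null model (idealized): the two deltas are independent random directions.
trials = 20
cosines = []
for _ in range(trials):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    cosines.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
cosines = np.array(cosines)

sigma = 1 / np.sqrt(n)  # theoretical std of the cosine under the null
print(f"theory sigma = {sigma:.6f}, empirical std = {cosines.std():.6f}")

# Distance of the reported fits from the null mean (0), in null standard deviations.
for observed in (0.962, 0.992, 0.997):
    print(f"cos_fit {observed}: z = {observed / sigma:.0f}")
```

With only 20 trials the empirical spread is a rough estimate, but every simulated cosine lands within a few thousandths of zero, far below the reported 0.96 to 0.997.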
The recovered α is consistent across layers: five of the seven values fall between 0.567 and 0.582, close to nex-agi's stated 0.571, while router layers 15 and 30 sit somewhat higher (0.647 and 0.627). This is one model poured into another at a roughly fixed ratio.
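One way to check how tight the α clustering is: summarize the seven α values from the tables above against nex-agi's stated 0.571. This is plain arithmetic on the reported numbers, nothing newly measured; the variable names are illustrative.

```python
import statistics

# alpha values copied from the router and shared-expert-gate tables
router = {0: 0.573, 15: 0.647, 30: 0.627, 45: 0.582, 59: 0.567}
shared_gate = {0: 0.568, 30: 0.581}
claimed = 0.571  # ratio stated by nex-agi

alphas = list(router.values()) + list(shared_gate.values())
print(f"min {min(alphas):.3f}, max {max(alphas):.3f}")
print(f"median {statistics.median(alphas):.3f}, mean {statistics.fmean(alphas):.3f}")
for layer, a in router.items():
    print(f"router layer {layer}: alpha - claimed = {a - claimed:+.3f}")
```

The median (0.581) is a steadier summary than the mean (about 0.592) here, because the two higher router layers pull the mean up.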
(Layernorm weights show a higher α, around 0.9. This is expected: merge tools often handle 1D norm vectors differently from weight matrices, or the interpolation is less clean on small vectors.)
Bottom line
With about 10 HTTP range requests per model and 50 lines of NumPy, anyone can verify this independently. The math is unambiguous: Rio-3.5-Open-397B is approximately 57% Nex-N2-Pro and 43% Qwen3.5-397B-A17B.
Code that you can run for yourself:
gist.github.com/xianbaoqian/…