we just posted the data to resolve the level 3 deep funding market!
we got 15 open source repos sharing relative value between 685 of their dependencies
using prediction markets, we scaled their evaluation to 3,677 dependencies across 98 open source repos
how does this work? using leverage through what's called conditional prediction markets
traders put money across all repos & their dependencies, but their P&L is based on only the ones that did get evaluated.
so if they just trade on 2 markets, one of which is evaluated and one isn't, their leverage is 2x, since their entire P&L comes from the one that did get evaluated
so by evaluating 685 dependencies out of 3,677, our conditional prediction market for value of dependencies in an open source repo traded at a leverage of about 5.37x (3,677 / 685)
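a minimal sketch of that leverage arithmetic, assuming leverage is just the number of markets traded divided by the number that end up evaluated. the function name `conditional_leverage` is made up for illustration and is not taken from the market's own code:

```python
def conditional_leverage(markets_traded: int, markets_evaluated: int) -> float:
    """Assumed definition: markets traded per market that actually settles.

    Positions in unevaluated markets do not count toward P&L, so the whole
    result rides on the evaluated subset.
    """
    if markets_evaluated <= 0:
        raise ValueError("at least one market must be evaluated")
    return markets_traded / markets_evaluated


# the two-market example: one evaluated, one not
two_market = conditional_leverage(2, 1)

# the level 3 numbers: 685 dependencies evaluated out of 3,677
level_3 = conditional_leverage(3677, 685)

print(two_market)          # 2.0
print(round(level_3, 2))   # 5.37
```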
the link with all the data is posted in the tweet below, really hope we start to see more conditional prediction markets & not just the bread and butter binary ones that are the current staple of the space
ongoing markets for originality (how much value stays with a repo vs its dependencies) and relative value between the 98 repos are still available to trade
Deep Funding Level III has closed.
$5,000 in prizes, and a leaderboard determined by how closely each model's predicted weights match the scores given by human maintainers.
One of the write-ups, from a participant (Rohith10) who built a model called FWDIS (Frequency-Weighted Dependency Importance Scoring), is worth pulling out as an example of what serious participation in this round actually looks like.
The interesting part is the hypothesis the participant decided to test.
The baseline approach to scoring dependencies treats each one as if its importance lives entirely inside the relationship between that dependency and its parent repo, which means you're scoring web3py's contribution to ethers, web3py's contribution to eth-brownie, and web3py's contribution to every other repo that uses it as if each were a separate question.
Rohith10 noticed that this misses something obvious: a dependency that 20 repos rely on is probably more foundational than one that a single repo uses, and that observation can be turned into a feature the model uses to adjust its weights.
The implementation is three lines of code.
Count how many of the 98 repos use each dependency, normalize by 98, multiply the baseline weight by one plus a tuning coefficient times that frequency score, and renormalize so each repo's dependency weights still sum to 1.
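The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rohith10's actual code: the function name `fwdis_reweight`, the nested-dict input format, and the toy data are all hypothetical, and a repo is assumed to "use" a dependency whenever that dependency appears in its weight table.

```python
from collections import Counter


def fwdis_reweight(baseline, alpha=0.42):
    """Boost each baseline weight by how widely the dependency is used.

    baseline: {repo: {dependency: weight}} (hypothetical input format).
    alpha: tuning coefficient; 0.42 is the value reported in the write-up.
    """
    # The write-up normalizes by 98, the number of repos in the round;
    # this sketch uses the number of repos in the input instead.
    n_repos = len(baseline)
    usage = Counter(dep for deps in baseline.values() for dep in deps)

    adjusted = {}
    for repo, deps in baseline.items():
        # weight * (1 + alpha * frequency score)
        boosted = {d: w * (1 + alpha * usage[d] / n_repos) for d, w in deps.items()}
        total = sum(boosted.values())
        # Renormalize so each repo's weights sum to 1 (skip all-zero repos).
        adjusted[repo] = {d: w / total for d, w in boosted.items()} if total else boosted
    return adjusted


# Toy data, invented for illustration only.
toy_baseline = {
    "repo_a": {"shared_dep": 0.5, "solo_dep_a": 0.5},
    "repo_b": {"shared_dep": 0.5, "solo_dep_b": 0.5},
}
adjusted = fwdis_reweight(toy_baseline)
print(adjusted["repo_a"])
```

In the toy data, `shared_dep` is used by both repos, so it ends up with a larger share of each repo's weight than the single-use dependencies, while each repo's weights still sum to 1.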
The participant found through grid search that a coefficient of 0.42 produced the best alignment with jury scores, and the model landed in the top tier of the leaderboard with a score of 0.2402 against the baseline's 0.2472, where a lower score means closer agreement with the jury.
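A grid search over the coefficient might look like the sketch below. The contest's actual scoring metric is not described here, so squared error against jury weights stands in as an assumption, and every name and number in the block is invented for illustration.

```python
def reweight(weights, freq, alpha):
    """Apply the frequency boost to one repo's weights and renormalize."""
    boosted = {d: w * (1 + alpha * freq[d]) for d, w in weights.items()}
    total = sum(boosted.values())
    return {d: w / total for d, w in boosted.items()}


def squared_error(predicted, jury):
    """Assumed stand-in metric: sum of squared gaps to the jury's weights."""
    return sum((predicted[d] - jury[d]) ** 2 for d in jury)


def grid_search_alpha(weights, freq, jury, alphas):
    """Return the candidate alpha with the lowest error against the jury."""
    return min(alphas, key=lambda a: squared_error(reweight(weights, freq, a), jury))


# Toy inputs, made up for this example.
weights = {"shared_dep": 0.5, "solo_dep": 0.5}   # baseline weights for one repo
freq = {"shared_dep": 0.8, "solo_dep": 0.1}      # share of repos using each dependency
jury = {"shared_dep": 0.6, "solo_dep": 0.4}      # hypothetical jury weights

alphas = [i / 100 for i in range(101)]           # candidates 0.00, 0.01, ..., 1.00
best_alpha = grid_search_alpha(weights, freq, jury, alphas)
print(best_alpha)
```

Because 0.0 is in the candidate grid, the selected coefficient can never score worse than the unadjusted baseline on the data it was tuned against.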
This is worth sharing because the participant's process is exactly what the mechanism is built to reward.
They formed a hypothesis about what human jurors actually value when they score dependencies (foundational utility across the ecosystem, not just contribution to one parent), turned that hypothesis into a feature, tested it against the data, and shipped a model that aligned with expert judgment better than the baseline did.
This is what distilled human judgment looks like when it works. Markets reward whoever predicts the jury most accurately, and the path to predicting the jury accurately runs through understanding what experts actually care about.
The contest is open for Level 1 and Level 2 until May 10.