This contradicts mainstream portfolio theory, which is rooted in mean-variance optimisation and Sharpe-ratio-type metrics.
Instead you optimise for your actual intended end state, optimal compounded return growth (aka optimal log growth), and use information theory maths to assess how well a specific strategy “compresses information surprise” vs a baseline you’re comparing against.
Compression in information surprise is effectively a measure of prediction improvement aka increased edge in your strategy (if any).
Sharpe and mean-variance don’t handle the practical realities of trading/investing well:
- distribution skews
- fat-tails
- path-dependency as a property of your strategy, not “overfitting”!
- regime-sensitivity of your strategy
So with mean-variance maths you end up adding crude “patches” and dirty heuristics to deal with these realities.
If you apply information theory you have a fundamental, first-principles approach to modelling predictive power.
Credits: Kelly (1956); Claude Shannon; Kullback and Leibler (KL divergence); Thomas M. Cover; Stiffelman (2026, arXiv)
See screenshot. Trading portfolio optimisation is about COMPOUNDED return growth not arithmetic return growth, for obvious reasons. If you lose all your chips, you can’t bet.
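Log(0) = negative infinity, so any sizing that can take your account to zero has an expected log growth of negative infinity, however good its arithmetic average looks.
That’s why Kelly sizing uses log return growth, not arithmetic.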
The typical expectancy formula EV = p(win)*win - (1 - p(win))*loss becomes a naive abstraction. You only realise its naivety when you confront risk-of-ruin modelling.
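To make the gap between arithmetic expectancy and log growth concrete, here is a minimal sketch for a single binary bet. The win rate, payoffs and the function names are made up for illustration; the closed-form Kelly fraction used is the standard one for a two-outcome bet.

```python
import math

def arithmetic_ev(p_win, win, loss, f):
    """Expected arithmetic return per bet when staking fraction f of the account."""
    return f * (p_win * win - (1 - p_win) * loss)

def log_growth(p_win, win, loss, f):
    """Expected log growth per bet when staking fraction f of the account."""
    lose_factor = 1 - f * loss
    if lose_factor <= 0:
        return -math.inf  # a single loss wipes out the account
    return p_win * math.log(1 + f * win) + (1 - p_win) * math.log(lose_factor)

# Hypothetical setup: 60% win rate, win 1 unit or lose 1 unit per unit staked.
p_win, win, loss = 0.6, 1.0, 1.0

# Closed-form Kelly fraction for a binary bet (win and loss as positive sizes).
kelly_f = p_win / loss - (1 - p_win) / win

for f in (0.1, kelly_f, 0.5, 1.0):
    print(f"f={f:.2f}  EV={arithmetic_ev(p_win, win, loss, f):+.3f}  "
          f"log growth={log_growth(p_win, win, loss, f):+.4f}")
```

With these numbers the arithmetic EV keeps rising with size, but log growth peaks at the Kelly fraction, is already negative at f=0.5, and is negative infinity at f=1.0.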
Optimal trading portfolio allocation, ie allocation of position weights across a number of assets is given by
G(W) = term 1 - term 2 - term 3
Term (1)
the payoffs of the optimal set of return paths.
Think of it as
Optimal market “opportunity”/forward return distribution (it is never known or knowable in principle, but if only you could know it…)
Term (2)
2nd term is the natural irreducible uncertainty of the optimal market opportunity distribution. (Hint: information entropy)
Term (3)
is how much your actual allocation differs from the optimal. Ie let’s say the true optimal allocation is (20% to stock A, 80% to stock B)
and you are allocated 50/50. You can measure the “information loss” or edge loss of your allocation vs the optimum. (Hint: KL divergence).
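As a rough sketch of that measurement, the 20/80 vs 50/50 example can be plugged straight into a KL divergence. Treating the two allocations as probability distributions is an assumption here: it is exact in Kelly's horse-race setting (where the optimal allocation equals the true outcome probabilities) and only an analogy for general assets. The function name is hypothetical.

```python
import math

def kl_divergence_bits(p, q):
    """KL divergence D(p || q) in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

optimal = [0.2, 0.8]  # the true optimal allocation from the example
actual = [0.5, 0.5]   # the allocation actually held

loss_bits = kl_divergence_bits(optimal, actual)
print(f"growth lost vs the optimum: {loss_bits:.4f} bits per bet")
```

Under that assumption the 50/50 allocation gives up about 0.28 bits of log growth per bet relative to the optimum.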
Now, W* in H(W*) is the collection (vector) of position allocations that would maximise your expected COMPOUNDED account growth if you knew the true joint return distribution of the market.
That is the theoretical optimum. In reality you can never know W*.
That’s why the achievable PRACTICAL optimum is: how close can you get to W* without knowing the true distribution?
Crucially, in the equation above (which btw isn’t just made up but is derivable from the Kelly / Cover universal portfolio theory), your allocation / trading decisions only ever affect term 3.
So the optimisation problem reduces to minimising the KL divergence between your allocation and the true optimal allocation.
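One way to sanity-check the claim that only term 3 moves is Kelly's horse-race setting, where the three-term split is an exact identity: growth = expected log payoff, minus entropy, minus KL divergence. The sketch below assumes that setting; the probabilities, odds and function names are hypothetical.

```python
import math

def growth_terms(p, odds, b):
    """Split the horse-race log-growth rate (bits per bet) into the three terms."""
    term1 = sum(pi * math.log2(oi) for pi, oi in zip(p, odds))   # payoff term
    term2 = -sum(pi * math.log2(pi) for pi in p if pi > 0)       # entropy H(p)
    term3 = sum(pi * math.log2(pi / bi)
                for pi, bi in zip(p, b) if pi > 0)               # KL(p || b)
    return term1, term2, term3

def growth_direct(p, odds, b):
    """Expected log2 wealth growth per bet, computed directly."""
    return sum(pi * math.log2(bi * oi)
               for pi, oi, bi in zip(p, odds, b) if pi > 0)

p = [0.2, 0.8]     # hypothetical true outcome probabilities
odds = [4.0, 1.5]  # hypothetical payout per unit staked on each outcome

for b in ([0.5, 0.5], [0.3, 0.7], [0.2, 0.8]):
    t1, t2, t3 = growth_terms(p, odds, b)
    print(b, round(t1, 4), round(t2, 4), round(t3, 4),
          round(growth_direct(p, odds, b), 4))
```

Across the three allocations, terms 1 and 2 stay fixed and only the KL term changes, reaching zero when the allocation matches the true probabilities.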
Re-formulating portfolio return optimisation in this way breaks new ground, because it gives you an ordering principle/heuristic to use in your backtesting:
You can now evaluate the quality of your (backtested or realised) return distribution and the quality of your allocation in the same unit: log-growth / information bits.
Each factor/feature of the strategy has a “usefulness” score as given by its KL divergence.
You can use these scores to optimise sizing.
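How exactly the per-feature score is computed isn't spelled out here, so the following is only one plausible reading: score a feature by the KL divergence between the outcome distribution when the feature is present and the baseline outcome distribution. The win/loss labels, the counts and the function names are all made up for illustration.

```python
import math
from collections import Counter

def distribution(outcomes, labels):
    """Empirical probability of each label in a list of outcomes."""
    counts = Counter(outcomes)
    return [counts[label] / len(outcomes) for label in labels]

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

labels = ["win", "loss"]

# Hypothetical backtest outcomes: trades where the feature was present,
# and the full baseline sample.
outcomes_with_feature = ["win"] * 6 + ["loss"] * 4
outcomes_baseline = ["win"] * 10 + ["loss"] * 10

score = kl_bits(distribution(outcomes_with_feature, labels),
                distribution(outcomes_baseline, labels))
print(f"feature score: {score:.4f} bits")
```

A feature that leaves the outcome distribution unchanged scores zero bits under this reading; the further it shifts the distribution from the baseline, the higher the score.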
This enables you to calculate a “practical optimum”:
What is the best position size to use on a setup that has a known historical return distribution, GIVEN you know certain features of the setup in advance?
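A minimal sketch of that question, assuming you have a trade history tagged with one yes/no feature: split the history by the feature and grid-search the stake fraction that maximises historical mean log growth in each bucket. The data, the [0, 1] stake range and the function names are assumptions for illustration, and the fit is in-sample only.

```python
import math

def mean_log_growth(returns, f):
    """Average log growth per trade when staking fraction f on each trade."""
    total = 0.0
    for r in returns:
        factor = 1 + f * r
        if factor <= 0:
            return -math.inf  # this stake size would wipe out the account
        total += math.log(factor)
    return total / len(returns)

def best_fraction(returns, steps=1000):
    """Grid-search the fraction in [0, 1] that maximises mean log growth."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: mean_log_growth(returns, f))

# Hypothetical trade history: (feature_present, return per unit staked).
history = [
    (True, 0.5), (True, 0.5), (True, -0.4), (True, 0.5), (True, -0.4),
    (False, 0.5), (False, -0.4), (False, -0.4), (False, 0.5), (False, -0.4),
]

all_returns = [r for _, r in history]
with_feature = [r for flag, r in history if flag]
without_feature = [r for flag, r in history if not flag]

print("ignoring the feature:", best_fraction(all_returns))
print("feature present:     ", best_fraction(with_feature))
print("feature absent:      ", best_fraction(without_feature))
```

On this toy data, knowing the feature changes the answer a lot: size up when it is present and stand aside when it is absent, instead of using one blended size for every trade.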