I have a neat idea for a language model. A bit like a Markov chain or HMM, but it's for sentences of a fixed size N. Then there are K clusters. For each cluster, you have some tree spanning the N word indices, so each word's distribution depends on a single parent.