The view that MDL is an approximation to Bayesian model comparison is explained in
David MacKay's *Information Theory, Inference, and Learning Algorithms*. (see link below)
As Shannon showed, the optimal description length for data D, given assumptions H, is the "Shannon information content" log_2(1/P(D|H)).
And in Bayesian inference, the likelihood of the model H (also known as the evidence for the model) is P(D|H).
Thus an accurate implementation of MDL should return precisely the evidence.
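As a small sketch of this equivalence, consider comparing two hypotheses for a sequence of coin flips: H0 (a fair coin) and H1 (unknown bias with a uniform prior). The coin example and all function names below are illustrative, not from the text; the point is that the hypothesis with the higher evidence P(D|H) is exactly the one giving the shorter description length log_2(1/P(D|H)).

```python
import math

def evidence_fair(n_heads, n_tails):
    # P(D|H0): fair coin, each flip has probability 1/2
    return 0.5 ** (n_heads + n_tails)

def evidence_uniform_prior(n_heads, n_tails):
    # P(D|H1): unknown bias p with uniform prior.
    # Marginal likelihood is the Beta integral:
    #   integral_0^1 p^h (1-p)^t dp = h! t! / (h + t + 1)!
    h, t = n_heads, n_tails
    return math.factorial(h) * math.factorial(t) / math.factorial(h + t + 1)

def description_length_bits(evidence):
    # Shannon information content: log2(1 / P(D|H))
    return -math.log2(evidence)

# Hypothetical data: 9 heads, 1 tail in 10 flips
dl_fair = description_length_bits(evidence_fair(9, 1))          # 10 bits
dl_biased = description_length_bits(evidence_uniform_prior(9, 1))  # about 6.8 bits
# The biased-coin model both has higher evidence and compresses the data better.
```

Selecting the model with the shortest description length is therefore the same decision as selecting the model with the highest evidence, up to the coding overhead an actual compressor would add.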

- On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; compelling arguments in favour of Bayesian methods; state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.