The key data-dependent term, P(D|H), is sometimes called the evidence for model H, and evaluating it correctly is the key to Bayesian model comparison. The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model H given the data D.
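The idea that the evidence is a normalizing constant obtained by marginalizing over a model's parameters can be illustrated with a minimal sketch. The coin-flip setup, the two candidate models, and the data values below are all illustrative assumptions, not taken from the text: model H1 fixes the coin's bias at 0.5, while model H2 treats the bias as an unknown parameter with a uniform prior, so P(D|H2) is the integral of the likelihood over that prior.

```python
import math

# Illustrative sketch (assumed example, not from the source text):
# compute the evidence P(D|H) for two models of a coin, where
# D = k heads observed in n flips.

def likelihood(theta, k, n):
    # Binomial likelihood P(D | theta) for bias theta.
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def evidence_uniform(k, n, grid=10000):
    # P(D | H2) = integral of P(D | theta) over a uniform prior on [0, 1],
    # approximated here by a midpoint sum on a fine grid.
    return sum(likelihood((i + 0.5) / grid, k, n) for i in range(grid)) / grid

k, n = 8, 10                     # assumed data: 8 heads in 10 flips
e1 = likelihood(0.5, k, n)       # evidence for H1 (fair coin, no free parameters)
e2 = evidence_uniform(k, n)      # evidence for H2 (bias marginalized out)
print(e1, e2, e2 / e1)           # the ratio e2/e1 is the Bayes factor
```

For a uniform prior this integral has a closed form, 1/(n+1), so the grid sum can be checked against it; comparing e1 and e2 is exactly the Bayesian model comparison the paragraph describes, with the marginalization supplying the normalizing constant of the within-model parameter inference.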

- Richard O. Duda, Peter E. Hart, David G. Stork (2000) ''Pattern Classification'' (2nd edition), Section 9.6.5, pp. 487-489, Wiley, ISBN 0471056693
- E. T. Jaynes (1994) ''Probability Theory: The Logic of Science'', Chapter 24.
- David J.C. MacKay (2003) ''Information Theory, Inference, and Learning Algorithms'', CUP, ISBN 0521642981 (also available online)

- The online textbook ''Information Theory, Inference, and Learning Algorithms'' by David J.C. MacKay has many chapters on Bayesian methods, including introductory examples; compelling arguments in favour of Bayesian methods; state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.