A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be *X(1)*, ..., *X(n)*. Let *parents(A)* be the parents of the node *A*. Then the joint distribution for *X(1)* through *X(n)* is represented as the product of the probability distributions *p(X(i)* | *parents(X(i)))* for *i* from 1 to *n*. If *X* has no parents, its probability distribution is said to be **unconditional**, otherwise it is **conditional**.

Questions about dependence among variables can be answered by studying the graph alone. It can be shown that the graphical notion called d-separation corresponds to the notion of **conditional independence**: if nodes *X* and *Y* are d-separated (given specified evidence nodes), then variables *X* and *Y* are independent given the evidence variables.

In order to carry out numerical calculations, it is necessary to further specify for each node *X* the probability distribution for *X* conditional on its parents. The distribution of *X* given its parents may have any form. However, it is common to work with discrete or Gaussian distributions, since that simplifies calculations.

The goal of inference is typically to find the conditional distribution of a subset of the variables, conditional on known values for some other subset (the **evidence**), and integrating over any other variables. Thus a Bayesian network can be considered a mechanism for automatically constructing extensions of Bayes' theorem to more complex problems.

Bayesian networks are used for modelling knowledge in medicine, engineering, text analysis, image processing, and decision support systems.

**See also:** Machine learning