Relational algebra
The relational algebra is a set of operations that manipulate relations as they are defined in the relational model and as such describes part of the data manipulation aspect of this data model. Because of their algebraic properties these operations are often used in database query optimization as an intermediate representation of a query to which certain rewrite rules can be applied to obtain a more efficient version of the query.
The exact set of operations may differ per definition and also depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that uses the labeled generalization of mathematical relations) is used. We will assume the labeled case here as this is the most common way the relational model is defined.
The six basic operations of the algebra are the selection, the projection, the cartesian product, the set union, the set difference, and the rename.
The selection is a unary operation that is written as σ_{a=b}(R) or σ_{a=v}(R) where a and b are attribute names, v is a value constant and R is a relation. The first selection selects all those tuples in R that have the same value in the a and the b attribute. The second selection selects all those tuples in R that have the value v in the a attribute. More formally:
- σ_{a=b}(R) = { t : t ∈ R, t(a) = t(b) }
- σ_{a=v}(R) = { t : t ∈ R, t(a) = v }
The result of the selection is only defined if the attribute names that it mentions are in the header of the relation that it operates upon.
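As an illustration, the two forms of selection can be sketched in Python. The sketch assumes that a tuple is stored as a frozenset of (attribute, value) pairs and a relation as a set of such tuples; the helper row and the relation people are hypothetical and serve only as an example.

```python
def row(**attrs):
    # A tuple in the labeled model: a function from attribute names to values,
    # stored here as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def select_eq_attr(rel, a, b):
    # σ_{a=b}(R): keep the tuples whose a and b attributes hold the same value.
    # A KeyError signals that a or b is not in the header.
    return {t for t in rel if dict(t)[a] == dict(t)[b]}

def select_eq_value(rel, a, v):
    # σ_{a=v}(R): keep the tuples whose a attribute holds the constant v.
    return {t for t in rel if dict(t)[a] == v}

# Hypothetical relation with header {name, city, born_in}.
people = {
    row(name="Ann", city="Oslo", born_in="Oslo"),
    row(name="Bob", city="Rome", born_in="Oslo"),
    row(name="Cy", city="Rome", born_in="Rome"),
}

stayed = select_eq_attr(people, "city", "born_in")   # tuples for Ann and Cy
in_rome = select_eq_value(people, "city", "Rome")    # tuples for Bob and Cy
```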
The Projection
The projection is a unary operation that is written as π_{a1,..,an}(R) where a_{1},..,a_{n} is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a_{1},..,a_{n}}. More formally:
- π_{a1,..,an}(R) = { t|_{{a1,..,an}} : t ∈ R }
where f|_{A} is defined as the restriction of the function f to the set A, i.e., f|_{A} = { (x, y) | (x, y) ∈ f, x ∈ A }.
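The restriction and the projection built on it can be sketched as follows, assuming the same hypothetical representation (a tuple is a frozenset of (attribute, value) pairs). Because the result is a set, tuples that become equal after restriction collapse into one.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def restrict(t, attrs):
    # t|_A: keep only the pairs whose attribute name is in A.
    return frozenset((x, y) for (x, y) in t if x in attrs)

def project(rel, attrs):
    # π_{a1,..,an}(R) = { t|_{a1,..,an} : t ∈ R }
    return {restrict(t, attrs) for t in rel}

# Hypothetical relation with header {name, city}.
people = {
    row(name="Ann", city="Oslo"),
    row(name="Bob", city="Rome"),
    row(name="Cy", city="Rome"),
}

cities = project(people, {"city"})  # two tuples: Bob and Cy yield the same one
```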
The cartesian product is a binary operation that is very similar to the Cartesian product in set theory. It is written as R × S where R and S are relations. The result of the cartesian product is the set of all combinations of tuples in R and S. More formally:
- R × S = { t ∪ s : t ∈ R, s ∈ S }
To ensure that the result of the cartesian product is again a relation it is required that the headers of R and S are disjoint, i.e., do not contain the same attribute.
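A minimal Python sketch of the product, including the disjointness check, might look as follows. It assumes tuples are frozensets of (attribute, value) pairs; the helper header derives the header from the tuples themselves, which is a simplification that cannot see the header of an empty relation.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def header(rel):
    # Attribute names used by the tuples of rel (empty for an empty relation).
    return {a for t in rel for (a, _) in t}

def product(r, s):
    # R × S = { t ∪ u : t ∈ R, u ∈ S }, defined only for disjoint headers.
    if header(r) & header(s):
        raise ValueError("headers must be disjoint")
    return {t | u for t in r for u in s}

# Hypothetical example relations.
colors = {row(color="red"), row(color="blue")}
sizes = {row(size="S"), row(size="M")}

combos = product(colors, sizes)  # 2 * 2 = 4 combined tuples
```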
The set union is a binary operation that is written as R ∪ S and is defined as the usual set union in set theory:
- R ∪ S = { t : t ∈ R ∨ t ∈ S }
The result of the set union is only defined when the two relations have the same headers.
The set difference is a binary operation that is written as R - S and is defined as the usual set difference in set theory:
- R - S = { t : t ∈ R, ¬ t ∈ S }
The result of the set difference is only defined when the two relations have the same headers.
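Since relations are sets of tuples, union and difference map directly onto set operations. The sketch below assumes tuples are frozensets of (attribute, value) pairs and adds a header check; the check is only approximate because the header is derived from the tuples, so it is skipped when either operand is empty.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def header(rel):
    # Attribute names used by the tuples of rel.
    return {a for t in rel for (a, _) in t}

def check_same_header(r, s):
    # Both operations are only defined for relations with the same header.
    if r and s and header(r) != header(s):
        raise ValueError("relations must have the same header")

def union(r, s):
    # R ∪ S = { t : t ∈ R ∨ t ∈ S }
    check_same_header(r, s)
    return r | s

def difference(r, s):
    # R - S = { t : t ∈ R, ¬ t ∈ S }
    check_same_header(r, s)
    return r - s

# Hypothetical example relations over the header {id}.
a = {row(id=1), row(id=2)}
b = {row(id=2), row(id=3)}

both = union(a, b)         # tuples with id 1, 2 and 3
only_a = difference(a, b)  # the tuple with id 1
```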
The rename operation is a unary operation that is written as ρ_{a/b}(R) where a and b are attribute names and R is a relation. The result is identical to R except that the b field in all tuples is renamed to an a field. More formally:
- ρ_{a/b}(R) = { t[a/b] : t ∈ R }
where t[a/b] is defined as the tuple t with the b attribute renamed to a, i.e., t[a/b] = { (c, v) | (c, v) ∈ t, c ≠ b } ∪ { (a, t(b)) }.
The result of the rename is only defined when the attribute a did not appear already in the header of the operand.
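The definition of t[a/b] translates almost literally into Python. The sketch assumes tuples are frozensets of (attribute, value) pairs; the relation emp and the attribute names are hypothetical.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def rename(rel, a, b):
    # ρ_{a/b}(R): rename attribute b to a in every tuple of R.
    renamed = set()
    for t in rel:
        values = dict(t)
        if a in values:
            # The rename is undefined if a already appears in the header.
            raise ValueError(f"attribute {a!r} already in header")
        kept = {(c, v) for (c, v) in t if c != b}
        renamed.add(frozenset(kept | {(a, values[b])}))
    return renamed

# Hypothetical relation with header {name, dept}.
emp = {row(name="Ann", dept=1), row(name="Bob", dept=2)}

emp_renamed = rename(emp, "dept_id", "dept")  # header becomes {name, dept_id}
```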
These operations are enough to express all queries that can be expressed in tuple calculus and domain calculus, both of which are essentially the same as first-order logic.
- insert a sketch of proof here
Other Operations expressible with the Basic Operations
Next to the six basic operations some other operations are also often included even though they can be expressed by combinations of the basic operations. This is because they either have interesting algebraic properties or because they can be implemented more efficiently than their simulations. These operations are the set intersection, the natural join and the division.
The set intersection is a binary operation that is written as R ∩ S and is defined as the usual set intersection in set theory:
- R ∩ S = { t : t ∈ R, t ∈ S }
The result of the set intersection is only defined when the two relations have the same headers. This operation can be simulated in the basic operations as follows:
- R ∩ S = R - (R - S)
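The identity can be checked on a small example. The sketch assumes tuples are frozensets of (attribute, value) pairs and uses Python's built-in set operators for difference and intersection; the two relations are hypothetical.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

# Hypothetical relations over the same header {id}.
r = {row(id=1), row(id=2), row(id=3)}
s = {row(id=2), row(id=3), row(id=4)}

direct = r & s                # R ∩ S taken directly
via_difference = r - (r - s)  # R - (R - S), using only the set difference

print(direct == via_difference)
```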
The Natural Join
The natural join is a binary operation that is written as R |×| S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. More formally:
- R |×| S = { t ∪ s : t ∈ R, s ∈ S, fun(t ∪ s) }
where fun(r) is a predicate that is true for a binary relation r iff r is a functional binary relation, i.e., iff r pairs no attribute name with more than one value. This operation can be regarded as a generalization of the previously defined cartesian product since the cartesian product is the special case where R and S have no common attributes.
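The formal definition can be sketched directly in Python, assuming tuples are frozensets of (attribute, value) pairs. The function is_function plays the role of fun: the union of two tuples is functional exactly when the tuples agree on every attribute they share. The relations employees and depts are hypothetical.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def is_function(pairs):
    # fun(r): no attribute name is paired with two different values.
    return len({a for (a, _) in pairs}) == len(pairs)

def natural_join(r, s):
    # R |×| S = { t ∪ u : t ∈ R, u ∈ S, fun(t ∪ u) }
    return {t | u for t in r for u in s if is_function(t | u)}

# Hypothetical relations sharing the attribute dept.
employees = {row(name="Ann", dept="D1"), row(name="Bob", dept="D2")}
depts = {row(dept="D1", floor=3), row(dept="D3", floor=5)}

joined = natural_join(employees, depts)  # only Ann has a matching department
```

When the two headers are disjoint, every union is functional and the function returns the full cartesian product.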
The natural join is often regarded as the most commonly used operation for linking related relations, i.e., relations with relationships between them; see also foreign key.
The simulation of the natural join with the basic operations is as follows. Assume that a_{1},...,a_{n} are the attribute names unique to R, b_{1},...,b_{m} are the attribute names common to R and S, and c_{1},...,c_{k} are the attribute names unique to S. Furthermore assume that the attribute names d_{1},...,d_{m} are neither in R nor in S. In a first step we can now rename the common attribute names in S:
- S' := ρ_{d1/b1}(...ρ_{dm/bm}(S)...)
Then we take the cartesian product and select the tuples that are to be joined:
- T := σ_{b1=d1}(...σ_{bm=dm}(R × S')...)
Finally we take a projection to get rid of the renamed attributes:
- U := π_{a1,...,an,b1,...,bm,c1,...,ck}(T)
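The three steps can be traced in a Python sketch built only from rename, cartesian product, selection and projection. It assumes tuples are frozensets of (attribute, value) pairs, that the headers are passed in explicitly, and that appending the hypothetical suffix "__renamed" yields fresh attribute names that occur in neither header.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def rename(rel, a, b):
    # ρ_{a/b}(R): rename attribute b to a.
    return {frozenset({(c, v) for (c, v) in t if c != b} | {(a, dict(t)[b])})
            for t in rel}

def product(r, s):
    # R × S, assuming disjoint headers.
    return {t | u for t in r for u in s}

def select_eq_attr(rel, a, b):
    # σ_{a=b}(R)
    return {t for t in rel if dict(t)[a] == dict(t)[b]}

def project(rel, attrs):
    # π_{attrs}(R)
    return {frozenset((x, y) for (x, y) in t if x in attrs) for t in rel}

def join_by_simulation(r, s, r_header, s_header):
    common = sorted(r_header & s_header)
    fresh = {b: b + "__renamed" for b in common}  # assumed fresh names
    s_prime = s
    for b in common:                  # S' := rename each common attribute in S
        s_prime = rename(s_prime, fresh[b], b)
    t = product(r, s_prime)           # R × S'
    for b in common:                  # T := keep tuples that agree on b
        t = select_eq_attr(t, b, fresh[b])
    return project(t, r_header | s_header)  # U := drop the renamed attributes

# Hypothetical relations sharing the attribute dept.
employees = {row(name="Ann", dept="D1"), row(name="Bob", dept="D2")}
depts = {row(dept="D1", floor=3), row(dept="D3", floor=5)}

joined = join_by_simulation(employees, depts,
                            {"name", "dept"}, {"dept", "floor"})
```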
The Division
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. More formally:
- R ÷ S = { t|_{{a1,...,an}} : t ∈ R, ∀ s ∈ S ( (t|_{{a1,...,an}} ∪ s) ∈ R) }
where a_{1},...,a_{n} is the set of attribute names unique to R. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
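A direct reading of this definition in Python is sketched below. It assumes tuples are frozensets of (attribute, value) pairs and that the attribute names unique to R are passed in explicitly; the relations completed and required are hypothetical.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def restrict(t, attrs):
    # t|_A: keep only the pairs whose attribute name is in A.
    return frozenset((x, y) for (x, y) in t if x in attrs)

def divide(r, s, unique_attrs):
    # R ÷ S: restrictions of tuples in R whose combination with every
    # tuple of S is present in R.
    return {
        restrict(t, unique_attrs)
        for t in r
        if all((restrict(t, unique_attrs) | u) in r for u in s)
    }

# Hypothetical data: which student completed which task.
completed = {
    row(student="Ann", task="DB1"),
    row(student="Ann", task="DB2"),
    row(student="Bob", task="DB1"),
}
required = {row(task="DB1"), row(task="DB2")}

finished_all = divide(completed, required, {"student"})  # only Ann qualifies
```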
The simulation of the division with the basic operations is as follows. We assume that a_{1},...,a_{n} are the attribute names unique to R and b_{1},...,b_{m} are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
- T := π_{a1,...,an}(R) × S
In the next step we subtract R from this relation:
- U := T - R
Note that in U we have the combinations that "should have been in R but weren't". So if we now take the projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:
- V := π_{a1,...,an}(U)
So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:
- W := π_{a1,...,an}(R) - V
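The four steps T, U, V and W can be followed in a short Python sketch. It assumes tuples are frozensets of (attribute, value) pairs, that the header of S is a subset of the header of R, and that the attribute names unique to R are passed in explicitly; the example relations are hypothetical.

```python
def row(**attrs):
    # A tuple as a frozenset of (attribute, value) pairs.
    return frozenset(attrs.items())

def project(rel, attrs):
    # π_{attrs}(R)
    return {frozenset((x, y) for (x, y) in t if x in attrs) for t in rel}

def product(r, s):
    # R × S, assuming disjoint headers.
    return {t | u for t in r for u in s}

def divide_by_simulation(r, s, unique_attrs):
    candidates = project(r, unique_attrs)
    t = product(candidates, s)     # T: every combination that should be in R
    u = t - r                      # U: combinations missing from R
    v = project(u, unique_attrs)   # V: candidates with a missing combination
    return candidates - v          # W: candidates with nothing missing

# Hypothetical data: which student completed which task.
completed = {
    row(student="Ann", task="DB1"),
    row(student="Ann", task="DB2"),
    row(student="Bob", task="DB1"),
}
required = {row(task="DB1"), row(task="DB2")}

finished_all = divide_by_simulation(completed, required, {"student"})
```

In this example U contains only the combination of Bob with task DB2, so V contains Bob and W contains Ann.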
Generalized Operations
- Allows general propositional formulas and other comparison operators
The θ-join
- Allow other comparison operators
Operations for null values
- Discuss the outer joins here
Algebraic properties