Relational algebra

The relational algebra is a set of operations that manipulate relations as they are defined in the relational model and as such describes part of the data manipulation aspect of this data model. Because of their algebraic properties these operations are often used in database query optimization as an intermediate representation of a query to which certain rewrite rules can be applied to obtain a more efficient version of the query.

The exact set of operations may differ per definition and also depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that used the labeled generalization of mathematical relations) is used. We will assume the labeled case here as this is the most common way the relational model is defined.

Table of contents

1 The Fundamental Operations

1.1 The Selection
1.2 The Projection
1.3 The Cartesian Product
1.4 The Set Union
1.5 The Set Difference
1.6 The Rename
1.7 The Expressive Power

2 Other Operations expressible with the Basic Operations

2.8 The Set Intersection
2.9 The Natural Join
2.10 The Division

3 Generalized Operations

3.11 The General Selection
3.12 The θ-join

4 Operations for null values
5 Algebraic properties
6 Use of Algebraic Properties for Query Optimization

The Fundamental Operations

The six basic operations of the algebra are the selection, the projection, the cartesian product, the set union, the set difference, and the rename.

The Selection

The selection is a unary operation that is written as σ_a=b(R) or σ_a=v(R) where a and b are attribute names, v is a value constant and R is a relation. The first selection selects all those tupels in R that have the same value in the a and the b attribute. The second selection all those tuples in R that have the value v in the a attribute. More formally:

σ_a=b(R) = { t : t ∈ R, t(a) = t(b) }

σ_a=v(R) = { t : t ∈ R, t(a) = v }

The result of the selection is only defined if the attribute names that it mentions are in the header of the relation that it operates upon.

We need here (and also for all the other operations) an example.

The Projection

The projection is a unary operation that is written as π_{a₁,..,a_n}(R) where a₁,..,a_n is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a₁,..,a_n}. More formally:

π_{a₁,..,a_n}(R) = { t|_{{a₁,..,a_n}} : t ∈ R }

where f|_A is defined as the restriction of the function f to the set A, i.e., f|_A = { (x, y) | (x, y) ∈ f, x ∈ A }.

The Cartesian Product

The cartesian product is a binary operation that is very similar to the Cartesian product in set theory. It is written as R × S where R and '\'S are relations. The result of the cartesion product is the set of all combinations of tupels in R and S''. More formally:

R × S = { t ∪ s : t ∈ R, s ∈ S }

To ensure that the result of the cartesian product is again a relation it is required that the headers of R and S are disjoint, i.e., do not contain the same attribute.

The Set Union

The set union is a binary operation that is written as R ∪ S and is defined as the usual set union in set theory:

R ∪ S = { t : t ∈ R ∨ t ∈ S }

The result of the set union is only defined when the two relations have the same headers.

The Set Difference

The set difference is a binary operation that is written as R - S and is defined as the usual set difference in set theory:

R - S = { t : t ∈ R, ¬ t ∈ S }

The result of the set difference is only defined when the two relations have the same headers.

The Rename

The rename operation is a unary operation that is written as ρ_a/b(R) where a and b are attribute names and R is a relation. The result is identical to R except that the b field in all tupels is renamed to a a field. More formally:

ρ_a/b(R) = { t[a/b] : t ∈ R }

where t[a/b] is defined as the tuple t with the b attribute renamed to a, i.e., t[a/b] = { (c, v) | (c, v) ∈ t, c ≠ b } ∪ { (a, t(b)) }. The result of the rename is only defined when the attribute a did not appear already in the header of the operand.

The Expressive Power

These operations are enough to express all queries that can be expressed in tuple calculus and domain calculus which is essentially the same as first-order logic.

insert a sketch of proof here

Other Operations expressible with the Basic Operations

Next to the six basic operations some other operations are also often included even though they can be expressed by combinations of the basic operations. This is because they either have interesting algebraic properties or because they can be implemented more efficiently than their simulations. These operations are the set intersection, the natural join and the division.

The Set Intersection

The set intersection is a binary operation that is written as R ∩ S and is defined as the usual set intersection in set theory:

R ∩ S = { t : t ∈ R, t ∈ S }

The result of the set intersection is only defined when the two relations have the same headers. This operation can be simulated in the basic operations as follows:

R ∩ S = R - (R - S)

The Natural Join

The natural join is a binary operation that is written as R |×| S where R and S are relations. The result of the cartesion product is the set of all combinations of tupels in R and S that are equal on their common attribute names. More formally:

R |×| S = { t ∪ s : t ∈ R, s ∈ S, fun(t ∪ s) }

where fun(r) is a predicate that is true for a binary relation r iff r is a functional binary relation. This operation can be regarded as a generalization of the previously defined cartesian product since it is the special case where R and S have no common attributes.

someting it being as used as the most common operation to link related relations, i.e., relations with relationships between them, see also foreign key

The simulation of the natural join with the basic operations is as follows. Assume that a₁,...,a_n are the attribute names unique to R, b₁,...,b_m are the attribute names common to R and S and c₁,...,c_m are the attribute unique to S. Furthermore assume that the attribute names d₁,...,d_m are neither in R nor in S. In a first step we can now rename the common attribute names in S:

S' := ρ_d₁/b₁(...ρ_{d_m/b_m}( S)...)

Then we take the cartesion product and select the tuples that are to be joined:

T := σ_b₁=d₁(...σ_{b_m=d_m}(R × S')...)

Finally we take a projection to get rid of the renamed attributes:

U := π_{a₁,...,a_n,b₁,...,b_m,c₁,...,c_m}(T)

The Division

The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. More formally:

R ÷ S = { t|_{{a₁,...,a_n}} : ∀ s ∈ S ( (t|_{{a₁,...,a_n}} ∪ s) ∈ R) }

where a₁,...,a_n is the set of attribute names unique to R. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.

The simulation of the division with the basic operations is as follows. We assume that a₁,...,a_n are the attribute names unique to R and b₁,...,b_m are the attribute names of S. In the first step we project R on its unique attribute names and construction all combinations with tuples in S:

T := π_{a₁,...,a_n}(R) × S

In the next step we subtract R from this relation:

U := T - R

Note that in U we have the combinations that "should have been in R but weren't". So if we now take the projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:

V := π_{a₁,...,a_n}(U)

So what remains to be done is take the projection of R on its unique attribute names and subtract those in V: