Main Page | See live article | Alphabetical index

Relational model

 Database Management Systems 
Model Type:
 Hierarchical model
 Network model
 Relational model
The relational model for database management is a data model based on predicate logic and set theory. Other models are the hierarchical model and network model, which predate the relational model in practice but not theory: they were defined a posteriori to describe existing databases after the relational model was defined. Some systems using these older architectures are still in use today in data centers with high data volume needs or where existing systems are so complex it would be cost prohibitive to migrate to systems employing the relational model; also of note are newer so-called object-oriented databases, even if many of them are rather DBMS-construction kits than DBMSs proper.

The relational model was invented by Ted Codd and subsequently maintained and developed by Chris Date and Hugh Darwen, as a general model of data. More recently they have published The Third Manifesto in which they show how the relational model can be extended with object oriented features without compromising its fundamental principles.

The standard language for relational databases, SQL, is only vaguely reminiscent of the mathematical model. Usually it is adopted, despite its restrictions, because it is far and away more popular than any other database language.

The fundamental assumption of the relational models is that all data is represented as mathematical relations, i.e., a subset of the Cartesian product of n sets. In the mathematical model (unlike SQL), reasoning about such data is done in two-valued predicate logic (that is, without a null value), meaning there are two possible evaluations for each proposition: either true or false. The data is operated upon by means of a relational calculus and algebra.

The relational data model permits the designer to create a consistent logical model of the information to be stored. This logical model can be refined through a process of database normalization. A database built on the pure relational model would be entirely normalized. The access plans and other implementation and operation details are handled by the DBMS engine, and should not be reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.

The basic relational building block is the domain, or data type. A tuple is a set of attributes, which are ordered pairs of domain and value. A relvar (relation variable) is an unordered set of ordered pairs of domain and name, which serves as the header for a relation. A relation is an unordered set of tuples. Although these relational concepts are mathematically defined, they correspond loosely to traditional database concepts. A relation is similar to the traditional concept of table. A tuple is similar to the concept of row.

The basic principle of the relational model is the Information Principle: all information is represented by data values. Thus, the relvars are not related to each other at design time: rather, designers use the same domain in several relvars, and if one attribute is dependent on another, this dependency is enforced through referential integrity.

Table of contents
1 Example Database
2 Example: Customer Relation
3 References
4 External Links

Example Database

An idealized, very simple example of a description of some relvars and their attributes:

Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone)

Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)

Order Line(Order No, Order Line No, Product Code, Qty)

Invoice(Invoice No, Customer ID, Order No, Date, Status)

Invoice Line(Invoice No,Line No,Product Code, Qty Shipped)

In this design we have five relvars: Customer, Order, Order Line, Invoice, and Invoice Line. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.

Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.

A candidate key is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into something else, namely a bag, by violating the basic definition of a set. A key can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.

Example: Customer Relation

Customer ID     Tax ID              Name                 Address                 [More fields....]
== 1234567890 555-5512222 Jo Lee 323 Broadway ... 2223344556 555-5523232 Dorothy Red 1200 Main Street ... 3334445563 555-5533322 Linda de la Cruz 871 1st Street ... 4232342432 555-5325523 E. F. Codd 123 It Way ...

If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint.

Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in another relation, for example in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from the example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition.

If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.

There is a flaw in our database design above. The Invoice relvar contains an Order No attribute. So, each tuple in the Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices:

OrderInvoice(Order No,Invoice No)

Now, the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Customer relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in Invoice.

Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design.

There are two commonly used systems of diagramming to aid in the visual representation of the relational model: the entity-relationship diagram (ERD), which is very deficient (but more popular), and the IDEF diagram used in the IDEF1X method created by the U.S. Air Force based on ERDs.


External Links