Abstraction (computer science)

In computer science, abstraction is the process of combining multiple smaller operations into a single unit that can be referred to by name. It is a technique for factoring out details and easing the use of code and data. The term is used by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions; this has the fortunate effect of finessing some of the vexing philosophical issues of abstraction.

Abstraction allows programmers to think simply about a problem by deferring unimportant detail until later, while still allowing them to reason about the more important goals in stages rather than all at once. For example, in both computing and mathematics, numbers are concepts: a programming language's notion of a number is founded on the mathematical one. How numbers are implemented depends on the hardware and software, but this is not a restriction, because the computing concept of number is still based on the mathematical concept.

In programming languages such as C++ and Java, abstraction can itself be declared explicitly, using the keywords virtual and abstract respectively. After such a declaration, it is the responsibility of the programmer to implement a concrete class before objects of the declared type can be instantiated. Alternatively, if the specification language is UML, for example, the abstract classes are simply left abstract during the architecture and specification phase of the project.
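
As a minimal Java sketch of such a declaration: the abstract class below declares an operation without implementing it, and a concrete subclass must supply the implementation before any object can be created. The names Shape, Square, area and describe are illustrative assumptions, not taken from any particular library.

```java
// Hypothetical example: Shape, Square and ShapeDemo are illustrative names only.
abstract class Shape {
    // Declared but not implemented; every concrete subclass must define it.
    abstract double area();

    // Concrete code can already be written in terms of the abstract operation.
    String describe() {
        return getClass().getSimpleName() + " with area " + area();
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        // "new Shape()" would not compile: an abstract class cannot be instantiated.
        Shape s = new Square(2.0);
        System.out.println(s.describe());
    }
}
```

Code that holds a Shape reference does not need to know which concrete class it refers to; it only relies on the declared operation area().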

Abstraction can be either that of control or that of data. Roughly speaking, control abstraction is the abstraction of actions, while data abstraction is the abstraction of data structures. Control abstraction, as seen in structured programming, is the use of subprograms and defined control flows. Data abstraction is the primary motivation for introducing data types and, subsequently, abstract data types.

Object-oriented programming can be seen as an attempt to abstract both data and code.

Table of contents
1 Control abstraction
2 Data abstraction
3 Abstraction in object oriented programming
4 Further reading
5 See also

Control abstraction

Control abstraction is one of the main purposes of using programming languages. Computers understand operations only at a very low level, such as moving some bits from one memory location to another and producing the sum of two sequences of bits. Programming languages allow this to be done at a higher level. For example,

a = (1 + 2) * 5
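
To make the contrast concrete, the following Java sketch computes the same value twice: once as the single high-level expression, and once as a sequence of explicit steps that loosely mimics what a machine does with registers. The names r1, r2 and ExpressionDemo are illustrative assumptions; real machine instructions differ between processors and compilers.

```java
// Hypothetical illustration of one expression versus its low-level steps.
class ExpressionDemo {
    // The high-level form: one expression, evaluated by the language.
    static int highLevel() {
        int a = (1 + 2) * 5;
        return a;
    }

    // The same computation spelled out one small step at a time.
    static int stepByStep() {
        int r1 = 1;      // load the first operand
        int r2 = 2;      // load the second operand
        r1 = r1 + r2;    // add: r1 now holds 3
        r2 = 5;          // load the multiplier
        r1 = r1 * r2;    // multiply: r1 now holds 15
        int a = r1;      // store the result in a
        return a;
    }

    public static void main(String[] args) {
        System.out.println(highLevel());
        System.out.println(stepByStep());
    }
}
```

Both methods yield 15; the programmer writing the first form never has to think about the intermediate loads and stores.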

Structured programming

Structured programming involves splitting complex program tasks into smaller pieces with clear flow control and clear interfaces between components, reducing complexity and the potential for side effects.

In a simple program, this may mean ensuring that loops have single or obvious exit points and, where it is clearest to do so, that functions and procedures have a single exit point.
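
As a small illustration, the hypothetical Java method below searches an array using a loop whose condition is its only exit, and a method body with a single return statement. The names SearchDemo and indexOf, and the convention of returning -1 when nothing is found, are assumptions made for this sketch.

```java
// Hypothetical example of single-exit control flow.
class SearchDemo {
    // Returns the index of the first occurrence of target, or -1 if absent.
    static int indexOf(int[] values, int target) {
        int found = -1;
        int i = 0;
        // The loop condition is the only way out of the loop.
        while (i < values.length && found == -1) {
            if (values[i] == target) {
                found = i;
            }
            i++;
        }
        return found; // single exit point of the method
    }

    public static void main(String[] args) {
        int[] data = {4, 7, 7, 9};
        System.out.println(indexOf(data, 7));
        System.out.println(indexOf(data, 5));
    }
}
```

An early return from inside the loop would be shorter here; the single-exit form is shown only because it is the style the paragraph describes.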

In a larger system, it may involve breaking down complex tasks into many different modules. Consider a system handling payroll on ships and at shore offices:

These layers produce the effect of isolating the implementation details of one component and its assorted internal methods from the others. This concept was embraced and extended in object-oriented programming.

Data abstraction

Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private and can indeed change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.

For example, one could define an abstract data type called lookup table, where keys are uniquely associated with values, and values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list (which is actually quite efficient for small data sets). As far as client code is concerned, the abstract properties of the type are the same in each case.
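
The lookup table can be sketched in Java as an interface with two interchangeable implementations. The names LookupTable, ListLookupTable, HashLookupTable and LookupDemo are hypothetical, chosen for this illustration; only java.util.ArrayList and java.util.HashMap come from the standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the abstract properties visible to client code.
interface LookupTable<K, V> {
    void put(K key, V value); // associate key with value, replacing any old value
    V get(K key);             // return the value for key, or null if absent
}

// Implementation 1: a simple linear list of keys with a parallel list of values.
class ListLookupTable<K, V> implements LookupTable<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    public void put(K key, V value) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            values.set(i, value); // key already present: overwrite its value
        } else {
            keys.add(key);
            values.add(value);
        }
    }

    public V get(K key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }
}

// Implementation 2: a hash table, delegating to java.util.HashMap.
class HashLookupTable<K, V> implements LookupTable<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public void put(K key, V value) { map.put(key, value); }

    public V get(K key) { return map.get(key); }
}

class LookupDemo {
    // Client code depends only on the interface, not on either implementation.
    static int totalStock(LookupTable<String, Integer> stock) {
        stock.put("apples", 3);
        stock.put("pears", 4);
        stock.put("apples", 5); // replaces the earlier value for "apples"
        return stock.get("apples") + stock.get("pears");
    }

    public static void main(String[] args) {
        System.out.println(totalStock(new ListLookupTable<>()));
        System.out.println(totalStock(new HashLookupTable<>()));
    }
}
```

The method totalStock gives the same result with either implementation, which is the sense in which the abstract properties of the type are the same in each case.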

Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. Another way to look at this is that the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.

Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation. Changes to such information therefore end up affecting client code, leading directly to the fragile base class problem.

Abstraction in object oriented programming

In object-oriented programming, the control and data of programs are abstracted into entities called objects. Each object comprises data, which forms its state, and the actions performed on that data. Operations on an object are abstracted in its interface, which specifies what kinds of operations are available and accessible depending on context.

The simplest form of object orientation extends the concept of data type from earlier programming languages to associate behavior more strongly with the data. This is done to achieve encapsulation and a limited degree of polymorphism. These terms are very often used in contradictory ways by users of various object-oriented programming languages, which offer similar facilities for abstraction.

For example, Linda abstracts the concepts of server and shared data-space to facilitate distributed programming. In CLOS or Self, there is less of a class-instance distinction and more use of delegation for polymorphism, and individual objects and functions are abstracted more flexibly to better fit with a shared functional heritage from Lisp. In Java, abstraction takes place at the level of extended data types. Such types are called classes, and objects are instances of some class.

For example, here is a sample Java fragment to represent some common farm "animals" to a level of abstraction suitable to model simple aspects of their hunger and feeding. It defines an Animal class to represent both the state of the animal and its functions:

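 // LivingThing, Location and Food are assumed to be defined elsewhere.
 class Animal extends LivingThing {
   Location m_loc;            // where the animal currently is
   double m_energy_reserves;  // energy left; low values mean hunger
   
   // An animal is hungry when its energy reserves drop below 2.5.
   boolean is_hungry() {
     return m_energy_reserves < 2.5;
   }
   // Eating adds the calories of the food to the energy reserves.
   void eat(Food f) {
     m_energy_reserves += f.getCalories();
   }
   // Moving simply records the new location.
   void moveto(Location l) {
     m_loc = l;
   }
 }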

With the above definition, one could create objects of type Animal and call their methods like this:

 Animal thePig = new Animal();
 Animal theCow = new Animal();
 if (thePig.is_hungry()) { thePig.eat(table_scraps); }
 if (theCow.is_hungry()) { theCow.eat(grass); }
 theCow.moveto(theBarn);

In the above example, the class Animal is an abstraction used in place of an actual animal, and LivingThing is a further abstraction (in this case a generalisation) of Animal.

Further reading

See also

The opposite of abstraction is concretisation.

This article (or an earlier version of it) contains material from FOLDOC, used with permission.