Sequence alignment

Sequence alignment is concerned with the relationships between biological sequences (e.g. protein sequences or DNA sequences). Two major types exist: pairwise and multiple sequence alignments.

Table of contents

1 Pairwise alignment

1.1 Global Alignment
1.2 Local Alignment
1.3 Significance of Alignments

2 Multiple alignment
3 Algorithms
4 Needleman-Wunsch
5 Smith-Waterman
6 Software
7 SSearch
8 BLAST
9 Fasta
10 Clustal
11 See also

Pairwise alignment

Pairwise sequence alignment methods find the best-matching piecewise (local) or global alignment between two protein (amino acid) or DNA (nucleic acid) sequences.
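The difference between global and local alignment can be sketched with the two dynamic-programming methods named in the table of contents, Needleman-Wunsch (global) and Smith-Waterman (local). The following is a minimal illustration, not a reference implementation: the function names, the match/mismatch scores and the linear gap penalty are assumptions chosen for the example, and only the best score is returned, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end.

    The scoring values are illustrative assumptions (linear gap penalty).
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment restart anywhere.
            cell = max(0,
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The only structural differences are the initialisation (gap penalties along the edges versus zeros) and the floor at zero in the local version, which is what allows a well-matching region to be found inside otherwise unrelated sequences.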

Typically, the purpose is to find homologues (relatives) of a gene or gene product in a database of known examples. This information is useful for answering a variety of biological questions. The most important application of pairwise alignment methods is inferring the likely structure or function of an uncharacterised sequence from its similarity to characterised ones. Another important use of these techniques is in studies of molecular evolution.

The second question, how significant a given alignment is, is purely statistical. Extensive work has established a few firm theoretical results and many approximations. It is now generally accepted that the scores of alignments between random sequences follow the extreme value distribution. Pairwise alignment programs such as BLAST use simulation methods to estimate the parameters of this distribution for a particular parameter set (consisting of the query, database, substitution matrix and certain other parameters). Alignments can then be given a statistical significance value, which allows possible relationships between sequences to be judged.
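As a rough illustration of how a significance value can be derived from such a distribution, the sketch below fits an extreme value (Gumbel) distribution to a sample of scores from random sequence pairs by the method of moments, then computes the probability of seeing a score at least as high by chance. The function names are hypothetical, the fitting method is a simple choice made for this example, and this is not a description of how BLAST itself estimates its parameters.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments estimate of the Gumbel location mu and rate lam.

    For a Gumbel distribution: mean = mu + gamma / lam and
    standard deviation = pi / (lam * sqrt(6)).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def p_value(score, mu, lam):
    """P(S >= score) for a random alignment: 1 - exp(-exp(-lam * (score - mu)))."""
    return -math.expm1(-math.exp(-lam * (score - mu)))
```

In use, `scores` would be the alignment scores obtained for many pairs of random (for example shuffled) sequences under a fixed scoring scheme; the score of a real alignment is then passed to `p_value` with the fitted `mu` and `lam`.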

Multiple alignment

Multiple alignment is an extension of pairwise alignment to incorporate several sequences. Several methods exist, one of the most popular being the progressive alignment strategy used by the CLUSTAL family of programs. Instead of searching a database, multiple alignment methods take a few sequences and find the regions common to all of them. This is typically used in cladistics as a method for building phylogenetic trees, as well as for creating sequence profiles, which can be used to search sequence databases for more distant relatives (the two most popular methods for remote-homologue detection, PSI-BLAST and hidden Markov model (HMM) based methods, both work on this principle).

Many strategies exist; however, no algorithm has yet been described that is guaranteed to find the best possible alignment in practical time.
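The sequence profiles mentioned above can be pictured as per-column residue frequencies taken from a finished multiple alignment. The following is a simplified sketch under stated assumptions: the input sequences are already aligned to equal length, "-" marks a gap, and no pseudocounts or sequence weighting are applied (practical profile methods generally use both). The function names are hypothetical.

```python
from collections import Counter


def column_profile(aligned, gap="-"):
    """Return one {residue: frequency} dict per alignment column.

    Gap characters are ignored when computing frequencies.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        if total:
            profile.append({res: n / total for res, n in counts.items()})
        else:
            profile.append({})  # column made up entirely of gaps
    return profile


def consensus(profile, gap="-"):
    """Most frequent residue in each column, or a gap for empty columns."""
    return "".join(max(col, key=col.get) if col else gap for col in profile)
```

A database search with such a profile scores each candidate residue against the column frequencies rather than against a single query sequence, which is why it can pick up more distant relatives.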

BLAST server at the NCBI

Fasta

Pairwise local alignment search. Largely superseded by BLAST.

Clustal

Progressive multiple alignment method. Comes in several varieties (ClustalW, ClustalX, etc.).