
Simplified molecular input line entry specification

The simplified molecular input line entry specification (SMILES) is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. With a little practice, people can write, read, and understand these strings directly, and several molecular software packages can read or generate them.

Atoms are represented by the standard symbols of the chemical elements, written in square brackets, such as [Au] for gold. Any hydrogens and charge on a bracketed atom are written inside the brackets, so the hydroxide anion is [OH-]. The brackets may be omitted for the elements common in organic chemistry (B, C, N, O, P, S, F, Cl, Br and I); in that case the number of implicit hydrogen atoms needed to fill the atom's normal valence is assumed. For instance, the SMILES for water is simply O and that for ethanol is CCO.

Single bonds between adjacent atoms are left implicit; = denotes a double bond and # a triple bond. The double-bonded carbon dioxide is therefore O=C=O and the triple-bonded hydrogen cyanide is C#N. Rings are closed with matching digits: cyclohexane is C1CCCCC1, where the two 1s mark atoms that are bonded to each other, forming a ring of six carbons. Branches are enclosed in parentheses, as in CCC(=O)O for propionic acid and FC(F)F, or equivalently C(F)(F)F, for fluoroform.
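The notation rules above can be sketched in a few lines of Python. The function below is only an illustration, not part of any SMILES toolkit: the name atom_counts and the valence table are assumptions made for this example, and it handles only bracketed atoms with an optional hydrogen count and charge, unbracketed B, C, N, O, P, S, F, Cl, Br and I, the bond symbols -, = and #, parentheses, and single-digit ring closures. Aromatic atoms, stereochemistry, isotopes, and disconnected fragments are deliberately left out.

```python
import re
from collections import Counter

# Assumed normal valences for atoms that may be written without brackets.
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[-=#()]|\d")
BRACKET = re.compile(r"\[([A-Z][a-z]?)(H\d*)?([+-]\d*)?\]")


def atom_counts(smiles):
    """Count atoms, implicit hydrogens included, in a simple SMILES string."""
    atoms = []      # per atom: [element, bracket H count or None, bond-order sum]
    branches = []   # atoms to return to when a ")" closes a branch
    rings = {}      # open ring-closure digit -> (atom index, bond order)
    prev, order, pos = None, 1, 0

    for match in TOKEN.finditer(smiles):
        if match.start() != pos:
            raise ValueError(f"unsupported character at position {pos}")
        pos, tok = match.end(), match.group()
        if tok == "(":
            branches.append(prev)           # remember the branch point
        elif tok == ")":
            prev = branches.pop()           # continue from the branch point
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]         # applies to the next bond only
        elif tok.isdigit():
            if tok in rings:                # second occurrence closes the ring
                other, opened = rings.pop(tok)
                atoms[prev][2] += max(order, opened)
                atoms[other][2] += max(order, opened)
            else:                           # first occurrence opens it
                rings[tok] = (prev, order)
            order = 1
        else:
            if tok.startswith("["):
                parsed = BRACKET.fullmatch(tok)
                if parsed is None:
                    raise ValueError(f"unsupported bracket atom {tok}")
                element, h = parsed.group(1), parsed.group(2)
                # Bracket atoms carry exactly the hydrogens written inside.
                hcount = 0 if h is None else int(h[1:] or 1)
            else:
                element, hcount = tok, None  # hydrogens are inferred later
            atoms.append([element, hcount, 0])
            if prev is not None:             # bond to the preceding atom
                atoms[prev][2] += order
                atoms[-1][2] += order
            prev, order = len(atoms) - 1, 1
    if pos != len(smiles):
        raise ValueError(f"unsupported character at position {pos}")

    counts = Counter()
    for element, hcount, bonds in atoms:
        counts[element] += 1
        if hcount is None:
            # Fill up to the smallest normal valence that fits the bonds.
            hcount = next((v - bonds for v in VALENCES[element] if v >= bonds), 0)
        if hcount:
            counts["H"] += hcount
    return dict(counts)
```

For example, atom_counts("CCO") returns {"C": 2, "H": 6, "O": 1}, the composition of ethanol, and both FC(F)F and C(F)(F)F give one carbon, one hydrogen, and three fluorines. A production tool would use an established cheminformatics library rather than a hand-written parser like this one.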

The SMILES specification was developed by David Weininger in the late 1980s.
