
Vol. 31 No. 1
January-February 2009

The IUPAC International Chemical Identifier (InChI)

by Stephen R. Heller and Alan D. McNaught

The properties and behaviors of chemical substances are generally interpreted and discussed in terms of their molecular structures, and to convey structural information, chemists use diagrammatic representations supplemented by verbal descriptions. Conventional chemical nomenclature was developed to provide a means of specifying or describing a chemical structure in words.

Systematic nomenclature provides an unambiguous description of a structure, a diagram of which can be reconstructed from its systematic name. However, there are other means of specifying molecular structures. Those based on “connection tables” (coded specifications of atomic connectivities) are more suitable than conventional nomenclature for processing by computer, as they are matrix representations of molecular graphs readily governed and handled by graph theory. In parallel with its continued development of conventional nomenclature, IUPAC has developed a structural identifier that can be readily interpreted by computers, or more precisely, by computer algorithms.

The IUPAC International Chemical Identifier (InChI) is a freely available, nonproprietary identifier for chemical substances that can be used in both printed and electronic data sources. It is generated from a computerized representation of a molecular structure diagram, produced by chemical structure-drawing software. Its use enables linking of diverse data compilations and unambiguous identification of chemical substances. A full description of the Identifier and software for its generation are available from the IUPAC website.1 In addition, an unofficial but helpful set of answers to frequently asked questions has been compiled by Nick Day of the Unilever Centre for Molecular Science Informatics as part of his Ph.D. project on the Chemical Semantic Web.2 A full account of the InChI project is in preparation.3 Commercial structure-drawing software that generates the Identifier is available from several organizations, listed on the IUPAC website.1

The conversion of structural information to the Identifier is based on a set of IUPAC structure conventions, and rules for normalization and canonicalization (conversion to a single, predictable sequence) of an input structure representation. The resulting InChI is simply a series of characters that serve to uniquely identify the structure from which it was derived. This conversion of a graphical representation of a chemical substance into the unique InChI character string can be carried out automatically by any organization, and the facility can be built into any program dealing with chemical structures.

The InChI uses a layered format to represent all available structural information relevant to compound identity. InChI layers are listed below. Each layer in an InChI representation contains a specific type of structural information. These layers, automatically extracted from the input structure, are designed so that each successive layer adds further detail to the Identifier. The specific layers generated depend on the level of structural detail available and whether or not allowance is made for tautomerism. Of course, any ambiguities or uncertainties in the original structure will remain in the InChI.

This layered structure design offers a number of advantages. If two structures for the same substance are drawn at different levels of detail, the one with the lower level of detail will, in effect, be contained within the other. Specifically, if one substance is drawn with stereo-bonds and the other without, the layers in the latter will be a subset of the former. The same will hold for compounds treated by one author as tautomers and by another as exact structures with all H-atoms fixed. This can work at a finer level. For example, if one author includes double bond and tetrahedral stereochemistry, but another omits stereochemistry, the latter InChI will be contained in the former.
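The containment property can be illustrated with a small sketch. It assumes that containment can be tested by treating each InChI as a '/'-separated sequence of layers and checking that the layers of the less detailed string occur, in the same order, in the more detailed one. The function names `layers` and `is_contained` are hypothetical, and the shorter string is simply the article's C5H5N5O example with its fixed-H layers removed for illustration; it is not output from the InChI software.

```python
def layers(inchi: str) -> list[str]:
    """Split an InChI string into its '/'-separated segments."""
    return inchi.split("/")


def is_contained(coarse: str, detailed: str) -> bool:
    """Return True if every layer of `coarse` appears, in order, in `detailed`."""
    remaining = iter(layers(detailed))
    # `segment in remaining` advances the iterator, so layer order is respected.
    return all(segment in remaining for segment in layers(coarse))


# Mobile-H form (assumed for illustration: the example string minus its /f part).
MOBILE_H = "InChI=1/C5H5N5O/c6-5-9-3-2(4(11)10-5)7-1-8-3/h1H,(H4,6,7,8,9,10,11)"
# Fixed-H form, as printed in the article.
FIXED_H = MOBILE_H + "/f/h8,10H,6H2"

print(is_contained(MOBILE_H, FIXED_H))  # the less detailed string is contained
print(is_contained(FIXED_H, MOBILE_H))  # the reverse does not hold
```

Under these assumptions the first call reports containment and the second does not, mirroring the tautomer versus fixed-H case described above.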

The InChI layers are:

1. Formula
2. Connectivity (no formal bond orders)
   a. disconnected metals
   b. connected metals
3. Isotopes
4. Stereochemistry
   a. double bond (Z/E)
   b. tetrahedral (sp3)
5. Tautomers (on or off)

Charges are not part of the basic InChI, but rather are added at the end of the InChI string.

Two examples of InChI representations are given below. It is important to recognize, however, that InChI strings are intended for use by computers and end users need not understand any of their details. In fact, the open nature of InChI and its flexibility of representation, after implementation into software systems, may allow chemists to be even less concerned with the details of structure representation by computers.

InChI=1/C5H5N5O/c6-5-9-3-2(4(11)10-5)7-1-8-3/h1H,(H4,6,7,8,9,10,11)/f/h8,10H,6H2

The layers in the InChI string are separated by the ‘/’ character followed by a lowercase letter (except for the first layer, the chemical formula) with the layers arranged in predefined order. In the examples, the following segments are included:

InChI version number
/ chemical formula
/c connectivity-1.1 (excluding terminal H)
/h connectivity-1.2 (locations of terminal H, including mobile H attachment points)
/q charge
/p proton balance
/t sp3 (tetrahedral) parity
/m parity inverted to obtain relative stereo (1 = inverted, 0 = not inverted)
/s stereo type (1 = absolute, 2 = relative, 3 = racemic)
/f chemical formula of the fixed-H structure if it is different
/h connectivity-2 (locations of fixed mobile H)
/q charge
/t sp3 (tetrahedral) parity
/m parity inverted to obtain relative stereo (1 = inverted, 0 = not inverted, . = inversion does not affect the parity)
/s stereo type (1 = absolute, 2 = relative, 3 = racemic)

InChI=1/C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;/h3H,1-2,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3;/m1./s1/fC5H8NO4.Na/h7H;/q-1;m
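The segment scheme above lends itself to a simple mechanical split. The following is a minimal sketch, not the official InChI software: it assumes only what the text states, namely that layers are separated by '/' and that each layer after the formula begins with a lowercase letter. The function name `split_inchi` and the returned dictionary keys are hypothetical. Layers that follow `/f` are collected separately as the fixed-H part.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, main and fixed-H layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    # The first two segments carry no letter prefix: version, then formula.
    version, formula, *rest = inchi[len(prefix):].split("/")
    main, fixed_h = [], []
    target = main
    for segment in rest:
        segment = segment.strip()  # tolerate stray spaces from copy and paste
        if not segment:
            continue
        if segment[0] == "f":
            # Everything from /f onward describes the fixed-H structure.
            target = fixed_h
        target.append((segment[0], segment[1:]))
    return {"version": version, "formula": formula,
            "main": main, "fixed_h": fixed_h}


# First example string from the article.
EXAMPLE = ("InChI=1/C5H5N5O/c6-5-9-3-2(4(11)10-5)7-1-8-3"
           "/h1H,(H4,6,7,8,9,10,11)/f/h8,10H,6H2")

parts = split_inchi(EXAMPLE)
for letter, body in parts["main"] + parts["fixed_h"]:
    print(f"/{letter} {body}")
```

Keeping the main and fixed-H layers in separate lists avoids confusing the two `/h` segments, which share a prefix letter but carry different information.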

One of the most important applications of InChI is the facility to locate mention of a chemical substance using Internet-based search engines. This is made easier by using a shorter (compressed) form of InChI, known as InChIKey. The InChIKey is a 27-character representation that, because it is compressed, cannot be reconverted into the original structure; unlike longer character strings, however, it is not subject to undesirable and unpredictable breaking by some search engines. The usefulness of the InChIKey as a search tool is enhanced by its derivation from a “standard” InChI (i.e., an InChI produced with standard option settings for features such as tautomerism and stereochemistry). An example is shown below; the “standard” InChI is denoted by the letter “S” after the version number.

InChIKey also allows searches based solely on atomic connectivity (first 14 characters). Software for generating InChIKey is available from the IUPAC website.1
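A connectivity-only comparison of two keys can be sketched as follows. The article states only that the key has 27 characters and that the first 14 encode connectivity; the hyphen-separated 14-10-1 layout used in the pattern below is an assumption taken from the general InChIKey format rather than from this text. The function names are hypothetical, and the keys are made-up placeholders, not keys of real compounds.

```python
import re

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")


def connectivity_block(key: str) -> str:
    """Return the first 14 characters of a well-formed InChIKey."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    return match.group(1)


def same_connectivity(key_a: str, key_b: str) -> bool:
    """Compare two InChIKeys on their connectivity block only."""
    return connectivity_block(key_a) == connectivity_block(key_b)


# Placeholder keys for illustration only.
KEY_A = "A" * 14 + "-" + "B" * 8 + "SA" + "-N"
KEY_B = "A" * 14 + "-" + "C" * 8 + "SA" + "-N"  # same first block as KEY_A
KEY_C = "D" * 14 + "-" + "B" * 8 + "SA" + "-N"  # different first block

print(same_connectivity(KEY_A, KEY_B))
print(same_connectivity(KEY_A, KEY_C))
```

Because the key is a compressed form, a match on the first block indicates only that the two keys share a connectivity block; it does not recover the structure itself.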

The enormous databases compiled by organizations such as PubChem,4 the U.S. National Cancer Institute, and ChemSpider5 contain millions of InChIs and InChIKeys, which allow sophisticated searching of these collections. PubChem provides InChI-based structure-search facilities (for both identical and similar structures),6 and ChemSpider offers both search facilities and web services enabling a variety of InChI and InChIKey conversions.7 The NCI Chemical Structure Lookup Service8 provides InChI-based search access to over 39 million chemical structures from over 80 different public and commercial data sources.

In the age of the computer, the IUPAC International Chemical Identifier is an essential component of the chemist’s armory of information tools, enabling location and manipulation of chemical data with unprecedented ease and precision.

References
1. www.iupac.org/inchi
2. wwmm.ch.cam.ac.uk/inchifaq/
3. Pure and Applied Chemistry, in preparation.
4. http://pubchem.ncbi.nlm.nih.gov
5. www.chemspider.com
6. http://pubchem.ncbi.nlm.nih.gov/search
7. www.chemspider.com/InChI.asmx
8. http://cholla.chemnavigator.com/cgi-bin/lookup/new/search


Alan McNaught, retired from RSC, is one of InChI’s fathers; with a broad expertise in publication and nomenclature, he has been involved in IUPAC activities for many years (including ICTNS, CPEP, and Div VIII) and with InChI since day one. Steve Heller, from NIST, is also a father of InChI, stimulating development and making the identifier known to the community.



Page last modified 6 January 2009.
Copyright © 2003-2009 International Union of Pure and Applied Chemistry.