Depiction of the adenine-thymine Watson-Crick base pair.

Base pairs (unit: bp), which form between specific nucleobases (also termed nitrogenous bases), are the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, Watson-Crick base pairs (guanine-cytosine and adenine-thymine) allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence.[1] The complementary nature of this based-paired structure provides a backup copy of all genetic information encoded within double-stranded DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA, and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base pairing patterns that identify particular regulatory regions of genes.

Intramolecular base pairs can occur within single-stranded nucleic acids. This is particularly important in RNA molecules (e.g., transfer RNA), where Watson-Crick base pairs (G-C and A-U) permit the formation of short double-stranded helices, and a wide variety of non-Watson-Crick interactions (e.g., G-U or A-A) allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA (tRNA) and messenger RNA (mRNA) forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA becoming translated into the amino acid sequence of proteins.

The size of an individual genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion bases long and to contain 20,000–25,000 distinct protein-coding genes.[2][3][4] A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.[5] The total amount of related DNA base pairs on Earth is estimated at 5.0 x 1037, and weighs 50 billion tonnes.[6] In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC (trillion tons of carbon).[7]

Hydrogen bonding and stability

Top, a GC base pair with three hydrogen bonds. Bottom, an AT base pair with two hydrogen bonds. Non-covalent hydrogen bonds between the pairs are shown as dashed lines.

Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content, but, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly, and stabilization is mainly due to stacking interactions.[8]

The larger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine-purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. Purine-pyrimidine base pairing of AT or GC or UA (in RNA) results in proper duplex structure. The only other purine-pyrimidine pairings would be AC and GT and UG (in RNA); these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).

Paired DNA and RNA molecules are comparatively stable at room temperature but the two nucleotide strands will separate above a Thermus thermophilus are particularly GC-rich. On the converse, regions of a genome that need to separate frequently — for example, the promoter regions for often-transcribed genes — are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR reactions.


The following DNA sequences illustrate pair double-stranded patterns. By convention, the top strand is written from the 5' end to the 3' end; thus, the bottom strand is written 3' to 5'.

A base-paired DNA sequence:
The corresponding RNA sequence, in which uracil is substituted for thymine where uracil takes its place in the RNA strand:

Base analogs and intercalators

Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form.

Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.

Unnatural base pair (UBP)

An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two base pairs found in nature, A-T (adenine - thymine) and G-C (guanine - cytosine). A few research groups have been searching for a third base pair for DNA, including teams led by Steven A. Benner, Philippe Marliere, Floyd Romesberg and Ichiro Hirao.[9] Some new base pairs have been reported.[10][11][12]

In 1989 Steven Benner, then at the Swiss Federal Institute of Technology in Zurich, and his team led with modified forms of cytosine and guanine into DNA molecules in vitro.[13] The nucleotides, which encoded RNA and proteins, were successfully replicated in vitro. Since then, Benner's team has been trying to engineer cells that can make foreign bases from scratch, obviating the need for a feedstock.[14]

In 2002, Ichiro Hirao’s group in Japan developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in transcription and translation, for the site-specific incorporation of non-standard amino acids into proteins.[15] In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription.[16] Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) was discovered as a high fidelity pair in PCR amplification.[17][18] In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection (SELEX) and demonstrated the genetic alphabet expansion significantly augment DNA aptamer affinities to target proteins.[19]

In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the [21]

In 2014 the same team from the Scripps Research Institute reported that they synthesized a stretch of circular DNA known as a [21][22] Romesberg said he and his colleagues created 300 variants to refine the design of nucleotides that would be stable enough and would be replicated as easily as the natural ones when the cells divide. This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter which efficiently imports the triphosphates of both d5SICSTP and dNaMTP into E. coli bacteria.[21] Then, the natural bacterial replication pathways use them to accurately replicate a plasmid containing d5SICS–dNaM. Other researchers were surprised that the bacteria replicated these human-made DNA subunits.[23]

The successful incorporation of a third base pair is a significant breakthrough toward the goal of greatly expanding the number of proteins.[9] The artificial strings of DNA do not encode for anything yet, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses.[24] Experts said the synthetic DNA incorporating the unnatural base pair raises the possibility of life forms based on a different DNA code.[23][24]

Length measurements

The following abbreviations are commonly used to describe the length of a D/RNA molecule:

  • bp = base pair(s)— one bp corresponds to approximately 3.4 Å (340 pm) of length along the strand, and to roughly 618 or 643 daltons for DNA and RNA respectively.
  • kb (= kbp) = kilo base pairs = 1,000 bp
  • Mb (= Mbp) = mega base pairs = 1,000,000 bp
  • Gb = giga base pairs = 1,000,000,000 bp.

For case of single-stranded DNA/RNA units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), as they are not paired. For distinction between units of computer storage and bases kbp, Mbp, Gbp, etc. may be used for base pairs.

The [25][26]

See also


  1. ^
  2. ^
  3. ^
  4. ^
  5. ^
  6. ^
  7. ^
  8. ^ Peter Yakovchuk, Ekaterina Protozanova and Maxim D. Frank-Kamenetskii. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Research 2006 34(2):564-574.
  9. ^ a b c
  10. ^
  11. ^
  12. ^
  13. ^
  14. ^ a b
  15. ^
  16. ^
  17. ^ Kimoto, M. et al. (2009) An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules. Nucleic acids Res. 37, e14
  18. ^
  19. ^
  20. ^ a b
  21. ^ a b c d
  22. ^
  23. ^ a b
  24. ^ a b
  25. ^
  26. ^

Further reading

  • (See esp. ch. 6 and 9)

External links

  • DAN—webserver version of the EMBOSS tool for calculating melting temperatures