Deoxyribonucleic acid (DNA) is a natural polymer that encodes the genetic information required for the growth, development, and reproduction of an organism. Found in nearly all living cells, it consists of chains of units called nucleotides. Each nucleotide unit contains three components: the sugar deoxyribose, a phosphate group, and a nitrogen-containing ring structure called a base. There are four different bases in DNA: adenine, cytosine, guanine, and thymine.
DNA molecules are very long and threadlike. They consist of two polymeric strands twisted about each other into a spiral shape known as a double helix, which resembles a twisted ladder. In eukaryotic cells, DNA is found within the cell nucleus in the chromosomes, which are extremely condensed structures in which DNA is associated with proteins. Each species has a characteristic number of chromosomes in its cells. In humans, most cells contain 46 chromosomes, while egg and sperm cells contain only 23. The total genetic information in a cell is called its genome. In prokaryotic cells such as bacteria, DNA is not enclosed within a nuclear membrane but is instead dispersed in the interior substance of the cell (the cytoplasm).
The fundamental units of heredity are genes. A gene is a segment of a DNA molecule that encodes the information necessary to make a specific protein. The many proteins encoded by DNA contribute to a cell’s structure and chemical activities.
DNA not only encodes the “blueprints” for cellular proteins but also the instructions for when and where they will be made. For example, the oxygen carrier hemoglobin is made in red blood cells but not in nerve cells, though both contain the same total genetic content. Thus, DNA also contains the information necessary for regulating how its genetic messages are used.
Initial analyses of the human genome sequence indicated that a human cell contains approximately 30,000 genes, far fewer than the previously estimated 50,000–100,000; later estimates put the number of protein-coding genes closer to 20,000. Except in the case of identical twins, a comparison of the genes from different individuals always reveals a number of differences, so each person is genetically unique. This is the basis of DNA fingerprinting, a forensic procedure used to match DNA collected from a crime scene with that of a suspect, and of the use of DNA to establish the biological parentage of a child.
Genes direct the function of all organs and systems in the body. In some cases, a defect in a single gene can cause a genetic disorder, because the protein encoded by the defective gene is abnormal. The abnormal hemoglobin produced by people with sickle cell anemia is an example. Defects in certain genes that regulate growth and development can turn them into oncogenes, which give rise to cancer. Thus, defects in DNA can affect either of the two kinds of genetic information it carries: messages directing the manufacture of proteins, and information regulating the expression (carrying out) of these messages.
Before the nucleic acids were discovered, the Austrian monk Gregor Mendel (1822–1884) worked out the laws of inheritance by selectively breeding pea plants. As early as 1865, he proposed that some then-undefined factors from each parent were responsible for the inheritance of certain characteristics in plants. The Swiss biochemist Friedrich Miescher (1844–1895) discovered the nucleic acids in 1869 in nuclei isolated from pus cells scraped from surgical bandages. However, research on the chemical structure of nucleic acids lagged until new analytical techniques became available in the mid-twentieth century.
Despite knowledge of the chemical structure of nucleotides and how they were linked together to form DNA, the possibility that DNA was the genetic material was regarded as unlikely. As late as the mid-twentieth century, proteins were thought to be the molecules of heredity because they appeared to be the only cellular components diverse enough to account for the large variety of genes. In 1944, Oswald Avery (1877–1955) and his colleagues showed that non-pathogenic strains of pneumococcus, a bacterium that causes pneumonia, could become pathogenic (disease-causing) if treated with a DNA-containing extract from heat-killed pathogenic strains. Based on this evidence, Avery concluded that DNA was the genetic material. However, DNA did not become widely accepted as the bearer of genetic information until 1952, when other workers reported that DNA, not protein, enters a bacterial cell infected by a virus. This showed that the genetic material of the virus was contained in its DNA, confirming Avery’s hypothesis.
In 1953, James Watson (1928–) and Francis Crick (1916–2004) proposed their double helix model for the three-dimensional structure of DNA. They correctly deduced that the genetic information was encoded in the sequence of nucleotides in the molecule. Their landmark discovery opened the era of molecular genetics. Eight years later, investigators began to crack the genetic code, eventually finding that specific trinucleotide sequences (sequences of three nucleotides) code for each of the 20 amino acids, the building blocks of proteins.
In 1970, scientists found that bacteria contain enzymes that recognize a particular sequence of 4–8 nucleotides and always cut DNA at or near that sequence, yielding specific (rather than random), consistently reproducible DNA fragments. These enzymes were dubbed restriction enzymes. Two years later, it was found that the bacterial enzyme DNA ligase could be used to rejoin these fragments. This permitted scientists to construct what is termed recombinant DNA: DNA composed of segments from two different sources, even from different organisms. With these tools available, genetic engineering became possible and biotechnology began.
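The cutting behavior of a restriction enzyme can be sketched in Python. This is a minimal illustration that assumes the enzyme EcoRI, which recognizes the six-nucleotide sequence GAATTC and cuts between the G and the first A. The example sequence and the helper names `find_sites` and `digest` are hypothetical, and only one strand is modeled, although real enzymes cut both strands.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start of every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut one strand cut_offset bases into each recognition site.

    The same input always yields the same fragments, which is what makes
    restriction fragments reproducible rather than random.
    """
    fragments = []
    previous = 0
    for pos in find_sites(seq, site):
        cut = pos + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


ECORI_SITE = "GAATTC"  # assumed example: EcoRI cuts G^AATTC
ECORI_CUT = 1

dna = "TTGAATTCAAGGCCGAATTCTT"  # hypothetical sequence
print(find_sites(dna, ECORI_SITE))         # [2, 14]
print(digest(dna, ECORI_SITE, ECORI_CUT))  # ['TTG', 'AATTCAAGGCCG', 'AATTCTT']
```

In this single-strand picture, rejoining fragments corresponds to concatenating them. Joining fragments that came from two different sequences would give a recombinant sequence.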
By 1984, the development of DNA fingerprinting allowed forensic scientists to compare DNA samples from a crime scene with those of suspects. The first conviction using this technique came in 1987. Three years later, doctors made the first attempt to use gene therapy, treating a patient unable to produce a vital immune protein. This technique involves inserting a portion of DNA into a patient’s cells to correct a deficiency in a particular function. The Human Genome Project also began in 1990. Its aim was to determine the nucleotide sequence of the entire human genome, which consists of about three billion base pairs. In 2001, researchers published a draft sequence of the human genome, and the project was declared complete in 2003.
Deoxyribose, the sugar component in each nucleotide, is so named because it has one fewer oxygen atom than ribose, the sugar present in ribonucleic acid (RNA). Deoxyribose contains five carbon atoms, four of which lie in a ring along with one oxygen atom. The fifth carbon atom is linked to a specific carbon atom in the ring. A phosphate group is always linked to deoxyribose through a chemical bond between an oxygen atom in the phosphate group and the fifth carbon atom of deoxyribose. The base is attached to deoxyribose by a chemical bond between a nitrogen atom in the base and a specific carbon atom in the deoxyribose ring.
The nucleotide components of DNA are connected to form a linear polymer in a very specific way. A phosphate group always connects the sugar component of one nucleotide with the sugar component of the next nucleotide in the chain. Consequently, the first nucleotide bears an unattached phosphate group, and the last nucleotide has a free hydroxyl group. The two ends of a DNA strand are therefore chemically different (they are conventionally called the 5′ and 3′ ends), so each strand has a direction. This directionality plays an important role in the replication of DNA.
DNA molecules contain two polymer chains, or strands, of nucleotides and so are said to be double-stranded. (In contrast, RNA is typically single-stranded.) Their shape resembles a spiral staircase in which the alternating sugar and phosphate groups of the nucleotides form the two sidepieces. The steps consist of pairs of bases, each attached to a sugar on its own strand. The bases are held together by weak attractive forces called hydrogen bonds. The two strands in DNA are antiparallel. One strand runs in one direction (first to last nucleotide from top to bottom), and the other runs in the opposite direction (first to last nucleotide from bottom to top).
Because the sugar and phosphate components which make up the sidepieces are always attached in the same way, the same alternating phosphate-sugar sequence repeats over and over again. The bases attached to each sugar may be one of four possible types. Because of the geometry of the DNA molecule, the only possible base pairs that will fit are adenine (A) paired with thymine (T), and cytosine (C) paired with guanine (G).
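The pairing rules and the antiparallel arrangement can be expressed in a few lines of Python. In this minimal sketch, both strands are written from first to last nucleotide. Because the partner strand runs in the opposite direction, it is the complement read in reverse, often called the reverse complement. The function names are illustrative and do not come from any particular library.

```python
# Allowed base pairs: A with T, C with G
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Partner of each base, listed in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def partner_strand(strand: str) -> str:
    """The opposite strand, also written first to last nucleotide.

    The strands are antiparallel, so the partner's first nucleotide pairs
    with this strand's last one. The complement must therefore be reversed.
    """
    return complement(strand)[::-1]


def can_pair(x: str, y: str) -> bool:
    """True only for the A-T and C-G combinations."""
    return PAIR[x] == y


top = "ATGCGT"
print(complement(top))      # TACGCA
print(partner_strand(top))  # ACGCAT
```

Taking the partner of the partner returns the original strand, which reflects the fact that each strand fully determines the other.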
The DNA in our cells is a masterpiece of packing. The double helix coils around protein cores to form nucleosomes. These DNA-protein structures resemble beads on a string. Flexible regions between nucleosomes allow these structures to be wound around themselves to produce an even more compact fiber. The fibers can then be coiled for even further compactness. Ultimately, DNA is packed into the highly condensed chromosomes. If the DNA in a human cell were stretched out, it would be approximately 6 ft (about 2 m) long. Yet if all 46 chromosomes were laid end to end, their total length would be only about eight-thousandths of an inch (about 0.2 mm). This means that DNA in chromosomes is condensed roughly 10,000 times relative to the extended double helix. Why all this packing? The likely answer is that the fragile DNA molecule would get broken in its extended form. Also, without this painstaking compression, the cell might be mired in its own DNA.
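The packing ratio in the paragraph above comes from dividing one length by the other. The sketch below redoes that arithmetic in both units. It uses only the text's approximate figures and the exact conversion 1 in = 0.0254 m, and both results land near the quoted 10,000-fold.

```python
INCHES_PER_FOOT = 12
METERS_PER_INCH = 0.0254  # exact conversion factor

dna_length_in = 6 * INCHES_PER_FOOT  # about 6 ft of DNA per cell
chromosome_length_in = 0.008         # about eight-thousandths of an inch

ratio_imperial = dna_length_in / chromosome_length_in
ratio_metric = 2.0 / (chromosome_length_in * METERS_PER_INCH)  # about 2 m of DNA

print(f"{ratio_imperial:,.0f}")  # 9,000
print(f"{ratio_metric:,.0f}")    # 9,843
```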
DNA directs a cell’s activities by specifying the structures of its proteins and by regulating which proteins are produced, in what amounts, and where. In eukaryotic cells, it does so without ever leaving the nucleus. Each human cell contains about 6 ft (2 m) of highly condensed DNA, which encodes some 30,000 genes. When a particular protein is to be made, the DNA segment corresponding to the gene for that protein acts as a template (pattern) for the synthesis of an RNA molecule in a process known as transcription. This messenger RNA molecule travels from the nucleus to the cytoplasm, where it in turn acts as the template for building the protein on the cell’s protein assembly apparatus. This latter process is known as translation and requires an adaptor molecule, transfer RNA, which translates the genetic code of DNA into the language of proteins.
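Transcription can be sketched as reading the template strand and pairing each DNA base with its RNA partner, with uracil (U) taking the place of thymine. This minimal illustration assumes both sequences are written from first to last nucleotide, so the antiparallel RNA copy comes out as the reversed complement of the template. The example sequence is hypothetical.

```python
# DNA template base -> RNA base that pairs with it (RNA uses U instead of T)
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Return the messenger RNA built on a DNA template strand.

    The RNA is antiparallel to its template. To write the RNA from first to
    last nucleotide, the template is read from its last nucleotide back to its first.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


template = "TTACTCCAT"  # hypothetical template strand
mrna = transcribe(template)
print(mrna)  # AUGGAGUAA
```

The result matches the opposite (non-template) DNA strand, ATGGAGTAA, with every T replaced by U. For this reason, that strand is often called the coding strand.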
Before a cell divides, its DNA must be copied so that each daughter cell will have a complete set of genetic instructions. The structure of DNA is ideally suited to this process. The two intertwined strands unwind and expose their bases, which then pair with the bases of free nucleotides present in the cell. Because of the base-pairing rules, the sequence of bases along each original strand determines the sequence of bases in the newly forming complementary strand. An enzyme then joins the free nucleotides to complete the new strand. Each of the two resulting DNA molecules is identical to the original, so the cell can pass along an exact copy of its DNA to each daughter cell.
Sex cells, the eggs and sperm, contain half as many chromosomes as other cells. When egg and sperm fuse during fertilization, they form the first cell of a new individual, which has the complete complement of DNA: 46 chromosomes. Each cell in the new person (except the sex cells) carries DNA identical to that in the fertilized egg cell. In this way the DNA of both parents is passed from one generation to the next. Thus, DNA plays a crucial role in the propagation of life.
Replication of DNA
DNA replication, the process by which the double-stranded DNA molecule reproduces itself, is complicated even in the simplest organisms. Making new DNA from old requires the interaction of a number of cellular components. It is also rigidly controlled to ensure the accuracy of the copy, on which the very life of the organism depends, and this control adds several verification steps to the procedure. Though the details vary from organism to organism, DNA replication follows certain rules that are universal.
DNA replication (duplication, or copying) is always semi-conservative. This means that during DNA replication the two strands of the parent molecule unwind, and each becomes a template for the synthesis of the complementary strand of a daughter molecule. As a result, each daughter molecule contains one new strand and one old strand from the parent molecule. The replication of DNA always requires a template, an intact strand from the parent molecule. This strand determines the sequence of nucleotides on the new strand because of the A-with-T and C-with-G base-pairing requirement.
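Semi-conservative replication can be modeled by tagging each strand with the round of replication in which it was made. This sketch uses a hypothetical `Strand` type, with strands written from first to last nucleotide. Each parent strand is kept and paired with a newly built complementary partner, so every daughter molecule holds one old strand and one new strand.

```python
from dataclasses import dataclass

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass(frozen=True)
class Strand:
    seq: str         # written first to last nucleotide
    generation: int  # replication round that made this strand (0 = original parent)


def build_partner(template: Strand, generation: int) -> Strand:
    """Complementary, antiparallel strand whose sequence the template dictates."""
    return Strand("".join(PAIR[b] for b in reversed(template.seq)), generation)


def replicate(duplex: tuple[Strand, Strand], generation: int) -> list[tuple[Strand, Strand]]:
    """Semi-conservative copying: each old strand is kept and gets a new partner."""
    return [(old, build_partner(old, generation)) for old in duplex]


top = Strand("ATGCGT", 0)
parent = (top, build_partner(top, 0))
daughters = replicate(parent, 1)
for old, new in daughters:
    print(old.seq, old.generation, "|", new.seq, new.generation)
# ATGCGT 0 | ACGCAT 1
# ACGCAT 0 | ATGCGT 1
```

Replicating both daughters again gives four molecules. Only two of these still carry an original (generation 0) strand.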
Replication begins at a specific site called the replication origin, when the enzyme DNA helicase binds to a portion of the double-stranded helix and “melts” (breaks) the hydrogen bonds between base pairs. This unwinds the helix to form a replication fork consisting of two separated strands, each serving as a template. Specific proteins then bind to these single strands to prevent them from re-pairing. Another enzyme, DNA polymerase, then assembles the daughter strands from a pool of free nucleotide units present in the cell in an “activated” form (as deoxynucleoside triphosphates).
High fidelity in the copying of DNA is vital to the organism, and remarkably few errors survive: roughly one for every billion to ten billion nucleotides copied. This high fidelity results in large part from the fact that DNA polymerase is a “self-editing” enzyme. If the nucleotide just added to the growing chain is not complementary to the nucleotide opposite it on the template, the two cannot pair properly. DNA polymerase then clips off the unpaired nucleotide and replaces it with the correct one.
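The self-editing step can be illustrated with a toy model. This sketch does not model the enzyme's chemistry. Instead, it assumes a hypothetical list `incoming` that stands in for the nucleotide picked up at each position. Each nucleotide is checked against the pairing rules, and any that do not pair are clipped off and replaced. The new strand is written base by base alongside the template.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_with_proofreading(template: str, incoming: list[str]) -> tuple[str, int]:
    """Add one nucleotide per template base, excising and replacing mismatches.

    Returns the new strand (aligned base by base with the template) and the
    number of corrections made.
    """
    if len(incoming) != len(template):
        raise ValueError("need exactly one incoming nucleotide per template base")
    new_strand: list[str] = []
    corrections = 0
    for template_base, candidate in zip(template, incoming):
        new_strand.append(candidate)                # add to the end of the chain
        if candidate != PAIR[template_base]:        # mismatch: cannot pair properly
            new_strand.pop()                        # clip off the unpaired nucleotide
            new_strand.append(PAIR[template_base])  # insert the correct one
            corrections += 1
    return "".join(new_strand), corrections


strand, fixed = extend_with_proofreading("ATGC", ["T", "A", "A", "G"])
print(strand, fixed)  # TACG 1
```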
Occasionally, errors made during DNA replication escape correction and are passed along to daughter cells. Such errors are called mutations. They can have serious consequences because they can cause the wrong amino acid to be inserted into a protein. For example, in the gene encoding the β-globin chain of hemoglobin, the substitution of a T for an A changes the codon GAG to GTG. This replaces the amino acid glutamic acid with valine and results in sickle cell anemia. Understanding the significance of such mutations requires knowledge of the genetic code.
The genetic code
Genetic information is stored as nucleotide sequences in DNA (or RNA) molecules. This sequence specifies the identity and position of the amino acids in a particular protein. Amino acids are the building blocks of proteins in the same way that nucleotides are the building blocks of DNA. However, though there are only four possible bases in DNA (or RNA), there are 20 possible amino acids in proteins. The genetic code is a sort of “bilingual dictionary” that translates the language of DNA into the language of proteins. In the genetic code, the letters are the four bases A, C, G, and T (or U instead of T in RNA). Taken one at a time, four bases can specify only four amino acids. A sequence of two bases is also insufficient, because it allows only 16 (4 × 4) combinations, fewer than the 20 amino acids in proteins. A sequence of three bases, however, allows 64 (4 × 4 × 4) combinations, enough to code for all 20 amino acids. Each of these “words,” called codons, consists of three letters, so the genetic code is often referred to as the triplet code.
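The counting argument can be checked by enumerating every possible "word" of one, two, and three bases. This short sketch uses Python's standard `itertools.product`. The constant names are illustrative.

```python
from itertools import product

BASES = "ACGT"
AMINO_ACID_COUNT = 20

for length in (1, 2, 3):
    words = ["".join(p) for p in product(BASES, repeat=length)]
    verdict = "enough" if len(words) >= AMINO_ACID_COUNT else "too few"
    print(length, len(words), verdict)
# 1 4 too few
# 2 16 too few
# 3 64 enough

# Shortest word length that gives at least 20 distinct words
shortest = next(n for n in range(1, 10) if len(BASES) ** n >= AMINO_ACID_COUNT)
print(shortest)  # 3
```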
Most codons specify a particular amino acid. Three (TAA, TAG, and TGA in DNA) instead signal the end of a protein. Because there are 64 possible codons and only 20 amino acids, several different codons can specify the same amino acid, so the genetic code is said to be degenerate. However, the code is unambiguous, because no codon specifies more than one amino acid.
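Degeneracy and unambiguity can both be seen by building the standard codon table as a Python dictionary. The sketch below writes the table compactly as a 64-character string, a common shorthand in which codons are listed with each position cycling through T, C, A, G. Amino acids are given by their one-letter abbreviations, and `*` marks a stop codon. A dictionary key can map to only one value, which mirrors the code being unambiguous.

```python
from collections import Counter
from itertools import product

# Standard genetic code (DNA letters). Codons are ordered with the first, second
# and third positions each cycling through T, C, A, G. One-letter amino acid
# names are used (e.g. L = leucine, M = methionine); "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

per_residue = Counter(CODE.values())
amino_acids = sorted(aa for aa in per_residue if aa != "*")

print(len(CODE), len(amino_acids))                      # 64 20
print(per_residue["L"], per_residue["M"])               # 6 1
print(sorted(c for c, aa in CODE.items() if aa == "*"))  # ['TAA', 'TAG', 'TGA']
```

Leucine has six codons and methionine only one, so degeneracy is uneven across amino acids.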
Because DNA in eukaryotic cells never leaves the nucleus, the protein-building machinery in the cytoplasm cannot use the information it stores directly. Instead, a DNA sequence must first be copied into a messenger RNA molecule, which carries the genetic information from the nucleus to protein assembly sites in the cytoplasm. There it serves as the template for protein construction. The nucleotide triplets in messenger RNA are also referred to as codons.
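Reading messenger RNA codon by codon can be sketched as follows. This minimal version assumes that reading starts at the first nucleotide and stops at the first stop codon, and it ignores how the starting point is actually chosen in the cell. The short mRNA is hypothetical. Only its GAG-to-GUG change mirrors the sickle cell substitution, in which glutamic acid (E) becomes valine (V).

```python
from itertools import product

# Standard genetic code in RNA letters; one-letter amino acid names, "*" = stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(seq: str, position: int, new_base: str) -> str:
    """Return seq with the base at a 0-based position replaced."""
    return seq[:position] + new_base + seq[position + 1:]


normal = "AUGGAGGAGAAGUAA"               # hypothetical mRNA
mutant = point_mutation(normal, 4, "U")  # GAG -> GUG in the second codon
print(translate(normal), translate(mutant))  # MEEK MVEK
```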
Expression of genetic information
Genetic information flows from DNA to RNA to protein. Ultimately, the linear sequence of nucleotides in DNA directs the production of a protein molecule with a characteristic three-dimensional structure essential to its proper function. Initially, information is transcribed from DNA to RNA. The information in the resulting messenger RNA is then translated from RNA into protein by small transfer RNA molecules.
In some exceptional cases, the flow of genetic information from DNA to RNA is reversed. In retroviruses, such as the AIDS virus (HIV), RNA is the hereditary material. An enzyme known as reverse transcriptase makes a DNA copy using the virus’s RNA as a template. In still other viruses that use RNA as the hereditary material, DNA is not involved in the flow of information at all.
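Reverse transcription applies the same pairing idea in the other direction, from RNA to DNA. Each RNA base is matched with its DNA partner, so uracil (U) in the template pairs with adenine (A), and adenine in the template pairs with thymine (T). The following is a minimal sketch in which both sequences are written from first to last nucleotide; the example RNA is hypothetical.

```python
# RNA template base -> DNA base that pairs with it
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) copy of an RNA template.

    The copy is antiparallel to the template, so the template is read in reverse.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


viral_rna = "AUGGAGUAA"  # hypothetical RNA sequence
print(reverse_transcribe(viral_rna))  # TTACTCCAT
```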
Most cells in the body contain the same DNA as the fertilized egg. (Exceptions include the sex cells, which contain only half of the normal complement of DNA, and red blood cells, which lose their nucleus when fully developed.) Some so-called housekeeping genes are expressed in all cells because they are involved in the fundamental processes required for normal function. (A gene is said to be expressed when its product, the protein it codes for, is actively produced in a cell.) For example, since all cells require ribosomes, structures that function as protein assembly lines, the genes for ribosomal proteins and ribosomal RNA are expressed in all cells. Other genes are expressed only in certain cell types, such as the genes for antibodies in certain cells of the immune system. Some are expressed only at certain times during development. How is it that some cells express certain genes while others do not, even though all contain the same DNA? A complete answer to this question is still being worked out. However, the main mechanism is control of the start of transcription. This is accomplished by the interaction of proteins called transcription factors with DNA sequences near the gene. By binding to these sequences, transcription factors may turn a gene on or off.
Gene expression can also be controlled by changing the rate of messenger RNA synthesis or by altering the stability of the messenger RNA. The protein product itself may be altered, as may its transport or stability. Finally, gene expression can be altered by DNA rearrangements. Such programmed reshuffling of DNA is how immune cells generate their huge assortment of antibody proteins.
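The transcription-factor mechanism can be pictured with a deliberately simplified model. In this model, a gene is transcribed only if the cell contains the transcription factor that binds the regulatory sequence near that gene. Every gene, factor, and cell-type name below is a hypothetical placeholder. Real genes are usually controlled by combinations of several factors.

```python
# Hypothetical placeholders throughout. None means the gene relies only on
# factors present in every cell (a simplification for housekeeping genes).
GENE_REGULATOR = {
    "ribosomal_protein": None,
    "antibody": "immune_factor",
    "hemoglobin": "red_cell_factor",
}

CELL_FACTORS = {
    "immune_cell": {"immune_factor"},
    "red_cell_precursor": {"red_cell_factor"},
    "nerve_cell": set(),
}


def expressed_genes(cell_type: str) -> set[str]:
    """Genes switched on in a cell type: those whose required factor is present."""
    factors = CELL_FACTORS[cell_type]
    return {
        gene
        for gene, needed in GENE_REGULATOR.items()
        if needed is None or needed in factors
    }


for cell in CELL_FACTORS:
    print(cell, sorted(expressed_genes(cell)))
# immune_cell ['antibody', 'ribosomal_protein']
# red_cell_precursor ['hemoglobin', 'ribosomal_protein']
# nerve_cell ['ribosomal_protein']
```

In this model, all three cell types carry the same gene table, just as all cells carry the same DNA. Only the set of factors present differs, and that alone decides which genes are switched on.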
Genetic engineering and recombinant DNA
Cells that contain the same recombinant DNA fragment are called clones. A clone harboring a recombinant DNA molecule that contains a specific gene can be isolated and identified by a number of techniques, depending on the particular experiment. Recombinant DNA molecules can also be introduced into rapidly growing microorganisms, such as bacteria or yeast, to produce large quantities of medically or commercially important proteins normally present only in scant amounts in the cell. For example, human insulin and interferon have been produced in this manner.
A technique developed in the 1980s permits analysis of very small samples of DNA without repeated cloning, which is laborious. Known as the polymerase chain reaction (PCR), this technique “amplifies” a particular fragment of DNA through repeated cycles of synthesis using the enzyme DNA polymerase, with each cycle roughly doubling the number of copies. This method can increase the amount of the desired DNA fragment a millionfold or more.
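The millionfold figure follows from repeated doubling. The sketch below assumes an idealized reaction in which every cycle doubles the target fragment. The `efficiency` parameter is an added assumption for illustrating less-than-perfect cycles. Under ideal doubling, about 20 cycles give a millionfold increase, since 2 raised to the 20th power is 1,048,576.

```python
def copies_after(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Copies of the target after a number of cycles.

    efficiency = 1.0 means every copy is duplicated in every cycle (ideal doubling).
    """
    return start * (1 + efficiency) ** cycles


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Fewest cycles needed to multiply the starting amount by at least `fold`."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cycles = 0
    while (1 + efficiency) ** cycles < fold:
        cycles += 1
    return cycles


print(copies_after(20))           # 1048576.0
print(cycles_for_fold(1e6))       # 20
print(cycles_for_fold(1e6, 0.9))  # 22
```

Lowering the assumed efficiency shows why a few extra cycles are needed when each cycle falls short of perfect doubling.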