Now that we understand the individual components of a DNA molecule, let's look at its overall structure in more depth.
DNA, or deoxyribonucleic acid, is the instruction kit of our cells, directing them to produce the proteins that control what our cells do and how they do it. DNA can be single-stranded (ssDNA) or double-stranded (dsDNA) and is composed of nucleotides linked together by covalent bonds. It differs from ribonucleic acid (RNA) in that its sugar is deoxyribose, a ribose that is missing its C2 OH.
Anatomically speaking, a single strand of DNA is broken into two major components: an outer backbone that holds individual nucleotides together, and a series of nitrogenous bases that point toward the center and pair up to hold multiple strands or more complex shapes together. The backbone terminates in two distinct ends: a 5′ phosphate group and a 3′ hydroxyl group.
When DNA is double-stranded, it assumes a double helix shape with the two strands arranged antiparallel to each other. This means that the strands twist around each other with one strand having its 5′ phosphate at the top and the other having its 3′ hydroxyl group at the top. The resulting helix has grooves in its structure that act as access points for enzymes and other cellular machinery.
With this big picture in view, let’s explore each of the individual components in more detail.
The DNA backbone is composed of a series of phosphodiester bonds (O-P-O) that link the 5′ phosphate of one nucleotide to the 3′ oxygen of another nucleotide’s sugar. Here the 3′ and 5′ refer to the numbered carbons of the sugar molecule and give DNA strands two distinct ends. On one end of a DNA strand there will be a free 5′ phosphate, and on the other a free 3′ OH.
Although a bit backward, it is standard to display a DNA sequence starting with its 5′ end and ending with its 3′ end. This notation is very similar to peptide notation, where peptides are presented from their N terminus to their C terminus. Since both are kind of backward (peptides alphabetically and DNA numerically), we can group the two notations together for ease of memorization.
Lastly, the phosphodiester bonds of the DNA backbone are covalent linkages, and as a result, enzymes are required to catalyze their formation and cleavage. These enzymes include DNA polymerase, which synthesizes new DNA strands, and endonucleases, which cut DNA backbones to help repair mismatched bases. As we explore replication and DNA repair, we will discuss many more enzymes involved with DNA and their specific functions.
DNA’s base pairs consist of nitrogenous bases that hydrogen bond to one another and hold together multiple strands of DNA. Alternatively, base pairs can form within a single strand of DNA and allow it to assume a non-linear shape such as a hairpin. Since these interactions are intermolecular forces rather than covalent bonds, complementary DNA strands will spontaneously base pair with other strands or with themselves.
Nucleotides are picky, though. They only pair with complementary bases that offer the matching arrangement of H-bond acceptor and H-bond donor spots.
This results in A pairing with T (or with U in RNA) and G pairing with C. As seen above, G-C base pairs have three hydrogen bonds between them while A-T base pairs only have two. It was first thought that this extra hydrogen bond lent significantly greater stability to DNA strands with a greater proportion of G-C pairs, but it turns out this is wrong. Instead, G-C pairs have much stronger pi-stacking interactions that account for the stability differences seen.
Mutations within a DNA strand can result in mismatches, where an A might erroneously pair with a C instead of its normal T. Thankfully, our cells are equipped with repair systems that fix base-pairing errors and restore DNA to its original state, avoiding mutations and the negative consequences they can lead to, such as cancer.
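To make the idea of a mismatch concrete, here is a minimal Python sketch that scans two aligned strands and reports where the bases fail to pair. The function name `find_mismatches` and the example sequences are hypothetical and chosen only for illustration. The sketch assumes the top strand is written 5′ to 3′ and the bottom strand 3′ to 5′, so that position i of one strand sits directly across from position i of the other.

```python
# Watson-Crick partners for the four DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(top, bottom):
    """Return the 0-based positions where the two strands do not base pair.

    Assumption: `top` is written 5'->3' and `bottom` is written 3'->5',
    so the strands are already aligned position by position.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    return [
        i
        for i, (x, y) in enumerate(zip(top.upper(), bottom.upper()))
        if PAIRS.get(x) != y
    ]


print(find_mismatches("ATGCCGTA", "TACGGCAT"))  # [] (every base is paired)
print(find_mismatches("ATGCCGTA", "CACGGCAT"))  # [0] (A sits across from C)
```

A real repair system does far more than locate the error, of course. This only shows how strictly the pairing rules let us spot one.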
Since base pairing is specific, we can determine the composition of a double-stranded DNA (dsDNA) molecule if we know the number or percentage of just one of the four bases present. For example, imagine we have a 220 base pair (bp) dsDNA molecule that contains 22 adenine nucleotides.
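From this small piece of information we can work out quite a lot, including the total number of nucleotides, the number of each base, the number of purines and pyrimidines, and the percent composition of the molecule.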
To determine the total number of nucleotides present, we have to realize that each base pair consists of two nucleotides paired together (A-T or G-C). So in total, our 220 bp dsDNA has 2 × 220 = 440 nucleotides.
From there, we know that 22 are A, so 22 must also be T to complete the A-T base pairs. This leaves 440 − 44 = 396 nucleotides unaccounted for, all of which must be G or C. Since there will always be a G for every C present in the DNA molecule, half of the 396 will be G and the other half will be C. So in summary, our dsDNA has 22 As, 22 Ts, 198 Gs, and 198 Cs. Adding up the As and Gs, we can see that there are 220 purines, so there must also be 220 pyrimidines.
We could then convert all of these into percentages to find the percent composition of our dsDNA, but we are going to start from scratch for extra practice. Since 22 over 440 is 0.05, our dsDNA is 5% As and, to complete the A-T base pairs, 5% Ts. This leaves 90% of the dsDNA composition unaccounted for, which must again be split evenly between the Gs and Cs. So our dsDNA is 5% As, 5% Ts, 45% Gs, and 45% Cs. If we add the purine and pyrimidine percentages up, we will discover that both are 50%.
Collectively, these relationships are referred to as Chargaff’s rules, which more formally state that in a dsDNA molecule the amount of A equals the amount of T and the amount of G equals the amount of C, so the number of purines and pyrimidines must always be equal.
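The arithmetic in the worked example can be sketched in a few lines of Python. This is only an illustration of the reasoning above. The function name `dsdna_composition` and its parameters are made up for this example, and it assumes a perfectly base-paired dsDNA molecule with no mismatches.

```python
def dsdna_composition(base_pairs, adenine_count):
    """Apply Chargaff's rules to a fully paired dsDNA molecule.

    Returns (total nucleotides, counts per base, percent per base).
    Assumption: every A is paired with a T and every G with a C.
    """
    total = 2 * base_pairs              # each base pair holds two nucleotides
    a = t = adenine_count               # every A needs a T partner
    remaining = total - a - t           # whatever is left must be G or C
    if remaining < 0:
        raise ValueError("more adenines than the molecule can hold")
    g = c = remaining // 2              # one G for every C
    counts = {"A": a, "T": t, "G": g, "C": c}
    percents = {base: 100 * n / total for base, n in counts.items()}
    return total, counts, percents


total, counts, percents = dsdna_composition(base_pairs=220, adenine_count=22)
print(total)     # 440
print(counts)    # {'A': 22, 'T': 22, 'G': 198, 'C': 198}
print(percents)  # {'A': 5.0, 'T': 5.0, 'G': 45.0, 'C': 45.0}

purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
print(purines, pyrimidines)  # 220 220
```

Changing `adenine_count` and rerunning is a quick way to check practice problems of this type.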
Last but not least, DNA is also stabilized by pi-stacking interactions between its aromatic nitrogenous bases. Pi-stacking interactions are a bit difficult to classify, but they result from delocalized electrons that create semi-fixed negative regions on a nitrogenous base. These negative regions then interact with semi-fixed positive regions on neighboring nitrogenous bases, resulting in London-dispersion-like intermolecular forces.
Since G-C base pairs form better pi-stacking interactions, they lead to greater dsDNA stability and are often found in abundance in the DNA of thermophiles.
The DNA of these thermophiles is able to resist denaturation, which would normally result in the complementary DNA strands breaking apart. For the same reason, G-C rich primers are often used in PCR to make sure the high temperatures used throughout the procedure don’t dislodge primers from their complementary strands.
Typically, DNA denaturation results from increased temperatures, but alkaline pH and certain chemical denaturants such as urea can also disrupt the hydrogen bonds that hold strands together. This process closely parallels protein denaturation, except that a DNA duplex is held together by hydrogen bonding and pi-stacking, while proteins rely on H-bonds, disulfide linkages, ionic interactions, and hydrophobic interactions.
Lastly, none of the covalent phosphodiester bonds are broken during denaturation, in the same way that protein denaturation doesn’t affect the peptide bonds of a protein’s primary structure.
Scientists can quantify the overall stability of a DNA molecule by determining its melting temperature (Tm). Unlike melting temperature in organic chemistry, which is concerned with phase changes, the Tm of DNA is concerned with denaturation. Specifically, the Tm of DNA is the temperature at which 50% of a DNA sample is denatured.
Since more stable DNA takes more energy to denature, a higher Tm indicates greater DNA stability. If we think back to our thermophile from above, its high G-C content and robust pi-stacking interactions give its DNA a much higher Tm than our DNA, which by comparison has far fewer G-C base pairs.
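To see how G-C content pushes Tm upward, we can sketch a rough estimate in Python. The formula used here is the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). It is not part of the discussion above and is only a rule of thumb for short oligonucleotides. Real melting temperatures also depend on salt concentration, strand concentration, and sequence context. The function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of the bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm estimate in degrees Celsius using the Wallace rule.

    Assumption: a short oligonucleotide under typical lab conditions.
    Treat this as a rule of thumb, not a precise prediction.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for oligo in ("ATATATAT", "ATGCCGTA", "GGGCCCGG"):
    print(oligo, gc_fraction(oligo), wallace_tm(oligo))
# ATATATAT 0.0 16
# ATGCCGTA 0.5 24
# GGGCCCGG 1.0 32
```

All three oligos are the same length, yet the estimate doubles as the G-C fraction goes from 0 to 1, which mirrors the trend described for thermophile DNA.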
The reverse of denaturation is annealing, where two single strands of DNA come together to form a double-stranded DNA helix. In order for this to occur, the strands must be complementary to one another. For example, 5′-ATGCCGTA-3′ would need a partner strand that contains the complementary bases in order to anneal.
Since dsDNA is composed of two antiparallel strands, the 3′ end of the partner strand must line up with the 5′ end of the strand given above. So 3′-TACGGCAT-5′ would be complementary to the above strand, and the two strands would spontaneously anneal under non-denaturing conditions.
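Finding a complementary strand is mechanical enough to sketch in Python. The helper names below (`complement`, `reverse_complement`, `can_anneal`) are hypothetical, and the sketch assumes plain A/T/G/C sequences with perfect Watson-Crick pairing. Real annealing also tolerates some mismatches depending on conditions.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(seq):
    """Partner bases in the same left-to-right order.

    If `seq` is written 5'->3', the result reads 3'->5'.
    """
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Partner strand rewritten in the standard 5'->3' direction."""
    return complement(seq)[::-1]


def can_anneal(strand_a, strand_b):
    """True if two strands, both written 5'->3', are fully complementary."""
    return reverse_complement(strand_a) == strand_b.upper()


print(complement("ATGCCGTA"))      # TACGGCAT (reads 3'->5')
print(complement("ATGC"))          # TACG     (reads 3'->5')
print(reverse_complement("ATGC"))  # GCAT     (reads 5'->3')
print(can_anneal("ATGC", "GCAT"))  # True
print(can_anneal("ATGC", "TACG"))  # False (right bases, wrong direction)
```

The example strand 5′-ATGCCGTA-3′ happens to spell the same letters forward and backward, so its complement looks identical in either direction. The shorter ATGC example makes the antiparallel flip visible.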