Next: Just say no! Up: Consensus Sequence Zen Previous: Missing the trees in

Flipping the light on to see an unseen world.

Two recently published examples demonstrate some of the interesting biology that one can miss by using a consensus sequence. The first example is the RepA binding site (Fig. 5). The sine wave over the logo represents the twist of B-form DNA. The crests of the wave represent the protein facing a major groove and the troughs represent the protein facing a minor groove. Experimental data indicate that the major groove sides of Gs at positions 1 and 12 (black dots) are facing the protein, while the major groove sides of those base pairs at positions 6 and 8 (open circles) are facing away from the protein [Papp et al., 1993]. Hydroxyl radical and ethylation interference data also support this assignment and indicate that RepA binds to only one face of the DNA [Papp & Chattoraj, 1994].

RepA and other DNA binding proteins show sequence conservation up to 2 bits where they contact the major groove and only 1 bit where they face a minor groove [Papp et al., 1993; Schneider, 2001]. The upper bound of 2 bits is achievable because all 4 bases can be distinguished using contacts in the major groove [Seeman et al., 1976]. In contrast, the minor groove of B-form DNA is essentially symmetrical and can only provide up to 1 bit of sequence conservation.
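The 2 bit and 1 bit bounds can be checked with a short calculation. The following is a minimal sketch, assuming that conservation at one position is estimated as 2 minus the Shannon uncertainty of the observed bases; it omits the small-sample correction used for real sequence logos, and the function name and example columns are hypothetical.

```python
import math
from collections import Counter

def information_bits(column):
    """Conservation of one aligned position in bits: 2 - H.

    Assumption: no small-sample correction is applied.
    """
    n = len(column)
    counts = Counter(column)
    uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - uncertainty

# Hypothetical columns, not real RepA data.
major_groove_like = "GGGGGGGG"   # one base out of four is selected
minor_groove_like = "ATATATAT"   # A and T are not told apart
unconserved = "ACGTACGT"         # all four bases equally likely

for name, col in [("major", major_groove_like),
                  ("minor", minor_groove_like),
                  ("none", unconserved)]:
    print(name, round(information_bits(col), 2))
```

Choosing one base out of four gives log2(4) = 2 bits, while choosing one of two indistinguishable pairs gives log2(2) = 1 bit, which is why a position well above 1 bit in a minor groove stands out.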

Intriguingly, as seen in Fig. 5, RepA violates this rule at positions and where the protein faces a minor groove [Papp et al., 1993]. The violation implies that the DNA is not B-form. To understand this anomaly, we substituted a variety of chemically modified base pairs at position (and its complement ) and found that the N3 proton on the thymine is responsible for contacting RepA through the minor groove [Lyakhov et al., 2001]. Since the N3 proton is normally sequestered in the center of the DNA helix, the DNA must indeed be distorted, as predicted from the sequence logo. Furthermore, the acceptable contact points for hydrogen bonding vary by several angstroms more than an H-bond could withstand in a rigid structure, suggesting that the base may rotate towards the minor groove for binding to occur. In other words, the T at may be 'flipping' out of the DNA.

Base flipping was discovered by Rich Roberts in the co-crystal of the HhaI methyltransferase [Roberts, 1995; Roberts & Cheng, 1998]. This solved a puzzle of how that enzyme functions, since the chemistry of methylation requires attack from above or below the plane of the base. Such an attack is not possible inside the DNA helix. The HhaI methyltransferase solves the problem by flipping the base out of the helix and into a pocket of the enzyme. Other DNA modification proteins also flip bases [Cheng et al., 1993; Klimašauskas et al., 1994; Verdine, 1994; Reinisch et al., 1995].

Why would RepA be flipping a base? RepA is used by the bacteriophage P1 plasmid for DNA replication [Abeles, 1986; Abeles et al., 1989]. DNA replication requires that the helix be opened before synthesis can begin. The first step of this process would be the binding of RepA to the DNA. A very simple second step would be the flipping of a base out of the DNA, since DNA 'breathing' occurs naturally on a millisecond scale [Guéron et al., 1987; Leroy et al., 1988]. If the thymine at flips, is captured, and then held out of the DNA helix by RepA, weakened stacking could allow the remainder of the DNA to be more easily opened by a DNA helicase.

Sequence logos of other DNA replication protein binding sites have similar anomalies [Schneider, 2001], suggesting that base flipping may be a general mechanism for the second step of DNA replication.

How is this related to consensus sequences? The consensus sequence for RepA sites can be determined by reading the top letters of the sequence logo (Fig. 5) because the letters are sorted so that the most frequent base is on top. One finds: ATGTGTGCTGGAGGGAAA. By viewing the binding site through the restrictive glasses of a consensus sequence, the unusual base becomes indistinguishable from the other bases!
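The loss of information can be illustrated with a small sketch. It assumes that the consensus is simply the most frequent base at each aligned position and that conservation is 2 minus the Shannon uncertainty with no small-sample correction; the function names and the toy alignments are hypothetical and are not real RepA sites.

```python
import math
from collections import Counter

def consensus(sites):
    """Most frequent base at each aligned position (the top letter of a logo)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))

def conservation(sites):
    """Per-position conservation in bits, 2 - H, with no small-sample correction."""
    profile = []
    for col in zip(*sites):
        n = len(col)
        h = -sum((c / n) * math.log2(c / n) for c in Counter(col).values())
        profile.append(2.0 - h)
    return profile

# Hypothetical toy alignments, not real binding-site data.
uniform_sites = ["TAT", "TAT", "TAT", "TAT"]
mixed_sites = ["TAT", "TAC", "TGT", "TAT"]

for sites in (uniform_sites, mixed_sites):
    print(consensus(sites), [round(r, 2) for r in conservation(sites)])
```

Both alignments give the same consensus, yet their conservation profiles differ. The consensus keeps only the top letter of each stack, so a position that is unusually high or low, such as the anomalous thymine, looks like any other.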

A second example is the TATAAT sites mentioned earlier, for which the sequence logo is shown in Fig. 6. The logo shows that there is much lower sequence conservation in positions , and than in positions , and , but the low region is significantly above background since the error bars are so small. (Contrast these tiny error bars to the ones in Fig. 3 and Fig. 5, where there are fewer sites.) The DNA region opened by RNA polymerase straddles the gap, leaving a highly conserved T at near the edge of the opened region. We propose that is the first base opened during RNA transcriptional initiation, and a reasonably large body of experimental evidence supports this hypothesis [Schneider, 2001]. As with RepA sites, the unusual thymine is obscured if one uses the consensus sequence.


Tom Schneider 2002-12-05