Next: About this document ... Up: Consensus Sequence Zen Previous: Acknowledgments.

Bibliography

Abeles, 1986
Abeles, AL. 1986.
P1 plasmid replication. Purification and DNA-binding activity of the replication protein RepA.
J. Biol. Chem. 261, 3548-3555.

Abeles et al., 1989
Abeles, AL, Reaves, LD & Austin, SJ. 1989.
Protein-DNA interactions in regulation of P1 plasmid replication.
J. Bacteriol. 171, 43-52.

Barrett et al., 1991
Barrett, M, Donoghue, MJ & Sober, E. 1991.
Against Consensus.
Syst. Zool. 40, 486-493.

Barrett et al., 1993
Barrett, M, Donoghue, MJ & Sober, E. 1993.
Crusade? A Reply to Nelson.
Syst. Biol. 42, 216-217.

Basharin, 1959
Basharin, GP. 1959.
On a statistical estimate for the entropy of a sequence of independent random variables.
Theory Probability Appl. 4 (3), 333-336.

Box, 1979
Box, GEP. 1979.
Robustness is the strategy of scientific model building.
In Robustness in Statistics, (Launer, RL & Wilkinson, GN, eds), pp. 201-236, Academic Press, New York, NY.
`ALL MODELS ARE WRONG BUT SOME ARE USEFUL' is a subheader in the chapter on page 202.

Cheng et al., 1993
Cheng, X, Kumar, S, Posfai, J, Pflugrath, JW & Roberts, RJ. 1993.
Crystal structure of the HhaI DNA methyltransferase complexed with S-Adenosyl-L-Methionine.
Cell, 74, 299-307.

Day & McMorris, 1992
Day, WHE & McMorris, FR. 1992.
Critical comparison of consensus methods for molecular sequences.
Nucleic Acids Res. 20, 1093-1099.

de Queiroz, 1993
de Queiroz, A. 1993.
For Consensus (Sometimes).
Syst. Biol. 42, 368-372.

Eagleman & Sejnowski, 2000
Eagleman, DM & Sejnowski, TJ. 2000.
Motion integration and postdiction in visual awareness.
Science, 287, 2036-2038.

Fishel et al., 1993
Fishel, R, Lescoe, MK, Rao, MRS, Copeland, NG, Jenkins, NA, Garber, J, Kane, M & Kolodner, R. 1993.
The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer.
Cell, 75, 1027-1038.

Guéron et al., 1987
Guéron, M, Kochoyan, M & Leroy, JL. 1987.
A single mode of DNA base-pair opening drives imino proton exchange.
Nature, 328, 89-92.

Hengen et al., 1997
Hengen, PN, Bartram, SL, Stewart, LE & Schneider, TD. 1997.
Information analysis of Fis binding sites.
Nucleic Acids Res. 25 (24), 4994-5002.
http://www.lecb.ncifcrf.gov/~toms/paper/fisinfo/.

Hübner & Arber, 1989
Hübner, P & Arber, W. 1989.
Mutational analysis of a prokaryotic recombinational enhancer element with two functions.
EMBO J. 8, 577-585.

Kendrick & Baldwin, 1987
Kendrick, KM & Baldwin, BA. 1987.
Cells in temporal cortex of conscious sheep can respond preferentially to the sight of faces.
Science, 236, 448-450.

Klimašauskas et al., 1994
Klimašauskas, S, Kumar, S, Roberts, RJ & Cheng, X. 1994.
HhaI methyltransferase flips its target base out of the DNA helix.
Cell, 76, 357-369.

Kuhn, 1970
Kuhn, TS. 1970.
The Structure of Scientific Revolutions.
second edition, The University of Chicago Press, Chicago.

Leach et al., 1993
Leach, FS, Nicolaides, NC, Papadopoulos, N, Liu, B, Jen, J, Parsons, R, Peltomäki, P, Sistonen, P, Aaltonen, LA, Nyström-Lahti, M, Guan, XY, Zhang, J, Meltzer, PS, Yu, JW, Kao, FT, Chen, DJ, Cerosaletti, KM, Fournier, REK, Todd, S, Lewis, T, Leach, RJ, Naylor, SL, Weissenbach, J, Mecklin, JP, Järvinen, H, Petersen, GM, Hamilton, SR, Green, J, Jass, J, Watson, P, Lynch, HT, Trent, JM, de la Chapelle, A, Kinzler, KW & Vogelstein, B. 1993.
Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer.
Cell, 75, 1215-1225.

Leroy et al., 1988
Leroy, JL, Kochoyan, M, Huynh-Dinh, T & Guéron, M. 1988.
Characterization of base-pair opening in deoxynucleotide duplexes using catalyzed exchange of the imino proton.
J. Mol. Biol. 200, 223-238.

Lewin, 1997
Lewin, B. 1997.
Genes VI.
Oxford University Press, Oxford.

Lisser & Margalit, 1993
Lisser, S & Margalit, H. 1993.
Compilation of E. coli mRNA promoter sequences.
Nucleic Acids Res. 21, 1507-1516.

Lyakhov et al., 2001
Lyakhov, IG, Hengen, PN, Rubens, D & Schneider, TD. 2001.
The P1 Phage Replication Protein RepA Contacts an Otherwise Inaccessible Thymine N3 Proton by DNA Distortion or Base Flipping.
Nucleic Acids Res. 29 (23), 4892-4900.
http://www.lecb.ncifcrf.gov/~toms/paper/repan3/.

Miller, 1955
Miller, GA. 1955.
Note on the bias of information estimates.
In Information Theory in Psychology, (Quastler, H, ed.), pp. 95-100, Free Press, Glencoe, IL.

Nelson, 1993
Nelson, G. 1993.
Why Crusade against Consensus? A Reply to Barrett, Donoghue, and Sober.
Syst. Biol. 42, 215-216.

Papp & Chattoraj, 1994
Papp, PP & Chattoraj, DK. 1994.
Missing-base and ethylation interference footprinting of P1 plasmid replication initiator.
Nucleic Acids Res. 22, 152-157.

Papp et al., 1993
Papp, PP, Chattoraj, DK & Schneider, TD. 1993.
Information analysis of sequences that bind the replication initiator RepA.
J. Mol. Biol. 233, 219-230.

Pribnow, 1975
Pribnow, D. 1975.
Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter.
Proc. Natl. Acad. Sci. USA, 72, 784-788.

Purves et al., 2002
Purves, D, Lotto, RB & Nundy, S. 2002.
Why we see what we do.
Amer. Sci. 90 (3), 236-243.
http://www.americanscientist.org/articles/02articles/Purves.html.

Rager & Singer, 1998
Rager, G & Singer, W. 1998.
The response of cat visual cortex to flicker stimuli of variable frequency.
Eur. J. Neurosci. 10, 1856-1877.

Reinisch et al., 1995
Reinisch, KM, Chen, L, Verdine, GL & Lipscomb, WN. 1995.
The crystal structure of HaeIII methyltransferase covalently complexed to DNA: an extrahelical cytosine and rearranged base pairing.
Cell, 82, 143-153.

Robberson et al., 1990
Robberson, BL, Cote, GJ & Berget, SM. 1990.
Exon definition may facilitate splice site selection in RNAs with multiple exons.
Mol. Cell. Biol. 10, 84-94.

Roberts, 1995
Roberts, RJ. 1995.
On base flipping.
Cell, 82, 9-12.

Roberts & Cheng, 1998
Roberts, RJ & Cheng, X. 1998.
Base flipping.
Annu. Rev. Biochem. 67, 181-198.

Rogan et al., 1998
Rogan, PK, Faux, BM & Schneider, TD. 1998.
Information analysis of human splice site mutations.
Human Mutation, 12, 153-171.
http://www.lecb.ncifcrf.gov/~toms/paper/rfs/.

Rogan & Schneider, 1995
Rogan, PK & Schneider, TD. 1995.
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.
Human Mutation, 6, 74-76.
http://www.lecb.ncifcrf.gov/~toms/paper/colonsplice/.

Schneider, 1995
Schneider, TD. 1995.
Information Theory Primer.
http://www.lecb.ncifcrf.gov/~toms/paper/primer/.

Schneider, 1996
Schneider, TD. 1996.
Reading of DNA sequence logos: prediction of major groove binding by information theory.
Meth. Enzym. 274, 445-455.
http://www.lecb.ncifcrf.gov/~toms/paper/oxyr/.

Schneider, 1997a
Schneider, TD. 1997a.
Information content of individual genetic sequences.
J. Theor. Biol. 189 (4), 427-441.
http://www.lecb.ncifcrf.gov/~toms/paper/ri/.

Schneider, 1997b
Schneider, TD. 1997b.
Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences.
Nucleic Acids Res. 25, 4408-4415.
http://www.lecb.ncifcrf.gov/~toms/paper/walker/, erratum: NAR 26(4): 1135, 1998.

Schneider, 2000
Schneider, TD. 2000.
Evolution of biological information.
Nucleic Acids Res. 28 (14), 2794-2799.
http://www.lecb.ncifcrf.gov/~toms/paper/ev/.

Schneider, 2001
Schneider, TD. 2001.
Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation.
Nucleic Acids Res. 29 (23), 4881-4891.
http://www.lecb.ncifcrf.gov/~toms/paper/baseflip/.

Schneider, 2002
Schneider, TD. 2002.
Consensus Sequence Zen.
Applied Bioinformatics, 1 (3), 111-119.
http://www.lecb.ncifcrf.gov/~toms/papers/zen/.

Schneider & Mastronarde, 1996
Schneider, TD & Mastronarde, D. 1996.
Fast multiple alignment of ungapped DNA sequences using information theory and a relaxation method.
Discrete Applied Mathematics, 71, 259-268.
http://www.lecb.ncifcrf.gov/~toms/paper/malign.

Schneider & Stephens, 1990
Schneider, TD & Stephens, RM. 1990.
Sequence logos: a new way to display consensus sequences.
Nucleic Acids Res. 18, 6097-6100.
http://www.lecb.ncifcrf.gov/~toms/paper/logopaper/.

Schneider et al., 1986
Schneider, TD, Stormo, GD, Gold, L & Ehrenfeucht, A. 1986.
Information content of binding sites on nucleotide sequences.
J. Mol. Biol. 188, 415-431.
http://www.lecb.ncifcrf.gov/~toms/paper/schneider1986/.

Seeman et al., 1976
Seeman, NC, Rosenberg, JM & Rich, A. 1976.
Sequence-specific recognition of double helical nucleic acids by proteins.
Proc. Natl. Acad. Sci. USA, 73, 804-808.

Shannon, 1948
Shannon, CE. 1948.
A mathematical theory of communication.
Bell System Tech. J. 27, 379-423, 623-656.
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html.

Slany & Kersten, 1992
Slany, RK & Kersten, H. 1992.
The promoter of the tgt/sec operon in Escherichia coli is preceded by an upstream activation sequence that contains a high affinity FIS binding site.
Nucleic Acids Res. 20, 4193-4198.

Stephens & Schneider, 1992
Stephens, RM & Schneider, TD. 1992.
Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites.
J. Mol. Biol. 228, 1124-1136.
http://www.lecb.ncifcrf.gov/~toms/paper/splice/.

Verdine, 1994
Verdine, GL. 1994.
The flip side of DNA methylation.
Cell, 76, 197-200.

Zheng et al., 1999
Zheng, M, Doan, B, Schneider, TD & Storz, G. 1999.
OxyR and SoxRS regulation of fur.
J. Bacteriol. 181, 4639-4643.
http://www.lecb.ncifcrf.gov/~toms/paper/oxyrfur/.


Table 1: $R_{sequence}$ and consensus information for sequence logos.

Name      Rs      SD      CON     Consensus                                  From  To
          (bits)  (bits)  (bits)
donor      7.9    0.0     16      NAGGTAAGTN                                  -3    +6
acceptor   9.3    0.0     29      NNNNNNNNNNNTTTTTTTTTYTNCAGGN               -25    +2
random     0.3    2.0     49      NSCNGNNNNAGTNNNACTNTANGATTTNCNANATTCAANCN  -20   +20
repa      24.5    0.6     36      ATGTGTGCTGGAGGGAAA                          -1   +16
-10        5.2    0.0     12      TATAAT                                     -12    -7

Rs is $R_{sequence}$; SD is the standard deviation of Rs (to one decimal place); CON is the consensus information. The range of the site used is shown in columns From and To. The lowest frequency for using a base in each consensus was 0.4, and the consensus was computed using the program consensus (version 1.16, http://www.lecb.ncifcrf.gov/~toms/delila/consensus.html).
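
The CON column of Table 1 can be reproduced from the consensus strings under one simple assumption: a position written with an IUPAC code that allows k of the 4 bases contributes log2(4/k) bits (2 bits for a single base, 1 bit for a two-base code such as S or Y, 0 bits for N). This rule is inferred from the table values and is not taken from the consensus program itself; the function and dictionary names below are hypothetical. A minimal sketch:

```python
import math

# Number of bases permitted by each standard IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def consensus_information(consensus: str) -> float:
    """Bits implied by a consensus string.

    Assumption (inferred from Table 1, not from the consensus program):
    each position contributes log2(4 / k) bits, where k is the number
    of bases its IUPAC code allows.
    """
    return sum(math.log2(4 / IUPAC_DEGENERACY[c]) for c in consensus.upper())


# name: (consensus string, CON in bits as listed in Table 1)
TABLE_1_CONSENSUS = {
    "donor": ("NAGGTAAGTN", 16),
    "acceptor": ("NNNNNNNNNNNTTTTTTTTTYTNCAGGN", 29),
    "random": ("NSCNGNNNNAGTNNNACTNTANGATTTNCNANATTCAANCN", 49),
    "repa": ("ATGTGTGCTGGAGGGAAA", 36),
    "-10": ("TATAAT", 12),
}

if __name__ == "__main__":
    for name, (consensus, listed) in TABLE_1_CONSENSUS.items():
        computed = consensus_information(consensus)
        print(f"{name:9s} computed={computed:.0f} bits  listed={listed} bits")
```

Under this rule a consensus string claims the full 2 bits at every position where it names a single base, however weakly that base is actually conserved in the aligned sites.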


Figure 1: Sequence logos [Schneider & Stephens, 1990] for human donor and acceptor splice junctions [Stephens & Schneider, 1992] compared to the consensus sequence for both sites. Source: Adapted from [Stephens & Schneider, 1992].

Figure 2: Sequence walkers [Schneider, 1997b] for a human acceptor site in the iduronidase synthetase gene and a mutation (indicated by an arrow). On the top sequence, the normal end of exon 4 is shown by a bracket and dashed line. The vertical rectangle on a sequence walker is the `zero base' used to identify the location of the walker. The vertical rectangles also indicate the scale in bits. A 12.7 bit acceptor at 5154 directs splicing to the correct location. Source: Adapted from [Rogan et al., 1998].

Figure 3: Sequence logo for random sequences.
Error bars, shown by I beams, indicate one standard deviation of the stack height. Note that a small-sample correction [Schneider et al., 1986] suppresses the stack height, so that a position that is 50% C and 50% G is lower than 1 bit. The correction is needed to counter a statistical bias that causes apparent information to appear when one substitutes frequencies for probabilities in Shannon's equation [Schneider et al., 1986; Miller, 1955; Basharin, 1959]. The same effect makes one tend to see patterns where there are none. The consensus sequence on the bottom was chosen from positions that have 50% or more of one base. S is the two-letter code for C or G.
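
The effect of the correction on a 50% C, 50% G position can be sketched numerically. The example below uses the common first-order approximation of the bias, (s - 1) / (2 ln(2) n) bits for s symbols and n sequences. Whether the logo in Figure 3 was drawn with this approximation or with an exact calculation is not stated here, and the sample size n = 20 is a hypothetical value chosen only for illustration.

```python
import math


def shannon_uncertainty(freqs):
    """H = -sum(f * log2 f) over the nonzero frequencies, in bits."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def approx_small_sample_bias(n, symbols=4):
    """First-order bias estimate (s - 1) / (2 ln(2) n), in bits.

    Assumption: this is the approximate form of the correction; an exact
    calculation may differ noticeably for very small n.
    """
    return (symbols - 1) / (2 * math.log(2) * n)


def corrected_information(freqs, n, symbols=4):
    """Information at one position: log2(s) - (H + e(n)), in bits."""
    return math.log2(symbols) - (
        shannon_uncertainty(freqs) + approx_small_sample_bias(n, symbols)
    )


if __name__ == "__main__":
    freqs = [0.0, 0.5, 0.5, 0.0]  # order A, C, G, T: 50% C and 50% G
    n = 20                        # hypothetical number of aligned sequences
    uncorrected = 2 - shannon_uncertainty(freqs)
    corrected = corrected_information(freqs, n)
    print(f"uncorrected: {uncorrected:.3f} bits")
    print(f"corrected:   {corrected:.3f} bits")
```

Without the correction the position would score exactly 1 bit; with it, the stack is lower, and the difference shrinks as n grows.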

Figure 4: Region upstream of the tgt/sec promoter of E. coli analyzed by Fis sequence walkers. The information for each Fis site was computed from models that are 21 bases wide, but only a narrower range is shown by walkers. The sine waves represent major (peaks) and minor (valleys) grooves faced by the Fis protein. Source: Adapted from [Schneider, 1997b].

Figure 5: Sequence logo for RepA binding sites.
Error bars indicate standard deviations of the entire stack height. Source: Adapted from [Schneider, 2001].

Figure 6: Sequence logo for the -10 region of E. coli promoters.
The promoters were from the Lisser-Margalit database [Lisser & Margalit, 1993]. The dashed and solid boxes show the regions opened by the polymerase, while the arrow shows the start points of transcription. Source: Adapted from [Schneider, 2001].

Figure 7: Consensus versus $R_{sequence}$.



The information for the five sequence logos in Figures 1, 3, 5, and 6 was graphed by comparing the information content ($R_{sequence}$) to the information content of the corresponding consensus sequence. $R_{sequence}$ is the average information in a set of binding sites. It is also the summed area under the sequence logo. The line at 45° represents equality between the two measures. The data are summarized in Table 1.
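
As a rough numerical companion to the graph, the gap between each point and the equality line can be computed directly from the Rs and CON columns of Table 1. The function and variable names are hypothetical; the numbers are copied from the table.

```python
# (name, Rs in bits, CON in bits), copied from Table 1.
TABLE_1 = [
    ("donor", 7.9, 16),
    ("acceptor", 9.3, 29),
    ("random", 0.3, 49),
    ("repa", 24.5, 36),
    ("-10", 5.2, 12),
]


def excess_over_equality(rows):
    """Consensus information minus Rs for each logo, in bits.

    A value of 0 would put the point exactly on the 45-degree line.
    """
    return {name: con - rs for name, rs, con in rows}


if __name__ == "__main__":
    for name, excess in excess_over_equality(TABLE_1).items():
        print(f"{name:9s} consensus exceeds Rs by {excess:5.1f} bits")
```

Every row has a positive excess, so no point falls on the equality line, and the random sequences show the largest gap.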


Tom Schneider 2002-12-05