Next: Statistical effects of making Up: Consensus Sequence Zen Previous: A Paradox: How can

Walking along the genome.

One can depict individual sites using another graphic called a sequence walker, in which the height of a letter above or below zero shows how much that base contributes to the average sequence conservation of the entire collection of sites shown in the logo (Fig. 2) [Schneider, 1997b, Schneider, 1997a]. Instead of counting matches to a consensus, one sums the information contributions of the bases in a given sequence to obtain the information for that individual binding site. (The Shannon information measure is the only measure that allows addition for statistically independent components [Shannon, 1948]. Generally, binding site bases are independent [Stephens & Schneider, 1992].)
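
The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: it assumes four equally likely bases (so the maximum per-position contribution is 2 bits), it omits the small-sample correction used in the individual information method, and the frequency table `FREQS` and the function names are hypothetical values invented for the example.

```python
import math

# Hypothetical base frequencies for a toy 4-position binding site model.
# Real frequencies would be estimated from an aligned collection of sites.
FREQS = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
]


def weight(base, position_freqs):
    """Contribution of one base in bits: 2 + log2(frequency).

    Assumes equiprobable background bases and ignores the
    small-sample correction (a simplification for this sketch).
    """
    return 2.0 + math.log2(position_freqs[base])


def individual_information(site, freqs=FREQS):
    """Sum the per-position contributions for one candidate site."""
    if len(site) != len(freqs):
        raise ValueError("site length must match the model width")
    return sum(weight(base, f) for base, f in zip(site, freqs))


if __name__ == "__main__":
    for site in ("ACGT", "CAAC"):
        print(site, round(individual_information(site), 2), "bits")
```

Each term returned by `weight` corresponds to the height of one letter in a walker: frequent bases rise above zero, rare bases fall below it, and a position with no preference contributes nothing.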

Sequence walkers can be stepped along the sequence (hence the name) to discover positions that match a particular model, and one can predict whether or not a sequence change will destroy the site and cause a genetic disease [Rogan et al., 1998]. In the case shown, splicing is normally accomplished using a 12.7-bit acceptor at position 5154. Nearby, however, is an 8.9-bit 'cryptic' acceptor that is apparently not used, because the strongest site in any local region normally wins the competition for splice factors. An A to G mutation at 5153 destroys the normal site, reducing it to 4.5 bits while simultaneously raising the cryptic site to 16.5 bits. This results in a single-base frameshift, the loss of the protein, and Hunter disease. Cases like this are difficult to understand using consensus sequences because sites are affected by all of their parts and quantitative differences are missed. Using information theory and sequence walkers, we have interpreted about 100 mutations in two human workdays (the computer time is only a few seconds).
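
The stepping procedure and the effect of a point mutation on two overlapping sites can be illustrated with a small Python sketch. Everything here is a toy assumption: the 3-position weight matrix `WEIGHTS` (in bits), the sequence, and the function names are invented for illustration and do not reproduce the splice acceptor model or the data of the case above.

```python
# Hypothetical 3-position weight matrix in bits (not a real splice-site model).
WEIGHTS = [
    {"A": 1.5, "C": -2.0, "G": -1.0, "T": -3.0},
    {"A": -3.0, "C": -2.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -2.0},
]


def site_bits(window, weights=WEIGHTS):
    """Total information (bits) of one window: the sum of its base weights."""
    return sum(w[base] for base, w in zip(window, weights))


def walk(sequence, weights=WEIGHTS):
    """Step the model along the sequence; return (start, bits) per position."""
    width = len(weights)
    return [
        (i, site_bits(sequence[i:i + width], weights))
        for i in range(len(sequence) - width + 1)
    ]


def strongest(sequence, weights=WEIGHTS):
    """Return the (start, bits) pair of the strongest site in the region."""
    return max(walk(sequence, weights), key=lambda pair: pair[1])


def mutate(sequence, index, new_base):
    """Return a copy of the sequence with one base substituted."""
    return sequence[:index] + new_base + sequence[index + 1:]


if __name__ == "__main__":
    wild_type = "TAGGCT"
    mutant = mutate(wild_type, 2, "A")
    print("wild type:", walk(wild_type), "strongest:", strongest(wild_type))
    print("mutant:   ", walk(mutant), "strongest:", strongest(mutant))
```

In this toy region a single substitution lowers the site starting at index 1 and raises the overlapping site starting at index 2, so the strongest site moves by one base. That is the same kind of quantitative shift a consensus match count would miss.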


Tom Schneider 2002-12-05