Scanning Genomes for Binding Sites:
Sequence ``Walkers'' Display the Information Content of Individual Sites

Paul N. Hengen and Thomas D. Schneider
National Cancer Institute
Frederick Cancer Research and Development Center
Frederick, Maryland 21702-1201 USA
(http://www-lmmb.ncifcrf.gov/~pnh/) & (http://www-lmmb.ncifcrf.gov/~toms/)


We have developed an information-theory-based method for modeling the interactions of DNA-binding proteins with their binding sites. We show the feasibility of using this method to scan DNA sequences and predict sites bound by the Factor for Inversion Stimulation (Fis), a pleiotropic protein that enhances site-specific recombination, controls DNA replication, and regulates transcription of a number of genes in Escherichia coli and Salmonella typhimurium.

When scanning various DNA sequences with a weight matrix derived from the information analysis of 60 known Fis binding sites, we identified Fis sites that correlated well with published DNase I protection experiments and other biochemical data. In many different genetic systems we predicted sites that others had missed, because the DNA sequence at those positions does not match the Fis consensus sequence and, most likely, because the protected regions overlap with other Fis sites.
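As a rough illustration of the scanning step, the following Python sketch builds a per-position weight matrix from aligned sites and slides it along a sequence. It assumes weights of the form 2 + log2 f(b, l) bits, where f(b, l) is the frequency of base b at position l. The abstract does not state the exact formula, so this form, the pseudocount, and the omission of any small-sample correction are assumptions. The function names and the four toy sites are hypothetical and are not real Fis data.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=1.0):
    """Build weights[l][b] = 2 + log2 f(b, l) in bits from aligned sites.

    Assumes equal-length, uppercase A/C/G/T sites. The pseudocount is an
    illustrative choice that avoids log2(0); it is not from the source.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for l in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[l]] += 1
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix

def scan(sequence, matrix):
    """Return (start, score in bits) for every window of the sequence."""
    width = len(matrix)
    return [
        (i, sum(matrix[l][sequence[i + l]] for l in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical toy alignment, far smaller than the 60 sites used in the study.
toy_sites = ["GATC", "GATC", "GTTC", "GATA"]
toy_matrix = weight_matrix(toy_sites)
hits = scan("AAGATCAA", toy_matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

Windows with a positive total score are candidate sites under this model. Because every window gets its own score, overlapping candidates remain visible, which a match against a single consensus string would not show.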

We created a graphical method to show how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. By displaying the information at individual binding sites as letter graphics, these ``sequence walkers'' can be stepped along raw sequence data to visually search for binding sites. Characters representing the sequence are either oriented normally and placed above a line, indicating favorable contact, or displayed upside-down and placed below the line, indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the sequence conservation of the binding site.
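The layout of a walker can be approximated in plain text. The sketch below is only an illustration of the idea, not the authors' graphics program: it places favorable bases above a line and unfavorable bases below it, uses lowercase as a stand-in for upside-down letters, and returns the per-base contributions separately because text cannot show letter height. The function name and the weights are hypothetical.

```python
def walker_text(window, weights):
    """Render a crude text walker for one window of sequence.

    weights[l][b] is the contribution in bits of base b at position l.
    Bases with weight >= 0 go above the line; negative ones go below,
    in lowercase as a substitute for an upside-down letter.
    """
    above, below, contributions = [], [], []
    for position, base in enumerate(window):
        w = weights[position][base]
        above.append(base if w >= 0 else " ")
        below.append(base.lower() if w < 0 else " ")
        contributions.append(w)
    picture = "\n".join(["".join(above), "-" * len(window), "".join(below)])
    return picture, contributions

# Hypothetical weights in bits for a three-base site.
toy_weights = [
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.3},
]
picture, contributions = walker_text("GCT", toy_weights)
print(picture)
print(contributions)
```

Stepping the walker along a sequence amounts to calling the function on each successive window.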

Figure: Fis binding sites spaced 11 base pairs apart at the E. coli origin of chromosome replication.

Using walkers, we were able to quickly visualize overlapping Fis binding sites spaced 7 or 11 base pairs apart in several genetic systems. Gel shift experiments showed that pairs of Fis sites have two distinct binding modes, suggesting that Fis competes with itself for binding and therefore acts as a molecular flip-flop. The positioning of Fis binding sites relative to one another and to the binding sites of other proteins appears to be key to the ability of Fis to perform many diverse functions.

As a general sequence analysis tool, walkers can be used to investigate the effects of particular mutations. With a walker, one can interactively alter the DNA sequence to quantitatively engineer binding sites to one's own specifications, predict whether a change is likely to be a polymorphism or a mutation, and detect anomalies in sequence databases.
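One way to picture the mutation analysis is as a change in a site's total score when a single base is substituted. The sketch below assumes an additive per-position weight matrix in bits. The function name and the weights are hypothetical, and the abstract gives no threshold for separating a polymorphism from a mutation, so none is applied here.

```python
def mutation_effect(window, weights, position, new_base):
    """Return the change in site score (bits) from one base substitution.

    Assumes the score is the sum of independent per-position weights,
    so only the substituted position contributes to the difference.
    """
    old_base = window[position]
    return weights[position][new_base] - weights[position][old_base]

# Hypothetical weights in bits for a three-base site.
toy_weights = [
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.3},
]
delta = mutation_effect("GAT", toy_weights, position=1, new_base="C")
print(delta)
```

A negative change indicates that the substitution weakens the predicted site under this model, and a positive change indicates that it strengthens it.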