Suppose a machine uses the four
symbols ``a,'' ``c,'' ``g,'' and ``t'' to
generate the twelve-letter sequence ``gattttctcttt''.
So far, we know that
N = 12,
M = 4,
Na = 1,
Nc = 2,
Ng = 1, and
Nt = 8.
We find that the frequencies are
Fa = 1/12,
Fc = 2/12,
Fg = 1/12,
and
Ft = 8/12.
Now, let's say that the frequencies are always the same no matter how many
sequences the machine creates.
In other words, if the set of sequences were infinite, the frequency of each letter would equal
its probability, so that Pi = Fi.
So,
ua = -log2(Pa) = -log2(1/12) = 3.58
bits. Similarly,
uc = 2.58,
ug = 3.58, and
ut = 0.58 bits.
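
The counts, frequencies, and surprisals above can be checked with a few lines of Python. This is only a minimal sketch: it assumes, as the text does, that each probability Pi equals the observed frequency Fi, and the variable names (`sequence`, `counts`, `frequencies`, `surprisals`) are illustrative and not part of the original discussion.

```python
from collections import Counter
from math import log2

# The twelve-letter example sequence from the text.
sequence = "gattttctcttt"

counts = Counter(sequence)   # Na, Nc, Ng, Nt
N = len(sequence)            # total number of symbols in the sequence
M = len(counts)              # number of distinct symbols

# Assumption taken from the text: Pi = Fi = Ni / N.
frequencies = {s: counts[s] / N for s in sorted(counts)}

# Surprisal of each symbol in bits: ui = -log2(Pi).
surprisals = {s: -log2(p) for s, p in frequencies.items()}

print(f"N = {N}, M = {M}")
for s in sorted(counts):
    print(f"N{s} = {counts[s]}, F{s} = {counts[s]}/{N}, "
          f"u{s} = {surprisals[s]:.2f} bits")
```

Rounded to two decimal places, the printed surprisals agree with the values quoted above.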
Substituting the
values for
Pa, Pc, Pg, and Pt into equation (13), we obtain the following: