Notes on the Beale Ciphers
The first 121 words of the key for B1 would decipher half of the
message (half of its numbers are 121 or less). The longest unbroken
stretch this would produce is 10 cleartext letters in a row.
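The coverage figure above can be checked with a few lines of code. This is a minimal sketch, not the program used for these notes; the function name `coverage` and the sample numbers are made up for illustration and are not taken from B1.

```python
def coverage(cipher, key_len):
    """Return (fraction of elements decodable with the first key_len
    key words, longest run of consecutive decodable elements)."""
    hits = [1 <= n <= key_len for n in cipher]
    longest = run = 0
    for ok in hits:
        run = run + 1 if ok else 0   # extend the current run or reset it
        longest = max(longest, run)
    frac = sum(hits) / len(cipher) if cipher else 0.0
    return frac, longest

# Hypothetical cipher numbers, for illustration only.
sample = [5, 130, 12, 7, 900, 33, 121, 122]
print(coverage(sample, 121))  # → (0.625, 2)
```

Run against the real B1 numbers with key_len=121, it should report a fraction near 0.5 and a longest run of 10 if the figures above are right.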
Using the DOI as a key for B1 gives mostly garbage, except for
the curious occurrence of part of the alphabet in the
early part of the paper:
seq#   188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
code#  147 436 194 320  37 122 113   6 140   8 120 305  42  58 461  44 106
         a   b   c   d   e   f   g   h   i   i   j   k   l   m   m   n   o
What are the odds that this is chance?
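For reference, the decoding step used throughout these notes (each number selects a word of the key, and the first letter of that word is the cleartext) can be sketched as follows. The function name `decode` and the "?" placeholder for out-of-range numbers are choices made here for illustration; the toy key is just the opening words of the DOI.

```python
def decode(cipher, key_words):
    """Replace each 1-based cipher number with the first letter
    of the corresponding key word."""
    out = []
    for n in cipher:
        if 1 <= n <= len(key_words):
            out.append(key_words[n - 1][0].lower())
        else:
            out.append("?")  # number falls outside the key text
    return "".join(out)

key = "when in the course of human events".split()
print(decode([1, 3, 5, 7, 9], key))  # → "wtoe?"
```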
Other sequences of the first letters of the alphabet appear when using
the corrections described by Aaron and Matyas: "How the Message in
Paper No. 2 was Recovered"
150 251 284 308 231 124 211 486 225 401
  a   a   a   b   b   c   d   e   f   f
 25 485  18 436  65  84 200 283 118 320 138
  a   b   b   b   c   c   c   c   d   d   e
 24 283 134  92  63 246 486
  a   c   b   c   d   d   e
147 436 195 320  37 122 113   6 140   8 120 305  42  58 461  44 106 301  13 408
  a   b   c   d   e   f   g   h   i   i   j   k   l   m   m   n   o   h   p   p
Note that the largest number in any of the 4 sequences is 486.
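Sequences like these can be searched for mechanically once a trial decipherment exists. The sketch below uses one possible definition of an alphabetical run (each letter is the same as, or one step after, the previous letter); that definition and the name `longest_alpha_run` are assumptions made here, and a looser rule would be needed to catch a string such as `acbcdde' in full.

```python
def longest_alpha_run(text):
    """Return (start index, length) of the longest stretch in which
    every letter equals the previous one or is the next letter after it."""
    best_start, best_len = 0, 0
    start = 0
    for i in range(len(text)):
        # A step other than 0 or +1 breaks the run.
        if i > 0 and ord(text[i]) - ord(text[i - 1]) not in (0, 1):
            start = i
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len

print(longest_alpha_run("abcdefghiijklmmnohpp"))  # → (0, 17)
print(longest_alpha_run("acbcdde"))               # → (2, 5)
```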
Reworked my copy of B2 to match the Ward pamphlet. I
included corrections for what are almost surely printing
errors, and left in the counting errors introduced by
Beale. Also tried generating a version of the DOI
numbered the way Beale might have done it by hand. The
assumed method is to number only every tenth (or possibly
every fifth) word of the document. The numbering errors
can most easily be explained if the ORIGINAL VERSION of
the DOI is used. The original is written with very long
lines that might cause the type of counting errors seen
in B2. Most of the numbering shifts can be attributed to
Beale miscounting when going from the end of a line to
the beginning of the next. My corrections are:
1) Between `new' and `government' insert a filler word
`X'. The X would be encoded as 156, but is never
used (in any of the 3 papers). Since Beale would
count from the nearest `10-mark' when converting
a letter to its position, he would probably not
see his error once the document was numbered.
This is the only error that requires inserting a
word into the document. All others are caused by
dropping a word (or merging them to show how they
were derived).
Note that Ward just added the word `a' at this
point (a new government).
2) Merge the words `object' and `evinces'. Thus code
word 244 could be read as `o' or `e' in B3. This
error is also unlikely to be seen by Beale once
made. Merging really means dropping one of the
two words merged. The program that reads such a
merged pair will use the first letter of the
string.
3) Number 480=people, then number 480=dissolutions. This
error is similar to the others, except the
`10-marks' are miscounted instead of just the
distance between them. Again, the mistake is
across a line boundary. For counting purposes,
the safest thing to do seems to be to drop the
sequence: `He has refused for a long time after
such dissolutions' (Just as Ward did). Code
words 475-484 aren't used anywhere. Note also
that none of the Gillogly Strings contain numbers
higher than 486. The break at this point could
be related to these strings. Unfortunately, the
numbers 485 and 486 occur AFTER the break.....
4) `meantime' should be counted as 2 words. This is
clear from inspecting the DOI. mean=509,
time=510. In this case, most modern texts are
wrong, and Beale counted correctly. Or: count
`remaining' as two words since it's hyphenated
across a line break.
5) Merge `among' and `us' as word 627. From this point
on, the adjustments have little justification
other than that they are made in the same manner
as the previous ones.
6) Merge `boundaries' and `so' as word 778. There are 4
places that this error could have been made. It
only affects a few code words. This corrects the
counting errors through code element #811 and
leaves only the `x' needing adjustment.
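The two kinds of correction in the list above (inserting a filler word, and merging two adjacent words so that they share one number) can be sketched as list operations on the key text. The helper names `insert_between` and `merge_pair`, and the "+" used to join merged words, are illustrative choices, not the program actually used here. As described in item 2, a merged entry is read by its first letter.

```python
def _find_pair(words, left, right):
    """Index of the first place where `left` is directly followed by `right`."""
    for i in range(len(words) - 1):
        if words[i].lower() == left and words[i + 1].lower() == right:
            return i
    raise ValueError(f"pair not found: {left} {right}")

def insert_between(words, left, right, filler="X"):
    """Insert a filler word between two adjacent words (shifts later numbers up by 1)."""
    i = _find_pair(words, left, right)
    return words[: i + 1] + [filler] + words[i + 1 :]

def merge_pair(words, left, right):
    """Merge two adjacent words into one entry (shifts later numbers down by 1)."""
    i = _find_pair(words, left, right)
    return words[:i] + [words[i] + "+" + words[i + 1]] + words[i + 2 :]

a = insert_between("to institute new government laying".split(), "new", "government")
print(a)            # 'government' is now the 5th word instead of the 4th
b = merge_pair("the same object evinces a design".split(), "object", "evinces")
print(b, b[2][0])   # merged entry is read by its first letter, 'o'
```

Applying the six corrections in order to the full DOI word list, then numbering from 1, should give a key that matches Beale's counting through element #811 if the corrections are right.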
There are 4 words remaining in the DOI that contain an
`x': executioners, excited, sexes, and extend. Which (if
any) of these did Beale use as element 1005? I suggest
an alternative to the commonly assumed `sexes':
`Executioners' is the sixth word of a line, and this could be
element #1005 if the numbering was restarted at 1000 at
the beginning of that line. Actually, this is pretty weak
reasoning. I just haven't seen a good explanation as
to why 1005=X in B2.
Just received material ordered from the BCA: Ward's 1885
pamphlet, Hart's version and the '81 proceedings. I
found a few irritating differences between what I thought
were correct versions of the 3 ciphers and the values
published in Ward's paper. In particular in B1 I found
the following differences:
Position   Hart,etc.   Ward
   260        320       324
   405         90       290
   462        858       868
   516        820       826
In B3 the following differences exist:
   401         11         1
   554         29        28
Where did these errors come from? Since the cleartext
for B2 is known, the errors there are understandable as
either typesetting errors or mis-counts by the author of
the ciphers.
Extending the `Gillogly strings':
Another string emerges and the longest string is extended
if a count of 5 is added to elements above 604. The
string:
604 230 436 664 582
is `aabad' without the correction, and `aabcd' with it.
Even more interesting is the cipher element #208 at the
end of the string: `abcdefghiijklmmnohpp'. The element
is 680, and is deciphered as `a' without adding 5, but
becomes `q' by adding 5. Note that there is only one
word in the entire DOI that begins with `q'. Against
this argument is the clear(!) requirement that the
counting not be shifted by 5 for decoding B2.
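The shift experiment is easy to express in code. This is a minimal sketch under the rule stated above (add 5 to every element above 604 before looking it up); the function name `decode_shifted` and the toy key are hypothetical, and the demo uses a small threshold and shift so it can run without the DOI text.

```python
def decode_shifted(cipher, key_words, threshold=604, shift=5):
    """Decode with the key, adding `shift` to every element above `threshold`."""
    out = []
    for n in cipher:
        idx = n + shift if n > threshold else n
        if 1 <= idx <= len(key_words):
            out.append(key_words[idx - 1][0].lower())
        else:
            out.append("?")  # outside the key text
    return "".join(out)

# Toy key so that word k starts with the k-th letter of the alphabet.
toy = ["alpha", "bravo", "charlie", "delta", "echo",
       "foxtrot", "golf", "hotel", "india", "juliet"]
print(decode_shifted([1, 6, 7, 8], toy, threshold=6, shift=0))  # → "afgh"
print(decode_shifted([1, 6, 7, 8], toy, threshold=6, shift=2))  # → "afij"
```

Note that the element equal to the threshold itself is left unshifted, matching "above 604".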
Hammer's 1971 CACM article also notes significant biases
for multiples of 5 in B3.
Also, the second `h' in the string is represented by
301. The 302nd word in my version of the DOI is `of', so a
count that is off by one would give `o' here, which fits
the sequence (n o o p p).
Explanation for the Gillogly strings:
Assume the method for encoding B1 and B2 went something like this:
A partial list of numbers is prepared by writing the
alphabet down the left side of a piece of paper. Words
beginning with each letter are then noted and their
positions in the DOI are written on the appropriate line.
This process continues until most of the lines contain
enough numbers for the expected task. B2 is then encoded
using this list; with reference back to the DOI when a
needed letter isn't in the prepared list, or the encoder
thinks a number has been used too often. New numbers may
be added to the list during this process.
In order to encode B1, the preparer then writes the
alphabet ACROSS THE TOP of his prepared list of cipher
elements and proceeds as before; this time picking
numbers from the columns instead of rows. Thus when
encoding a particular word, it would be natural to stick
to the top of the columns and work down while encoding a
word. Note that some of the Gillogly strings use numbers
that do not appear in B2 and that this list must have
been made up before either of the two messages was
encoded.
If this scenario is correct, then the appearance of (say)
four C's in a row probably indicates four different
letters in the cleartext of B1.
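The scenario can be made concrete with a toy model. This sketch is only one reading of it: rows of the side table are keyed by first letter, the alphabet written across the top labels the columns, and while encoding a word the encoder starts at the top row and moves down one row per letter. The function names, the toy key, and the strict one-row-per-letter rule are all assumptions made for illustration.

```python
import string
from collections import defaultdict

def build_side_table(key_words):
    """Row for each letter: positions (1-based) of key words starting with it."""
    rows = defaultdict(list)
    for pos, word in enumerate(key_words, start=1):
        first = word[0].lower()
        if first in string.ascii_lowercase:
            rows[first].append(pos)
    return dict(rows)

def encode_by_columns(rows, plain_word):
    """Pick each letter's number from the COLUMN headed by that letter,
    taking the 1st letter from row 'a', the 2nd from row 'b', and so on.
    Assumes the word has at most 26 letters. None marks an empty cell."""
    numbers = []
    for depth, ch in enumerate(plain_word.lower()):
        col = ord(ch) - ord("a")
        row = rows.get(string.ascii_lowercase[depth], [])
        numbers.append(row[col] if 0 <= col < len(row) else None)
    return numbers

toy_key = ["ant", "bee", "ape", "bat", "cat", "cow", "axe", "bug", "cub"]
rows = build_side_table(toy_key)       # a: 1 3 7, b: 2 4 8, c: 5 6 9
nums = encode_by_columns(rows, "cab")  # → [7, 2, 6]
# Decoding those numbers with the key (row-wise) gives an alphabet run.
print(nums, "".join(toy_key[n - 1][0] for n in nums))  # → [7, 2, 6] abc
```

In this model the cleartext `cab' shows up as `abc' when the numbers are read against the DOI, which is the kind of string seen in B1.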
Problems with this explanation: Some rows of the list
would have only a few numbers in them and thus would be
unlikely to appear in B1(doi). This is contradicted by
the string: `ijkl'. There are only 6 words that start
with `j' and only 2 that start with `k' in the first 811
words of the DOI. Some rows of the list would also have
many more than 26 numbers and thus shouldn't appear at
all in B1. Finally, the BCA newsletter (June 82) article
by Aaron mentions that the key to B1 was in a format of
25 letters per line, basing this observation on the bias
of numbers toward the center of a key list. (3/30/83:
This tendency is very weak; my modulo program shows only
one significant peak in a chart as described by Aaron)
From the recent discussion in the BCA newsletter, it
seems that Ward really was the agent for the author.
Modulo tests. Wrote a program to display the remainders
after division of the cipher elements. For example,
the remainders modulo 5 are far from evenly distributed
in all 3 ciphers:
B1 % 5, mean: 86.20, sigma: 8.30
B1 %5 = 0: 78 5
B1 %5 = 1:125 5++++
B1 %5 = 2: 59 5---
B1 %5 = 3: 80 5
B1 %5 = 4: 89 5
B2 % 5, mean:138.00, sigma: 10.51
B2 %5 = 0:187 5++++
B2 %5 = 1:134 5
B2 %5 = 2:145 5
B2 %5 = 3:140 5
B2 %5 = 4: 84 5-----
B3 % 5, mean:117.80, sigma: 9.71
B3 %5 = 0: 81 5----
B3 %5 = 1:152 5+++
B3 %5 = 2:111 5
B3 %5 = 3:121 5
B3 %5 = 4:124 5
For each message, the expected number of remainders for a
completely random distribution is printed (the mean),
followed by the number of counts corresponding to one
standard deviation away from the mean (sigma). Each
subsequent line shows the remainder being calculated, the
number of cipher elements with this remainder, and a
graphical representation of the deviation. +'s and -'s
after the charted number indicate the number of standard
deviations away from the mean that the count represents.
Deviations of 3 sigma or more in either direction seem to be significant.
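A minimal sketch of such a modulo test follows. The printed means and sigmas above are consistent with treating each remainder count as binomial (mean N/m, sigma sqrt(N * (1/m) * (1 - 1/m))), so that formula is used here. The function name `modulo_table` is made up, and truncating the deviation to whole sigmas for the +/- marks is an assumption, since the rounding used by the original program is not stated.

```python
import math
from collections import Counter

def modulo_table(elements, m):
    """Return (mean, sigma, rows), where each row is
    (remainder, count, marks) and marks is a run of '+' or '-'
    giving the whole number of sigmas away from the mean."""
    if not elements:
        raise ValueError("no cipher elements")
    counts = Counter(n % m for n in elements)
    total = len(elements)
    p = 1 / m
    mean = total * p
    sigma = math.sqrt(total * p * (1 - p))   # binomial standard deviation
    rows = []
    for r in range(m):
        dev = (counts[r] - mean) / sigma
        marks = ("+" if dev > 0 else "-") * int(abs(dev))
        rows.append((r, counts[r], marks))
    return mean, sigma, rows

# Stand-in data with the same remainder counts as the B1 % 5 chart
# (78, 125, 59, 80, 89); these are not the actual cipher numbers.
fake_b1 = [0] * 78 + [1] * 125 + [2] * 59 + [3] * 80 + [4] * 89
mean, sigma, rows = modulo_table(fake_b1, 5)
print(f"mean: {mean:.2f}, sigma: {sigma:.2f}")   # mean: 86.20, sigma: 8.30
for r, count, marks in rows:
    print(f"%5 = {r}:{count:3d} {marks}")
```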
B2 prefers numbers evenly divisible by 5, while B3 avoids
them. The pattern for all 3 ciphers is similar: one
remainder is preferred, one avoided, and the remaining
ones about random.
It's not surprising to find a particular remainder
preferred over others, but the pattern for the Beale
ciphers is peculiar. The excess use of a particular
remainder is not balanced by a general avoidance of the
other 4 remainders. Instead, the deficit in a single other
remainder accounts for the excess. What could cause this?
The pattern for B2%10 also shows significant deviations
from random:
B2 % 10, mean: 69.00, sigma: 7.88
B2 %10= 0:116 10++++++
B2 %10= 1: 60 10-
B2 %10= 2: 69 10
B2 %10= 3: 70 10
B2 %10= 4: 55 10-
B2 %10= 5: 71 10
B2 %10= 6: 74 10
B2 %10= 7: 76 10
B2 %10= 8: 70 10
B2 %10= 9: 29 10-----
B3 % 10, mean: 58.90, sigma: 7.28
B3 %10= 0: 30 10----
B3 %10= 6: 87 10+++
Again B2 prefers numbers evenly divisible by 10, and
avoids numbers with remainders of 9. B3 avoids evenly
divisible numbers, and concentrates on remainders of 6
(which ties in with its preference for remainders of 1
when dividing by 5, since any number with remainder 6
mod 10 has remainder 1 mod 5).
Conclusions/Observations:
1) The original DOI was the key for B2; numbering errors
all occur at line break boundaries of the original DOI.
2) A side table arranged alphabetically was prepared before
B1 or B2 were encoded. The Gillogly strings contain
elements that do not appear in B2.
3) All 3 ciphers show a bias in their remainders modulo 5
(B2 favors multiples of 5; B3 avoids them).
4) A shift of 5 for elements >600 will create/extend
the Gillogly strings in B1.
5) X=1005 in B2, but no word near 1005 contains an X.
6) The Ward pamphlet contains the words 'for silver' as the
cleartext for B2, but the cipher contains no such
set of numbers.
7) J.B.Ward was not the author of "The Beale Papers".