Notes on the Beale Ciphers

The first 121 words of the Key for B1 would decipher 1/2 of the message. This would include a maximum stretch of 10 clear text letters in a row.

Using the DOI (Declaration of Independence) as a key for B1 gives mostly garbage, except for the curious occurrence of part of the alphabet in the early part of the paper:

  seq#   188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
  code#  147 436 194 320  37 122 113   6 140   8 120 305  42  58 461  44 106
           a   b   c   d   e   f   g   h   i   i   j   k   l   m   m   n   o

What are the odds that this is chance?

Other sequences of the first letters of the alphabet appear when using the corrections described by Aaron and Matyas in "How the Message in Paper No. 2 was Recovered":

   150 251 284 308 231 124 211 486 225 401
     a   a   a   b   b   c   d   e   f   f

    25 485  18 436  65  84 200 283 118 320 138
     a   b   b   b   c   c   c   c   d   d   e

    24 283 134  92  63 246 486
     a   c   b   c   d   d   e

   147 436 195 320  37 122 113   6 140   8 120 305  42  58 461  44 106 301  13 408
     a   b   c   d   e   f   g   h   i   i   j   k   l   m   m   n   o   h   p   p

Note that the largest number in any of the 4 sequences is 486.

Reworked my copy of B2 to match the Ward pamphlet. I included corrections for what are almost surely printing errors, and left in the counting errors introduced by Beale.

Also tried generating a version of the DOI numbered the way Beale might have done it by hand. The assumed method is to number only every tenth (or possibly every fifth) word of the document. The numbering errors can most easily be explained if the ORIGINAL VERSION of the DOI is used. The original is written with very long lines that might cause the type of counting errors seen in B2. Most of the numbering shifts can be attributed to Beale miscounting when going from the end of a line to the beginning of the next.

My corrections are:

1) Between `new' and `government' insert a filler word `X'. The X would be encoded as 156, but is never used (in any of the 3 papers).
Since Beale would count from the nearest `10-mark' when converting a letter to its position, he would probably not see his error once the document was numbered. This is the only error that requires inserting a word into the document. All the others are caused by dropping a word (shown here as merging two words, to make clear how they were derived). Note that Ward just added the word `a' at this point (a new government).

2) Merge the words `object' and `evinces'. Thus code word 244 could be read as `o' or `e' in B3. This error is also unlikely to be seen by Beale once made. Merging really means dropping one of the two words merged. The program that reads such a merged pair will use the first letter of the string.

3) Number 480=people, then number 480=dissolutions. This error is similar to the others, except that the `10-marks' themselves are miscounted instead of just the distance between them. Again, the mistake is across a line boundary. For counting purposes, the safest thing to do seems to be to drop the sequence `He has refused for a long time after such dissolutions' (just as Ward did). Code words 475-484 aren't used anywhere. Note also that none of the Gillogly strings contain numbers higher than 486. The break at this point could be related to these strings. Unfortunately, the numbers 485 and 486 occur AFTER the break...

4) `meantime' should be counted as 2 words. This is clear from inspecting the DOI: mean=509, time=510. In this case, most modern texts are wrong, and Beale counted correctly. Or: count `remaining' as two words since it's hyphenated across a line break.

5) Merge `among' and `us' as word 627. From this point on, the adjustments have little justification other than that they are made in the same manner as the previous ones.

6) Merge `boundaries' and `so' as word 778. There are 4 places where this error could have been made. It only affects a few code words.

This corrects the counting errors through code element #811 and leaves only the `x' needing adjustment.
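The effect of these corrections on the numbering can be sketched in a few lines. This is only a toy illustration, not the program mentioned above: the function names and the `+' notation for a merged pair are assumptions, the key is just the opening words of the DOI, and the merge of `for' and `one' is purely illustrative (it is not one of the six corrections). Merging a pair moves every later word down by one position, and inserting a filler moves every later word up by one.

```python
def number_key(words):
    """Map 1-based position -> first letter of the token at that position."""
    return {i: w[0].lower() for i, w in enumerate(words, start=1)}


def merge_pair(words, index):
    """Merge the token at 1-based `index` with the one after it.

    The merged token keeps the first word's initial, and every later
    token moves down by one position (hypothetical '+' notation).
    """
    i = index - 1
    return words[:i] + [words[i] + "+" + words[i + 1]] + words[i + 2:]


def insert_filler(words, index, filler="X"):
    """Insert a filler token so that it occupies 1-based position `index`."""
    i = index - 1
    return words[:i] + [filler] + words[i:]


def decode(codes, key):
    """Decode a list of code numbers; unknown numbers become '?'."""
    return "".join(key.get(c, "?") for c in codes)


# Toy key: the opening words of the DOI (18 words).
OPENING = ("When in the Course of human events it becomes necessary for one "
           "people to dissolve the political bands").split()

plain_key = number_key(OPENING)
merged_key = number_key(merge_pair(OPENING, 11))  # illustrative merge: "for+one"

print(decode([6, 8, 13, 18], plain_key))   # numbering without the merge
print(decode([6, 8, 13, 18], merged_key))  # same codes after the merge
```

Codes below the merge point decode the same either way; codes above it pick up the letter of the following word, which is the kind of shift the corrections above are meant to account for.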
There are 4 words remaining in the DOI that contain an `x': executioners, excited, sexes, and extend. Which (if any) of these did Beale use as element 1005? I suggest an alternative to `sexes', which is commonly assumed: `Executioners' is the sixth word of a line, and this could be element #1005 if the numbering was restarted at 1000 at the beginning of this line. Actually, this is pretty weak reasoning. I just haven't seen a good explanation as to why 1005=X in B2.

Just received material ordered from the BCA: Ward's 1885 pamphlet, Hart's version and the '81 proceedings. I found a few irritating differences between what I thought were correct versions of the 3 ciphers and the values published in Ward's paper. In particular, in B1 I found the following differences:

  Position   Hart,etc.   Ward
     260        320       324
     405         90       290
     462        858       868
     516        820       826

In B3 the following differences exist:

  Position   Hart,etc.   Ward
     401         11         1
     554         29        28

Where did these errors come from? Since the cleartext for B2 is known, the errors there are understandable as either typesetting errors or miscounts by the author of the ciphers.

Extending the `Gillogly strings':

Another string emerges, and the longest string is extended, if a count of 5 is added to elements above 604. The string 604 230 436 664 582 is `aabad' without the correction, and `aabcd' with it.

Even more interesting is cipher element #208, at the end of the string `abcdefghiijklmmnohpp'. The element is 680, and is deciphered as `a' without adding 5, but becomes `q' by adding 5. Note that there is only one word in the entire DOI that begins with `q'.

Against this argument is the clear(!) requirement that the counting not be shifted by 5 for decoding B2. Hammer's 1971 CACM article also notes significant biases for multiples of 5 in B3.

Also, the second `h' in the string is represented by 301. The 302nd word in my version of the DOI is `of'.
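One simple way to compare decodings such as `aabad' and `aabcd' is to measure the longest stretch in which the letters never step backwards. The sketch below does only that; the function name is made up for illustration, and treating a repeated letter (as in `ii' or `mm') as part of the run is an assumption based on the strings quoted above.

```python
def longest_alphabetical_run(text):
    """Return the first longest substring whose letters never decrease."""
    best = current = text[:1]
    for ch in text[1:]:
        # Extend the run while the alphabet does not step backwards,
        # otherwise start a new run at this letter.
        current = current + ch if ch >= current[-1] else ch
        if len(current) > len(best):
            best = current
    return best


for decoded in ("aabad", "aabcd", "abcdefghiijklmmnohpp"):
    run = longest_alphabetical_run(decoded)
    print(decoded, "->", run, len(run))
```

Run against a whole decoded paper, the same function gives a quick way to see whether a proposed numbering shift lengthens or shortens the strings.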
Explanation for the Gillogly strings:

Assume the method for encoding B1 and B2 went something like this:

A partial list of numbers is prepared by writing the alphabet down the left side of a piece of paper. Words beginning with each letter are then noted, and their positions in the DOI are written on the appropriate line. This process continues until most of the lines contain enough numbers for the expected task. B2 is then encoded using this list, with reference back to the DOI when a needed letter isn't in the prepared list, or when the encoder thinks a number has been used too often. New numbers may be added to the list during this process.

In order to encode B1, the preparer then writes the alphabet ACROSS THE TOP of his prepared list of cipher elements and proceeds as before, this time picking numbers from the columns instead of the rows. Thus when encoding a particular word, it would be natural to start at the top of the columns and work down.

Note that some of the Gillogly strings use numbers that do not appear in B2, and that this list must have been made up before either of the two messages was encoded.

If this scenario is correct, then the appearance of (say) four C's in a row probably indicates four different letters in the cleartext of B1.

Problems with this explanation:

Some rows of the list would have only a few numbers in them and thus would be unlikely to appear in B1(doi). This is contradicted by the string `ijkl'. There are only 6 words that start with `j' and only 2 that start with `k' in the first 811 words of the DOI.

Some rows of the list would also have many more than 26 numbers, and the extra numbers thus shouldn't appear at all in B1.

Finally, the BCA newsletter (June 82) article by Aaron mentions that the key to B1 was in a format of 25 letters per line, basing this observation on the bias of numbers toward the center of a key list.
(3/30/83: This tendency is very weak; my modulo program shows only one significant peak in a chart as described by Aaron.)

From the recent discussion in the BCA newsletter, it seems that Ward really was the agent for the author.

Modulo tests.

Wrote a program to display the remainders after division of the cipher elements. For example, there is a definite bias in the remainders after division by 5 in all 3 ciphers:

  B1 % 5, mean: 86.20, sigma:  8.30
  B1 %5 = 0: 78 5
  B1 %5 = 1:125 5++++
  B1 %5 = 2: 59 5---
  B1 %5 = 3: 80 5
  B1 %5 = 4: 89 5

  B2 % 5, mean:138.00, sigma: 10.51
  B2 %5 = 0:187 5++++
  B2 %5 = 1:134 5
  B2 %5 = 2:145 5
  B2 %5 = 3:140 5
  B2 %5 = 4: 84 5-----

  B3 % 5, mean:117.80, sigma:  9.71
  B3 %5 = 0: 81 5----
  B3 %5 = 1:152 5+++
  B3 %5 = 2:111 5
  B3 %5 = 3:121 5
  B3 %5 = 4:124 5

For each message, the expected count per remainder for a completely random distribution is printed (the mean), followed by the number of counts corresponding to one standard deviation away from the mean (sigma). Each subsequent line shows the remainder being calculated, the number of cipher elements with this remainder, and a graphical representation of the deviation. The +'s and -'s after the charted number indicate the number of standard deviations away from the mean that the count represents. Deviations of +/- 3 sigma or more seem to be significant.

B2 prefers numbers evenly divisible by 5, while B3 avoids them. The pattern for all 3 ciphers is similar: one remainder is preferred, one avoided, and the remaining ones are about random.

It's not surprising to find a particular remainder preferred over others, but the pattern for the Beale ciphers is peculiar. The excess use of a particular remainder is not balanced by a general avoidance of the other 4 remainders. Instead, a deficit in a single other remainder accounts for the excess. What could cause this?
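The modulo test can be sketched as follows. This is not the original program: the function names are hypothetical, and the sigma formula is an assumption, namely the binomial standard deviation sqrt(N * p * (1 - p)) with p = 1/modulus, chosen because it reproduces the printed values (for example N = 431 gives mean 86.20 and sigma 8.30 for B1 % 5). How the original program rounded the number of +/- marks is not known; here the deviation is simply truncated toward zero. The sample input is the 17-element sequence from B1 quoted at the start of these notes, not a full cipher.

```python
import math
from collections import Counter


def expected_mean_sigma(n, modulus):
    """Mean and standard deviation of one remainder's count, assuming
    n elements fall uniformly at random into `modulus` classes."""
    p = 1.0 / modulus
    return n * p, math.sqrt(n * p * (1.0 - p))


def remainder_table(codes, modulus):
    """Return (mean, sigma, rows); each row is (remainder, count, marks)."""
    counts = Counter(c % modulus for c in codes)
    mean, sigma = expected_mean_sigma(len(codes), modulus)
    rows = []
    for r in range(modulus):
        count = counts.get(r, 0)
        deviation = (count - mean) / sigma if sigma > 0 else 0.0
        # One mark per whole standard deviation (truncation is an assumption).
        marks = ("+" if deviation > 0 else "-") * int(abs(deviation))
        rows.append((r, count, marks))
    return mean, sigma, rows


# Sample input: seq# 188-204 of B1, as listed at the start of these notes.
SAMPLE = [147, 436, 194, 320, 37, 122, 113, 6, 140,
          8, 120, 305, 42, 58, 461, 44, 106]

mean, sigma, rows = remainder_table(SAMPLE, 5)
print(f"sample % 5, mean:{mean:6.2f}, sigma:{sigma:6.2f}")
for r, count, marks in rows:
    print(f"sample %5 = {r}:{count:3d} 5{marks}")
```

With only 17 elements no remainder comes close to one sigma from the mean, so the sample prints no marks; the full ciphers are needed to see the deviations charted above.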
The pattern for B2%10 also shows significant deviations from random:

  B2 % 10, mean: 69.00, sigma:  7.88
  B2 %10= 0:116 10++++++
  B2 %10= 1: 60 10-
  B2 %10= 2: 69 10
  B2 %10= 3: 70 10
  B2 %10= 4: 55 10-
  B2 %10= 5: 71 10
  B2 %10= 6: 74 10
  B2 %10= 7: 76 10
  B2 %10= 8: 70 10
  B2 %10= 9: 29 10-----

  B3 % 10, mean: 58.90, sigma:  7.28
  B3 %10= 0: 30 10----
  B3 %10= 6: 87 10+++

Again B2 prefers numbers evenly divisible by 10, and avoids numbers with remainders of 9. B3 avoids evenly divisible numbers, and concentrates on remainders of 6 (which is related to remainders of 1 when dividing by 5).

Conclusions/Observations:

1) The original DOI was the key for B2; the numbering errors all occur at line break boundaries of the original DOI.

2) A side table arranged alphabetically was prepared before B1 or B2 was encoded. The Gillogly strings contain elements that do not appear in B2.

3) All 3 ciphers show a bias in their remainders modulo 5 (B2 favors multiples of 5; B3 avoids them).

4) A shift of 5 for elements >600 will create/extend the Gillogly strings in B1.

5) X=1005 in B2, but no word near 1005 contains an X.

6) The Ward pamphlet contains the words `for silver' in the cleartext for B2, but the cipher contains no such set of numbers.

7) J.B. Ward was not the author of "The Beale Papers".