| Ivars Peterson's MathTrek |
October 19, 1998
In the District of Columbia, where I live, the motor vehicle office uses the driver's Social Security number, unless the driver requests a randomly generated substitute. Social Security numbers have the following form: xxx-xx-xxxx.
The first three numbers generally denote where you were born or, if you're a recent immigrant, where you lived when you received your entry papers. For example, 001 to 003 correspond to New Hampshire, 004 to 007 to Maine, and so on. Refugees from Southeast Asia between April 1975 and November 1979 got numbers that begin with 574, 580, or 586. Railroad workers were once issued Social Security numbers starting in the range 700 to 728.
The next six numbers identify the individual. They also incorporate some sort of mathematical scheme that allows the Social Security Administration to tell whether a Social Security number is fraudulent. For example, the middle two digits can't be 00.
Some states use more complicated schemes for establishing driver's license identification numbers. The numbers may include check digits to detect errors or fraud, or they may encode such personal data as the month and date of birth, year of birth, and sex. In several instances, however, the coding strategy is secret.
Breaking those license codes has proved an irresistible challenge for Joseph A. Gallian, a mathematician at the University of Minnesota in Duluth. "It's something I've had fun with for many years," he says.
Driver's license numbers in Minnesota encode personal data. Gallian started with his own identification number, then collected those of his students and anyone else who would participate in his effort to work out the Minnesota scheme. "I had lots of data," he says. "My idea was to look for patterns."
Here's a typical driver's license number: S-530-429-085-151. It turns out that the first four characters represent the last name, the first and middle names account for the next six characters, and the final three digits are based on the month and date of birth, but not year.
Initially, Gallian conjectured that only the first two letters matter in determining the three digits for the first name. From his data, he could tell that Aaron was encoded as 028 and Adam as 031. This meant that names starting with Ab would get the number 029, and those starting with Ac would get 030. Nice pattern!
There were anomalies, however. Gallian discovered that a popular name might get a number all to itself. Edward is 189, John is 429, and Mary is 587. Sometimes, three letters of the name count instead of just two. Various rules take care of cases where a person has no middle name or a double given name (such as Mary Lou). Indeed, the complete coding tables used by clerks are quite complicated.
The date coding system also has quirks. For example, March 1 is 159, and March 2 is 162. That might mean the numbers go up by three for each successive date. Actually, subsequent days are assigned values by increments of 3 and 2 in alternating fashion, and the jump from March 8 to March 9 is 5. There is also a jump of 20 between March 19 and March 20.
"These gaps serve a practical purpose," Gallian says. They play an important role in breaking ties when people with very similar names and the same birthday apply for licenses. The custom of naming a son after the father also brings the rule into play when both share the same birthday.
The first character of the license number is the first character of the surname. The three digits representing the rest of the name are obtained by using the so-called Soundex Coding System.
1. Delete all occurrences of h and w.
2. Assign numbers to the remaining letters as follows: b, f, p, v become 1; c, g, j, k, q, s, x, z become 2; d, t become 3; l becomes 4; m, n become 5, and r becomes 6. No values are assigned to the vowels and y.
3. If two or more letters with the same numeric value are adjacent, omit all but the first. For example, Schworer becomes Sorer and Hughgill becomes Ugil.
4. Delete the first character of the original name, if still present.
5. Delete all occurrences of a, e, i, o, u, and y.
6. Use the first three digits corresponding to the remaining letters; append trailing zeros if fewer than three letters remain.
Whew!
Here are some examples: Schworer (-> Scorer -> 220606 -> Sorer -> orer -> 0606 -> rr -> 66) becomes S-660, Hughgill is H-240, Shaw is S-000, Sachs is S-200, Lennon is L-550, and McCartney is M-263.
"The Soundex System was designed so that likely misspellings of a name would nevertheless result in the correct coding of the name," Gallian remarks. "For example, frequent misspellings of my name are: Gallion, Gillian, Galian, Galion, Gilliam, Gallahan, and Galliam. Observe that all of these yield the same coding as Gallian." At the same time, however, very different names can collapse to the same number.
According to the Minnesota coding scheme, the complete number for John Bennett Smith, born Feb. 27, would be S-530-429-085-151.
You can also try going in the opposite direction, Gallian notes. "I am a resident of Minnesota. I was born on Jan. 5, 1942, and my middle name is Anthony. From this can you deduce my driver's license number?"
"Many states have boring systems," Gallian concludes. The remaining states, like Minnesota, provide lots of fun for the determined code cracker.
Copyright 1998 by Ivars Peterson
References:
Gallian, J.A. 1996. Error detection methods. ACM Computing Surveys 28(September):504.
______. 1992. Breaking the Missouri license code. The UMAP Journal 13(No. 1):37.
______. 1991. Assigning driver's license numbers. Mathematics Magazine 64(February):13.
Mueller, A., et al. 1992. More Missouri breaks. The UMAP Journal 13(No. 4):351.
Peterson, I. 1998. The Jungles of Randomness: A Mathematical Safari. New York: Wiley.
Comments are welcome. Please send messages to Ivars Peterson at ipeterson@maa.org.