![]() | Devlin's Angle |
Anyone who uses a computer or a bank ATM is familiar with the need to enter a password. But what is to prevent a person with criminal intent from simply monitoring the electronic signal emanating from your computer or the ATM you are using and capturing both your login name and your password and then using them to gain illegal access to your account? The answer is that the message that actually gets sent is scrambled. The most common way of scrambling identification is by means of what is called a "hash function". Servers generally do not store your raw password, rather its scrambled version, or "hashed value". To determine whether a login attempt is legitimate, the receiving server compares the incoming hashed login information with the hashed version of your login details stored in its database.
To make this system work in practice, the hashing function, H, has to have two fairly obvious properties:
By requirement 2, even if a hacker gained access to the stored login information, he or she would not be able to obtain your password (though without additional controls they would of course be able to access your account on that machine, since it's the hashed version that the receiving server uses for authorization.)
In practice, the folks who design hash functions usually demand an additional uniformity feature that facilitates efficient storage of the hashed values of identification information and makes for a faster and easier database-lookup procedure to determine identity:
There are dozens of different hash functions in use. The two most widely used ones are MD5 ("Message Digest algorithm 5"), developed by Ronald Rivest at MIT in 1991 as one of a series of hash algorithms he designed, and SHA-1 ("Secure Hash Algorithm 1") developed by the National Security Agency in 1995. (Rivest is well known as one of the three inventors of the widely used RSA cryptographic system - described on the 2006 Mathematics Awareness Month website at http://mathaware.org.)
MD5 produces a hash value of 128 bits, and it would take on average 2^64 guesses to find a collision. SHA-1 generates a hash string of length 160 bits, and it would require an average of 2^80 guesses to find a collision. In theory, both methods would seem to offer a high degree of security - provided that the only feasible way to find a collision is by trial-and- error.
Enter Wang, who seems to exhibit a powerful combination of computer programming skill and a brilliant pattern recognition ability. At the Crypto 04 conference in Santa Barbara, California, Wang astonished the attendants with her announcement that she had found a way to find a collision for MD5 in just 2^37 inputs, a staggering reduction in problem size that made MD5 highly vulnerable.
Wang's approach was to input to the algorithm two strings that differ by just a few bits and look closely at what happens to them, step-by- step, as the algorithm operates on them. This led her to develop a "feel" for the kinds of strings that will result in a collision, allowing her to gradually narrow down the possibilities, resulting eventually in her developing a procedure to generate a collision. Others working in the field remark that her ability to intuit which of the many possible paths to follow, coupled with her tenacity, is remarkable. Commenting to the magazine New Scientist, which covered the story in its 17 December, 2005 issue, Charanjit Jutia, a cryptographer at IBM's Watson Research Center in Yorktown Heights, New York, described the challenge of cracking a hash function like SHA-1 as being "like a giant puzzle." Referring to Wang, he added, "Most people get tired and give up. She did not"
In 1997, Wang used her method to successfully find collisions for SHA-0 (SHA-1's predecessor) in 2^58 computations instead of the believed minimum of 2^80. Then, working with colleagues Dengguo Feng, Xuejia Lai, and Hongbo Yu, she developed and a variation of the same approach with MD5 that she used successfully to crack the system.
Following the announcement at Crypto 04, Wang and Yu teamed up with Yiqun Lisa Yin, now an independent security consultant based in Greenwich, Connecticut, and started work on the crown jewel of current hash functions, SHA-1. This proved a much harder nut to crack, but to the general dismay (and admiration) of the computer security community, at the annual RSA security conference in San Francisco in February last year, they were able to announce that they had developed an algorithm that could generate two SHA-1 colliding files in just 2^69 steps.
Unlike MD5, Wang and her colleagues have not (yet) cracked SHA-1, they have just produced a method that could crack it in far fewer steps than was previously believed possible. That number 2^69 is still sufficiently high to provide some degree of confidence in the system's security - for now. So too is the even lower number of 2^63 steps that Wang and other collaborators managed to achieve in the months following the February 2005 announcement. But many in the cryptographic community now believe that the writing is on the wall, and that, as a result of Wang's work, advances in computing speed and power will rapidly render useless all the hashing algorithms currently in use. Not today - the experts assure us that our ATM transactions are secure for now. But soon. Commenting on the development to New Scientist, Burt Kaliski, the head of RSA Laboratories in Bedford, Massachusetts, declared "This is a crisis for the research community." Mark Zimmerman, a cryptographer with ICSA Labs in Mechanicsburg, Pennsylvania, put it rather more colorfully. "It's not Armageddon," he said, "but it's a good kick in the pants."
I will just add that, what I think may be of particular interest to readers of MAA Online is that the entire story shows that, even in the Moore's Law driven world of massively large computations, the remarkable pattern-recognition capacity and "numerical intuitions" of the human brain, can still play a major - indeed devastating - role. Especially when combined with another valuable human trait: tenacity.