In today's post I'm going to talk a little about digital signatures and hashes. I'm going to talk about hashes and their use in cryptosystems, and then I'm going to give some crypto “recommendations” for how to stay one step ahead of potential digital signature forgers.
I'm not really sure why, but I just love to learn about security mechanisms, how they can be beat, and how you can really foil intruders. If you don't share my passions for math and security, maybe this post won't hold your attention; I promise I'll post something more sensational soon. If, on the other hand, you read Cryptonomicon and were starved for the numerical details behind the characters' plots, read on, and be satisfied!
Let me get one thing clear before we start. Current digital cryptography is secure enough for you to rest easy – your weakest link is not going to be that somebody spends thousands of CPU hours to do a direct attack on your data. If somebody wants to steal your information it's much easier to use a “side channel attack,” in other words it's easier for a data thief to push malicious key-logging software onto your Windows computer, infiltrate your organization, or record the sound of your keyboard to get sensitive information than it is to do a brute-force attack on even a relatively weak cryptosystem.
However, I think cryptology is fun, so today I'll talk about a security practice which will keep your digitally-signed documents über-safe. If that appeals, read on!
The Internet provides a remarkable degree of anonymity to its users, which can be both a blessing and a curse. (LeDopore isn't my real name, by the way. I enjoy being able to post unfiltered opinions that will never be tied to C.V.-related Google searches. I can prepare a face to meet the faces that [I] meet and have only a select few be able to link my masks.)
The Internet would be much less useful if we couldn't establish the authenticity of any particular source. Because of its decentralized nature (which is one of the reasons it's so robust - 0 seconds downtime since the 1970s is pretty impressive), there's no way to have a trusted path between source and sender; we must let the message itself testify to its authenticity.
Public Key Cryptology
When a message is digitally signed with public key (asymmetric) encryption, you can quickly verify that only a particular sender (actually, a sender with access to the key's corresponding private key) could have sent it. We say "asymmetric" and "public" because for every secure channel, there's one public (in other words, you want everyone to know it) and one private (secret) key; these keys are different (hence "asymmetric").
Let me give a hallway analogy to explain what public key encryption can do. Imagine an apartment building hallway with rows of doors with mail slots, and with glassed-in locked message boards beside every door. You can slip a message into anybody's slot without anyone else being able to read it, and you can post anything you like in your locked message board so that anybody can see it an know you sent it. Slipping a message into others' slots is equivalent to encrypting it with their public key: they need their private key to read it. Posting behind glass is equivalent to encrypting it with your private key: if you need your public key to decrypt it, it's impossible that the message was generated with anything but your unique private key.
For a fully-secure connection, you can encrypt a message first with your private key and then with the receiver's public key; then only they will be able to receive it, and they can be sure that the message came from you. (The hallway metaphor breaks down, since with public key cryptology you can do the equivalent of slipping a message board of yours into someone else's mail slot.) Thus people who have never met can exchange fully private information, which is why you can buy things with your credit card online. (A mixed blessing?)
One of the big potential holes in public key cryptography is that you have to be sure you know what public key to use when sending a message to somebody. The only way around this conundrum is to go through a security broker like VeriSign, whose job it is to physically go to companies to sign hand-delivered public keys with their master VeriSign private key, which your computers are pre-programmed to trust. (Man, talk about a single point-of-failure; if anybody managed to factor VeriSign's product-of-primes it would be "game over" for lots of digital security. If you don't like VeriSign's game, you can always physically share symmetric keys through a trusted, i.e. non-Internet, connection first. That's how I've set up my ssh into work; not that I don't trust VeriSign, but you never know...)
The RSA Algorithm
Signatures typically work through the RSA public key algorithm, which gains its cryptologic strength from the fact that it's easy to check if a number is prime, easy to do modular exponentiation (which I'm not even going to define here), but difficult to factor the product of two large prime numbers. (If you're interested in the math behind RSA, try chapter 1, page 42 of Algorithms, by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, freely available online and very well written).
The Need for Hashes
Theoretically it would be possible to sign entire documents with RSA, and to conduct whole conversations by passing messages through public-key cryptosystems. However, although using RSA with the proper keys is orders of magnitude faster than cracking it (which, as far as I know, hasn't been done ever with long enough keys), it still takes quite a few clock cycles. Typically then you don't send your secret info directly through RSA, but you use a block cypher like AES (Advanced Encryption Standard).
AES takes a secret 128-bit shared number and generates a bitstream of random-looking data. Both the sender and receiver share the same random 128-bit key through a secure method like RSA, then they use AES on the 128-bit key to make a stream of random-looking bits. Since both sender and receiver have the same key, AES is a type of symmetric key encryption.
AES is in many ways like a random-number generator on steroids: the 128-bit number is the random seed, and from it you can generate as long a random-looking bitstream as you like. The sender of messages ("Alice:" in cryptology it's always Alice who has some interesting secret message) then takes her digital message and the random bitstream and performs the exclusive OR operation between them. (I.e., if the random bit in the bitstream is 1, flip the message bit from 1 to 0 or vice versa, but if the random bit is 0 do nothing.) The result is a totally unintelligible to everyone but Bob (the ever-listening, trusted confidant of Alice), with whom Alice has shared the 128-bit key. Since Bob can use AES to make exactly the same bitstream as Alice, he knows which bits have been flipped, and thus can recover the original message by flipping the bits back.
Back to digital signatures. Just as it's impractical to sign everything with RSA directly, it would take a lot of CPU cycles to sign your documents with RSA. Instead, usually you'll sign a hash of the document you want to verify came from you. In the hallway analogy, think of it as distributing a book you liked to everyone, and then slipping the title page into your secure glassed-in message board so that everyone can see that you endorsed it.
Making a Good Hash Function
There's an immediate problem with the title-page strategy: other people could write a different book with the same title page. If the new interloping book contained inflammatory remarks, you could get into a heap of trouble. Ideally, what we want is some digest, or hash, of your book (other than just ripping out the title page) which had the following properties:
- Relatively fast to calculate
- Sensitive to the entire document, not just one page of it
- Small enough to fit into a message box
- Nearly impossible to reverse, i.e. find another book with the same hash
Let's go over why each of the four above points is important. #1: if it takes a long time to calculate the hash, you waste time. (That's why we don't sign whole messages with RSA, right?) #2: every bit of the hash must depend on every bit of the original document in a unique way. (This way changing even a single character in the document produces a completely different hash, making forgery difficult.) #3: hashes are typically only a few bytes long. (MD5 is 128 bits long, SHA-1 is 160 bits and WHIRLPOOL is 512 bits - all small enough that signing them is no big deal.) #4 the hashes should be computationally easy only in one direction: so it's hard to make a message with a specified hash. When I say "hard," I mean that ideally it would take about 2^(hash length) tries to find a message which would have a specified hash.
This last point is vital: when you make a digital signature, you're claiming authorship for every message which has that hash, since the hash is the only thing you sign. (If somebody distributed a forged document withe same hash as a document you signed, they could claimed you signed the forgery.)
Hashes are also used to protect passwords. You want programs to be able to identify if a password was correct, but for security reasons it's a bad idea to store the password itself on your disk. To get around this problem, most software stores only a hash of your password on your disk. To check that subsequent entered passwords are correct, programs do a hash of the entered text and compare it to the stored hash. As long as the has has good crypto strength, people with read-only access to the file containing the passwords will not be able to guess the password from the hash. (Aside: Windows by default uses an infamously insecure algorithm for storing password hashes, the LM hash, which requires only about 2^36 operations to crack. Even a general-purpose modern computer can brute-force Windows passwords in a few hours, and you can speed that up to a few minutes by using pre-computed rainbow tables. Insane!)
Potential Pitfall: Birthday Attacks
SHA-1 (Secure Hash Algorithm) is used industry-wide as a purportedly secure hash algorithm (i.e. one that satisfies #4 above). It's still pretty good, but it's starting to show its age. One of the best ways to attack the signed hash cryptosystem is to use what's called a birthday attack, named after the birthday paradox (which says if you have more than about 25 people in a room, chances are that two people will share the same birthday - the trick works because the number of possible birthday collisions is 25 * (25 -1)/2 = 300 - the number of potential pairs goes as the square of the number of people).
To do a birthday attack, the villain chooses a message you'd be happy to sign (A, which could be some innocuous legal document to be signed by a lawyer) and an evil message he wants you to sign (B which can be anything). He then looks for strings of invisible fluff (c and d) which he can append to A and B such that the hash of Ac will be the same as the hash of Bd. (The invisible fluff can be a string of mixed spaces and non-breaking spaces, or comments in an .html document, or tons of other things which won't affect the appearance of A and B but will change their hashes.)
Here's the bad news: although for a good 160-bit hash you'd have to make 2^159 guesses of d such that A and Bd would have the same hash, for a birthday attack the villain generates only about 2^80 fluff strings c and d. Chances are that for one of the 2^160 pairs of c'd and d's, the hash of Ac will be the same as the hash of Bd.
Real World Implications
Already there have been successful birthday attacks against MD5 (the 128-bit hash I mentioned), and SHA-1, which is an industry standard, is starting to show cracks as well. In 2005, Xiaoyun Wang, Andrew Yao and Frances Yao have found a shortcut to do birthday attacks on SHA-1 such that only about 2^63 computations are needed. (If SHA-1 were a better hash function, no attack faster than brute force would be possible, and that would take 2^80 operations). Even today, 2^63 operations is feasible with the right hardware: if a teraflop specialty purpose machine (like the Geforce 8800) costs about $500, then to make a malicious pair of messages Ac and Bd in a year you'd need about a billion dollars in computer resources.
Computers are going to get faster, and cryptanalysts (maybe) are going to find faster-than-2^63 attacks on SHA-1. My prediction is that birthday attacks against SHA-1 are going to become widespread some time within the next 20 years.
Staying One Step Ahead
Replacements for SHA-1 are in the works. There are hashes with longer digests which are already public standards: SHA-256, SHA-512 and WHIRLPOOL (with 256, 512 and 512 bit digest lengths), but they haven't been as widely scrutinized as SHA-1. (I bet they're all pretty good though, but I'm not a pro cryptanalyst.) If you're a programmer, consider coding software in a modular-enough way that you can drop in different hash functions into your code easily, and that different hash lengths don't mess up your program.
Until better software comes along, I'd recommend that people working on big, secret important stuff adopt the following two policies:
- Always edit a document sent to you before signing it in some unpredictable way.
- Always keep a copy of the document you actually do sign.
Back to the Real World
Of course, these day's it's much cheaper to hire a spy to infiltrate your organization than to generate a billion-dollar hash-colliding document pair and hope it's signed without modification. I'm a silly crypto-hobbyist for suggesting you should worry about anything but a side channel attack. And even then, I've found that the vast majority of folks are too concerned with their own business to try to hack yours. I routinely accidentally leave my door not just unlocked but wide open, and I have yet to be stolen from at home. The world's a safe place; you don't need to worry about digital security. I just have a little-kid-in-treehouse mentality when it comes to fancy computational methods for making rock-solid crypto. How about you? Do any of my readers think crypto is fun? Should I blog more about it?