Tuesday, May 22, 2007

Hiatus

Greetings, California Dreamers

I thought I'd let you know that I'm going to be on a trip for the next few weeks. Many Ideas is going to go dormant until mid July, so tune in then when I'll have a new slab of quirky considerations for your surfing pleasure.

Sincerely,

LeDopore

PS If you come up with any more brilliant ideas for posts (the best ones seem to be user-submitted), please leave them in a comment here.

Friday, May 11, 2007

Liberty and Bandwidth for All

Greetings, YouTubers*.

In my last post I outlined how our current system of private Internet Service Providers (ISPs) companies is economically wasteful. Although in general the private sector is better than the public sector at providing higher-quality services at a lower cost, with ISPs the product (Internet bandwidth) is a factor of 1000 times cheaper than what ISPs charge. Almost all the ISPs' operational costs come from advertising, distributing and charging for this cheap-as-dirt Internet backbone bandwidth. In other words IPSs are intrinsically so wasteful that publicly-owned networks make sense. This post is going to tackle how I think we should implement municipal data networks.

Letting Demand Drive Expansion

Internet technology changes so fast that it would be unwieldy for a council to try to have a sane policy of technology roll-out which took advantage of the latest and greatest. It would also be hard to periodically gauge the service levels residents truly want. It's much better to make technology policy future-proof, meaning that no new laws or regulations will be required to implement better technology where it's wanted as soon as it's developed.

Here's an example of a future-proof network-building scheme. Have residents pledge (with holds on their credit cards) that they would be willing to pay X dollars for Y service. As soon as a private company notices that enough residents in a neighborhood have pledged enough money to make granting the service worthwhile, they can install whatever hardware they choose which is able to meet or exceed the bandwidth demands Y of all the people who pledged X dollars. The money from the credit card holds would then go into a trust which would pay the hardware companies annuities for as long as the service works (or maybe the trust should be invested with low risk, and 25% of the total equity should be paid to the hardware builder/maintainer each year; since Internet technology becomes obsolescent so much faster than roads it makes sense to make the payment schedule accelerated).

The city would provide all of the (essentially free) backbone bandwidth in exchange for the fact that all Internet services using that bandwidth must be broadcast over authentication-free wireless Internet or users must be able to plug in wired connections for free in publicly-accessible points. (Perhaps encryption could be optional to prevent people from spying, but it shouldn't be mandatory, and passwords must not be secret. With good crypto you can have every user use a different session key, so that even if they know each others' passwords they can't snoop on each other.)

Miscellaneous Points

Here are a few guidelines for details of the policy which might help:
  • The quality of service could be specified by three numbers: bandwidth, reliability and latency-to-backbone; that way users can communicate what's most important for them to the free market.
  • Perhaps users should pay on a sliding scale, with payments tied to the quality of service received, so that there is always an explicit incentive to provide better Internet service.
  • Assuming only 25% of residents who want a given service would pledge for it, maybe the city should match pledges paid out of property tax.
  • Since optical fiber is cheap but expensive to lay, it's a common practice to lay cables with many more fibers than will be needed in the near future. These "dark" fibers can later be cheaply lit if needed. Policies should probably specify that some percentage (like 95%) of the fiber laid to make a network must be dark.
  • Depending on political will, it might make sense to pay for city-wide phone-level coverage off the bat through taxes, and let people pledge for upgrades as desired.
Conclusions

In the Chicago example of last post we saw that the entire city could have a free data network for a one-time cost of under $15 per person. People are probably willing to pay a lot more for much faster connections; the plan outlined in this post shows a way in which a publicly-owned network can deliver services the public wants as soon as their feasible to deliver without wasting money on advertising and accounting.

This plan isn't anti-business either. The local companies which would spring up to supply the network services asked for by the people would have a leg up spreading to other municipalities where this same incentive policy gets implemented. (I am fairly confident that other municipalities would want to emulate the digital utopias which would come from this type of municipal Internet service.)

With some organization, the people can have cake and eat it too: they can pay a pittance in extra tax in exchange for hassle-free, state-of-the-art Internet connectivity. Everybody wins except old-school ISP shareholders. (Sell!)

*Web 2.0 couch potatoes?

Wednesday, May 9, 2007

The Answer is Blowin' in the Windy City

Greetings, chatterboxes.

Today I'm going to outline why I think municipal wireless networks are a good idea. We depend more and more on Internet connectivity for our everyday lives; it's no longer the case that bandwidth is a luxury item only a small niche desires. However, the way we typically pay for bandwidth (through private Internet Service Providers, or ISPs) is tremendously inefficient. I'm going to outline an estimate of how inefficient privately-owned ISPs are, then in the next few posts I'll talk about a way in which publicly-owned networks can be financially and technologically sustainable.

Getting Hosed by ISPs

Bandwidth at Internet backbones is ridiculously cheap: about $1 per terabyte (TB) and falling fast. (Based on estimates of web-hosting costs which allow 3 TB of transfer per month for $5 per month - the $1 per TB might not be accurate to within more than an order of magnitude. I don't specifically endorse the web hosting company I linked to - it's just an example of how cheap backbone bandwidth can be.) A heavy home user might transfer about 20 GB of bandwidth per month, costing their ISPs no more than a few pennies per customer per month.

However, the rates which ISPs charge their customers is three orders of magnitude higher: $20 per month is considered a good deal. That's a markup factor of at least 1000.

There are at least three main expenses other than backbone bandwidth which contribute to the costs of running ISPs:
  1. The "last mile" connectivity between multiple homes and a backbone connection point
  2. Advertising and promotion
  3. Billing customers
Going Public

If a publicly-operated free (as in beer) municipal Internet network existed, there would be no need for costs # 2 and 3, and I postulate that #1 could take a big hit too by allowing better technology to be used. I think that one of the major reasons private ISPs are scared to deploy city-wide mesh wireless networks is that if users shared their passwords with friends, they could lose customers. Instead they've opted for wired networks (through DSL or cable) which are probably a lot more expensive than wireless mesh networks so they can be sure you don't share your account with friends.

Why do I think mesh networks are cheaper? The City of Chicago plans to roll out a city-wide wireless mesh network for only $18.5 million. A city-wide network would supplant not only ISP communication, but if a few Asterisk servers were part of the setup you could replace aging telephone lines and cellphones with voice over IP (VoIP), obviating the need for phone companies, whose costs are also dominated by the three numbered items above.

Savings

How much do Chicago's 3 million residents currently pay for phone, Internet and cell phones? If we assume one ISP line (at $20/mo.) and one land line (also at $20/mo.) for every 4 residents and one cellphone (at $30/mo.) for every two residents, we'd estimate that Chicago spends $900 million per year on combined data services. Even assuming Chicago's network costs double the estimate with a one-time cost of $40 million, a municipally-funded wireless network is an exceedingly good deal.

If the backbone bandwidth cost were approximately one penny per resident per month it would not be worth the city's while to try to charge people for their individual bandwidth usage, just as we don't try to charge people who use streetlights more for their fair share of electricity costs to the city.

Conclusions

Even if implemented poorly, a publicly-owned data network would give astronomical cost savings over the current arrangement. There are still the potential pitfalls that a publicly-owned network might be terribly cost-inefficient, or that it might not give the quality of service expected by the residents. However, in my next post I will unveil a plan which addresses both of these woes.

Until then!

LeDopore

Saturday, May 5, 2007

Keeping Your Autograph Yours

Greetings, John Hancock.

In today's post I'm going to talk a little about digital signatures and hashes. I'm going to talk about hashes and their use in cryptosystems, and then I'm going to give some crypto “recommendations” for how to stay one step ahead of potential digital signature forgers.

I'm not really sure why, but I just love to learn about security mechanisms, how they can be beat, and how you can really foil intruders. If you don't share my passions for math and security, maybe this post won't hold your attention; I promise I'll post something more sensational soon. If, on the other hand, you read Cryptonomicon and were starved for the numerical details behind the characters' plots, read on, and be satisfied!

Hype Warning

Let me get one thing clear before we start. Current digital cryptography is secure enough for you to rest easy – your weakest link is not going to be that somebody spends thousands of CPU hours to do a direct attack on your data. If somebody wants to steal your information it's much easier to use a “side channel attack,” in other words it's easier for a data thief to push malicious key-logging software onto your Windows computer, infiltrate your organization, or record the sound of your keyboard to get sensitive information than it is to do a brute-force attack on even a relatively weak cryptosystem.

However, I think cryptology is fun, so today I'll talk about a security practice which will keep your digitally-signed documents über-safe. If that appeals, read on!

Digital Signatures


The Internet provides a remarkable degree of anonymity to its users, which can be both a blessing and a curse. (LeDopore isn't my real name, by the way. I enjoy being able to post unfiltered opinions that will never be tied to C.V.-related Google searches. I can prepare a face to meet the faces that [I] meet and have only a select few be able to link my masks.)

The Internet would be much less useful if we couldn't establish the authenticity of any particular source. Because of its decentralized nature (which is one of the reasons it's so robust - 0 seconds downtime since the 1970s is pretty impressive), there's no way to have a trusted path between source and sender; we must let the message itself testify to its authenticity.

Public Key Cryptology

When a message is digitally signed with public key (asymmetric) encryption, you can quickly verify that only a particular sender (actually, a sender with access to the key's corresponding private key) could have sent it. We say "asymmetric" and "public" because for every secure channel, there's one public (in other words, you want everyone to know it) and one private (secret) key; these keys are different (hence "asymmetric").

Let me give a hallway analogy to explain what public key encryption can do. Imagine an apartment building hallway with rows of doors with mail slots, and with glassed-in locked message boards beside every door. You can slip a message into anybody's slot without anyone else being able to read it, and you can post anything you like in your locked message board so that anybody can see it an know you sent it. Slipping a message into others' slots is equivalent to encrypting it with their public key: they need their private key to read it. Posting behind glass is equivalent to encrypting it with your private key: if you need your public key to decrypt it, it's impossible that the message was generated with anything but your unique private key.

For a fully-secure connection, you can encrypt a message first with your private key and then with the receiver's public key; then only they will be able to receive it, and they can be sure that the message came from you. (The hallway metaphor breaks down, since with public key cryptology you can do the equivalent of slipping a message board of yours into someone else's mail slot.) Thus people who have never met can exchange fully private information, which is why you can buy things with your credit card online. (A mixed blessing?)

One of the big potential holes in public key cryptography is that you have to be sure you know what public key to use when sending a message to somebody. The only way around this conundrum is to go through a security broker like VeriSign, whose job it is to physically go to companies to sign hand-delivered public keys with their master VeriSign private key, which your computers are pre-programmed to trust. (Man, talk about a single point-of-failure; if anybody managed to factor VeriSign's product-of-primes it would be "game over" for lots of digital security. If you don't like VeriSign's game, you can always physically share symmetric keys through a trusted, i.e. non-Internet, connection first. That's how I've set up my ssh into work; not that I don't trust VeriSign, but you never know...)

The RSA Algorithm

Signatures typically work through the RSA public key algorithm, which gains its cryptologic strength from the fact that it's easy to check if a number is prime, easy to do modular exponentiation (which I'm not even going to define here), but difficult to factor the product of two large prime numbers. (If you're interested in the math behind RSA, try chapter 1, page 42 of Algorithms, by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, freely available online and very well written).

The Need for Hashes

Theoretically it would be possible to sign entire documents with RSA, and to conduct whole conversations by passing messages through public-key cryptosystems. However, although using RSA with the proper keys is orders of magnitude faster than cracking it (which, as far as I know, hasn't been done ever with long enough keys), it still takes quite a few clock cycles. Typically then you don't send your secret info directly through RSA, but you use a block cypher like AES (Advanced Encryption Standard).

Aside: AES

AES takes a secret 128-bit shared number and generates a bitstream of random-looking data. Both the sender and receiver share the same random 128-bit key through a secure method like RSA, then they use AES on the 128-bit key to make a stream of random-looking bits. Since both sender and receiver have the same key, AES is a type of symmetric key encryption.

AES is in many ways like a random-number generator on steroids: the 128-bit number is the random seed, and from it you can generate as long a random-looking bitstream as you like. The sender of messages ("Alice:" in cryptology it's always Alice who has some interesting secret message) then takes her digital message and the random bitstream and performs the exclusive OR operation between them. (I.e., if the random bit in the bitstream is 1, flip the message bit from 1 to 0 or vice versa, but if the random bit is 0 do nothing.) The result is a totally unintelligible to everyone but Bob (the ever-listening, trusted confidant of Alice), with whom Alice has shared the 128-bit key. Since Bob can use AES to make exactly the same bitstream as Alice, he knows which bits have been flipped, and thus can recover the original message by flipping the bits back.

Aside Over

Back to digital signatures. Just as it's impractical to sign everything with RSA directly, it would take a lot of CPU cycles to sign your documents with RSA. Instead, usually you'll sign a hash of the document you want to verify came from you. In the hallway analogy, think of it as distributing a book you liked to everyone, and then slipping the title page into your secure glassed-in message board so that everyone can see that you endorsed it.

Making a Good Hash Function

There's an immediate problem with the title-page strategy: other people could write a different book with the same title page. If the new interloping book contained inflammatory remarks, you could get into a heap of trouble. Ideally, what we want is some digest, or hash, of your book (other than just ripping out the title page) which had the following properties:

  1. Relatively fast to calculate
  2. Sensitive to the entire document, not just one page of it
  3. Small enough to fit into a message box
  4. Nearly impossible to reverse, i.e. find another book with the same hash
If you have a hash function which satisfies all of the above properties, you can speedily sign documents by distributing the bulk of the document through insecure channels, and then making a hash of the entire document and signing just the hash with your private key. Then receivers wanting to check the authenticity of your document can take the insecure copy of the document, hash it in exactly the same way you hashed it (there are publicly available hash algorithms like MD5, SHA-1, and WHIRLPOOL), and then compare the hash the digitally-signed hash you just made (by passing your signed hash through your public key).

Let's go over why each of the four above points is important. #1: if it takes a long time to calculate the hash, you waste time. (That's why we don't sign whole messages with RSA, right?) #2: every bit of the hash must depend on every bit of the original document in a unique way. (This way changing even a single character in the document produces a completely different hash, making forgery difficult.) #3: hashes are typically only a few bytes long. (MD5 is 128 bits long, SHA-1 is 160 bits and WHIRLPOOL is 512 bits - all small enough that signing them is no big deal.) #4 the hashes should be computationally easy only in one direction: so it's hard to make a message with a specified hash. When I say "hard," I mean that ideally it would take about 2^(hash length) tries to find a message which would have a specified hash.

This last point is vital: when you make a digital signature, you're claiming authorship for every message which has that hash, since the hash is the only thing you sign. (If somebody distributed a forged document withe same hash as a document you signed, they could claimed you signed the forgery.)

Hashes are also used to protect passwords. You want programs to be able to identify if a password was correct, but for security reasons it's a bad idea to store the password itself on your disk. To get around this problem, most software stores only a hash of your password on your disk. To check that subsequent entered passwords are correct, programs do a hash of the entered text and compare it to the stored hash. As long as the has has good crypto strength, people with read-only access to the file containing the passwords will not be able to guess the password from the hash. (Aside: Windows by default uses an infamously insecure algorithm for storing password hashes, the LM hash, which requires only about 2^36 operations to crack. Even a general-purpose modern computer can brute-force Windows passwords in a few hours, and you can speed that up to a few minutes by using pre-computed rainbow tables. Insane!)

Potential Pitfall: Birthday Attacks

SHA-1 (Secure Hash Algorithm) is used industry-wide as a purportedly secure hash algorithm (i.e. one that satisfies #4 above). It's still pretty good, but it's starting to show its age. One of the best ways to attack the signed hash cryptosystem is to use what's called a birthday attack, named after the birthday paradox (which says if you have more than about 25 people in a room, chances are that two people will share the same birthday - the trick works because the number of possible birthday collisions is 25 * (25 -1)/2 = 300 - the number of potential pairs goes as the square of the number of people).

To do a birthday attack, the villain chooses a message you'd be happy to sign (A, which could be some innocuous legal document to be signed by a lawyer) and an evil message he wants you to sign (B which can be anything). He then looks for strings of invisible fluff (c and d) which he can append to A and B such that the hash of Ac will be the same as the hash of Bd. (The invisible fluff can be a string of mixed spaces and non-breaking spaces, or comments in an .html document, or tons of other things which won't affect the appearance of A and B but will change their hashes.)

Here's the bad news: although for a good 160-bit hash you'd have to make 2^159 guesses of d such that A and Bd would have the same hash, for a birthday attack the villain generates only about 2^80 fluff strings c and d. Chances are that for one of the 2^160 pairs of c'd and d's, the hash of Ac will be the same as the hash of Bd.

Real World Implications

Already there have been successful birthday attacks against MD5 (the 128-bit hash I mentioned), and SHA-1, which is an industry standard, is starting to show cracks as well. In 2005, Xiaoyun Wang, Andrew Yao and Frances Yao have found a shortcut to do birthday attacks on SHA-1 such that only about 2^63 computations are needed. (If SHA-1 were a better hash function, no attack faster than brute force would be possible, and that would take 2^80 operations). Even today, 2^63 operations is feasible with the right hardware: if a teraflop specialty purpose machine (like the Geforce 8800) costs about $500, then to make a malicious pair of messages Ac and Bd in a year you'd need about a billion dollars in computer resources.

Computers are going to get faster, and cryptanalysts (maybe) are going to find faster-than-2^63 attacks on SHA-1. My prediction is that birthday attacks against SHA-1 are going to become widespread some time within the next 20 years.

Staying One Step Ahead


Replacements for SHA-1 are in the works. There are hashes with longer digests which are already public standards: SHA-256, SHA-512 and WHIRLPOOL (with 256, 512 and 512 bit digest lengths), but they haven't been as widely scrutinized as SHA-1. (I bet they're all pretty good though, but I'm not a pro cryptanalyst.) If you're a programmer, consider coding software in a modular-enough way that you can drop in different hash functions into your code easily, and that different hash lengths don't mess up your program.

Until better software comes along, I'd recommend that people working on big, secret important stuff adopt the following two policies:
  1. Always edit a document sent to you before signing it in some unpredictable way.
  2. Always keep a copy of the document you actually do sign.
Point 1 will protect you from birthday attacks. If you change Ac even slightly, the billion-dollar crack attempt made by the villain will be completely worthless, since he has a Bd which hashes to the unmodified Ac. Point 2 will make sure that even if your signature gets broken and somebody claims you signed Bd, you can whip out the document you actually did sign and show that somebody made a pair of hash-colliding documents. (You won't be able to prove if it was you or the villain, but at least there would be reasonable doubt.)

Back to the Real World

Of course, these day's it's much cheaper to hire a spy to infiltrate your organization than to generate a billion-dollar hash-colliding document pair and hope it's signed without modification. I'm a silly crypto-hobbyist for suggesting you should worry about anything but a side channel attack. And even then, I've found that the vast majority of folks are too concerned with their own business to try to hack yours. I routinely accidentally leave my door not just unlocked but wide open, and I have yet to be stolen from at home. The world's a safe place; you don't need to worry about digital security. I just have a little-kid-in-treehouse mentality when it comes to fancy computational methods for making rock-solid crypto. How about you? Do any of my readers think crypto is fun? Should I blog more about it?