Monday, March 19, 2007

The Google Hype-Meter

Greetings, West Nile mosquito-swatters.

I hope that by now you will be familiar with my style of putting over-hyped risks in their place. But how do you determine how over-hyped a problem is? Today I'm going to introduce a new metric to assess how out-of-proportion a particular death threat is: the Google Hits per Annual Fatality or GHAF metric.

Google and Hype

First of all, let me admit that reducing such a nebulous idea as "hype" to a number is an inexact science at best. However, I happen to be an inexact scientist: the perfect blogger for the job.

The people who post web content are not representative of the human race as a whole, so if there's something netizens preferentially talk about, Google is going to reflect that bias. However, in most cases this bias will distort reality by at most about a factor of 10, so any enormous differences in the whole-world hype devoted to certain risk factors should also be present in a subject's Internet chatter. Luckily, some small risks are so enormously exaggerated that even an inexact measure like the GHAF can find them with confidence.

Calculating the GHAF

So, if we're agreed that Google hits will approximate the amount of talk on a subject, we can divide the number of hits by the number of deaths a scare causes each year to get the GHAF, a relative measure of how much that particular problem has been overblown. Let's take a look at a few real-world examples of the GHAF.

Raw Data
  • Malaria in Africa (GHAF = 1.5, 3 million Google hits [1] per approximately 2 million annual deaths [2])
  • Cancer in the United States (GHAF = 94, 54 million hits [3] per 570 280 annual deaths: page 1 of [4], .pdf warning: 6 MB)
  • West Nile Virus in the United States (GHAF = 5 500, 911 000 hits [5] per 165 annual deaths [6])
  • vCJD, the human disease from eating a mad cow, worldwide (GHAF = 81 000, 1.4 million hits [7] for 139 cases over 8 years [8] - see my blog entry for an editorial [9])
  • Alligator Attacks in the United States (GHAF = 293 000, 461 000 hits [10] per 1.57 annual deaths [11] - possibly the fatality rate is underestimated by this list and possibly a lot of the Google hits came from attacks on non-human targets)

Summary

The GHAF hype metric varies enormously. It is a few thousand times greater for West Nile in the US than for malaria in Africa. Working from the assumption that most human life should be treated with roughly the same degree of care, these wildly differing GHAFs indicate that we spend far too much time worrying about the wrong things. With the GHAF, we can measure just how skewed our fears are.
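For anyone who wants to re-run the arithmetic, here is a minimal Python sketch that recomputes the GHAFs from the hit counts and death tolls in the Raw Data list. The names RISKS, ghaf and ghaf_table are made up for illustration, and the vCJD entry assumes the 139 cases can be treated as 139/8 deaths per year.

```python
# Hits and annual deaths copied from the Raw Data list in this post.
# Assumption: vCJD's 139 cases over 8 years are treated as 139/8 deaths per year.
RISKS = {
    "Malaria (Africa)": (3_000_000, 2_000_000),
    "Cancer (US)": (54_000_000, 570_280),
    "West Nile Virus (US)": (911_000, 165),
    "vCJD (worldwide)": (1_400_000, 139 / 8),
    "Alligator attacks (US)": (461_000, 1.57),
}


def ghaf(hits, annual_deaths):
    """Google Hits per Annual Fatality: hits divided by deaths per year."""
    if annual_deaths <= 0:
        raise ValueError("annual_deaths must be positive")
    return hits / annual_deaths


def ghaf_table(risks):
    """Return {risk name: GHAF} for a {name: (hits, annual_deaths)} mapping."""
    return {name: ghaf(hits, deaths) for name, (hits, deaths) in risks.items()}


if __name__ == "__main__":
    table = ghaf_table(RISKS)
    # Print from least hyped to most hyped.
    for name, value in sorted(table.items(), key=lambda kv: kv[1]):
        print(f"{name:<24} GHAF ~ {value:,.0f}")
    ratio = table["West Nile Virus (US)"] / table["Malaria (Africa)"]
    print(f"West Nile vs. malaria: about {ratio:,.0f} times the hype per death")
```

With these inputs the West Nile to malaria ratio comes out at roughly 3 700, which is where "a few thousand times" comes from.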

The above list is far from exhaustive; does anybody want to look into adding traffic deaths or killer bees? I've set up a wiki page to keep track of the GHAFs of various risks. Feel free to add to it!

In any case, there's a huge variation in how much hype a risk gets compared with the actual danger involved. I realize there are only so many articles one can read about a certain risk before becoming inured to it, so one would expect the GHAF to be lower for real risks, since the millionth victim will not get as much press as the first. However, the number of Google hits a risk gets is not even an increasing function of the associated body count, showing that our problems run deeper than just weariness over old news.
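The "not even an increasing function" claim can be checked directly against the Raw Data list. The sketch below sorts the five risks by annual deaths and reports each neighbouring pair where the deadlier risk has fewer hits. The name monotonicity_violations is hypothetical, vCJD is again annualized as 139/8, and since the risks cover different regions this is a rough comparison at best.

```python
# (name, deaths per year, Google hits) from the Raw Data list in this post.
# Assumption: vCJD is annualized as 139 cases / 8 years.
DATA = [
    ("Alligator attacks (US)", 1.57, 461_000),
    ("vCJD (worldwide)", 139 / 8, 1_400_000),
    ("West Nile Virus (US)", 165, 911_000),
    ("Cancer (US)", 570_280, 54_000_000),
    ("Malaria (Africa)", 2_000_000, 3_000_000),
]


def monotonicity_violations(rows):
    """Sort by deaths; return (lesser, deadlier) neighbours where the
    deadlier risk has fewer Google hits than the lesser one."""
    ordered = sorted(rows, key=lambda row: row[1])
    return [
        (lesser[0], deadlier[0])
        for lesser, deadlier in zip(ordered, ordered[1:])
        if deadlier[2] < lesser[2]
    ]


if __name__ == "__main__":
    for lesser, deadlier in monotonicity_violations(DATA):
        print(f"{deadlier} kills more than {lesser} but gets fewer hits")
```

On this data two pairs break the trend: West Nile gets fewer hits than vCJD, and malaria gets fewer hits than cancer.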

Conclusions

I've already introduced two new measures of danger, the life expectancy decrease (LED) and the equivalent driving distance (EDD). However, these measures only ask how dangerous an activity is; they do not report how much that danger has been magnified by the media. With the GHAF, we can quantify just how out-of-proportion the hype is around a certain fear, and perhaps allow this measure of exaggeration to shape policy.

I look forward to your additions to my wiki page. What will my intelligent readers discover?

2 comments:

Knaldskalle said...

Ah, again dear LeDopore, I am tempted to point out the flaws in your otherwise brilliant idea (GHAF ought to be a standard like the Big Mac Index!).

Malaria is truly underhyped compared to vCJD, but who is it that mostly gets malaria? Google-users? I think not.

In reality, GHAF measures hype in the (Western?) Wired World, while most of Africa will be silent on almost any GHAF-meter, since people there are less active on the internet (per capita, with the possible exception of Nigeria). That probably explains why malaria is so underhyped: if "we" (wired westerners) were dying from it in large numbers, there would likely be much more hype about it.

Language is another factor. If you google "breast cancer" you will also get hits from non-US searches in English, which will bias your data if you only include U.S. deaths from breast cancer. Likewise a "worldwide" GHAF on "AIDS" will be biased, since AIDS in French is "SIDA" (and large parts of Africa are Francophone). Ironically, including all instances of "SIDA" will also bias your data, since that is the abbreviation for the Swedish International Development Cooperation Agency (the Swedish equivalent of USAID).

This of course doesn't even take into account that a lot of internet hype is not as specific as we need it to be for GHAF use. How many times do you think people have searched for "mad cow" instead of "vCJD"? Or perhaps "BSE"? How many "alligator attacks" are searched for under "crocodile attacks"? You have to think a lot about which words to use in your GHAF'ing...

LeDopore said...

Dear Knaldskalle,

Of course you're right that the content of Google searches is biased towards the tastes and interests of those who have computers. You're also right that Google's not psychic, and so even if it returns related pages (e.g. "mad cow" when you search vCJD and "SIDA" when you search "AIDS") you still don't get an accurate measure of how many web pages out there are really relevant to your count.

Still, I bet the GHAF will reflect the hype-to-danger ratio within a factor of about 10, even though web search hits are an inaccurate measure of the number of relevant web pages, and relevant web pages are an inaccurate measure of whole-world hype. (Why 10? I'm just guessing; would you care to counter-guess?)

Because the GHAF is only accurate to within an order of magnitude, I've separated GHAFs on my wiki page into orders of magnitude. It's a pretty safe bet then that risks showing up in non-adjacent GHAF categories really do have different levels of hype-to-danger.
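To make the order-of-magnitude idea concrete, here is a small Python sketch that sorts the rounded GHAFs from the post into decades and flags pairs that land in non-adjacent decades. Using floor(log10) as the bin is an assumption on my part and not necessarily how the wiki page is laid out, and the function names are made up for illustration.

```python
import math
from collections import defaultdict

# Rounded GHAF values as quoted in the Raw Data list of the post.
GHAFS = {
    "Malaria (Africa)": 1.5,
    "Cancer (US)": 94,
    "West Nile Virus (US)": 5_500,
    "vCJD (worldwide)": 81_000,
    "Alligator attacks (US)": 293_000,
}


def magnitude(ghaf):
    """Decade of a GHAF: floor(log10(ghaf)), so 94 -> 1 and 5 500 -> 3."""
    if ghaf <= 0:
        raise ValueError("GHAF must be positive")
    return math.floor(math.log10(ghaf))


def bin_by_magnitude(ghafs):
    """Group risk names by decade (assumed binning, see lead-in)."""
    bins = defaultdict(list)
    for name, value in ghafs.items():
        bins[magnitude(value)].append(name)
    return dict(bins)


def clearly_different(a, b):
    """True when two GHAFs sit in non-adjacent decades."""
    return abs(magnitude(a) - magnitude(b)) > 1


if __name__ == "__main__":
    for decade, names in sorted(bin_by_magnitude(GHAFS).items()):
        print(f"10^{decade}: {', '.join(names)}")
```

By this rule West Nile (decade 3) and cancer (decade 1) count as clearly different, while vCJD and alligator attacks sit in neighbouring decades and are too close to call.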

Even though the GHAF is inaccurate, I think it can still be useful because it is quick and easy to compute for pretty much any risk factor. We don't normally compare how overblown various issues are, and as long as the limitations of the GHAF are made clear, I think it can be a useful way to reassess our priorities.