Bad Statistics

There are three kinds of lies: lies, damned lies, and statistics. --Benjamin Disraeli

It is easy to lie with statistics. It is hard to tell the truth without it. --Andrejs Dunkels

Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. --Aaron Levenstein

Abusing statistics can be done in a number of ways:
• Gathering
• Representation
• Interpretation

Making mistakes when gathering the data is easy. For instance, when using a questionnaire, make sure that the people filling it in are representative of the population at large. Interviewing passers-by on a street corner in New York would exclude most car drivers, most Californians, most farmers and so on. Be aware of who was interviewed. When gathering data automatically, be aware of what exactly it is you're gathering, because the raw numbers can obscure what really happened. (For instance, suppose you add 1200 tokens to a WikiPage, then come back a month later and correct some spelling mistakes, while no one has edited in between. That does not mean you added those 1200+ tokens on the day of the spelling fix, even if an automatic count of your changes attributes them to that day.)
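A small sketch of the street corner problem, using made-up numbers: assume a population of 1000 people in which 200 live in the city (30% of them drive) and 800 live elsewhere (90% of them drive). The population, the percentages and the name share_of_drivers are all invented for illustration.

```python
# Hypothetical population of 1000 people: (lives_in_city, drives_a_car).
# All numbers are invented for illustration.
population = (
    [(True, True)] * 60 + [(True, False)] * 140      # 200 city dwellers, 30% drive
    + [(False, True)] * 720 + [(False, False)] * 80  # 800 others, 90% drive
)


def share_of_drivers(people):
    """Fraction of the given people who drive a car."""
    return sum(1 for _, drives in people if drives) / len(people)


# A street corner survey only reaches the people who live in the city.
street_corner_sample = [person for person in population if person[0]]

print(f"whole population: {share_of_drivers(population):.0%} drive")
print(f"street corner:    {share_of_drivers(street_corner_sample):.0%} drive")
```

With these assumed numbers the survey reports 30% drivers while the real figure is 78%. Nothing was miscounted; the sample simply never contained the people who drive.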

If you want to lie, the easiest way is in the representation. If Joe makes twice as much as Bob, you can draw Joe holding two bags of money, and Bob holding one. However, if you draw Joe holding a single bag that is twice as big in every direction, it will look eight times as big (height * width * depth). Another famous example (I googled, but couldn't find a picture) is the one with a map of the USA, where the sum of the income of all the colored-in states is the total annual budget of the government (or something like that, I forgot the exact details). By coloring in the states with a large area and a low population density (typically the westernmost states) you'd get the impression the budget is huge, while coloring in the eastern states would've shown a totally different picture. Of course, hardly anyone would want to argue that the spending isn't that bad, so...
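The arithmetic behind the money bag trick, as a minimal sketch (the function name apparent_ratio is made up for this page): scaling a figure by some factor in every dimension multiplies its apparent size by that factor raised to the number of dimensions.

```python
def apparent_ratio(linear_scale: float, dimensions: int) -> float:
    """How many times bigger a figure looks when each of its
    dimensions is multiplied by linear_scale."""
    return linear_scale ** dimensions


# Joe earns twice as much as Bob, so the honest ratio is 2.
for dimensions, label in [
    (1, "bar, height only"),
    (2, "flat picture (area)"),
    (3, "solid bag (volume)"),
]:
    print(f"{label}: looks {apparent_ratio(2, dimensions):g} times as big")
```

Only the one-dimensional version (a bar that is twice as tall, or two bags instead of one) shows the ratio that the data actually contains.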

Then there's the interpretation. This is wide open to MistakenExtrapolation, HiddenCommonCause and PostHocErgoPropterHoc arguments. For instance, let's say that 90% of harddrug users started out using softdrugs. (Do people outside TheNetherlands make that distinction? Drugs that aren't physically addictive, like marijuana, are considered softdrugs, while drugs that are, like heroin and cocaine, are considered harddrugs. Of course, strictly speaking, this would make caffeine and nicotine harddrugs.) This may make it look as if softdrugs lead to harddrugs. However, it is quite probable that 99% of all alcoholics started out drinking water, which shows that the conclusion is unwarranted: the figure says how many harddrug users once used softdrugs, not how many softdrug users go on to harddrugs.
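To see how far apart the two readings can be, here is a sketch with invented head counts. Only the 90% comes from the example above; the numbers of softdrug and harddrug users are assumptions picked to make the arithmetic easy.

```python
# Invented head counts for illustration; only the 90% is from the example.
soft_users = 200_000                # people who have used softdrugs
hard_users = 1_000                  # people who use harddrugs
hard_users_who_started_soft = 900   # 90% of the harddrug users

# "Of the harddrug users, how many started with softdrugs?"
p_soft_given_hard = hard_users_who_started_soft / hard_users

# "Of the softdrug users, how many ended up on harddrugs?"
p_hard_given_soft = hard_users_who_started_soft / soft_users

print(f"harddrug users who started with softdrugs: {p_soft_given_hard:.1%}")
print(f"softdrug users who moved on to harddrugs:  {p_hard_given_soft:.2%}")
```

Under these assumptions the first figure is 90.0% and the second is 0.45%. The "softdrugs lead to harddrugs" argument quotes the first number while making a claim about the second, and even the second would only show a correlation, not a cause.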

More content wanted; page created because it seemed wanted.

See also: Most of the examples on this page are in HowToLieWithStatistics.
