More specifically, an analysis of word frequencies in 29,213,800 words taken from TV and movie scripts and transcripts available on the Internet in February 2006 shows that just 63 words account for 50% of that total.
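Out of curiosity, here's a minimal sketch of how a "N words cover 50%" figure can be computed: count each word, sort by frequency, and add up counts until the running total reaches the target share. This is not the method used in the original analysis. It assumes naive whitespace tokenization, and the function name `words_for_coverage` and the toy sentence are made up for illustration.

```python
from collections import Counter

def words_for_coverage(tokens, share=0.5):
    """Hypothetical helper: return how many of the most frequent word
    types are needed to account for at least `share` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # most_common() yields (word, count) pairs from most to least frequent
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    # Only reached for an empty token list
    return len(counts)

# Toy corpus, split on whitespace (an assumption; real corpora need better tokenizing)
toy = "the cat and the dog and the bird saw the cat".split()
print(words_for_coverage(toy))  # prints 2 ("the" + one of "cat"/"and" = 6 of 11 tokens)
```

Run over a real 29-million-word corpus, the same loop would produce a figure like the 63 above. The exact number depends heavily on how words are tokenized and normalized.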
That's amusing. What's more interesting is my source: the article "A practical model for analyzing long tails" by Kalevi Kilkki, Principal Scientist at Nokia Siemens Networks, in the May issue of First Monday.
I'm a fan of Chris Anderson: I've read his article, skimmed his book, and commented on the subject before. Kilkki's article is less anecdotal and more mathematically grounded than Anderson's work, without being inaccessible.
Plus Kalevi Kilkki's got tons of interesting graphs. :-)