Power Law vs. Log-Normal Distributions

Following and building on my Monday Musing post Regarding Regret, Abhay Parekh wrote Big Fat Regret, in which, among other things, he points out that the frequency of events we consider significant enough to really merit regret seems to follow a power law. Then, Robin had this post about Cosma Shalizi’s disagreement with Albert-László Barabási’s observation that delays in responding to correspondence, be it email or snail mail, also follow a power law. So I asked Abhay about it, and he responded by saying, “This is actually a new trend — take someone’s claim that something is a power law/lognormal and then claim it is actually distributed the other way. Frankly, the two distributions are very close… The interesting thing to me about regret is that it displays this odd pattern of elephant and mice along multiple time scales,” and he pointed me to this very interesting discussion of the distribution issue at Suresh Venkatasubramanian’s blog:

Over the last few days, there has been much discussion of a paper in Nature by Oliveira and Barabási on the correspondence patterns of Darwin and Einstein. One of the main conclusions of the paper was that at some point, their response patterns started following a power-law distribution, with coefficients such that a finite fraction of mail went unanswered.

Soon after this, there was a response suggesting that the analysis was flawed, and that the data in fact followed a log-normal pattern rather than a power law. Cosma Shalizi and Aaron Clauset weighed in on this as well, on the side of the “log-normal folks”.

As Geomblog denizens might know, I had a while ago mentioned a fascinating article by Michael Mitzenmacher on the difference (and similarity) between log-normal and power-law distributions, and the natural processes that generate them. I asked Michael if he’d like to weigh in on this controversy, and what follows below is his note. He comments not only on the technical issues involved, but on the whole issue of PR in science, and the contributions that Barabási has made to the field, and his perspective is very interesting indeed.

For those of you who might wonder why computer scientists (and theoretical computer scientists at that) should concern themselves with such matters, I’ll refer you to the reading list for Jon Kleinberg’s course on Information Networks, where you’ll see how many aspects of link structure analysis on the web (which in turn is used for search engines like Google) relate to understanding power law distributions.

And now for the article…

Suresh asked me to comment on a current controversy: a paper by Barabási et al. claims that many human interactions, including the example of Darwin’s and Einstein’s letter response frequency, are governed by a bursty process that leads to power-law tails. A rebuttal by Stouffer et al. claims that the distribution is really lognormal.

I believe Suresh asked me to comment because (here comes the blatant self-promoting plug) I had written a survey on lognormal and power law distributions (and a slightly more fun, less edited earlier version). I’m embarrassed to say I had not heard of the controversy, but it looks like a lot of fun for those of us who enjoy this sort of thing. Rather than focus on the specific technical details, let me make two high-level comments…

More here.
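
To make Abhay's remark that the two distributions are "very close" a little more concrete, here is a minimal Python sketch. It is not taken from any of the posts quoted above, and the function names and parameter values are illustrative assumptions. A power law is a straight line on a log-log plot, so its log-log slope is constant. For a lognormal density the slope is -1 - (ln x - mu) / sigma^2, which drifts only slowly when sigma is large. The code checks that numerically.

```python
import math


def lognormal_log_pdf(x, mu, sigma):
    """Natural log of the lognormal density at x > 0."""
    lx = math.log(x)
    return (-lx
            - math.log(sigma * math.sqrt(2 * math.pi))
            - (lx - mu) ** 2 / (2 * sigma ** 2))


def local_loglog_slope(x, mu, sigma, h=1e-4):
    """Central finite-difference slope of ln f(x) against ln x."""
    lx = math.log(x)
    upper = lognormal_log_pdf(math.exp(lx + h), mu, sigma)
    lower = lognormal_log_pdf(math.exp(lx - h), mu, sigma)
    return (upper - lower) / (2 * h)


def slope_range(mu, sigma, xs):
    """Smallest and largest log-log slope over the sample points xs."""
    slopes = [local_loglog_slope(x, mu, sigma) for x in xs]
    return min(slopes), max(slopes)


if __name__ == "__main__":
    # Sample points spanning four orders of magnitude (an arbitrary choice).
    xs = [10.0 ** k for k in range(0, 5)]
    for sigma in (1.0, 5.0):
        lo, hi = slope_range(0.0, sigma, xs)
        print(f"sigma={sigma}: slope from {hi:.2f} down to {lo:.2f}")
```

With a small sigma the slope changes a great deal across the range, so the curve bends visibly; with a large sigma it stays close to -1 over several orders of magnitude, which is why a lognormal can pass for a power law on a log-log plot when data cover only a limited range.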