Three-Toed Sloth on Darwin and Einstein and Email

Cosma Shalizi weighs in on Albert-László Barabási’s claim that correspondence, in email and other forms, follows a power law distribution because of a queuing process.

[T]his is not true; the apparent power law is merely an artifact of a bad analysis of the data, which is immensely better described by a log-normal distribution. . .

As every school-child knows (at least, these school-children do!), adding together many independent random variables, each of which makes a small contribution to the over-all result, generally gives you a Gaussian or normal distribution (unless the contributing variables are, themselves, kind of pathological). This fact is the central limit theorem.

What happens if the inputs are multiplied together, rather than added? Well, take the logarithm: log(XY) = log(X) + log(Y). The logarithm of the product will be the sum of the logarithms of the inputs. The latter will still be independent, so the logarithm of the output will be normally distributed. Undoing the log gives what’s imaginatively called the log-normal distribution. Log-normals are very common, for the same reasons that normals are. Unlike normals, they are very easy to mistake for power law distributions, especially if your knowledge of statistics is as limited as most theoretical physicists’. (The distribution of links to weblogs, for instance, is much better fit by a log-normal than a power law, as we’ve seen.)
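
The multiplication argument is easy to check numerically. What follows is only a rough sketch, not anything from Cosma's post or the reanalysis: the choice of uniform factors on [0.5, 1.5], the 50 factors per product, the sample size, and the helper names product_of_factors and skewness are all my own assumptions for illustration. It multiplies many independent positive factors together and compares the skewness of the raw products with the skewness of their logarithms. A normal distribution has skewness zero, so the logs should come out close to symmetric while the raw products should be heavily skewed to the right.

```python
import math
import random
import statistics


def product_of_factors(n_factors, rng):
    """Multiply n_factors independent positive random factors together.

    The uniform(0.5, 1.5) factor distribution is an arbitrary choice
    made for this illustration.
    """
    product = 1.0
    for _ in range(n_factors):
        product *= rng.uniform(0.5, 1.5)
    return product


def skewness(values):
    """Sample skewness: the mean cubed z-score (zero for a symmetric sample)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
products = [product_of_factors(50, rng) for _ in range(20000)]
log_products = [math.log(p) for p in products]

raw_skew = skewness(products)      # long right tail, strongly positive
log_skew = skewness(log_products)  # sum of independent logs, near zero

print(f"skewness of products:      {raw_skew:.2f}")
print(f"skewness of log(products): {log_skew:.2f}")
```

The long right tail of the raw products is the part that can pass for a power law on a casual log-log plot; the near-symmetric logs are what give the log-normal away.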

Update: Cosma points to the original paper in which Stouffer, Malmgren and Amaral properly reanalyzed the data. His post largely reports on their work.