The Britney Spears Problem

Tracking who’s hot and who’s not presents an algorithmic challenge.

Brian Hayes in American Scientist:

Back in 1999, the operators of the Lycos Internet portal began publishing a weekly list of the 50 most popular queries submitted to their Web search engine. Britney Spears—initially tagged a “teen songstress,” later a “pop tart”—was No. 2 on that first weekly tabulation. She has never fallen off the list since then—440 consecutive appearances when I last checked. Other perennials include Pamela Anderson and Paris Hilton. What explains the enduring popularity of these celebrities, so famous for being famous? That’s a fascinating question, and the answer would doubtless tell us something deep about modern culture. But it’s not the question I’m going to take up here. What I’m trying to understand is how we can know Britney’s ranking from week to week. How are all those queries counted and categorized? What algorithm tallies them up to see which terms are the most frequent?

One challenging aspect of this task is simply coping with the volume of data. Lycos reports processing 12 million queries a day, and other search engines, such as Google, handle orders of magnitude more. But that’s only part of the problem. After all, if you have the computational infrastructure to answer all those questions about Britney and Pamela and Paris, then it doesn’t seem like much of an added burden to update a counter each time some fan submits a request. What makes the counting difficult is that you can’t just pay attention to a few popular subjects, because you can’t know in advance which ones are going to rank near the top. To be certain of catching every new trend as it unfolds, you have to monitor all the incoming queries—and their variety is unbounded.
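
To make the difficulty concrete, here is a minimal sketch of the straightforward approach the passage alludes to: keep one counter per distinct query and read off the most frequent ones. It is only an illustration, not a description of how Lycos or any other search engine actually does it. The names normalize and top_queries, the normalization rule (lowercasing and collapsing whitespace), and the sample queries are all assumptions made up for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(query: str) -> str:
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def top_queries(queries: Iterable[str], k: int = 50) -> List[Tuple[str, int]]:
    """Count every query exactly and return the k most frequent.

    One counter is kept per distinct normalized query, so memory use
    grows with the variety of the queries, not just with the size of k.
    """
    counts = Counter()
    for query in queries:
        counts[normalize(query)] += 1
    # most_common sorts by count, highest first
    return counts.most_common(k)


# Made-up sample data, for illustration only
sample = [
    "Britney Spears",
    "britney  spears",
    "Paris Hilton",
    "Pamela Anderson",
    "paris hilton",
    "BRITNEY SPEARS",
]

print(top_queries(sample, k=2))
```

Updating a counter per query is cheap, as the passage says. The cost is in the table itself: because no query can be ruled out in advance, this exact approach has to hold an entry for every distinct query it has ever seen.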

More here.