Cornell Researchers Launch MemeTracker Powered by Spinn3r
Jure Leskovec, Lars Backstrom and Jon Kleinberg (author of the HITS algorithm, among other things) built MemeTracker by tracking the hottest quotes from throughout the blogosphere, grouping those quotes together, and rendering a graph of the number of references to each quote over time.
MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories per day from 1 million online sources, ranging from mass media to personal blogs.
We track the quotes and phrases that appear most frequently over time across this entire spectrum. This makes it possible to see how different stories compete for news coverage each day, and how certain stories persist while others fade quickly.
The plot above shows the frequency of the top 100 quotes in the news over time, for roughly the past two months.
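To make the idea concrete, here is a minimal sketch of counting quoted phrases per day. It is not MemeTracker's actual pipeline: the `(date, text)` input format, the straight-double-quote regex, and the simple normalization step are all assumptions for illustration, and real phrase grouping has to handle far messier variants than this does.

```python
import re
from collections import Counter, defaultdict

# Assumption: quotes are delimited by straight double quotes and are at
# least 10 characters long. Real feeds also use curly quotes and more.
QUOTE_RE = re.compile(r'"([^"]{10,})"')


def normalize(quote):
    """Lowercase and drop punctuation so trivially different copies group together."""
    return " ".join(re.findall(r"[a-z0-9']+", quote.lower()))


def quote_counts_by_day(stories):
    """Map each day to a Counter of normalized quotes seen that day.

    `stories` is a hypothetical iterable of (day, text) pairs.
    """
    counts = defaultdict(Counter)
    for day, text in stories:
        for raw in QUOTE_RE.findall(text):
            counts[day][normalize(raw)] += 1
    return counts


def top_quotes(counts, n=100):
    """Return the n most frequent quotes summed across all days."""
    totals = Counter()
    for per_day in counts.values():
        totals.update(per_day)
    return totals.most_common(n)


if __name__ == "__main__":
    stories = [
        ("2008-09-01", 'He said "Yes we can, America!" today.'),
        ("2008-09-01", 'One blog repeated "yes we can  america" again.'),
        ("2008-09-02", 'The headline read "Yes we can America" too.'),
    ]
    counts = quote_counts_by_day(stories)
    for quote, total in top_quotes(counts, n=5):
        print(total, quote)
```

Plotting each top quote's per-day count as a stacked area would give a rough version of the kind of chart described above.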
Here’s a screenshot, but you should definitely play with MemeTracker yourself to see how it works:
We’ve been thinking of shipping a new API for tracking quotes across the blogosphere. Our new change tracking algorithm for finding duplicate content also does an excellent job of finding quotes.
Tracking duplicate content turns out to be very important in spam prevention and ranking. It just so happens that quote tracking, spam prevention, and ranking share a number of overlapping features and technologies.
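To show why duplicate detection and quote finding overlap, here is a small sketch using word shingles, a generic and well-known technique. This is not our change tracking algorithm; the function names and the shingle width `w=5` are assumptions chosen for illustration. The same shingle sets yield both a similarity score (useful for spotting duplicates) and the passages two documents share (which are often quotes).

```python
def shingles(text, w=5):
    """Return the set of all w-word windows (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_passages(doc_a, doc_b, w=5):
    """Return maximal word runs in doc_a covered by shingles that also occur in doc_b."""
    words = doc_a.lower().split()
    other = shingles(doc_b, w)
    passages = []
    start = end = None
    for i in range(len(words) - w + 1):
        if tuple(words[i:i + w]) in other:
            if start is None:
                start = i
            end = i + w  # extend the current run to cover this shingle
        elif start is not None:
            passages.append(" ".join(words[start:end]))
            start = None
    if start is not None:
        passages.append(" ".join(words[start:end]))
    return passages


if __name__ == "__main__":
    a = "the senator said we will not rest until the job is done yesterday in ohio"
    b = "bloggers repeated we will not rest until the job is done all week"
    print(jaccard(shingles(a), shingles(b)))
    print(shared_passages(a, b))
```

A high Jaccard score suggests near-duplicate content, while a low score with a long shared passage suggests one post quoting another.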
We’re not ready to ship it just yet because the backend requires about 2TB of random access data. That isn’t exactly cheap, so we’ve been experimenting with some new algorithms and hardware to bring down the cost. I think we’ll be able to ship something along these lines once we get our next big release out the door.