Cornell Researchers Launch Memetracker Powered by Spinn3r

We have a number of other pending announcements about researchers building cool applications with Spinn3r, but this one was just too awesome to hold back.

Researchers at Cornell have developed a new memetracker (cleverly named MemeTracker) powered by Spinn3r.

Jure Leskovec, Lars Backstrom and Jon Kleinberg (author of the HITS algorithm, among other things) built MemeTracker by tracking the hottest quotes from throughout the blogosphere, grouping those quotes together, and then rendering a graph of the number of references to each group over time.
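To make the grouping-and-counting idea concrete, here is a minimal Python sketch. It is not MemeTracker's actual method, which this post doesn't describe in detail. It assumes a much simpler rule: a shorter quote joins the group of a longer quote that contains it word for word. The names `normalize` and `group_quotes` and the sample mentions are all made up for illustration.

```python
from collections import Counter


def normalize(quote):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in quote.lower()
    )
    return " ".join(cleaned.split())


def group_quotes(mentions):
    """Group quote variants and count references per day.

    mentions: iterable of (day, quote) pairs.
    Returns {canonical_phrase: Counter({day: reference_count})}.
    """
    counts = Counter((normalize(quote), day) for day, quote in mentions)

    # Visit the longest phrases first so that shorter variants can attach
    # to a longer phrase that contains them (assumed grouping rule).
    phrases = sorted({phrase for phrase, _ in counts}, key=lambda p: (-len(p), p))
    roots, canonical = [], {}
    for phrase in phrases:
        padded = f" {phrase} "  # pad so we only match on whole words
        root = next((r for r in roots if padded in f" {r} "), None)
        if root is None:
            roots.append(phrase)
            root = phrase
        canonical[phrase] = root

    timeline = {root: Counter() for root in roots}
    for (phrase, day), n in counts.items():
        timeline[canonical[phrase]][day] += n
    return timeline


# Hypothetical sample data, not real MemeTracker output.
mentions = [
    ("2008-10-20", "The sky is falling"),
    ("2008-10-20", "He said the sky is falling, again."),
    ("2008-10-21", "the sky is falling!"),
    ("2008-10-21", "Nothing to see here"),
]
for phrase, per_day in group_quotes(mentions).items():
    print(phrase, dict(per_day))
```

Each group's per-day counter is the sort of series you could plot to watch a quote rise and fade.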

MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories per day from 1 million online sources, ranging from mass media to personal blogs.

We track the quotes and phrases that appear most frequently over time across this entire spectrum. This makes it possible to see how different stories compete for news coverage each day, and how certain stories persist while others fade quickly.

The plot above shows the frequency of the top 100 quotes in the news over time, for roughly the past two months.

Here’s a screenshot, but you should definitely play with MemeTracker to see how it works:

[Screenshot of MemeTracker]

We’ve been thinking of shipping a new API for tracking quotes across the blogosphere. Our new change tracking algorithm for finding duplicate content also does an excellent job of finding quotes.
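For a rough feel of why duplicate detection and quote finding overlap, here is a small Python sketch. It doesn't show our actual algorithm. It swaps in word shingling, a common textbook technique, and every name in it (`shingles`, `jaccard`, `shared_passages`, the sample posts) is hypothetical. The same set of overlapping k-word windows gives both a similarity score for spotting near-duplicates and the runs of shared words that look like quotes.

```python
def shingles(text, k=4):
    """Return every run of k consecutive words in the text, as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]


def jaccard(a, b, k=4):
    """Shingle-set overlap between two texts: 0.0 (disjoint) to 1.0 (same)."""
    sa, sb = set(shingles(a, k)), set(shingles(b, k))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def shared_passages(a, b, k=4):
    """Return runs of words from `a` whose k-word shingles also occur in `b`."""
    words = a.lower().split()
    in_b = set(shingles(b, k))
    passages, start, end = [], None, None
    for i in range(len(words) - k + 1):
        if tuple(words[i:i + k]) in in_b:
            if start is None:
                start = i
            end = i + k  # extend the current run to cover this shingle
        elif start is not None:
            passages.append(" ".join(words[start:end]))
            start = None
    if start is not None:
        passages.append(" ".join(words[start:end]))
    return passages


# Hypothetical sample posts.
post_a = "the mayor said we will rebuild the bridge by spring and nobody believed him"
post_b = "quoting the mayor we will rebuild the bridge by spring is what he said"

print(round(jaccard(post_a, post_b), 3))
print(shared_passages(post_a, post_b))
```

A high Jaccard score flags a likely duplicate or spam copy, while a low score with one long shared passage looks more like one post quoting another.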

Tracking duplicate content turns out to be very important in spam prevention and ranking. It just so happens that duplicate detection and quote tracking share a number of overlapping features and technologies.

We’re not ready to ship it just yet because the backend requires about 2TB of random access data. This isn’t exactly cheap so we’ve been experimenting with some new algorithms and hardware to bring down the pricing. I think we’ll be able to ship something along these lines once we get our next big release out the door.


