Google is JUST NOW adding Duplicate Detection to Google News?

This is interesting. Apparently Google News is just now getting around to duplicate content detection:

Our goal has always been to offer users as many different perspectives on a story from as many different sources as possible, which is why we include thousands of sources from around the world in Google News. However, if many of those stories are actually the exact same article, it can end up burying those different perspectives. Enter “duplicate detection.” Duplicate detection means we’ll be able to display a better variety of sources with less duplication. Instead of 20 “different” articles (which actually used the exact same content), we’ll show the definitive original copy and give credit to the original journalist. (We launched a similar feature in Sort-by-Date and got great feedback about it.) Of course, if you want to see all the duplicates on other publisher websites with additional analysis and context, they’re only a click away.

The main Google crawler has done this for a long time now, and Google’s clustering tech should have been able to catch exact-duplicate articles in News well before this. Tailrank’s memetracker text clustering does this already, as does Spinn3r to some extent (more to come here, btw).
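For anyone wondering what this kind of detection looks like in practice, here is a minimal sketch using word shingles and Jaccard similarity. To be clear, this is not Google's, Tailrank's, or Spinn3r's actual implementation; it is just one common textbook approach. The function names, the 3-word shingle size, and the 0.8 threshold are all assumptions picked for illustration, and the sketch assumes articles arrive oldest first so the first member of each group stands in for the "original".

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_duplicates(articles, threshold=0.8):
    """Greedily group (article_id, text) pairs by shingle overlap.

    Each article joins the first group whose representative (the first
    article placed in that group) it resembles; otherwise it starts a new
    group. Assumes the input is ordered oldest first (hypothetical).
    """
    groups = []  # list of (representative_shingles, [article_ids])
    for article_id, text in articles:
        s = shingles(text)
        for rep, members in groups:
            if jaccard(s, rep) >= threshold:
                members.append(article_id)
                break
        else:
            groups.append((s, [article_id]))
    return [members for _, members in groups]
```

Comparing every article against every group representative is fine for a handful of stories, but it would not scale to thousands of sources; a real system would presumably use hashing (MinHash, simhash, or similar) to avoid pairwise comparisons.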


