Tracking Google Indexed Feed Republishers

OK guys. Time to invent a new word. Ready? Infinitepossum. Cool. Huh?

An infinitepossum is a little animal that helps you find sites which steal/borrow your content and then turn around and get it indexed by Google. Most of the major RSS aggregators are nice enough to set up robots.txt so that you don’t get hit with a duplicate content penalty.
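For the curious, here is a rough sketch of how you could check whether an aggregator's robots.txt actually keeps Googlebot away from its copy of your post. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `aggregator.example` host, and the `is_crawlable` helper are all made up for illustration; a real aggregator's rules will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a well-behaved aggregator might serve.
# The /feeds/ path is an assumption, not any real site's layout.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /feeds/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The republished copy lives under /feeds/, so crawlers are told to stay out.
blocked = is_crawlable(SAMPLE_ROBOTS_TXT, "http://aggregator.example/feeds/my-post")
# Pages outside /feeds/ are still open to crawlers.
allowed = is_crawlable(SAMPLE_ROBOTS_TXT, "http://aggregator.example/about")
```

If the republished copy comes back as crawlable, that aggregator is a candidate for the duplicate content problem.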

Just go ahead and search for infinitepossum and in theory this should be the only post shown.
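If the search does turn up more than one result, a tiny script can tally which hosts are carrying the post. This is only a sketch: it assumes you have already copied the result URLs into a list by hand, and the `republishing_hosts` helper and the example hosts are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def republishing_hosts(result_urls, own_host):
    """Count search results per host, leaving out your own site."""
    hosts = Counter(urlparse(url).netloc.lower() for url in result_urls)
    hosts.pop(own_host.lower(), None)  # your own post is expected to be there
    return hosts


# Made-up result URLs, pasted in manually from a search for the canary word.
results = [
    "http://myblog.example/infinite-post",
    "http://aggregator.example/feeds/123",
    "http://aggregator.example/feeds/456",
    "http://reader.example/item/9",
]
republishers = republishing_hosts(results, "myblog.example")
```

Anything left in `republishers` is a site that has your content indexed under its own URL.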

Some of these sites are clearly stealing. Bitacle is one example. These guys are evil: they stole an old version of the Netvibes code and are now stealing people’s content as well. A lot of these sites, though, are just legit feed readers that don’t realize this has become a problem. Hopefully this little experiment of mine will help them correct the error of their ways.

The only problem is that the infinitepossum only works once and then dies. Kind of like a butterfly I guess.

Also… if you want to link to this post PLEASE DON’T USE THE WORD INFINITEPOSSUM so that we don’t taint the results.

This should be fun!

  1. So where do you draw the line?

    Tailrank uses other people’s content and puts ads on the page.

    I’m sure you didn’t get the NY Times’ or the Washington Post’s permission to use their content, and I don’t notice a DMCA takedown area either.

    The point I’m trying to make here is that ALL aggregators tread a fine line. When is it ‘stealing’ and when is it ‘adding value’/fair use? While you have your own belief, I’m sure other bloggers draw the line somewhere else.

    Your possum example is a good way to track where your content is being used, but I’m feeling this rant of yours is a bit like the pot calling the kettle black.

  2. Hey Ian.

    I don’t think we’re on the same page here.

    I’m not talking about the use of the content or the profit from it. I’m talking about the damage that can happen to a post’s ranking due to Google’s duplicate content penalty.

    To be clear, the ENTIRE POST will be pushed down in Google’s rankings due to this issue (even across all the sites).

    To be sure, the whole discussion on the ethics of content syndication is an old one, and I don’t want to confuse the issue by bringing up ads. I’m just specifically worried about the Google issue I talked about before.

    It might be partially my fault for bringing up Bitacle, because they’re on the wrong side of nearly every ethical discussion RE content syndication.

  3. Although I agree that Bitacle is evil (content theft), I don’t think your experiment works. If you check the results today, you’ll find that the majority of them are partial excerpts that link back to the original post.

