WordPress.com and RSS Streaming Models.

Matt just announced that WordPress will support the new RSS cloud protocol.

This ping model has, of course, already existed with Ping-o-Matic (which Matt and WordPress have been running since the blog epoch), and Spinn3r customers already benefit from it. In fact, we’ve been realtime for a long time now.

WordPress.com has always supported update pings through Ping-o-Matic so folks like Google Reader can get your posts as soon as they’re posted, but getting every ping in the world is a lot of work so not that many people subscribe to Ping-o-Matic. RSS Cloud effectively allows any client to register to get pings for only the stuff they’re interested in.
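The difference between a firehose like Ping-o-Matic and selective registration can be shown with a small in-process sketch. This is not the rssCloud wire protocol, which handles registration and notification over the network. The `CloudHub` class and its `subscribe`/`ping` methods are hypothetical names chosen only to illustrate the idea that a ping fans out to the subscribers of that one feed.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical callback type: receives the URL of the feed that changed.
Notifier = Callable[[str], None]


class CloudHub:
    """Illustrative sketch of selective update notification.

    Subscribers register interest in specific feed URLs. A ping for a feed
    is delivered only to that feed's subscribers, rather than to everyone
    as with a global firehose of pings.
    """

    def __init__(self) -> None:
        # feed URL -> callbacks registered for that feed
        self._subscribers: Dict[str, List[Notifier]] = defaultdict(list)

    def subscribe(self, feed_url: str, notify: Notifier) -> None:
        self._subscribers[feed_url].append(notify)

    def ping(self, feed_url: str) -> int:
        """Record that feed_url changed; return how many subscribers were notified."""
        # .get() avoids creating an empty entry for feeds nobody follows
        targets = self._subscribers.get(feed_url, [])
        for notify in targets:
            notify(feed_url)
        return len(targets)


# Usage: only the feed we registered for produces a notification.
hub = CloudHub()
seen: List[str] = []
hub.subscribe("http://example.com/feed", seen.append)
hub.ping("http://example.com/feed")     # returns 1, seen gets the URL
hub.ping("http://other.example/feed")   # returns 0, nobody is interested
```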

We haven’t announced this yet, but we shipped a new filtering API in Spinn3r in the last release. We developed a domain-specific language for filtering web content in real time.

A number of our customers have already started using this in production.

It’s nice that more people are pushing realtime content, but I’m starting to worry about the proliferation of protocols here. XML-RPC pings are the old-school way of handling this; now we also have PubSubHubbub, the Twitter stream API, SUP, and others.

However, I’ve played with most of these and think they all fall short in some area. One major problem is relaying messages when nodes fail and then come back online. For example, with XML-RPC pings or the Twitter stream API, if my Internet connection fails, I’ve lost those messages forever.

The Spinn3r protocol doesn’t have this problem because it supports resume: you simply start from where you last requested data. We also keep archives indefinitely, so nothing is ever lost.
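Resume can be sketched with a cursor over an append-only log. This is a minimal illustration assuming a simple sequence-number scheme, not Spinn3r’s actual API; `Archive`, `fetch_since`, and `consume` are hypothetical names. The client remembers the cursor it last reached and, after an outage, continues from it instead of starting over.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Archive:
    """Append-only log; the cursor is the number of items a client has consumed."""
    items: List[str] = field(default_factory=list)

    def append(self, item: str) -> None:
        self.items.append(item)

    def fetch_since(self, cursor: int, limit: int = 100) -> Tuple[List[str], int]:
        """Return up to `limit` items after `cursor`, plus the advanced cursor."""
        batch = self.items[cursor:cursor + limit]
        return batch, cursor + len(batch)


def consume(archive: Archive, cursor: int, handle: Callable[[str], None]) -> int:
    """Drain everything after `cursor`, then return the cursor to persist."""
    while True:
        batch, cursor = archive.fetch_since(cursor)
        if not batch:
            return cursor
        for item in batch:
            handle(item)


# Usage: the client goes offline between two polls but loses nothing.
archive = Archive()
for i in range(3):
    archive.append(f"post-{i}")
received: List[str] = []
cursor = consume(archive, 0, received.append)        # cursor is now 3
archive.append("post-3")                             # arrives while offline
archive.append("post-4")
cursor = consume(archive, cursor, received.append)   # picks up post-3, post-4
```

Because the archive is never trimmed, any cursor is always valid; the trade-off is storage cost, which the next paragraph addresses.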

I don’t think most sites can support this much data (it’s expensive) but certainly a few hours of buffer, held in memory, seems reasonable to handle a transient outage.
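A few hours of in-memory buffer can be sketched as a time-bounded queue keyed by sequence number. This is a hypothetical design, not something described in the post: `BoundedBuffer`, `retention_s`, and `since` are illustrative names. When a client’s cursor has aged out of the window, `since` returns `None` so the client knows a gap occurred instead of silently missing messages.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Optional


class BoundedBuffer:
    """Keeps items newer than `retention_s` seconds in memory, keyed by sequence number."""

    def __init__(self, retention_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.retention_s = retention_s
        self.clock = clock
        self._items: deque[tuple[int, float, str]] = deque()  # (seq, timestamp, item)
        self._next_seq = 1

    def append(self, item: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._items.append((seq, self.clock(), item))
        self._evict()
        return seq

    def _evict(self) -> None:
        # Drop items that fell outside the retention window.
        cutoff = self.clock() - self.retention_s
        while self._items and self._items[0][1] < cutoff:
            self._items.popleft()

    def since(self, last_seq: int) -> Optional[list[str]]:
        """Items with seq > last_seq, or None if some of them were already evicted."""
        self._evict()
        oldest = self._items[0][0] if self._items else self._next_seq
        if last_seq + 1 < oldest:
            return None  # client was offline longer than the retention window
        return [item for seq, _, item in self._items if seq > last_seq]


# Usage with a fake clock and a three-hour window.
now = [0.0]
buf = BoundedBuffer(retention_s=3 * 3600, clock=lambda: now[0])
buf.append("a")          # seq 1 at t = 0
now[0] = 3600.0
buf.append("b")          # seq 2 at t = 1h
caught_up = buf.since(0)     # both items still buffered
now[0] = 4 * 3600.0          # "a" is now older than three hours
too_late = buf.since(0)      # None: seq 1 was evicted
resumed = buf.since(1)       # only "b"
```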

ReadWriteWeb has more on this and is leading with a somewhat sensational title that would imply that these blogs were not real time in the past.

TechCrunch has more, as does Scobleizer.

One big issue with these protocols is spam. If it’s an open cloud, any spammer can send messages into it (as is the case with Ping-o-Matic, which receives 90% spam). And of course spammers can also receive messages from the cloud to train their own classifiers and find spam targets.

We have an AUP (acceptable use policy) for Spinn3r that prohibits this kind of usage. We also remove spam from the feed up front, which is nice for our customers: it lets them build algorithms without having to worry about attacks.

  1. I don’t think spam is an issue; all that’s being sent is a notification that a feed was updated. The user had to subscribe to the feed in the first place, and no content flows through the system except from the feeds the user subscribed to. There’s no more potential for spam than RSS itself has.

  2. Hey Dave.

    For RSS readers you’re totally correct.

    Most of our customers are interested in bulk content monitoring so I tend to think in those terms.

    We’re currently monitoring 35M blogs, Twitter, etc. It’s a lot of content :)

    Huge torrent of data :)
