Feed Update Protocols and SUP
When you add a web site like Flickr or Google Reader to FriendFeed, FriendFeed’s servers constantly download your feed from the service to get your updates as quickly as possible. FriendFeed’s user base has grown quite a bit since launch, and our servers now download millions of feeds from over 43 services every hour.
It looks like the rapid fire site updates are about to start again for the social content conversation site FriendFeed. Just a few days after the launch of its new “beta” area, FriendFeed is finalizing a new technology that could help pull content into the site at a much faster rate.
The technology, called Simple Update Protocol (SUP), will process updates from the various services that FriendFeed imports faster than it currently does using traditional Really Simple Syndication (RSS) feeds, FriendFeed co-founder Paul Buchheit told Tech Confidential.
Spinn3r has a similar problem of course but we have 17.5M sources to consider.
The requirements are straightforward:
* Simple to implement. Most sites can add support with only a few lines of code if their database already stores timestamps.
* Works over HTTP, so it’s very easy to publish and consume.
* Cacheable. A SUP feed can be generated by a cron job and served from a static text file or from memcached.
* Compact. Updates can be about 21 bytes each (8 bytes with gzip encoding).
* Does not expose usernames or secret feed URLs (such as Google Reader Shared Items feeds).
Sites wishing to produce a SUP feed must do two things:
* Add a special <link> tag to their SUP enabled Atom or RSS feeds. This <link> tag includes the feed’s SUP-ID and the URL of the appropriate SUP feed.
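To make the "few lines of code if your database already stores timestamps" claim concrete, here is a minimal sketch of the publishing side in Python. The function name `build_sup_feed` and the JSON field names `since`, `until` and `updates` are my own placeholders for illustration, not the published SUP format, and the dictionary stands in for a query against whatever table holds your last-updated timestamps.

```python
import json


def build_sup_feed(last_updated, now, period=60):
    """Return a compact JSON string listing recently updated SUP-IDs.

    last_updated maps a SUP-ID (any short token) to the Unix timestamp
    of that feed's most recent update. Only feeds updated within the
    last `period` seconds are listed.
    """
    since = now - period
    updates = sorted(
        sup_id for sup_id, ts in last_updated.items() if since < ts <= now
    )
    # Field names are hypothetical; compact separators keep the payload small.
    return json.dumps(
        {"since": since, "until": now, "updates": updates},
        separators=(",", ":"),
    )


if __name__ == "__main__":
    table = {"a1": 1000, "b2": 950, "c3": 900}
    print(build_sup_feed(table, now=1000, period=60))
```

Something like this could run from cron and write its output to a static file, which is what makes the feed cacheable.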
Interesting that this is seeing attention again because Dave proposed this in RSS 2.0:
<cloud> is an optional sub-element of <channel>.
It specifies a web service that supports the rssCloud interface which can be implemented in HTTP-POST, XML-RPC or SOAP 1.1.
Its purpose is to allow processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.
In this example, to request notification on the channel it appears in, you would send an XML-RPC message to radio.xmlstoragesystem.com on port 80, with a path of /RPC2. The procedure to call is xmlStorageSystem.rssPleaseNotify.
However, SUP is not XMLRPC (which is probably good, since I’m a REST fan).
By using SUP-IDs instead of feed urls, we avoid having to expose the feed url, avoid URL canonicalization issues, and produce a more compact update feed (because SUP-IDs can be a database id or some other short token assigned by the service).
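As a rough illustration of the SUP-ID idea, the sketch below derives a short opaque token from a feed URL and shows how a consumer might map updated SUP-IDs back to feeds it already subscribes to. Using a truncated HMAC is my assumption; the announcement only says a SUP-ID can be a database id or some other short token. The names `make_sup_id` and `feeds_to_refetch` are hypothetical.

```python
import hashlib
import hmac


def make_sup_id(feed_url, secret, length=10):
    """Derive a short, opaque SUP-ID from a feed URL.

    A keyed hash means the token reveals nothing about the URL, so a
    secret feed URL stays secret. A database id would work just as well.
    """
    digest = hmac.new(secret, feed_url.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def feeds_to_refetch(updated_sup_ids, subscriptions):
    """Map updated SUP-IDs to the feed URLs this consumer already knows.

    subscriptions maps SUP-ID -> feed URL, learned from the tag inside
    each feed the consumer has previously fetched. Unknown ids are skipped.
    """
    return [subscriptions[s] for s in updated_sup_ids if s in subscriptions]


if __name__ == "__main__":
    secret = b"publisher-side secret"  # hypothetical key
    url = "http://example.com/private/feed.xml"
    sup_id = make_sup_id(url, secret)
    print(feeds_to_refetch([sup_id, "someone-else"], {sup_id: url}))
```

The consumer never needs to reverse the token: it only matches ids it has already seen against ids in the update list.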
This can be avoided by just using the unique source URL. The feed is irrelevant. Just map the source to feed URL on your end.
Because it is still possible to miss updates due to server errors or other malfunctions, SUP does not completely eliminate the need for polling. However, when using SUP, feed consumers can reduce polling frequency while simultaneously reducing update latency. For example, if a site such as FriendFeed switched from polling feeds every 30 minutes to polling every 300 minutes (5 hours), and also monitored the appropriate SUP feed every 3 minutes, the total amount of feed polling would be reduced by about 90%, and new updates would typically appear 10 times as fast.
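The arithmetic behind the 90% and 10x figures is easy to check. The sketch below counts polls per feed over a 300 minute window and compares worst-case latency, treated here as one full interval between checks. It counts feed polling only; the SUP feed fetches come on top, though I am assuming a single SUP fetch covers every feed on a service. The function names are mine.

```python
def polls_in_window(interval_minutes, window_minutes=300):
    """Number of polls of one feed during the window."""
    return window_minutes // interval_minutes


def polling_reduction(old_interval, new_interval, window_minutes=300):
    """Fraction of feed polls saved by moving to the longer interval."""
    before = polls_in_window(old_interval, window_minutes)
    after = polls_in_window(new_interval, window_minutes)
    return (before - after) / before


def latency_speedup(old_interval, sup_interval):
    """Ratio of worst-case latency before and after SUP monitoring."""
    return old_interval / sup_interval


if __name__ == "__main__":
    print(polling_reduction(30, 300))  # 10 polls drop to 1
    print(latency_speedup(30, 3))      # 30 minute wait drops to 3
```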
Spinn3r performs a hybrid. We index pinged sources once per week but also index right when they ping us. Best of both worlds basically.
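In scheduling terms the hybrid looks roughly like the sketch below. This is an illustration of the idea only, not Spinn3r's actual code; `next_fetch_time` and its parameters are hypothetical, and timestamps are plain Unix seconds.

```python
WEEK = 7 * 24 * 3600  # baseline re-index interval in seconds


def next_fetch_time(last_fetch, last_ping=None, baseline=WEEK):
    """Return the time at which a source should next be fetched.

    A ping that arrived after the last fetch makes the source due as of
    the ping time, i.e. right away. Otherwise the source waits for the
    weekly baseline, which catches anything a lost ping would have missed.
    """
    if last_ping is not None and last_ping > last_fetch:
        return last_ping
    return last_fetch + baseline


if __name__ == "__main__":
    print(next_fetch_time(last_fetch=1000))                 # baseline only
    print(next_fetch_time(last_fetch=1000, last_ping=5000))  # due at ping time
```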
The current ping space is all over the map, though.
There’s XMLRPC, XML, the Six Apart update stream and now JSON:
This doesn’t seem too different from Changes.xml…
I’m not sure what the solution is here but it’s clear we need some standardization in this area.
One suggestion for SUP is to not make it a JSON-only protocol. An alternative REST/XML version would be an advantage for people who don’t want to put a second parser framework in production.