Archive for the ‘aggregation’ Category

2007-05-30 20:30

Google Gears launches today and brings together a lot of open loops in my career.

While at Rojo, we spent a lot of time talking about offline storage. NewsMonster was the first RSS aggregator that added full offline support (which I’m still proud of – only took Google five years!) and we generally wanted it for Rojo as well.

Brad deserves a lot of credit for pushing this forward with Dojo offline storage. In fact, I’m a bit shocked that Google didn’t approach Brad to hire him to push this forward. If they don’t hire him now they’re insane.

What’s interesting here is that Gears is Open Source, which really puts an end to the browser vs desktop debate:

Google Gears is open source software, licensed under the New BSD license. Generally speaking, this license is very permissive. You should, of course, always consult an attorney if you have any questions about software licensing.

There are generally two ways to use Google Gears: by embedding the API or runtime software in an application you distribute to end users, or by writing a web application which makes use of installations of Gears on end-users’ computers.

The only thing left I think is a local installer to keep shortcuts for online apps available in the start menu and on the desktop.

I’m not sure where this leaves Dojo offline. There were some significant limitations due to the fact that it was using Flash, cookies, and other ‘hacks’ (in the clever sense of the word) to store content locally.

2006-09-05 19:36

I’ve been a big fan of Six Apart for a few years now. Not only do they have a great blogging service (and Vox seems poised to take over the world) but they just acquired Rojo as well.

Six Apart will be issuing a press release on the subject and I’ll let them give you all the juicy details once that’s available.

In the meantime Om Malik notes:

Blogging company Six Apart will soon announce it has purchased Rojo, the web-based feed reader, for undisclosed terms.

Six Apart won’t be adding an aggregator based on Rojo, but instead incorporating some elements of the technology into its existing products, according to Six Apart CEO Barak Berkowitz. Rojo CEO Chris Alden will run Six Apart’s Movable Type group

Niall Kennedy comments:

Blogging company Six Apart has acquired online feed aggregator Rojo Networks. Rojo will be integrated with the Vox blogging tool allowing users to browse updated content and create more blog posts. Rojo CEO Chris Alden will be the new head of Movable Type according to a GigaOm report.

I helped co-found Rojo almost three years ago to build a killer online RSS aggregation service. Literally. Before we had a name for Rojo we called it the KSA (Killer Server-side Aggregator). Rojo led the RSS space in a number of key areas including mobile support, feed search, and integrated social networking.

For the last year I’ve been independent (working on Tailrank actually) but still remained involved in an advisory capacity.

In hindsight, I don’t think Rojo was ever given the credit it deserved, for feed search in particular. In fact, earlier this year when Ask/Bloglines released their feed search it was pointed out that Rojo had been doing the same thing for months.

Six Apart has big plans for Rojo. They’re going to take Rojo’s RSS infrastructure and build it into LiveJournal and Vox which sounds pretty interesting. You can bet I’ll be paying attention…

Luckily, Rojo was located in blogger gulch (AKA SOMA) in San Francisco which is also the home of Technorati and Feedster. The employees literally only have two extra blocks to commute to their new offices.

Best of luck on the new gig guys!

Update:

Techcrunch has a few notes:

Terms of the deal were not disclosed, but our assumption was that this a less than $5 million deal. Six Apart is not planning on continuing to build out the core Rojo products. In the press release (sorry no link available yet), Six Apart says “Six Apart intends to sell a majority interest in Rojo’s newsreader services in the coming months,” meaning they will become a minority stockholder of the service. Rojo founder and CEO Chris Alden and CTO Aaron Emigh will join Six Apart’s executive team.

… and so does ValleyWag:

We hear GigaOM founder Om Malik heard about this deal when he saw Alden and 6A CEO Barak Berkowitz outside 6A’s office.

Update 2:

Six Apart finally issues a press release:

San Francisco, CA —September 6, 2006—Six Apart, the world leader in blogging software and services, today announced that it had acquired Rojo Networks for an undisclosed sum. Rojo senior executives Chris Alden and Aaron Emigh joined the Six Apart team as executive vice president and general manager of Movable Type, and executive vice president and general manager of core technologies, respectively. Six Apart intends to sell a majority interest in Rojo’s newsreader services in the coming months.

Update 3:

You can follow this over on Tailrank… For some reason it picked up Valleywag twice. I’m going to have to fix that.

TalkCrunch has a great interview with all the big players in the online RSS aggregator war. Generally a good podcast, but at 50 minutes it’s kind of a lot for one sitting. Then again, I don’t think you’d be able to do a decent job in only 20.

Hopefully the CoComment developers are paying attention to Technorati. I want a beta account.. hook me up! This has been a problem I’ve been thinking about for a while now (during my Rojo days) and I have some ideas how I might want to dovetail your service into TailRank.

So show me the love baby!

Update: W00t. I’m in! And they have the feature I needed for TailRank. More on this later….

Ever since FeedBurner launched I’ve been a bit concerned that they would introduce a vulnerability into the feedosphere. They would add another man-in-the-middle into your hosting scenario, and if any component fails then you’re offline. Now your reliability is a function of the weakest link in the chain.

Sometimes, though, crisis is opportunity. Last month it dawned on me that if FeedBurner did it right they could actually double the reliability of the feedosphere. All they would need to do is add a simple DNS trick that redirects the user back to the source feed if they have a critical system outage. (They would of course need to have separate and reliable DNS.)

One of the capabilities we are very excited to roll out in Q1 will be something we are calling Feed Insurance. We have been laying the groundwork for geo-redundancy (multiple hosting facilities that can pick up the slack if one facility goes down). Hat tip to Kevin Burton for an idea he presented to us as a very nifty spin on feed availability. Kevin’s suggestion was to have a separate facility that just redirects feed requests to the publisher’s source feed if anything catastrophic happens to FeedBurner’s hosting network.

I think now the issue becomes more of why you wouldn’t want to use FeedBurner. If you’re on a hosting provider that might have stability problems in the future (and let’s be honest – that’s everyone) then you might want to consider switching to FeedBurner and doubling your reliability.
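
The reliability argument is easy to check with a little arithmetic. Here is a rough sketch, assuming the two hosts fail independently and using made-up availability figures of 99% each (neither number comes from FeedBurner or any real hosting provider). The function names are mine, purely for illustration.

```python
def chained_availability(*parts):
    """Weakest-link chain: the feed is up only if every component is up."""
    result = 1.0
    for p in parts:
        result *= p
    return result


def fallback_availability(primary, fallback):
    """With a redirect to the source feed, the feed is down only if
    the primary AND the fallback are down at the same time."""
    return 1.0 - (1.0 - primary) * (1.0 - fallback)


# Hypothetical availability figures, for illustration only.
feedburner = 0.99
source_host = 0.99

print("chain:   ", chained_availability(feedburner, source_host))
print("fallback:", fallback_availability(feedburner, source_host))
```

With those invented numbers the chain comes out around 98% and the fallback around 99.99%, provided the failures really are independent and the redirect layer itself stays up.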

I originally talked about this when I was a guest on Om and Niall podsessions.

I finally had some time tonight to finish up our feed delta API within Tailrank.

This new API allows a developer to get a set of blog posts added to the blogosphere since a given date.

Return a list of feed items new to TailRank since a given date. Feed items are blog posts from RSS/Atom feeds that TailRank is currently indexing. This allows developers to obtain a snapshot of the blogosphere as our spiders are running. This reduces a significant amount of work for the average developer including spam suppression, priority scheduling, Atom/RSS protocol issues, etc.

Right now the output is only RSS 2.0 but I plan to add Atom 1.0 as well as the AHAH microformat.
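
For anyone wondering what consuming a delta feed like this might look like, here is a minimal sketch. The endpoint URL and the `since` and `key` parameter names are placeholders I made up for the example, not the real TailRank API; the parsing half only assumes plain RSS 2.0 `item` elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urlencode

# Placeholder endpoint and parameter names: NOT the real TailRank API.
BASE_URL = "https://api.example.com/feed-delta"


def build_delta_url(since, api_key):
    """Build a request URL for items added since `since` (assumed UTC)."""
    query = urlencode({
        "since": since.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "key": api_key,
    })
    return f"{BASE_URL}?{query}"


def parse_items(rss_xml):
    """Pull title, link and pubDate out of every RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


# Invented sample response so the sketch runs without a network call.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>delta</title>
<item><title>First post</title><link>http://example.com/1</link>
<pubDate>Mon, 05 Dec 2005 10:00:00 GMT</pubDate></item>
<item><title>Second post</title><link>http://example.com/2</link>
<pubDate>Mon, 05 Dec 2005 10:30:00 GMT</pubDate></item>
</channel></rss>"""

print(build_delta_url(datetime(2005, 12, 5, 9, 0, tzinfo=timezone.utc), "demo-key"))
for entry in parse_items(SAMPLE):
    print(entry["pubDate"], entry["title"], entry["link"])
```

A client would remember the newest timestamp it has seen and pass that as the next `since` value, so each poll only returns posts it hasn’t processed yet.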

The licensing will be very flexible. If you’re a non-profit doing something cool we’ll most likely give you free access. If you’re a for-profit we’ll have to work out a licensing arrangement. Next week I’ll have an API key registration system available and all methods will require a key.

Our index currently consists of about 98% English blogs. The goal is to get it to 100% English and then to split languages into dedicated indexes. This way we’ll have an English version of TailRank, a French version, a Japanese version, etc.

We’re currently indexing about 30,000 blogs. These are blogs with very high ranking (essentially the top 30k blogs in the blogosphere).

Maximum index time is around 60 minutes. I’ll be adding XMLRPC ping prioritization in the next week.

Right now our API is still beta. If you have any problems or feature requests please feel free to send me an email.

Looks like Dave Winer is working on another aggregator. Hopefully it will have a lot of new ideas because I think the feed reader space is a war of attrition right now.

It would be really slick to see Dave integrate OPML and SSE and maybe namespaces.

If he does I promise to add TailRank support.

Ajax + Aggregators?

Hm. Which RSS aggregators would support syndicated Ajax? Only the desktop apps, and only those apps which use a real browser control and don’t strip JavaScript. I think NetNewsWire would qualify.

Rojo, Bloglines, and NewsGator wouldn’t be able to enable this due to fear of cross site scripting attacks (which is sad).

Looks like Inform.com has launched and is in ‘true’ beta.

We hope that you’ll take the time to provide us with feedback about likes, dislikes, and any other ideas for improvement. Please register with us so we can let you know about enhancements — we expect to make significant progress by the end of this year.

It’s great that they’re launching a beta which needs improvement. Instead of being negative I’m going to try to be constructive here since they’re still in beta and wanting feedback.

1. They’re abusing Javascript and Ajax functionality. All links on the page are javascript: links, which means I don’t know where I’m linking to. It also means I can’t use tabbed browsing. Pure evil. Fix this ASAP guys.

2. What’s with the annoying yellow blinking Flash ad to the right of the page? Ads are fine but in an information product you want them to be non-intrusive.

3. What’s with launching your site in a new window which doesn’t have the navbar visible? I want my back button guys! It seems that the site just doesn’t work with the back button. Just because you hide it doesn’t mean you can’t invoke the back function. ALT-back on Windows/OSX and Linux will cause the browser to navigate backwards but on Inform it just doesn’t work.

4. Their registration is super easy. I don’t like the javascript alert() popup though. Nasty. Once I registered it automatically logged me in, which is cool.

5. I like the way you detect news images. Very cool… +1.

All in all though I’m giving them a C- … The functionality isn’t very impressive and their 1995 old school DHTML development model needs to go. With 55 developers I’m sure they’ll have these problems fixed soon! :)

OK gang.

What’s the one feature you need in an RSS aggregator before even considering a move?

I’ll start… I must have OPML import because there’s no way I’m going to manually enter in all my feeds. Of course they also need to have OPML export because I don’t want to become locked in.
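
OPML import and export isn’t much code, either. Here is a minimal sketch using only the Python standard library. It assumes a flat subscription list where each feed is an `outline` element with `text` and `xmlUrl` attributes; real-world OPML files vary (nested folders, missing attributes), so treat it as an illustration rather than a complete importer. The function names are my own.

```python
import xml.etree.ElementTree as ET


def import_opml(opml_text):
    """Return (title, feed_url) pairs for every outline with an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folders and other non-feed outlines have no xmlUrl
            feeds.append((outline.get("text", ""), url))
    return feeds


def export_opml(feeds, title="Subscriptions"):
    """Serialize (title, feed_url) pairs as a flat OPML document."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in feeds:
        ET.SubElement(body, "outline",
                      {"type": "rss", "text": text, "xmlUrl": url})
    return ET.tostring(opml, encoding="unicode")


# Example feeds are invented for the round trip below.
my_feeds = [("Example Blog", "http://example.com/index.rss"),
            ("Another Blog", "http://example.org/atom.xml")]
exported = export_opml(my_feeds)
print(exported)
print(import_opml(exported) == my_feeds)
```

The round trip at the end is the whole point: if an aggregator can export what it imported, you aren’t locked in.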

Leave your thoughts in the comments. Hopefully a few aggregation companies will read this post :)

The WSJ has a nice fluff (in the best possible sense of the word) piece on blog search engines:

Web logs, online diaries written and published by everyone from college students to big media companies, are being created and updated at an astonishing rate — and established search companies such as Google Inc. and Yahoo Inc. don’t always catch them fast enough. Now, a handful of closely held upstarts such as Technorati Inc., Feedster Inc. and IceRocket.com LLC see an opportunity: Build a search engine that can track the information zipping through blogs, nearly in real time.

We have two main spaces now: RSS search engines and RSS aggregators. There’s a slight amount of overlap but the players that focus seem to be doing well (though there’s room for improvement).

I’d like to see more partnerships develop here. Bloglines using Technorati search or Rojo using Feedster search (and vice-versa). I would have thought we’d have seen this already but I guess not.

FeedBurner seems to be the only one playing across the board. This is of course because users are driving their adoption, which is a lot more decentralized if you think about it.

Looks like Ben should win a bloggy for this one:

Also waiting in the wings was the ex-Rojo-RokR Kay@Burton who deep in the corner of the club had a “hunch.. that they needed to ship this API as a way to integrate with a 3rd party and it leaked out.” It also wasn’t surprising that DJ Think Secret remixed The Mini…sampling – “The music player will ditch its hard drive and move entirely to solid state, flash media, a move that sources familiar with the new design say will shave 20 to 25 percent off the size of the unit.”

Wow. This is the funniest post about RSS news I’ve ever read. Except for the Winer Number post.

Vegetarian Gyoza. lol.

Russell notes that his beloved Bloglines is having scaling problems:

Lots of Bloglines folk have pinged me about the fact that my site isn’t updating. Not much I can do. I’ve emailed them several times to fix their “jsession” bug (where they include Java Server Sessions as part of the URL) or to just delete every feed of mine except for the index.rss main feed. But that doesn’t seem to happen. I’ve done redirects on old feed URLs so it should work. I think Bloglines is starting to get crufty – there’s lots and lots of things that aren’t working, and the site is starting to bog down like crazy. I was really hoping that the move to Ask Jeeves would accelerate updates and improvements, not stall them.

I cite this not to pick on Bloglines but to point out that a lot of players in the space are having scaling problems. I wish Bloglines the best of luck. I’m starting to sense a trend.

Launch a site on commodity hardware and software. Commodity hardware holds up but the software doesn’t scale.

It’s time for Internet-scale databases. Enough is enough. Give me clustering or give me death.

Nice. PubSub increased the space race here and released their PubSub LinkRanks 1000:

The PubSub LinkRanks 1000 is a list of the most consistently influential sites that publish feeds, based on their average LinkRank scores from August 1st to August 30th, 2005. To learn more about LinkRanks, click here.

To create this list, we’ve averaged the daily LinkRanks of more than 15 million sources. We’ve also included a 15-day average as well as daily LinkRank data for August 30th as additional points of comparison.

This is twice as good as the Feedster 500 and 10x as good as the Technorati Top 100! :) That said, it’s still not long tail enough. There are still roughly 14,999,000 blogs in PubSub’s index that aren’t ranked here. Still pretty damn cool.

I’m nowhere to be seen on the list (feedblog.org). Hopefully it’s not personal and just because my blog is so new.

Looks like Bloglines Citations is Kaput!:

It’s been almost a week since Bloglines Citations responded with anything other than…

There is a problem with the database. Please try again later 

I hope Mark reads this. But since Citations is broken he’ll have to use Feedster to find it :-( 

I wonder if this might be a point where Bloglines starts to have more scalability problems. I have to admit that they’ve scaled pretty well, or at least kept the issues from becoming publicly known. Every service seems to have problems sooner or later (Rojo, Technorati, Feedster, Friendster, all come to mind) and scaling is hard.

At first it seems like an opportunity for you to nab some users from your competitor but sooner or later you’ll have similar problems. At the end of the day it just ends up hurting consumers.

The main issue is that the tools most people are using just don’t scale. Building a cluster that can handle the number of transactions necessary is a very difficult problem. Hopefully in the next few years we will have Open Source tools which will allow even the novice and the small company to build a decent, scalable cluster.

Google Wallet soon?

Apparently Google Wallet is coming soon?:

Google Wallet, the rumored PayPal type payment system powered by Google, may be on its way soon. An insightful tipster noted that http://www.google.com/wallet does the same 404 redirect that started before Google Talk launched.

Hopefully they’ll provide a decent API. I’d really like to be able to set up payment services with Google Wallet (in competition with PayPal), since the last time I played with PayPal’s API it was kind of broken.

Here’s hoping that their Terms of Service allows you to use both PayPal and Google Wallet at the same time. I’d hate to be in a situation similar to the one with AdSense/YPN.

Eric Hayes has a great post on RSS ranking:

But that’s only part of the story. What’s missing from the discussion are the specifics of how focusing on Attention can help individuals alleviate some of the enormous demands on our attention by cutting through information overload. And RSS isn’t helping with the problem. In fact it’s only increasing the flow. But the end goal of RSS technology shouldn’t be based on more is good. It needs to be tightly wrapped around the promise that less is more. We need to offer our RSS consumers a drinking fountain of information, not a Fire Hose!

I agree that the drinking fountain is more powerful than the full fire hose; I just disagree that attention is the best approach. The attention data is amazingly sensitive (do you really want to share your click stream?) and I just think the privacy implications are too onerous.

There’s a lot you can accomplish with public data. I’ve run simulations on theoretical systems and the private data just doesn’t seem to augment the public data by that much. Once you add the computational complexity you’re over the top and it’s just not worth the difficulty.

Well, this has worked before. If I need to get into a Beta program (like YPN) I just beg publicly on my blog. Hopefully someone at Yahoo will feel my pain and let me in.

I’m an awesome beta tester and promise great feedback!

A small comment started me thinking:

That, my friend, is what everyone wants right now. Web 2.0 needs Data 2.0! A del.icio.us for files.

Looks like Sam and Bitworking found this interesting as well…

I’ve been thinking about this from time to time recently. I hate filesystem hierarchy (especially on *NIX). Instead of organizing files into a hierarchy, why not just create arbitrary tags for them? If we’re smart we’ll add a remote way for someone else to tag your files when you email them. Either way I can never seem to find the files I want.

Desktop search seems to help a bit but not much. Back in my P2P days I talked with Sam Joseph about NeuroGrid and I think he’s more passionate about hierarchies being evil than I am.

I have a note taking system which I wrote for Emacs that I use all the time. I can keep track of my notes with tags (wrote it 4 years ago). Every day I create about a dozen little notes and tag them. I can then iterate over them by tag (TODO, today, linux, research, reading, etc).

I find that this system is a lot more productive than anything I’ve ever used.

Writing a backend for a local file tagger API wouldn’t be too hard. Transparent integration within your applications would be difficult though.
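
To give a feel for how small the backend half could be, here is a toy in-memory tag index. It is only a sketch of the idea, not my Emacs note system or any existing tool; the `TagIndex` class and its method names are invented for this example, and a real backend would need persistence and some way to notice when files move.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory index mapping tags to file paths and back."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        for t in tags:
            self._files_by_tag[t].add(path)
            self._tags_by_file[path].add(t)

    def untag(self, path, tag):
        """Remove a single tag from a file path, if present."""
        self._files_by_tag[tag].discard(path)
        self._tags_by_file[path].discard(tag)

    def files_with(self, *tags):
        """Return the sorted paths that carry ALL of the given tags."""
        if not tags:
            return []
        sets = [self._files_by_tag.get(t, set()) for t in tags]
        return sorted(set.intersection(*sets))

    def tags_for(self, path):
        """Return the sorted tags attached to a file path."""
        return sorted(self._tags_by_file.get(path, set()))


# Paths and tags below are made up for the demonstration.
index = TagIndex()
index.tag("/home/me/notes/cluster.txt", "research", "linux")
index.tag("/home/me/notes/shopping.txt", "TODO", "today")
index.tag("/home/me/notes/kernel.txt", "linux", "reading")

print(index.files_with("linux"))
print(index.files_with("linux", "research"))
print(index.tags_for("/home/me/notes/shopping.txt"))
```

The lookup by several tags at once is what replaces walking a directory tree. The hard part, as noted above, is getting applications to call something like this instead of a file dialog.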

Nice! Looks like Rojo has launched their scriptlets support.

So you want to add some Rojo features to your blog or web site? Well, you have come to the right place. Here you will learn how to add some simple scripts to your blog template or web page page that will make some of your Rojo experience available to readers on your site!

The official Rojo blog has more:

We are very happy to announce Rojo Scriptlets! Rojo scriptlets are one line scripts that bloggers and other publishers can use to re-post content from Rojo onto their blog. Rojo users can choose to show the most recent headlines from the feeds they subscribe to in Rojo.

I’m personally excited to see this released because I wrote a prototype of this functionality a while back. Of course 90% of the work is never the initial feature but supporting it and making sure it works in production so hats off to the Rojo gang for making this happen!

There’s more functionality here that Rojo has yet to release so I don’t want to let the cat out of the bag. Needless to say it’s pretty cool.

If you’re reading from RSS make sure to load this blog post in your browser and you’ll see that my blog is now hosting Rojo tags in the right sidebar.