Yahoo URL Canonicalization

Yahoo has added a cool new feature to their Site Explorer:

Today comes a new wave for search engines with the first-ever Beta launch of ‘Dynamic URL Rewriting’ in Site Explorer. The new feature provides the ability for site owners to alert Yahoo! of the dynamic parameters in URLs that they’d like Yahoo! to ignore, which we’ll then automatically rewrite accordingly. Try this out for all the cases where you’d want to use parameters in your URLs that don’t affect the content of your page, but that have other important uses.

This is awesome. The problem is that this data should ideally be exposed to every search engine.
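Here is a minimal sketch of what "ignoring" a parameter means for a crawler. It assumes the site owner has published a set of parameter names that don't affect page content. The names `sessionid` and `utm_source` below are hypothetical examples, not anything Yahoo! documents. Sorting the remaining parameters and dropping the fragment are extra normalizations chosen for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonicalize(url, ignored_params):
    """Drop query parameters that the site owner declared irrelevant to content.

    `ignored_params` is assumed to come from some site-published list;
    how that list is discovered is not specified here.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "?a=" are not silently lost
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored_params]
    # Sort what remains so "?a=1&b=2" and "?b=2&a=1" map to the same URL
    kept.sort()
    # Host names are case-insensitive; the fragment never reaches the server
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))


print(canonicalize("http://Example.com/story?id=42&sessionid=abc&utm_source=rss",
                   {"sessionid", "utm_source"}))
```

The call above should print `http://example.com/story?id=42`. Any crawler holding the same ignore list would arrive at the same canonical URL, which is the argument for publishing that list to every search engine rather than to Yahoo! alone.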

OpenSearch already has a URL template mechanism for constructing new URLs.
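For reference, OpenSearch templates mark parameters with braces, such as `{searchTerms}`, and a trailing `?` such as `{startIndex?}` marks a parameter as optional. Below is a rough sketch of expanding such a template. Percent-encoding every value and substituting an empty string for an unsupplied optional parameter are simplifying choices made here. This is not a complete implementation of the specification. The `example.com` template is hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}; the name may carry a prefix such as {geo:box}
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template, values):
    """Fill an OpenSearch-style URL template from a dict of parameter values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Encode everything, including '/', so values can't break the URL
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameter with no value: leave it empty
        raise KeyError(f"missing required template parameter: {name}")

    return _PARAM.sub(substitute, template)


print(expand_template("http://example.com/search?q={searchTerms}&start={startIndex?}",
                      {"searchTerms": "yahoo site explorer"}))
```

This should print `http://example.com/search?q=yahoo%20site%20explorer&start=`. A site-published canonicalization rule could plausibly reuse the same brace syntax to describe which parts of a URL actually matter.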

I wonder if something could be done for the larger web as a whole (maybe via sitemaps) so that crawlers like Spinn3r could canonicalize URLs internally based on public data and a discovery mechanism.

Internally, Tailrank uses a number of custom URL canonicalizers for large sites like the New York Times. This obviously doesn't scale, but since 20% of the sites drive 80% of the traffic, we only need about a dozen manual entries for the really problematic sites.
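Below is one way such a hand-maintained table could look. It is a sketch, not Tailrank's actual code. The hosts and parameter names are hypothetical placeholders. Each problematic host maps to its own rule, and every other URL passes through unchanged. That is why a dozen entries can be enough when traffic is concentrated on a few sites.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_params(url, drop):
    """Remove the named query parameters (and the fragment) from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in drop]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical per-host rules, for illustration only; real entries would be
# written by hand after inspecting each problematic site's URLs.
CANONICALIZERS = {
    "news.example.com": lambda url: strip_params(url, {"ref", "partner"}),
    "blog.example.org": lambda url: strip_params(url, {"print"}),
}


def canonicalize_for_host(url):
    """Apply the host's custom rule if one exists; otherwise keep the URL as-is."""
    host = urlsplit(url).netloc.lower()
    rule = CANONICALIZERS.get(host)
    return rule(url) if rule else url


print(canonicalize_for_host("http://news.example.com/2008/story.html?ref=rss&id=7"))
```

The call above should print `http://news.example.com/2008/story.html?id=7`. If sites published their own rules, entries in a table like this could be loaded from the sites themselves instead of being written by hand.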

If sites published their own canonicalizers, we could rely on their work to scale into the long tail.
