So let’s say you write a web-scale crawler and accidentally push a bug. It’s a huge mistake: you hammer a few hosts and end up getting blocked.
A month passes. You’ve implemented a fix, along with a number of other features that make your crawler easier on the hosts it visits.
… basically, you want another chance to crawl these sites. The problem is that you now have to wait an eternity for them to remove the robots.txt block.
Do you ignore the block? That’s probably not right.
Do you create a new User-Agent so you can slip through the robot block? Possibly; that might work. But what if you’re blocked because people don’t like you, and it’s not just a politeness issue?
I assume that if it’s simply a non-crawlable directory, they’re just going to use a Disallow rule.
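For what it’s worth, a per-agent block like the one described above is easy to detect programmatically. Here’s a minimal sketch using Python’s stdlib `urllib.robotparser`; the agent names and rules are made up for illustration:

```python
from urllib import robotparser

# A made-up robots.txt: one crawler is blocked everywhere,
# everyone else only loses /private/.
rules = """\
User-agent: MyCrawler
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The blocked agent is denied everywhere.
print(rp.can_fetch("MyCrawler", "https://example.com/page"))      # False
# Other agents fall through to the wildcard entry.
print(rp.can_fetch("OtherBot", "https://example.com/page"))       # True
print(rp.can_fetch("OtherBot", "https://example.com/private/x"))  # False
```

The asymmetry is the whole problem: the parser can tell you that you’re blocked, but nothing in the format tells you whether the block is a temporary punishment or a permanent eviction.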
One could extend robots.txt with additional syntax that would allow it to handle such situations, but honestly, how many site owners would actually use that extension?
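To make the idea concrete, here’s a sketch of what such an extension might look like. Note this is purely hypothetical: robots.txt has no expiry syntax today, and the `Disallow-until` directive below is invented for illustration. A crawler that understood it could honor the block and automatically come back after the stated date:

```python
from datetime import date

def block_active(lines, today):
    """Return True if a hypothetical 'Disallow-until: YYYY-MM-DD'
    directive says the block is still in force on the given date.
    (Invented syntax; not part of any real robots.txt standard.)"""
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow-until":
            return today < date.fromisoformat(value.strip())
    return False  # no expiry directive present

rules = [
    "User-agent: MyCrawler",
    "Disallow: /",
    "Disallow-until: 2030-01-01",
]
print(block_active(rules, date(2025, 1, 1)))  # True: block still active
print(block_active(rules, date(2031, 1, 1)))  # False: block has expired
```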
They could always just remove the disallow rules…