The comment is for the comment I replied to, not the linked video. If it were for the video I'd have posted it as a top-level comment.
(Though if anything, around 1999 things were most likely looking towards desktop use; it wouldn't be until a few years later that computing turned away from the desktop, towards the web and later towards mobile.)
robots.txt is supposed to be helpful to the robot too.
If you write a crawler, you probably don't want it to waste time indexing a list of articles in every possible sort order, trying all "reply" buttons, things like that.
For me, a "Disallow" line in robots.txt means "don't bother, nothing interesting here". It is a suggestion that benefits everyone when followed, not an access control list.
>If you write a crawler, you probably don't want it to waste time indexing a list of articles in every possible sort order, trying all "reply" buttons, things like that.
On the other hand, many websites (like Wikipedia here) hide interesting pages behind a Disallow.
I think the concern is more accurately: you must go out of your way to honor robots.txt.
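To be fair, the going-out-of-your-way is only a few lines if your language ships a parser; Python has one in urllib.robotparser. A minimal sketch (the bot name and site are placeholders):

    from urllib import robotparser
    import urllib.request

    USER_AGENT = "ExampleBot/1.0"  # placeholder identity
    SITE = "https://example.com"

    # Fetch and parse the site's robots.txt once up front.
    rp = robotparser.RobotFileParser()
    rp.set_url(SITE + "/robots.txt")
    rp.read()

    def polite_fetch(url):
        # The opt-in step: skip anything robots.txt asks us to skip.
        if not rp.can_fetch(USER_AGENT, url):
            return None
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

The point stands, though: nothing breaks if you never make that check.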
I think both robots.txt and security.txt are great ideas. However, they will only ever work on clients that follow the wishes of the website (which hopefully outweigh those that do not).
Part of my monthly maintenance on an independent MediaWiki install is to cross-reference our robots.txt (which is based on Wikipedia's) against our server logs.
If a client or IP range is misbehaving in the server logs, it goes into robots.txt. If it's ignoring robots.txt, it gets added to the firewall's deny list.
I've tried to automate that process a few times but have never gotten far. It's unending, though: it feels like all it takes is a handful of cash and a few days to start an SEO marketing buzzword company with its own crawler, building yet another thing for us to block.
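A rough sketch of the kind of automation I mean: scan the access log for clients fetching paths our robots.txt disallows, and print candidates for the deny list. This assumes combined-format logs; the prefixes, log path, and threshold here are invented:

    import re
    from collections import Counter

    # Example prefixes copied from our robots.txt Disallow lines.
    DISALLOWED = ("/w/", "/wiki/Special:")
    # Combined log format: ip - user [time] "METHOD path proto" status size "referer" "agent"
    LOG_LINE = re.compile(
        r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)[^"]*"'
        r' \d+ \S+ "[^"]*" "([^"]*)"')

    hits = Counter()
    with open("/var/log/apache2/access.log") as log:  # assumed log location
        for line in log:
            m = LOG_LINE.match(line)
            if not m:
                continue
            ip, path, agent = m.groups()
            # Count clients fetching paths robots.txt tells them to skip.
            if path.startswith(DISALLOWED):
                hits[(ip, agent)] += 1

    # Anything past an arbitrary threshold is a deny-list candidate.
    for (ip, agent), count in hits.most_common():
        if count >= 100:
            print(f"deny candidate: {ip} ({agent}) - {count} disallowed hits")

It still needs a human to decide between "misbehaving" and "just enthusiastic", which is roughly where my attempts stall.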