Trains have been using regenerative braking to stop massive multi-ton loads for decades. I don't think this is a real issue.


Strange comment for a video about the 1999 LinuxWorld.


My comment was directed at the comment I replied to, not the linked video. If it were about the video, I'd have posted it as a top-level comment.

(Though if anything, around 1999 the focus was mostly on desktop use; it wouldn't be until a few years later that computing turned away from the desktop towards the web, and later towards mobile.)


I've always thought something like robots.txt was a bit silly when it's so easy to ignore.


robots.txt is supposed to be helpful to the robot too.

If you write a crawler, you probably don't want it to waste time indexing a list of articles in every possible sort order, trying all "reply" buttons, things like that.

For me, a "Disallow" line in robots.txt means "don't bother, nothing interesting here". It is a suggestion that benefits everyone when followed, not an access control list.


>If you write a crawler, you probably don't want it to waste time indexing a list of articles in every possible sort order, trying all "reply" buttons, things like that.

On the other hand, many websites (like Wikipedia here) hide interesting pages behind a Disallow.


I think the concern is more accurately: you must go out of your way to honor robots.txt.

I think both robots.txt and security.txt are great ideas. However, they will only ever be effective with crawlers that follow the website's wishes (which hopefully outnumber those that do not).
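
As a sketch of what "going out of your way" looks like in practice, Python's standard library ships a robots.txt parser; the site URL and user-agent string below are placeholders:

    # Minimal sketch of a polite crawler check using Python's standard
    # urllib.robotparser; the site and user agent are placeholders.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.org/robots.txt")
    rp.read()  # fetch and parse the file once per site

    url = "https://example.org/index.php?sort=oldest"
    if rp.can_fetch("MyCrawler/1.0", url):
        ...  # fetch and index the page
    else:
        ...  # the site asked crawlers to skip it; honor that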


Google, Bing, and other large search engines honor it. That alone is the point of most of the entries in this file.


robots.txt is the sheet with the house rules on the wall (or part thereof), not the enforcement of those rules.


Part of my monthly maintenance on an independent Mediawiki install is to cross-reference our robots.txt (which is based on Wikipedia's) and server logs.

If a client or IP range is misbehaving in the server logs, it goes into robots.txt. If it's ignoring robots.txt, it gets added to the firewall's deny list.

I've tried to automate that process a few times but haven't gotten far. It's unending, though: it seems like all it takes is a handful of cash and a few days to spin up an SEO marketing buzzword company with its own crawler, yet another thing for us to block.
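
A rough sketch of the second half of that loop (spotting clients that ignore robots.txt and emitting firewall rules), not the commenter's actual tooling; it assumes a combined-format nginx access log, and the hostname, log path and threshold are placeholders:

    import re
    from collections import Counter
    from urllib import robotparser

    # Parse the site's own robots.txt so we can tell which requests ignore it.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://wiki.example.org/robots.txt")
    rp.read()

    # Combined log format: ip - user [time] "METHOD path proto" status size "ref" "agent"
    line_re = re.compile(
        r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*"'
        r' \d+ \S+ "[^"]*" "([^"]*)"'
    )

    violations = Counter()  # (ip, user-agent) -> requests for disallowed paths
    with open("/var/log/nginx/access.log") as log:
        for line in log:
            m = line_re.match(line)
            if not m:
                continue
            ip, path, agent = m.groups()
            if not rp.can_fetch(agent, "https://wiki.example.org" + path):
                violations[(ip, agent)] += 1

    # Anything hammering disallowed paths becomes a candidate for the deny list.
    for (ip, agent), hits in violations.most_common():
        if hits > 100:
            print(f"# {agent}: {hits} disallowed requests")
            print(f"iptables -A INPUT -s {ip} -j DROP")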


It's like putting up this sign in a public restroom (slightly nsfw):

http://i.imgur.com/d9L6I59.jpg


It's a bit like "Do Not Track": the people you'd most want to respect it don't care and won't (even if they publicly say they will).

