
How would the links be prioritized? If the bots' goal is to crawl all content, would they have prioritization built in?


How would they prioritize things they haven't crawled yet?


It's not clear that they are doing that. Web logs I've seen in other write-ups on this topic show them re-crawling the same pages at high rates, in addition to crawling new pages.


Actually, I've been informed otherwise. According to this person, they crawl known links first:

> Unfortunately, based on what I'm seeing in my logs, I do need the bot detection. The crawlers that visit me, have a list of URLs to crawl, they do not immediately visit newly discovered URLs, so it would take a very, very long time to fill their queue. I don't want to give them that much time.

https://lobste.rs/c/1pwq2g
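To make that concrete, here is a rough sketch of why a crawler like the one described wouldn't get to newly discovered URLs quickly. It assumes a plain FIFO (breadth-first) frontier, which is only a guess at how these bots work; real crawlers may weight, dedupe, or re-crawl differently. `crawl_order`, the URLs, and the link map are all hypothetical.

```python
from collections import deque


def crawl_order(seed_urls, discovered_links, budget):
    """Simulate the visit order of a hypothetical FIFO crawl frontier.

    seed_urls: URLs the crawler already knows about before it starts.
    discovered_links: hypothetical map of URL -> links found on that page.
    budget: maximum number of fetches to simulate.
    """
    frontier = deque(seed_urls)  # known URLs, oldest first
    seen = set(seed_urls)
    visited = []
    while frontier and len(visited) < budget:
        url = frontier.popleft()  # always fetch the oldest queued URL next
        visited.append(url)
        for link in discovered_links.get(url, []):
            if link not in seen:  # new links go to the BACK of the queue
                seen.add(link)
                frontier.append(link)
    return visited


# /trap1 is found on the first page but only fetched after every URL
# that was already queued; with a budget of 3 it is never fetched at all.
print(crawl_order(["/a", "/b", "/c"], {"/a": ["/trap1"], "/trap1": ["/trap2"]}, 4))
```

Under that assumption, a page that generates fresh links only pushes work onto the end of a long existing queue, which fits the point in the quote that filling it would take "a very, very long time."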



