
How would they prioritize things they haven't crawled yet?


It's not clear that they are doing that. Web logs I've seen in other write-ups on this topic show them re-crawling the same pages at high rates, in addition to crawling new pages.


Actually, I've been informed otherwise; according to this person, they crawl known links first:

> Unfortunately, based on what I'm seeing in my logs, I do need the bot detection. The crawlers that visit me have a list of URLs to crawl; they do not immediately visit newly discovered URLs, so it would take a very, very long time to fill their queue. I don't want to give them that much time.

https://lobste.rs/c/1pwq2g
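For anyone unfamiliar with the terminology: a crawler keeps a "frontier" of URLs to fetch, and the behavior described above amounts to draining a pre-existing list before touching links discovered during the current crawl. A minimal sketch of that idea (the two-queue split and all names here are my own assumptions, not anything from the linked comment):

    import collections

    class CrawlFrontier:
        """Toy crawl frontier: known (seed) URLs are fetched before
        URLs discovered during the current crawl. Purely illustrative."""

        def __init__(self, seed_urls):
            self.known = collections.deque(seed_urls)   # pre-existing URL list
            self.discovered = collections.deque()       # links found while crawling

        def add_discovered(self, url):
            # Newly found links go to the back of a separate, lower-priority
            # queue, so they can sit there a long time before being fetched.
            self.discovered.append(url)

        def next_url(self):
            if self.known:
                return self.known.popleft()
            if self.discovered:
                return self.discovered.popleft()
            return None

Under a scheme like this, links you generate on the fly (e.g. tarpit pages) only get visited after the crawler's existing backlog is exhausted, which matches the "very, very long time to fill their queue" observation.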



