
That makes no sense.

There is no reason for AI scrapers to use tens of thousands of IPs to scrape one site over and over.

That just sounds like a classic DDoS.





Sure there is: scrapers do that to defeat throttling. 10,000 requests is less than 3 hours of scraping at 1 request per second.

It's not 10k requests, it's 10k IPs

Having lots of IPs is helpful for scraping, but you don't need 10k. That's a botnet.


The way it works is this: you can sign up for a proxy rotator service that works like a regular proxy, except that every request you make goes through a different IP address. Is that a botnet? Yes. Is it also typically used in a scraping project? Yes.
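For illustration, a minimal sketch of how such a service is typically used (the proxy host, credentials, and target URLs are placeholders, not a real provider or site). From the client's side it's one proxy endpoint; the provider rotates the exit IP per request.

    # Minimal sketch of typical rotating-proxy usage; host, credentials and
    # target URLs are placeholders.
    import requests

    PROXY = "http://USER:PASS@rotating-proxy.example.com:8000"
    proxies = {"http": PROXY, "https": PROXY}

    for page in range(1, 6):
        # Same proxy URL every time; the provider picks a new exit IP per request,
        # so per-IP throttling on the target only ever sees a few hits per address.
        r = requests.get(f"https://example.com/page/{page}", proxies=proxies, timeout=10)
        print(page, r.status_code)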

Yeah I know, I've done scraping too.

It can absolutely be that, but that requires a confluence of multiple factors: a misconfigured scraper hitting the site over and over, a big botnet-like proxy setup that is way overkill for scraping, a setup sophisticated enough to do all that yet simultaneously stupid enough not to cope with a site that is mostly text and a couple of gigs at most, and all of that over an extended timeframe without anyone realising their scraper is stuck.

Or, alternative explanation: it's a DDoS.


Except that I think it's clear that the motive was getting the data, not taking the site offline. The evidence for that is that it stopped on its own, without them doing anything to mitigate it.

Also, I don't know why you think this is sophisticated; it's probably 40 lines of Python code, max.
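For the sake of argument, here is roughly what such a naive crawler looks like (the start URL is a placeholder, and there is deliberately no rate limiting or robots.txt handling, which is exactly how a small site gets hammered):

    # Rough sketch of a naive breadth-first crawler; the start URL is a
    # placeholder. No delays, no robots.txt, no caching -- naive by design.
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    START = "https://example.com/"
    HOST = urlparse(START).netloc

    seen, queue = {START}, deque([START])
    while queue:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        # Follow every same-host link we haven't seen yet.
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == HOST and link not in seen:
                seen.add(link)
                queue.append(link)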


No, DDoS attacks do stop on their own too.

The fact that it stopped is absolutely not "evidence" that the motive was grabbing data. Honestly...


Ok so they spent all that money to... mildly inconvenience users temporarily? Lol.

If you call it a DDoS you can't capitalize on AI hate.

It likely is AI scrapers essentially doing a DDoS. They use separate IPs (and vary the UA) to prevent blocking.

I have a site which is currently being hit (over 10k requests today), and it looks like scrapers, as every URL is different. If it were a DDoS, they would target costly pages like my search, not every single URL.

SQLite had the same thing (https://sqlite.org/forum/forumpost/7d3eb059f81ff694), as have a few other open source repositories. It looks like badly written crawlers trying to crawl sites as fast as possible.
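One rough way to check which of the two you're looking at is URL diversity per client in the access log: a crawl spreads requests across the whole URL space, while a DDoS hammers a handful of expensive endpoints. A minimal sketch, assuming a standard combined-format log at a placeholder path:

    # Sketch: count distinct IPs and URLs in an access log to gauge whether
    # traffic looks like a crawl or a DDoS. Log path and combined log format
    # are assumptions.
    import re
    from collections import Counter

    LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

    ips, urls = Counter(), Counter()
    with open("access.log") as f:  # placeholder path
        for line in f:
            m = LINE.match(line)
            if m:
                ip, path = m.groups()
                ips[ip] += 1
                urls[path] += 1

    total = sum(urls.values())
    print(f"{total} requests, {len(ips)} distinct IPs, {len(urls)} distinct URLs")
    # A crawl spreads thin across URLs; a DDoS concentrates on a few hot paths.
    print("Top URLs:", urls.most_common(5))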



