We need to update robots.txt for the LLM world: help crawlers find things more efficiently (or not at all, I guess), and provide specs for the actions they are allowed to take, etc.
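For reference, today's robots.txt can already single out some AI crawlers by user agent. Below is a minimal sketch that assumes a hypothetical policy: it blocks OpenAI's `GPTBot` and Common Crawl's `CCBot` entirely and keeps every other agent out of a hypothetical `/search` path. It uses Python's standard `urllib.robotparser` to show what a compliant crawler would conclude. It only has an effect on crawlers that choose to read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block two known AI crawler user agents outright,
# and keep every other agent out of the (hypothetical) /search endpoint.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /search
"""


def allowed(agent: str, url: str) -> bool:
    """Return what a crawler honoring ROBOTS_TXT would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/articles/1"))
```

Anything richer, such as "specs for actions that can be taken", would need new directives that crawlers agree to honor. The standard parser simply ignores lines it does not recognize.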
If current behaviour is anything to go by, they will ignore all such assistance. Instead they will insist on crawling infinite variations of the same content accessed through slightly different URL patterns, and hallucinate endless variations of non-existent but plausible-looking URLs to hit as well, until the server burns down: all on the off-chance that they might see a new unique string of text which they can turn into a paperclip.
There's no LLM in the loop at all, so any attempt to solve this by reasoning with an LLM misses the point. They're not even "ignoring" assistance, as the sibling comment supposes. There simply is no reasoning here.
This is what you should imagine when your site is being scraped:
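  import re
  import requests

  def crawl(url):
      text = requests.get(url).text
      store(text)  # whatever persists the page; not shown
      # follow every URL-like string: no dedup, no robots.txt, no rate limiting
      for link in re.findall(r'https?://[^\s<>"\']+', text):
          crawl(link)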
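For contrast, here is roughly what the same loop looks like when it bothers with the basics. It is still a dumb loop with no LLM in it. This is a minimal sketch and not any real crawler's code. `fetch` and `can_fetch` are hypothetical callables passed in, so it runs without a network. Links are normalized only by dropping `#fragment`s before deduplication, and a `max_pages` cap guarantees termination.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urldefrag

LINK_RE = re.compile(r'https?://[^\s<>"\']+')


def polite_crawl(
    start: str,
    fetch: Callable[[str], str],       # hypothetical: returns page text for a URL
    can_fetch: Callable[[str], bool],  # hypothetical: robots.txt check for a URL
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl that dedups URLs, honors can_fetch, and stops at max_pages."""
    first = urldefrag(start).url
    seen = {first}
    queue = deque([first])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not can_fetch(url):
            continue  # skip disallowed URLs instead of fetching them
        text = fetch(url)
        visited.append(url)
        for link in LINK_RE.findall(text):
            link = urldefrag(link).url  # "/page#a" and "/page#b" are the same page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `seen` set and the page cap stop this sketch from looping forever on its own. Real crawlers also canonicalize query strings and rate-limit per host, which is exactly the part the behaviour described above skips.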
Sure, but at some point the idea is to train an LLM on these downloaded files, no? I mean, what is the point of getting them if you don't use them? So sure, this won't be interpreted during the crawl, but it will become part of the LLM's knowledge.
You can debate this until you are blue in the face. If it costs them less, they will do it. If it doesn’t, they won’t. That’s the only sense that needs to be made.
Ah yes, the "efficient market hypothesis". It's well known that no company has ever gone bankrupt, because every company only does things that are optimally efficient and profitable.
No company has ever made an investment in something that ended up being more expensive than calculated, or so expensive it bankrupted them.
You are assuming they will commit to the solution and ride it to their grave trying to make it work. They will experiment and figure out a way to make it cheaper, or they will give up. They have plenty of money to experiment with this.
If this is in reference to the parent's mention of Huawei (200,000 employees, ~$120 billion annual revenue), then I'm not sure we all share your idea of a small company.
Small company can mean different things to different people. Considering that some larger Chinese companies have north of 2 million employees, 200k is quite a small number actually. Go figure.
My solution is that sustainable companies are more worthwhile to society than late-stage capitalism companies that always lay off employees when the exponential growth targets set by their C-suites aren't met.
“Even if you can disable individual AI features, the cognitive load of monitoring an opaque system that’s supposedly working on your behalf would be overwhelming.”
99.9% of people haven’t ever had one single thought about how their software works. I don’t think they will be overwhelmed with cognitive load. Quite the opposite.
I feel like people dance around this a lot because, I don't know, it hurts their nerd credibility or something. The fact is that, on a moment-to-moment basis, the iPhone is just generally a better experience. iPhones also hold their value a lot longer: I consistently trade in my phone or sell it for easily 80% of what I paid for it, usually 3-4 years out.
Remember how long it took for Instagram to be functional on android phones?
I've tried them out, and not a single thing about them was tangibly better, IMO. They have no inherent merit over Android except that some see them as a status symbol (which is absurd, as my S25U has a higher MSRP than most iPhone models).
Cameras, for starters. I’ve never seen another smartphone keep up with the color and texture quality of an iPhone’s photos and videos (videos in particular) since the 4S. Their color science is just better. We’ve intercut iPhone footage with our work since the 7 or so, and frankly you’d be hard pressed to tell it wasn’t one of our nicer rigs unless we hold the shot too long. We just can’t get other phone cameras to match footage as easily, especially when it comes to skin tones.
It’s not like Elasticsearch lacks ranking algorithms and control over them. But it can require tuning and adjustment for different domains. Relevance is, after all, subjective.
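Per-field boosting in a `multi_match` query is a common first step in that kind of tuning. The minimal sketch below only builds the request body as a Python dict, with no client and no cluster involved. The field names `title`, `tags`, and `body` and the boost values are hypothetical and would need tuning for each domain.

```python
import json


def build_query(text: str, boosts: dict[str, float]) -> dict:
    """Build an Elasticsearch multi_match body with per-field boosts ("field^boost")."""
    fields = [f"{name}^{boost:g}" for name, boost in boosts.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": "best_fields",  # score by the best-matching field
            }
        }
    }


# Hypothetical weights: title matches count 3x, tags 2x, body 1x.
body = build_query("solar battery", {"title": 3, "tags": 2, "body": 1})
print(json.dumps(body, indent=2))
```

Changing the weights is cheap, but judging whether results got "better" still needs domain-specific relevance judgments, which is the subjectivity mentioned above.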
Any numbers on how much energy demand isn’t time-sensitive? Is it reasonable to say that people can just use more energy when it’s windy to save money? Perhaps we could incentivize people to have large local batteries that soak it up during these times and discharge it during more costly times? But that seems very expensive.
That is the whole "smart grid" idea. The problem is that people are rightly suspicious that, as usual, the "smarts" are there not to serve them but to squeeze them as hard as possible and maximize the operator's profits.
I'm going to have V2H (vehicle-to-home) installed: excess power from the solar panels will charge the car battery, and the car battery can feed the home at night. I'm planning to follow a setup I saw in another house; it seemed to work very well.
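Here is a back-of-the-envelope way to check whether shifting energy with a battery pays off, whether it is a home battery charged in windy hours or a car battery charged from surplus solar. Some energy is lost in the round trip, so the price spread between cheap and expensive hours has to exceed that loss. This is a minimal sketch. Every price and capacity and the 90% round-trip efficiency are hypothetical placeholders, not real tariffs, and battery wear is ignored.

```python
def daily_shift_savings(
    capacity_kwh: float,
    cheap_price: float,            # price per kWh when charging (e.g. windy or sunny hours)
    expensive_price: float,        # price per kWh that the stored energy displaces later
    round_trip_eff: float = 0.9,   # hypothetical: fraction of stored energy recovered
) -> float:
    """Money saved per full charge/discharge cycle, ignoring battery wear."""
    cost_to_charge = capacity_kwh * cheap_price
    value_delivered = capacity_kwh * round_trip_eff * expensive_price
    return value_delivered - cost_to_charge


def break_even_ratio(round_trip_eff: float) -> float:
    """Expensive/cheap price ratio needed before shifting saves anything."""
    return 1 / round_trip_eff


if __name__ == "__main__":
    # Hypothetical: 10 kWh shifted from 0.10/kWh hours to 0.30/kWh hours.
    print(f"{daily_shift_savings(10, 0.10, 0.30):.2f}")  # 1.70
    print(f"{break_even_ratio(0.9):.2f}")  # 1.11
```

With surplus solar, `cheap_price` is effectively the feed-in payment you give up by not exporting. The expensive part the parent worries about is the battery's purchase cost, which has to be recovered out of these per-cycle savings.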
There are businesses that attract people who use cards fraudulently; the business gets flagged and eventually dropped. Gas stations in less desirable neighborhoods in the US have this issue, and some only take cash.
Credit card fraud is not nearly as common in Europe as it is in the US.
Additionally, and specifically in Sweden, the fees that banks charge businesses for handling cash (picking it up and depositing it at the end of each business day) have increased significantly in the last decade or two. This has been a significant factor in driving businesses away from cash: it's just expensive for them to deal with.
I don’t think it’s some master scheme. They are trying to make money more than anything else, so they distort the truth toward whatever sells best. That just happens to be one of two major ideologies that hate each other. The effect is the same, but the motivations, and thus how you counteract them, are different.
>They are trying to make money more than anything else.
Who knows what some people will do these days, just for that.
Well, we actually have a pretty good idea, without all the gory details.
But I know what you mean: it's not easy for multiple sources to get on the same page, sometimes even when they really try.
However, most people listen only to the few most popular outlets, and those biggies are usually well aware of each other's stances on an ongoing basis. If a combined effort were to take place, nothing else would have a chance.
They sometimes even share personnel, concurrently and/or sequentially, which can also lay the groundwork for approaches that seem competitive but are really complementary, all serving a single, possibly obscured agenda designed from the ground up to deceive.
Things like this are why "trust but verify" may have to be deprecated and reversed to "verify and still be skeptical" if the propaganda keeps getting worse.