Quite a surprising result: “across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.”
Hey, paper author here.
We did try to get an even sample: we include both SWE-bench repos (which are large, popular and mostly human-written) and a sample of smaller, more recent repositories with an existing AGENTS.md (these tend to contain LLM-written code, of course). Our findings generalize across both samples. What is arguably missing are small repositories of completely human-written code, but these are quite difficult to obtain nowadays.
To reduce the number of variables to account for. To be able to finish the paper this year, and not the next century. To work with a familiar language and environments. To use a language heavily represented in the training data.
All research is conducted under constraints. It's not hard to work out what those constraints are with a little thought.
Besides, one could actually open the paper and scroll to section 5, where the authors acknowledge the need to expand beyond Python:
--- start quote ---
5. Limitations and Future Work
While our work addresses important shortcomings in the literature, exciting opportunities for future research remain.
# Niche programming languages
The current evaluation is focused heavily on Python. Since this is a language that is widely represented in the training data, much detailed knowledge about tooling, dependencies, and other repository specifics might be present in the models’ parametric knowledge, nullifying the effect of context files. Future work may investigate the effect of context files on more niche programming languages and toolchains that are less represented in the training data, and known to be more difficult for LLMs.
--- end quote ---
I think that is a rather fitting approach to the problem domain. A task being a real GitHub issue is a solid definition by any measure, and I see no problem picking language A over B or C.
If you feel strongly about the topic, you are free to write your own article.
The FDA has approved it for men up to age 45. I myself got it in my late thirties at a pharmacy. For one of the shots, the pharmacist hassled me a little, asking if I was high risk, but acquiesced when I told them I was. For the other two, they just gave me the shot. It was also covered by my insurance.
Yes, it has been redacted far in excess of what the law allows, and the material is a tiny fraction of what the administration was required by law to release by this date.
It doesn't belong in the Epstein Files, and doesn't need to be censored either, but the way it is framed in the DoJ release implies guilt where there is none.
How can you be sure the image wasn't part of the files collected during investigation? What makes you so sure Epstein didn't have the file saved somewhere on a device, server, or account that was collected?
I don’t think I expressed a particular opinion here, I just stated where the suspicion comes from.
That being said, I think we can demand a level of due diligence from public institutions that entails only censoring actual victims on actual pieces of evidence, instead of mindlessly placing black squares on the faces of news article pictures found on his computer. Never mind that nobody can yet explain how this particular picture ended up in the grand jury files anyway.
This is the same DOJ that released the edited Epstein jail video as "raw", with the attorney general claiming the missing minute was from how the video system reset for a new day, when they had the actual raw video with the missing minute.
That's not the exact same image, though. It's a separate image, from the same time and place. The one released may have been in Epstein's possession and therefore part of the files. Either some DoJ drone just redacted all children and non-celebrities due to procedure, or it was deliberately done in such a way as to make Clinton and Jackson look suspicious. Whatever the reason, this was not a Getty stock image planted in the files.
I know what picture we're talking about. 1) it's not the same as the Getty stock image everyone seems to mistake it for. 2) we don't know if the redaction is erroneous or intentionally misleading, but either way the non-celebrity faces were redacted even though another image of them exists in the public domain. Probably easier to just apply a blanket policy when handling all these images rather than observing edge cases.
I did some searches for “nobody should be on that platform” and found:
- one hit on a Lana Del Rey message board
- one Bluesky post from 8 months ago with no likes, reposts, or replies.
If you widen the search to “should be on that platform” then you get more hits, but many are references to Instagram, Discord, Snapchat, TikTok etc. It seems that people are reaching for a noun that can refer to these social media properties that are not just “sites” and not just “apps.” It would appear that “platform” is the word we’ve landed on.
I think the idea was that it was the sum of all historical profits. Contrast that with valuation, which at best is about the expectation of future profits.
Same, I don't understand the complaints against modern C++. A lambda, used for things like comparators etc., is much simpler than a struct with an overloaded operator defined elsewhere.
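To make the comparison concrete, here is a minimal sketch of the same sort written both ways. The `Employee` type, the `ByAge` functor and the two helper functions are hypothetical names made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Employee {
    std::string name;
    int age;
};

// Older style: a named function object, typically defined far from the call site.
struct ByAge {
    bool operator()(const Employee& a, const Employee& b) const {
        return a.age < b.age;
    }
};

// Sorts a copy of the input by age using the functor.
std::vector<Employee> sorted_with_functor(std::vector<Employee> v) {
    std::sort(v.begin(), v.end(), ByAge{});
    return v;
}

// Sorts a copy of the input by age using a lambda: the comparison
// sits right next to the call that uses it.
std::vector<Employee> sorted_with_lambda(std::vector<Employee> v) {
    std::sort(v.begin(), v.end(),
              [](const Employee& a, const Employee& b) { return a.age < b.age; });
    return v;
}
```

Both versions produce the same ordering; the difference is only where the reader has to look to find the comparison.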
My only complaint is the verbosity: things like `std::chrono::nanoseconds` break even simple statements into multiple lines, and you're tempted to just use `uint64_t` instead. And `std::thread` is fine but if you want to name your thread you still need to get the underlying handle and call `pthread_setname_np`. It's hard work pulling off everything C++ tries to pull off.
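One way to take the edge off the chrono verbosity, sketched under the assumption that C++14 or later is available: a local namespace alias plus the standard duration literals. The alias `chr` and the two `budget_left*` functions are hypothetical names for illustration.

```cpp
#include <chrono>
#include <cstdint>

namespace chr = std::chrono;           // hypothetical local alias
using namespace std::chrono_literals;  // enables 2ms, 500us, 1500ns, ...

// Typed version: callers can pass any coarser unit (ms, us, s) and the
// conversion to nanoseconds happens implicitly and without loss.
inline chr::nanoseconds budget_left(chr::nanoseconds budget, chr::nanoseconds spent) {
    return budget - spent;
}

// The tempting alternative: raw integers, with the unit carried only
// by the parameter names.
inline std::uint64_t budget_left_raw(std::uint64_t budget_ns, std::uint64_t spent_ns) {
    return budget_ns - spent_ns;
}
```

With the literals in scope, a call such as `budget_left(2ms, 500us)` stays on one line and still keeps the unit checking that the `uint64_t` version gives up.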
> And `std::thread` is fine but if you want to name your thread you still need to get the underlying handle and call `pthread_setname_np`.
Yes, but here we're getting deep into platform specifics. An even bigger pain point is thread priorities. Windows, macOS and Linux differ so fundamentally in this regard that it's really hard to create a meaningful abstraction. Certain things are better left to platform APIs.
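For reference, the `native_handle` route discussed above looks roughly like this. It is a minimal sketch that implements the Linux branch only; `set_thread_name` is a hypothetical helper, and on every other platform it simply reports failure rather than pretending to abstract over them.

```cpp
#include <string>
#include <thread>

#if defined(__linux__)
#include <pthread.h>
#endif

// Hypothetical helper: tries to name a running std::thread.
// Returns true on success. Only the Linux branch is implemented.
bool set_thread_name(std::thread& t, const std::string& name) {
#if defined(__linux__)
    // Linux limits thread names to 16 bytes including the terminating NUL.
    if (name.size() > 15) return false;
    // On Linux, native_handle() yields the underlying pthread_t.
    return pthread_setname_np(t.native_handle(), name.c_str()) == 0;
#else
    (void)t;
    (void)name;
    return false;  // not implemented for this platform in this sketch
#endif
}
```

Even this small helper shows why a portable wrapper is awkward: on macOS, `pthread_setname_np` takes only the name and applies to the calling thread, so the call would have to be made from inside the thread itself.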