Hacker News | medler's comments

Quite a surprising result: “across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.”


Well, task == Resolving real GitHub Issues

Languages == Python only

Libraries (um looks like other LLM generated libraries -- I mean definitely not pure human: like Ragas, FastMCP, etc)

So seems like a highly skewed sample and who knows what can / can't be generalized. Does make for a compelling research paper though!


Hey, paper author here. We did try to get an even sample: we include both SWE-bench repos (which are large, popular, and mostly human-written) and a sample of smaller, more recent repositories with existing AGENTS.md (these tend to contain LLM-written code, of course). Our findings generalize across both these samples. What is arguably missing are small repositories of completely human-written code, but these are quite difficult to obtain nowadays.


Why stick to python-only repositories though?


To reduce the number of variables to account for. To be able to finish the paper this year, and not the next century. To work with a familiar language and environments. To use a language heavily represented in the training data.

I mean, it's not that hard to understand why.


[flagged]


All research is conducted under constraints. It's not hard to understand those constraints by simply thinking.

Besides, one could actually open the research, and scroll to section 5 where they acknowledge the need to expand beyond Python:

--- start quote ---

5. Limitations and Future Work

While our work addresses important shortcomings in the literature, exciting opportunities for future research remain.

# Niche programming languages

The current evaluation is focused heavily on Python. Since this is a language that is widely represented in the training data, much detailed knowledge about tooling, dependencies, and other repository specifics might be present in the models’ parametric knowledge, nullifying the effect of context files. Future work may investigate the effect of context files on more niche programming languages and toolchains that are less represented in the training data, and known to be more difficult for LLMs

--- end quote ---


You still did not answer my question and you're still being a d*ck. I understand now why - because you have no idea what I am talking about.


I think that is a rather fitting approach to the problem domain. A task being a real GitHub issue is a solid definition by any measure, and I see no problem picking language A over B or C.

If you feel strongly about the topic, you are free to write your own article.


> Libraries (um looks like other LLM generated libraries -- I mean definitely not pure human: like Ragas, FastMCP, etc)

How does this invalidate the result? Aren't AGENTS.md files put exactly into those repos that are partly generated using LLMs?


The FDA has approved it for men up to age 45. I myself got it in my late thirties at a pharmacy. For one of the shots, the pharmacist hassled me a little, asking if I was high risk, but acquiesced when I told them I was. For the other two, they just gave me the shot. It was also covered by my insurance.


Gardasil is usually a three-shot series. You may want to go back and get those follow-up shots.


I had an incomplete series when I was younger, and told the GP that. I forget how they scheduled the follow-up shots but I trust my care team.


It runs great on Windows 11. The install took a long time, but I didn’t have to do anything special to make it work.


Maybe we have different editions? I never got mine to work.


Yes, it has been redacted far in excess of what the law allows, and the material is a tiny fraction of what the administration was required by law to release by this date.


Images were also planted that were not part of the files.


Planted by whom? That were not part of the files? That seems dubious at best. What is your source? It doesn't even make sense.


There is a picture of Bill Clinton with Michael Jackson and Diana Ross that was just publicly available before: https://www.threads.com/@meidastouch/post/DSfEKJslM1H

It doesn't belong in the Epstein Files, and doesn't need to be censored either, but the way it is framed in the DoJ release implies guilt where there is none.


How can you be sure the image wasn't part of the files collected during investigation? What makes you so sure Epstein didn't have the file saved somewhere on a device, server, or account that was collected?


I don’t think I expressed a particular opinion here; I just stated where the suspicion comes from.

That being said, I think we can demand a level of due diligence from public institutions that entails only censoring actual victims on actual pieces of evidence, instead of mindlessly placing black squares on the faces of news article pictures found on his computer. Never mind that nobody can explain yet how this particular picture ended up in the grand jury files anyway.


This is the same DOJ that released the edited Epstein jail video as "raw", with the attorney general claiming the missing minute was from how the video system reset for a new day, when they had the actual raw video with the missing minute.


Makes sense if you are a criminal.


Surely you can link me to the exact "planted" images you are talking about...

Who planted them?



That's not the exact same image, though. It's a separate image, from the same time and place. The one released may have been in Epstein's possession and therefore part of the files. Either some DoJ drone just redacted all children and non-celebrities due to procedure, or it was deliberately done in such a way as to make Clinton and Jackson look suspicious. Whatever the reason, this was not a Getty stock image planted in the files.


You can see the erroneously redacted image here: https://www.bbc.com/news/articles/c8r38ne1x2mo


I know what picture we're talking about. 1) it's not the same as the Getty stock image everyone seems to mistake it for. 2) we don't know if the redaction is erroneous or intentionally misleading, but either way the non-celebrity faces were redacted even though another image of them exists in the public domain. Probably easier to just apply a blanket policy when handling all these images rather than observing edge cases.


The redaction is a distraction. The concern is that it is from a charity event that is seemingly unrelated to Epstein.


If Epstein had the photo in his possession, then that would explain why it's there!


It wasn’t erroneous. The DoJ said they were redacting the faces of all non-celebrity women and children under the presumption they could be victims.


In that case, it makes perfect sense.


> it agreed with me that its initial answer was wrong.

Most likely that was just its sycophancy programming taking over and telling you what you wanted to hear.


No, we have no rulers but ourselves.


I did some searches for “nobody should be on that platform” and found:

- one hit on a Lana Del Rey message board

- one Bluesky post from 8 months ago with no likes, reposts, or replies.

If you widen the search to “should be on that platform” then you get more hits, but many are references to Instagram, Discord, Snapchat, TikTok etc. It seems that people are reaching for a noun that can refer to these social media properties that are not just “sites” and not just “apps.” It would appear that “platform” is the word we’ve landed on.


I was more specifically referring to Reddit comments and the like, which I don't think are indexed by search engines.


That was $100B in profits, not valuation.


Profits over what timeframe? Valuation is just the total sum of profit discounted for time and risk.


I think the idea was that it was the sum of all historical profits. Contrast that with valuation, which at best is about the expectation of future profits.


The new changes in C++14, 17, and 20 are really nice. It feels like the language keeps getting cleaner and easier to use well.


Yes! Just to list a few personal highlights:

C++14:

  - generalized lambda capture
  - generic lambdas

C++17:

  - structured bindings
  - init statement for if
  - class template argument deduction (CTAD)
  - std::string_view
  - std::filesystem
  - std::variant
  - std::optional
  - std::to_chars() and std::from_chars()

C++20:

  - std::format
  - coroutines (makes ASIO code so much cleaner!)
  - concepts
  - std::span
  - bit manipulation (<bit>)
  - std::bind_front
  - std::numbers (math constants)
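
To make a few of the C++17 items above concrete, here is a minimal sketch that combines structured bindings, the init statement for `if`, `std::string_view`, `std::optional`, and `std::from_chars()`. The helper names `parse_int`, `total`, and `check_parsing_examples` are made up for illustration and are not taken from any library.

```cpp
#include <cassert>
#include <charconv>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a whole string as a decimal int,
// without allocating and without exceptions.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    // if-with-initializer plus a structured binding over std::from_chars_result.
    if (auto [ptr, ec] = std::from_chars(first, last, value);
        ec == std::errc{} && ptr == last) {
        return value;
    }
    return std::nullopt;  // empty input, trailing junk, or out of range
}

// Hypothetical helper: structured bindings while iterating a map.
int total(const std::map<std::string, int>& counts) {
    int sum = 0;
    for (const auto& [name, count] : counts) {
        sum += count;
    }
    return sum;
}

// Usage, written as assertions so the expected results are explicit.
void check_parsing_examples() {
    assert(parse_int("42") == 42);
    assert(parse_int("-7") == -7);
    assert(!parse_int("42x"));
    assert(!parse_int(""));
    assert(total({{"a", 1}, {"b", 2}}) == 3);
}
```

Requiring `ptr == last` is a design choice in this sketch: it rejects inputs with trailing characters rather than silently parsing a prefix.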


Same, I don't understand the complaints against modern C++. A lambda, used for things like comparators etc, is much simpler than structs with overloaded operators defined elsewhere.

My only complaint is the verbosity: things like `std::chrono::nanoseconds` break even simple statements into multiple lines, and you're tempted to just use uint64_t instead. And `std::thread` is fine but if you want to name your thread you still need to get the underlying handle and call `pthread_setname_np`. It's hard work pulling off everything C++ tries to pull off.
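
For anyone who hasn't seen the two styles side by side, the lambda-versus-struct point can be sketched roughly like this. `Employee`, `ByAge`, and the function names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative type, not from any real codebase.
struct Employee {
    std::string name;
    int age;
};

// Older style: the comparison lives in a separate struct,
// possibly far away from the call to std::sort.
struct ByAge {
    bool operator()(const Employee& a, const Employee& b) const {
        return a.age < b.age;
    }
};

void sort_with_functor(std::vector<Employee>& staff) {
    std::sort(staff.begin(), staff.end(), ByAge{});
}

// Modern style: a generic lambda (C++14) right at the call site.
void sort_with_lambda(std::vector<Employee>& staff) {
    std::sort(staff.begin(), staff.end(),
              [](const auto& a, const auto& b) { return a.age < b.age; });
}

// Usage: both versions produce the same ordering.
void check_sorting_examples() {
    std::vector<Employee> a{{"Ann", 41}, {"Bob", 29}, {"Cy", 35}};
    std::vector<Employee> b = a;
    sort_with_functor(a);
    sort_with_lambda(b);
    assert(a.front().name == "Bob" && b.front().name == "Bob");
    assert(a.back().name == "Ann" && b.back().name == "Ann");
}
```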


> And `std::thread` is fine but if you want to name your thread you still need to get the underlying handle and call `pthread_setname_np`.

Yes, but here we're getting deep into platform specifics. An even bigger pain point is thread priorities. Windows, macOS and Linux differ so fundamentally in this regard that it's really hard to create a meaningful abstraction. Certain things are better left to platform APIs.
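
As a rough illustration of what "get the underlying handle" means in practice, here is a Linux-only sketch. It assumes glibc, where `std::thread::native_handle()` yields a `pthread_t` and `pthread_setname_np` takes the thread plus a name of at most 16 bytes including the terminating NUL. The helpers `clamp_thread_name` and `set_thread_name` are hypothetical, and on other platforms this sketch simply reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <thread>

#if defined(__linux__)
#include <pthread.h>
#endif

// Linux thread names hold at most 15 characters plus the NUL terminator.
constexpr std::size_t kMaxThreadNameLength = 15;

// Hypothetical helper: truncate a name so the platform call does not reject it.
std::string clamp_thread_name(std::string_view name) {
    return std::string(name.substr(0, kMaxThreadNameLength));
}

// Hypothetical helper: returns true if the name was applied.
bool set_thread_name(std::thread& thread, std::string_view name) {
    assert(thread.joinable());  // native_handle() is only meaningful here
#if defined(__linux__)
    const std::string clamped = clamp_thread_name(name);
    return pthread_setname_np(thread.native_handle(), clamped.c_str()) == 0;
#else
    (void)name;
    return false;  // other platforms need their own API; not covered here
#endif
}
```

A caller would typically invoke `set_thread_name(worker, "io-worker")` right after constructing `worker`, while the thread is still running and joinable.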


```c++
// To lessen verbosity, try defining the following convenience aliases in a header:
#include <atomic>
#include <chrono>
#include <ratio>

// Clocks
using SystemClock_t = std::chrono::system_clock;
using SteadyClock_t = std::chrono::steady_clock;
using HighClock_t = std::chrono::high_resolution_clock;

// A duration that can be shared between threads
using SharedDelay_t = std::atomic<SystemClock_t::duration>;

// Integer-valued durations
using Minutes_t = std::chrono::minutes;
using Seconds_t = std::chrono::seconds;
using MilliSecs_t = std::chrono::milliseconds;
using MicroSecs_t = std::chrono::microseconds;
using NanoSecs_t = std::chrono::nanoseconds;

// Floating-point durations (fractional values allowed)
using DoubleSecs_t = std::chrono::duration<double>;
using FloatingMilliSecs_t = std::chrono::duration<double, std::milli>;
using FloatingMicroSecs_t = std::chrono::duration<double, std::micro>;
```
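
A short sketch of how aliases like these might be used. It repeats a few of the aliases so it stands alone; `to_millis`, `time_call`, and `check_duration_examples` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

using SteadyClock_t = std::chrono::steady_clock;
using Seconds_t = std::chrono::seconds;
using MicroSecs_t = std::chrono::microseconds;
using FloatingMilliSecs_t = std::chrono::duration<double, std::milli>;

// Hypothetical helper: any duration expressed as fractional milliseconds.
// Converting to a floating-point duration is implicit, so no duration_cast is needed.
template <class Rep, class Period>
constexpr double to_millis(std::chrono::duration<Rep, Period> d) {
    return FloatingMilliSecs_t(d).count();
}

// Hypothetical helper: time a callable with the steady (monotonic) clock.
template <class Callable>
FloatingMilliSecs_t time_call(Callable&& work) {
    const auto start = SteadyClock_t::now();
    work();
    return SteadyClock_t::now() - start;
}

// Usage, written as assertions so the expected results are explicit.
void check_duration_examples() {
    static_assert(to_millis(Seconds_t(2)) == 2000.0, "2 s is 2000 ms");
    assert(to_millis(MicroSecs_t(1500)) == 1.5);
    assert(time_call([] {}).count() >= 0.0);
}
```

The steady clock is used for timing here because it never jumps backwards, unlike the system clock.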

