
I think you’re missing that the law considers intent. If the devs of Copilot were not trying to set up infringement, then their algorithm’s output is likely not considered infringement [1]. However, if you set out to “launder” copyrighted material, the law will take that into consideration and likely find that you violated copyright. This intent can be demonstrated in court either via your statements or via your actions (such as constructing a meaninglessly tiny training set).

[1]: https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright...



Would it not categorically count as intended infringement, regardless of the copyright status of the material?

It seems to me that the licensing is the part you can't legally throw into a big Markov chain. Even if they had trained only on open-source-licensed material without exception, the point where they discard all the licenses and export a 'generic' slurry is the point where they infringe by definition. Training on more restrictive licenses just doubles down on that. What's needed is annotation and maintenance of which bits of code came from which licensing pool: you could well have a giant pool of GPL and a giant pool of MIT (which I would be in, all the more since I maintain a very automatable code style that's easy to import from), and you could accumulate a list of sources for anything you generated, at whatever level of granularity is desired.
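
A minimal sketch of that bookkeeping, with everything hypothetical (the class and method names are mine, not anything Copilot actually does): keep training samples grouped by license pool, and record which pools contributed to a given output so an attribution list can be produced on demand.

    from collections import defaultdict

    class AttributedCorpus:
        """Training samples grouped by license, with source URLs kept."""

        def __init__(self):
            # license name -> list of (source_url, code_snippet) pairs
            self.pools = defaultdict(list)

        def add(self, license_name, source_url, snippet):
            self.pools[license_name].append((source_url, snippet))

        def sources_for(self, license_names):
            # Accumulate the attribution list for an output that drew on
            # the given pools, at whatever granularity was recorded.
            return [(lic, url)
                    for lic in license_names
                    for url, _ in self.pools[lic]]

    corpus = AttributedCorpus()
    corpus.add("MIT", "https://example.com/repo-a", "def add(a, b): return a + b")
    corpus.add("GPL-3.0", "https://example.com/repo-b", "def sub(a, b): return a - b")

    # An output generated only from the MIT pool carries MIT attributions:
    print(corpus.sources_for(["MIT"]))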

Throwing away this attribution is itself evidence of intent to infringe. It's constructing a machine for the explicit purpose of grinding code into a sludge of pieces intentionally small enough that, if you reconstruct copyrighted code in your Markov-chainy way, you've got grounds for pretending you didn't build your machine to do exactly that.
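
For contrast, here is a toy n-gram Markov chain (purely illustrative, and nothing like Copilot's actual model) that shows exactly where provenance gets destroyed: once the sources are chopped into tiny transitions, the table keeps no record of which input each fragment came from.

    import random
    from collections import defaultdict

    def train(tokens, order=2):
        # key: a tuple of `order` tokens; value: every token that followed it.
        chain = defaultdict(list)
        for i in range(len(tokens) - order):
            key = tuple(tokens[i:i + order])
            chain[key].append(tokens[i + order])  # source identity is lost here
        return chain

    def generate(chain, seed, length=20):
        out = list(seed)
        for _ in range(length):
            followers = chain.get(tuple(out[-len(seed):]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    a = "int add ( int a , int b ) { return a + b ; }".split()
    b = "int sub ( int a , int b ) { return a - b ; }".split()
    chain = train(a + b)  # both sources merged; no attribution survives
    print(generate(chain, ("int", "add")))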


> you've got grounds for pretending you didn't build your machine to do exactly that.

I believe all laws about intent have to deal with determining who is pretending and who isn't. But these laws still exist, because there are ways to prove such things.


I don't think it's that easy. A tiny training set would obviously defeat the point, but the other part is that the AI itself can't commit copyright infringement, and I don't even have to ask it to produce anything: I merely fed it copyrighted code and released it to other developers without documenting that fact. I could even open-source the entire bot, since no part of the AI would be under the restrictions of the training set.


Again, the law isn’t enforced by robots and is able to adapt such that “clever legal hacks” don’t typically work. We programming nerds tend to think in terms of rigid, unambiguous rules that treat inputs as black boxes, but the law does not work like this.


I'm well aware, but I don't think this is an issue here.


It's exactly the issue.

If the AI could be shown to have copied the code, it would likely be found to be infringement.

If it was found to have generated new, unique code, and merely learnt how to program from the code it was trained on, it likely wouldn't.

In either case, this is different to a clean-room implementation (which I think is what you meant by "white room").

Clean-room implementations are supposed to protect against trade secret infringement, and are mostly used when building interop with hardware (where compatibility has special carve-outs).

If a person or AI had seen the copyrighted code used in the project, it would never be considered clean-room.

But CDDL code is fine for a person or AI to learn from when building a new, incompatible implementation that doesn't share any code.



