How do you draw the line between allowed learning and disallowed learning? You d...

jfmc · on Aug 3, 2021

That is NOT the point. You are allowed to learn whatever you want. What is horribly unethical is not recognizing the life-long effort of the people who wrote the original code and designed the original algorithms. Programmers are not machines. The human *knows* the open source that she/he is reading and she/he can acknowledge it in their own code (either public or private).

What is the copyright of code written with copilot? Copilot learns the code and forgets authors.

Would you agree if I take your open source project, learn piece by piece, rewrite it from scratch and put my name on it without a single word about your work?

Operyl · on Aug 3, 2021

> Would you agree if I take your open source project, learn piece by piece, rewrite it from scratch and put my name on it without a single word about your work?

If it was indeed written from scratch, I see no reason (although it’d feel nice) to credit my original work. Having multiple implementations of an idea is always a great thing.

jfmc · on Aug 3, 2021

How do you separate the implementation from the algorithm/idea? I do not believe that you'd be fine if you invest a significant period of your life in some idea that someone else copies without at least some credit (i.e., replacing your name with theirs?). Nobody works like this unless your time is worth nothing or your idea is trivial. Open source would be ruined if everyone believed that copying smart code without recognizing the authors is ethical.

Would this kind of copying be fine in software and not in other scientific papers or other industrial processes? Would it be fine if I train copilot on a patent database and start creating new patents (at a rate at which it would be impractical to determine that it is regurgitating prior art)?

spywaregorilla · on Aug 3, 2021

> Open source would be ruined if everyone believed that copying smart code without recognizing the authors is ethical.

Open source would be ruined if it were easier to build upon past works with lower barriers to research and licensing?

> Would this kind of copying be fine in software and not in other scientific papers or other industrial processes?

Scientific papers are more about collecting and experimenting with novel data, and referencing an explicit paper trail of past results. It's not really comparable. Fiction is a better match.

> Would it be fine if I train copilot on a patent database and start creating new patents (at a rate at which it would be impractical to determine that it is regurgitating prior art)?

This is a problem with the patent system, not copilot, and it also isn't a capability that copilot actually has. You're describing a different system entirely.

jfmc · on Aug 3, 2021

> Open source would be ruined if it were easier to build upon past works with lower barriers to research and licensing?

Why is recognizing someone else's work so much pain?

The whole point is that copilot forgets who wrote the code and who is the author of the whole idea (unfortunately few programmers write it down, but sometimes it is there if you are patient enough to read the documentation). Thus a copilot user cannot know who deserves the credit.

This whole discussion is like if you train an AI to pick apples from a supermarket and leave them on the street waiting for someone else to take them home, and pretending that nobody is stealing anything.

spywaregorilla · on Aug 3, 2021

> Why is recognizing someone else's work so much pain?

Because it's basically impossible to completely and accurately attribute the origin of all your knowledge. And it is impossible to verify that the source you think is the originator of your knowledge is the original creator of that knowledge. Odds are they learned it from someone else. It really doesn't matter, at all.

> This whole discussion is like if you train an AI to pick apples from a supermarket and leave them on the street waiting for someone else to take them home, and pretending that nobody is stealing anything.

No, because in this case the supermarket has lost apples. This is more like accusing street performers singing popular songs without permission of the songwriter of being thieves. Or an engineer studying a bridge and leveraging techniques used in that bridge.

jfmc · on Aug 4, 2021

> Because it's basically impossible to completely and accurately attribute the origin of all your knowledge. And it is impossible to verify that the source you think is the originator of your knowledge is the original creator of that knowledge. Odds are they learned it from someone else. It really doesn't matter, at all.

This is a fallacy.

Dylan16807 · on Aug 4, 2021

> This is a fallacy.

If you don't explain why you think the comparison is wrong, your comment might as well be "nuh uh".

Operyl · on Aug 3, 2021

Honestly? It has happened many times to me, and others. See: all the various code hosting sites. It's not worth the stress/getting worked up over it. People "steal" ideas from each other all the time, and people come to the same conclusion and ideas independently all the time too. I have more important stuff to worry about than "someone took my idea for a game and reimplemented it from scratch!"

spywaregorilla · on Aug 3, 2021

This is a pretty stupid hill to die on. Humans read code and forget authors too. Nobody cites 100% of the origin of their knowledge when writing new code. Most people don't cite anything. You could write a script that says "this repo is similar to these repos" based on copilot's embedding space and it would be far superior to any typical human attribution.
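
For what it's worth, the "this repo is similar to these repos" script could look roughly like the sketch below. It assumes you already have one embedding vector per repository; nothing in this thread says copilot exposes such vectors, so the `REPO_EMBEDDINGS` table, the repo names, and the `most_similar_repos` helper are all hypothetical. The ranking uses plain cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_repos(target, embeddings, top_k=3):
    """Rank every other repo by similarity to `target` (hypothetical helper)."""
    target_vec = embeddings[target]
    scored = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors; real ones would come from whatever model you use.
REPO_EMBEDDINGS = {
    "alice/fast-sort": [0.9, 0.1, 0.0],
    "bob/quick-sort": [0.8, 0.2, 0.1],
    "carol/http-server": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in most_similar_repos("alice/fast-sort", REPO_EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

This only reports similarity. It says nothing about who wrote something first, so it is a pointer to possibly related work rather than attribution in any strict sense.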

ric2b · on Aug 10, 2021

The difference is that copilot is memorizing and reciting code verbatim without mentioning the source.

If a human does that we call it plagiarism.

pessimizer · on Aug 3, 2021

Computers don't have a private life, and applying the word "learning" to what they do is a convenient metaphor.

Computers read, process according to predefined algorithms, and output. A computer "learns" code when it comes over a wire in pieces over a bus, and writes code when it transmits it over a bus to another device.