You don't even need to go this far. The word-probabilities are transformative us...

surfingdino · on May 18, 2024

MS Word does not actively collect and process all texts for all available sources and does not offer them in recombined form. MS Word is passive whereas the whole point of an LLM is to produce output using a model trained on ingested data. It is actively processing vast amounts of texts with intent to make them available for others to use and the T&C state that the user owns the copyright to the outputs based on works of other copyright owners. LLMs give the user a CCL (Collateralised Copyright Liability, a bit like a CDO) without a way of tracing the sources used to train the model.

KoolKat23 · on May 18, 2024

Legally, copyright is only concerned with the specific end work. A unique or not so unique standalone object that is being scrutinized, if this analogy helps.

The process involved in obtaining that end work is completely irrelevant to any copyright case. It can be a claim against the models weights (not possible as it's fair use), or it's against the specific once off output end work (less clear), but it can't be looked at as a whole.

dgoldstein0 · on May 18, 2024

I don't think that's accurate. The us copyright office last year issued guidance that basically said anything generated with ai can't be copyrighted, as human authorship/creation is required for copyright. Works can incorporate ai generated content but then those parts aren't covered by copyright.

https://www.federalregister.gov/documents/2023/03/16/2023-05...

So I think the law, at least as currently interpreted, does care about the process.

Though maybe you meant as to whether a new work infringes existing copyright? As this guidance is clearly about new copyright.

KoolKat23 · on May 18, 2024

These are two sides of the same coin, and what I'm saying still stands. This is talking about who you attribute authorship to when copyrighting a specific work. Basically on the application form, the author must be a human. The reason it's worth them clarifying is because they've received applications that attributed AI's, and legal persons do exist that aren't human (such as companies), they're just making it clear it has to be human.

Who created the work, it's the user who instructed the AI (it's a tool), you can't attribute it to the AI. It would be the equivalent of Photoshop being attributed as co-author on your work.

arrowsmith · on May 18, 2024

Couldn't you just generate it with AI then say you wrote it? How could anyone prove you wrong?

KoolKat23 · on May 18, 2024

That's what you're supposed to do. No need to hide it either :).

throwaway2037 · on May 18, 2024

First, I agree with nearly everything that you wrote. Very thoughtful post! However, I have some issues with the last sentence.

    > Collateralised Copyright Liability

Is this a real legal / finance term or did you make it up?

Also, I do not follow you leap to compare LLMs to CDOs (collateralised debt obligations). And, do you specifically mean CDO or any kind of mortgage / commercial loan structured finance deal?

surfingdino · on May 18, 2024

My analogy is based on the fact that nobody could see what was inside CDOs nor did they want to see, all they wanted to do was pass them on to the next sucker. It was all fun until it all blew up. LLM operators behave in the same way with copyrighted material. For context, read https://nymag.com/news/business/55687/

throwaway2037 · on May 20, 2024

    > nobody could see what was inside CDOs

Absolutely not true. Where did you get that idea? When pricing the bonds from a CDO you get to see the initial collateral. As a bond owner, you receive monthly updates about any portfolio updates. Weirdly, CDOs frequently have more collateral transparency compared to commercial or residential mortgage deals.

rcbdev · on May 18, 2024

OpenAI is outputting the partially copyright-infringing works of their LLM for profit. How does that square?

KoolKat23 · on May 18, 2024

You, the user, is inputting variables into their probability algorithm that's resulting in the copyright work. It's just a tool.

maeil · on May 18, 2024

Let's say a torrent website asks the user through an LLM interface what kind of copyrighted content they want to download and then offers me links based on that, and makes money off of it.

The user is "inputting variables into their probability algorithm that's resulting in the copyright work".

KoolKat23 · on May 18, 2024

Theoretically a torrent website that does not distribute the copyright files themselves in anyway should be legal, unless there's a specific law for this (I'm unaware of any, but I may be wrong).

They tend to try argue for conspiracy to commit copyright infringement, it's a tenuous case to make unless they can prove that was actually their intention. I think in most cases it's ISP/hosting terms and conditions and legal costs that lead to their demise.

Your example of the model asking specifically "what copyrighted content would you like to download", kinda implies conspiracy to commit copyright infringement would be a valid charge.

DaSHacka · on May 18, 2024

How is it any different than training a model on content protected under an NDA and allowing access to users via a web-portal?

What is the difference OpenAI has that lets them get away with, but not our hypothetical Mr. Smartass doing the same process trying to get around an NDA?

KoolKat23 · on May 18, 2024

Well if OpenAI signed an NDA beforehand to not disclose certain training data it used, and then users actually do access this data, then yes it would be problematic for OpenAI, under the terms of their signed NDA.

rcbdev · on May 20, 2024

Yes, a tool that they charge me money to use.

KoolKat23 · on May 21, 2024

Just like any other tool that can be used to plagiarize, Photoshop, Word etc.

throwaway2037 · on May 18, 2024

You raise an interesting point. If more professional lawyers agreed with you, then why have we not seen a lawsuit from publishers against OpenAI?

dgoldstein0 · on May 18, 2024

Some of them are suing

https://www.nytimes.com/2023/12/27/business/media/new-york-t... https://www.reuters.com/legal/us-newspapers-sue-openai-copyr... https://www.washingtonpost.com/technology/2024/04/09/openai-...

Some decided to make deals instead

kmeisthax · on May 19, 2024

There are some lawsuits, especially in the very reflexively copyright-pilled industries. However, a good chunk of publishers aren't suing for self-interested reasons. There's a lot of people in the creative industry who see a machine that can cut artists out of the copyright bargain completely and are shouting "omg piracy is based now" because LLMs can spit out content faster and for free.

kibibu · on May 18, 2024

Is converting an audio signal into the frequency domain, pruning all inaudible frequencies, and then Huffman encoding it tranformative?

KoolKat23 · on May 18, 2024

Well if the end result is something completely different such as an algorithm for determining which music is popular or determining which song is playing then yes it's transformative.

It's not merely a compressed version of a song intended to be used in the same way as the original copyright work, this would be copyright infringement.