
It’s great that you have sympathy for illustrators, but I don’t see a big difference if the training data is a novel, a picture, a song, a piece of code, or even a piece of legal text.

By the time my mom retired from translating, she had gone from the typewriter to machine-assisted translation with centralised corpus databases. All the while, the available work became scarcer and the wages lower.

In the end, the work we do that is heavily robotic will be done by less expensive robots.



Here’s the argument:

The output of her translations had no copyright. Language developed independently of translators.

The output of artists has copyright. Artists shape the space in which they’re generating output.

The fear now is that if we no longer have a market where people generate novel arts, that space will stagnate.


A translation is absolutely under copyright. It is a creative process after all.

This means a book can be in public domain for the original text, because it's very old, but not the translation because it's newer.

For example, Julius Caesar's "Gallic War" in the original Latin is clearly not subject to copyright, but a recent English translation will be.


So if a machine was to do the translation, should that also be considered a creative work?

If not, that would put pressure on production companies to use machines so they don’t have to pay future royalties.


Well that's the real question, isn't it?

LLMs, our current best technology, are good enough for translating an email or a meeting transcript and getting the general message across. For anything more creative, technical, or nuanced, they fall apart.

Meaning that for anything of value, like books, plays, movies, or poetry, humans will necessarily be part of the process: coaxing, prompting, correcting...

If we consider the machine a tool, it's easy, the work would fall under copyright.

If we consider the machine the creator, then things get tricky. Are only the reworked/corrected parts under copyright? Is the work under copyright only if a certain portion of it was machine generated? Is the prompt under copyright, but not its output?

Without even getting into the issue of training data under copyright...

There is some movement regarding copyright of AI art, legislation being drawn up and debated in some countries. It's likely translations would be impacted by those decisions.


> So if a machine was to do the translation, should that also be considered a creative work?

No, but it would be a derivative work, covered by the same copyright as the original.

The quality of human translation is better, for now.


> The output of artists has copyright.

Copyright is a very messy and divisive topic. How exactly can an artist claim ownership of a thought or an image? It is often difficult to ascertain whether a piece of art infringes on the copyright of another. There are grey areas like "fair use", which complicate this further. In many cases copyright is also abused by holders to censor art that they don't like for a myriad of unrelated reasons. And there's the argument that copyright stunts innovation. There are entire art movements and music genres that wouldn't exist if copyright was strictly enforced on art.

> Artists shape the space in which they’re generating output.

Art created by humans is not entirely original. Artists are inspired by each other, they follow trends and movements, and often tiptoe the line between copyright infringement and inspiration. Groundbreaking artists are rare, and if we consider that machines can create a practically infinite number of permutations based on their source data, it's not unthinkable that they could also create art that humans consider unique and novel, if nothing else because we're not able to trace the output to all of its source inputs. Then again, those human groundbreaking artists are also inspired by others in ways we often can't perceive. Art is never created in a vacuum. "Good artists copy; great artists steal", etc.

So I guess my point is: it doesn't make sense to apply copyright to art, but there's nothing stopping us from doing the same for machine-generated art, if we wanted to make our laws even more insane. And machine-generated art can also set trends and shape the space they're generated in.

The thing is that technology advances far more rapidly than laws do. AI is raising many questions that we'll have to answer eventually, but it will take a long time to get there. And on that path it's worth rethinking traditional laws like copyright, and considering whether we can implement a new framework that's fair towards creators without the drawbacks of the current system.


Ambiguities are not a good argument against laws that still have positive outcomes.

There are very few laws that are not giant ambiguities. Where is the line between murder, self-defense and accident? There are no lines in reality.

(A law about spectrum use, or registered real estate borders, etc. can be clear. But a large amount of law isn’t.)

Something must change regarding copyright and AI model training.

But it doesn’t have to be the law, it could be technological. Perhaps some of both, but I wouldn’t rule out a technical way to avoid the implicit or explicit incorporation of copyrighted material into models yet.


> There are very few laws that are not giant ambiguities. Where is the line between murder, self-defense and accident? There are no lines in reality.

These things are very well and precisely defined in just about every jurisdiction. The "ambiguities" arise from ascertaining the facts of the matter, and from whether a given set of facts fits within a specific set of rules.

> Something must change regarding copyright and AI model training.

Yes, but this problem is not specific to AI, it is the question of what constitutes a derivative, and that is a rather subjective matter in the light of the good ol' axiom of "nothing is new under the sun".


> These things are very well and precisely defined in just about every jurisdiction.

Yes, we have lots of wording attempting to be precise. And legal uses of terms are certainly more precise by definition and precedent than normal language.

But ambiguities about facts are only half of it. Even when all the facts appear to be clear, human juries have to use their subjective judgement to match what the law says (which may be clear in theory, but is often subjective at the borders) against the facts. And reasonable people often differ on how they match the two up in many borderline cases.

We resolve both types of ambiguities case-by-case by having a jury decide, which is not going to be consistent from jury to jury but it is the best system we have. Attorneys vetting prospective jurors are very much aware that the law comes down to humans interpreting human language and concepts, none of which are truly precise, unless we are talking about objective measures (like frequency band use).

---

> it is the question of what constitutes a derivative

Yes, the legal side can adapt.

And the technical side can adapt too.

The problem isn't that material was trained on, but that the resulting model facilitates reproducing individual works (or close variations), and repurposing individuals' unique styles.

I.e. they violate fair use by using what they learn in a way that devalues others' creative efforts. Being exposed to copyrighted works available to the public is not the violation. (Even though training as it currently happens does produce models that violate fair use.)

We need models that, one way or another, stay within fair use once trained. Either by not training on copyrighted material, or by training on it in a way that doesn't produce models that facilitate specific reproduction and repurposing of creative works and styles.

This has already been solved for simple data problems, where memorization of particular samples can be precluded by adding noise to a dataset. Important generalities are learned, but specific samples don't leave their mark.
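For a concrete feel, here is a minimal sketch of that noise idea (in the spirit of differential privacy). It is purely illustrative: the clipping threshold and noise scale are made-up parameters, not settings from any real training pipeline.

    import numpy as np

    def privatize_batch(samples, clip_norm=1.0, noise_scale=0.5, rng=None):
        # Clip each sample's L2 norm, then add Gaussian noise calibrated to
        # that clip. Broad statistics of the dataset survive, but any single
        # sample's exact values are masked by the noise.
        rng = rng or np.random.default_rng()
        norms = np.linalg.norm(samples, axis=1, keepdims=True)
        clipped = samples * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        noise = rng.normal(0.0, noise_scale * clip_norm, size=clipped.shape)
        return clipped + noise

    # e.g. noisy = privatize_batch(np.random.rand(100, 16))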

Obviously something more sophisticated would need to be done to preclude memorization of rich creative works and styles, but a lot of people are motivated to solve this problem.


It seems like your concern is about how easy it is going to be to create derivative and similar work, rather than a genuine concern for copyright. Do I understand correctly?


No, I am just narrowing down the problem definition to the actual damage.

Which is a very fair use and copyright respecting approach.

Taking/obtaining value from works is ok, up until the point where damage to the value of the original works happens. That is not ok, because copyright protects that value to incentivize the creation and sharing of works.

The problem is that models are shipping that inherently make it easy to reproduce copyrighted works, and to apply specific styles lifted from a single author's copyrighted body of work.

I am very strongly against this.

Note that prohibiting copying of a recognizable specific single author's style is even more strict than fair use limits on humans. Stricter makes sense to me, because unlike humans, models are mass producers.

So I am extremely respectful of protecting copyright value.

But it is not the same thing as not training on something. It is worth exploring training algorithms that can learn useful generalities about bodies of work, without retaining biases toward the specifics of any one work, or any single authored style. That would be in the spirit of fair use. You can learn from any art, if it's publicly displayed, or you have paid for a copy, but you can't create mass copiers of it.

Maybe that is impossible, but I doubt it. There are many ways to train that steer important properties of the resulting models.

Models that make it trivial to create new art deco works, consistent with the total body of art deco works: ok. Models that make it trivial to recreate Erte works, or works in a specifically Erte style: not ok.


> The problem is that models are shipping that inherently make it easy to reproduce copyrighted works, and to apply specific styles lifted from a single author's copyrighted body of work.

> I am very strongly against this.

> Note that prohibiting copying of a recognizable specific single author's style is even more strict than fair use limits on humans. Stricter makes sense to me, because unlike humans, models are mass producers.

This sounds like gate-keeping rather than genuine copyright concerns.

> Models that make it trivial to create new art deco works, consistent with the total body of art deco works: ok. Models that make it trivial to recreate Erte works, or works in a specifically Erte style: not ok.

Yeah, again, this sounds like gate-keeping more than economic and incentive arguments, which are, in my opinion, the only legitimate concerns underpinning copyright's moral ground.

Every step of progress has made doing things easier and easier, to the point that arguing with some stranger across the world now seems trivial, almost natural. Surely there are some arguments to curtail this dangerous machinery that undermines the control of information flow and corrupts the minds of the naive! We must shut it down!

Jokes aside, "making things easier/trivial" is the name of the game of progress. You can't stop progress. Everything will get easier and easier as time goes on.


>Art created by humans is not entirely original.

The catch here is that a human can use a single sample as input, but AI needs a torrent of training data. Also, when AI generates permutations of samples, do their statistics match the training data?


No human could use a single sample if it was literally the first piece of art they had ever seen.

Humans have that torrent of training data baked in from years of lived experience. That’s why people who go to art school or otherwise study art are generally (not always of course) better artists.


I don't think the claim that the value of art school is simply more exposure to art holds water.


Not without a torrent of pre-training data. The qualitative differences are rapidly becoming intangible ‘soul’ type things.


A skilled artist can imitate a single art style or draw a specific object from a single reference. But becoming a skilled artist takes years of training. As a society we like to pretend some humans are randomly gifted with the ability to draw, but in reality it's 5% talent and 95% spending countless hours practising the craft. And if you count the years' worth of visual data the average human has experienced by the time they can recreate a van Gogh, then humans take orders of magnitude more training data than state-of-the-art ML models.


In case of an ML model either a very good description or that single reference could be added to the context.


That makes no sense, neither legally nor philosophically.

> Language developed independently of translators.

And it also developed independently of writers and poets.

> Artists shape the space in which they’re generating output.

Not writers and poets, apparently. And so maybe not even artists, who typically mostly painted book references. Color perception and symbolism developed independently of professional artists, too. Moreover, all of the things you mention predate copyright.

> The fear now is that if we no longer have a market where people generate novel arts, that space will stagnate.

But that will never happen; it's near-impossible to stop humans from generating novel arts. They just do it as a matter of course - and the more accessible the tools are, the more people participate.

Yes, memes are a form of art, too.

What's a real threat is the lack of shared consumption of art. This has been happening for the past couple of decades now, first with books, then with visual arts. AI will make this problem worse by both further increasing the volume of "novel arts" and enabling personalization. The real value we're losing is the role of art as social objects: the ability to relate to each other by means of experiencing the same works of art, and thus being able to discuss and reference them. If no two people ever experience the same works of art, there's not much about art they can talk about; if there's no shared set of art seen by most people in a society, a social baseline is lost. That problem does worry me.


If you think memes are art too and we lack shared consumption of art due to personalization, you clearly don't have kids into YouTube or Minecraft or Frozen, or ...


I don't get what you are trying to say here. Yes, memes are art, however foreign that might be to older folks. To your second point, you know about Frozen because everyone else also watches it. We are about to lose that if there are 1 million variations of a "Frozen"-esque movie that people can watch.

I don't think having an AI partner that is trained from zero, from childhood to adulthood, with goals such as "make me laugh" is too far-fetched. The problem is you will never be able to connect with this child, because the AI is feeding it insanely obscure, highly specific videos that match the kid's neurons perfectly.


I actually have kids, two of them in kindergarten. I can already see this problem affecting them, because beyond Frozen and Paw Patrol and a couple others, everyone also has their own favorite series on YouTube that few other kids heard of, and I can see kids trying and failing to bond over those.

I never thought I'd be thankful for global, toy-pushing franchises, but they at least serve as a social object for kids, when the current glut of kids videos on YouTube doesn't.


I don't think the Berne Convention on Copyright was meant as a complete list of things where humans have valuable input. Translators do shape the space in which they generate output. Their space isn't any single language, but rather the connecting space between languages.

Most translation work is simple, just as the day-to-day of many creative professions is rather uncreative. But translating a book, comic or movie requires creative decisions on how to best convey the original meaning in the idioms and cultural context of a different language. The difference between a good and a bad translation can be stark.


You are wrong. Translations have copyright. That is why a new translation of, for example, an ancient book has copyright and you are not allowed to reproduce it without permission.


Makes me wonder whether, if the generous copyright protections afforded to artists had not become so abhorrent (thanks, Disney), this kind of thing might not have happened.


Wrong from the first sentence…


Translations absolutely have copyright.


Stagnate just like hand-thatched roofs? Or like weavers, ever since Jacquard?

I don't see too many people defending artists also calling for people to start buying handmade clothing and fabrics again.

That said and because people on here are feisty, I have many artist friends and I deeply appreciate their work at the same time as appreciating how cool diffusion models are.

The difference being of course that we live in a modern society and we should be able to find a solution that works for all.

That said, humans can't even get something basic like UBI in place for people, and humans consistently vote against each other in favour of discriminating on skin colour, sex, sexuality, culture. Meanwhile the billionaires that are soon to become trillionaires are actively defended by many members of our species, sometimes even by the poor. The industrial age broke our evolved monkey brains.


Piracy is promotion, look at all the fanfiction.

Also, in the case of graphic and voice artists, a unique style looks more valuable than the output itself, but style isn't protected by copyright.


My prediction:

It will be like furniture.

A long time ago, every piece of furniture was handmade. It might have been good furniture, or crude, poorly constructed furniture, but it was all quite expensive, in terms of hours per piece. Now, furniture is almost completely mass produced, and can be purchased in a variety of styles and qualities relatively cheaply. Any customization or uniqueness puts it right back into the hand-made category. And that arrangement works for almost everyone.

Media will be like that. There will be a vast quantity of personalized media of decent quality. It will be produced almost entirely automatically based on what the algorithm knows about you and your preferences.

There will be a niche industry of 'hand made' media with real acting and writing from human brains, but it will be expensive, a mark of conspicuous consumption and class differentiation.


This. Except one should also disillusion themselves of the idea that there will always be a higher quality to the 'hand made' versions. AI will almost certainly outpace us in every way, including the ability to make something beautiful that looks 'hand-made', even with artificial flaws and illusions of the history and natural rugged beauty of the piece.

The only discernible difference that won't be replicable is a cryptographic signature "Certified 100% Human-Made!" sticker, which will probably become the mark of the niche industry.

Somewhat more accurate analogy would be the custom car market. Beautiful collectible convertibles with fine detailing everywhere, priced thousands of times higher than normal cars, that actually run far worse and basically break apart after a few thousand miles and are impossible to find parts for. Automated factories certainly could churn them out but they don't because they're impractical poorly-designed status items kept artificially scarce for the very rich to peacock with.

Except AI will probably still produce equivalent impractical stuff anyway, just because production (digital and physical) will eventually be easy enough that resources are negligible, and everyone can have flashy impractical stuff. So again, only that "100% Human!" seal will distinguish, eventually.


This prediction implies that people will value consuming tailored media, knowing 100% that it was generated because they wanted it (as opposed to because someone wanted to express something), with no deeper story or connection or exploration to it.

If people instead care about the creation story and influences (the idea of "behind the scenes" and "creator interviews" for on demand ai generated media is pretty funny) then this won't have much value.

Time will tell - it's an exciting, discouraging time to be alive, which has probably always been the case.


I think the proportion of people who care about 'the making of' type content is vanishingly small. Almost everyone is looking for a dopamine hit and that's it.


> There will be a niche industry of 'hand made' media with real acting and writing from human brains, but it will be expensive, a mark of conspicuous consumption and class differentiation.

This addresses one axis of development.

Meanwhile, there's lots of people around willing to express themselves for advertisement money.

Like with translation: We're going to see tool-assisted work where the tools get more and more sophisticated.

Your example with furniture is good. Another is cars: From horses to robotaxis. Humans are in the loop somewhere still.


The reproduction cost for the 2nd copy of media is near zero just like software. Handmade or customized furniture is more expensive because it takes more labor for each copy. With media, the cost is fixed, even if it is large. Once the first version of handmade media has been created, the owner is incentivized to get as much value from it as possible. The optimal demand curve is probably not a few rich people paying as much as possible.


> As my mom retired from being a translator, she went from typewriter to machine-assisted translation with centralised corpus-databases. All the while the available work became less and less, and the wages became lower and lower.

She was lucky to be able to retire when she did, as the job of a translator is definitely going to become extinct.

You can already get higher quality translations from machine learning models than you get from the majority of commercial human translations (sans occasional mistakes, which you still need editors to fix), and it's only going to get better. And unlike human translators, LLMs don't mangle translations because they're too lazy to actually translate and just rewrite the text instead because that's easier, or (unfortunately this is becoming more and more common lately) deliberately mistranslate because of their personal political beliefs.


While LLMs are pretty good, and likely to improve, my experience is OpenAI's offerings *absolutely* make stuff up after a few thousand words or so, and they're one of the better ones.

It also varies by language. Every time I give an example here of machine translated English-to-Chinese, it's so bad that the responses are all people who can read Chinese being confused because it's gibberish.

And as for politics, as Grok has just been demonstrating, they're quite capable of whatever bias they've been trained to have or told to express.

But it's worse than that, because different languages cut the world at different joints, so most translations have to make a choice between literal correctness and readability — for example, you can have gender-neutral "software developer" in English, but in German to maintain neutrality you have to choose between various unwieldy affixes such as "Softwareentwickler (m/w/d)" or "Softwareentwickler*innen" (https://de.indeed.com/karriere-guide/jobsuche/wie-wird-man-s...), or pick a gender because "Softwareentwickler" by itself means they're male.


no, "Softwareentwickler" doed NOT mean the person is male. It's the correct german form for either male OR generic. (generisches Maskulinum)


Same is true in Polish, but the feminist movement insists this is not acceptable and tries to push feminatives.

I personally have no strong opinion on this, FWIW, just confirming GP's making a good point there. A translated word or phrase may be technically, grammatically correct, but still not be culturally correct.


> While LLMs are pretty good, and likely to improve, my experience is OpenAI's offerings absolutely make stuff up after a few thousand words or so, and they're one of the better ones.

That's not how you get good translations from off-the-shelf LLMs! If you give a model the whole book and expect it to translate it in one-shot then it will eventually hallucinate and give you bad results.

What you want is to give it a small chunk of text to translate, plus previously translated context so that it can keep the continuity.
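For illustration, a rough sketch of that chunk-plus-context loop. Here `translate_chunk` is a hypothetical stand-in for whatever LLM call you actually use, and the prompt wording and context size are assumptions, not a tested recipe:

    def translate_book(chunks, translate_chunk, context_chars=2000):
        translated = []
        for chunk in chunks:
            # Pass along only the tail of what has already been translated,
            # so the model keeps names and tone consistent without blowing
            # up the context window.
            context = "".join(translated)[-context_chars:]
            prompt = (
                "Previously translated text (context only, do not re-translate):\n"
                + context
                + "\n\nTranslate the following passage, staying consistent "
                + "with the context above:\n"
                + chunk
            )
            translated.append(translate_chunk(prompt))
        return "".join(translated)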

And for the best quality translations what you want is to use a dedicated model that's specifically trained for your language pairs.

> And as for politics, as Grok has just been demonstrating, they're quite capable of whatever bias they've been trained to have or told to express.

In open-ended questions, sure. But that doesn't apply to translations, where you're not asking the model to come up with something entirely by itself, but only getting it to accurately translate what you wrote into another language.

I can give you an example. Let's say we want to translate the following sentence:

"いつも言われるから、露出度抑えたんだ。"

Let's ask some general purpose LLMs to translate it without any context (you could get a better translation if you gave them context and more instructions):

ChatGPT (1): "Since people always comment on it, I toned down how revealing it is."

ChatGPT (2): "People always say something, so I made it less revealing."

Qwen3-235B-A22B: "I always get told, so I toned down how revealing my outfit is."

gemma-3-27b-it (1): "Because I always get told, I toned down how much skin I show."

gemma-3-27b-it (2): "Since I'm always getting comments about it, I decided to dress more conservatively."

gemma-3-27b-it (3): "I've been told so often, I decided to be more modest."

Grok: "I was always told, so I toned down the exposure."

And how humans would translate it:

Competent human translator (I can confirm this is an accurate translation, but perhaps a little too literal): "Everyone was always saying something to me, so I tried toning down the exposure."

Activist human translator: "Oh those pesky patriarchal societal demands were getting on my nerves, so I changed clothes."

(Source: https://www.youtube.com/watch?v=dqaAgAyBFQY)

It should be fairly obvious which one is the biased one, and I don't think it's the Grok one (which is a little funny, because it's actually the most literal translation of them all).


>> While LLMs are pretty good, and likely to improve, my experience is OpenAI's offerings absolutely make stuff up after a few thousand words or so, and they're one of the better ones.

> That's not how you get good translations from off-the-shelf LLMs! If you give a model the whole book and expect it to translate it in one-shot then it will eventually hallucinate and give you bad results.

You're assuming something about how I used ChatGPT, but I don't know what exactly you're assuming.

> What you want is to give it a small chunk of text to translate, plus previously translated context so that it can keep the continuity

I tried translating a Wikipedia page to support a new language, and used ChatGPT rather than Google Translate because I wanted to retain the wiki formatting as part of the task.

The LLM went OK for a bit, then made stuff up. I fed in a new bit starting from its first mistake, until I reached a list, at which point the LLM invented random entries in that list. I tried just that list in a bunch of different ways, including completely new chat sessions and the existing session; it couldn't help but invent things.

> In an open ended questions - sure. But that doesn't apply to translations where you're not asking the model to come up with something entirely by itself, but only getting it to accurately translate what you wrote into another language.

"Only" rather understates how hard translation is.

Also, "explain this in Fortnite terms" is a kind of translation: https://x.com/MattBinder/status/1922713839566561313/photo/3


This is just not true; LLMs struggle very hard with even basic recursive questions, nuances, and dialects.


But since a customer cannot know that, they will tend to consume (and mostly trust) whatever LLM result they are given.


Yes indeed. After a few years humans will be trained to accept the low tier AI translations as the new normal, hopefully I'm dead by then already.


ChatGPT is always in my pocket, I can use it effortlessly when I'm travelling.

My Chinese isn't good enough to explain the difference between ice cream and gelato to my in-laws but ChatGPT gave me a good-enough output in seconds, this far exceeds anything that has come before. A friend (who speaks zero Chinese) was able to have conversations with his in-laws using one of those in-ear translation devices.

Normal people would never ever hire translators in this type of situations and now our spouses can also relax on vacation :)


Maybe for dry text. Translation of art is art too and there's no such thing as higher quality art.


I’m intrigued by this statement. It seems obvious to me that some artworks are ‘higher quality’ than others. You wouldn’t, I’d presume, consider the Sistine Chapel or the Mona Lisa to be the same quality as a dickbutt scribbled on a napkin?


>You wouldn’t, I’d presume, consider the Sistine Chapel or the Mona Lisa to be the same quality as a dickbutt scribbled on a napkin?

To paraphrase Frank Zappa... Art just needs a frame. If you poo on a table... not art. If you declare 'my poo on the table will last from the idea, until the poo disappears', then that is art. Similarly, Banksy is just graffiti unless you understand (or not) the framing of the work.


You can't compare translation to creating new works of art. Sorry mom, but that's apples and oranges. A dangerously false comparison.


If you speak more than one language (especially something like Chinese or Japanese), you understand how subjective some choices are. It certainly takes creative decision making.


I speak Japanese natively and hell, I'm just going to say, there is no such thing as translation, there is just foreign language ghostwriting.

I'm not even sure if bilingualism is real or if it's just an alternate expression for relatively benign forced split personality. Could very well be.


As noted in another sub-thread, translations are indeed works of art. As evidence, my mom has received royalties for her translations for decades, both from sales and from library lending. And she could sue for copyright infringement if someone stole her translation. The only difference is that she needs permission to distribute the translation, unless it’s translated from the public domain.



