
Looks to me like these "watermarks" are embedded in monetary amounts, acronyms, etc. Maybe to stop them from breaking across lines?


U+202F in the screenshot, between "FY" and "2024", is the "Narrow No-Break Space". Similarly, U+00A0 is the "No-Break Space" (aka &nbsp; in HTML).
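To check which characters are actually present in a pasted string, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe_spaces` and the sample string are made up for illustration; only the two code points come from the screenshot discussion.

```python
import unicodedata

def describe_spaces(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every space separator except the ASCII space."""
    found = []
    for ch in text:
        # General category "Zs" covers space separators, including U+00A0 and U+202F.
        if ch != " " and unicodedata.category(ch) == "Zs":
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found

# Hypothetical sample text, not taken from the screenshot.
sample = "FY\u202f2024 revenue was $5\u00a0million"
for code, name in describe_spaces(sample):
    print(code, name)
```

Running it on the sample lists U+202F as NARROW NO-BREAK SPACE and U+00A0 as NO-BREAK SPACE.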

It's not watermarks, it's just scraped typography.


The examples they give all look like valid uses of different non-breaking spaces, each with a width suited to where it is used. This might be a little overzealous if written by a human, but perhaps not for a machine.

https://en.wikipedia.org/wiki/Non-breaking_space

Some display apps may "display them identically", but others, as well as typesetting and printing apps, might not.


Yeah, it's probably not an intentional watermark, just something the model has been trained to do. Maybe some professionally written news articles already use them for the same purpose?

Still hope HN adds a filter to block any comment with those characters in it :)


It is very easy to filter those out of GPT's output, though, using basic UNIX utilities. In fact, many watermarking methods don't survive reformatting or copy-pasting, so no filtering is needed at all.

It is a very basic watermark technique (text steganography) if it indeed is supposed to be one.

A more advanced one would be a linguistic (grammar-based) one, but I am not going to give any more ideas. :D
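As a rough illustration of the filtering step mentioned above, here is a Python sketch rather than the UNIX utilities themselves. It only handles the two characters named in this thread; the function name `normalize_spaces` is made up, and replacing with a plain space (instead of deleting) is an assumption about what you would want.

```python
# Map the two no-break spaces discussed in this thread to an ordinary space.
NO_BREAK_SPACES = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
}
_TABLE = str.maketrans(NO_BREAK_SPACES)

def normalize_spaces(text: str) -> str:
    """Replace U+00A0 and U+202F with a regular ASCII space; leave everything else alone."""
    return text.translate(_TABLE)

if __name__ == "__main__":
    print(normalize_spaces("FY\u202f2024 revenue was $5\u00a0million"))
```

Other invisible or look-alike characters would need their own entries in the table.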


It's easy to remove those characters, but that still requires being aware of them, and an intent to deceive. So many people just copy LLM output here because they (wrongly) believe it adds something of value to a discussion.


I don't think pasting LLM output typically adds anything to the conversation either. It might, but usually it does not.


This is the most likely explanation.

I mean, sure, these characters could be used to help estimate the likelihood that text was machine-generated (because human writers might be less likely to add proper non-breaking spaces), but I doubt these are watermarks.
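To make the "estimate the likelihood" idea concrete, here is a toy Python sketch that measures what share of a text's spaces are no-break spaces. The function name `no_break_space_ratio` is made up, and the ratio is only a weak signal under the assumption stated above; it is not a detector, since professionally typeset human text would score high too.

```python
NO_BREAK = "\u00a0\u202f"  # NO-BREAK SPACE, NARROW NO-BREAK SPACE

def no_break_space_ratio(text: str) -> float:
    """Fraction of space characters (ASCII space, U+00A0, U+202F) that are no-break spaces."""
    no_break = sum(1 for ch in text if ch in NO_BREAK)
    regular = text.count(" ")
    total = no_break + regular
    # Text with no spaces at all gets a ratio of 0.0 rather than a division error.
    return no_break / total if total else 0.0

if __name__ == "__main__":
    print(no_break_space_ratio("FY\u202f2024 was good"))
```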



