The examples they give all look like valid uses of different non-breaking spaces, with width hints for their use/location. This might be a little overzealous if written by a human, but perhaps not for a machine.
Yeah, it's probably not an intentional watermark, just something the model has been trained to do. Maybe some professionally written news articles already use them for the same purpose?
Still hope HN adds a filter to block any comment with those characters in it :)
It is very easy to filter those out of GPT's output, though, using basic UNIX utilities. In fact, many such marks don't survive reformatting or copy-pasting, so no filtering is needed at all.
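For example, a one-liner like this strips the usual suspects (a sketch: the exact character set an LLM emits is an assumption here — U+00A0 no-break space and U+202F narrow no-break space are just the two most commonly reported):

```shell
# Replace non-breaking space variants with plain ASCII spaces.
# Uses bash $'...' quoting to embed the raw UTF-8 byte sequences:
#   U+00A0 = \xc2\xa0, U+202F = \xe2\x80\xaf
printf 'hello\xc2\xa0world\xe2\x80\xafagain\n' |
  sed -e $'s/\xc2\xa0/ /g' -e $'s/\xe2\x80\xaf/ /g'
```

Zero-width characters (U+200B and friends) would need their own substitutions, deleting rather than replacing.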
It is a very basic watermark technique (text steganography) if it indeed is supposed to be one.
A more advanced one would be a linguistic (grammar-based) one, but I am not going to give any more ideas. :D
It's easy to remove those characters, but that still requires being aware of them, and an intent to deceive. So many people just copy LLM output here because they (wrongly) believe it adds something of value to a discussion.
I mean, sure, these characters could help estimate the likelihood that text was generated (human writers are probably less likely to insert proper non-breaking spaces), but I doubt they are watermarks.