A quick read through of their anonymization process seems to indicate that they ...

bawolff · 2025-05-21T18:03:51 1747850631

I can't help but think that if you say something in a public forum you should implicitly give up the right to privacy.

E.g. if someone scraped Hacker News and made a dataset containing this comment, I don't think I should have any right to complain.

jowea · 2025-05-21T15:45:55 1747842355

I understand wanting to be careful, but didn't they only grab messages from servers that are already very public? Are Twitter message datasets anonymized?

Cynddl · 2025-05-21T16:01:52 1747843312

That's not how GDPR works, and in this case the data is clearly not anonymised, despite the authors' claims. Among other things, there need to be mechanisms for users to delete their data, whether or not it was public at some point.

jowea · 2025-05-21T16:11:07 1747843867

Yeah, there probably is some GDPR implication somewhere; I wasn't speaking on the legal aspects.

ronsor · 2025-05-21T16:23:46 1747844626

The authors can presumably update the dataset on the site; however, I think past versions remain. Besides that, the GDPR is at odds with the fact that public posts and data almost never go away. I don't think that reality can be legislated away, try as politicians might.

In all honesty, it's better to reserve the law's force for private, personal data, for the sake of practicality.