
A quick read-through of their anonymization process suggests that they didn’t scan the message contents for PII (other than usernames).

If true, that seems like a huge oversight. I also wonder what would happen if someone finds their information in the dataset and requests it to be removed per GDPR or other privacy legislation.



I can't help but think that if you say something in a public forum you should implicitly give up the right to privacy.

E.g. if someone scraped Hacker News and made a dataset containing this comment, I don't think I should have any right to complain.


I understand wanting to be careful, but didn't they only grab messages from servers that are already very public? Are Twitter message datasets anonymized?


That's not how GDPR works, and in this case the data is clearly not anonymised despite the authors' claims. Among other things, there need to be mechanisms for users to delete their data, whether or not it was public at some point.


Yeah, there probably is some GDPR implication somewhere; I wasn't speaking to the legal aspects.


The authors can presumably update the dataset on the site; however, I think past versions remain. Besides that, the GDPR is at odds with the fact that public posts and data almost never go away. I don't think that reality can be legislated away, try as politicians might.

In all honesty, for the sake of practicality, it's better to reserve that enforcement for private, personal data.



