A quick read through of their anonymization process seems to indicate that they didn’t scan the message contents for PII (other than usernames).
If true, that seems like a huge oversight. I also wonder what would happen if someone finds their information in the dataset and requests it to be removed per GDPR or other privacy legislation.
I understand wanting to be careful, but didn't they only grab messages from servers that are already very public? Are Twitter message datasets anonymized?
That's not how GDPR works and in this case the data is clearly anonymised despite the authors' claims. Amongst others, there needs to be mechanisms for users to delete their data, whether it was at some point public or not.
The authors can presumably update the dataset on the site; however, I think past versions remain. Besides that, the GDPR is at odds with the fact that public posts and data almost never goes away. I don't think that reality can be legislated away, try as politicians might.
In all honesty, it's better to reserve the effectiveness for private, personal data, for the sake of practicality.
If true, that seems like a huge oversight. I also wonder what would happen if someone finds their information in the dataset and requests it to be removed per GDPR or other privacy legislation.