Anonymize / de-identify LLM chat history export, post-processing
2 points by msiraj1 48 days ago | 1 comment
Hi all, this is my first question! I have found plenty of pre-processing tools for anonymizing prompt data, but I was wondering whether anyone knows of tools for post-processing exported LLM chat history files.

I want to conduct a study in which participants' chat histories can be anonymized as easily as possible before I receive them, to reduce PII risk.
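
To make the idea concrete, here is a rough sketch of the kind of post-processing pass I have in mind. It assumes the export is already loaded as a list of conversations, each with a "messages" list of {"role", "content"} dicts; that layout is my assumption, not any vendor's actual export format. The regexes and the redact / redact_export names are illustrative only: they catch structured PII such as emails and phone numbers and will miss names, addresses, and anything free-form.

```python
import re

# Illustrative patterns only (assumptions, not a vetted PII ruleset).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)"),
}


def redact(text):
    """Replace each pattern match with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def redact_export(conversations):
    """Return a redacted copy of the export; the input is left unmodified."""
    return [
        {
            **conv,
            "messages": [
                {**msg, "content": redact(msg["content"])}
                for msg in conv["messages"]
            ],
        }
        for conv in conversations
    ]
```

Participants could run something like this locally on their own export, so the raw file never leaves their machine.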

Another step I will need to add is either dropping chats that discuss personal health or summarizing them instead. I really don't know which, hence my asking here before just developing it on my own!
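
For the health-topic step, the simplest version I can think of is a keyword filter that sets flagged chats aside for dropping or summarizing. This is only a sketch: the term list is hypothetical, the conversation layout (a list of dicts with a "messages" list of {"role", "content"} entries) is my assumption, and keyword matching will both over- and under-flag compared with a proper classifier or manual review.

```python
import re

# Hypothetical keyword list; a real study would need a vetted vocabulary.
HEALTH_TERMS = [
    "diagnosis", "diagnosed", "medication", "symptom",
    "symptoms", "therapy", "prescription",
]
HEALTH_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, HEALTH_TERMS)) + r")\b",
    re.IGNORECASE,
)


def mentions_health(conversation):
    """True if any message in the conversation contains a listed term."""
    return any(HEALTH_RE.search(m["content"]) for m in conversation["messages"])


def split_by_health(conversations):
    """Split conversations into (kept, flagged) lists, preserving order."""
    kept, flagged = [], []
    for conv in conversations:
        (flagged if mentions_health(conv) else kept).append(conv)
    return kept, flagged
```

The flagged list could then be dropped outright or passed to a summarization step, whichever the study design ends up calling for.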



This is a pretty annoying problem right now. The closest that I’ve seen is https://microsoft.github.io/presidio/

Unfortunately, it requires a decent amount of customization to do anything useful, AFAICT.



