One feature I’d love to see is a transformer that, instead of providing a random value, provides a cryptographic one-way hash of the data (e.g. SHA-2). That way key uniqueness stays the same (to avoid violating unique constraints on columns), and a value used in one place will still match the same value in another table after transformation, which more accurately reflects the “shape” of the data.
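A minimal sketch of what such a transformer could look like, not taken from any existing tool. It swaps plain SHA-2 for HMAC-SHA256 with a secret key, since an unkeyed hash of low-entropy values like emails can be reversed with a dictionary attack. `SECRET_KEY`, `pseudonymize` and the sample rows are all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; it would have to live outside the dump and the repo.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic one-way token for value (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up rows from two tables that share the same email value.
users = [
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "bob@example.com"},
]
orders = [{"order_id": 10, "customer_email": "alice@example.com"}]

masked_users = [{**u, "email": pseudonymize(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": pseudonymize(o["customer_email"])} for o in orders
]
```

Distinct inputs give distinct tokens (barring a hash collision), so unique constraints keep holding, and the same email yields the same token in both tables, so joins still line up.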
We do this via Copycat (https://github.com/snaplet/copycat). We generate static "fake values" by hashing your original value to a number and mapping that number to a fake value.
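To illustrate the general idea (this is not Copycat's actual API, just a rough sketch of "hash to a number, then map to a fake value"), here is a version in Python. The name lists and the functions `hash_to_int` and `fake_full_name` are hypothetical.

```python
import hashlib

# Hypothetical lookup tables; a real tool would ship far larger ones.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis", "Barbara"]
LAST_NAMES = ["Hopper", "Ritchie", "Lovelace", "Hamilton", "Liskov", "Torvalds"]


def hash_to_int(value: str) -> int:
    """Map a string to a stable 64-bit integer via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fake_full_name(original: str) -> str:
    """Pick a fake name deterministically from the hash of the original."""
    n = hash_to_int(original)
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"
```

The same input always produces the same fake name. With tables this small, different inputs can land on the same fake name, so this sketch alone does not guarantee uniqueness.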
This will not work, at least not if we’re talking about PII as it is defined by Somewhat Sane (TM) privacy legislation.
Sure, passwords and credit card info are obscured with your methodology, but names, dates of birth, sexual orientation, telephone numbers, email addresses and IP addresses will remain unique. This uniqueness is what allows you to potentially identify a person given enough data.
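To make that concrete, a rough sketch with made-up rows: hashing each column separately keeps the equality pattern intact, so the share of records that are unique on a given combination of columns is exactly the same before and after. `unique_fraction` and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def h(value: str) -> str:
    """Plain per-column SHA-256, as in the proposed transformer."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up records; two people share a birth date and zip code.
rows = [
    {"dob": "1980-01-01", "zip": "10001", "phone": "555-0100"},
    {"dob": "1980-01-01", "zip": "10001", "phone": "555-0101"},
    {"dob": "1975-06-30", "zip": "94105", "phone": "555-0102"},
]
hashed = [{k: h(v) for k, v in r.items()} for r in rows]


def unique_fraction(records, columns):
    """Fraction of records whose values in columns occur exactly once."""
    keys = [tuple(r[c] for c in columns) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


before = unique_fraction(rows, ["dob", "zip"])
after = unique_fraction(hashed, ["dob", "zip"])
```

Here `before` and `after` are equal, so anyone who can link a unique combination to a person in the original data can do the same in the hashed data.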
>Sure, passwords and credit card info are obscured with your methodology
Even that's problematic, because there may be code that depends on the data being somewhat "real". Credit card numbers, for example, may need to pass the Luhn check, or have valid BIN prefixes, etc.
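As a sketch of what a format-preserving replacement might look like: derive the account digits from a hash of the original number, keep a fixed prefix, and append the Luhn check digit so the result still validates. The function names and the default `bin_prefix` are placeholders for illustration, not a real issuer mapping.

```python
import hashlib


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a digit string without its check digit."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def fake_card_number(original: str, bin_prefix: str = "411111") -> str:
    """Deterministically derive a 16-digit, Luhn-valid fake card number."""
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    # Nine account digits taken from the hash, zero-padded on the left.
    account = str(int.from_bytes(digest[:8], "big") % 10**9).zfill(9)
    payload = bin_prefix + account
    return payload + str(luhn_check_digit(payload))
```

Only nine digits vary here, so two different originals can map to the same fake number; a column with a unique constraint would need collision handling on top of this.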
I suppose that what you’d have to do is change the data and then hash it. But once you’ve changed the data it’s no longer PII, so there’s no reason to hash it.
Of course, given enough changed data, someone could potentially deduce how the data was changed and thus revert it, at which point it would become PII again and you’d have a problem… but that’s probably a fringe scenario.
I hate to be so self-promoting (I swear I'm just trying to be helpful), but Gretel has that as a transformer you can use[0]. You can test out a lot of our stuff without payment info through our console[1] if you just want to mess around and see if tools like it (and Replibyte of course :) ) would fit your use case. That being said, you can run into issues using direct transforms like this, depending on the correlated data, because of various known deanonymization attacks. There are some pretty gnarly examples out there if you Google around.
What you're asking for is similar to what goes by the term "tokenization"[1], a technique often used by payment processors to avoid leaking credit card numbers and similar sensitive data. Using the proper transformer might provide the behavior you need.
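For comparison with hashing, a toy sketch of the tokenization idea: tokens are random rather than derived from the value, and the only link back is a lookup table held by the vault. `TokenVault` is a hypothetical in-memory stand-in; a real payment tokenization service is a separate, access-controlled system.

```python
import secrets


class TokenVault:
    """Toy in-memory vault mapping sensitive values to random tokens."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for value, or mint a new random one."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(16)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault holder can do this."""
        return self._value_by_token[token]


vault = TokenVault()
t1 = vault.tokenize("4111111111111111")
t2 = vault.tokenize("4111111111111111")
t3 = vault.tokenize("4242424242424242")
```

The same value always gets the same token, so uniqueness and cross-table matches are preserved as with hashing, but without the vault there is nothing to brute-force because the token carries no information about the value.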