
In many privacy regulations, any unique identifier is treated as Personally Identifying Information (PII). If the hashed password is used as a lookup key, that makes it a unique identifier, and it's PII. I do work in environments where agencies are not allowed to share user PII with each other, and therefore you can't create single (SSO) identities to login to different services. The principle is that individuals should be able to compartmentalize their business and service relationships and not have their service providers collude or conspire against them for leverage.


In typical username+password systems, the passwords are not unique, and thus the password hash would be an inappropriate lookup key.


It wouldn't be, unless you were sharing an unsalted hash database across organizations and it was aggregated with others; then that hash is a unique identifier. Uniqueness becomes an issue in things like de-identification, which is a legal concept, and pseudonymization, which is basically tokenization, though not an information-theoretic guarantee like, say, k-anonymity, or a cryptographic one like entropy.

If these distinctions are new to people on this thread, I recommend https://www.oreilly.com/library/view/building-an-anonymizati... for some background.


So, correct me if I am wrong, but you are saying that uniqueness when it comes to data regulation is different than uniqueness in information theory.

Even though I generate a random salt when hashing passwords, I wouldn't use the hash to select a user out of a database, because I don't check for collisions by ensuring a salt that has never been used before. The chance of a collision is tiny, but I still don't want to rely on it. Would this still qualify as unique under data regulations?


The fact of a collision does not generally have regulatory meaning unless your hashing scheme was for de-identification and your re-identification risk assessment of the method showed it was trivial to re-identify someone because of said collisions. Also, logically, the lower the collision rate, the more unique the identifier. I recommend reading the book I linked above.

The policy view of what these things mean vs. what we talk about in terms of entropy, collisions, confusion, etc, are subtly different things. Of course it depends on the regs regime, but here's an example from what I do:

A business wants to share 10 million personal profiles keyed on a SIN with another business or agency for some research. Privacy law says they can't share the data unless the personally identifying information is removed. The business tells their DBA to "encrypt" the SINs; the DBA says "sure," SHA-256's them all, and shares the data set, because to him the SINs are now not being shared(!).

A privacy policy analyst freaks out because hashing the SINs has done nothing to protect the identities of the people in the data profiles. The DBA can't figure out why, because he used a NIST-approved 256-bit hash on them, and he tells his boss "it's fine, they're encrypted."

To your case with the salt: if you salt the hashes, the agency receiving the file calls back and says they can't use the data because the hashes don't match the profiles in their database - because they have been using the hashes as unique identifiers and re-identifying people in the data set. If you salt them for your initial data sharing, and then re-hash them with a new salt for an update, that's better, but it is still a transferable identifier for that cut of the data.
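The salting trade-off above can be sketched as follows: different salts per data cut break linkage between cuts, but within one cut the salted hash is still a stable, joinable key (all values here are hypothetical):

```python
import hashlib
import os

def salted_hash(sin: str, salt: bytes) -> str:
    # Simple salted SHA-256 for illustration only.
    return hashlib.sha256(salt + sin.encode()).hexdigest()

salt_v1 = os.urandom(16)  # salt used for the initial data sharing
salt_v2 = os.urandom(16)  # fresh salt used for a later update

# Same person, two cuts with different salts: the keys don't match,
# so the receiver can't join the update against the original file...
assert salted_hash("046454286", salt_v1) != salted_hash("046454286", salt_v2)

# ...but within one cut the key is deterministic, so it still works
# as a transferable identifier for every record in that cut:
assert salted_hash("046454286", salt_v1) == salted_hash("046454286", salt_v1)
```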

Even if you tokenize each profile with a UUID and transfer that UUID between agencies, you are transferring a unique identifier about the person. The right way to do this is to have a tokenization service broker that takes records and synthesizes new record keys (MBUN) for each destination counterparty you are sharing the data set with. Hardly anyone does this, and they just take the privacy risk instead.
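A minimal sketch of such a broker, assuming the simplest possible design (the class and names are invented for illustration): the same record gets a different opaque token per destination, so counterparties can't join their copies on it.

```python
import uuid

class TokenBroker:
    """Issue a distinct opaque token per (record, destination) pair,
    so no single identifier travels to more than one counterparty."""

    def __init__(self) -> None:
        # (record_key, destination) -> token; the mapping never leaves
        # the broker, which is the whole point.
        self._map: dict[tuple[str, str], str] = {}

    def token_for(self, record_key: str, destination: str) -> str:
        k = (record_key, destination)
        if k not in self._map:
            self._map[k] = uuid.uuid4().hex
        return self._map[k]

broker = TokenBroker()
t_a = broker.token_for("046454286", "agency_a")
t_b = broker.token_for("046454286", "agency_b")
assert t_a != t_b                                   # not linkable across recipients
assert t_a == broker.token_for("046454286", "agency_a")  # stable per recipient
```

Each destination sees a key that is meaningful only within its own data set, which is what the MBUN ("meaningless but unique number") idea is after.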

The policy vs. info theory difference is that policy can conflate hashing and encryption depending on the purpose of the use, where in tech and security, they are totally different things.


In the theoretical sense what you care about is whether you can safely assume those hashes will always be unique.

In the practical sense what matters is whether you can safely assume the hash for a particular user in a particular system will ever be unique.

E.g., if I had a snapshot of a database mapping identities to password hashes and a log of all the hashes that had been computed by the auth server, I could make very reasonable guesses about who logged in at what time.
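That re-identification is just a dictionary join. A toy sketch (the hashes, names, and timestamps are all made up):

```python
# Snapshot of the identity database: password hash -> user.
snapshot = {
    "5f4dcc3b5aa7...": "alice",
    "e99a18c428cb...": "bob",
}

# Auth-server log: (timestamp, hash that was computed at login).
auth_log = [
    ("09:01", "e99a18c428cb..."),
    ("09:07", "5f4dcc3b5aa7..."),
]

# Joining the two re-identifies who logged in when.
logins = [(ts, snapshot.get(h, "unknown")) for ts, h in auth_log]
print(logins)  # [('09:01', 'bob'), ('09:07', 'alice')]
```

The hash was meant as a credential check, but the moment it appears in two places it behaves like an identifier.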

With that said, I am not a lawyer and I have no idea what the legal significance of all that might be.


The password hashes should be salted, and therefore unique.


Thanks for sharing! I'm curious about the exact dividing line between PII and non-PII. So if you know, could you clarify a detail?

Your phrasing was: "If the hashed password *is used as* a lookup key, that makes it a unique identifier, and it's PII."

If I take your wording literally, you're saying that I could store everyone's SSN in a database, but as long as my system didn't use the SSN as a unique identifier, the SSN wouldn't count as PII.

But it seems unlikely that that's how you meant it.


"If A, then B" does not necessarily mean "If Not A, then Not B". Now if the first logical statement was "If, and only if A, then B", then the second logical statement would follow.


Wow. I had never considered that. I always wondered why some companies don't let me SSO into different parts of their apps.


It's an interesting issue now because norms have changed. That specific privacy view is a very 1990s-2000s worldview that assumed silos and didn't anticipate using your webmail login for literally everything. Examples include legal firewalls between divisions of banks, where they can't use your transaction history to make car insurance decisions. Should an airline know your income before quoting you a fare? (This is why some people use Tor and proxies to search for flights.)

Privacy is anti-discrimination, and the reason social media companies are so rich is because they sell micro-discrimination as a service. It's so valuable because when people see how it works they ask, "how is this even legal?"


Okay maybe not PII, but can they share PIII or PIV?



