
What you'd typically do is put data under different keys/paths according to sensitivity: red is personally identifiable data, yellow contains pointers to such data, and green is just regular data. You could have a structure like:

  s3://my-data-lake/{red|yellow|green}/{raw|intermediate}/year={year}/month={month}/day={day}/source={system}/dataset={table}
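As a rough sketch of that layout, here is how a key could be assembled in Python. The function name `build_key`, the validation, and the zero-padded month/day are my own assumptions, not part of any library; only the bucket name and path template come from the structure above.

```python
from datetime import date

# Allowed values taken from the path template; everything else here is hypothetical.
SENSITIVITIES = {"red", "yellow", "green"}
STAGES = {"raw", "intermediate"}


def build_key(sensitivity: str, stage: str, day: date, source: str, dataset: str) -> str:
    """Build an object path of the form
    s3://my-data-lake/{sensitivity}/{stage}/year=/month=/day=/source=/dataset=
    """
    if sensitivity not in SENSITIVITIES:
        raise ValueError(f"unknown sensitivity: {sensitivity!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Zero-padding month and day is an assumption; it keeps keys sortable as text.
    return (
        f"s3://my-data-lake/{sensitivity}/{stage}"
        f"/year={day.year}/month={day.month:02d}/day={day.day:02d}"
        f"/source={source}/dataset={dataset}"
    )


print(build_key("red", "raw", date(2024, 3, 5), "crm", "customers"))
```

Putting the sensitivity level first in the key means anything that works on a prefix (access policies, retention rules) can target a whole colour at once.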

Then you just don't keep red data for longer than 30 days.
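One way to enforce that, assuming the data lives in S3 and the red data sits under a top-level `red/` prefix, is a bucket lifecycle rule. The sketch below only builds the configuration dict; the helper name `red_expiration_rule` and the rule ID are made up for illustration.

```python
RED_RETENTION_DAYS = 30  # retention period from the comment above


def red_expiration_rule(prefix: str = "red/", days: int = RED_RETENTION_DAYS) -> dict:
    """Return an S3 lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}-after-{days}-days",  # hypothetical ID
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = red_expiration_rule()
print(config)

# Applying it with boto3 would look roughly like this (not run here; note that
# the call replaces whatever lifecycle configuration the bucket already has):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake",
#       LifecycleConfiguration=config,
#   )
```

Keep in mind that S3 counts the days from when the object was created, not from the year=/month=/day= values in the key, so backfilled data would live for 30 days from upload. On a versioned bucket you would also need to expire noncurrent versions for the data to actually be gone.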


