As a former big tech engineer, I can't help but come up with a gazillion ways to work around these sorts of seemingly straightforward policies.
Here's one way they could get around their own privacy policy: keep track of what % of Claude-generated code is retained in the codebase over time (as an indicator of how high-quality / bug-free the code was); A/B test variations of Claude Code to see which variations have higher retention percentages.
No usage data is retained, no code is retained, no data is used (other than a single floating point number) and yet they get to improve their product atop your usage patterns.
Here's another idea: use a summarization model to transform your session transcript into a set of bits saying "user was satisfied/dissatisfied with this conversation", "user indicated that claude was doing something dangerous", "user indicated that claude was doing something overly complicated / too simple", "user interrupted claude", "user indicated claude should remember something in CLAUDE.md", etc. etc. and then train on these auxiliary signals, without ever seeing the original code or usage data.
I always get a kick out of sheer number of HNers with deep concern about “training on their data” while hacking a crud boot service with nextjs fromt-end :)
Here's one way they could get around their own privacy policy: keep track of what % of Claude-generated code is retained in the codebase over time (as an indicator of how high-quality / bug-free the code was); A/B test variations of Claude Code to see which variations have higher retention percentages.
No usage data is retained, no code is retained, no data is used (other than a single floating point number) and yet they get to improve their product atop your usage patterns.
Here's another idea: use a summarization model to transform your session transcript into a set of bits saying "user was satisfied/dissatisfied with this conversation", "user indicated that claude was doing something dangerous", "user indicated that claude was doing something overly complicated / too simple", "user interrupted claude", "user indicated claude should remember something in CLAUDE.md", etc. etc. and then train on these auxiliary signals, without ever seeing the original code or usage data.