Even if the output is blocked, if it can be demonstrated that the copyrighted material is still in the model, then you become liable for distribution and/or duplication without a license.

Training on synthetic data is interesting, but how do you generate the synthetic data? Is it turtles all the way down?