The author claims that they tried to avoid that: "[. . .] we had to choose them carefully and experiment to ensure that these documents were not already in the LLM training data (full disclosure: we can’t know for sure, but we took every reasonable precaution)."