1. they can check to see if the generated code is an exact copy of an example in the training set
2. when the code matches, they can discard it; they generate many predictions for each prompt anyway.
3. My preferred option - they can display the URL of the source page together with the code, acting like a regular search engine at this point; this would also address the problem of not knowing the copyright status of the code
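
To make the three options concrete, here is a rough sketch of how such a check might look. It is not how any real code-generation product works; the function names, the whitespace normalization, and the `example.com` URL are all assumptions for illustration, and it only catches exact copies (after trimming whitespace), not near-duplicates.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def fingerprint(code: str) -> str:
    """Hash a snippet after trimming edge blank lines and trailing spaces."""
    normalized = "\n".join(line.rstrip() for line in code.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(training_examples: Dict[str, str]) -> Dict[str, str]:
    """Map fingerprint -> source URL, given {url: code} training examples."""
    return {fingerprint(code): url for url, code in training_examples.items()}


def find_source(candidate: str, index: Dict[str, str]) -> Optional[str]:
    """Option 1: return the source URL if the candidate is an exact copy."""
    return index.get(fingerprint(candidate))


def drop_copies(candidates: List[str], index: Dict[str, str]) -> List[str]:
    """Option 2: discard candidates that match a training example."""
    return [c for c in candidates if find_source(c, index) is None]


def annotate(
    candidates: List[str], index: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Option 3: keep every candidate, paired with its source URL or None."""
    return [(c, find_source(c, index)) for c in candidates]


# Hypothetical data: one training example and two model predictions.
training = {
    "https://example.com/snippets/add": "def add(a, b):\n    return a + b\n",
}
index = build_index(training)
candidates = [
    "def add(a, b):\n    return a + b",  # exact copy of the training example
    "def add(x, y):\n    return x + y",  # different variable names, no match
]

kept = drop_copies(candidates, index)
annotated = annotate(candidates, index)
```

With this data, `kept` holds only the second candidate, while `annotated` pairs the first candidate with its source URL and the second with `None`. A hash lookup keeps the check cheap even against a large training set, at the cost of missing copies that differ by a renamed variable or a reflowed line.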