I don't think including "noncompetitive" games is an issue. For a game with so many possible states, it only makes sense to ask which moves have been played at all, not the context in which they were played.
Plus, restricting the dataset introduces more biases and ambiguities. What exact Elo rating should be "good enough" for consideration? Why not a point higher or lower? Should they have accounted for time control too, since people play worse in speed chess and can get into weird situations they otherwise wouldn't have been in?