You seem to think that the default assumption is that fixing the format is easy/...

You seem to think that the default assumption is that fixing the format is easy/feasible, and I don't see why. Do you have domain knowledge pointing that way?

It's a truism in machine learning that curating and massaging your dataset is the most labor-intensive and error-prone part of any project. I don't why that would stop being true in healthcare just because lives are on the line.