Thanks for sharing your results; they're indeed pretty different. I looked at the source again, and I did append a "# " before every prompt sent to those 10 `code` models [0] (during testing I thought that formatting the prompt as a Python comment might help them).
Will re-run the script without that to see if it matches your results.
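For clarity, the tweak in question amounts to something like the sketch below. The function and model names here are hypothetical placeholders, not taken from the actual script:

```python
# Hypothetical sketch of the prompt tweak described above: prompts sent to
# the `code` models were prefixed with "# " so they read as Python comments.
CODE_MODELS = {"example/code-model"}  # placeholder, not the real model list

def build_prompt(prompt: str, model: str) -> str:
    # Only the code models got the "# " prefix; others were left untouched.
    if model in CODE_MODELS:
        return "# " + prompt
    return prompt

print(build_prompt("write fizzbuzz", "example/code-model"))   # "# write fizzbuzz"
print(build_prompt("write fizzbuzz", "example/chat-model"))   # "write fizzbuzz"
```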
[0] https://docs.together.ai/docs/models-inference#code-models