It's because of the SAT solver for dependencies. Unlike Pip, it keeps track of every package you installed and goes out of its way to avoid installing incompatible packages.
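To make the solver point concrete, here's a toy sketch of what that kind of resolution boils down to: pick one version per package so that every dependency constraint holds at the same time. The package names, the `INDEX` layout and the brute-force search are all made up for illustration; a real solver encodes the problem as SAT instead of enumerating combinations.

```python
from itertools import product

# Hypothetical index: package -> version -> {dependency: allowed versions}
INDEX = {
    "app": {"1.0": {"numlib": {"1.0", "2.0"}, "plotlib": {"3.0"}}},
    "numlib": {"1.0": {}, "2.0": {}},
    "plotlib": {"3.0": {"numlib": {"1.0"}}},
}


def resolve(index):
    """Return one version per package satisfying every constraint, or None."""
    names = sorted(index)
    # Try newer versions first; plain string sorting stands in for
    # real version ordering in this toy example.
    choices = [sorted(index[name], reverse=True) for name in names]
    for combo in product(*choices):
        picked = dict(zip(names, combo))
        compatible = all(
            picked[dep] in allowed
            for name, version in picked.items()
            for dep, allowed in index[name][version].items()
        )
        if compatible:
            return picked
    return None


solution = resolve(INDEX)
print(solution)
```

Grabbing the latest of everything would pick numlib 2.0 and break plotlib, so the search backs off to numlib 1.0. The number of combinations multiplies with every package you add, which is roughly why the strict approach can take so long.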
Why go through all this trouble? Because originally it was meant to be a basic "scientific Python" distribution, and needed to be strict around what's installed for reproducibility reasons.
It's IMO overkill for most users, and I suspect most scientific users don't care either - most of the time I see grads and researchers just say "fuck it" and use Pip whenever Conda refuses to finish resolving in a timely fashion.
And the ones who do care about reproducibility are using R anyway, since there's a perception those libraries are "more correct" (read: more faithful to the original publication) than Pythonland. And TBH I can't blame them when the poster child for it is Sklearn's RandomForestRegressor not even being correctly named - it's bagged trees under the default settings, and you don't get any indication of this unless you look at the max_features kwarg in the docs.
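For anyone wondering why that default matters: the thing that separates a random forest from bagged trees is that each split only looks at a random subset of the features. Here's a rough sketch of that one step. It is not scikit-learn's actual code; `candidate_features` is a made-up helper that only mimics treating max_features as a fraction of the feature count.

```python
import random


def candidate_features(n_features, max_features, rng):
    """Pick the feature indices a single split is allowed to consider.

    max_features is a fraction of n_features (hypothetical helper).
    """
    k = max(1, int(max_features * n_features))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
n_features = 6

# Fraction 1.0: every split sees every feature, so the only randomness
# left is the bootstrap sample of rows, i.e. plain bagging.
bagged = [candidate_features(n_features, 1.0, rng) for _ in range(3)]

# Fraction 0.5: each split gets a random subset of 3 features, which is
# the extra randomisation that makes it a random forest.
forest = [candidate_features(n_features, 0.5, rng) for _ in range(3)]

print(bagged)
print(forest)
```

With the fraction at 1.0 every entry in `bagged` is the full feature list, so nothing about the split step is random any more.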
Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers.
> And the ones who do care about reproducibility are using R anyway
I worked in a pharma company with lots of R code and this comment is bringing up some PTSD. One time we spent weeks trying to recreate an "environment" to reproduce a set of results. Try installing a specific version of a package, and all the dependencies it pulls in are the latest version, whether or not they are compatible. Nobody actually records the package versions they used.
The R community are only now realising that reproducible environments are a good thing, and not everybody simply wants the latest version of a package. Packrat was a disaster, renv is slightly better.
> Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers
A perfectly reasonable goal, yup! Thankfully, it's not one that actually requires Conda. Automated per-project environments are increasingly the default way of doing things in Python, thank goodness. It's been a long time coming.
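For reference, the standard library can do the per-project part on its own. A minimal sketch using the `venv` module; the `.venv` directory name is just a common convention and `make_project_env` is a name invented here, not part of any tool.

```python
import tempfile
import venv
from pathlib import Path


def make_project_env(project_dir):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps the sketch fast; pass True to also install pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demo in a throwaway directory standing in for a project checkout.
with tempfile.TemporaryDirectory() as project:
    env_dir = make_project_env(project)
    created = (env_dir / "pyvenv.cfg").exists()
    print(env_dir.name, created)
```

Each project gets its own interpreter link and site-packages this way, with no containers involved.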