The Mesos detail struck me as a risk for the reason you stated.
Also, the article seems to suggest Twitter only has two datacenters. That seems surprising given the company's global reach. Perhaps there are other, smaller datacenters that aren't prepared to handle the entirety of the site's traffic.
My current thinking is that there's time to figure out how to operate the current system before it runs into issues that would leave it degraded for a prolonged period. I noticed TLS certs have already rotated, for instance. That was my best guess for a simple thing that could fail if managed poorly.
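For what it's worth, that kind of expiry check is trivial to keep an eye on from the outside. Here's a rough sketch of what I mean, just using Python's standard library; the host and warning threshold are placeholders for illustration, not anything Twitter actually runs:

```python
# Minimal sketch: connect to a host, read its TLS cert, report days until expiry.
import socket
import ssl
import time

def days_until_cert_expiry(host: str, port: int = 443) -> float:
    """Return the number of days until host's TLS certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # 'notAfter' is e.g. 'Jun  1 12:00:00 2025 GMT'; cert_time_to_seconds parses it.
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires_at - time.time()) / 86400

if __name__ == "__main__":
    days_left = days_until_cert_expiry("twitter.com")  # placeholder host
    print(f"certificate expires in {days_left:.1f} days")
    if days_left < 14:  # arbitrary warning threshold
        print("warning: renew soon")
```

Obviously the real risk is the internal cert infrastructure (mTLS between services, etc.), which you can't probe from outside like this, but public endpoints are the part we can watch.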
The company was moving off it. I wish I could find a Twitter eng blog post about it. Interesting that there are a bunch of sources about the decision that aren't directly from Twitter, though.
Twitter didn't start yesterday. There was a time when Mesos was all the hotness and k8s was this new thing that looked promising but wasn't nearly production ready.
Apple is another big Mesos user, but it is also moving to k8s.
They were one of the first users of Mesos, but they didn't create it.
> Mesos began as a research project in the UC Berkeley RAD Lab by then PhD students Benjamin Hindman, Andy Konwinski, and Matei Zaharia, as well as professor Ion Stoica.
> The social networking site Twitter began using Mesos and Apache Aurora in 2010, after Hindman gave a presentation to a group of Twitter engineers.
Mesos is dead. So you need in-house expertise to patch it without being able to leverage community knowledge.
Does Twitter retain enough people to manage Mesos?