Just thinking of another way to do this with (hopefully) less human interaction: run a local apt repository (using debmirror with some scripts to generate a list of allowed packages) and set most machines to use it. Then set up unattended-upgrades with some extra exclusions (like the kernel upgrades mentioned in the post). A subset of machines with some extra monitoring could be used as canaries by pointing them at the standard repositories. For monitoring, maybe a dashboard that polls the machines for all installed packages and compares each installed version to the versions in the local mirror and the standard mirror?
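A minimal sketch of the comparison such a dashboard might run, in Python. It assumes the installed versions and both mirrors' versions have already been collected into plain dictionaries (the polling itself is left out). It only checks version strings for equality and does not implement Debian version ordering. The function name `version_drift` and the sample data are made up for illustration.

```python
from typing import Dict, List, Tuple


def version_drift(
    installed: Dict[str, str],
    local_mirror: Dict[str, str],
    standard_mirror: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Return (package, installed, local, standard) rows that differ.

    A row is reported when the installed version is not equal to the
    version in either mirror. A package missing from a mirror shows "-".
    """
    rows = []
    for pkg in sorted(installed):
        have = installed[pkg]
        local = local_mirror.get(pkg, "-")
        standard = standard_mirror.get(pkg, "-")
        if have != local or have != standard:
            rows.append((pkg, have, local, standard))
    return rows


if __name__ == "__main__":
    # Hypothetical data: one machine, two packages.
    installed = {"pkg-a": "1.0-1", "pkg-b": "2.0-1"}
    local = {"pkg-a": "1.0-1", "pkg-b": "2.0-1"}
    standard = {"pkg-a": "1.0-2", "pkg-b": "2.0-1"}
    for row in version_drift(installed, local, standard):
        print("%-10s installed=%s local=%s standard=%s" % row)
```

On a canary machine the interesting rows are those where the standard mirror is ahead of the local one, since those are the updates still waiting to be allowed through.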
Another possibility for managing this would be a Puppet agent/master setup, using Puppet directives to pin sensitive packages (i.e. the ones that comprise your application) to specific versions while allowing the rest of the system to update normally (assuming the pinned packages don't cause dependency issues, which should be tested before pushing).
So the process might look like this:
1. Manually update a test system and take note of the packages comprising your application and their new versions ('grep -E "<PATTERN>" --color=always' could be helpful here).
2. Run automated tests against the test build to ensure that new packages have not caused issues.
3. If any breaking changes are discovered, pin the offending packages to their unbroken versions. Rinse and repeat.
4. Once a stable build is found, update your puppet manifests to reflect any pinned packages and run it on a single test system (I use an isolated puppet master test server for this).
5. If all goes well on the test system, update the main puppet master server and wait for the agents to call home (don't forget to update the runinterval directive in puppet.conf so the agents don't call home every 30 minutes - even idempotent processes consume resources).
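Steps 1 and 4 above could be partly scripted. The following is a rough Python sketch, not a tested workflow: it assumes you have saved a package list as lines of name and version separated by whitespace (the default output of `dpkg-query -W` has this shape), picks out the application packages with a regular expression, and prints Puppet `package` resources with `ensure` set to the recorded version. The helper names and the `myapp-` pattern are hypothetical.

```python
import re
from typing import Dict


def parse_package_list(text: str) -> Dict[str, str]:
    """Parse lines of "<name> <version>" into a dict, skipping other lines."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages[name] = version
    return packages


def select_pins(packages: Dict[str, str], pattern: str) -> Dict[str, str]:
    """Keep only the packages whose name matches the given regex."""
    regex = re.compile(pattern)
    return {n: v for n, v in packages.items() if regex.search(n)}


def render_puppet_pins(pins: Dict[str, str]) -> str:
    """Render one Puppet package resource per pinned package."""
    blocks = []
    for name in sorted(pins):
        blocks.append(
            "package { '%s':\n  ensure => '%s',\n}\n" % (name, pins[name])
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Hypothetical package list captured from the test system.
    listing = "myapp-core\t1.2-3\nmyapp-web\t1.2-3\nbash\t5.0-6\n"
    pins = select_pins(parse_package_list(listing), r"^myapp-")
    print(render_puppet_pins(pins), end="")
```

The generated resources would still need to be reviewed and merged into the real manifests by hand; the sketch only saves retyping version strings.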
Not a bad idea. I use Ansible to automate the updates. Only rarely does it fail for one reason or another.
I’m glad you are taking updates seriously. There are plenty of companies that do not. Like mine, which still has DNS on Ubuntu 12 and, on top of that, a fleet of Ubuntu 14 servers all running important production services (web, DHCP, MySQL etc.), none of which have been updated in 3 years.
I think I understand the motivations, but this doesn't seem like a good idea to me. You're already limiting it to "safe" packages, so why not use unattended-upgrades and take the human effort out completely? Or if you need to vet updates before pushing them out, use Pulp or something to control the rollout rather than a custom tool that you have to maintain in the face of edge cases. Heck, I'd probably take Ansible with a list of package versions over this setup.
Edit: Actually, let me slightly walk that back. The described system is probably a reasonable way to maintain a small number of pets. My objections are mostly based on the idea that you're dealing with more than a handful of servers.
Putting safe in quotes is accurate, because our updates are not limited to truly safe packages at all for the simple reason that the Debian package format makes all package updates potentially dangerous. Any package update can decide to ask you new questions and there is no guarantee that the default answer is what you actually want, because it's up to the package author to decide.
As a policy thing, we also want to know what packages were updated when and broadly to control this, and we have enough machines that this information needs to be aggregated. Even if unattended-upgrades were totally safe, we'd want to trigger it only on demand, see the package list in advance, and get an all-machines summary at the end.
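To make the "all-machines summary" idea concrete, here is a small Python sketch of one way such an aggregation could look. It is not the tool described in the blog entry. It assumes each machine has already reported a list of (package, new version) pairs, and the function names and host names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[str, str]  # (package name, new version)


def summarize_updates(per_host: Dict[str, List[Update]]) -> Dict[Update, List[str]]:
    """Group hosts by the (package, version) update they received."""
    summary = defaultdict(list)
    for host in sorted(per_host):
        for package, version in per_host[host]:
            summary[(package, version)].append(host)
    return dict(summary)


def format_summary(summary: Dict[Update, List[str]]) -> str:
    """One line per update: package, version, host count, host names."""
    lines = []
    for package, version in sorted(summary):
        hosts = summary[(package, version)]
        lines.append(
            "%s %s: %d host(s): %s"
            % (package, version, len(hosts), ", ".join(hosts))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical reports from two machines.
    reports = {
        "web2": [("bash", "5.0-2")],
        "web1": [("bash", "5.0-2"), ("curl", "7.1-1")],
    }
    print(format_summary(summarize_updates(reports)))
```

Grouping by update rather than by host keeps the summary short when many machines receive the same packages.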
Life would be a lot easier if apt-get updates could be more controlled, so for instance you could say 'only update this list of packages'. But apt-get doesn't want to do that; updates are global and using 'apt-get install' to update specific packages (normally) has side effects, like marking them as manually installed.
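One possible workaround for the side effect, sketched in Python under some assumptions: upgrade only the allowed packages with `apt-get install --only-upgrade`, then run `apt-mark auto` on the ones that were automatically installed beforehand. The function below only builds the command lines and does not run anything. It assumes the caller has already gathered the set of upgradable packages and the set of auto-installed packages (for example from `apt list --upgradable` and `apt-mark showauto`). The function name is hypothetical, and this does nothing about packages that ask configuration questions during the upgrade.

```python
from typing import Iterable, List, Set


def plan_selective_upgrade(
    allowed: Iterable[str],
    upgradable: Set[str],
    auto_installed: Set[str],
) -> List[List[str]]:
    """Build commands that upgrade only allowed packages.

    Returns a list of argv lists. The second command, if present,
    restores the "automatically installed" mark on packages that had it
    before the upgrade.
    """
    targets = sorted(set(allowed) & upgradable)
    if not targets:
        return []
    commands = [["apt-get", "install", "--only-upgrade", "-y"] + targets]
    restore = [pkg for pkg in targets if pkg in auto_installed]
    if restore:
        commands.append(["apt-mark", "auto"] + restore)
    return commands


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    plan = plan_selective_upgrade(
        allowed=["curl", "bash", "vim"],
        upgradable={"bash", "curl", "zsh"},
        auto_installed={"curl"},
    )
    for argv in plan:
        print(" ".join(argv))
```

Keeping the planning separate from execution means the command list can be reviewed, or shown as the "package list in advance", before anything is run.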
(I'm the author of the linked-to blog entry. For reference, we currently have 119 machines that get updated through this system.)
A reason would be to have someone read the list of updates before they press the button. Then, if weird things start to happen, that person should be able to make the connection between the package they updated and the issues. It is similar with code: I like to check the changes others on my team make so I am aware of what areas are changed and what is pushed live.