Mastering the Seamless Upgrade: From CentOS Stream 8 to 9 Without Downtime
Downtime is costly—whether you’re running stateless containers or legacy stateful workloads. The upgrade from CentOS Stream 8 to 9 is significant: kernel changes, library updates, deprecations, and new defaults. But full application stoppage is rarely acceptable, especially in production clusters or on hosts running critical services.
Below, a pragmatic path for a high-assurance in-place upgrade—built on live experience with both bare metal and VM deployments.
Why Move to CentOS Stream 9?
- Security lifecycle: CentOS Stream 8 reached end of life on May 31, 2024 and no longer receives updates; Stream 9 receives CVE patches and critical fixes.
- Kernel and component updates: Stream 9 ships with a newer kernel (5.14.x), up-to-date container runtimes, and improved storage drivers. File system behavior (notably XFS) shifts subtly; read the kernel changelogs.
- Application compatibility: Most vendors and open source projects are already shifting their build matrices to el9.
- Risk reduction: Delaying migration adds layers of undocumented drift.
Where Upgrades Go Wrong
A straight DNF upgrade without prep can leave you with:
- Broken dependencies (common: Python virtualenvs, custom kernel modules).
- Unexpected SELinux denials due to updated policies.
- Third-party repositories lagging behind, causing No module named ... errors at runtime.
Rarely discussed: Upgrades may overwrite configs under /etc (but not files under /usr/local); confirm your configuration management tools cover the right surfaces.
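One way to see which configs the package manager touched is to look for the .rpmnew and .rpmsave files that RPM leaves beside locally modified config files. The following is a minimal sketch, assuming a POSIX shell; the function name list_rpm_leftovers is hypothetical and not part of any distro tooling.

```sh
# List .rpmnew/.rpmsave files under a directory (default: /etc).
# RPM writes these when a packaged config file and the local copy differ.
# list_rpm_leftovers is an illustrative name, not a distro tool.
list_rpm_leftovers() (
    dir="${1:-/etc}"
    find "$dir" -type f \( -name '*.rpmnew' -o -name '*.rpmsave' \) 2>/dev/null |
        LC_ALL=C sort
)
```

Running list_rpm_leftovers /etc after the test upgrade and diffing each hit against the live file shows whether your configuration management needs to reassert anything.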
Upgrade Architecture
+------------+      Clone      +------------+
|  Prod VM   | --------------> |  Test VM   |
| (Stream 8) |                 |  (Clone)   |
+------------+                 +------------+
      |                              |
[Backup/Snapshot]            [Preview upgrade]
      |                              |
[Final sync/app drain]       [Resolve conflicts]
      |                              |
      +-- In-place DNF system-upgrade --+
                      |
[Services restart, health checks, rollbacks if needed]
Side note: On Kubernetes or behind HAProxy, rolling host nodes one at a time is safer, since pods and applications can migrate, keeping user impact near zero.
Step 1: Backup and State Inventory
- Filesystem: rsync home/opt data to external storage; capture a VM snapshot for large-scale protection.
- Databases: Use mysqldump or pg_dump; schedule brief read-only windows if necessary.
- System state:
rpm -qa --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n" > installed-8.txt
mkdir -p /var/backups
cp -a /etc /var/backups/etc-8-$(date +%F)
- Confirm remote out-of-band management works (iLO/iDRAC/IPMI or cloud console).
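The package inventory becomes useful once it is compared against a second capture taken after the upgrade. A rough sketch of that comparison follows, assuming a second file (here called installed-9.txt, a hypothetical name) is produced with the same rpm -qa query on the upgraded host. The helper name missing_packages is also an assumption.

```sh
# Print package names present in the first inventory but absent from the second.
# Both files are expected in the NAME-VERSION-RELEASE.ARCH format produced by
# the rpm -qa --qf query. missing_packages is a hypothetical helper name.
missing_packages() (
    before="$1"; after="$2"
    a=$(mktemp) && b=$(mktemp) || exit 1
    # RPM versions and releases cannot contain "-", so dropping the last two
    # "-"-separated fields leaves the package name.
    sed -E 's/-[^-]+-[^-]+$//' "$before" | LC_ALL=C sort -u > "$a"
    sed -E 's/-[^-]+-[^-]+$//' "$after"  | LC_ALL=C sort -u > "$b"
    # comm -23 keeps lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
)
```

For example, missing_packages installed-8.txt installed-9.txt lists packages that disappeared during the upgrade, which is usually where renamed or dropped dependencies show up first.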
Step 2: Test the Upgrade in a Clone
Clone your target host, preferably at the hypervisor level:
virt-clone --original centos8-prod --name centos9-preflight --auto-clone
Alternatively, copy the disk image and attach it to a new VM.
Inside the clone:
dnf install -y dnf-plugin-system-upgrade
dnf system-upgrade download --releasever=9 --allowerasing
# Known issue: Some el8 EPEL packages will block the transaction. Remove/replace as needed.
dnf system-upgrade reboot
Monitor for errors. Typical snag:
Error: Transaction test error:
file /usr/bin/python3.6 from install of python3-3.9.2-... conflicts with file from package python3-3.6.8-...
Remove legacy packages or update scripts to use #!/usr/bin/python3.
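To find scripts that still pin the old interpreter before the upgrade removes it, a shebang scan over your own script directories can help. This is a minimal sketch under the assumption that local scripts live under a directory you pass in; find_py36_shebangs is a hypothetical helper name, and the /usr/local default is only an example search root.

```sh
# Print files under a directory whose first line is a shebang naming python3.6.
# find_py36_shebangs is a hypothetical helper; /usr/local is only an example root.
# File names containing newlines are not handled.
find_py36_shebangs() (
    dir="${1:-/usr/local}"
    find "$dir" -type f 2>/dev/null | LC_ALL=C sort | while IFS= read -r f; do
        # Only the first line is inspected; NUL bytes from binaries are dropped.
        first=$(head -n 1 "$f" 2>/dev/null | tr -d '\0')
        case "$first" in
            '#!'*python3.6*) printf '%s\n' "$f" ;;
        esac
    done
)
```

Calling find_py36_shebangs /opt prints one path per line, ready to feed into a review list or a sed rewrite of the shebang.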
Test the application stack end-to-end: run CI jobs, init containers, or manual health checks.
Step 3: Minimize Production Downtime
- Containerize critical applications if possible (move DBs and middleware out to other hosts or pods).
- Drain applications: For classic VMs, schedule a short maintenance window. For HA clusters, disable node in pool, finish draining, then upgrade.
Upgrade sequence on prod:
dnf clean all
dnf install -y dnf-plugin-system-upgrade
dnf system-upgrade download --releasever=9 --allowerasing --disablerepo='*debug*'
# Confirm no blockers:
awk '/Problem/{print $0}' /var/log/dnf.log
dnf system-upgrade reboot
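The blocker check in the sequence above can be turned into a gate so the reboot only runs when the log is clean. What follows is a rough sketch, not a definitive check: has_no_blockers is a hypothetical name, and because /var/log/dnf.log accumulates entries from earlier runs, older "Problem" lines can trigger it too.

```sh
# Return 0 if the DNF log has no dependency "Problem" lines, 1 if it does,
# and 2 if the log cannot be read. has_no_blockers is a hypothetical name;
# the default path is the log file used in the upgrade sequence.
has_no_blockers() (
    log="${1:-/var/log/dnf.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; exit 2; }
    if grep -q 'Problem' "$log"; then
        # Show the offending lines with line numbers for quick triage.
        grep -n 'Problem' "$log" >&2
        exit 1
    fi
    exit 0
)
```

A typical use would be has_no_blockers && dnf system-upgrade reboot, which leaves the host untouched when anything suspicious is logged.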
After reboot:
cat /etc/centos-release # Should show Stream 9.
systemctl daemon-reload
systemctl restart nginx mariadb # Swap in your actual services.
Tip: Run post-upgrade validation scripts—yours, not the distro's.
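A post-upgrade validation script can start very small. The sketch below assumes the release file contains a line such as "CentOS Stream release 9" and that your services are systemd units; release_major and check_services are hypothetical helper names, and nginx and mariadb are just the example services from above.

```sh
# Extract the major version from a release file such as /etc/centos-release.
release_major() (
    file="${1:-/etc/centos-release}"
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p' "$file" | head -n 1
)

# Report each listed systemd unit that is not active; non-zero if any is down.
check_services() (
    rc=0
    for svc in "$@"; do
        if ! systemctl is-active --quiet "$svc"; then
            echo "NOT ACTIVE: $svc" >&2
            rc=1
        fi
    done
    exit "$rc"
)
```

For instance, [ "$(release_major)" = 9 ] && check_services nginx mariadb gives a single exit status that a deployment pipeline can act on.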
Step 4: Rollback and Recovery
- If services break, restore disks or revert the VM snapshot. If a kernel panic occurs, boot the previous kernel from the GRUB menu.
- Non-obvious test: Boot the fallback kernel before you need it, just once, so you know if it works.
grubby --info=ALL | grep '^kernel='
- SELinux in enforcing mode may cause silent application failures post-upgrade; try temporary permissive mode if roll-forward looks feasible.
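For the fallback-kernel test mentioned above, it helps to have the installed kernel paths as a plain list. This is a small sketch that reads grubby --info=ALL output on stdin; kernel_paths is a hypothetical helper name, and the quoted kernel="..." line format is assumed from typical grubby output.

```sh
# Read `grubby --info=ALL` output on stdin and print one kernel path per line.
# Surrounding double quotes are optional and stripped when present.
# kernel_paths is a hypothetical helper name.
kernel_paths() {
    sed -n 's/^kernel="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
}
```

Usage would look like grubby --info=ALL | kernel_paths; a path from that list can then be passed to grubby --set-default for a one-off boot test of the fallback kernel.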
Advanced Notes
- Disable all non-essential third-party repos before the upgrade:
dnf config-manager --set-disabled my-custom-repo
- Remove orphans:
dnf repoquery --unsatisfied
package-cleanup --orphans
- In multi-node load-balanced setups (e.g. LAMP behind F5/BIG-IP or NGINX), upgrade nodes one-by-one, verify health, then return traffic to pool.
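The one-by-one node procedure can be sketched as a loop that stops at the first unhealthy node, so a bad upgrade never spreads across the pool. Everything here is an assumption for illustration: drain_node, upgrade_node, node_healthy and undrain_node are hypothetical hooks you would implement for your own load balancer and hosts.

```sh
# Upgrade nodes one at a time and stop at the first node that fails.
# drain_node, upgrade_node, node_healthy and undrain_node are hypothetical
# hooks that you define for your own load balancer and hosts.
rolling_upgrade() {
    for node in "$@"; do
        echo "== $node: draining"
        drain_node "$node" || return 1
        upgrade_node "$node" || return 1
        if ! node_healthy "$node"; then
            echo "== $node: failed health check, leaving it out of the pool" >&2
            return 1
        fi
        undrain_node "$node" || return 1
        echo "== $node: back in pool"
    done
}
```

With the hooks defined, rolling_upgrade web1 web2 web3 processes the nodes in order and leaves the remaining nodes untouched as soon as one fails, which keeps the rollback surface to a single host.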
Gotcha: Kernel live-patching won’t save you across a major upgrade boundary—patches get invalidated.
Conclusion
A clean, test-validated dnf-system-upgrade workflow lets you converge on CentOS Stream 9 without unplanned downtime. The true risk isn’t the upgrade itself, but unknowns—package orphaning, config drift, and missed pre-checks. Plan for rollback, but invest more in a thorough dry run.
Have a trick for large-scale upgrades without cluster orchestration? Add it to the discussion below.