Mastering the Seamless Upgrade: From CentOS Stream 8 to 9 Without Downtime

Downtime is costly—whether you’re running stateless containers or legacy stateful workloads. The upgrade from CentOS Stream 8 to 9 is significant: kernel changes, library updates, deprecations, and new defaults. But full application stoppage is rarely acceptable, especially in production clusters or on hosts running critical services.

Below is a pragmatic path for a high-assurance in-place upgrade, built on live experience with both bare-metal and VM deployments.

Why Move to CentOS Stream 9?

Security lifecycle: CentOS Stream 8 reached end of life on May 31, 2024; Stream 9 still receives CVE patches and critical fixes.
Kernel and component updates: Stream 9 ships with a newer kernel (5.14.x), up-to-date container runtimes, and improved storage drivers. File system behavior (XFS in particular) shifts subtly, and Btrfs is not included in the stock EL kernels; read the kernel changelogs.
Application compatibility: Most vendors and open source projects are already shifting their build matrices to el9.
Risk reduction: Delaying migration adds layers of undocumented drift.

Where Upgrades Go Wrong

The CentOS project does not officially support in-place upgrades between major versions, and a straight DNF upgrade without prep can leave you with:

Broken dependencies (common: Python virtualenvs, custom kernel modules).
Unexpected SELinux denials due to updated policies.
Third-party repositories lagging behind, causing No module named ... errors at runtime.

Rarely discussed: Upgrades may overwrite packaged configs under /etc or leave .rpmnew/.rpmsave files next to them, while files under /usr/local are left alone; confirm your configuration management tools cover the right surfaces.
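
One way to see which configs the upgrade touched is to look for the files RPM sets aside. The sketch below is a minimal helper, not part of any distro tooling; the function name find_rpm_leftovers is made up for illustration.

```shell
# List config files that RPM set aside during an upgrade.
#   *.rpmnew  = new packaged default, written beside the file you kept
#   *.rpmsave = your old file, saved after the package replaced it
# find_rpm_leftovers is a hypothetical helper name.
find_rpm_leftovers() {
    # $1: directory to scan (defaults to /etc)
    find "${1:-/etc}" -type f \( -name '*.rpmnew' -o -name '*.rpmsave' \) 2>/dev/null | sort
}
```

Run find_rpm_leftovers /etc on the clone after the test upgrade and diff each hit against its live counterpart before touching production.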

Upgrade Architecture

+-----------+       Clone          +-----------+
| Prod VM   |  ----------->        | Test VM   |
| (Stream 8)|                     |(Clone)    |
+-----------+                     +-----------+

     |                                   |
 [Backup/Snapshot]                [Preview upgrade]
     |                                   |
 [Final sync/app drain]          [Resolve conflicts]
     |                                   |
     +----- In-place DNF system-upgrade --+
     |
 [Services restart, health checks, rollbacks if needed]

Side note: On Kubernetes or behind HAProxy, rolling through host nodes one at a time is safer, because pods and applications can migrate and user impact stays near zero.

Step 1: Backup and State Inventory

Filesystem: rsync /home and /opt data to external storage; take a VM snapshot to protect the whole system.
Databases: Dump with mysqldump or pg_dump; schedule brief read-only windows if necessary.

System state:

rpm -qa --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n" | sort > installed-8.txt
mkdir -p /var/backups  # may not exist on a stock install
cp -a /etc "/var/backups/etc-8-$(date +%F)"
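
The inventory above becomes useful once you compare it with the post-upgrade state. Because every version string changes across the upgrade, a names-only list is easier to diff. The sketch below assumes two such lists; the function name pkgs_dropped and the file names names-8.txt and names-9.txt are illustrative, not part of the original procedure.

```shell
# Print package names present in the first list but absent from the second.
# Both files hold one package name per line, for example from:
#   rpm -qa --qf '%{NAME}\n' > names-8.txt    (before the upgrade)
#   rpm -qa --qf '%{NAME}\n' > names-9.txt    (after the upgrade)
# pkgs_dropped is a hypothetical helper name.
pkgs_dropped() (
    export LC_ALL=C                       # same collation for sort and comm
    old_sorted=$(mktemp) || exit 1
    new_sorted=$(mktemp) || exit 1
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    comm -23 "$old_sorted" "$new_sorted"  # lines unique to the first list
    rm -f "$old_sorted" "$new_sorted"
)
```

Calling pkgs_dropped names-8.txt names-9.txt on the clone shows what the transaction erased or renamed, which is the list to review before repeating the upgrade on production.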

Confirm remote out-of-band management works (iLO/iDRAC/IPMI or cloud console).

Step 2: Test the Upgrade in a Clone

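Clone your target host, preferably at the hypervisor level. virt-clone requires the source guest to be paused or shut off, so run it against a restored snapshot copy if production must stay up: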

virt-clone --original centos8-prod --name centos9-preflight --auto-clone

Alternatively, copy the disk image and attach it to a new VM.

Inside the clone:

dnf install -y dnf-plugin-system-upgrade
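dnf system-upgrade download --releasever=9 --allowerasing
# Check that the transaction really pulls el9 packages: Stream 8 repo files key on $stream, not only $releasever.
# Known issue: Some el8 EPEL packages will block the transaction. Remove/replace as needed.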
dnf system-upgrade reboot

Monitor for errors. Typical snag:

Error: Transaction test error:
  file /usr/bin/python3.6 from install of python3-3.9.2-... conflicts with file from package python3-3.6.8-...

Remove legacy packages or update scripts to use #!/usr/bin/python3.

Test the application stack end-to-end: run CI jobs, init containers, or manual health checks.
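
Services often need a few seconds to settle after the reboot, so a single probe can report a false failure. A minimal retry wrapper, sketched below, can smooth that out. The name wait_healthy is hypothetical, and the command you pass in is whatever health check your stack already has.

```shell
# Retry a health-check command until it succeeds or attempts run out.
# Usage: wait_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# wait_healthy is a hypothetical helper name.
wait_healthy() (
    attempts=$1
    delay=$2
    shift 2
    i=1
    until "$@" >/dev/null 2>&1; do
        # Give up once the last allowed attempt has failed.
        [ "$i" -ge "$attempts" ] && exit 1
        sleep "$delay"
        i=$((i + 1))
    done
)
```

For example, wait_healthy 10 3 curl -fsS http://localhost:8080/healthz probes a local endpoint up to ten times, three seconds apart; the URL is a placeholder for your own check.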

Step 3: Minimize Production Downtime

Containerize critical applications if possible (move DBs and middleware out to other hosts or pods).
Drain applications: For classic VMs, schedule a short maintenance window. For HA clusters, disable the node in the pool, wait for connections to drain, then upgrade.
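
If the pool is fronted by HAProxy, draining can be done through its runtime API with the "set server ... state" command. The sketch below only builds the command string; the backend and server names, the helper name haproxy_state_cmd, and the socket path in the usage note are assumptions to replace with your own values.

```shell
# Build an HAProxy runtime-API command that changes a server's state.
# haproxy_state_cmd is a hypothetical helper; backend and server names
# must match your haproxy.cfg.
haproxy_state_cmd() {
    # $1: backend name, $2: server name, $3: ready | drain | maint
    case $3 in
        ready|drain|maint)
            printf 'set server %s/%s state %s\n' "$1" "$2" "$3"
            ;;
        *)
            echo "unknown state: $3" >&2
            return 1
            ;;
    esac
}
```

Piping the result into the admin socket applies it, for example haproxy_state_cmd web_back node1 drain | socat stdio /run/haproxy/admin.sock, assuming a stats socket with admin level is configured at that path. Send the same command with ready once the node passes its health checks.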

Upgrade sequence on prod:

dnf clean all
dnf install -y dnf-plugin-system-upgrade
dnf system-upgrade download --releasever=9 --allowerasing --disablerepo='*debug*'
# Confirm no blockers:
awk '/Problem/{print $0}' /var/log/dnf.log
dnf system-upgrade reboot
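
The blocker check above prints matches but does not stop the sequence. A small gate that returns non-zero makes it harder to reboot past an unresolved problem. This is a sketch under the same assumption as the awk line, namely that depsolver problems appear in the log as lines containing "Problem"; the function name assert_no_blockers is made up.

```shell
# Fail if the dnf log contains depsolver "Problem" lines.
# Note: dnf.log accumulates history, so matches may come from earlier runs.
# assert_no_blockers is a hypothetical helper name.
assert_no_blockers() (
    log=${1:-/var/log/dnf.log}
    [ -r "$log" ] || { echo "cannot read $log" >&2; exit 2; }
    if grep -n 'Problem' "$log" >&2; then
        echo "Blockers found in $log; resolve them before rebooting." >&2
        exit 1
    fi
    echo "No blockers found in $log."
)
```

Chained as assert_no_blockers && dnf system-upgrade reboot, the reboot only happens when the log is clean.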

After reboot:

cat /etc/centos-release  # Should show Stream 9.
systemctl daemon-reload
systemctl restart nginx mariadb  # Swap in your actual services.

Tip: Run post-upgrade validation scripts—yours, not the distro's.
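
A validation script can start from two basic questions: did the host land on the expected major release, and are the expected units running? The sketch below covers both. The helper names os_major and inactive_units are hypothetical, and the unit names in the usage note are the same placeholders used above.

```shell
# Print the major version from an os-release style file.
# os_major is a hypothetical helper name.
os_major() (
    # $1: path to the file (defaults to /etc/os-release)
    . "${1:-/etc/os-release}"       # sourced in a subshell, so nothing leaks
    printf '%s\n' "${VERSION_ID%%.*}"
)

# Print each given systemd unit that is not currently active.
# inactive_units is a hypothetical helper name.
inactive_units() (
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" || echo "$unit"
    done
)
```

A check such as [ "$(os_major)" = 9 ] confirms the release, and an empty result from inactive_units nginx mariadb means both services came back.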

Step 4: Rollback and Recovery

If services break, restore disks or revert the VM snapshot. If a kernel panic occurs, boot the previous kernel from the GRUB menu.
Non-obvious test: Boot the fallback kernel before you need it, just once, so you know if it works.

grubby --info=ALL | grep '^kernel='

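SELinux in enforcing mode may cause silent application failures post-upgrade. If rolling forward looks feasible, switch to permissive mode temporarily (setenforce 0), collect the denials from the audit log, fix the policy or labels, and return to enforcing.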
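
To tell whether SELinux is actually the cause, count the AVC denials recorded since the upgrade. The sketch below reads an audit log file directly; the function name count_avc_denials is hypothetical and the default path assumes auditd writes to its usual location.

```shell
# Count AVC denial records in an audit log.
# count_avc_denials is a hypothetical helper name.
count_avc_denials() {
    # $1: audit log path (defaults to /var/log/audit/audit.log)
    # grep -c exits 1 when the count is 0; treat that as success,
    # but keep real errors (exit 2, e.g. unreadable file) as failures.
    grep -cE 'avc: +denied' "${1:-/var/log/audit/audit.log}" || [ $? -eq 1 ]
}
```

A non-zero count right after the first boot is a hint to inspect the records, for example with ausearch -m AVC -ts boot, before deciding between a policy fix and a rollback.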

Advanced Notes

Disable all non-essential third-party repos before the upgrade:
```
dnf config-manager --set-disabled my-custom-repo
```

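Find orphans and broken dependencies (both commands only report; removing packages is a separate step):

dnf repoquery --unsatisfied   # installed packages with unmet dependencies
package-cleanup --orphans     # installed packages that no enabled repo provides (from yum-utils)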

In multi-node load-balanced setups (e.g. LAMP behind F5/BIG-IP or NGINX), upgrade nodes one by one, verify health, then return each node to the pool.
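
The one-node-at-a-time pattern can be sketched as a loop that stops at the first failure, so a bad upgrade never spreads past a single node. All four hooks (drain_node, upgrade_node, check_node, enable_node) are hypothetical names you would implement for your own load balancer and hosts; each takes a node name and returns non-zero on failure.

```shell
# Upgrade nodes sequentially and stop at the first failure.
# The four *_node hooks are hypothetical and must be defined by the caller.
rolling_upgrade() (
    for node in "$@"; do
        echo "== $node: draining"
        drain_node "$node"   || { echo "drain failed on $node" >&2; exit 1; }
        upgrade_node "$node" || { echo "upgrade failed on $node" >&2; exit 1; }
        # A node that fails its health check stays out of the pool.
        check_node "$node"   || { echo "health check failed on $node" >&2; exit 1; }
        enable_node "$node"  || { echo "re-enable failed on $node" >&2; exit 1; }
        echo "== $node: back in pool"
    done
)
```

Called as rolling_upgrade web1 web2 web3, the loop leaves the remaining nodes untouched on Stream 8 if any step fails, which keeps capacity available while you investigate.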

Gotcha: Kernel live-patching won’t save you across a major upgrade boundary—patches get invalidated.

Conclusion

A clean, test-validated dnf system-upgrade workflow lets you converge on CentOS Stream 9 without unplanned downtime. The true risk isn't the upgrade itself but the unknowns: package orphaning, config drift, and missed pre-checks. Plan for rollback, but invest more in a thorough dry run.


