How To Ubuntu

Reading time: 1 min
#Ubuntu #Linux #Server #Sysadmin #SystemAdministration #DevOps

How to Master Ubuntu System Administration for Reliable, Scalable Server Operations

Ubuntu anchors production workloads at every scale, from single-node LAMP stacks to thousands of VM instances running in public clouds. Those systems only approach “secure” or “reliable” once sysadmins look past default installs and idle dashboards. Below: a focused playbook developed from years of supporting real-world deployments—covering fundamental tools, operational gaps, and sharp edges only noticed after 2 a.m. outages.


CLI-First: No Excuses

Stop bouncing between terminal and desktop UI. Nearly every critical system task—maintenance, troubleshooting, policy enforcement—happens fastest on the CLI. At minimum, proficiency with the following is non-negotiable:

  • apt, dpkg: Package lifecycle and patch interrogation (apt list --upgradable)
  • systemctl, journalctl: Service management, log aggregation
  • ss, netstat: Socket and connection analysis
  • ufw, iptables: Firewall rules
  • ssh: Always with key-based auth (ssh-copy-id), not passwords

Patch discipline: On production, never just sudo apt upgrade -y && sudo reboot; always test in a staging clone (do-release-upgrade can break custom configs and third-party extensions). For batch fleets, consider unattended-upgrades with selective pinning for kernel and security-related patches.
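
One way to keep automated runs away from kernels is an apt hold; a minimal sketch using the generic kernel metapackages (swap in your HWE or cloud kernel flavour as needed):

sudo apt-mark hold linux-image-generic linux-headers-generic    # exclude kernels from automated upgrades
apt-mark showhold                                               # review current holds
sudo apt-mark unhold linux-image-generic linux-headers-generic  # release when patching deliberately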

Crontab pitfall:

0 3 * * 1 root apt update && apt upgrade -y && reboot

Applied identically across a fleet, this reboots every node at once; stagger the schedules or couple them with load-balancer draining scripts, as sketched below.
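
A hedged sketch of staggering via /etc/cron.d, assuming three web nodes behind a load balancer (hostnames, times, and the drain script are placeholders):

# /etc/cron.d/patching on web-01; web-02 runs Tuesday, web-03 Wednesday
0 3 * * 1 root /usr/local/sbin/lb-drain.sh && apt-get update && apt-get -y upgrade && systemctl reboot

Here lb-drain.sh stands in for whatever removes the node from rotation before the reboot.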


User and Access Model: Beyond “Add a Sudoer”

Security starts at identity—sloppy configuration here creates attack surfaces.

  • Provision named, audited sudo users only. Avoid direct root SSH; use sudo su - escalation.
  • Functional groups: Use group memberships for lifecycle management (teams, service roles). Example:
    sudo adduser alice
    sudo groupadd ci_admins
    sudo usermod -aG sudo,ci_admins alice
    
  • SSH lockdown: PermitRootLogin no, PasswordAuthentication no in /etc/ssh/sshd_config. Always verify the reload took effect (Ubuntu names the unit ssh, not sshd):
    sudo systemctl reload ssh
    journalctl -u ssh | tail
    
  • Integrate with LDAP or FreeIPA for centralized management if scaling beyond 10 servers.

Brute force? fail2ban works, but an aggressive configuration can lock out legitimate users whose VPN drops and reconnects a few times in a row.
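
A minimal jail.local sketch for the sshd jail (the thresholds are illustrative; loosen maxretry or shorten bantime if your users ride flaky VPNs):

[sshd]
enabled  = true
maxretry = 6
findtime = 600
bantime  = 3600
ignoreip = 127.0.0.1/8 10.0.0.0/8

Drop it in /etc/fail2ban/jail.local and apply with sudo systemctl reload fail2ban.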


Logging and Troubleshooting Loops

Don’t wait for disks to fill before you notice log issues. Default syslog/journalctl retention can swamp /var/log.

Service-specific logs:

journalctl -u nginx -S "2024-06-01" -p warning

Filter by timestamp and severity.
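
On the retention point above, the journal itself can be inspected and capped (SystemMaxUse is a journald.conf setting; 500M is only an example value):

journalctl --disk-usage
sudo journalctl --vacuum-size=500M
# to persist the cap, set SystemMaxUse=500M in /etc/systemd/journald.conf, then:
sudo systemctl restart systemd-journald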

Logrotate tweaks: most packaged configs are conservative. A 7-day retention policy for /var/log/nginx/*.log:

/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    missingok
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        systemctl reload nginx >/dev/null 2>&1 || true
    endscript
}

Gotcha: Large web workloads may require hourly rotation; missed this once and burned a full SSD in a weekend.
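
Note that stock logrotate only runs once a day (cron.daily or logrotate.timer), so an hourly directive in the config does nothing by itself; one hedged workaround is an extra hourly trigger for just that config:

#!/bin/sh
# /etc/cron.hourly/logrotate-nginx (make it executable; no dot in the filename or run-parts skips it)
/usr/sbin/logrotate /etc/logrotate.d/nginx

Then switch daily to hourly in the snippet above.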


Backups: If You Don’t Test Restores, It’s Not a Backup

rsync is fine for static data. For databases, always dump AND test-reimport to a scratch instance.

Files:

rsync -aAXH --delete /etc/ backup@vault:/backups/etc/
  • -aAXH preserves permissions, ACLs, extended attributes, and hard links; useful for capturing system state.

MySQL:

mysqldump --single-transaction --routines --triggers dbname | gzip > /backups/dbname-$(date +%F).sql.gz

For PostgreSQL, use pg_dump. Schedule via /etc/cron.d/, not a user crontab, for clearer audit trails; for example:
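
A sketch of the PostgreSQL equivalent scheduled from /etc/cron.d (database name, backup path, and timing are placeholders; note the escaped % that cron requires):

# /etc/cron.d/pg-backup
30 2 * * * postgres pg_dump --format=custom dbname > /backups/dbname-$(date +\%F).dump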

Non-obvious tip: Test recovery under a different server UID context. SELinux/AppArmor profiles can silently block restore.


Web Server Performance: Push Past Defaults

Ubuntu defaults rarely match actual traffic profiles.

Nginx:

Tuning worker_processes and worker_connections can double concurrency for static sites:

worker_processes auto;
events {
    worker_connections 2048;
}
# in the http {} block:
keepalive_timeout 45;

Set server_tokens off to suppress version leaks.

Apache2:

Consider:

sudo a2dismod mpm_prefork
sudo a2enmod mpm_event
sudo systemctl restart apache2

MPM event handles high concurrency better, but mod_php requires prefork, so move PHP to php-fpm before switching. Profile loaded modules with apache2ctl -M.
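
If you tune further, the event MPM limits live in /etc/apache2/mods-available/mpm_event.conf; a hedged starting point for a mid-sized VM (the numbers are illustrative, size them to RAM and traffic):

<IfModule mpm_event_module>
    StartServers             2
    ServerLimit              8
    ThreadsPerChild          50
    MaxRequestWorkers        400
    MaxConnectionsPerChild   10000
</IfModule>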

Cache headers:

location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
    expires 30d;
    add_header Cache-Control "public, immutable";
}

Known issue: an nginx reload is graceful for active requests, but idle keepalive and other long-lived connections (e.g., WebSockets) can still be cut off; prefer rolling restarts behind a load balancer if zero downtime is mandatory.


Monitoring: Feedback Loops at Every Layer

No visibility, no uptime. At a minimum:

Metric            Tools
CPU, RAM          top, htop, glances
Disk I/O          iostat, dstat
System logs       journalctl, logwatch
Endpoint health   curl, check_http (Nagios)

Prometheus + Grafana stack handles large fleets; for a handful of VMs, glances suffices:

sudo apt install glances
glances --export influxdb

Tip: Glances supports alert thresholds in its config file; wire threshold actions to a Slack or email hook for notifications.
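
The thresholds live in glances.conf (/etc/glances/glances.conf or ~/.config/glances/); a minimal sketch with illustrative percentages:

[cpu]
user_warning=70
user_critical=90
[mem]
warning=80
critical=95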


Security Hardening: Reduce Surface, Raise Bars

  • Package patching: Automate via unattended-upgrades, but pin critical daemons to avoid silent restarts (a config sketch follows this list).
    sudo apt install unattended-upgrades
    sudo dpkg-reconfigure --priority=low unattended-upgrades
    
  • Trim unused services:
    sudo systemctl disable avahi-daemon.service
    sudo systemctl mask cups.service
    
  • Firewall:
    sudo ufw allow 22/tcp comment 'SSH'
    sudo ufw allow 443/tcp comment 'HTTPS'
    sudo ufw enable
    
  • AppArmor: Validate status (aa-status) and use enforce mode where possible. Update profiles when custom binaries are deployed; otherwise execution is blocked silently.
  • Check for world-writable directories (excluding sticky-bit directories such as /tmp):
    find / -xdev -type d -perm -0002 ! -perm -1000

    Remove the write bit where it is not needed.
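
On the patching bullet above, the exclusion list for unattended-upgrades lives in /etc/apt/apt.conf.d/50unattended-upgrades; a hedged sketch (the blacklisted package names are examples):

// /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Package-Blacklist {
    "nginx";
    "mysql-server";
};
Unattended-Upgrade::Automatic-Reboot "false";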

Side note: Kernel hardening (sysctl tweaks, e.g., kernel.randomize_va_space=2) pays off only if baseline hygiene above is met.
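
On the sysctl point, a few commonly used settings as a sketch (drop them into /etc/sysctl.d/99-hardening.conf and apply with sudo sysctl --system; verify each against your workload):

kernel.randomize_va_space = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0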


In practice, discipline—not automation alone—keeps Ubuntu servers reliable. Build a habit of reviewing change logs, routinely restoring backups to scratch systems, and brutalizing configurations in staging before updating anything live. Downtime and botched upgrades are symptoms, not root causes.


Comments open for practical horror stories or overlooked sysadmin tricks. For advanced automation and infra-as-code guides, check the rest of the blog.