How To Tar A File

Mastering 'tar' for Reliable, Efficient File Archiving

A corrupt backup on a customer’s NFS share once kept me up late: ownerships gone, timestamps mangled, SELinux contexts lost. “You should have used tar --xattrs --selinux,” the storage vendor said. They weren’t wrong.


tar (Tape ARchive) remains a default tool for bundling files in Linux/Unix environments, with compression delegated to gzip, bzip2, or xz. Despite its origins in streaming to tape, it's the backbone for modern filesystem snapshots, deployable artifacts, and backup pipelines. Its versatility, and occasional sharp edges, demand attention to detail.

Syntax, At a Glance

tar [flags] [archive-file.tar] [sources]

Typical create operation:

tar czf /srv/backup-2024-06-10.tar.gz /etc/ /opt/app/
  • c: Create archive.
  • z: Compress using gzip.
  • f: File name to write/read.

Swapping z for j triggers bzip2; J selects xz.
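As a minimal sketch of the create-and-inspect cycle, the following builds a throwaway directory, archives it with czf, and lists the result with tzf. The sandbox layout and file names are made up for illustration; -C is used so member names stay relative.

```shell
# Hypothetical sandbox; nothing outside the temp directory is touched.
work=$(mktemp -d)
mkdir -p "$work/demo/conf"
printf 'port=8080\n' > "$work/demo/conf/app.ini"

# c = create, z = gzip, f = archive file.
# -C changes into $work first, so members are stored as demo/...
tar czf "$work/demo.tar.gz" -C "$work" demo

# t = list archive contents without extracting anything.
listing=$(tar tzf "$work/demo.tar.gz")
printf '%s\n' "$listing"
```

Listing right after creation is a cheap sanity check that the member names look the way a later restore expects.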

Table: Common Flags & Their Uses

Flag | Function | Gotcha
c | Create archive | N/A
x | Extract archive | Mistake: extracting as root w/ bad paths
v | Verbose output | Noisy on large trees
z/j/J | gzip/bzip2/xz compression | xz is slow but smallest archives
p | Preserve permissions | Needs root to fully honor
--exclude | Exclude paths/globs | Careful: quotes required
--xattrs | Preserve extended attributes | Not available on every tar build

Beyond Simple Compression: Archival Integrity in Practice

A daily tar czf job works for most home directories. Production and regulated environments—compliance archiving, root filesystem backups—require more rigor.

Preserve Everything Worth Preserving

System files, SELinux labels, or Docker volumes: GNU tar does not store ACLs, extended attributes, or SELinux contexts unless the matching options are set explicitly, and the same options are needed again at extraction time.

Full-preservation example (GNU tar 1.30+):

tar cpf backup.tar \
    --xattrs --acls --selinux \
    /etc /var/lib/app

Note: Not all file systems or tar builds on minimal containers ship with full feature support. Always test restore against a scratch VM.
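One rough way to act on that note is to probe what the local tar advertises before relying on it. The sketch below is a heuristic only: it assumes the build prints its options in --help (GNU tar does), and an advertised option can still fail at run time if the filesystem lacks support. The restore path in the final comment is hypothetical.

```shell
# Capture the help text once; "|| true" keeps going if --help is unsupported.
help_text=$(tar --help 2>&1 || true)

report=$(
    for opt in --xattrs --acls --selinux; do
        case "$help_text" in
            *"$opt"*) printf '%s: advertised\n' "$opt" ;;
            *)        printf '%s: not advertised\n' "$opt" ;;
        esac
    done
)
printf '%s\n' "$report"

# A matching restore repeats the same options (path is illustrative):
#   tar xpf backup.tar --xattrs --acls --selinux -C /mnt/scratch
```

This does not replace a real restore test; it only catches the case where the options are missing outright.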

Exclude Intelligently

Backups balloon quickly. Excluding build artifacts, node_modules, or temp folders saves bandwidth and disk:

tar czf project-src.tar.gz \
    --exclude='*.log' \
    --exclude='build/' \
    src/

For complex rules, store excludes in a file:

tar czf home.tar.gz --exclude-from=exclude.list /home/user/

Contents of exclude.list:

*.cache
Downloads/tmp/
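To see an exclude file in action without touching real data, here is a small sandboxed sketch. The directory layout and patterns are invented for the demo, and the directory pattern is written without a trailing slash, on the assumption that this form behaves the same across tar versions.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/build"
printf 'int main(void) { return 0; }\n' > "$work/src/main.c"
printf 'noise\n' > "$work/src/debug.log"
printf 'obj\n' > "$work/src/build/main.o"

# One pattern per line, same shape as exclude.list above.
printf '%s\n' '*.log' 'build' > "$work/exclude.list"

tar czf "$work/src.tar.gz" --exclude-from="$work/exclude.list" -C "$work" src

# List what actually made it into the archive.
kept=$(tar tzf "$work/src.tar.gz")
printf '%s\n' "$kept"
```

Listing the archive after every change to the exclude file is the quickest way to catch a pattern that matches more, or less, than intended.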

Compression: Performance vs. Size

Gzip (default in most scripts) is a balanced choice. For large archives and multi-core systems, consider pigz:

tar cf - ./dataset | pigz -9 > dataset-$(date +%F).tar.gz
  • pigz: Parallel gzip; the speedup scales with available cores (roughly 4–8x on typical multi-core CPUs)
  • Downside: requires external package (apt install pigz)
  • Trade-off: maximal -9 is slowest but smallest; adjust per I/O budget
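Since pigz is an optional package, a backup script may want to fall back to plain gzip rather than fail. The following is a sketch under that assumption; the dataset directory is a stand-in created inside a temp sandbox, and level -6 is chosen only as a middle ground.

```shell
work=$(mktemp -d)
mkdir -p "$work/dataset"
printf 'row,value\n1,42\n' > "$work/dataset/part-0.csv"

# Prefer pigz when installed; otherwise fall back to single-threaded gzip.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

out="$work/dataset-$(date +%F).tar.gz"
# tar writes the archive to stdout (f -); the compressor reads it from stdin.
tar cf - -C "$work" dataset | "$compressor" -6 > "$out"

# Both tools emit gzip format, so gzip -t can test the stream either way.
gzip -t "$out" && echo "ok: $out (compressed with $compressor)"
```

Because the output is ordinary gzip data, hosts without pigz can still extract it with tar xzf.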

xz compression:

tar cJf logs.tar.xz /var/log/
  • Usually the smallest output of the three, with the slowest compression; decompression is also slower than gzip.

Extraction: Minimize Damage, Maximize Safety

Typical restore (with permission preservation):

tar xzvpf backup.tar.gz -C /tmp/recovery/
  • -C: Explicit extraction path (never restore directly to / unless required)
  • p: Restores the recorded permission bits instead of applying the umask; restoring the original ownership additionally requires root

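GNU tar strips the leading / from member names by default, both when creating and when extracting, so /etc/passwd is stored as etc/passwd and lands under the -C directory rather than over the live file. To additionally drop leading directory levels, use --strip-components=N, which removes the first N path components from every member name: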

tar xzpf backup.tar.gz --strip-components=1 -C /my/chroot

Known issue: Extraction with --strip-components may lose parent structure required by some relative links.
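A cautious restore can list the archive first and refuse anything with absolute or parent-escaping member names. This is a sketch, not a complete defense; the release-1.0 layout and recovery directory are invented for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/release-1.0/etc" "$work/recovery"
printf 'key=value\n' > "$work/release-1.0/etc/app.conf"
tar czf "$work/backup.tar.gz" -C "$work" release-1.0

# Inspect member names before extracting anything.
members=$(tar tzf "$work/backup.tar.gz")

# Reject names that are absolute or contain a ".." path component.
if printf '%s\n' "$members" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
    echo "suspicious member names; not extracting" >&2
else
    # --strip-components=1 drops the leading "release-1.0/" from every member.
    tar xzf "$work/backup.tar.gz" --strip-components=1 -C "$work/recovery"
fi

find "$work/recovery" -type f
```

After the strip, app.conf sits at recovery/etc/app.conf, one level shallower than it was archived.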


Real-World Problem: Incomplete Backups

tar skips files it cannot read, reports each one on stderr, and (GNU tar) exits with a non-zero status. Always inspect for errors:

tar: ./ssl/private.key: Warning: Cannot open: Permission denied

Solution: Run under sufficient privileges and redirect error logs for audit:

tar czpf /srv/snapshot.tar.gz /srv/app > /var/log/backup.log 2>&1
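A log nobody reads does not help, so a wrapper script can also act on tar's exit status. The sketch below uses a missing file as a stand-in for the unreadable one, because a permission error would not reproduce when run as root; all paths are sandboxed and illustrative.

```shell
work=$(mktemp -d)
mkdir -p "$work/app"
printf 'data\n' > "$work/app/present.txt"

log="$work/backup.log"
status=0
# missing.txt does not exist, so tar complains on stderr and exits non-zero.
# "|| status=$?" records the failure without aborting the script.
tar czf "$work/snapshot.tar.gz" -C "$work/app" present.txt missing.txt \
    > "$log" 2>&1 || status=$?

if [ "$status" -ne 0 ]; then
    echo "backup incomplete (tar exit status $status); see $log"
fi
```

In a cron job, that final branch is where an alert or a non-zero exit of the wrapper itself would go.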

Ensuring Archive Integrity

Post-backup, record a checksum so that later corruption or truncation is detectable, and re-check it before every restore:

sha256sum archive.tar.gz > archive.tar.gz.sha256
sha256sum -c archive.tar.gz.sha256

For critical volumes, consider running tar --diff (GNU tar only) post-restore to confirm correct permission and content restoration.
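Putting these checks together, a verification pass might look like the sketch below. It assumes GNU tar (for --diff) and coreutils sha256sum, and runs entirely inside a temp sandbox with made-up file names.

```shell
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
printf 'hello\n' > "$work/data/a.txt"
tar czf "$work/archive.tar.gz" -C "$work" data

# 1. Test the gzip stream, then make tar walk every member header.
gzip -t "$work/archive.tar.gz"
tar tzf "$work/archive.tar.gz" > /dev/null

# 2. Record a checksum and confirm it verifies.
(cd "$work" && sha256sum archive.tar.gz > archive.tar.gz.sha256 \
    && sha256sum -c archive.tar.gz.sha256)

# 3. Restore to a scratch directory and compare it against the archive.
tar xzpf "$work/archive.tar.gz" -C "$work/restore"
diff_status=0
tar --diff -zf "$work/archive.tar.gz" -C "$work/restore" || diff_status=$?
echo "tar --diff exit status: $diff_status"
```

The checksum catches damage that happens after the backup; the stream test and --diff catch an archive or restore that was wrong from the start.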


Shortcuts, But Carefully

Avoid archiving with absolute paths unless the archive will be restored to its original location; otherwise extraction risks overwriting unrelated files. GNU tar strips the leading / from member names by default; add -P (--absolute-names) only if absolute paths are genuinely needed, which is rarely advisable.

Tip: Archive for Portability

When sharing tarballs between hosts with different user/group databases, or with older tar versions (e.g., RHEL 6’s tar 1.23), expect ownership mismatches and missing xattr support. Use --numeric-owner so that UIDs and GIDs are handled as numbers rather than mapped through user and group names, and verify tar version compatibility:

tar --version
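To check what --numeric-owner actually records, a quick sandboxed sketch follows. It assumes GNU tar's verbose listing format (owner shown as uid/gid); the file names are illustrative.

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
printf 'hello\n' > "$work/data/a.txt"

# On create, store numeric UID/GID only rather than user/group names.
tar czf "$work/portable.tar.gz" --numeric-owner -C "$work" data

# On list, print the IDs as numbers instead of resolving them to names.
listing=$(tar tvzf "$work/portable.tar.gz" --numeric-owner)
printf '%s\n' "$listing"
```

Passing the same option at extraction time keeps the receiving host from remapping owners through its own passwd and group entries.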

In Summary

“tar” is only as good as its arguments—backup reliability hinges on details: metadata preservation, error monitoring, compression trade-offs. Test both creation and extraction paths before trusting your data to automation.

Hidden complexity? Absolutely. But once mastered, tar is a rock-solid building block for filesystem backup, container image build chains, and artifact delivery.


No tool is perfect. For an internal repo, add checksum manifests; for large datasets, consider piping .tar through zstd, or using filesystem snapshots instead.
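For the zstd route, a pipe-based sketch could look like this. It assumes the zstd command-line tool is installed and skips itself otherwise; the data directory is a sandboxed placeholder.

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
printf 'sample\n' > "$work/data/a.txt"

if command -v zstd >/dev/null 2>&1; then
    # -T0 lets zstd choose a thread count; -q silences the progress line.
    tar cf - -C "$work" data | zstd -q -T0 -o "$work/data.tar.zst"
    # Decompress to stdout and let tar list the stream.
    result=$(zstd -q -dc "$work/data.tar.zst" | tar tf -)
else
    result="skipped: zstd not installed"
fi
printf '%s\n' "$result"
```

Piping keeps the example independent of the tar version, since built-in zstd support is not present in every tar build.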

Gotcha: Always test your restores. An untested backup is one step away from data loss.