Mastering the Linux find Command: Beyond Simple Searches
Trivial file lookups are rarely the bottleneck. Real pain points emerge when managing application logs, pruning legacy artifacts, or syncing complex directory trees. In those scenarios, find exposes its real power: filtering and acting on files by nearly any attribute, supporting regular expressions, granular permission checks, and even coordinating batch file operations.
Below are practical techniques for using find as a force multiplier in day-to-day system administration and automation tasks. Linux distributions considered: RHEL 8+, Debian/Ubuntu 20.04+, kernel ≥4.15. All examples tested with GNU findutils 4.7+.
Composite Attribute Searches
Auditing log directories? Suppose a compliance task requires archiving .log files larger than 10 MiB that were modified during the last 7 days:
find /var/log -type f -name "*.log" -size +10M -mtime -7
Option | Purpose |
---|---|
/var/log | Root search directory |
-type f | Files only |
-name "*.log" | Pattern match on extension |
-size +10M | Greater than 10MiB |
-mtime -7 | Modified in last 7*24h |
These filters combine as logical AND. A subtle point: -mtime -7 counts back from the current moment in 24-hour units, so it matches files modified less than 7 × 24 hours ago, not files modified "since the start of this week".
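A quick way to convince yourself that all four tests must hold is to run the same expression against a throwaway directory. The sketch below is only an illustration: the temporary directory and file names are made up, and it assumes GNU coreutils (truncate, touch -d).

```shell
#!/usr/bin/env bash
# Sandbox demo; directory and file names are hypothetical.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Sparse files, so no real disk space is consumed (GNU truncate).
truncate -s 11M "$demo/big-recent.log" "$demo/big-old.log"
truncate -s 1M  "$demo/small-recent.log"

# Push one large file outside the 7-day window (GNU touch -d).
touch -d '8 days ago' "$demo/big-old.log"

# Regular file AND *.log AND larger than 10 MiB AND younger than 7 x 24h.
matches=$(find "$demo" -type f -name '*.log' -size +10M -mtime -7)
echo "$matches"
```

Only big-recent.log passes every test. If calendar days are what you actually want, GNU find's -daystart option, placed before -mtime, measures from the beginning of today instead of from the current moment.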
Logical Expressions and Permissions
Find files owned by alice or writable by others, which is useful when scanning shared project directories for potential misconfigurations:
find /home/projects \( -user alice -o -perm -0002 \)
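Parentheses group expressions, but beware: they are special to the shell, so in Bash they must be escaped (\( and \)) or quoted.
-perm -0002 matches anything with the world-writable bit set, regardless of the other permission bits; -user alice is the alternate branch. With no -type test in the expression, directories match as well.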
Note: Some versions of find (especially on macOS) handle permissions slightly differently. For portable scripts, always consult man find.
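Grouping matters because -o binds more loosely than the implicit AND between adjacent tests. The following sandbox sketch shows the difference once a -type f test is added. The file names are invented, and it passes the numeric ID from id -u to -user (GNU find accepts a numeric user ID there) so that it runs under any account.

```shell
#!/usr/bin/env bash
# Sandbox demo; names are hypothetical.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
me=$(id -u)   # numeric ID of the current user, stands in for "alice"

mkdir "$demo/shared"
touch "$demo/mine.txt" "$demo/open.txt"
chmod 777 "$demo/shared"     # world-writable directory
chmod 644 "$demo/mine.txt"
chmod 666 "$demo/open.txt"   # world-writable file

# Grouped: regular files that are owned by me OR world-writable.
grouped=$(find "$demo" -type f \( -user "$me" -o -perm -0002 \) | sort)

# Ungrouped: parsed as (-type f AND -user) OR (-perm -0002),
# so the world-writable directory slips through as well.
ungrouped=$(find "$demo" -type f -user "$me" -o -perm -0002 | sort)

printf 'grouped:\n%s\n\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```

The grouped form reports the two files; the ungrouped form also reports the shared directory, which is rarely what an audit for files intends.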
Command Execution: -exec and Efficiency
Deleting .tmp files older than 30 days, a classic tmpwatch replacement:
find /tmp -type f -name "*.tmp" -mtime +30 -exec rm -- {} \;
{} is replaced by each result in turn; \; ends the -exec clause. This form forks one rm process per match, so a large tree can mean hundreds or thousands of short-lived processes, which is suboptimal.
Optimization: Many find implementations, GNU included, support the + terminator:
find /tmp -type f -name "*.tmp" -mtime +30 -exec rm -- {} +
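Here, + batches results, invoking rm with as many files as fit on one command line. No shell is involved: find runs rm directly and starts a further rm on its own whenever the argument list would exceed the system's ARG_MAX limit.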
Gotcha: Some legacy find binaries lack the + syntax or have quirks with whitespace in filenames. Always run a preview (-print) before destructive actions.
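The difference between the two terminators can be made visible by counting how often the command actually runs. In this sketch a harmless sh -c helper stands in for rm and prints one line per invocation; the directory and file names are made up.

```shell
#!/usr/bin/env bash
# Sandbox demo; a harmless helper replaces rm so nothing is deleted.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
for i in 1 2 3 4 5; do : > "$demo/file$i.tmp"; done

# The helper prints one line each time it is started,
# so the line count equals the number of processes spawned.
per_file=$(find "$demo" -type f -name '*.tmp' -exec sh -c 'echo invoked' sh {} \; | wc -l)
batched=$(find "$demo" -type f -name '*.tmp' -exec sh -c 'echo invoked' sh {} + | wc -l)

echo "with \\; : $per_file invocations"
echo "with +  : $batched invocation(s)"

# Upper bound (in bytes) that find respects when building a batch.
getconf ARG_MAX
```

With five files, the \; form starts the helper five times and the + form once. On a tree with enough matches to exceed ARG_MAX, the + form would simply start a second batch.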
Pattern Matching Beyond Wildcards: -regex
Standard glob fails when you need several matching prefixes/types. Instead:
find . -type f -regex '.*/\(notes\|todo\).*\.\(md\|txt\)$'
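Regex here must match the entire path, not just the basename, so the trailing $ is redundant. The leading .*/ consumes the directory part of the path. Because . also matches /, the pattern additionally accepts files beneath a directory whose name starts with notes or todo; write [^/]* instead of the middle .* to restrict the test to the file name.
Known issue: GNU find defaults to Emacs-style regular expressions, which is why groups and alternation are backslash-escaped. -regextype posix-extended switches to ERE syntax; PCRE features such as lookarounds are not available in either flavor.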
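To compare regex flavors side by side, the sketch below runs the search twice against a sandbox: once in GNU find's default syntax and once with -regextype posix-extended (a GNU extension). It is an illustration only; the file names are invented, and it uses [^/]* rather than .* in the middle so the prefix is tested against the file name alone, which is an assumption about the intent.

```shell
#!/usr/bin/env bash
# Sandbox demo; file names are hypothetical. Requires GNU find.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/docs"
touch "$demo/notes-2024.md" "$demo/docs/todo.txt" \
      "$demo/docs/readme.md" "$demo/notes.log"

# Default GNU flavor: grouping and alternation are backslash-escaped.
default=$(find "$demo" -type f \
    -regex '.*/\(notes\|todo\)[^/]*\.\(md\|txt\)' | sort)

# Same search in POSIX extended syntax; -regextype must precede -regex.
extended=$(find "$demo" -regextype posix-extended -type f \
    -regex '.*/(notes|todo)[^/]*\.(md|txt)' | sort)

printf 'default:\n%s\n\nextended:\n%s\n' "$default" "$extended"
```

Both runs should list docs/todo.txt and notes-2024.md and skip readme.md (wrong prefix) and notes.log (wrong extension).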
Investigating Permissions at Scale
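Security audits often require identifying "dangerous" scripts. Suppose you need to find .sh files that are executable by the owner, not readable by the group, and closed to everyone else:
find ~/scripts -type f -name "*.sh" -perm -u=x ! -perm /g=r,o=rwx
-perm -u=x → owner execute bit is set
! -perm /g=r,o=rwx → group read and every "other" bit are clear
Small detail: without a prefix, -perm matches the exact mode, so -perm u=x,g-r,o-rwx would match only files whose mode is exactly 0100. Prefix the mode with - to require all listed bits (e.g., -perm -u=x) or with / to accept any of them.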
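The three forms of -perm (exact, all bits with -, any bit with /) are easy to mix up, so here is a small sandbox comparison. The file names and modes are chosen purely for illustration, and the / form assumes GNU find.

```shell
#!/usr/bin/env bash
# Sandbox demo; file names are hypothetical. Requires GNU find for -perm /mode.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/exact.sh" "$demo/private.sh" "$demo/public.sh"
chmod 100 "$demo/exact.sh"     # --x------
chmod 700 "$demo/private.sh"   # rwx------
chmod 755 "$demo/public.sh"    # rwxr-xr-x

# No prefix: the mode must be exactly 0100.
exact=$(find "$demo" -type f -perm u=x | sort)

# "-" prefix: owner-execute must be set, all other bits are ignored.
all_bits=$(find "$demo" -type f -perm -u=x | sort)

# "/" prefix: at least one of the listed bits is set.
any_bits=$(find "$demo" -type f -perm /g=r,o=rwx | sort)

# Combined audit: owner-executable, no group read, nothing for others.
audit=$(find "$demo" -type f -perm -u=x ! -perm /g=r,o=rwx | sort)

printf 'exact:\n%s\n\nall bits:\n%s\n\nany bit:\n%s\n\naudit:\n%s\n' \
    "$exact" "$all_bits" "$any_bits" "$audit"
```

The exact form finds only exact.sh, the - form finds all three files, the / form finds only public.sh, and the combined audit reports exact.sh and private.sh.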
Empty Files and Directories
Cleanup tasks often focus on placeholders or abandoned staging dirs:
# Files of zero size
find . -type f -empty
# Directories with no contents
find . -type d -empty
Note: on ext* filesystems (as on other POSIX filesystems), the inode of a deleted empty directory is not reclaimed immediately if a process still holds a reference to it (e.g., a bash session whose working directory it is).
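Nested empty directories are a common follow-up: a parent only becomes empty once its empty children are gone. With GNU find, -delete implies -depth, so children are handled before their parents and a whole empty chain disappears in one pass. The sketch below tries this in a sandbox; the directory layout is made up, and -mindepth 1 is there to protect the starting directory itself.

```shell
#!/usr/bin/env bash
# Sandbox demo; directory layout is hypothetical. Requires GNU find.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/stage/a/b/c" "$demo/stage/keep"
touch "$demo/stage/keep/data.txt" "$demo/stage/placeholder"

# Preview: only the innermost directory is empty right now,
# because a and a/b still contain a subdirectory.
find "$demo/stage" -mindepth 1 -type d -empty -print

# -delete implies -depth: c is removed first, which empties b, then a.
find "$demo/stage" -mindepth 1 -type d -empty -delete

remaining=$(find "$demo/stage" -mindepth 1 | sort)
echo "$remaining"
```

After the second command the a/b/c chain is gone, while keep/ (which holds a file) and the zero-byte placeholder file (not a directory) are untouched.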
Scripting and Automation: Old Backup Rotation
A practical retention policy: delete .bak files older than 90 days. Put this in a script and run it from cron:
BACKUP_DIR="/backups"
find "$BACKUP_DIR" -type f -name "*.bak" -mtime +90 -delete
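GNU find's -delete removes each match from within find itself, which avoids the per-file process cost of -exec rm. It is an action, not an option, so its position matters: it implies -depth, and if it appears before the tests, find tries to delete everything under the start directory. Keep it at the end of the expression and always validate the match set first: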
find "$BACKUP_DIR" -type f -name "*.bak" -mtime +90 -print
Practical hint: For production systems, log deletions or integrate with monitoring; an unexpected deletion is much easier to trace when there is a record of what was removed.
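One low-effort way to keep such a record is to let find itself write the log. The sketch below is an assumption-laden illustration rather than a finished rotation script: it runs against a sandbox instead of /backups, and LOG_FILE is a hypothetical path. It relies on -delete returning true only when the removal succeeded, so the -print that follows records only files that are really gone.

```shell
#!/usr/bin/env bash
# Sandbox demo; BACKUP_DIR and LOG_FILE are hypothetical stand-ins.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
BACKUP_DIR="$demo/backups"
LOG_FILE="$demo/rotation.log"

mkdir -p "$BACKUP_DIR"
touch -d '100 days ago' "$BACKUP_DIR/old.bak"   # GNU touch -d
touch "$BACKUP_DIR/fresh.bak"

# Tests first, actions last. -print runs only after a successful -delete,
# so the log lists exactly the files that were removed.
find "$BACKUP_DIR" -type f -name '*.bak' -mtime +90 -delete -print >> "$LOG_FILE"

cat "$LOG_FILE"
```

In a real cron job the log line could be extended with a timestamp, or the output piped to logger, depending on how the host collects logs.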
Non-Obvious: Handling Special Characters with xargs
This pattern crops up in media management or batch processing:
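find . -type f -name '*.mp3' -print0 | xargs -0 -n 1 ffmpeg -i
-print0/-0 separates filenames with NUL bytes, so names containing spaces or newlines are passed through intact; avoid piping plain -print output into vanilla xargs, which splits on whitespace. -n 1 runs ffmpeg once per file; without it, xargs appends several filenames to a single command line and ffmpeg treats everything after the first as an output file.
Not perfect: some tools still mishandle edge cases such as filenames that begin with a dash.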
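The effect of NUL separation is easy to check without ffmpeg. In this sandbox sketch a harmless sh -c helper prints one line per argument it receives; the two awkward file names are invented, and the temporary directory path is assumed to contain no whitespace itself.

```shell
#!/usr/bin/env bash
# Sandbox demo; a harmless helper replaces ffmpeg.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/two words.mp3"
touch "$demo/$(printf 'line\nbreak.mp3')"   # name with an embedded newline

# NUL-separated: each file arrives as exactly one argument.
safe=$(find "$demo" -type f -name '*.mp3' -print0 \
    | xargs -0 -n 1 sh -c 'echo item' sh | wc -l)

# Whitespace-separated: the same two names are torn into fragments.
naive=$(find "$demo" -type f -name '*.mp3' -print \
    | xargs -n 1 sh -c 'echo item' sh | wc -l)

echo "files seen with -print0 | xargs -0 : $safe"
echo "fragments seen with plain xargs    : $naive"
```

The NUL-separated pipeline sees two files; the plain pipeline sees more items than there are files, each of them a path fragment that does not exist.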
Preview—Don’t Destroy
Destructive operations should always be previewed, especially across large or critical directories. Either use -print or -ls:
find ~/logs -type f -name "*.old" -mtime +365 -ls
Not all shell escapes or pipeline tricks are flawless—expect occasional surprises.
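One way to make previewing the default is to wrap the search in a function that deletes only when asked to. The following is a minimal sketch under stated assumptions: prune_old and the APPLY variable are hypothetical names, and the sandbox stands in for ~/logs. The tests are written once and only the trailing action changes, so the preview and the real run cannot drift apart.

```shell
#!/usr/bin/env bash
# Sandbox demo; prune_old and APPLY are hypothetical names.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch -d '400 days ago' "$demo/app.old"   # GNU touch -d
touch "$demo/app.log"

# Lists matches by default; deletes (and reports) only when APPLY=1.
prune_old() {
    dir=$1
    if [ "${APPLY:-0}" = 1 ]; then
        set -- -delete -print
    else
        set -- -ls
    fi
    find "$dir" -type f -name '*.old' -mtime +365 "$@"
}

preview=$(prune_old "$demo")
echo "$preview"
# Record whether the preview left the file in place.
if [ -e "$demo/app.old" ]; then exists_after_preview=yes; else exists_after_preview=no; fi

APPLY=1 prune_old "$demo" > /dev/null
```

Run without APPLY, the function only lists app.old; with APPLY=1 it removes the file and leaves app.log alone.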
Recap
find can replace brittle ad-hoc scripts, but misuse leads to catastrophic data loss. Always preview matches, be cautious with batch deletes, and leverage batching for performance. Ignore shell-escaping edge cases at your own risk.
Alternative: For huge datasets or parallelism, fd (https://github.com/sharkdp/fd) offers safer defaults, but it does not cover the full range of find's tests and actions.
No one's mastered every find corner case, but knowing where it stumbles is half the battle.