Mastering find in Linux: Beyond the Basics
The find command is sometimes underestimated and treated as a mere tool for file lookup. In reality, it's engineered for system-wide file discovery, batch operations, and ad hoc automation. For sysadmins, DevOps engineers, and anyone working with multi-user or large-scale Linux environments, find is indispensable.
Below is a collection of usage patterns and techniques shaped by production incidents, large directory hierarchies, and the need for precision.
Compound Searches: Using Logical Operators
Suppose a filesystem fills up overnight. You're tasked with finding what grew: oversized log files, plus anything else modified in the last week. It's a classic needle-in-a-haystack scenario.
find /var/log \( -name "*.log" -a -size +100M \) -o -mtime -7
- \( ... \) groups conditions; the backslashes stop the shell from interpreting the parentheses.
- -a (logical AND) joins the tests inside the group. It is also what find assumes between adjacent tests.
- -o (logical OR) separates the group from the alternative test.
- Gotcha: -a binds more tightly than -o, so missing or misplaced parentheses yield broad (and usually wrong) matches.
- In practice, always test your logic by printing results before destructive actions.
Misleading variant:
find /var/log -name "*.log" -size +100M -mtime -7
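Adjacent tests are joined by an implicit AND, so this returns only .log files that are larger than 100 MiB and were also modified within the last 7 days. Files that match only some of your criteria are missed.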
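The difference between the two shapes is easiest to see on a throwaway directory. The following is a minimal sketch, assuming GNU find, GNU touch, and a scratch directory from mktemp; the file names and the 2 KiB threshold are made up for the demonstration.

```shell
# Throwaway sandbox; file names and the 2 KiB threshold are illustrative only.
demo=$(mktemp -d)
head -c 4096 /dev/zero > "$demo/big-old.log"
touch -d '30 days ago' "$demo/big-old.log"   # GNU touch date syntax
head -c 4096 /dev/zero > "$demo/big-new.log"
: > "$demo/small-new.log"
: > "$demo/new.txt"

# (log AND big) OR recent: same shape as the grouped command above.
either=$(find "$demo" -type f \( \( -name '*.log' -a -size +2k \) -o -mtime -7 \) \
           -printf '%f\n' | LC_ALL=C sort)

# log AND big AND recent: what adjacent tests mean when no -o is present.
all_three=$(find "$demo" -type f -name '*.log' -size +2k -mtime -7 \
              -printf '%f\n' | LC_ALL=C sort)

printf 'either:\n%s\n\nall three:\n%s\n' "$either" "$all_three"
rm -rf "$demo"
```

The OR form lists all four files, because each one is either a large log or recently modified. The AND form lists only big-new.log.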
Advanced Pattern Matching: Regular Expressions
Standard globbing (-name, -iname) is often blunt. When hunting file variants generated by different tools, with inconsistent casing, regex becomes essential.
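Example: Case-insensitive image scan
find ~/Pictures -regextype posix-extended -iregex '.*\.(jpe?g|png)'
- -regextype posix-extended: preferable for familiar egrep-like syntax. It must appear before the -regex or -iregex test it applies to.
- -iregex matches case-insensitively, so mixed-case extensions such as .Jpg are caught too. The pattern is matched against the whole path, so no trailing $ is needed.
- GNU find (as shipped on, e.g., Ubuntu 22.04) defaults to emacs regex; always specify -regextype rather than relying on the default. The option is a GNU extension, so BSD and macOS find need -E instead.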
Alternative: Multiple -iname patterns joined with -o. This is more verbose, but sometimes clearer.
find ~/Pictures \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \)
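To check that the regex form and the -iname form select the same files, both can be run against a scratch directory. This is a sketch that assumes GNU find (for -regextype, -iregex, and -printf); the file names are invented for the test.

```shell
# Scratch directory with mixed-case extensions; names are illustrative only.
demo=$(mktemp -d)
touch "$demo/a.jpg" "$demo/b.JPEG" "$demo/c.Png" "$demo/d.txt" "$demo/e.jpg.bak"

# -iregex is the case-insensitive form of -regex; it must match the whole path.
by_regex=$(find "$demo" -type f -regextype posix-extended \
             -iregex '.*\.(jpe?g|png)' -printf '%f\n' | LC_ALL=C sort)

# The same selection expressed as case-insensitive globs.
by_name=$(find "$demo" -type f \
            \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
            -printf '%f\n' | LC_ALL=C sort)

printf 'regex:\n%s\n\niname:\n%s\n' "$by_regex" "$by_name"
rm -rf "$demo"
```

Both variants return a.jpg, b.JPEG, and c.Png. The file e.jpg.bak is skipped in both cases, since the regex has to match the entire path and the globs are anchored at the end of the name.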
Batch Operations: -exec ... {} + vs. -exec ... {} \;
With the \; terminator, -exec spawns a new process for each result, substituting that result's path for {}. End the command with {} + instead to batch many files per invocation, minimizing fork overhead.
Example: Bulk compression
find ~/Documents -type f -name "*.txt" -mtime +30 -exec gzip {} +
- {} + appends as many matches as fit to a single invocation; find starts another one only when the argument list would exceed the system limit.
- Known issue: Some utilities (e.g., certain legacy scripts) don’t handle many arguments gracefully, so test on a sample range first.
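The process-count difference can be observed directly by making each invocation print one line. A small sketch, using a scratch directory and a stand-in command (sh -c 'echo called') in place of gzip:

```shell
# Five scratch files; the echo stands in for a real command such as gzip.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo/file$i.txt"; done

# One process per file: the marker line is printed once per match.
per_file=$(find "$demo" -type f -name '*.txt' \
             -exec sh -c 'echo called' sh {} \; | wc -l | tr -d ' ')

# Batched: all five paths are handed to a single process.
batched=$(find "$demo" -type f -name '*.txt' \
            -exec sh -c 'echo called' sh {} + | wc -l | tr -d ' ')

echo "invocations with \\; : $per_file"
echo "invocations with +  : $batched"
rm -rf "$demo"
```

With only five short paths the batched form needs a single invocation; on very large result sets find may still split the work into a few runs.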
Defensive Deletion: Review Before Remove
Deleting files in automated jobs? Occasional disasters start here. Always print or log matches first.
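find /tmp -type f -size 0 -print > empty_files.txt
# Inspect, then:
xargs -d '\n' --no-run-if-empty rm -- < empty_files.txt
- --no-run-if-empty (GNU-specific) prevents an accidental rm without arguments.
- -d '\n' (also GNU-specific) makes xargs split on newlines only. By default it also splits on spaces and interprets quotes, which breaks paths that contain them.
- Note: Null-delimiting (-print0 / -0) is safer still, because it also copes with filenames that contain newlines.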
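find /tmp -type f -size 0 -print0 | xargs -0 --no-run-if-empty rm --
For a per-file confirmation, run rm -i through -exec rather than xargs. GNU xargs redirects the command's standard input from /dev/null, so the prompts from rm -i would never receive an answer.
find /tmp -type f -size 0 -exec rm -i {} +
Here, -i (interactive) on rm is the last line of defense: it slows batch cleanup, but prevents a slip.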
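The review-then-remove flow can also be done with a null-delimited list saved to a file. The sketch below assumes GNU find and xargs and works entirely inside scratch paths from mktemp; the file names are hypothetical.

```shell
# Scratch data: two empty files (one with a space in its name) and one to keep.
demo=$(mktemp -d)
list=$(mktemp)
: > "$demo/empty one.tmp"
: > "$demo/empty-two.tmp"
echo data > "$demo/keep.txt"

# 1. Record the matches, NUL-terminated so any filename survives intact.
find "$demo" -type f -size 0 -print0 > "$list"

# 2. Review: show the list with newlines for human eyes only.
tr '\0' '\n' < "$list"

# 3. Delete exactly what was recorded; -- protects against odd leading dashes.
xargs -0 --no-run-if-empty rm -- < "$list"

remaining=$(ls "$demo")
echo "left behind: $remaining"
rm -rf "$demo" "$list"
```

Because the delete step reads the saved list rather than re-running find, it removes only what was reviewed, even if new empty files appear in the meantime.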
Permission Scans: Identifying Security Risks
World-writable files and wrong permissions create audit findings—and compromise vectors.
Search for world-writable files across the root filesystem:
find / -xdev -perm -0002 -type f 2>/dev/null
- -xdev: Stay within one filesystem (avoiding /proc, NFS mounts, etc.).
- -perm -0002: Matches any file with the others-write bit set, whatever the remaining bits are.
- 2>/dev/null: Silences permission errors from directories the current user cannot read.
To find 755 binaries exactly (no extra sticky/SUID/SGID bits):
find /usr/local/bin -type f -perm 0755
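The three -perm forms are easy to confuse, so here is a sketch that runs them against files with known modes. It assumes GNU find on Linux and uses hypothetical file names in a scratch directory.

```shell
# Scratch files with deliberately chosen modes; names are illustrative only.
demo=$(mktemp -d)
: > "$demo/open.txt";    chmod 0666 "$demo/open.txt"
: > "$demo/tool";        chmod 0755 "$demo/tool"
: > "$demo/setuid-tool"; chmod 4755 "$demo/setuid-tool"

# -perm -0002: the others-write bit is set (other bits are ignored).
world_writable=$(find "$demo" -type f -perm -0002 -printf '%f\n')

# -perm 0755: the mode is exactly 0755, so the setuid file is excluded.
exact_755=$(find "$demo" -type f -perm 0755 -printf '%f\n')

# -perm -0755: at least these bits are set, so the setuid file is included.
at_least_755=$(find "$demo" -type f -perm -0755 -printf '%f\n' | LC_ALL=C sort)

printf 'world-writable: %s\nexactly 755: %s\nat least 755:\n%s\n' \
  "$world_writable" "$exact_755" "$at_least_755"
rm -rf "$demo"
```

The exact form is the one to use when extra bits such as SUID should disqualify a file; the dash form is the one to use when hunting for a risky bit regardless of the rest of the mode.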
Pruning Unwanted Paths
Version control directories (.git, .svn), caches, or build objects often pollute find searches.
find /home/user/projects -path "*/.git" -prune -o -name "*.py" -print
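- -prune prevents descent into matched directories.
- Real-world issue: Prune logic reads backwards. The left side of -o names what to skip and the right side names what to print. The explicit -print matters: without it, find applies an implicit -print to the whole expression and lists the pruned .git directories themselves.
- For multiple patterns, combine more -path ... -prune clauses with -o.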
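Pruning several directory names at once, and the effect of leaving out the explicit print action, can be sketched as follows. The project layout is invented, and -printf assumes GNU find.

```shell
# Hypothetical project layout inside a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/app/.git/hooks" "$demo/app/__pycache__" "$demo/app/src"
: > "$demo/app/.git/hooks/pre-commit.py"
: > "$demo/app/__pycache__/cached.py"
: > "$demo/app/src/main.py"

# Skip both directory names, then print matching files relative to the root.
found=$(find "$demo" \( -path '*/.git' -o -path '*/__pycache__' \) -prune \
          -o -type f -name '*.py' -printf '%P\n')

# Without an explicit action, the pruned .git directory itself is printed.
git_dirs_listed=$(find "$demo" -path '*/.git' -prune -o -name '*.py' \
                    | grep -c '/\.git$')

echo "found: $found"
echo ".git directories listed without explicit -print: $git_dirs_listed"
rm -rf "$demo"
```

Only app/src/main.py is reported by the first command. The second shows why the action belongs on the right-hand side of -o.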
Content Search: Integrating with xargs and grep
Locating files by content means piping the results into grep, a standard shell pattern.
find /etc -type f -name "*.conf" -print0 | xargs -0 grep --color=auto -H "timeout"
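- find prefixes every result with the starting point (/etc/ here), so no filename reaches grep looking like an option. If the list comes from another source, end grep's options with -- to avoid surprises when filenames start with -.
- Realistic scenario: Binary or gzipped files produce "binary file matches" notices instead of readable lines. grep -I skips them; using file to filter for text files before grepping also works, at the cost of performance.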
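A sketch of the same pipeline with binary files excluded, assuming GNU grep (for -I) and GNU xargs. The configuration files are fabricated in a scratch directory, and one of them contains a NUL byte so that grep treats it as binary.

```shell
# Two text files and one file with a NUL byte; all names are illustrative.
demo=$(mktemp -d)
printf 'timeout = 30\n' > "$demo/app.conf"
printf 'retries = 3\n'  > "$demo/other.conf"
printf 'timeout\0\001\002\n' > "$demo/blob.conf"

# -I treats binary files as non-matching; -l lists file names only;
# -- ends grep's options before the pattern and the appended paths.
matches=$(find "$demo" -type f -name '*.conf' -print0 \
            | xargs -0 grep -I -l -- 'timeout' | LC_ALL=C sort)

echo "$matches"
rm -rf "$demo"
```

Only app.conf is listed: other.conf lacks the word, and blob.conf is skipped as binary even though it contains it.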
Saving and Reusing Result Lists
For batch pipelines, save matches for hand-off.
find ~/Downloads -type f \( -iname "*.mp4" -o -iname "*.avi" \) > /tmp/video_files.lst
# Later:
while IFS= read -r f; do mediainfo "$f"; done < /tmp/video_files.lst
- Note: IFS= avoids trimming leading and trailing whitespace from file paths, and -r keeps backslashes literal.
- If filenames may contain embedded newlines, use null-terminated lists instead.
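A null-terminated variant of the saved list might look like the sketch below. It assumes GNU find and xargs; the printf call is a stand-in for a real consumer such as mediainfo, and the file names (including one with an embedded newline) are made up.

```shell
# Scratch directory with awkward names: a space and an embedded newline.
demo=$(mktemp -d)
list=$(mktemp)
: > "$demo/clip one.mp4"
: > "$demo/$(printf 'clip\ntwo.AVI')"
: > "$demo/notes.txt"

# Save the matches NUL-terminated instead of newline-terminated.
find "$demo" -type f \( -iname '*.mp4' -o -iname '*.avi' \) -print0 > "$list"

# Later: one invocation per saved path; each name arrives intact as "$1".
xargs -0 -n 1 sh -c 'printf "[%s]\n" "$1"' sh < "$list"

# Counting NUL bytes gives the number of entries, newlines in names or not.
count=$(tr -cd '\0' < "$list" | wc -c | tr -d ' ')
echo "entries: $count"
rm -rf "$demo" "$list"
```

In bash, the same list can be consumed with while IFS= read -r -d '' f; do ...; done < list, which keeps the loop form used above.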
Practical Details & Non-Obvious Gotchas
- Filesystem boundaries: Use -xdev or -mount to prevent accidental NFS bloat or crawling /proc.
- Permission errors: Standard for production hosts; redirect stderr or use root judiciously.
- Performance: Recursive descent on millions of files is slow; narrow root paths and prune aggressively.
- Shell globs versus find: For small directories, ls and shell expansion may be faster/simpler; find only shines with scale or complexity.
At a Glance: Key Options Table
Option | Use Case | Side Effects/Notes |
---|---|---|
-name / -iname | File glob matching | Fast, non-regex |
-regex | Flexible pattern, complex naming conventions | Mind the regex dialect |
-mtime / -ctime | Time-based search (mod/change, days) | Counted in whole 24h periods; fractions are dropped |
-exec ... {} + | Efficient batch command execution | Command must accept many arguments; find splits runs at the system limit |
-prune | Skip certain directories | Must combine carefully with logic |
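The 24-hour granularity of -mtime is a frequent source of off-by-one surprises. A sketch assuming GNU find and GNU touch, with a single hypothetical file that is 36 hours old:

```shell
# One scratch file, backdated by a day and a half.
demo=$(mktemp -d)
: > "$demo/report.log"
touch -d '36 hours ago' "$demo/report.log"   # GNU touch date syntax

# Age is 1.5 days; find drops the fraction and treats it as 1.
older_than_one=$(find "$demo" -type f -mtime +1 -printf '%f\n')  # needs age >= 2 days
exactly_one=$(find "$demo" -type f -mtime 1 -printf '%f\n')      # age of 1 to 2 days
under_two=$(find "$demo" -type f -mtime -2 -printf '%f\n')       # age under 2 days

printf '+1: [%s]\n 1: [%s]\n-2: [%s]\n' "$older_than_one" "$exactly_one" "$under_two"
rm -rf "$demo"
```

The file is more than a day old, yet -mtime +1 does not match it, because +1 means strictly more than one whole 24-hour period.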
Side Note: Scripting with find Isn’t Perfect
On older distributions (e.g., CentOS 7 with GNU findutils 4.5), some of these patterns (notably -regextype) might not behave as above, so test locally before baking them into automation. For edge cases, Perl’s File::Find or fd (an alternative “find” with more modern UX) may be preferable.
Unlike wrapper tools, find exposes power and risk in equal measure. Used with care, it shifts routine maintenance from cumbersome to almost trivial, even for environments with tens of millions of files.
For further reading, check out man find. There is no substitute for the official documentation, especially when crafting automation for critical systems.
For additional command-line tips focused on reliability and scale, see other entries in this series on file management and Linux automation.