Interview Scripting (Bash, Groovy)

Why must input be sorted before uniq, and what does uniq -c do? [Advanced]

Answer

uniq compares only adjacent lines, so the input must be sorted first to get global duplicate counts. uniq -c prefixes each output line with the number of consecutive occurrences of that line.

Technical explanation

Without sorting, a value that appears in separate parts of a file forms a separate group at each place, so it is reported several times with partial counts.

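The typical frequency pipeline is sort | uniq -c | sort -rn: the first sort brings identical lines together, uniq -c counts each group, and sort -rn orders the groups by count, highest first.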

If preserving the original order matters, use awk with an associative array (a map) instead of sorting.
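One way to sketch the awk approach is shown here. The function name count_in_order and the array names count and order are illustrative choices, not anything standard. It tallies every line in one array and records the order of first appearance in another, so no sort is needed.

```shell
# Hypothetical helper: count lines from stdin, reporting in first-seen order.
count_in_order() {
  awk '
    !($0 in count) { order[++n] = $0 }   # first sighting: remember its position
    { count[$0]++ }                      # tally every line
    END {
      for (i = 1; i <= n; i++)
        printf "%d %s\n", count[order[i]], order[i]
    }
  '
}

printf '%s\n' b a b a a | count_in_order
# prints:
# 2 b
# 3 a
```

The second array is there because awk's for (key in array) loop does not guarantee any iteration order, so the first-seen sequence has to be recorded explicitly.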

Hands-on example

# Without sorting, only adjacent duplicates are grouped: 1 b, 1 a, 1 b, 2 a
printf '%s\n' b a b a a | uniq -c

# With sorting, the counts are global and ordered by frequency: 3 a, 2 b
printf '%s\n' b a b a a | sort | uniq -c | sort -rn
