Grep asks: "Is this pattern here?"
Awk asks: "What's in column 7? How many times did each IP address appear? What's the average response time for 500 errors?"
That's the difference. Grep finds needles. Awk counts them, sorts them by size, and tells you which haystack they came from.
Awk Sees Structure
When awk reads a line, it automatically splits it into fields on runs of spaces and tabs. Whitespace becomes an invisible boundary. Suddenly a log entry isn't a string; it's a data structure.
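A minimal sketch of field splitting. The two syslog-style lines and the variable names are hypothetical stand-ins for a real log file:

```shell
# Hypothetical syslog-style sample lines, piped in instead of read from a real log
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 16 08:02:11 web01 cron[512]: job started'

# $1 is the first whitespace-separated field of each line
first_fields=$(printf '%s\n' "$sample" | awk '{ print $1 }')
echo "$first_fields"
```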
This prints the first field from every line. For syslog, that's the month. $2 is the day, $3 is the timestamp, and so on. $0 is the entire line.
Print multiple fields:
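As a sketch on one hypothetical syslog line, here is the same print statement with and without commas:

```shell
# Hypothetical sample line
line='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy'

with_commas=$(printf '%s\n' "$line" | awk '{ print $1, $2, $3 }')
without_commas=$(printf '%s\n' "$line" | awk '{ print $1 $2 $3 }')

echo "$with_commas"      # Jan 15 10:23:45
echo "$without_commas"   # Jan1510:23:45
```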
A comma between expressions inserts the output field separator, which is a single space by default. Without commas, fields concatenate directly.
Custom Delimiters
Not all logs use whitespace. Specify a different field separator with -F:
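A small sketch, assuming a hypothetical passwd-style record as the colon-delimited input:

```shell
# Hypothetical colon-delimited record (passwd-style layout)
entry='deploy:x:1001:1001:Deploy User:/home/deploy:/bin/bash'

# -F: makes the colon the field separator, so $1 is the name and $7 the shell
name_and_shell=$(printf '%s\n' "$entry" | awk -F: '{ print $1, $7 }')
echo "$name_and_shell"   # deploy /bin/bash
```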
This splits on colons. For Apache access logs:
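One way this might look, using a single made-up access log line in Apache's common log layout and the default whitespace splitting:

```shell
# Hypothetical access log line (documentation-range IP address)
line='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120'

ip_and_url=$(printf '%s\n' "$line" | awk '{ print $1, $7 }')
echo "$ip_and_url"   # 192.0.2.10 /index.html
```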
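With default whitespace splitting, field 1 is the client IP address and field 7 is the requested URL. The log format determines which field holds what, so check your server's format before relying on field numbers.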
Pattern Matching
Awk can filter lines before processing them:
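A sketch with two hypothetical syslog lines, only one of which contains the word "error" (the match is case-sensitive):

```shell
# Hypothetical sample lines
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 15 10:24:02 web01 app[733]: error: database connection refused'

# The /error/ pattern selects lines; the action prints the date and time fields
error_times=$(printf '%s\n' "$sample" | awk '/error/ { print $1, $2, $3 }')
echo "$error_times"   # Jan 15 10:24:02
```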
This finds lines containing "error" and prints specific fields from them. The pattern goes in slashes before the action block.
Filter by field values:
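Roughly, on three made-up access log lines:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Compare the status field numerically and print the URL of matching lines
not_found=$(printf '%s\n' "$sample" | awk '$9 == 404 { print $7 }')
echo "$not_found"   # /missing.html
```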
This prints URLs that returned 404 errors. Field 9 is the status code in Apache's combined log format.
Associative Arrays: Awk's Superpower
This is where awk leaves grep behind entirely.
Count requests per IP address:
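A minimal sketch, assuming three hypothetical access log lines. The array name count matches the description; everything else is illustrative:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# count[$1]++ creates the key on first sight and increments it afterwards
per_ip=$(printf '%s\n' "$sample" |
  awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }')
echo "$per_ip"   # one "IP count" pair per line, in no particular order
```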
The array count uses IP addresses as keys. Each time an IP appears, its count increments. The END block runs after all lines are processed and prints the totals. Awk does not guarantee the order in which array keys are visited, so the output is unsorted.
Sort by frequency:
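Sketched on the same kind of hypothetical sample, with the count printed first and the result piped through sort:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

top_ips=$(printf '%s\n' "$sample" |
  awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' |
  sort -nr)
echo "$top_ips"
# 2 192.0.2.10
# 1 198.51.100.7
```

On a real log, appending `| head` keeps only the top of the list.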
Putting the count first lets sort -nr order the lines numerically, highest first. This answers "who's hitting my server the hardest?" in one line.
Status code distribution:
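A possible form, again on hypothetical lines. The trailing sort -n is an addition here, only to make the output order stable:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Key the array on the status code instead of the IP address
by_status=$(printf '%s\n' "$sample" |
  awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' |
  sort -n)
echo "$by_status"
# 200 2
# 404 1
```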
This shows how many requests returned each status code. Suddenly you can see your error rate at a glance.
Bandwidth per IP:
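A sketch under the same assumptions (made-up lines, illustrative array name bytes), adding to the array value instead of incrementing it:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Accumulate field 10 (bytes) under each client IP
bytes_per_ip=$(printf '%s\n' "$sample" |
  awk '{ bytes[$1] += $10 } END { for (ip in bytes) print ip, bytes[ip] }')
echo "$bytes_per_ip"   # one "IP total" pair per line, in no particular order
```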
Field 10 is bytes transferred. This sums bandwidth consumption by IP address—useful for finding who's downloading your entire site.
Counting and Calculating
Awk maintains variables across lines:
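For instance, on three hypothetical syslog lines. The `n + 0` in the END block is a small addition so that an input with no matches prints 0 rather than an empty line:

```shell
# Hypothetical sample lines, two of which contain "error"
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 15 10:24:02 web01 app[733]: error: database connection refused
Jan 15 10:24:09 web01 app[733]: error: retry limit reached'

error_count=$(printf '%s\n' "$sample" | awk '/error/ { n++ } END { print n + 0 }')
echo "$error_count"   # 2
```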
This counts error lines. The variable persists, incrementing with each match.
Sum a field:
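A minimal sketch on made-up access log lines, where field 10 holds the byte count:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

total_bytes=$(printf '%s\n' "$sample" | awk '{ total += $10 } END { print total }')
echo "$total_bytes"   # 7680
```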
This totals bytes transferred—your bandwidth usage in one number.
Calculate averages:
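Something along these lines, with hypothetical input. The `n > 0` guard is an addition to avoid dividing by zero on empty input:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Two statements in one action block, separated by a semicolon
average=$(printf '%s\n' "$sample" |
  awk '{ total += $10; n++ } END { if (n > 0) print total / n }')
echo "$average"   # 2560
```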
This gives the average bytes per request. Semicolons separate multiple statements inside one action block.
Error rate as a percentage:
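A sketch assuming four hypothetical requests, two of which failed. Matching the status against /^[45]/ is one way to select 4xx and 5xx codes; a numeric comparison such as `$9 >= 400` would also work:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512
203.0.113.5 - - [15/Jan/2024:11:06:41 +0000] "GET /api HTTP/1.1" 500 128'

# Count every line, count error lines separately, then divide in END
error_rate=$(printf '%s\n' "$sample" |
  awk '{ total++ }
       $9 ~ /^[45]/ { errors++ }
       END { if (total > 0) printf "%.1f%%\n", errors * 100 / total }')
echo "$error_rate"   # 50.0%
```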
This computes what percentage of requests resulted in 4xx or 5xx errors.
Working with Timestamps
Filter by date:
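A sketch for syslog-style lines, where field 1 is the month and field 2 the day. The sample lines are hypothetical:

```shell
# Hypothetical sample lines from two different days
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 16 08:02:11 web01 cron[512]: job started'

# A pattern with no action block prints the whole matching line
jan_15=$(printf '%s\n' "$sample" | awk '$1 == "Jan" && $2 == 15')
echo "$jan_15"
```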
This shows only January 15th entries.
Filter by hour using regex matching:
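Roughly, on three hypothetical syslog lines whose third field is the HH:MM:SS timestamp:

```shell
# Hypothetical sample lines
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 15 10:59:59 web01 app[733]: request served
Jan 15 11:05:10 web01 cron[512]: job started'

# Keep lines whose timestamp field starts with "10:"
hour_10=$(printf '%s\n' "$sample" | awk '$3 ~ /^10:/')
echo "$hour_10"   # the two lines stamped 10:23:45 and 10:59:59
```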
The tilde (~) tests if a field matches a pattern. This finds entries from 10:00-10:59.
Find the busiest hour:
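One possible version, assuming syslog-style input with the timestamp in field 3. The array name hour and the sample lines are illustrative:

```shell
# Hypothetical sample lines: two in hour 10, one in hour 11
sample='Jan 15 10:23:45 web01 sshd[411]: Accepted publickey for deploy
Jan 15 10:59:59 web01 app[733]: request served
Jan 15 11:05:10 web01 cron[512]: job started'

# Group by the first two characters of the timestamp, then keep the largest count
busiest=$(printf '%s\n' "$sample" |
  awk '{ hour[substr($3, 1, 2)]++ } END { for (h in hour) print hour[h], h }' |
  sort -nr | head -n 1)
echo "$busiest"   # 2 10
```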
The substr function extracts characters—here, the first two characters of the timestamp (the hour).
String Functions
Extract substrings:
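A sketch assuming a hypothetical application log whose first field is an ISO 8601 timestamp, so the first 10 characters are the date:

```shell
# Hypothetical application log line
line='2024-01-15T10:23:45Z INFO service started'

# substr(string, start, length) with a 1-based start position
day=$(printf '%s\n' "$line" | awk '{ print substr($1, 1, 10) }')
echo "$day"   # 2024-01-15
```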
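This prints the first 10 characters of field 1. The arguments to substr are the string, the starting position (counted from 1), and the length.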
Filter by string length:
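A sketch with one deliberately long, made-up request path. For readability it prints the client IP and the URL length rather than the URL itself; swap in `$7` to print the URL:

```shell
# Build a hypothetical 130-character path so one request exceeds the threshold
long_path=$(awk 'BEGIN { s = "/search?q="; for (i = 0; i < 120; i++) s = s "A"; print s }')
sample=$(printf '%s\n' \
  '192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120' \
  "198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] \"GET $long_path HTTP/1.1\" 200 2048")

# length($7) is the number of characters in the URL field
suspicious=$(printf '%s\n' "$sample" | awk 'length($7) > 100 { print $1, length($7) }')
echo "$suspicious"   # 198.51.100.7 130
```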
URLs longer than 100 characters—potentially suspicious requests.
Split fields further:
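As an illustration, field 4 of a hypothetical access log line holds the date and time joined by colons, so splitting it on ":" exposes the hour:

```shell
# Hypothetical access log line; $4 is "[15/Jan/2024:10:23:45"
line='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120'

# parts[1] is "[15/Jan/2024", parts[2] the hour, parts[3] the minute, parts[4] the second
hour=$(printf '%s\n' "$line" | awk '{ split($4, parts, ":"); print parts[2] }')
echo "$hour"   # 10
```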
The split function divides a string by a delimiter into an array and returns the number of pieces. Array indices start at 1.
Formatted Output
Use printf for clean reports:
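A sketch on one hypothetical access log line. The widths are taken from the description that follows; the variable names are illustrative:

```shell
# Hypothetical access log line
line='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120'

# %-15s pads the string on the right; %10d pads the integer on the left
row=$(printf '%s\n' "$line" | awk '{ printf "%-15s %10d\n", $1, $10 }')
echo "$row"
```

Unlike print, printf adds no newline of its own, so the format string ends with `\n`.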
IP addresses left-aligned in 15 characters, byte counts right-aligned in 10.
With headers:
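One way a small report might be put together, on hypothetical input. The column titles and the TOTAL row are illustrative choices:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Header in BEGIN, one row per line, running total printed in END
report=$(printf '%s\n' "$sample" |
  awk 'BEGIN { printf "%-15s %10s\n", "IP", "BYTES" }
       { printf "%-15s %10d\n", $1, $10; total += $10 }
       END { printf "%-15s %10d\n", "TOTAL", total }')
echo "$report"
```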
BEGIN runs before any lines are processed. END runs after all lines are processed.
Combining Conditions
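Patterns can be joined with && (and) and || (or). A sketch of the && case, assuming made-up access log lines with the status in field 9 and bytes in field 10:

```shell
# Hypothetical access log lines: two 404s of different sizes and one 200
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /missing.html HTTP/1.1" 404 512
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /old-report HTTP/1.1" 404 4096
203.0.113.5 - - [15/Jan/2024:11:05:10 +0000] "GET /index.html HTTP/1.1" 200 5120'

# Both conditions must hold for the action to run
large_404=$(printf '%s\n' "$sample" | awk '$9 == 404 && $10 > 1000 { print $7, $10 }')
echo "$large_404"   # /old-report 4096
```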
404 errors with response sizes over 1000 bytes—unusual, since 404 pages should be small.
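The || form, sketched on hypothetical lines with a mix of status codes:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512
203.0.113.5 - - [15/Jan/2024:11:06:41 +0000] "GET /api HTTP/1.1" 500 128
203.0.113.5 - - [15/Jan/2024:11:07:03 +0000] "GET /checkout HTTP/1.1" 503 0'

# Either condition is enough for the action to run
problems=$(printf '%s\n' "$sample" | awk '$9 >= 500 || $9 == 404 { print $9, $7 }')
echo "$problems"
# 404 /missing.html
# 500 /api
# 503 /checkout
```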
All server errors or 404s.
Saving Complex Programs
For serious analysis, save awk programs to files:
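A minimal sketch, assuming a throwaway script file created with mktemp. In practice the program would live in a named file such as a hypothetical report.awk and be run with `awk -f report.awk access.log`:

```shell
# Hypothetical access log lines
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /about.html HTTP/1.1" 200 2048
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /missing.html HTTP/1.1" 404 512'

# Write the awk program to a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
# Count requests per status code
{ status[$9]++ }
END {
    for (code in status)
        print code, status[code]
}
EOF

# -f reads the program from the file instead of the command line
status_report=$(printf '%s\n' "$sample" | awk -f "$script" | sort -n)
rm -f "$script"
echo "$status_report"
# 200 2
# 404 1
```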
This makes complex logic readable and reusable.
The Mental Model
Awk transforms how you see text files. A log line stops being a string you search through and becomes a row in a database you can query.
The field references ($1, $2, $7) are your column names. Pattern matching is your WHERE clause. Associative arrays are GROUP BY. The END block computes your aggregates.
Once you see logs this way, questions that seemed impossible become one-liners:
- "Which IPs are causing the most 500 errors?" Group by IP, filter by status, count.
- "What's our 95th percentile response time?" Collect values, sort, pick the right index.
- "When did traffic spike?" Group by minute, find the maximum.
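The first question in the list can be sketched as follows, assuming hypothetical access log lines with the status code in field 9. The array name err is illustrative:

```shell
# Hypothetical access log lines: three 500 errors and one success
sample='192.0.2.10 - - [15/Jan/2024:10:23:45 +0000] "GET /api HTTP/1.1" 500 128
198.51.100.7 - - [15/Jan/2024:10:24:02 +0000] "GET /api HTTP/1.1" 500 128
192.0.2.10 - - [15/Jan/2024:11:05:10 +0000] "GET /checkout HTTP/1.1" 500 128
203.0.113.5 - - [15/Jan/2024:11:06:41 +0000] "GET /index.html HTTP/1.1" 200 5120'

# Filter by status (WHERE), group by IP (GROUP BY), count, then rank with sort
top_500=$(printf '%s\n' "$sample" |
  awk '$9 == 500 { err[$1]++ } END { for (ip in err) print err[ip], ip }' |
  sort -nr)
echo "$top_500"
# 2 192.0.2.10
# 1 198.51.100.7
```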
Grep finds what you're looking for. Awk answers questions you didn't know you could ask.