My Linux Upskill Challenge - Day 8
Following the Linux Upskill Challenge: Day 8
These are my notes for the Day 8 lesson: Text Analysis Tools
Introduction
This lesson covers the most important Linux commands for viewing, exploring, and manipulating text files. These small but powerful tools are useful when you need to explore and analyze large text files such as application logs, user logs, and other system files.
Sample Log File Setup
For this practice session, I asked ChatGPT to generate an Apache access log sample file with different sources and scenarios to test various combinations and common sysadmin use cases. In real-world scenarios, you'd work with actual access log files.
File location: /var/log/apache2/sample_access.log
192.168.1.10 - - [05/Jun/2025:18:01:01 -0600] "GET / HTTP/1.1" 200 720 "-" "curl/8.4.0"
10.0.0.2 - - [05/Jun/2025:18:01:03 -0600] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0"
172.16.5.5 - - [05/Jun/2025:18:01:04 -0600] "POST /login HTTP/1.1" 302 512 "http://myserver.com" "Mozilla/5.0 (Linux; Android 11; SM-G973U) Chrome/122.0.0.0 Mobile"
192.168.1.15 - - [05/Jun/2025:18:01:07 -0600] "GET /api/data HTTP/1.1" 200 2048 "-" "PostmanRuntime/7.30.0"
203.0.113.12 - - [05/Jun/2025:18:01:10 -0600] "GET /docs/index.html HTTP/1.1" 200 1560 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
192.168.1.10 - - [05/Jun/2025:18:01:14 -0600] "GET / HTTP/1.1" 200 720 "-" "curl/8.4.0"
192.168.1.15 - - [05/Jun/2025:18:01:16 -0600] "GET /favicon.ico HTTP/1.1" 404 230 "-" "PostmanRuntime/7.30.0"
10.0.0.2 - - [05/Jun/2025:18:01:20 -0600] "GET /dashboard HTTP/1.1" 200 1800 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Edge/115.0.0.0"
198.51.100.99 - - [05/Jun/2025:18:01:25 -0600] "GET /admin HTTP/1.1" 403 890 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_6_1) Safari/537.36"
10.10.10.1 - - [05/Jun/2025:18:01:28 -0600] "POST /upload HTTP/1.1" 500 10240 "-" "curl/8.3.0"
172.16.5.5 - - [05/Jun/2025:18:01:31 -0600] "GET /api/stats HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Linux; Android 11) Chrome/123.0.0.0 Mobile"
203.0.113.12 - - [05/Jun/2025:18:01:33 -0600] "GET /dashboard HTTP/1.1" 200 1800 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64) Firefox/121.0"
192.168.1.10 - - [05/Jun/2025:18:01:36 -0600] "GET /secret HTTP/1.1" 401 720 "-" "curl/8.4.0"
198.51.100.99 - - [05/Jun/2025:18:01:40 -0600] "GET /admin HTTP/1.1" 403 890 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_6_1) Safari/537.36"
10.10.10.1 - - [05/Jun/2025:18:01:45 -0600] "GET /static/image.png HTTP/1.1" 200 2048 "-" "Wget/1.21"
10.10.10.2 - - [05/Jun/2025:18:01:48 -0600] "GET /static/image.png HTTP/1.1" 200 2048 "-" "Wget/1.21"
192.168.1.15 - - [05/Jun/2025:18:01:50 -0600] "GET /login HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 11.0; Win64; x64) Chrome/130.0.0.0"
Core Text Analysis Tools
cat - Display File Contents
Dumps the complete content of a file to the terminal:
cat /var/log/apache2/sample_access.log
Best for: Small files or when you need to see the entire content at once.
less - Interactive File Viewer
Opens a text file for interactive navigation with keyboard controls:
less /var/log/apache2/sample_access.log
Navigation shortcuts:
- Arrow keys or j/k: Move down/up line by line
- g: Jump to the beginning of the file
- G: Jump to the end of the file
- /pattern: Search for text patterns
- n/N: Jump to next/previous search result
- q: Quit
Best for: Large files where you need to browse through content interactively.
tail - View File Endings
Displays the last part of a file (default: last 10 lines):
# Basic usage - last 10 lines
tail /var/log/apache2/sample_access.log
# Specify number of lines
tail -n 5 /var/log/apache2/sample_access.log
# Follow file in real-time (great for monitoring logs)
tail -f /var/log/apache2/sample_access.log
Tip: Use Ctrl+C to exit the -f (follow) mode.
Best for: Monitoring active log files or checking the most recent entries.
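The -n option also accepts a leading +, which makes tail count from the start of the file instead of the end. A minimal sketch, assuming a hypothetical five-line temporary file rather than the real access log:

```shell
# Hypothetical five-line file created just for this demonstration
f=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 > "$f"

# -n 2 counts from the end: the last two lines
last_two=$(tail -n 2 "$f")

# -n +4 counts from the start: line 4 through the end of the file
from_fourth=$(tail -n +4 "$f")

echo "$last_two"
echo "$from_fourth"
rm -f "$f"
```

On a five-line file both commands select lines 4 and 5. The + form is handy for skipping a known number of header lines.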
head - View File Beginnings
Similar to tail, but shows the first lines of a file:
# First 10 lines (default)
head /var/log/apache2/sample_access.log
# First 5 lines
head -n 5 /var/log/apache2/sample_access.log
Pattern Matching and Filtering
grep - Search Text Patterns
Filters lines containing specific text patterns (supports regular expressions):
# Basic search
grep "Mozilla" /var/log/apache2/sample_access.log
# Case-insensitive search
grep -i "curl" /var/log/apache2/sample_access.log
# Search for specific requests
grep "favicon.ico" /var/log/apache2/sample_access.log
# Exclude lines matching a pattern
grep -v "Chrome" /var/log/apache2/sample_access.log
# Show line numbers
grep -n "404" /var/log/apache2/sample_access.log
The Power of Pipes (|)
The pipe symbol | is fundamental to the Unix philosophy: it lets you chain commands by passing the output of one command as the input to the next.
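To make the idea concrete, here is a minimal sketch comparing a pipe with the equivalent two-step approach that writes an intermediate file. The three request lines are made up for illustration:

```shell
# Hypothetical sample data in a temporary file
tmp=$(mktemp)
printf '%s\n' 'GET /index.html' 'POST /login' 'GET /dashboard' > "$tmp"

# Without a pipe: save grep's output to a second file, then count its lines
stage=$(mktemp)
grep 'GET' "$tmp" > "$stage"
without_pipe=$(wc -l < "$stage")

# With a pipe: grep's output feeds wc directly, no intermediate file
with_pipe=$(grep 'GET' "$tmp" | wc -l)

echo "without pipe: $without_pipe, with pipe: $with_pipe"
rm -f "$tmp" "$stage"
```

Both approaches count the same two GET lines; the pipe just skips the temporary file.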
Basic Pipe Examples
# View IP addresses and scroll through them
cut -d" " -f1 /var/log/apache2/sample_access.log | less
# Filter for Mozilla user agents, then show only macOS users
grep "Mozilla" /var/log/apache2/sample_access.log | grep "Macintosh"
# Exclude Chrome traffic from Mozilla browsers
grep "Mozilla" /var/log/apache2/sample_access.log | grep -v "Chrome"
# Find Apache processes (system administration example)
ps aux | grep apache2
Data Extraction and Manipulation
cut - Extract Columns/Fields
Extracts specific sections from delimited text:
# Extract first field (space-delimited)
cut -d" " -f1 /var/log/apache2/sample_access.log
# Extract multiple fields
cut -d: -f1,3,5 /etc/passwd
# Extract field ranges
cut -d: -f1-3 /etc/passwd # Fields 1 through 3
cut -d: -f4- /etc/passwd # Field 4 to end
# Extract by character position
cut -c1-10 myfile.txt # First 10 characters
Example with /etc/passwd:
# Show usernames only (tail limits this to the last 10 entries)
tail /etc/passwd | cut -d: -f1
# Show username, UID, and home directory
cut -d: -f1,3,6 /etc/passwd
sort - Arrange Lines
Orders lines alphabetically or numerically:
# Basic alphabetical sort
sort names.txt
# Numeric sort in reverse order
sort -nr numbers.txt
# Sort by specific column (space-delimited)
sort -k2 file.txt
# Sort CSV by third column numerically
sort -t ',' -k3 -n data.csv
uniq - Remove Duplicates
Removes consecutive duplicate lines (usually used with sort):
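Because uniq only compares adjacent lines, unsorted input can leave duplicates behind. A minimal sketch with made-up input shows the difference:

```shell
# Made-up input: "b" appears on lines 1 and 3, so those duplicates are not adjacent
input=$(printf '%s\n' b a b a a)

# uniq alone only collapses the adjacent "a a" pair at the end
unsorted_count=$(printf '%s\n' "$input" | uniq | wc -l)

# sort first so identical lines sit next to each other, then uniq
sorted_count=$(printf '%s\n' "$input" | sort | uniq | wc -l)

echo "uniq alone: $unsorted_count lines; sort | uniq: $sorted_count lines"
```

Here uniq alone leaves four lines (b, a, b, a), while sort | uniq leaves the two distinct values.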
# Remove duplicates
sort file.txt | uniq
# Count occurrences of each line
sort file.txt | uniq -c
# Show only duplicated lines
sort file.txt | uniq -d
Advanced Text Processing
awk - Pattern Scanning and Data Extraction
Powerful tool for processing structured text data:
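By default awk splits each line on whitespace, so it helps to check which field number holds which value before filtering. A minimal sketch using one line from the sample log (the variable names are only for illustration):

```shell
# One line copied from the sample access log
line='192.168.1.15 - - [05/Jun/2025:18:01:16 -0600] "GET /favicon.ico HTTP/1.1" 404 230 "-" "PostmanRuntime/7.30.0"'

# $1 = client IP, $7 = request path, $9 = status code, $10 = response size,
# $NF = last whitespace-separated field (here the quoted user agent)
fields=$(printf '%s\n' "$line" | awk '{print $1, $7, $9, $10, $NF}')

# NF holds the number of fields awk found on the line
count=$(printf '%s\n' "$line" | awk '{print NF}')

echo "$fields"
echo "fields on this line: $count"
```

Note that user agents containing spaces spread over several fields, so NF and $NF vary from line to line, while $9 and $10 stay in place for this log format.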
# Print specific columns (fields)
awk '{print $1}' /var/log/apache2/sample_access.log # IP addresses
awk '{print $4, $5}' /var/log/apache2/sample_access.log # Timestamp and timezone offset
# Filter by conditions
awk '$9 == 404' /var/log/apache2/sample_access.log # 404 errors only
awk '$10 > 1000' /var/log/apache2/sample_access.log # Large responses ($10 is the response size; $NF would be the end of the user agent)
# Custom formatting
awk '{print "IP: " $1 ", Status: " $9}' /var/log/apache2/sample_access.log
sed - Stream Editor
Search, replace, and edit text on-the-fly:
# Replace text (first occurrence per line)
sed 's/curl/CURL/' /var/log/apache2/sample_access.log
# Replace all occurrences
sed 's/curl/CURL/g' /var/log/apache2/sample_access.log
# Delete lines containing specific text
sed '/curl/d' /var/log/apache2/sample_access.log
# Show only matching lines
sed -n '/Mozilla/p' /var/log/apache2/sample_access.log
Warning: Use sed -i (in-place editing) carefully - it modifies the original file!
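One way to reduce the risk of in-place edits is to attach a backup suffix to -i, so sed keeps a copy of the original file. A minimal sketch, assuming a hypothetical agents.txt in a temporary directory rather than a real log:

```shell
# Work on a throwaway file (hypothetical name and content)
workdir=$(mktemp -d)
printf '%s\n' 'agent: curl/8.4.0' 'agent: Wget/1.21' > "$workdir/agents.txt"

# -i.bak edits agents.txt in place and saves the original as agents.txt.bak
sed -i.bak 's/curl/CURL/' "$workdir/agents.txt"

edited=$(cat "$workdir/agents.txt")
backup=$(cat "$workdir/agents.txt.bak")

echo "edited:"
echo "$edited"
echo "backup:"
echo "$backup"
rm -rf "$workdir"
```

If the substitution turns out to be wrong, the .bak file can simply be copied back over the edited one.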
Output Redirection
Save Output to Files (>)
Redirect command output to files instead of the terminal:
# Create/overwrite file
ls -ltr > listing.txt
# Append to file
echo "New entry" >> listing.txt
# Redirect errors to file
command 2> errors.log
# Redirect both output and errors
command > output.log 2>&1
Practical Log Analysis One-Liners
Top Visitor IPs
cut -d' ' -f1 /var/log/apache2/sample_access.log | sort | uniq -c | sort -nr
Request Methods Analysis
cut -d'"' -f2 /var/log/apache2/sample_access.log | cut -d' ' -f1 | sort | uniq -c
User-Agent Analysis
cut -d'"' -f6 /var/log/apache2/sample_access.log | sort | uniq
HTTP Status Code Breakdown
cut -d'"' -f3 /var/log/apache2/sample_access.log | cut -d' ' -f2 | sort | uniq -c
Find Failed Access Attempts
grep -E ' (401|403) ' /var/log/apache2/sample_access.log
Filter by Browser Type
grep -i "chrome" /var/log/apache2/sample_access.log
Most Requested Paths
cut -d'"' -f2 /var/log/apache2/sample_access.log | cut -d' ' -f2 | sort | uniq -c | sort -nr
Find 404 Error Paths
grep ' 404 ' /var/log/apache2/sample_access.log | cut -d'"' -f2 | cut -d' ' -f2
Traffic by Hour
cut -d'[' -f2 /var/log/apache2/sample_access.log | cut -d':' -f2 | sort | uniq -c
Advanced Analysis Examples
Most Active IPs with Request Count
cut -d' ' -f1 /var/log/apache2/sample_access.log | sort | uniq -c | sort -nr | head -10
Browsers Causing Most Server Errors
grep " 5[0-9][0-9] " /var/log/apache2/sample_access.log | cut -d'"' -f6 | sort | uniq -c | sort -nr
Traffic Analysis by Time Period
cut -d'[' -f2 /var/log/apache2/sample_access.log | cut -d']' -f1 | cut -d':' -f1-2 | sort | uniq -c
Large Response Analysis
awk '$10 > 2000 {print $1, $10, $7}' /var/log/apache2/sample_access.log | sort -k2 -nr
Key Takeaways
These text processing tools become incredibly powerful when combined. The Unix philosophy of "do one thing well" means you can chain simple commands to perform complex analysis tasks. Practice chaining these commands with pipes to build your own custom analysis workflows.
Remember: always test your commands on sample data before running them on important production logs!
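As a minimal sketch of that habit, the top-visitor pipeline can be checked against a tiny sample whose answer is easy to count by hand. The three lines below follow the sample log's format (the second user agent is shortened), and the temporary file is only for illustration:

```shell
# Small sample where 192.168.1.10 clearly appears twice
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.1.10 - - [05/Jun/2025:18:01:01 -0600] "GET / HTTP/1.1" 200 720 "-" "curl/8.4.0"
10.0.0.2 - - [05/Jun/2025:18:01:03 -0600] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
192.168.1.10 - - [05/Jun/2025:18:01:14 -0600] "GET / HTTP/1.1" 200 720 "-" "curl/8.4.0"
EOF

# Run the pipeline, keep the top entry, and let awk strip uniq -c's padding
top=$(cut -d' ' -f1 "$sample" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $1, $2}')

# Compare with the hand-counted answer before trusting the pipeline on real logs
if [ "$top" = "2 192.168.1.10" ]; then
    echo "pipeline OK: $top"
else
    echo "unexpected result: $top"
fi
rm -f "$sample"
```

Once the result matches the hand count, the same pipeline can be pointed at the real log file.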
External Resources
- Text processing commands
- OSTechNix grep tutorial
- Where GREP came from
- Sed one-liners
- RegExr - a tool to learn, build and test Regular Expressions
- explainshell.com - write down a command-line to see the help text that matches each argument
Related Notes
- Previous Lesson: My Linux Upskill Challenge: Day 7
- Next Lesson: My Linux Upskill Challenge: Day 9
Daily note: 2025-06-05