3. Grep/Awk/Sed
Materials to download
Grep
Grep (an acronym for “Global Regular Expression Print”) finds the lines that match a given pattern in a file or other input.
Grep format:
grep [options] [regexp] [filename]
Grep use cases:
- Case-insensitive search (grep -i):
grep -i 'mary' mary-lamb.txt
- Whole-word search (grep -w):
grep -w 'as' mary-lamb.txt
- Recursively search through sub-folders (grep -r <pattern> <path>):
grep -r '456' /<your_working_directory>/
- Inverted search (grep -v):
grep -v 'the' mary-lamb.txt
- Print additional (trailing) context lines after match (grep -A <NUM>):
grep -A1 'School' mary-lamb.txt
- Print additional (leading) context lines before match (grep -B <NUM>):
grep -B2 'School' mary-lamb.txt
- Print additional (leading and trailing) context lines before and after the match (grep -C <NUM>):
grep -C3 'School' mary-lamb.txt
- Print the filename for each match (grep -H <pattern> filename):
grep -H 'School' mary-lamb.txt
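The flags above can be tried without the course files. The following is a minimal sketch that assumes a hypothetical three-line file, sample.txt, created in a temporary directory; the file name and its contents are illustrative and are not part of the downloaded materials.

```shell
# Build a tiny hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'Mary had a little lamb' \
  'Its fleece was white as snow' \
  'It followed her to school one day' > "$workdir/sample.txt"

# -i ignores case, so the pattern 'mary' still matches 'Mary'.
insensitive=$(grep -i 'mary' "$workdir/sample.txt")

# -w matches 'as' only as a whole word; the 'as' inside 'was' does not count.
whole_word=$(grep -w 'as' "$workdir/sample.txt")

# -v inverts the match: keep the lines that do NOT contain 'It'.
inverted=$(grep -v 'It' "$workdir/sample.txt")

# -A1 prints each matching line plus one line after it.
with_context=$(grep -A1 'fleece' "$workdir/sample.txt")

printf '%s\n' "$insensitive" "$whole_word" "$inverted" "$with_context"
rm -r "$workdir"
```

Storing each result in a variable is only a convenience for inspecting the output; running the grep commands directly on the file prints the same lines.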
Regexp or regular expression:
A regexp is how we specify the particular pattern we want to find (it could be words or characters).
- The period . matches any single character.
- The asterisk * matches the previous character zero or more times.
grep 'M.a' mary-lamb.txt
grep 'M*y' mary-lamb.txt
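Because * also allows zero repetitions, a pattern such as 'M*y' matches every line that contains a y, with or without an M in front of it. The sketch below illustrates this on a hypothetical word list (the words are made up for the example and do not come from the course files).

```shell
# Hypothetical one-word-per-line input; not one of the course files.
words=$(printf '%s\n' 'Mia' 'Mary' 'My' 'yes' 'lamb')

# '.' stands for exactly one character: 'M', then any character, then 'a'.
dot_matches=$(printf '%s\n' "$words" | grep 'M.a')

# 'M*' may match zero M's, so every line containing 'y' is selected.
star_matches=$(printf '%s\n' "$words" | grep 'M*y')

# Writing 'MM*' requires at least one 'M' immediately before the 'y'.
one_or_more=$(printf '%s\n' "$words" | grep 'MM*y')

printf 'dot: %s\nstar: %s\none-or-more: %s\n' \
  "$dot_matches" "$(echo $star_matches)" "$one_or_more"
```

Here 'M.a' selects only Mia, 'M*y' selects Mary, My and yes, and 'MM*y' selects only My.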
AWK:
awk [options] 'program' [filename]
Named after its authors: Aho, Weinberger, and Kernighan.
- Print everything in the text file:
awk '{print}' BRITE_students.txt
- Now, let’s get more specific. Let’s ask for names only:
awk '{print $1}' BRITE_students.txt
- What if we want to see two columns at the same time, let’s say name and GPA?
awk '{print $1" "$3}' BRITE_students.txt
- Now let’s see what your own info is (exact match on the name):
awk '$1=="Ali"' BRITE_students.txt
- How can we see a particular pattern in our cohort (not an exact match):
awk '/Kat/ {print $0}' BRITE_students.txt
- Question for you: How do you print the name and favorite sport of students whose names contain the letter “u”?
<insert code here>
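- How many students are there whose name begins with “Kat”? (The ^ anchors the match to the start of the name; a bare /Kat/ would match “Kat” anywhere on the line.)
awk '$1 ~ /^Kat/ {++cnt} END {print "Count =", cnt+0}' BRITE_students.txt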
- You can also run loops in awk. This one prints a running sum and exits with status 10 as soon as the sum exceeds 50:
awk 'BEGIN {
sum = 0; for (i = 0; i < 20; ++i) {
sum += i; if (sum > 50) exit(10); else print "Sum =", sum
}
}'
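The difference between matching a pattern anywhere on the line and matching it in one field can be sketched on a small table. The file students_sample.txt and its columns (name, hometown, GPA) are assumptions made for this example; the real BRITE_students.txt may be laid out differently.

```shell
workdir=$(mktemp -d)
# Hypothetical columns: name, hometown, GPA.
printf '%s\n' \
  'Ali Katy 3.6' \
  'Katie Austin 3.9' \
  'Kathy Dallas 3.4' \
  'Omar Houston 3.8' > "$workdir/students_sample.txt"

# Print two columns; the comma inserts the output field separator (a space).
names_gpa=$(awk '{print $1, $3}' "$workdir/students_sample.txt")

# Exact match on the first field prints the whole matching record.
exact=$(awk '$1 == "Ali"' "$workdir/students_sample.txt")

# /Kat/ is tested against the whole line, so the hometown 'Katy' also counts.
anywhere=$(awk '/Kat/ {++n} END {print n+0}' "$workdir/students_sample.txt")

# $1 ~ /^Kat/ restricts the test to names that start with 'Kat'.
name_only=$(awk '$1 ~ /^Kat/ {++n} END {print n+0}' "$workdir/students_sample.txt")

echo "anywhere=$anywhere name_only=$name_only"
rm -r "$workdir"
```

Adding 0 to the counter in the END block makes awk print 0 rather than an empty string when nothing matches.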
SED:
sed [options] 'script' [filename]
SED stands for “Stream EDitor”. It is a widely used text-processing tool on Linux.
- I want to read the first three lines of a text file (note that 3p on its own would print only the third line):
cat BRITE_students.txt | sed -n '1,3p'
- What if we want to replace one word with another:
cat mary-lamb.txt | sed 's/Mary/Maria/g'
- Let’s remove the 1st, 2nd and 5th lines from a text file:
sed -e '1d' -e '2d' -e '5d' BRITE_students.txt
- But what if we had a much longer list and wanted to remove more lines?
echo -e "1d\n2d\n5d" > my_lines.txt
cat my_lines.txt
sed -f my_lines.txt BRITE_students.txt
- Now let’s print everything from the 2nd line to the last line:
cat BRITE_students.txt | sed -n '2,$p'
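Line addresses are easiest to check on input whose line numbers are obvious. The sketch below assumes a hypothetical five-line file, lines.txt, containing the words one to five; it is not part of the course materials.

```shell
workdir=$(mktemp -d)
# Hypothetical input: each line names its own line number.
printf '%s\n' one two three four five > "$workdir/lines.txt"

# A single address selects one line; a range 'M,N' selects lines M through N.
third_only=$(sed -n '3p' "$workdir/lines.txt")
first_three=$(sed -n '1,3p' "$workdir/lines.txt")

# '$' is the address of the last line, so '2,$' means line 2 to the end.
from_second=$(sed -n '2,$p' "$workdir/lines.txt")

# Several -e expressions are applied to every line; here lines 1, 2 and 5 are deleted.
remaining=$(sed -e '1d' -e '2d' -e '5d' "$workdir/lines.txt")

# The trailing 'g' replaces every occurrence on a line, not just the first.
replaced=$(printf 'Mary had a lamb, Mary did\n' | sed 's/Mary/Maria/g')

echo "$replaced"
rm -r "$workdir"
```

Keeping the whole address inside single quotes, as in '2,$p', stops the shell from treating $p as a variable.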