3. Grep/Awk/Sed


Grep

Grep (short for “Global Regular Expression Print”) searches a file or standard input for lines that match a pattern and prints those lines.

Grep format:

grep [options] [regexp] [filename]

Grep use cases:

  1. Case-insensitive search (grep -i):
grep -i 'mary' mary-lamb.txt
  2. Whole-word search (grep -w):
grep -w 'as' mary-lamb.txt
  3. Recursive search through sub-folders (grep -r <pattern> <path>):
grep -r '456' /<your_working_directory>/
  4. Inverted search, printing the lines that do not match (grep -v):
grep -v 'the' mary-lamb.txt
  5. Print additional (trailing) context lines after the match (grep -A <NUM>):
grep -A1 'School' mary-lamb.txt
  6. Print additional (leading) context lines before the match (grep -B <NUM>):
grep -B2 'School' mary-lamb.txt
  7. Print additional (leading and trailing) context lines before and after the match (grep -C <NUM>):
grep -C3 'School' mary-lamb.txt
  8. Print the filename for each match (grep -H <pattern> <filename>):
grep -H 'School' mary-lamb.txt
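
The options above can be tried safely on a throwaway file. The following is a minimal sketch, assuming a hypothetical sample.txt created in a temporary directory; its four lines are invented for illustration and are not the contents of mary-lamb.txt.

```shell
# Build a small, hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
Mary had a little lamb
Its fleece was white as snow
It followed her to School one day
That was against the rule
EOF

# -i: ignore case, so 'mary' matches "Mary".
grep -i 'mary' "$tmpdir/sample.txt"

# Plain search: 'as' also matches inside "was", so two lines are printed.
grep 'as' "$tmpdir/sample.txt"

# -w: 'as' must be a whole word, so only the "white as snow" line is printed.
grep -w 'as' "$tmpdir/sample.txt"

# -v: print the lines that do NOT contain 'the'.
grep -v 'the' "$tmpdir/sample.txt"

# -A1: print the matching line plus one line after it.
grep -A1 'School' "$tmpdir/sample.txt"
```

Comparing the plain 'as' search with the -w version shows what whole-word matching changes: the plain search also picks up the line that only contains “was”.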

Regexp or regular expression:

A regexp is how we describe the pattern we want to find (it could be words or characters).

  • The period . matches any single character.
  • The asterisk * matches the preceding character zero or more times.
grep 'M.a' mary-lamb.txt
grep 'M*y' mary-lamb.txt
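
The two operators are easy to misread, so here is a minimal sketch of how they behave. It assumes a hypothetical five-word list invented for this example, fed to grep through a pipe instead of a file.

```shell
# Hypothetical word list, one word per line, invented for this example.
words=$(printf '%s\n' 'Mary' 'Moa' 'My' 'y' 'lamb')

# 'M.a' needs an M, then exactly one character of any kind, then an a.
# "Mary" does not qualify: the character two places after its M is r, not a.
printf '%s\n' "$words" | grep 'M.a'    # prints: Moa

# 'M*y' means "zero or more M, then y", so any line containing a y matches.
printf '%s\n' "$words" | grep 'M*y'    # prints: Mary, My, y

# Combining both: 'M.*y' needs an M, then anything (or nothing), then a y.
printf '%s\n' "$words" | grep 'M.*y'   # prints: Mary, My
```

Note that * on its own is not a wildcard as it is in filename patterns; it only repeats whatever comes directly before it.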

AWK:

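awk [options] 'program' [filename]

AWK is named after its authors: Aho, Weinberger, and Kernighan. It reads the input line by line, splits each line into columns ($1, $2, and so on; $0 is the whole line), and runs the program on each line.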

  • Print everything in the text file:
awk '{print}' BRITE_students.txt

  • Now, let’s get more specific. Let’s ask for names only (the first column):
awk '{print $1}' BRITE_students.txt

  • What if we want to see two columns at the same time, say name and GPA?
awk '{print $1" "$3}' BRITE_students.txt
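
Column selection can be sketched on a tiny file. The example below assumes a hypothetical students.txt with four whitespace-separated columns (name, surname, GPA, sport); this layout and its rows are invented for illustration and may not match BRITE_students.txt.

```shell
# Hypothetical two-row student file, invented for this example.
tmpdir=$(mktemp -d)
cat > "$tmpdir/students.txt" <<'EOF'
Ali Rivera 3.6 soccer
Katie Chen 3.9 tennis
EOF

# $1 is the first whitespace-separated column of each line.
awk '{print $1}' "$tmpdir/students.txt"

# A comma in print inserts the output field separator (a space by default),
# so this gives the same result as print $1" "$3.
awk '{print $1, $3}' "$tmpdir/students.txt"

# OFS changes that separator, here to a tab.
awk -v OFS='\t' '{print $1, $3}' "$tmpdir/students.txt"
```

Using the comma form keeps the program readable when more columns are added, and the separator can then be changed in one place.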

  • Now let’s see what your info is (exact match on the first column):
awk '$1=="Ali"' BRITE_students.txt

  • How can we find a particular pattern in our cohort (not an exact match)?
awk '/Kat/ {print $0}' BRITE_students.txt

  • Question for you: How do you print the name and favorite sport of students whose names contain the letter “u”?
<insert code here>

  • How many students are there whose name begins with “Kat”? (^ anchors the match to the start of the first column.)
awk '$1 ~ /^Kat/ {++cnt} END {print "Count =", cnt+0}' BRITE_students.txt
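
Whether a pattern is tested against the whole line or against one column changes the count. The sketch below illustrates this, assuming a hypothetical students.txt (name, surname, GPA, sport) whose rows are invented for this example.

```shell
# Hypothetical student file; one surname also happens to contain "Kat".
tmpdir=$(mktemp -d)
cat > "$tmpdir/students.txt" <<'EOF'
Ali Rivera 3.6 soccer
Katie Chen 3.9 tennis
Kathy Osei 3.4 swimming
Sunita Katz 3.5 hockey
EOF

# /Kat/ is tested against the whole line, so "Sunita Katz" is counted too.
awk '/Kat/ {++cnt} END {print "Count =", cnt+0}' "$tmpdir/students.txt"
# prints: Count = 3

# $1 ~ /^Kat/ tests only the first column, anchored at its start.
awk '$1 ~ /^Kat/ {++cnt} END {print "Count =", cnt+0}' "$tmpdir/students.txt"
# prints: Count = 2
```

The cnt+0 in the END block makes awk print 0 instead of an empty string when nothing matched.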

  • You can also run loops in awk. This one adds up 0, 1, 2, and so on, prints the running sum, and exits with status 10 once the sum exceeds 50:
awk 'BEGIN {
   sum = 0; for (i = 0; i < 20; ++i) {
       sum += i; if (sum > 50) exit(10); else print "Sum =", sum
   }
}'
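
To see what the loop does from the shell's point of view, the sketch below runs the same kind of loop and captures both its output and its exit status. The variable names output and status are only illustrative.

```shell
# Run the loop inside a BEGIN block (no input file is needed) and keep
# the printed lines in "output" and awk's exit status in "status".
status=0
output=$(awk 'BEGIN {
    sum = 0
    for (i = 0; i < 20; ++i) {
        sum += i
        if (sum > 50) exit 10   # stop early; 10 becomes the exit status
        print "Sum =", sum
    }
}') || status=$?

# The last sum printed before the limit was crossed.
printf '%s\n' "$output" | tail -n 1    # prints: Sum = 45

# The value passed to exit is visible to the shell.
echo "exit status: $status"            # prints: exit status: 10
```

The loop stops at i = 10, where the sum would become 55, so ten lines are printed in total.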

SED:

sed [options] 'script' [filename]

SED stands for “Stream EDitor”. It is a widely used text-processing tool on Linux that applies editing commands to each line of its input and prints the result.

  • I want to read the first three lines of a text file:
cat BRITE_students.txt | sed -n '1,3p'

  • What if we want to replace one word with another (here, every “Mary” becomes “Maria”)?
cat mary-lamb.txt | sed 's/Mary/Maria/g'
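
The trailing g in the substitution matters. Here is a minimal sketch of the difference, assuming a hypothetical one-line input invented for this example:

```shell
# Hypothetical input line in which the word to replace appears twice.
line='Mary had a lamb, and Mary loved the lamb'

# Without g, only the first match on each line is replaced.
printf '%s\n' "$line" | sed 's/Mary/Maria/'
# prints: Maria had a lamb, and Mary loved the lamb

# With g ("global"), every match on the line is replaced.
printf '%s\n' "$line" | sed 's/Mary/Maria/g'
# prints: Maria had a lamb, and Maria loved the lamb
```

In both cases sed writes the edited text to standard output; when it is given a file this way, the file itself is left unchanged.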

  • Let’s remove the 1st, 2nd and 5th lines from a text file:
sed -e '1d' -e '2d' -e '5d' BRITE_students.txt

  • But what if we had a much longer list and wanted to remove more lines? Put one sed command per line in a script file and pass it with -f:
printf '1d\n2d\n5d\n' > my_lines.txt
cat my_lines.txt
sed -f my_lines.txt BRITE_students.txt

  • Now let’s print from the 2nd line to the last line ($ is the address of the last line):
cat BRITE_students.txt | sed -n '2,$p'
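
Line addresses are easiest to check on a file whose lines name themselves. The sketch below assumes a hypothetical five-line lines.txt created in a temporary directory for this example.

```shell
# Hypothetical five-line file: "line 1" through "line 5".
tmpdir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' > "$tmpdir/lines.txt"

# A single number addresses exactly one line: this prints only line 3.
sed -n '3p' "$tmpdir/lines.txt"

# Two numbers separated by a comma address a range: lines 1 to 3.
sed -n '1,3p' "$tmpdir/lines.txt"

# $ stands for the last line: lines 2 to 5.
sed -n '2,$p' "$tmpdir/lines.txt"

# d deletes the addressed lines and prints the rest: lines 3 and 4 remain.
sed -e '1d' -e '2d' -e '5d' "$tmpdir/lines.txt"
```

The -n option turns off sed's default printing, so only the lines selected by p appear; the d example does not need it because everything that is not deleted is printed anyway.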