4. Grep/Awk/Sed

Grep

Grep (Global Regular Expression Print) searches a file or standard input for lines that match a regular expression and prints those lines.

grep [options] [regexp] [filename]

Use cases

  • Case-insensitive search (grep -i):

grep -i 'mary' mary-lamb.txt
  • Whole-word search (grep -w):

grep -w 'as' mary-lamb.txt
  • Inverted search (grep -v):

grep -v 'the' mary-lamb.txt
  • Print additional (trailing) context lines after match (grep -A <NUM>):

grep -A1 'eager' mary-lamb.txt
  • Print additional (leading) context lines before match (grep -B <NUM>):

grep -B2 'fleece' mary-lamb.txt
  • Print additional (leading and trailing) context lines before and after the match (grep -C <NUM>):

grep -C3 'appear' mary-lamb.txt
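
The options above can be tried on a small file to see exactly which lines each one selects. The sketch below is only an illustration: it builds a four-line stand-in file (an assumption, since the real mary-lamb.txt may differ) and stores the result of each search in a variable.

```shell
# Hypothetical stand-in for mary-lamb.txt; the real file may differ.
tmp=$(mktemp)
printf '%s\n' \
  'Mary had a little lamb,' \
  'its fleece was white as snow;' \
  'and everywhere that Mary went' \
  'the lamb was sure to go.' > "$tmp"

# -i: 'mary' also matches 'Mary' (lines 1 and 3)
insensitive=$(grep -i 'mary' "$tmp")

# -w: 'as' must be a whole word, so 'was' on its own does not match
whole_word=$(grep -w 'as' "$tmp")

# -v: keep only the lines that do NOT contain 'lamb'
inverted=$(grep -v 'lamb' "$tmp")

# -A1: the matching line plus one line of trailing context
after=$(grep -A1 'fleece' "$tmp")

printf '%s\n' "$insensitive" '---' "$whole_word" '---' "$inverted" '---' "$after"
rm -f "$tmp"
```

Note that the whole-word search returns only the second line: the last line contains "was" but not "as" as a separate word.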

Exercises

  • Display all the lines of the file mary-lamb.txt that do NOT contain the word lamb.

  • Display only those lines of the file mary-lamb.txt that contain the word he. The search should NOT be sensitive to case.

  • Display only those lines of the file mary-lamb.txt that contain either of the words lamb or Mary. The search should NOT be sensitive to case.

AWK

Named after the authors: Aho, Weinberger, Kernighan

awk [options] 'program' [filename]

Use cases

  • Print everything in the text file:

awk '{print}' BRITE_students.txt
  • Now, let’s get more specific. Let’s ask for first names only:

awk '{print $1}' BRITE_students.txt
  • What if we want to see two columns at the same time (e.g. first and last names)?

awk '{print $1" "$2}' BRITE_students.txt
  • Now let’s see what your info is (exact match):

awk '$1=="Anastasia"' BRITE_students.txt
  • How can we see a particular pattern in our cohort (e.g. students in Campbell lab)?

awk '/Campbell/ {print $0}' BRITE_students.txt
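  • How many students are there whose first name begins with “B”? The pattern is anchored with ^ and applied to the first field only, because a bare /B/ would also count lines that contain a “B” anywhere else:

awk '$1 ~ /^B/ {++cnt} END {print "Count =", cnt+0}' BRITE_students.txt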
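
The field and pattern examples above can be checked on a tiny table. The sketch below assumes a hypothetical three-column layout (first name, last name, lab); the real BRITE_students.txt may have different columns and contents.

```shell
# Hypothetical stand-in for BRITE_students.txt: first name, last name, lab.
tmp=$(mktemp)
printf '%s\n' \
  'Anastasia Ivanova Campbell' \
  'Ben Turner Lopez' \
  'Bianca Rossi Campbell' \
  'Carlos Brown Nguyen' > "$tmp"

# Two columns; the comma inserts the output field separator (a space by default)
names=$(awk '{print $1, $2}' "$tmp")

# Exact match on the first field prints the whole line
exact=$(awk '$1 == "Anastasia"' "$tmp")

# First names of the students whose third field is Campbell
lab=$(awk '$3 == "Campbell" {print $1}' "$tmp")

# /B/ counts every line with a capital B anywhere on it
any_b=$(awk '/B/ {++n} END {print n+0}' "$tmp")

# $1 ~ /^B/ counts only first names that start with B
first_b=$(awk '$1 ~ /^B/ {++n} END {print n+0}' "$tmp")

printf 'any B: %s, first name starts with B: %s\n' "$any_b" "$first_b"
rm -f "$tmp"
```

With this sample the two counts differ (3 versus 2) because "Carlos Brown" has a capital B in the last name only. Writing n+0 in the END block makes awk print 0 rather than an empty string when nothing matches.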

Exercises

  • How do you print the first name and faculty advisor of students whose last names contain the letter u (file BRITE_students.txt)?

SED

SED stands for “Stream EDitor”. It is a widely used text-processing tool on Linux and other Unix-like systems.

sed [options] 'script' [filename]

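Use cases

  • Replacing or substituting a string: sed is most often used to replace text. The simple command below replaces the word “unix” with “linux” in the contents of the file. The result is written to standard output; the file itself is left unchanged unless the -i (in-place) option is used.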

sed 's/unix/linux/' geekfile.txt

Here the s specifies the substitution operation. The / are delimiters. The unix is the search pattern and the linux is the replacement string.

By default, sed replaces only the first occurrence of the pattern in each line; the second, third, and later occurrences in the line are left alone.

  • Replacing the nth occurrence of a pattern in a line: Put a number (1, 2, and so on) after the final delimiter to replace only that occurrence of the pattern in a line. The command below replaces the second occurrence of the word unix with linux in each line.

sed 's/unix/linux/2' geekfile.txt
  • Replacing all occurrences of the pattern in a line: The g flag (global replacement) tells sed to replace every occurrence of the string in the line.

sed 's/unix/linux/g' geekfile.txt
  • Replacing from the nth occurrence to the last occurrence in a line: Combine a number with the g flag to replace every occurrence from the nth one onward. The following command replaces the third, fourth, fifth, … occurrence of unix with linux in each line. This combination is defined by GNU sed; other sed implementations may treat it differently or reject it.

sed 's/unix/linux/3g' geekfile.txt
  • Replacing a string on a specific line number: You can restrict the substitution to a single line by putting the line number in front of the s command. For example:

sed '3 s/unix/linux/' geekfile.txt

The above sed command replaces the string only on the third line.
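
The effect of each substitution form is easiest to see on one line that contains the pattern several times. The sketch below feeds sed from printf instead of geekfile.txt (whose contents are not shown here), so the input line is an assumption made for illustration. It leaves out the 3g form to stay portable across sed implementations.

```shell
# Hypothetical input line with four occurrences of the pattern
line='unix one unix two unix three unix four'

# No flag: only the first occurrence on the line is replaced
first=$(printf '%s\n' "$line" | sed 's/unix/linux/')

# Numeric flag 2: only the second occurrence is replaced
second=$(printf '%s\n' "$line" | sed 's/unix/linux/2')

# g flag: every occurrence is replaced
all=$(printf '%s\n' "$line" | sed 's/unix/linux/g')

# Address 3: the substitution runs on the third input line only
third_line=$(printf '%s\n' unix unix unix | sed '3 s/unix/linux/')

printf '%s\n' "$first" "$second" "$all" "$third_line"
```

Because each command reads from a pipe, nothing on disk is modified; the same holds when sed is given a filename without -i.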

  • Deleting lines: sed can also drop lines from its output with the d command. To delete a particular line, line 4 in this example:

sed '4d' geekfile.txt
  • To delete the last line:

sed '$d' geekfile.txt
  • To delete lines 2 through 4:

sed '2,4d' geekfile.txt
  • To delete from line 3 through the last line:

sed '3,$d' geekfile.txt
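
The four delete addresses above can be compared on a short numbered input. This is a minimal sketch that uses five made-up lines in place of geekfile.txt.

```shell
# Hypothetical five-line input, one word per line
input=$(printf '%s\n' one two three four five)

# 4d: drop line 4 only
del_4=$(printf '%s\n' "$input" | sed '4d')

# $d: drop the last line (single quotes keep the shell from expanding $d)
del_last=$(printf '%s\n' "$input" | sed '$d')

# 2,4d: drop lines 2 through 4
del_range=$(printf '%s\n' "$input" | sed '2,4d')

# 3,$d: drop everything from line 3 to the end
del_tail=$(printf '%s\n' "$input" | sed '3,$d')

printf '%s\n' "$del_4" '---' "$del_last" '---' "$del_range" '---' "$del_tail"
```

The single quotes around the sed script matter here: inside double quotes the shell would try to expand $d as a variable before sed ever saw it.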

Exercises

  • Replace the word Mary with Maria in the file mary-lamb.txt.

  • Remove the 1st, 2nd and 5th lines from the file mary-lamb.txt.