4. Grep/Awk/Sed¶
Materials to download¶
Grep¶
Grep (Global Regular Expression Print) searches a given file or input for lines that match a pattern (a string or regular expression) and prints them.
grep [options] [regexp] [filename]
Usecases¶
Case-insensitive search (grep -i):
grep -i 'mary' mary-lamb.txt
Whole-word search (grep -w):
grep -w 'as' mary-lamb.txt
Inverted search (grep -v):
grep -v 'the' mary-lamb.txt
Print additional (trailing) context lines after match (grep -A <NUM>):
grep -A1 'eager' mary-lamb.txt
Print additional (leading) context lines before match (grep -B <NUM>):
grep -B2 'fleece' mary-lamb.txt
Print additional (leading and trailing) context lines before and after the match (grep -C <NUM>):
grep -C3 'appear' mary-lamb.txt
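To see how these options change the output, here is a minimal sketch on a throwaway file. The file contents and the variable names (sample, ci, ww, inv, after) are made up for illustration and are not part of mary-lamb.txt.

```shell
# Create a small temporary sample file (hypothetical content).
sample=$(mktemp)
printf '%s\n' 'The cat sat' 'on the mat' 'as quick' 'a fast CAT' > "$sample"

# -i: "cat" matches both "cat" and "CAT".
ci=$(grep -i 'cat' "$sample")
# -w: "as" must be a whole word, so "fast" does not count.
ww=$(grep -w 'as' "$sample")
# -v: keep lines WITHOUT "the"; the match is case-sensitive, so "The cat sat" stays.
inv=$(grep -v 'the' "$sample")
# -A1: the matching line plus one line after it.
after=$(grep -A1 'mat' "$sample")

printf '%s\n' "-i:" "$ci" "-w:" "$ww" "-v:" "$inv" "-A1:" "$after"
rm -f "$sample"
```

Note that -v on its own is still case-sensitive; combine it with -i (grep -vi) when case should be ignored.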
Exercises¶
Display all the lines of the file mary-lamb.txt that do NOT contain the word lamb.
Display only those lines of the file mary-lamb.txt that contain the word he in them. The search should NOT be sensitive to case.
Display only those lines of the file mary-lamb.txt that contain either of the words lamb or Mary. The search should not be sensitive to case.
AWK¶
Named after the authors: Aho, Weinberger, Kernighan
awk [options] 'program' [filename]
Usecases¶
Print everything in the text file:
awk '{print}' BRITE_students.txt
Now let’s get more specific. Let’s ask for first names only:
awk '{print $1}' BRITE_students.txt
What if we want to see two columns at the same time (e.g. first and last names)?
awk '{print $1" "$2}' BRITE_students.txt
Now let’s see what your info is (exact match):
awk '$1=="Anastasia"' BRITE_students.txt
How can we see a particular pattern in our cohort (e.g. students in Campbell lab)?
awk '/Campbell/ {print $0}' BRITE_students.txt
How many students are there whose first name begins with “B”?
awk '$1 ~ /^B/ {++cnt} END {print "Count =", cnt+0}' BRITE_students.txt
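The difference between matching a pattern anywhere on the line and matching it in one field is easy to miss. The sketch below assumes a whitespace-separated file with first name, last name, and lab in columns 1 to 3. The roster contents and variable names are hypothetical, and BRITE_students.txt may be laid out differently.

```shell
# Hypothetical roster: first name, last name, lab.
roster=$(mktemp)
printf '%s\n' 'Ada Lovelace Babbage' 'Bob Burns Campbell' 'Cleo Brown Campbell' > "$roster"

# Exact match on one field: print first and last name where column 3 is "Campbell".
names=$(awk '$3 == "Campbell" {print $1, $2}' "$roster")
# Anchored match on one field: count rows whose first name starts with "B".
# cnt+0 forces a numeric 0 when nothing matches.
count=$(awk '$1 ~ /^B/ {++cnt} END {print cnt+0}' "$roster")
# Unanchored match on the whole line: counts any row containing a "B" anywhere.
loose=$(awk '/B/ {++cnt} END {print cnt+0}' "$roster")

printf '%s\n' "$names" "starts with B: $count" "contains B: $loose"
rm -f "$roster"
```

Here the anchored count is 1 while the unanchored count is 3, because every row contains a capital B somewhere.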
Exercises¶
How do you print the first name and faculty advisor of students whose last names contain the letter u (file BRITE_students.txt)?
SED¶
SED stands for “Stream EDitor”. It is a widely used text processing Linux tool.
sed [options] 'script' [filename]
Usecases¶
Replacing or substituting string:
The sed command is mostly used to replace text in a file. The simple sed command below replaces the word “unix” with “linux” and prints the result; the file itself is not modified.
sed 's/unix/linux/' geekfile.txt
Here s specifies the substitution operation, the / characters are delimiters, unix is the search pattern, and linux is the replacement string.
By default, the sed command replaces only the first occurrence of the pattern in each line; it does not replace the second, third, … occurrence in the line.
Replacing the nth occurrence of a pattern in a line: Use the /1, /2, etc. flags to replace the first, second, etc. occurrence of a pattern in a line. The command below replaces the second occurrence of the word unix with linux in a line.
sed 's/unix/linux/2' geekfile.txt
Replacing all the occurrences of the pattern in a line: The substitute flag /g (global replacement) tells the sed command to replace all the occurrences of the string in the line.
sed 's/unix/linux/g' geekfile.txt
Replacing from nth occurrence to all occurrences in a line: Use the combination of /1, /2, etc. and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces the third, fourth, fifth, … unix word with linux in a line. This combination is defined by GNU sed; POSIX leaves its behavior unspecified, so other sed implementations may differ.
sed 's/unix/linux/3g' geekfile.txt
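The substitution flags are easiest to compare side by side on a single line of input. This is a small sketch that pipes an illustrative string into sed instead of reading geekfile.txt. The variable names are hypothetical.

```shell
# One line containing the pattern four times (illustrative input).
line='unix unix unix unix'

# No flag: only the first occurrence is replaced.
first=$(printf '%s\n' "$line" | sed 's/unix/linux/')
# Numeric flag 2: only the second occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/unix/linux/2')
# g flag: every occurrence is replaced.
all=$(printf '%s\n' "$line" | sed 's/unix/linux/g')

printf '%s\n' "$first" "$second" "$all"
# Prints:
# linux unix unix unix
# unix linux unix unix
# linux linux linux linux
```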
Replacing string on a specific line number: You can restrict the sed command to replace the string on a specific line number. An example is:
sed '3 s/unix/linux/' geekfile.txt
The above sed command replaces the string only on the third line.
Deleting lines from a particular file:
The sed command can also be used for deleting lines from a particular file. To delete a particular line, e.g. line 4 in this example:
sed '4d' geekfile.txt
To delete the last line:
sed '$d' geekfile.txt
To delete lines 2 through 4:
sed '2,4d' geekfile.txt
To delete from line 3 through the last line:
sed '3,$d' geekfile.txt
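The four deletion addresses can be checked on a short numbered input. This sketch uses five made-up lines piped into sed. The variable names are hypothetical. As with substitution, sed only prints the result and does not change its input.

```shell
# Five lines of illustrative input: one .. five.
input=$(printf '%s\n' one two three four five)

# Delete line 4 only.
drop4=$(printf '%s\n' "$input" | sed '4d')
# Delete the last line; the single quotes stop the shell from expanding $d.
drop_last=$(printf '%s\n' "$input" | sed '$d')
# Delete lines 2 through 4.
drop_2_4=$(printf '%s\n' "$input" | sed '2,4d')
# Delete from line 3 through the last line.
drop_3_end=$(printf '%s\n' "$input" | sed '3,$d')

printf '%s\n' "4d:" "$drop4" '$d:' "$drop_last" "2,4d:" "$drop_2_4" '3,$d:' "$drop_3_end"
```

The quoting matters: inside double quotes the shell would try to expand $d as a variable before sed ever sees the script.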
Exercises¶
Replace the word Mary with Maria in the file mary-lamb.txt.
Remove the 1st, 2nd and 5th lines from the file mary-lamb.txt.