4. Grep/Awk/Sed¶
Materials to download¶
Grep¶
Grep (Global Regular Expression Print) searches a file or standard input for lines that match a pattern and prints the matching lines.
grep [options] [regexp] [filename]
Usecases¶
Case-insensitive search (grep -i):
grep -i 'mary' mary-lamb.txt
Whole-word search (grep -w):
grep -w 'as' mary-lamb.txt
Inverted search, printing only the lines that do not match (grep -v):
grep -v 'the' mary-lamb.txt
Print <NUM> lines of trailing context after each match (grep -A <NUM>):
grep -A1 'eager' mary-lamb.txt
Print <NUM> lines of leading context before each match (grep -B <NUM>):
grep -B2 'fleece' mary-lamb.txt
Print <NUM> lines of context both before and after each match (grep -C <NUM>):
grep -C3 'appear' mary-lamb.txt
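The options above can be tried on a small file whose contents are known in advance. The following is a minimal sketch: the four-line sample file is hypothetical and only stands in for mary-lamb.txt, and the variable names are chosen for illustration.

```shell
# Hypothetical sample file; it stands in for mary-lamb.txt.
tmp=$(mktemp)
printf '%s\n' \
  'Mary had a little lamb' \
  'Its fleece was white as snow' \
  'And everywhere that MARY went' \
  'The lamb was sure to go' > "$tmp"

# -i: match "mary" regardless of case (lines 1 and 3).
insensitive=$(grep -i 'mary' "$tmp")

# -w: match "as" only as a whole word, so "was" alone does not count (line 2).
whole_word=$(grep -w 'as' "$tmp")

# -v: keep the lines that do NOT contain "lamb" (lines 2 and 3).
inverted=$(grep -v 'lamb' "$tmp")

# -A1: print the matching line plus one line after it (lines 2 and 3).
after=$(grep -A1 'fleece' "$tmp")

# -B1: print the matching line plus one line before it (lines 3 and 4).
before=$(grep -B1 'sure' "$tmp")

printf '%s\n' "-i:" "$insensitive" "-w:" "$whole_word" "-v:" "$inverted" \
  "-A1:" "$after" "-B1:" "$before"
rm -f "$tmp"
```

Building the file with printf keeps the example independent of the course materials, so the lines each option selects can be checked by eye.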
Exercises¶
Display all the lines of the file mary-lamb.txt that do NOT contain the word lamb.
Display only those lines of the file mary-lamb.txt that contain the word he in them. The search should NOT be sensitive to case.
Display only those lines of the file mary-lamb.txt that contain either the word lamb or the word Mary. The search should not be sensitive to case.
AWK¶
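AWK is a text-processing language named after its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. An AWK program is a list of pattern {action} rules that are applied to every input line.
awk [options] 'program' [filename]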
Usecases¶
Print everything in the text file:
awk '{print}' BRITE_students.txt
Now let’s get more specific and ask for first names only:
awk '{print $1}' BRITE_students.txt
What if we want to see two columns at the same time (e.g. first and last names)?
awk '{print $1" "$2}' BRITE_students.txt
Now let’s see what your info is (exact match):
awk '$1=="Anastasia"' BRITE_students.txt
How can we see a particular pattern in our cohort (e.g. students in Campbell lab)?
awk '/Campbell/ {print $0}' BRITE_students.txt
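How many students are there whose first name begins with “B”? The pattern is anchored to the start of the first field, because a bare /B/ would match a “B” anywhere on the line:
awk '$1 ~ /^B/ {++cnt} END {print "Count =", cnt+0}' BRITE_students.txt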
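The difference between printing fields, matching one field exactly, and matching a pattern anywhere on the line is easiest to see on a small table. The sketch below assumes a three-column layout (first name, last name, lab); the sample rows are hypothetical and are not the contents of BRITE_students.txt.

```shell
# Hypothetical three-column sample: first name, last name, lab.
tmp=$(mktemp)
printf '%s\n' \
  'Anastasia Ivanova Campbell' \
  'Ben Carter Nguyen' \
  'Bella Abbot Campbell' \
  'Carlos Burke Lopez' > "$tmp"

# Print the first two whitespace-separated fields of every line.
names=$(awk '{print $1" "$2}' "$tmp")

# Exact match on field 1; with no action given, awk prints the whole line.
exact=$(awk '$1=="Anastasia"' "$tmp")

# /B/ matches an uppercase "B" anywhere on the line: Ben, Bella and Burke.
anywhere=$(awk '/B/ {n++} END {print n+0}' "$tmp")

# Anchoring to the start of field 1 counts only first names: Ben and Bella.
first_name=$(awk '$1 ~ /^B/ {n++} END {print n+0}' "$tmp")

printf '%s\n' "$names" "$exact" "B anywhere: $anywhere" "first name starts with B: $first_name"
rm -f "$tmp"
```

Writing n+0 in the END rule prints 0 instead of an empty string when no line matches.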
Exercises¶
How do you print the first name and faculty advisor of students whose last names contain the letter u (file BRITE_students.txt)?
SED¶
SED stands for “Stream EDitor”. It is a widely used text processing Linux tool.
sed [options] 'script' [filename]
Usecases¶
Replacing or substituting a string: The sed command is mostly used to replace text in a file. The simple sed command below replaces the word “unix” with “linux”. The result is written to standard output; the file itself is not modified.
sed 's/unix/linux/' geekfile.txt
Here the s specifies the substitution operation. The / characters are delimiters. unix is the search pattern and linux is the replacement string.
By default, the sed command replaces only the first occurrence of the pattern in each line; it does not replace the second, third, … occurrence in the line.
Replacing the nth occurrence of a pattern in a line: Use a numeric flag such as /1 or /2 to replace the first or second occurrence of a pattern in a line. The command below replaces the second occurrence of the word unix with linux in each line.
sed 's/unix/linux/2' geekfile.txt
Replacing all occurrences of the pattern in a line: The substitute flag /g (global replacement) tells the sed command to replace all occurrences of the string in the line.
sed 's/unix/linux/g' geekfile.txt
Replacing from the nth occurrence to all occurrences in a line: Combine a numeric flag (/1, /2, etc.) with /g to replace every occurrence from the nth one onward. The following sed command replaces the third, fourth, fifth, … occurrence of unix with linux in each line. POSIX does not define what a number combined with g means; the behavior described here is that of GNU sed.
sed 's/unix/linux/3g' geekfile.txt
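The effect of the substitution flags can be compared side by side on one line of input. This is a small sketch: the input line and the variable names are made up for illustration, and the last command assumes GNU sed, so it falls back to a placeholder message when the installed sed rejects the 3g flag.

```shell
# One hypothetical input line with four occurrences of the pattern.
line='unix unix unix unix'

# No flag: only the first occurrence is replaced.
first=$(printf '%s\n' "$line" | sed 's/unix/linux/')     # linux unix unix unix

# Flag 2: only the second occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/unix/linux/2')   # unix linux unix unix

# Flag g: every occurrence is replaced.
all=$(printf '%s\n' "$line" | sed 's/unix/linux/g')      # linux linux linux linux

# Flag 3g (GNU sed): the third occurrence and everything after it.
from_third=$(printf '%s\n' "$line" | sed 's/unix/linux/3g' 2>/dev/null) \
  || from_third='(3g is not supported by this sed)'

printf '%s\n' "$first" "$second" "$all" "$from_third"
```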
Replacing a string on a specific line number: You can restrict the sed command to replace the string only on a specific line by putting the line number in front of the s command. An example is:
sed '3 s/unix/linux/' geekfile.txt
The above sed command replaces the string only on the third line.
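A line address can be checked on a file in which every line contains the pattern. The sketch below uses a hypothetical four-line file in place of geekfile.txt. It also reads the file back afterwards to show that sed wrote its result to standard output and left the file untouched.

```shell
# Hypothetical sample file; every line contains the pattern.
tmp=$(mktemp)
printf '%s\n' 'unix one' 'unix two' 'unix three' 'unix four' > "$tmp"

# The address 3 limits the substitution to the third input line.
third_only=$(sed '3 s/unix/linux/' "$tmp")

# The file on disk still holds the original text.
unchanged=$(cat "$tmp")

printf '%s\n' "sed output:" "$third_only" "file contents:" "$unchanged"
rm -f "$tmp"
```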
Deleting lines from a particular file: The sed command can also be used to delete lines. The deleted lines are left out of the output; the input file is not changed. To delete a particular line, e.g. line 4 in this example:
sed '4d' geekfile.txt
To delete the last line:
sed '$d' geekfile.txt
To delete lines 2 through 4:
sed '2,4d' geekfile.txt
To delete everything from line 3 through the last line:
sed '3,$d' geekfile.txt
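The four delete commands can be compared on a file whose lines are numbered, so that the output shows directly which lines survive. This is a sketch on a hypothetical five-line file rather than geekfile.txt.

```shell
# Hypothetical sample file with five numbered lines.
tmp=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' > "$tmp"

no_fourth=$(sed '4d' "$tmp")     # keeps lines 1, 2, 3 and 5
no_last=$(sed '$d' "$tmp")       # keeps lines 1 to 4 ($ addresses the last line)
no_2_to_4=$(sed '2,4d' "$tmp")   # keeps lines 1 and 5
first_two=$(sed '3,$d' "$tmp")   # keeps lines 1 and 2

printf '%s\n' "4d:" "$no_fourth" "\$d:" "$no_last" "2,4d:" "$no_2_to_4" "3,\$d:" "$first_two"
rm -f "$tmp"
```

The scripts are in single quotes so that the shell does not try to expand $d as a variable.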
Exercises¶
Replace the word Mary with Maria in the file mary-lamb.txt.
Remove the 1st, 2nd and 5th lines from the file mary-lamb.txt.