Awk is a command-line tool for processing text files. We give it a text file and it performs an action on each line: for example, printing the whole line, printing part of it, or reordering its parts.
As a simple example, here’s a CSV of people’s last name, first name, age, and nationality:
Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger
We want to run Awk and have it print their first and last names, i.e.:
Polly Branch
Luis Moore
Julia Conley
Let’s do it. Firstly, download humans.csv to your Downloads folder. Then open a terminal and go to Downloads:
cd ~/Downloads
ls humans.csv
This should print ‘humans.csv’ if the file exists.
To use Awk, you type awk, then tell it what you want it to do inside single quotes, then give it a file. The simplest example is telling it to print the whole file, like this:
awk '{print $0}' humans.csv
Your Awk ‘program’ here is the part inside quotes. Awk programs usually start with { and end with }. We need the quotes because the shell would otherwise split the program at the spaces and try to interpret $0 itself. Whatever is inside the curly braces will be run for each line in the file.
So print means ‘print’, and $0 means ‘the entire line’. But we just want to print part of the line. Awk splits each line into pieces, which we can access with $1, $2, etc. So if we just want to print the first word on each line:
awk '{print $1}' humans.csv
This prints:
Branch,
Moore,
Conley,
They have commas after them because by default Awk splits on whitespace (spaces and tabs). We can tell it we’re using commas by passing a -F argument, like this:
awk -F, '{print $1}' humans.csv
And now we just print the last names!
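If you don’t have humans.csv to hand, the same comparison can be tried on inline data. This is a small sketch, assuming a POSIX-style shell: the variable name rows is just illustrative, and Awk reads from standard input when no file is given.

```shell
# The three sample rows from above, held in a shell variable (illustrative name).
rows='Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger'

# Default splitting on whitespace: the comma stays attached to the first field.
printf '%s\n' "$rows" | awk '{print $1}'

# Splitting on commas with -F: the first field is just the last name.
printf '%s\n' "$rows" | awk -F, '{print $1}'
```

The first command prints the last names with trailing commas, the second prints them without.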
Next we want to print the first name and the last name. If we run this:
awk -F, '{print $2 $1}' humans.csv
then something a bit weird happens.
PollyBranch
LuisMoore
JuliaConley
print ignores the spaces we put between $2 and $1: values written next to each other are joined with nothing in between. To make it put a space we can either use a comma, or put a space in quotes.
Using comma:
awk -F, '{print $2,$1}' humans.csv
or putting a space in quotes:
awk -F, '{print $2 " " $1}' humans.csv
This example says to Awk: “print the first name, then a space, then the last name”.
Polly Branch
Luis Moore
Julia Conley
Done!
…Almost. There is still an extra space at the start of each line. This is because the example CSV has a space after each comma, and we are telling Awk to split on commas only. So when it splits a line, it splits it into: “Branch”, “ Polly”, “ 28”, and “ Romanian”.
So for this particular CSV we actually want to split using “comma followed by space”.
We can use -F again, this time with ', ' as the separator:
awk -F', ' '{print $2 " " $1}' humans.csv
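A leading space is hard to spot in terminal output, so one way to check what each separator does is to wrap the field in square brackets. This is only a sketch on a single inline row; the variable name row is illustrative.

```shell
# One sample row from the CSV above (illustrative variable name).
row='Branch, Polly, 28, Romanian'

# Split on a bare comma: the second field keeps its leading space.
printf '%s\n' "$row" | awk -F, '{print "[" $2 "]"}'     # prints [ Polly]

# Split on comma followed by space: the leading space is gone.
printf '%s\n' "$row" | awk -F', ' '{print "[" $2 "]"}'  # prints [Polly]
```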
Next we’ll look at some more complex actions, and how to match only certain lines.
References
Inspired by https://gregable.com/2010/09/why-you-should-know-just-little-awk.html
GNU Getting Started with Awk