A little Awk

Awk is a command-line tool for processing text files. We give it a text file and it runs an action on each line: for example, printing the whole line, printing part of it, or reordering the parts.

As a simple example, here’s a CSV of people’s last name, first name, age, and nationality:
Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger

We want to run Awk and have it print their first and last names, i.e.:
Polly Branch
Luis Moore
Julia Conley

Let’s do it. Firstly, download humans.csv to your Downloads folder. Then open a terminal and go to Downloads:

cd ~/Downloads
ls humans.csv

This should print ‘humans.csv’ if the file exists.

To use Awk, you type awk, then tell it what you want it to do inside single quotes, then give it a file. The simplest example is telling it to print the whole file, like this:

awk '{print $0}' humans.csv

Your Awk ‘program’ here is the part inside quotes. Awk programs usually start with { and end with }. We need the single quotes so that the shell passes the program to Awk as one piece, spaces included, and doesn’t try to interpret characters like $ itself. Whatever is inside the curly braces will be run for each line in the file.
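To see the ‘run for each line’ part in action, here is a small sketch. It pipes the three sample lines into Awk instead of reading humans.csv (Awk reads standard input when no file is given), so it runs from any folder. The variable names sample and out are just for illustration.

```shell
# The three sample lines, piped in so the example does not depend on humans.csv.
sample='Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger'

# The action prints a fixed piece of text; it runs once per input line.
out=$(printf '%s\n' "$sample" | awk '{print "hello"}')
printf '%s\n' "$out"
```

Three lines go in, so hello comes out three times.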

So print means ‘print’, and $0 means ‘the entire line’. But we just want to print part of the line. Awk splits the line into pieces, and we can access the pieces with $1, $2, etc. So if we just want to print the first word on each line:

awk '{print $1}' humans.csv

This prints:
Branch,
Moore,
Conley,

They have commas after them because by default Awk splits on whitespace (spaces and tabs), so the comma stays attached to the word. We can tell it we’re using commas by passing a -F argument, like this:
awk -F, '{print $1}' humans.csv

And now we print just the last names!
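The -F argument is not limited to commas. As a quick sketch, here is the same idea on a made-up colon-separated line (the data and the variable names line and first_name are hypothetical, only there to show a different separator):

```shell
# Hypothetical colon-separated line, used only to show -F with another separator.
line='Branch:Polly:28:Romanian'

# -F: tells Awk to split on colons, so $2 is the second piece.
first_name=$(printf '%s\n' "$line" | awk -F: '{print $2}')
printf '%s\n' "$first_name"
```

This should print Polly.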

Next we want to print the first name and the last name. If we run this:
awk -F, '{print $2 $1}' humans.csv

then something a bit weird happens.
 PollyBranch
 LuisMoore
 JuliaConley

Awk ignores the space we put between $2 and $1 and simply joins the two values together. To make it print a space we can either separate the values with a comma, or put a space in quotes.

Using comma:
awk -F, '{print $2,$1}' humans.csv

or putting a space in quotes:
awk -F, '{print $2 " " $1}' humans.csv

This example says to Awk: “print the first name, then a space, then the last name”.

 Polly Branch
 Luis Moore
 Julia Conley

Done!

…Almost. There is still an extra space at the start of the line. This is because the example CSV had commas and spaces, and we are telling Awk to split on commas. So when it splits a line, it splits it into: “Branch”, “ Polly”, “ 28”, and “ Romanian”.
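One way to make those leading spaces visible is to print square brackets around each piece. This is only a sketch: it pipes the sample lines in rather than reading humans.csv, and the variable names sample and bracketed are just for illustration.

```shell
# The three sample lines, piped in so the example does not depend on humans.csv.
sample='Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger'

# Wrap the first two fields in brackets so any stray space shows up.
bracketed=$(printf '%s\n' "$sample" | awk -F, '{print "[" $1 "][" $2 "]"}')
printf '%s\n' "$bracketed"
```

The first line comes out as [Branch][ Polly], with the space sitting inside the second pair of brackets.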

So for this particular CSV we actually want to split using “comma followed by space”.

We can use -F again, this time putting the separator in quotes because it contains a space:
awk -F', ' '{print $2 " " $1}' humans.csv
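As a quick check, this sketch runs both spellings of the print (comma, and space in quotes) with the ', ' separator and compares them. Again the sample lines are piped in rather than read from humans.csv, and the variable names are just for illustration.

```shell
# The three sample lines, piped in so the example does not depend on humans.csv.
sample='Branch, Polly, 28, Romanian
Moore, Luis, 25, Uruguayan
Conley, Julia, 37, Luxembourger'

# Split on "comma followed by space", then print first name and last name.
with_comma=$(printf '%s\n' "$sample" | awk -F', ' '{print $2, $1}')
with_quotes=$(printf '%s\n' "$sample" | awk -F', ' '{print $2 " " $1}')

printf '%s\n' "$with_quotes"

# Both spellings should give the same text, with no space at the start.
[ "$with_comma" = "$with_quotes" ] && echo "same output"
```

Both versions should print Polly Branch, Luis Moore and Julia Conley with nothing in front of the first name.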

Next we’ll look at some more complex actions, and how to match only certain lines.

References

Inspired by https://gregable.com/2010/09/why-you-should-know-just-little-awk.html
GNU Getting Started with Awk