I had to work on a task recently that involved filtering two CSVs and joining them in different ways. I have some Excel formula knowledge but I am much better at SQL, and this was a great problem for SQL to solve. So I decided to use my favorite SQL database: SQLite.
Here’s an example CSV with names and addresses (adapted from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html):
First name,Last name,Street,City,State,ZIP
John,Doe,120 jefferson st.,Riverside,NJ,08075
Jack,McGinnis,220 hobo Av.,Phila,PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside,NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
,Blankman,,SomeTown,SD,00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
I saved that as addresses.csv
in my Downloads directory.
You can load it into SQLite interactively. First you need to switch to
CSV mode with .mode csv
and then import the file with .import
:
$ sqlite3
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .mode csv
sqlite> .import /Users/preetam/Downloads/addresses.csv addresses
sqlite> select * from addresses;
John,Doe,"120 jefferson st.",Riverside,NJ,08075
Jack,McGinnis,"220 hobo Av.",Phila,PA,09119
"John ""Da Man""",Repici,"120 Jefferson St.",Riverside,NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
"",Blankman,"",SomeTown,SD,00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc","Desert City",CO,00123
You can switch to column mode with headers to make things easier to read.
sqlite> .mode columns
sqlite> .header on
sqlite> select * from addresses;
First name Last name Street City State ZIP
---------- ---------- ----------------- ---------- ---------- ----------
John Doe 120 jefferson st. Riverside NJ 08075
Jack McGinnis 220 hobo Av. Phila PA 09119
John "Da M Repici 120 Jefferson St. Riverside NJ 08075
Stephen Tyler 7452 Terrace "At SomeTown SD 91234
Blankman SomeTown SD 00298
Joan "the Jet 9th, at Terrace p Desert Cit CO 00123
Now I have a full interactive SQL engine I can use to work with this CSV. If I load other CSVs as other tables I can do complex JOINs and aggregations much faster than it would take me to learn the same Excel functions.
When you have to write out your results to another CSV, simply switch back to CSV mode and set an output file.
sqlite> .headers on
sqlite> .mode csv
sqlite> .output /Users/preetam/Downloads/addresses_modified.csv
sqlite> select "First name" || " " || "Last name" as "Full name", Street, City, State, ZIP from addresses;
sqlite> .quit
Now addresses_modified.csv
looks like
"Full name",Street,City,State,ZIP
"John Doe","120 jefferson st.",Riverside,NJ,08075
"Jack McGinnis","220 hobo Av.",Phila,PA,09119
"John ""Da Man"" Repici","120 Jefferson St.",Riverside,NJ,08075
"Stephen Tyler","7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
" Blankman","",SomeTown,SD,00298
"Joan ""the bone"", Anne Jet","9th, at Terrace plc","Desert City",CO,00123