Misframe

Oct 2, 2021

Editing CSV files with SQLite

If you know more SQL than Excel, use SQLite to edit CSV files!

I had to work on a task recently that involved filtering two CSVs and joining them in different ways. I have some Excel formula knowledge but I am much better at SQL, and this was a great problem for SQL to solve. So I decided to use my favorite SQL database: SQLite.

Here’s an example CSV with names and addresses (adapted from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html):

First name,Last name,Street,City,State,ZIP
John,Doe,120 jefferson st.,Riverside,NJ,08075
Jack,McGinnis,220 hobo Av.,Phila,PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside,NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
,Blankman,,SomeTown,SD,00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

I saved that as addresses.csv in my Downloads directory.

You can load it into SQLite interactively. First you need to switch to CSV mode with .mode csv and then import the file with .import:

$ sqlite3
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> .mode csv
sqlite> .import /Users/preetam/Downloads/addresses.csv addresses
sqlite> select * from addresses;
John,Doe,"120 jefferson st.",Riverside,NJ,08075
Jack,McGinnis,"220 hobo Av.",Phila,PA,09119
"John ""Da Man""",Repici,"120 Jefferson St.",Riverside,NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
"",Blankman,"",SomeTown,SD,00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc","Desert City",CO,00123

You can switch to column mode with headers to make things easier to read.

sqlite> .mode columns
sqlite> .header on
sqlite> select * from addresses;
First name  Last name   Street             City        State       ZIP       
----------  ----------  -----------------  ----------  ----------  ----------
John        Doe         120 jefferson st.  Riverside   NJ          08075     
Jack        McGinnis    220 hobo Av.       Phila       PA          09119     
John "Da M  Repici      120 Jefferson St.  Riverside   NJ          08075     
Stephen     Tyler       7452 Terrace "At   SomeTown    SD          91234     
            Blankman                       SomeTown    SD          00298     
Joan "the   Jet         9th, at Terrace p  Desert Cit  CO          00123     

Now I have a full interactive SQL engine I can use to work with this CSV. If I load other CSVs as other tables I can do complex JOINs and aggregations much faster than it would take me to learn the same Excel functions.

When you have to write out your results to another CSV, simply switch back to CSV mode and set an output file.

sqlite> .headers on
sqlite> .mode csv
sqlite> .output /Users/preetam/Downloads/addresses_modified.csv
sqlite> select "First name" || " " || "Last name" as "Full name", Street, City, State, ZIP from addresses;
sqlite> .quit

Now addresses_modified.csv looks like

"Full name",Street,City,State,ZIP
"John Doe","120 jefferson st.",Riverside,NJ,08075
"Jack McGinnis","220 hobo Av.",Phila,PA,09119
"John ""Da Man"" Repici","120 Jefferson St.",Riverside,NJ,08075
"Stephen Tyler","7452 Terrace ""At the Plaza"" road",SomeTown,SD,91234
" Blankman","",SomeTown,SD,00298
"Joan ""the bone"", Anne Jet","9th, at Terrace plc","Desert City",CO,00123
Next read these: