How to import a CSV file into MySQL, inserting NULLs for empty fields?

I have a CSV file from the US Census that looks like this:

"ZIP5","ZIP4","ZIP9","STATE CODE","STATE","COUNTY CODE","COUNTY NAME","CBSA CODE","CBSA  TITLE","CBSA LSAD","METRO DIVISION CODE","METRO DIVISION TITLE","METRO DIVISION LSAD","CSA   CODE","CSA TITLE","CSA LSAD"
"04841",,"04841","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04843",,"04843","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan     Statistical Area",,,,,,
"04846",,"04846","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04847",,"04847","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04848",,"04848","23","ME","027","Waldo County",,,,,,,,,
"04849",,"04849","23","ME","027","Waldo County",,,,,,,,,
"04850",,"04850","23","ME","027","Waldo County",,,,,,,,,
"04851",,"04851","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04852",,"04852","23","ME","015","Lincoln County",,,,,,,,,

The file has over 2 million records. Most of the records don't have data in all the fields.

Here is the MySQL record layout I defined for the above CSV file:

+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| id                   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| ZIP5                 | varchar(5)       | NO   |     | NULL    |                |
| ZIP4                 | varchar(5)       | NO   |     | NULL    |                |
| ZIP9                 | varchar(10)      | NO   |     | NULL    |                |
| STATE_CODE           | varchar(2)       | NO   |     | NULL    |                |
| STATE                | varchar(2)       | NO   |     | NULL    |                |
| COUNTY_CODE          | varchar(3)       | NO   |     | NULL    |                |
| COUNTY_NAME          | varchar(50)      | NO   |     | NULL    |                |
| CBSA_CODE            | varchar(5)       | NO   |     | NULL    |                |
| CBSA_TITLE           | varchar(50)      | NO   |     | NULL    |                |
| CBSA_LSAD            | varchar(50)      | NO   |     | NULL    |                |
| METRO_DIVISION_CODE  | varchar(5)       | NO   |     | NULL    |                |
| METRO_DIVISION_TITLE | varchar(50)      | NO   |     | NULL    |                |
| METRO_DIVISION_LSAD  | varchar(50)      | NO   |     | NULL    |                |
| CSA_CODE             | varchar(3)       | NO   |     | NULL    |                |
| CSA_TITLE            | varchar(50)      | NO   |     | NULL    |                |
| CSA_LSAD             | varchar(50)      | NO   |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+

(I just realized I should define ZIP5 as a Primary key?)

I have read that if you have an empty field in a CSV file, you should change it to \N, but is there a way to do this easily? I could write a PHP program to do this, but with over 2 million records it would take a very long time and my server doesn't have a lot of RAM.

How can I import this CSV file to MySQL successfully the easiest way? Are there some parameters on the LOAD command in MySQL that would do this? The way it works now, it complains that ZIP5 has data truncation and when I look in MySQL it has quotes in the zip code and only the first 4 digits. Thanks!


For a start, about the primary key: a table should always have one. Usually we add a column called id with AUTO_INCREMENT, which the table you posted above already has. For ZIP codes and similar data it can also be convenient to define a composite key of 2-3 columns. As always, it depends on the circumstances.

As for the import, you have a few options:

  1. Run a script locally to generate SQL INSERT statements and then feed them to the MySQL server through any interface you have available.

  2. Upload the CSV file to the server and use the command-line mysql client to import it. MySQL has a built-in CSV importer, though I never liked it ;)

  3. Run the script on the server and add a row at a time. In PHP you can load the CSV line by line and INSERT each line (remember to adjust set_time_limit() and memory_limit accordingly). A reminder for option 3: if you run it through a browser rather than from the command line, your browser will most probably time out. Rest assured, though, the script will not stop running until it's done.
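
For option 2, here is a rough sketch of what the LOAD DATA statement could look like. It is untested against your file and makes several assumptions: the file path and the table name table_name are placeholders, the lines end in '\n' (use '\r\n' if the file came from Windows), and LOCAL loading is enabled on both client and server. The @-variables are just names I picked. OPTIONALLY ENCLOSED BY '"' is what strips the quotes, which should also address the truncated ZIP5 values, and NULLIF(@var, '') turns an empty field into NULL so the file does not need to contain \N.

```sql
-- Sketch only: path, table name and line terminator are assumptions.
LOAD DATA LOCAL INFILE '/path/to/zipcodes.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES  -- skip the header row
(@zip5, @zip4, @zip9, @state_code, @state, @county_code, @county_name,
 @cbsa_code, @cbsa_title, @cbsa_lsad,
 @metro_code, @metro_title, @metro_lsad,
 @csa_code, @csa_title, @csa_lsad)
SET
  ZIP5                 = @zip5,
  ZIP4                 = NULLIF(@zip4, ''),
  ZIP9                 = @zip9,
  STATE_CODE           = @state_code,
  STATE                = @state,
  COUNTY_CODE          = @county_code,
  COUNTY_NAME          = @county_name,
  CBSA_CODE            = NULLIF(@cbsa_code, ''),
  CBSA_TITLE           = NULLIF(@cbsa_title, ''),
  CBSA_LSAD            = NULLIF(@cbsa_lsad, ''),
  METRO_DIVISION_CODE  = NULLIF(@metro_code, ''),
  METRO_DIVISION_TITLE = NULLIF(@metro_title, ''),
  METRO_DIVISION_LSAD  = NULLIF(@metro_lsad, ''),
  CSA_CODE             = NULLIF(@csa_code, ''),
  CSA_TITLE            = NULLIF(@csa_title, ''),
  CSA_LSAD             = NULLIF(@csa_lsad, '');
```

Because the statement streams the file on the MySQL side, the 2 million rows never have to fit in PHP's memory. The columns that receive NULL must allow NULL in the table definition for this to have the intended effect.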

I think I have a CSV importer (for HUGE CSV files - like geotagging). Let me know if you need it, I might be able to find it and post it here.

Unfortunately, I could not find my CSV importer. But starting from the first example in the PHP manual's entry for fgetcsv, with a couple of modifications:

set_time_limit(3600); // 1 hour max script execution time. Adjust it according to your expectations.
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    // The first row supplies the column names, so edit the CSV header
    // so that each entry matches the actual column name in your table.
    $header = fgetcsv($handle, 1000, ",");
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $values = array();
        // Pair each header name with the field at the same position.
        foreach ($header as $i => $key) {
            // Skip empty fields so the column falls back to its default.
            // Compare against '' instead of using empty(), which would also drop "0".
            if (isset($data[$i]) && $data[$i] !== '') {
                // addslashes() is only a stand-in here; use your connection's
                // escaping function or prepared statements in real code.
                $values[$key] = addslashes($data[$i]);
            }
        }
        if (empty($values)) {
            continue; // blank line, nothing to insert
        }

        $keys = "`" . implode("`, `", array_keys($values)) . "`";
        $valueList = "'" . implode("', '", $values) . "'";
        $statement = "INSERT INTO `table_name` ({$keys}) VALUES ({$valueList})";
        // Run the statement. The above is for when you don't use PDO; for PDO, transform accordingly.
        // $values holds the column_name => value pairs. Columns that can be null are simply left out
        // of the INSERT, so give them default values in your MySQL schema (table).
    }
    fclose($handle);
}
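
Skipping empty fields only helps if those columns accept NULL or have a default. In the table posted in the question every column shows Null = NO, so the optional ones would first need to be made nullable. A sketch, assuming the table is called table_name (as in the INSERT above) and that the columns which are empty in the sample rows are the optional ones; the types are copied from the posted layout:

```sql
-- Sketch only: adjust the column list to whichever fields can really be empty.
ALTER TABLE table_name
  MODIFY ZIP4                 varchar(5)  NULL DEFAULT NULL,
  MODIFY CBSA_CODE            varchar(5)  NULL DEFAULT NULL,
  MODIFY CBSA_TITLE           varchar(50) NULL DEFAULT NULL,
  MODIFY CBSA_LSAD            varchar(50) NULL DEFAULT NULL,
  MODIFY METRO_DIVISION_CODE  varchar(5)  NULL DEFAULT NULL,
  MODIFY METRO_DIVISION_TITLE varchar(50) NULL DEFAULT NULL,
  MODIFY METRO_DIVISION_LSAD  varchar(50) NULL DEFAULT NULL,
  MODIFY CSA_CODE             varchar(3)  NULL DEFAULT NULL,
  MODIFY CSA_TITLE            varchar(50) NULL DEFAULT NULL,
  MODIFY CSA_LSAD             varchar(50) NULL DEFAULT NULL;
```

It is easier to run this before the import, while the table is still empty.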

I hope this helps. Haven't tested it but looks ok ;)