Python CSV: find a header string and store its column number in a variable


I just joined here after reading a ton of info over the last few months as I get my bearings with Python.

Anyway, I'm very new and have been researching as much as possible but most of the answers are a bit out of my reach in understanding and don't seem to do exactly what I need.

From the reading I've done, I'm not sure whether I should familiarize myself with pandas, but I basically need to do simple formatting, conversion and reorganization of an ALE file. An ALE is a simple tab-delimited list file that contains video clip names and metadata. The headers are located on row 8 and the content data on row 11 and below. Here's an example:

1 Heading
2 FIELD_DELIM   TABS
3 VIDEO_FORMAT  1080
4 AUDIO_FORMAT  48khz
5 FPS   23.976
6
7 Column
8 #### COLUMN HEADERS ####
9
10 Data
11 #### TAB DELIMITED DATA ####

For now, we'll assume my input files have been preformatted to strip rows 1-7, 9 and 10, so we just have a header row as row 1, and data starts on row 2.

My first task with this program is to convert an entire column of data into a new format. I have this working correctly, but only if I target the specific column by number in a data set that has no headings.

for row in ale_file:
    row[3] = timecode_to_frames(row[3])
    print row

The problem is, I don't always know which column numbers the data is in (each program outputs the metadata in a different order), but I do know what the header names are. Somehow I need to read the header row and, when it finds the three headers named "start", "end", and "duration", store those column numbers in variables. Then, in the for loop above, I would be able to run my timecode_to_frames function on the column numbers that match the headers.

I feel this should be fairly simple along these lines (forgive me if I'm horribly off):

for row in ale_file:
    for col in row:
        if col == 'start':
            start_col = ##column number##

Then in my existing code I could call the variable in:

for row in ale_file:
    row[start_col] = timecode_to_frames(row[start_col])
    print row

Side note: In my for loop, do I need to explicitly skip row 1, since it's just a header and won't have the properly formatted data the function is expecting? Perhaps nest the for loop in a while loop, like while row != 0: or something?

Any help would be greatly appreciated, thanks!


If all you need is each column header alongside its value, you can read the first line (the header) from the file before the loop, and inside the loop use zip(header, row) to get (columnHeader, columnValue) pairs.
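As a rough sketch of that idea (written for Python 3, and not the only way to do it): the timecode_to_frames below is only a hypothetical stand-in for your own function, assuming HH:MM:SS:FF timecodes counted at 24 frames per second, and the sample data and the convert_rows name are made up for illustration.

```python
import csv
import io


def timecode_to_frames(timecode, fps=24):
    """Hypothetical stand-in for the asker's function: HH:MM:SS:FF -> frames."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Made-up sample data; a real script would pass an opened file instead.
sample = io.StringIO(
    "name\tstart\tend\tduration\n"
    "clip1\t00:00:01:00\t00:00:02:00\t00:00:01:00\n"
)

TIMECODE_HEADERS = {"start", "end", "duration"}


def convert_rows(lines):
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)  # read the header row once, before the loop
    converted = []
    for row in reader:  # the loop now only sees data rows
        # Pair each value with its header and convert the timecode columns.
        new_row = [
            timecode_to_frames(value) if name in TIMECODE_HEADERS else value
            for name, value in zip(header, row)
        ]
        converted.append(new_row)
    return header, converted


header, rows = convert_rows(sample)
print(rows)  # [['clip1', 24, 48, 24]]
```

Because the header is consumed with next() before the loop starts, the data loop never sees it, so no extra while loop is needed to skip row 1.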

https://docs.python.org/2/library/functions.html#zip
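If you would rather have the actual column numbers in variables, as in your own loop, one possible sketch is to look each name up in the header list with list.index. Again this assumes Python 3, a hypothetical stand-in for timecode_to_frames (HH:MM:SS:FF at 24 frames per second), and invented sample data with the columns deliberately out of order.

```python
import csv
import io


def timecode_to_frames(timecode, fps=24):
    """Hypothetical stand-in for the asker's function: HH:MM:SS:FF -> frames."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Made-up sample data with the timecode columns in an arbitrary order.
sample = io.StringIO(
    "name\tduration\tstart\tend\n"
    "clip1\t00:00:01:00\t00:00:10:00\t00:00:11:00\n"
    "clip2\t00:00:00:12\t00:01:00:00\t00:01:00:12\n"
)

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # consume the header row so the loop starts at the data

# list.index returns the position of the first match and raises
# ValueError if a header name is missing from the file.
columns = {name: header.index(name) for name in ("start", "end", "duration")}
start_col = columns["start"]

result = []
for row in reader:
    for col in columns.values():
        row[col] = timecode_to_frames(row[col])
    result.append(row)

print(columns)  # {'start': 2, 'end': 3, 'duration': 1}
print(result)   # [['clip1', 24, 240, 264], ['clip2', 12, 1440, 1452]]
```

The lookup happens once on the header row, so the per-row loop stays as simple as your original row[3] version, just with the index coming from the header instead of being hard-coded.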