Group the lists that share a common element and save each group in a separate text file (Python)

I have a .txt file with the following data:

header1 header2 header3
173.012 -30.330 19
173.012 -30.349 19
173.012 -30.344 19
173.013 -30.345 21
173.013 -30.343 21
173.013 -30.349 21
173.014 -30.343 22
173.014 -30.325 22
173.014 -30.326 22
173.015 -30.348 24
173.016 -30.336 25
173.016 -30.318 25
173.016 -30.318 25
173.016 -30.318 25
173.016 -30.318 25
173.016 -30.318 25
173.016 -30.318 25

What I want to do:

  1. Save off the header information so that I can refer back to it in the future.
  2. Group together every row that has the same header3 value and save each group in its own separate .txt file. For instance, the expected output would be one file containing the first three rows, whose third element (header3 value) is 19, then another .txt file containing the next three rows, whose header3 value is 21, and so on until all rows have been processed.

My attempts:

1. This is what I have so far. I tried using:

import re

def extract(oldfile, newfile, char):
    # copy every line of oldfile that starts with the pattern `char` into newfile
    with open(oldfile, "r") as f, open(newfile, "w") as g:
        for x in f:
            if re.match(char, x):
                g.write(x)
            else:
                print('does not work\n')

Problems with this: it does work, but each time I have to manually specify the pattern 'char' that every line being read is matched against.

2.

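def extract(oldfile):
    # the "rU" mode was removed in Python 3.11; plain "r" already handles universal newlines
    with open(oldfile, "r") as file:
        f = file.readlines()
    f1 = map(str.strip, f)
    f2 = [sub.split('\t') for sub in f1]
    # stop one short of the end so that f2[i+1] stays in range
    for i in range(len(f2) - 1):
        if f2[i][2] == f2[i+1][2]:
            print('works')
        else:
            print('no')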

Here, my output is the following:

works
works
no
works
works
no
works
works
no
no
works
works
works
works
works
works

*(I understand this code only prints and does not write anything to a text file; I'm just trying to understand the structure of my for loop and whether it's working correctly or not!)

So my problem is: I'm not sure how to tell Python to group all the lists within the big list f2 that share the same third element, and to move on to the next group when they don't match. What I can't work out is how to design my for loop so that a break in the run of matching values doesn't stop the processing, but simply starts matching the rows that come after it.

I'm not sure if I've done a good job explaining this, but my ultimate goal is this:

I want separate text files saved off, which have only the lines/rows that correspond to the same header3 value.


import itertools

# read in the lines from the input file
with open('/path/to/input.txt') as f:
    lines = f.readlines()

# write out the first line to a headers file
with open('headers', 'w') as o:
    o.write(lines[0])

# group lines by the last word on each (after splitting on whitespace);
# groupby only merges adjacent lines, so rows with the same value must be consecutive
for group, items in itertools.groupby(lines[1:], lambda x: x.split()[-1]):
    # write out a 'group_n' file for each group (e.g. group_19, group_21, etc.)
    with open('group_%s' % group, 'w') as o:
        o.writelines(items)
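
itertools.groupby only groups rows that sit next to each other, which is fine for the sample data. If rows with the same header3 value might be scattered through the file, or the file contains blank lines, a dictionary-based variant may be safer. The following is only a sketch: the function name split_by_last_column and the output_dir parameter are made up for illustration, and it assumes the first non-blank line is the header and the grouping value is the last whitespace-separated field.

```python
from collections import defaultdict
from pathlib import Path


def split_by_last_column(input_path, output_dir="."):
    """Write one group_<value> file per distinct last-column value.

    Hypothetical helper: assumes the first non-blank line is the header.
    """
    with open(input_path) as f:
        # drop blank lines so that split()[-1] never fails on an empty row
        lines = [line for line in f if line.strip()]

    header, rows = lines[0], lines[1:]

    # collect rows per key; dicts keep insertion order, so groups stay in file order
    groups = defaultdict(list)
    for row in rows:
        groups[row.split()[-1]].append(row)

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "headers").write_text(header)
    for key, members in groups.items():
        (out / f"group_{key}").write_text("".join(members))
    return dict(groups)
```

Calling split_by_last_column('/path/to/input.txt') should produce the same headers and group_19, group_21, ... files as the groupby version, with the difference that a value reappearing later in the file is appended to its existing group instead of overwriting the earlier file.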