I have a .txt file with the following data:
    header1 header2 header3
    173.012 -30.330 19
    173.012 -30.349 19
    173.012 -30.344 19
    173.013 -30.345 21
    173.013 -30.343 21
    173.013 -30.349 21
    173.014 -30.343 22
    173.014 -30.325 22
    173.014 -30.326 22
    173.015 -30.348 24
    173.016 -30.336 25
    173.016 -30.318 25
    173.016 -30.318 25
    173.016 -30.318 25
    173.016 -30.318 25
    173.016 -30.318 25
    173.016 -30.318 25
What I want to do:
- Save the header line so that I can refer back to it later.
- Group together every row that shares the same header3 value and save each group in its own .txt file. For instance, the expected output would be one file containing the first three rows, whose third element (the header3 value) is 19, then another .txt file containing the next three rows, whose header3 value is 21, and so on until all rows have been processed.
This is what I have so far. First I tried:
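    import re

    def extract(oldfile, newfile, char):
        # copy every line of oldfile that matches the pattern `char` into newfile
        # (re.match only matches from the start of each line)
        with open(oldfile, "r") as f, open(newfile, "w") as g:
            for line in f:
                if re.match(char, line):
                    g.write(line)
                else:
                    print('does not work\n')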
Problem with this: it does work, but each time I have to manually specify the pattern 'char' that each line being read must match.
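I then tried comparing each row with the next one:
    def extract(oldfile):
        with open(oldfile, "r") as file:
            # split each line into its columns (on any whitespace)
            f2 = [line.split() for line in file]
        data = f2[1:]  # skip the header row
        # compare the third column (header3) of each row with the next row
        for i in range(len(data) - 1):
            if data[i][2] == data[i + 1][2]:
                print('works')
            else:
                print('no')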
Here, my output is the following:
    works
    works
    no
    works
    works
    no
    works
    works
    no
    no
    works
    works
    works
    works
    works
    works
*(I understand that this code only prints and does not write to a text file; I'm just trying to understand the structure of my for loop and whether it works correctly!)
So my problem is: I'm not sure how to tell Python to group all the lists within the big list f2 that share the same third element and, when the values don't match, to move on to the next group. What I can't work out is how to design my for loop so that a break in the run of matching values doesn't stop the processing, but simply moves on and tries to match the rows after it.
I'm not sure if I've done a good job explaining this, but my ultimate goal is this:
I want separate text files saved off, which have only the lines/rows that correspond to the same header3 value.
One way to do this is with itertools.groupby:
    import itertools

    # read in the lines from the input file
    with open('/path/to/input.txt') as f:
        lines = f.readlines()

    # write out the first line to a headers file
    with open('headers', 'w') as o:
        o.write(lines[0])

    # group lines by the last word on each (after splitting around spaces)
    for group, items in itertools.groupby(lines[1:], lambda x: x.split()[-1]):
        # write out a 'group_n' file for each group (e.g. group_19, group_21, etc.)
        with open('group_%s' % group, 'w') as o:
            o.writelines(items)
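Note that itertools.groupby only groups consecutive lines with the same key. That fits the sample data, where equal header3 values are adjacent. If the same value could reappear later in the file, the file for that group would be opened in 'w' mode a second time and the earlier rows would be overwritten. A minimal sketch of an alternative that collects rows in a dictionary first, so order does not matter, might look like the following. The function names group_rows and write_groups and the out_dir parameter are made up for illustration, and the sketch assumes the file has at least a header line.

```python
from collections import defaultdict
from pathlib import Path


def group_rows(lines):
    """Return (header, groups); groups maps each last-column value to its rows."""
    header, rows = lines[0], lines[1:]
    groups = defaultdict(list)
    for row in rows:
        if row.strip():  # skip blank lines
            groups[row.split()[-1]].append(row)
    return header, dict(groups)


def write_groups(input_path, out_dir):
    """Write a 'headers' file and one 'group_<value>' file per last-column value."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(input_path) as f:
        header, groups = group_rows(f.readlines())
    (out_dir / "headers").write_text(header)
    for value, rows in groups.items():
        (out_dir / f"group_{value}").write_text("".join(rows))
    return sorted(groups)  # the group values that were written
```

Calling write_groups('/path/to/input.txt', 'out') would then create out/headers plus one file per distinct header3 value. The trade-off is that all rows are held in memory before anything is written, which should be fine for files of this size.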