I am trying to count repeated values in one column of a CSV file and write each count to a new CSV column, using Python.
For example, my CSV file:

    KeyID     GeneralID
    145258    KL456
    145259    BG486
    145260    HJ789
    145261    KL456
What I want to achieve is to count how many rows share the same
GeneralID and insert that count into a new CSV column. For example,

    KeyID     Total_GeneralID
    145258    2
    145259    1
    145260    1
    145261    2

I have tried to split each column using the split method but it didn't work so well.
My code:

    case_id_list_data = []
    with open(file_path_1, "rU") as g:
        for line in g:
            case_id_list_data.append(line.split('\t'))
    #print case_id_list_data
    #the result is dissatisfying
    #I'm stuck here..

And if you are averse to pandas and want to stay with the standard library:
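
    import csv
    from collections import Counter

    # Read the tab-separated input.
    # (Python 3 file modes; on Python 2 use 'rU' here and 'wb' below.)
    with open('file1', newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader)
        lines = [line for line in reader]

    # Count how often each GeneralID (the second column) occurs
    counts = Counter([l[1] for l in lines])

    # Append the count for each row's GeneralID as a new column
    new_lines = [l + [str(counts[l[1]])] for l in lines]

    with open('file2', 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(header + ['Total_GeneralID'])
        writer.writerows(new_lines)
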
Resulting file2:

    KeyID     GeneralID    Total_GeneralID
    145258    KL456        2
    145259    BG486        1
    145260    HJ789        1
    145261    KL456        2
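
If you only want the two columns shown in the question (KeyID and Total_GeneralID), the same Counter idea can be wrapped in a small function. This is a minimal sketch under a few assumptions: Python 3, tab-separated input, a header row, and GeneralID in the second column. The function name `count_general_ids` and the in-memory `sample` data are made up for illustration and are not part of the original code.

```python
import csv
import io
from collections import Counter


def count_general_ids(in_file, out_file, delimiter="\t"):
    """Read KeyID/GeneralID rows and write KeyID/Total_GeneralID rows."""
    reader = csv.reader(in_file, delimiter=delimiter)
    next(reader)  # skip the header row (assumed to be: KeyID, GeneralID)
    rows = [row for row in reader if row]  # ignore completely blank lines

    # Count how many rows share each GeneralID (assumed to be column 2)
    counts = Counter(row[1] for row in rows)

    writer = csv.writer(out_file, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["KeyID", "Total_GeneralID"])
    for row in rows:
        writer.writerow([row[0], counts[row[1]]])


# Hypothetical sample data mirroring the table in the question
sample = (
    "KeyID\tGeneralID\n"
    "145258\tKL456\n"
    "145259\tBG486\n"
    "145260\tHJ789\n"
    "145261\tKL456\n"
)

result = io.StringIO()
count_general_ids(io.StringIO(sample), result)
print(result.getvalue())
```

To use it with real files, pass file objects opened with `newline=''`, for example `open('file1', newline='')` for reading and `open('file2', 'w', newline='')` for writing.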