How do you sort data in Python and then run math on the tuples without messing up the sort order?


I am writing a script to list the 20 largest files in a target directory. Once I have the files, I perform some math on each size to attach the correct human-readable unit, i.e., Kb, Mb, or Gb.

This, however, puts the output out of order. How can I do this and keep the sort order intact?

#! /usr/bin/env python

import operator, os, sys

args = sys.argv
if len(args) != 2:
    print "You must enter exactly one directory as an argument."
    sys.exit(1)
target = args[1]

data = {}
for root, dirs, files in os.walk(target):
    for name in files:
        filename = os.path.join(root, name)
        if os.path.exists(filename):
            size = float(os.path.getsize(filename))
            data[filename] = size

sorted_data = sorted(data.iteritems(), key=operator.itemgetter(1), reverse=True)
total = str(len(sorted_data))

while len(sorted_data) > 20:
    sorted_data.pop()  # drop the smallest remaining entry

final_data = {}
for name in sorted_data:
    size = name[1]
    if size >= 1024:
        size = round(size / 1024, 2)
        if size >= 1024:
            size = round(size / 1024, 2)
            if size >= 1024:
                size = round(size / 1024, 2)
                size = str(size) + "Gb"
            else:
                size = str(size) + "Mb"
        else:
            size = str(size) + "Kb"
    final_data[name[0]] = size

print "The 20 largest files are:\n"
for name in final_data:
    print str(final_data[name]) + " " + str(name)
print "\nThere are a total of " + total + " files located in " + target

Your problem is that you create a brand-new dictionary to store the human-readable sizes. That dictionary no longer holds the numeric sizes you sorted on, and in Python 2 (which this script uses) dictionaries don't keep their entries in any fixed order, so iterating over it loses your sort order. It's simple to recover, though: iterate over sorted_data instead of final_data, and use final_data only to look up the human-readable size for each filename. Something like this:

for filename, size in sorted_data:
    print filename, final_data[filename]
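
To see the idea in isolation, here is a minimal sketch with made-up file names and sizes standing in for the os.walk() results (the sample data and the "b" label are assumptions, not part of the original script). It uses print with parentheses and a single argument so it should run on Python 2 and Python 3 alike:

```python
# Hypothetical sample data standing in for the os.walk() results.
sizes = {"a.log": 2048.0, "b.iso": 3221225472.0, "c.txt": 512.0}

# A list of (filename, size) tuples keeps whatever order sorted() gave it.
sorted_data = sorted(sizes.items(), key=lambda item: item[1], reverse=True)

# Stand-in labels; in the real script these come from the conversion loop.
final_data = {"a.log": "2.0Kb", "b.iso": "3.0Gb", "c.txt": "512.0b"}

# Drive the loop from the sorted list and use the dict only for lookups.
lines = [filename + " " + final_data[filename] for filename, size in sorted_data]
for line in lines:
    print(line)
```

The list decides the order; the dictionary is only consulted by key, so its own ordering never matters.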

But an even better solution would be to put your human-readable string generating code into a function!

def human_readable_size(size):
    # logic to convert size
    return hr_size

Now you don't even have to create a dictionary:

for filename, size in sorted_data:
    print filename, human_readable_size(size)
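
For completeness, here is one possible way to fill in that function, as a rough sketch rather than the only way to do it. The "b" label for sizes under 1024 bytes, the sample file names, and the use of a slice to keep the top 20 are my assumptions; the Kb/Mb/Gb labels follow the question:

```python
def human_readable_size(size):
    """Return size (in bytes) as a short string such as '1.5Kb'."""
    size = float(size)
    # Move up one unit for each division by 1024, stopping at Gb.
    for unit in ("b", "Kb", "Mb"):
        if size < 1024:
            return str(round(size, 2)) + unit
        size = size / 1024
    return str(round(size, 2)) + "Gb"


# Hypothetical (filename, size) tuples, already sorted largest first.
sorted_data = [("b.iso", 3221225472.0), ("movie.avi", 5242880.0),
               ("a.log", 1536.0), ("c.txt", 512.0)]

# Keep the numeric sizes for sorting; format only at print time.
report = [filename + " " + human_readable_size(size)
          for filename, size in sorted_data[:20]]
for line in report:
    print(line)
```

Because the numbers are only turned into strings at the moment of printing, the sorted list is never disturbed and there is no second data structure to keep in sync.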