I have a large image dataset (around 3000 files). My problem is simple, I want to copy randomly chosen image files to another destination. I use
random.sample to select five hundred images and store their names in a list. I now want to copy files from the src folder to a destination folder if their name exists in the list (and was hence randomly chosen).
The following code however copies ALL files in the folder, whether or not their names appear in the randomly selected list. help
import os.path import os import glob import random import shutil dirfiles = os.listdir("/media/Data/Leaves/Leaves") myfiles =  myfiles.append(random.sample(dirfiles,500)) print myfiles final_list=myfiles print final_list count=0 for elem in final_list: print elem count= count+1 print count src = '/home/mjanja/Desktop/Leaves' dst = '/home/mjanja/Desktop/Positive Leaves' for filename in final_list: for file in glob.glob( os.path.join(src,filename)): shutil.copy(file,dst) print "Copied file!!" +infile
Your use of glob.glob in this case is where the danger occurs. That returns an iterator of all files that fit the pattern you provide. You're building up a list of 500 specific files, but then matching by pattern ... depending on what characters are in your filename this could give you extremely surprising results, as the pattern might very well match more files than your original 500.
You are also doing some steps unnecessarily, and could wrap it all up in a function:
import os import random import shutil def copy_sample(src, dst, size=500): files = [os.path.join(src, i) for i in random.sample(os.listdir(src), size)] count = len(files) for index, afile in enumerate(files): try: shutil.copy(afile, dst) print 'Copied file %s (%d/%d)' % (afile, index + 1, count) except Exception, msg: print 'Failed file %s (%d/%d) -- %s' % (afile, index + 1, count, msg) src = '/home/mjanja/Desktop/Leaves' dst = '/home/mjanja/Desktop/Positive Leaves' copy_sample(src, dst)