Shutil and copy files


I have a large image dataset (around 3000 files). My problem is simple, I want to copy randomly chosen image files to another destination. I use random.sample to select five hundred images and store their names in a list. I now want to copy files from the src folder to a destination folder if their name exists in the list (and was hence randomly chosen).

The following code however copies ALL files in the folder, whether or not their names appear in the randomly selected list. help

import os.path
import os
import glob
import random
import shutil

dirfiles = os.listdir("/media/Data/Leaves/Leaves")
myfiles = []

print myfiles

print final_list

for elem in final_list:
    print elem
    count= count+1

print count

src = '/home/mjanja/Desktop/Leaves'
dst = '/home/mjanja/Desktop/Positive Leaves'
for filename in final_list:
    for file in glob.glob( os.path.join(src,filename)):

print "Copied file!!" +infile

Your use of glob.glob in this case is where the danger occurs. That returns an iterator of all files that fit the pattern you provide. You're building up a list of 500 specific files, but then matching by pattern ... depending on what characters are in your filename this could give you extremely surprising results, as the pattern might very well match more files than your original 500.

You are also doing some steps unnecessarily, and could wrap it all up in a function:

import os
import random
import shutil

def copy_sample(src, dst, size=500):
    files = [os.path.join(src, i) for i in random.sample(os.listdir(src), size)]
    count = len(files)
    for index, afile in enumerate(files):
           shutil.copy(afile, dst)
           print 'Copied file %s (%d/%d)' % (afile, index + 1, count)
        except Exception, msg:
           print 'Failed file %s (%d/%d) -- %s' % (afile, index + 1, count, msg)

src = '/home/mjanja/Desktop/Leaves'
dst = '/home/mjanja/Desktop/Positive Leaves'

copy_sample(src, dst)