Delete sequential, duplicate files

I have a server running Windows Server 2003 R2 Enterprise with directories that each hold between 50,000 and 250,000 1KB text files. The filenames are sequential (e.g., MLLP000001.rcv, MLLP000002.rcv, etc.), and any identical files are adjacent in that sequence. Once a file differs from the one before it, I can expect that I won't receive another copy of the earlier file.

I need a script that will do the following, but I don't know where to begin.

for each file in the target directory index 'i'
{
  for each file in the target directory index 'j' = i+1
  {
    compare the hash values of files i and j

    if the hashes are identical
      delete file j
    if the hashes differ
      set i = j // to skip past the files that are now deleted
      break
  }
}

I tried DOS batch scripts, but that's really cumbersome: I can't break out of the inner loop, and the script trips over itself because the outer loop iterates over a list of the files in the directory while that list is constantly changing. VBScript doesn't have a hash function as far as I know.


Since the files are only 1KB in size, why not do a bitwise compare and avoid the hash?
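
A minimal sketch of that direct-comparison idea, assuming Python is available on the server (the question doesn't say it is). The function name, the `*.rcv` pattern, and the `dry_run` flag are made up for illustration. It lists the directory once up front, so deleting files can't disturb the loop, and it relies on the zero-padded names sorting in sequence order.

```python
from pathlib import Path


def remove_sequential_duplicates(directory, pattern="*.rcv", dry_run=True):
    """Delete each file whose bytes match the last kept file before it.

    Returns the list of paths that were (or, in a dry run, would be) deleted.
    """
    # Snapshot the listing once, sorted by name, so deletions made during
    # the pass cannot change what the loop iterates over.
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())

    deleted = []
    kept_bytes = None  # contents of the most recent file that was kept
    for path in files:
        data = path.read_bytes()  # files are ~1KB, so reading them whole is cheap
        if data == kept_bytes:
            # Same bytes as the last kept file: this one is a duplicate.
            if not dry_run:
                path.unlink()
            deleted.append(path)
        else:
            # Contents changed: keep this file and compare later ones to it.
            kept_bytes = data
    return deleted
```

Calling it with the default `dry_run=True` only reports what would be removed; pass `dry_run=False` once the list looks right. Because each file is compared against the last one kept, only one file's contents are held in memory at a time and each file is read exactly once.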