Comparing files with multiple columns


I am doing a directory cleanup to check for files that are not being used in our testing environment. I have a list of all the file names, sorted alphabetically, in a text file, and another file I want to compare against.

Here is how the first file is set up:

It is a simple text file, one script name per line, listing all the scripts in the directory I want to clean up based on the other file below.

The file I want to compare against is a tab-delimited file that lists, for each server, the scripts that server runs as tests, so there are obviously many duplicates. I want to strip the testing script names out of this file, write them to another file, and run them through sort and uniq, so that I can diff the result against the file above to see which testing scripts are not being used.

The file is set up like this:

server: :

Some lines have fewer scripts and some have more. My first impulse was to write a Perl script that splits each line and pushes the values into a list if they are not already there, but that seems wholly inefficient. I am not too experienced in awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?
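The sort/uniq/diff pipeline described above can be sketched directly in the shell. This is a minimal sketch, assuming the tab file is named `servtest.tab`, the full script list is `allscripts.txt`, and the first tab-separated field on each line is the server name (all of these names are placeholders, not from the question):

```shell
# Drop the first (server) column, split the remaining tab-separated
# script names onto their own lines, and deduplicate with sort -u.
cut -f2- servtest.tab | tr '\t' '\n' | sort -u > used.txt

# comm -23 prints lines that appear only in the first file, i.e.
# scripts present in the full list but never used by any server.
# Both inputs must be sorted, which the question says allscripts.txt is.
comm -23 allscripts.txt used.txt
```

`comm` avoids the need for `diff` output parsing: with `-23` it suppresses lines unique to the second file and lines common to both, leaving exactly the unused scripts.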

Here is a Perl solution that builds a %needed hash of the files used by the servers and then checks it against the file containing all the file names.

use strict;
use warnings;
use Inline::Files;

my %needed;
while (<SERVTEST>) {
    chomp;
    my (undef, @files) = split /\t/;   # drop the server column, keep the script names
    @needed{ @files } = (1) x @files;  # mark each script as needed
}

while (<TESTFILES>) {
    chomp;
    if (not $needed{$_}) {
        print "Not needed: $_\n";
    }
}

This prints:

Not needed:
Not needed: