Comparing files with multiple columns

I am doing a directory cleanup to check for files that are not being used in our testing environment. I have one text file listing all of the file names, sorted alphabetically, and another file I want to compare it against.

Here is how the first file is set up:

test1.pl
test2.pl
test3.pl

It is a simple text file, one script name per line, listing all the scripts in the directory I want to clean up based on the other file below.

The file I want to compare against is a tab-separated file that lists the scripts each server runs as tests, so there are obviously many duplicates. I want to strip the testing script names out of this file, spit them out to another file, run that through sort and uniq, and then diff it against the file above to see which testing scripts are not being used.

That file is set up like this:

server: : test1.pl test2.pl test3.pl test4.sh test5.sh

Some lines have fewer scripts and some have more. My first impulse was to write a Perl script that splits each line and pushes the values into a list if they are not already there, but that seems wholly inefficient. I am not too experienced with awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?
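
For what it's worth, the sort/uniq/diff route I describe above would look roughly like this as a shell sketch (scripts.txt and servertests.tsv are placeholder names for the two files; it assumes the second file is tab-separated with the server name in the first field, and it uses comm instead of diff since both inputs end up sorted):

# pull every script name out of the server file, one per line, de-duplicated
awk -F'\t' '{ for (i = 2; i <= NF; i++) if ($i != "") print $i }' servertests.tsv | sort -u > used.txt
sort scripts.txt > all.txt
# scripts that exist in the directory listing but are not used by any server
comm -23 all.txt used.txt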


Here is a Perl solution that builds a %needed hash of the files used by the servers and then checks it against the file containing all the file names.

#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;   # reads the __SERVTEST__ and __TESTFILES__ sections below as virtual files

# Record every script that some server runs.
my %needed;
while (<SERVTEST>) {
    chomp;
    my (undef, @files) = split /\t/;   # drop the server name, keep the script names
    @needed{ @files } = (1) x @files;
}

# Report scripts from the master list that no server uses.
while (<TESTFILES>) {
    chomp;
    if (not $needed{$_}) {
        print "Not needed: $_\n";
    }
}

__TESTFILES__
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
__SERVTEST__
server1::   test1.pl    test3.pl
server2::   test2.pl    test3.pl
__END__
*** prints

C:\Old_Data\perlp>perl t7.pl
Not needed: test4.pl
Not needed: test5.pl
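
If you don't want to install Inline::Files, the same approach works with ordinary filehandles. A minimal sketch reading the two lists from disk (servtest.tsv and testfiles.txt are placeholder names for your real files):

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder file names; substitute the real paths.
my $servtest  = 'servtest.tsv';    # tab-separated server/script file
my $testfiles = 'testfiles.txt';   # one script name per line

# Build a hash of every script some server runs.
my %needed;
open my $st, '<', $servtest or die "Cannot open $servtest: $!";
while (<$st>) {
    chomp;
    my (undef, @files) = split /\t/;          # drop the server name
    @needed{ grep { length } @files } = ();   # keys are enough; skip empty fields
}
close $st;

# Report every script in the master list that no server uses.
open my $tf, '<', $testfiles or die "Cannot open $testfiles: $!";
while (<$tf>) {
    chomp;
    print "Not needed: $_\n" unless exists $needed{$_};
}
close $tf;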