Perl: How to filter a hash by value (specifying a condition)

I'm not very experienced with Perl, and I've run into a problem that I couldn't fix even after a lot of searching on the web. Briefly, I have a hash of hashes like this:

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr1 => { start => 18, end => 21, },
    chr1 => { start => 30, end => 80, }
);

I would simply like to find a way to filter it (that is, to obtain a new hash of hashes as output) by particular values. Specifically, given an interval, say 40-60, I want a new hash of hashes containing only the elements that overlap this interval.

In other words, I would like to get as output:

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr1 => { start => 30, end => 80, }
);

As a first attempt, I thought I would try something like this:

identify and then delete all elements with "end" < 40, then identify and then delete all elements with "start" > 60.

So I just tried:

grep { $HoH{$_}{"end"} < 40 } keys(%HoH);
delete $HoH{$_} for grep { $HoH{$_}{"end"} < 40} keys(%HoH);

But right after the first of the two filters, the output contains only the last element, and I really don't understand where the mistake is:

hash size is 1
chr1: start=30 end=80

printed out with the following:

my $len = keys %HoH;
print "hash size is $len\n";

foreach my $chr ( keys %HoH ) {
   print "$chr: ";
   for my $position ( keys %{ $HoH{$chr} } ) {
      print "$position=$HoH{$chr}{$position} ";
   }
   print "\n";
}

This seems quite complex to me, so I would be glad if somebody could give me some help.


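As another poster mentions, your problem isn't the filter; it's that a hash cannot have duplicate keys. Each chr1 entry overwrites the previous one, so %HoH holds only the last element before any filtering runs. With distinct keys: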

use strict;
use warnings;
use Data::Dumper;

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr2 => { start => 18, end => 21, },
    chr3 => { start => 30, end => 80, }
);

# Delete every element that ends before the interval starts.
delete $HoH{$_} for grep { $HoH{$_}{end} < 40 } keys %HoH;

print Dumper \%HoH;

This works correctly; note the different hash keys. I would note, though, that you're iterating over your keys, grepping them, then deleting them. It might be better to test both conditions in a single pass:

foreach my $element ( keys %HoH ) {
    # Drop elements that end before 40 or start after 60 (no overlap with 40-60).
    delete $HoH{$element}
        if ( $HoH{$element}{end}   < 40
          or $HoH{$element}{start} > 60 );
}

print Dumper \%HoH;

You could store several intervals without needing unique keys by using an array of hashes:

use strict;
use warnings;
use Data::Dumper;

my @AoH = (
    { start => 30, end => 55, },
    { start => 18, end => 21, },
    { start => 30, end => 80, }
);

print Dumper \@AoH;

# Keep elements that overlap 40-60: they start at or before 60 and end at or after 40.
my @filtered = grep { $_->{start} <= 60 and $_->{end} >= 40 } @AoH;
print Dumper \@filtered;
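
The array of hashes above loses the chromosome name. If that name matters, one possible layout is a hash of arrays of hashes. The following is only a sketch under that assumption: the names %HoA, %overlapping, $lo and $hi are made up for illustration and do not come from the question.

```perl
use strict;
use warnings;

# Assumed layout: each chromosome name maps to an array of intervals,
# so several intervals can share the same chromosome key.
my %HoA = (
    chr1 => [
        { start => 30, end => 55 },
        { start => 18, end => 21 },
        { start => 30, end => 80 },
    ],
);

# Query interval from the question (40-60).
my ( $lo, $hi ) = ( 40, 60 );

my %overlapping;
for my $chr ( keys %HoA ) {
    # An interval overlaps [$lo, $hi] when it starts at or before $hi
    # and ends at or after $lo.
    my @hits = grep { $_->{start} <= $hi and $_->{end} >= $lo } @{ $HoA{$chr} };

    # Only keep chromosomes that still have at least one interval.
    $overlapping{$chr} = \@hits if @hits;
}

for my $chr ( sort keys %overlapping ) {
    print "$chr: start=$_->{start} end=$_->{end}\n" for @{ $overlapping{$chr} };
}
```

With the sample data this should print the 30-55 and 30-80 intervals for chr1 and drop 18-21.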

Note: in your original example, the bare grep line and the delete line perform the same test, so the first one is redundant (its result is simply discarded). You can also use a compound grep to test for both conditions at once.
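
As a rough illustration of that compound grep, here is a sketch that builds a new hash of hashes instead of deleting from the original, which is what the question asked for. The names $lo, $hi, @keep and %filtered are assumptions introduced for this example, and the distinct keys chr1..chr3 are the ones used earlier in this answer.

```perl
use strict;
use warnings;

# Query interval from the question (40-60).
my ( $lo, $hi ) = ( 40, 60 );

my %HoH = (
    chr1 => { start => 30, end => 55 },
    chr2 => { start => 18, end => 21 },
    chr3 => { start => 30, end => 80 },
);

# Compound test: an element overlaps [$lo, $hi] when it starts at or
# before $hi and ends at or after $lo.
my @keep = grep { $HoH{$_}{start} <= $hi and $HoH{$_}{end} >= $lo } keys %HoH;

# Copy only the surviving entries into a new hash; %HoH is left untouched.
my %filtered = map { $_ => $HoH{$_} } @keep;

print join( ",", sort keys %filtered ), "\n";
```

Note that this is a shallow copy: %filtered and %HoH share the same inner hash references, so changing a start or end value through one is visible through the other.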