I want to compare two fields of two files, such as follows:
Compare the 2nd filed of file one with the 1st field of file two, print the match (even if the match is repeated) and all the columns of file one and two.
File 1:
G4 b45 3 4
G4 b45 1 3
G3 b23 2 2
G3 b22 2 6
G3 b22 2 4
File 2:
b45 a b c
b64 d e f
b23 g h i
b22 j k l
b20 m n o
Output:
G4 b45 a b c 3 4
G4 b45 a b c 1 3
G3 b23 g h i 2 2
G3 b22 j k l 2 6
G3 b22 j k l 2 4
I have tried this with the following awk command using associative arrays:
awk 'FNR==NR {array1[$2] = $1 ; arrayrest[$2] = substr($0, index($0, $2)); next}($1 in array1) {print array1[$1] "\t" $0 "\t" arrayrest[$1]}' file1 file2
But there are two problems:
- It does not print the lines if the match is repeated while I want them to be printed.
- It repeats the first field of file two in the output.
How could I make this awk command work nicely? Thanks in advance.
Not quite the exact output formatting you want but the right output contents.
awk 'FNR==NR{seen[$1]=$0; next} ($2 in seen) {$2=seen[$2]}7' file2 file1
Add | column -t
to get more consistent column spacing.