How to count how many times a number appears in a column of an HTML table

I have an HTML report containing a table, and I need to count how many times a number appears in a specific column. This needs to be done in bash (on Fedora).
Take the following example. I need to count how many times the numbers 3 and 2 appear in column 3 across the whole table:
<tr>
<td>test11</td>
<td>test12</td>
<td>3</td>
<td>test14</td>
</tr>
<tr>
<td>test11</td>
<td>test12</td>
<td>3</td>
<td>test14</td>
</tr>
<tr>
<td>test11</td>
<td>test12</td>
<td>2</td>
<td>test14</td>
</tr>


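Using GNU awk (gawk), you can do:

awk -v RS='</tr>' '$4~/>3</{a++; next} $4~/>2</{b++; next}
     END{printf "3-count=%d, 2-count=%d\n", a, b}' file

Setting RS='</tr>' makes each table row one record (a multi-character RS is a gawk extension). With the default whitespace field splitting, $1 is the <tr> tag and $4 is the third cell with its tags still attached, e.g. <td>3</td>, which is why the patterns match >3< and >2<. This relies on the cells containing no whitespace. For the sample above it prints:

3-count=2, 2-count=1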
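
If the cells may contain spaces, or you want a count for every value rather than just 3 and 2, a variation along the following lines might help. This is only a sketch, not part of the answer above: the function name count_column is made up for illustration, and it assumes an awk that accepts a regex RS (gawk does), plain <td> tags without attributes, and one <tr>...</tr> pair per row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count every distinct value in one column of an HTML table.
# Assumes plain <td>...</td> cells (no attributes) and an awk with regex RS (e.g. gawk).
count_column() {
    local col=$1 file=$2
    awk -v RS='</tr>' -v FS='</?td>' -v col="$col" '
        # FS matches both <td> and </td>, so the text of cell N lands in field 2*N.
        NF >= 2*col { count[$(2*col)]++ }
        # Print one "value-count=N" line per distinct value seen.
        END { for (v in count) printf "%s-count=%d\n", v, count[v] }
    ' "$file" | sort
}

# Usage (file name is a placeholder):
#   count_column 3 report.html
```

Splitting on the tags themselves instead of on whitespace keeps a cell such as <td>foo bar</td> in a single field, and the array avoids hard-coding one counter per number. The final sort is there only because awk's for-in loop visits array keys in no particular order.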