How do I search for attributes that are not accurate in a particular group?

advertisements

I am looking to identify all instances of an attribute where the value is not equal within a group. For example:

+------------+---------+------------+
|group_id    |  area   |     pip    |
+------------+---------+------------+
|  432       |  23     |  jack      |
|  432       |  23     |  jack      |
|  745       |  45     |  bill      |
|  745       |  45     |  bill      |
|  848       |  67     |  lynn      |
|  848       |  65     |  lynn      |
|  23        |  33     |  hop       |
|  23        |  33     |  hope      |
|  670       |  893    |  sue       |
|  670       |  893    |  sue       |
+------------+---------+------------+

What I need to know is, all attributes that are different within a group ID. So the output table should be:

+------------+---------+------------+
|group_id    |  area   |     pip    |
+------------+---------+------------+
|  848       |  67     |  lynn      |
|  848       |  65     |  lynn      |
|  23        |  33     |  hop       |
|  23        |  33     |  hope      |
+------------+---------+------------+

This is the output because the area for group_id 848 is different, and the pip for group 23 is misspelled. I would also like to return all fields (as I have more than 3 I am looking at). Thank you.


In this solution the only thing you have to do in order to support additional attributes, is to add them to this part -
order by area,pip,...

select      *

from       (select      t.*
                       ,min (rnk) over (partition by group_id)  as min_rnk
                       ,max (rnk) over (partition by group_id)  as max_rnk

            from       (select      t.*
                                   ,rank () over (partition by group_id order by area,pip)  as rnk

                        from        t
                        ) t
            ) t

where       min_rnk <> max_rnk
;

or

select      *

from       (select      t.*
                       ,count (distinct rnk) over (partition by group_id)  as distinct_rnk

            from       (select      t.*
                                   ,rank () over (partition by group_id order by area,pip)  as rnk

                        from        t
                        ) t
            ) t

where       distinct_rnk > 1
;

The count (distinct rnk) is more natural to understand but may involve significant performance penalty