Return duplicates in a list based on 2 criteria

advertisements

I have a list that contains 2 data sets.

a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i"))
colnames(a) = c("Numbers","Letters")
c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y"))
colnames(c) = c("Numbers","Letters")
my.list = list(a,c)
my.list

I am interest in returning only the letters that are found in common between the unique numbers of each data set. The desired results are given by the following:

new_a = data.frame(c(1,2),c("e","e"))
new_c = data.frame(c(3,4),c("u","u"))
colnames(new_a) = c("Numbers","Letters")
colnames(new_c) = c("Numbers","Letters")
my.new.list = list(new_a,new_c)
my.new.list

As you will see, letter "e" is the only common letter that numbers "1" and "2" share in data set 1 while letter "u" is the only common letter shared by numbers 3 and 4 in data set 2.

I am trying to do this for a very large list. To give you an idea of my true problem, I have a list where each element is a state. Within each state, I have multiple asset managers or "accounts" and each account holds multiple tickers. I am trying to find the tickers that the accounts have in common for each geographical locations. In the above example, the numbers would be the accounts, the letters would be the tickers and the two data sets contained in the list would be two different states. I hope that helps frame my problem.

Thanks!


We can use Reduce with intersect in base R

 lapply(my.list, function(x) x[with(x, Letters %in%
                 Reduce(intersect, split(Letters, Numbers))),])

Or using dplyr

 library(dplyr)
 lapply(my.list, function(x)
                    x %>%
                        group_by(Letters) %>%
                        filter(n_distinct(Numbers)==2))

Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the same,

 library(tidyr)
 unnest(my.list, group) %>%
            group_by(group, Letters) %>%
            filter(n_distinct(Numbers)==2)

If we don't know the number of unique Numbers in each list elements

  unnest(my.list, group) %>%
              group_by(group) %>%
              mutate(n= n_distinct(Numbers)) %>%
              group_by(Letters, add=TRUE) %>%
              filter(n_distinct(Numbers)==n) %>%
              select(-n)