How to find the first occurrence of a vector of numeric elements in a data frame column?

I have a data frame (min_set_obs) with two columns: the first, called Treatment, contains numeric values, and the second, called seq, is an id column:

min_set_obs
 Treatment seq
         1  29
         1  23
         3  60
         1   6
         2  41
         1   5
         2  44

Let's say I have a vector of numeric values, called key:

key
[1] 1 1 1 2 2 3

I.e. a vector of three 1s, two 2s, and one 3.

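How would I go about identifying which rows of my min_set_obs data frame contain the first occurrences of the values in the key vector, i.e. the first three rows with Treatment 1, the first two with Treatment 2, and the first one with Treatment 3?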

I'd like my output to look like this:

 Treatment seq
         1  29
         1  23
         3  60
         1   6
         2  41
         2  44

I.e. the sixth row of min_set_obs is 'extra' (it is the fourth 1, when key contains only three 1s), so it should be removed.

I'm familiar with the %in% operator, but I don't think it can tell me the position of the first occurrence of the key vector in the first column of the min_set_obs data frame.

Thanks


Using dplyr, you can first count how often each value appears in key with table(), and then take that many rows from the top of each Treatment group:

library(dplyr)
m <- table(key)

min_set_obs %>% group_by(Treatment) %>% do({
    # as.character(.$Treatment[1]) returns the treatment for the current group
    # use coalesce to get the default number of rows (0) if the treatment doesn't exist in key
    head(., coalesce(m[as.character(.$Treatment[1])], 0L))
})

# A tibble: 6 x 2
# Groups:   Treatment [3]
#  Treatment   seq
#      <int> <int>
#1         1    29
#2         1    23
#3         1     6
#4         2    41
#5         2    44
#6         3    60
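
Note that the result above comes back ordered by Treatment rather than in the original row order shown in the question. If the original order matters, a base R variant may be worth considering. The following is only a sketch: the data frame is rebuilt by hand from the question, and the names allowed, occurrence, quota and result are made up for illustration. It numbers each row within its treatment and keeps the rows whose number does not exceed the count allowed by key:

```r
# Data reconstructed from the question (integer columns assumed)
min_set_obs <- data.frame(
  Treatment = c(1L, 1L, 3L, 1L, 2L, 1L, 2L),
  seq       = c(29L, 23L, 60L, 6L, 41L, 5L, 44L)
)
key <- c(1, 1, 1, 2, 2, 3)

# How many rows of each treatment the key allows
allowed <- table(key)

# Running occurrence number of each treatment, in original row order
occurrence <- ave(seq_along(min_set_obs$Treatment),
                  min_set_obs$Treatment,
                  FUN = seq_along)

# Look up each row's allowance; treatments absent from key get 0
quota <- as.integer(allowed)[match(as.character(min_set_obs$Treatment),
                                   names(allowed))]
quota[is.na(quota)] <- 0L

# Keep only rows that are within the allowance for their treatment
result <- min_set_obs[occurrence <= quota, ]
result
```

With the sample data this should drop only the sixth row (the fourth 1) and leave the remaining rows in their original order.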