I have a data frame (`min_set_obs`) which contains two columns: the first, called `Treatment`, contains numeric values, and the second is an id column called `seq`:

```
min_set_obs
Treatment seq
1 29
1 23
3 60
1 6
2 41
1 5
2 44
```

Let's say I have a vector of numeric values, called `key`:

```
key
[1] 1 1 1 2 2 3
```

I.e. a vector of three 1s, two 2s, and one 3.

How would I go about identifying which rows from my `min_set_obs` data frame contain the first occurrence of values from the `key` vector?

I'd like my output to look like this:

```
Treatment seq
1 29
1 23
3 60
1 6
2 41
2 44
```

I.e. the sixth row from `min_set_obs` was 'extra' (it was the fourth 1 when there should only be three 1s), so it would be removed.

I'm familiar with the `%in%` operator, but I don't think it can tell me the position of the first occurrence of the `key` vector in the first column of the `min_set_obs` data frame.

Thanks

Using `dplyr`, you can first count the values in `key` with `table`, and then take the corresponding number of rows from the top of each `Treatment` group:

```
library(dplyr)
m <- table(key)
min_set_obs %>%
  group_by(Treatment) %>%
  do({
    # as.character(.$Treatment[1]) returns the treatment for the current group
    # use coalesce to get the default number of rows (0) if the treatment doesn't exist in key
    head(., coalesce(m[as.character(.$Treatment[1])], 0L))
  })
# A tibble: 6 x 2
# Groups: Treatment [3]
# Treatment seq
# <int> <int>
#1 1 29
#2 1 23
#3 1 6
#4 2 41
#5 2 44
#6 3 60
```
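
Note that the result above comes back grouped by `Treatment`, while the desired output in the question keeps the original row order. If the order matters, a base R version of the same idea may be simpler. This is only a sketch: `occurrence`, `allowed`, `limit` and `result` are names I made up, and the data frame is rebuilt from the question, assuming both columns are plain numerics.

```r
# Example data from the question (column types are an assumption)
min_set_obs <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)
key <- c(1, 1, 1, 2, 2, 3)

# Running occurrence number of each Treatment value (1st, 2nd, 3rd, ...)
occurrence <- ave(seq_along(min_set_obs$Treatment),
                  min_set_obs$Treatment,
                  FUN = seq_along)

# How many times each value appears in key
allowed <- table(key)

# Per-row allowance; treatments that are absent from key get 0
limit <- as.integer(allowed)[match(as.character(min_set_obs$Treatment),
                                   names(allowed))]
limit[is.na(limit)] <- 0L

# Keep a row only while its occurrence number is within the allowance
result <- min_set_obs[occurrence <= limit, ]
result
```

With the example data this should keep rows 1, 2, 3, 4, 5 and 7, i.e. it drops the fourth `1` and leaves the rest in their original order.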