Filter rows with measurements for more than a year in R

advertisements

This is a subset of my dataset, with measurements of the variable elevated in several experiments

        Experiment.Name Sampling.Year   elevated
3409       Swiss Jura_c          1999   17.30000
3410       Swiss Jura_c          1999    9.10000
3411 SwissFACE_lolium_c          2000   -1.45545
3412 SwissFACE_lolium_c          2000   -2.94843
3413 SwissFACE_lolium_c          2000   -3.74132
3414 SwissFACE_lolium_c          2000   -1.42080
3461              DRI_c          1993  122.87900
3462              DRI_c          1993   13.71500
3463              DRI_c          1993    0.91800
3464              DRI_c          1993    1.29800
3465              DRI_c          1993    2.43600
3466              DRI_c          1993    3.46600
3467              DRI_c          1994    0.42700
3469              DRI_c          1994    1.74100
3470              DRI_c          1994    1.01700
3471              DRI_c          1994    2.38300
3640 Bonanza Creek_pb_f          2001 3222.00000
3641 Bonanza Creek_pg_f          2001 3455.00000
3665    Fork Mountain_f          2000    0.24900
3669    Fork Mountain_f          2001    0.23100
4037            KFFL_wh          2003   42.07000

I would like to subset the whole dataset so I only keep those experiments which contains measurements of elevated for more than one year. For instance, in the above table I would exclude the rows corresponding to Swiss Jura_c experiment because it only has measurements for one single year: 1999. However, I would include the rows corresponding to the DRI_c experiment because it contains measurements for more than one year: 1993 and 1994. How can I achieve such subset selection in R? Thanks


Try

library(data.table)
setDT(df1)[, .SD[uniqueN(Sampling.Year)>1], Experiment.Name]

Or

library(dplyr)
 df1 %>%
    group_by(Experiment.Name) %>%
    filter(n_distinct(Sampling.Year)>1)

data

df1 <- structure(list(Experiment.Name = c("Swiss Jura_c",
"Swiss Jura_c",
"SwissFACE_lolium_c", "SwissFACE_lolium_c", "SwissFACE_lolium_c",
"SwissFACE_lolium_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c",
"DRI_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c", "Bonanza Creek_pb_f",
"Bonanza Creek_pg_f", "Fork Mountain_f", "Fork Mountain_f", "KFFL_wh"
), Sampling.Year = c(1999L, 1999L, 2000L, 2000L, 2000L, 2000L,
1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L,
1994L, 2001L, 2001L, 2000L, 2001L, 2003L), elevated = c(17.3,
9.1, -1.45545, -2.94843, -3.74132, -1.4208, 122.879, 13.715,
0.918, 1.298, 2.436, 3.466, 0.427, 1.741, 1.017, 2.383, 3222,
3455, 0.249, 0.231, 42.07)), .Names = c("Experiment.Name",
"Sampling.Year",
"elevated"), row.names = c(3409L, 3410L, 3411L, 3412L, 3413L,
3414L, 3461L, 3462L, 3463L, 3464L, 3465L, 3466L, 3467L, 3469L,
3470L, 3471L, 3640L, 3641L, 3665L, 3669L, 4037L), class = "data.frame")