How do I create a new column based on the values in other columns in a data frame?

I have a data frame "df" like the following. The first column holds the sample names. I want to create a new column based on the values in the other columns.

Input:

Sample  Region4 Region1 Region5 Region3 Region2 Type
T1       0       0.289    0.378   0       1      K
T2       0       0.167     0     0.875  0.389    K
T3    0.186     0.12345    0      0     0.187    K
T4   0.11234     0.1789    0     0.457  0.786    L
T5    0.2347     0.2567    0      0       0      L
T6   0.28769       0     0.123   0.1987 0.1565   L
T7    0.142        0     0.1987   0       0      M
T8       0       0.1256  0.123   0.129   0.111   M
T9    0.187      0.987     0     0.237   0.783   M

In the "New" column, "2" should be assigned if the sample shows a value >= 0.2 in at least one of the Regions, and "0" should be assigned otherwise (that is, when all of its Region values are < 0.2). It should look like the following:

Output:

Sample  Region4 Region1 Region5 Region3 Region2 Type  New
T1       0       0.289    0.378   0       1      K     2
T2       0       0.167     0     0.875  0.389    K     2
T3    0.186     0.12345    0      0     0.187    K     0
T4   0.11234     0.1789    0     0.457  0.786    L     2
T5    0.2347     0.2567    0      0       0      L     2
T6   0.28769       0     0.123   0.1987 0.1565   L     2
T7    0.142        0     0.1987   0       0      M     0
T8       0       0.1256  0.123   0.129   0.111   M     0
T9    0.187      0.987     0     0.237   0.783   M     2


We can do this in a vectorized way with rowSums: count, for each row, how many Region columns are >= 0.2, then use that result to index c(0, 2).

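# logical vector marking the columns whose names start with "Region"
nm1 <- startsWith(names(df1), "Region")
# TRUE (at least one value >= 0.2) + 1 picks 2; FALSE + 1 picks 0
df1$New <- c(0, 2)[(rowSums(df1[nm1] >= 0.2) != 0) + 1]
df1$New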
#[1] 2 2 0 2 2 2 0 0 2
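
For anyone who wants to run this end to end, here is a minimal reproducible sketch. It rebuilds the data frame from the question as df1 (typed in by hand from the table above) and applies the same rowSums idea, but uses ifelse instead of indexing into c(0, 2); the names region_cols and hits are only illustrative and not part of the original answer.

```r
# Data frame rebuilt by hand from the table in the question
df1 <- data.frame(
  Sample  = paste0("T", 1:9),
  Region4 = c(0, 0, 0.186, 0.11234, 0.2347, 0.28769, 0.142, 0, 0.187),
  Region1 = c(0.289, 0.167, 0.12345, 0.1789, 0.2567, 0, 0, 0.1256, 0.987),
  Region5 = c(0.378, 0, 0, 0, 0, 0.123, 0.1987, 0.123, 0),
  Region3 = c(0, 0.875, 0, 0.457, 0, 0.1987, 0, 0.129, 0.237),
  Region2 = c(1, 0.389, 0.187, 0.786, 0, 0.1565, 0, 0.111, 0.783),
  Type    = rep(c("K", "L", "M"), each = 3),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for the columns whose names start with "Region"
region_cols <- startsWith(names(df1), "Region")

# Per row, count how many Region values are >= 0.2
# (unname drops the row names that rowSums carries over)
hits <- unname(rowSums(df1[region_cols] >= 0.2))

# 2 if at least one Region value reaches the threshold, otherwise 0
df1$New <- ifelse(hits > 0, 2, 0)
df1$New
# 2 2 0 2 2 2 0 0 2
```

The ifelse form is a little longer than the c(0, 2)[... + 1] idiom, but it states the two outcomes explicitly, which may be easier to adapt if the labels or the threshold change.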


Another option is Reduce, which combines the per-column comparisons with an element-wise OR:

c(0, 2)[Reduce(`|`, lapply(df1[nm1], `>=`, 0.2)) + 1]
#[1] 2 2 0 2 2 2 0 0 2
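
To make the Reduce one-liner easier to follow, here is a sketch that splits it into its three steps. The data frame toy and the names flags, any_hit and new are hypothetical and exist only for illustration; the threshold 0.2 is the one from the question.

```r
# Small made-up data frame, used only to show the intermediate results
toy <- data.frame(
  Region1 = c(0.10, 0.25, 0.00),
  Region2 = c(0.15, 0.05, 0.30)
)

# Step 1: compare every column with the threshold.
# The result is a list with one logical vector per column.
flags <- lapply(toy, `>=`, 0.2)

# Step 2: fold the list with element-wise OR, so a row is TRUE
# as soon as any of its columns is >= 0.2.
any_hit <- Reduce(`|`, flags)

# Step 3: FALSE + 1 is 1 and TRUE + 1 is 2, so this picks
# 0 for rows without a hit and 2 for rows with one.
new <- c(0, 2)[any_hit + 1]
new
# 0 2 2
```

Because Reduce works column by column, it never converts the data frame to a matrix the way rowSums does.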