Can I use dplyr mutate() statements, or something else, to compute a conditional time-since-diagnosis variable?

I would like to create a "time since diagnosis" variable that is conditional on two other existing variables in my data.

Here is some example data:

id <- c("0001", "0001", "0001", "0002", "0002", "0002", "0003", "0003", "0003", "0003")
dementia <- c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1)
age_visit <- c("80", "81", "82","50", "51", "52","60", "61", "62", "63")
ds <- data.frame(id, dementia, age_visit)

I have a binary diagnosis variable, dementia, in a long-format data set.
It looks like this:

     id   dementia    age_visit
1  0001        0        80
2  0001        0        81
3  0001        1        82
4  0002        0        50
5  0002        1        51
6  0002        1        52
7  0003        0        60
8  0003        1        61
9  0003        0        62
10 0003        1        63

I want an age_at_dx variable that holds the age_visit from the first visit where dementia equals 1. This step isn't vital if there is a way to skip straight to the last step, which is the time since the first diagnosis. The main problem is that individuals can be diagnosed and then have a later assessment where dementia is 0 again. I want to take the first diagnosis and measure the time since diagnosis from that visit.

So the end result would look like this, with time_sincedx being age_visit - age_at_dx:

     id   dementia    age_visit   age_at_dx    time_sincedx
1  0001        0        80        NA           NA
2  0001        0        81        NA           NA
3  0001        1        82        82            0
4  0002        0        50        NA           NA
5  0002        1        51        51            0
6  0002        1        52        51            1
7  0003        0        60        NA           NA
8  0003        1        61        61            0
9  0003        0        62        61            1
10 0003        1        63        61            2

Is there any way to do this with dplyr? I've tried this, but it's not quite right: it copies the age at every visit where dementia is 1, leaving me with zeros down the time_sincedx column.

df <- mutate(df, age_at_dx = ifelse(dementia == 1, age_at_visit, NA))
df$time_sincedx <- df$age_at_visit - df$age_at_dx

Any ideas much appreciated!


A little subsetting and tidyr::fill to deal with excess NA values will get you there:

library(tidyverse)

ds %>% group_by(id) %>%    # evaluate patients individually
    mutate(age_visit = as.integer(as.character(age_visit)),    # factor (or character) to integer
           # if no dementia, NA else min age where dementia == 1
           age_at_dx = ifelse(dementia == 0, NA, min(age_visit[dementia == 1]))) %>%
    fill(age_at_dx) %>%    # fill in NAs after non-NA (where dx == 1, then 0, as in row 9)
    mutate(time_since_dx = age_visit - age_at_dx)

## Source: local data frame [10 x 5]
## Groups: id [3]
##
##        id dementia age_visit age_at_dx time_since_dx
##    <fctr>    <dbl>     <int>     <int>         <int>
## 1    0001        0        80        NA            NA
## 2    0001        0        81        NA            NA
## 3    0001        1        82        82             0
## 4    0002        0        50        NA            NA
## 5    0002        1        51        51             0
## 6    0002        1        52        51             1
## 7    0003        0        60        NA            NA
## 8    0003        1        61        61             0
## 9    0003        0        62        61             1
## 10   0003        1        63        61             2

Or, to skip the age_at_dx column:

ds %>% group_by(id) %>%
    mutate(age_visit = as.integer(as.character(age_visit)),
           time_since_dx = age_visit - min(age_visit[dementia == 1]),
           time_since_dx = ifelse(time_since_dx < 0, NA, time_since_dx))    # make negatives NA

## Source: local data frame [10 x 4]
## Groups: id [3]
##
##        id dementia age_visit time_since_dx
##    <fctr>    <dbl>     <int>         <int>
## 1    0001        0        80            NA
## 2    0001        0        81            NA
## 3    0001        1        82             0
## 4    0002        0        50            NA
## 5    0002        1        51             0
## 6    0002        1        52             1
## 7    0003        0        60            NA
## 8    0003        1        61             0
## 9    0003        0        62             1
## 10   0003        1        63             2
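
One edge case the example data does not cover is a patient who is never diagnosed. For such a group, min(age_visit[dementia == 1]) is taken over an empty vector, so R warns and returns Inf. Below is a minimal base R sketch of the same idea that gives NA for those patients without the warning. It is an alternative, not the approach above: the patient "0004" is a hypothetical addition for illustration, and it assumes age_visit can be converted to integer.

```r
# Example data from the question, plus a hypothetical patient "0004"
# who is never diagnosed (added here only to illustrate the edge case).
ds <- data.frame(
  id        = c("0001", "0001", "0001", "0002", "0002", "0002",
                "0003", "0003", "0003", "0003", "0004", "0004"),
  dementia  = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0),
  age_visit = c("80", "81", "82", "50", "51", "52",
                "60", "61", "62", "63", "70", "71"),
  stringsAsFactors = FALSE
)
ds$age_visit <- as.integer(ds$age_visit)

# Earliest age with dementia == 1, one row per diagnosed patient.
# Patients with no diagnosis simply do not appear in this table.
diagnosed <- ds[ds$dementia == 1, ]
first_dx  <- aggregate(age_visit ~ id, data = diagnosed, FUN = min)

# Look up each row's patient; match() gives NA for undiagnosed patients.
ds$age_at_dx <- first_dx$age_visit[match(ds$id, first_dx$id)]

# Visits before the first diagnosis get NA, as in the desired output.
before_dx <- which(ds$age_visit < ds$age_at_dx)
ds$age_at_dx[before_dx] <- NA

ds$time_since_dx <- ds$age_visit - ds$age_at_dx
print(ds)
# time_since_dx: NA NA 0 NA 0 1 NA 0 1 2 NA NA
```

Because the lookup uses match() rather than a grouped min(), the result does not depend on the rows being sorted by age within each patient.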