How do I assign a unique identifier to each record sequence?

This question already has an answer here:

  • Add ID column by group [duplicate] 4 answers

How can I assign a unique ID to each consecutive run of records with the same visitor?

For example, I have the following table:

time    machine visitor
11:30   A       123
11:31   A       123
11:33   A       123
11:34   A       256
11:35   A       256
11:36   A       256
11:37   A       256
11:38   A       789
11:40   A       789
11:42   A       789
11:50   A       123
11:51   A       123

I would like to add a session ID to each record, incrementing whenever the visitor changes (note that visitor 123 gets a new session when it reappears):

time    machine visitor session
11:30   A       123     1
11:31   A       123     1
11:33   A       123     1
11:34   A       256     2
11:35   A       256     2
11:36   A       256     2
11:37   A       256     2
11:38   A       789     3
11:40   A       789     3
11:42   A       789     3
11:50   A       123     4
11:51   A       123     4

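I wrote a loop that is supposed to do this, but it is far too slow:

df$session <- 1  # the first row always belongs to session 1
session <- 1
for (i in 2:nrow(df)) {
  # a change of visitor between consecutive rows starts a new session
  if (df$visitor[i] != df$visitor[i - 1]) {
    session <- session + 1
  }
  df$session[i] <- session
}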


This is probably not the most legible way to do it, but you can use the following:

df$session <- cumsum(c(TRUE,as.logical(diff(df$visitor))))

To break it down a little:

> diff(df$visitor) # Difference between each row and the previous one.
[1]    0    0  133    0    0    0  533    0    0 -666    0
> c(TRUE,as.logical(diff(df$visitor))) # Convert to logical (nonzero is TRUE) and prepend TRUE for the first row:
[1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
> cumsum(c(TRUE,as.logical(diff(df$visitor)))) # Then take the cumulative sum.
[1] 1 1 1 2 2 2 2 3 3 3 4 4
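
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the table from the question and applies the cumsum approach. The two variants after it are my own additions, not part of the answer above: they assume the visitor IDs might be stored as character or factor, in which case diff() would not apply. The names v, session_cmp, runs and session_rle are purely illustrative.

```r
# Sample data from the question (time kept as character for simplicity)
df <- data.frame(
  time    = c("11:30", "11:31", "11:33", "11:34", "11:35", "11:36",
              "11:37", "11:38", "11:40", "11:42", "11:50", "11:51"),
  machine = "A",
  visitor = c(123, 123, 123, 256, 256, 256, 256, 789, 789, 789, 123, 123)
)

# Approach from the answer: a nonzero difference marks the start of a new run
df$session <- cumsum(c(TRUE, as.logical(diff(df$visitor))))

# Variant 1 (assumption: IDs may be non-numeric):
# compare each value with its predecessor instead of subtracting
v <- as.character(df$visitor)
session_cmp <- cumsum(c(TRUE, v[-1] != v[-length(v)]))

# Variant 2: run-length encoding; number the runs, then repeat each
# run number as many times as the run is long
runs <- rle(v)
session_rle <- rep(seq_along(runs$lengths), runs$lengths)

print(df)
```

All three are vectorised, so they avoid the row-by-row data frame assignment that makes the loop slow. If the table holds several machines, the same idea would presumably need to be applied per machine, but the question only shows machine A.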