Repeat each record N times and create a new sequence column

I want to repeat each row of a data.frame N times, where N is derived from the difference between the values in the first and second columns of that row, so N can differ from row to row. I also need a new column that holds, for each row, the sequence from the first value to the second value in steps of K. K is constant for all rows.

Ex: d1<-data.frame(A=c(2,4,6,8,1),B=c(8,6,7,8,10))

The above dataset has 5 rows. In the first row the difference between the second and first values is 6, so the sequence from 2 to 8 has 7 values. I need to replicate the first row 7 times and create a new column with the sequence 2, 3, 4, 5, 6, 7 and 8.

I can create a dataset by using the following code.

dist<-1
rec_len<-c()
seqe<-c()
for(i in 1:nrow(d1))
{
    a<-seq(d1[i,"A"],d1[i,"B"],by=dist)
    rec_len<-c(rec_len,length(a))
    seqe<-c(seqe,a)
}
d1$C<-rec_len

d1<-d1[rep(1:nrow(d1),d1$C),]
d1$D<-seqe
row.names(d1)<-NULL

But this takes a very long time. Is there any way to speed up the process?


One data.table approach is to use 1:nrow(d1) as the grouping variable, so that each row is processed on its own: build a list column holding the sequence from A to B for each row, and then unlist it, i.e.

library(data.table)

setDT(d1)[, C := B - A + 1][,
     D := list(list(seq(A, B))), by = 1:nrow(d1)][,
                lapply(.SD, unlist), by = 1:nrow(d1)][,
                                              nrow := NULL][]

Which gives,

    A  B  C  D
 1: 2  8  7  2
 2: 2  8  7  3
 3: 2  8  7  4
 4: 2  8  7  5
 5: 2  8  7  6
 6: 2  8  7  7
 7: 2  8  7  8
 8: 4  6  3  4
 9: 4  6  3  5
10: 4  6  3  6
11: 6  7  2  6
12: 6  7  2  7
13: 8  8  1  8
14: 1 10 10  1
15: 1 10 10  2
16: 1 10 10  3
17: 1 10 10  4
18: 1 10 10  5
19: 1 10 10  6
20: 1 10 10  7
21: 1 10 10  8
22: 1 10 10  9
23: 1 10 10 10
    A  B  C  D
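
For comparison, the same table can probably be built in base R without any per-row loop. This is only a sketch, assuming a step of 1 and A <= B in every row; the names n and res are illustrative and not part of the answer above. rep repeats each row index, and sequence produces the counters 1..n for each row, which are then shifted to start at A.

```r
d1 <- data.frame(A = c(2, 4, 6, 8, 1), B = c(8, 6, 7, 8, 10))

# number of values in each row's sequence A, A + 1, ..., B
n <- d1$B - d1$A + 1

# repeat row i exactly n[i] times
res <- d1[rep(seq_len(nrow(d1)), n), ]
res$C <- rep(n, n)

# sequence(n) gives 1..n[1], 1..n[2], ...; shift each run so it starts at A
res$D <- res$A + sequence(n) - 1
row.names(res) <- NULL
```

Because rep and sequence are vectorised, nothing is grown inside a loop, which is where the original code most likely loses its time.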

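Note: you can easily change K through the by argument of seq. With a step other than 1, B - A + 1 no longer equals the number of repeated rows, so C is taken from the length of each sequence instead, e.g. for K = 0.2:

setDT(d1)[, D := list(list(seq(A, B, by = 0.2))), by = 1:nrow(d1)][,
     C := lengths(D)][,
                lapply(.SD, unlist), by = 1:nrow(d1)][,
                                              nrow := NULL][]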
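
For a step K other than 1, the row count has to come from the sequence length rather than from B - A + 1. A hedged base R sketch of that arithmetic follows, assuming K is positive and A <= B in every row. K, n and res are illustrative names, and the small tolerance is an assumption added here to guard against floating-point rounding in the division.

```r
d1 <- data.frame(A = c(2, 4, 6, 8, 1), B = c(8, 6, 7, 8, 10))
K <- 0.2  # step size, assumed positive and constant for all rows

# values per row: A, A + K, ... without exceeding B
# (the 1e-10 tolerance is an assumption to absorb rounding error)
n <- floor((d1$B - d1$A) / K + 1e-10) + 1

# repeat row i exactly n[i] times and record the count
res <- d1[rep(seq_len(nrow(d1)), n), ]
res$C <- rep(n, n)

# counters 1..n[i] per row, scaled by K and shifted to start at A
res$D <- res$A + (sequence(n) - 1) * K
row.names(res) <- NULL
```

With the example data and K = 0.2 this gives 31, 11, 6, 1 and 46 rows for the five records.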