Denormalize the list in a data frame in R

advertisements

This question already has an answer here:

  • Split comma-separated column into separate rows 4 answers

I have a data frame that looks like this (from mongo db..)

team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
data.frame (team_id,member)

which I'm trying to convert data frame like this..

team_id2 <- c(1,1,1,2,2,2)
member2 <- c(15,25,35,12,22,32)
data.frame (team_id2, member2)

I tried to use 'unlist' but cannot get "team_id" to be repeated every column. I will appreciate any guidance on this!


We can use cSplit from library(splitstackshape). It is easy and compact to work with cSplit for these kind of problems. We just provide the column to split i.e. member, the delimiter (,) and the direction (long).

library(splitstackshape)
cSplit(d1, "member", sep=",", "long")
#    team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32


or using data.table, we convert the 'data.frame' to 'data.table' (setDT(d1)), grouped by 'team_id', we split the 'member' by , and unlist the output.

library(data.table)
setDT(d1)[, .(member=unlist(tstrsplit(member, ","))), team_id]
#   team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32


Or using tidyr, we can split the 'member' by , and unnest (from tidyr)

library(tidyr)
library(stringr)
unnest(d1[1], member= str_split(d1$member, ","))
#Source: local data frame [6 x 2]

#  team_id member
#   (dbl)  (chr)
#1       1     15
#2       1     25
#3       1     35
#4       2     12
#5       2     22
#6       2     32


Or we can use a base R solution. We use strsplit to split the 'member' column into a list, set the names of the list as 'team_id' and use stack to convert the list to data.frame

stack(setNames(strsplit(as.character(d1$member), ","), d1$team_id))[2:1]

data

d1 <- data.frame (team_id,member)