Method: use acast
Note: requires library(reshape2)
# Create sample data
v.x1 <- c("oda","oda","oda","toyo","toyo","toyo","ie","ie","ie","ie")
v.x2 <- c("a","a","b","b","b","c","c","c","c","c")
v.x3 <- c(1000,900,800,700,600,500,400,300,200,100)
df.x <- data.frame(user_id = v.x1, item = v.x2, price = v.x3)
df.x
> df.x
   user_id item price
1      oda    a  1000
2      oda    a   900
3      oda    b   800
4     toyo    b   700
5     toyo    b   600
6     toyo    c   500
7       ie    c   400
8       ie    c   300
9       ie    c   200
10      ie    c   100
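# Cross tabulation: rows = user_id, columns = item, cells = sum of price
library(reshape2)  # provides acast
tmp <- acast(df.x, user_id ~ item, sum, value.var = "price")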
tmp
> tmp
        a    b    c
ie      0    0 1000
oda  1900  800    0
toyo    0 1300  500
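As a cross-check, the same totals can be reproduced with base R's xtabs, which needs no extra package. This is only a sketch for verifying the numbers, not the method used in this post; the name `tab` is made up for the example, and the result is a table object rather than the plain matrix that acast returns.

```r
# Same sample data as above, rebuilt so this snippet runs on its own.
df.x <- data.frame(
  user_id = c("oda", "oda", "oda", "toyo", "toyo", "toyo", "ie", "ie", "ie", "ie"),
  item    = c("a", "a", "b", "b", "b", "c", "c", "c", "c", "c"),
  price   = c(1000, 900, 800, 700, 600, 500, 400, 300, 200, 100)
)

# Left of ~ is the value to sum; the right side names the row and column variables.
# Combinations that never occur (e.g. ie / a) come out as 0.
tab <- xtabs(price ~ user_id + item, data = df.x)
tab
```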
# Note: force user_id into a column (workaround: round-trip through CSV so the row names become a column)
df.cross <- data.frame(tmp)
write.csv(df.cross,file = "./cross.csv")
df.cross <- fread("./cross.csv") # requires library(data.table); the unnamed row-name column is read as V1
df.cross <- data.frame(df.cross)
df.cross
> df.cross
    V1    a    b    c
1   ie    0    0 1000
2  oda 1900  800    0
3 toyo    0 1300  500
names(df.cross)[1] <- c("user_id")
df.cross
> df.cross
  user_id    a    b    c
1      ie    0    0 1000
2     oda 1900  800    0
3    toyo    0 1300  500
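The CSV round trip can probably be avoided by promoting the row names to a column directly. The sketch below is an alternative to the steps above, not what this post originally did. To stay runnable without reshape2, it rebuilds the matrix by hand from the output printed earlier; `df.cross2` is a name made up for the example.

```r
# Hand-built copy of the acast result shown above (an assumption made for
# self-containment; in practice tmp would come straight from acast).
tmp <- matrix(
  c(   0,    0, 1000,
    1900,  800,    0,
       0, 1300,  500),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ie", "oda", "toyo"), c("a", "b", "c"))
)

# Put the row names in a user_id column; row.names = NULL replaces the
# old row names with the default 1..n index.
df.cross2 <- data.frame(
  user_id = rownames(tmp),
  tmp,
  row.names = NULL,
  stringsAsFactors = FALSE
)
df.cross2
```

This skips writing a file to disk and does not need data.table. reshape2 also offers dcast, which takes the same formula and returns a data frame with user_id already as a column.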
Addendum: acast is slow, so spread (from tidyr) is recommended instead.
R: Fast cross tabulation of large data (tally, spread)
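A minimal sketch of what the spread approach might look like on the sample data, assuming dplyr and tidyr are installed. The linked post may do it differently (its title mentions tally, whereas this sketch aggregates with summarise), and `df.wide` is a name made up for the example.

```r
library(dplyr)  # group_by, summarise, ungroup, %>%
library(tidyr)  # spread

# Same sample data as above, rebuilt so this snippet runs on its own.
df.x <- data.frame(
  user_id = c("oda", "oda", "oda", "toyo", "toyo", "toyo", "ie", "ie", "ie", "ie"),
  item    = c("a", "a", "b", "b", "b", "c", "c", "c", "c", "c"),
  price   = c(1000, 900, 800, 700, 600, 500, 400, 300, 200, 100),
  stringsAsFactors = FALSE
)

df.wide <- df.x %>%
  group_by(user_id, item) %>%                  # one group per user/item pair
  summarise(price = sum(price)) %>%            # aggregate in long form first
  ungroup() %>%
  spread(key = item, value = price, fill = 0)  # long to wide; missing pairs become 0
df.wide
```

Here user_id is an ordinary column from the start, so no row-name workaround is needed. Newer tidyr versions mark spread as superseded by pivot_wider, though spread remains available.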