r - How do I chain a mix of data.table and base R functions together?

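I'm using the data.table package to work with very large data sets, and I value its speed and clarity. But I'm new to it and am having trouble chaining operations, especially when mixing data.table and base R functions. My question is: how do I chain the example functions below into one seamless block of code that defines the target data object?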

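Below is the correct output, produced by running each line of code separately (unchained); the generating code appears immediately beneath the output: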

> data
    ID Period State Values
 1:  1      1    X0      5
 2:  1      2    X1      0
 3:  1      3    X2      0
 4:  1      4    X1      0
 5:  2      1    X0      1
 6:  2      2    HI      0
 7:  2      3    HI      0
 8:  2      4    HI      0
 9:  3      1    X2      0
10:  3      2    X1      0
11:  3      3    X9      0
12:  3      4    X3      0
13:  4      1    X2      1
14:  4      2    X1      2
15:  4      3    X9      3
16:  4      4    HI      0

library(data.table)

data <- 
  data.frame(
    ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
    Period = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
    Values_1 = c(5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0),
    Values_2 = c(5, 2, 0, 12, 2, 0, 0, 0, 0, 0, 0, 2, 4, 5, 6, 0),
    State = c("X0","X1","X2","X1","X0","X2","X0","X0", "X2","X1","X9","X3", "X2","X1","X9","X3")
  )

# changes State to "XX" if remaining Values_1 + Values_2 cumulative sums = 0 for each ID: 
setDT(data)[, State := ifelse(rev(cumsum(rev(Values_1 + Values_2))), State, "XX"), ID]

# create new column "Values", which equals "Values_1":
setDT(data)[,Values := Values_1] 

# in base R, drops columns Values_1 and Values_2:
data <- subset(data, select = -c(Values_1,Values_2)) # How to do this step in data.table, if possible or advisable?  

# in base R, changes all "XX" elements in State column to "HI":
data$State <- gsub('XX','HI', data$State) # How to do this step in data.table, if possible or advisable?

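For what it's worth, below is my attempt to chain everything together with '%>%' pipe operators. It failed (error message: Error in data$State : object of type 'closure' is not subsettable), and in any case I would rather chain using data.table operators: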

data <- 
  data.frame(
    ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
    Period = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
    Values_1 = c(5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0),
    Values_2 = c(5, 2, 0, 12, 2, 0, 0, 0, 0, 0, 0, 2, 4, 5, 6, 0),
    State = c("X0","X1","X2","X1","X0","X2","X0","X0", "X2","X1","X9","X3", "X2","X1","X9","X3")
  ) %>%
  setDT(data)[, State := ifelse(rev(cumsum(rev(Values_1 + Values_2))), State, "XX"), ID] %>%
  setDT(data)[,Values := Values_1] %>%
  subset(data, select = -c(Values_1,Values_2)) %>%
  data$State <- gsub('XX','HI', data$State)

Answer 1

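If I understand correctly, the OP wants to

  • rename column Values_1 to Values (or, in the OP's words: create new column "Values", which equals "Values_1")
  • delete column Values_2
  • replace all occurrences of XX in the State column with HI

Here is what I would do in data.table syntax: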

setDT(data)[, State := ifelse(rev(cumsum(rev(Values_1 + Values_2))), State, "XX"), ID][
  , Values_2 := NULL][
    State == "XX", State := "HI"][]
setnames(data, "Values_1", "Values")
data
       ID Period Values  State
 1:     1      1      5     X0
 2:     1      2      0     X1
 3:     1      3      0     X2
 4:     1      4      0     X1
 5:     2      1      1     X0
 6:     2      2      0     HI
 7:     2      3      0     HI
 8:     2      4      0     HI
 9:     3      1      0     X2
10:     3      2      0     X1
11:     3      3      0     X9
12:     3      4      0     X3
13:     4      1      1     X2
14:     4      2      2     X1
15:     4      3      3     X9
16:     4      4      0     HI

setnames() updates by reference, i.e., without copying. There is no need to create a copy of Values_1 and delete Values_1 later.
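As a rough illustration of "by reference", the sketch below compares the memory address of the column vector before and after the rename, using data.table's address() helper. The table dt and its values are made up for this example and are not the OP's data.

```r
library(data.table)

# Toy table (hypothetical values, not the OP's data)
dt <- data.table(ID = 1:3, Values_1 = c(5, 0, 1))

addr_before <- address(dt$Values_1)  # address of the column vector
setnames(dt, "Values_1", "Values")   # rename in place, no `dt <-` needed
addr_after <- address(dt$Values)

# If the rename happened by reference, both addresses should be identical
identical(addr_before, addr_after)
names(dt)
```

Only the names attribute changes here; the column data itself is expected to stay where it was.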

Also, [State == "XX", State := "HI"] replaces XX with HI by reference in the affected rows only, whereas

[, State := gsub('XX','HI', State)] replaces the whole column.
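A small side-by-side sketch of the two forms, on a made-up State column (the tables a and b are hypothetical):

```r
library(data.table)

# Toy column (hypothetical values)
a <- data.table(State = c("X0", "XX", "X1", "XX"))
b <- copy(a)  # independent copy so the two approaches do not interfere

a[State == "XX", State := "HI"]        # assigns only in rows where State is exactly "XX"
b[, State := gsub("XX", "HI", State)]  # recomputes and reassigns the entire column

identical(a$State, b$State)
```

Note one behavioral difference beyond speed: the == form matches whole values only, while gsub() also rewrites XX when it appears inside a longer string. On data like the OP's, where XX only ever occurs as a complete value, both give the same result.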

data.table chaining is used where appropriate.

By the way: I wonder why XX can't be replaced by HI right away in the first statement:

setDT(data)[, State := ifelse(rev(cumsum(rev(Values_1 + Values_2))), State, "HI"), ID][
  , Values_2 := NULL][]
setnames(data, "Values_1", "Values")
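For readers unfamiliar with the idiom, rev(cumsum(rev(x))) in the statements above gives, for each position, the sum of that value and all values after it; ifelse() then treats 0 as FALSE and any nonzero number as TRUE. A minimal base R sketch, using made-up vectors v and state that mirror the rows of ID 2 in the example data:

```r
# Values_1 + Values_2 for ID 2 in the example data
v <- c(3, 0, 0, 0)
state <- c("X0", "X2", "X0", "X0")

# Reverse, accumulate, reverse back: "sum of this and all later values"
remaining <- rev(cumsum(rev(v)))
remaining

# A remaining sum of 0 is treated as FALSE, so those rows get the marker
flag <- ifelse(remaining, state, "XX")
flag
```

Running the same expression with by = ID, as in the answer, restarts the accumulation for every ID.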

Answer 2

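You can chain using bracket notation, [. That way you only need to call setDT() once: all subsequent operations stay in the data.table universe, so data never stops being a data.table. Also, setDT() modifies in place, so it needs no assignment (although piping into it and assigning its return value to data, as done here, is fine too).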
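A minimal sketch of both points, on a made-up data frame df (hypothetical values): setDT() converts without assignment, and each [ ] returns the data.table so another [ ] can follow directly.

```r
library(data.table)

# Toy data frame (hypothetical values)
df <- data.frame(ID = c(1, 1, 2), Values_1 = c(5, 0, 1))

setDT(df)          # converts in place; no `df <-` required
is.data.table(df)

# Bracket chaining: the result of the first [ ] feeds the second
df[, Values := Values_1][, Values_1 := NULL]
names(df)
```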

First define the data and make it a data.table:

library(data.table)
data <-
    data.frame(
        ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
        Period = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
        Values_1 = c(5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0),
        Values_2 = c(5, 2, 0, 12, 2, 0, 0, 0, 0, 0, 0, 2, 4, 5, 6, 0),
        State = c("X0", "X1", "X2", "X1", "X0", "X2", "X0", "X0", "X2", "X1", "X9", "X3", "X2", "X1", "X9", "X3")
    ) |>
    setDT()

Then define the columns you need. Note the functional notation `:=`(...), described at https://atrebas.github.io/post/2020-06-17-datatable-introduction/#computation-on-columns.

data[, `:=`(
    State = ifelse(
        rev(cumsum(rev(Values_1 + Values_2))),
        State, "XX"
    )
),
by = ID
][
    ,
    `:=`(
        Values = Values_1,
        Values_1 = NULL,
        Values_2 = NULL,
        State = gsub("XX", "HI", State)
    )
]

Output:

data
#     ID Period State Values
#  1:  1      1    X0      5
#  2:  1      2    X1      0
#  3:  1      3    X2      0
#  4:  1      4    X1      0
#  5:  2      1    X0      1
#  6:  2      2    HI      0
#  7:  2      3    HI      0
#  8:  2      4    HI      0
#  9:  3      1    X2      0
# 10:  3      2    X1      0
# 11:  3      3    X9      0
# 12:  3      4    X3      0
# 13:  4      1    X2      1
# 14:  4      2    X1      2
# 15:  4      3    X9      3
# 16:  4      4    HI      0

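You may want to read further about chaining commands in data.table at https://atrebas.github.io/post/2020-06-17-datatable-introduction/#chaining-commands. I think that page is an excellent summary of the package's syntax and features, and it is well worth reading.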

Answer 3

You can use the magrittr package to chain data.tables by putting . before [. Try the code below:

library(dplyr)
library(magrittr)
library(data.table)
data <- 
  data.frame(
    ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
    Period = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
    Values_1 = c(5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0),
    Values_2 = c(5, 2, 0, 12, 2, 0, 0, 0, 0, 0, 0, 2, 4, 5, 6, 0),
    State = c("X0","X1","X2","X1","X0","X2","X0","X0", "X2","X1","X9","X3", "X2","X1","X9","X3")
  ) %>% 
  setDT() %>%  # no argument: the piped data.frame is already the first argument
  .[, State := ifelse(rev(cumsum(rev(Values_1 + Values_2))), State, "XX"), ID] %>%
  .[,Values := Values_1] %>%
  select(-c(Values_1, Values_2)) %>%
  mutate(State = gsub('XX','HI', State))

Output:

    ID Period State Values
 1:  1      1    X0      5
 2:  1      2    X1      0
 3:  1      3    X2      0
 4:  1      4    X1      0
 5:  2      1    X0      1
 6:  2      2    HI      0
 7:  2      3    HI      0
 8:  2      4    HI      0
 9:  3      1    X2      0
10:  3      2    X1      0
11:  3      3    X9      0
12:  3      4    X3      0
13:  4      1    X2      1
14:  4      2    X1      2
15:  4      3    X9      3
16:  4      4    HI      0
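The . placeholder is what makes this work: magrittr substitutes the left-hand side for ., so .[i, j] behaves like a bracket call on the piped table. A stripped-down sketch on a made-up table dt (hypothetical values), without dplyr:

```r
library(data.table)
library(magrittr)

# Toy table (hypothetical values)
dt <- data.table(x = 1:4)

res <- dt %>%
  .[x > 2] %>%       # equivalent to dt[x > 2]
  .[, y := x * 2]    # := returns the table, so it can be assigned or piped on

res
```

This is functionally the same as dt[x > 2][, y := x * 2]; whether the pipe or the bracket chain reads better is largely a matter of taste.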
