我有以下输入
| ID | Column1 | Column2 | Value |
|:-- |:-------:|:-------:| -----:|
| A | x | a | 4 |
| A | x | b | 1 |
| A | y | a | 5 |
| A | y | b | 6 |
| B | u | a | 9 |
| B | u | b | 2 |
| B | v | a | 7 |
| B | v | b | 4 |
| B | w | a | 1 |
| B | w | b | 3 |
我希望得到以下输出,其中输出中的 values 将添加每个 ID 的输入中可能的第 1 列和第 2 列的 combinations,类似于下面
| ID | Column1 | Value |
|:-- |:--------------:| -----------:|
| A | xa+ya | 4+5 |
| A | xa+yb | 4+6 |
| A | xb+ya | 1+5 |
| A | xb+yb | 1+6 |
| B | ua+va+wa | 9+7+1 |
| B | ua+va+wb | 9+7+3 |
| B | ua+vb+wa | 9+4+1 |
| B | ua+vb+wb | 9+4+3 |
| B | ub+va+wa | 2+7+1 |
| B | ub+va+wb | 2+7+3 |
| B | ub+vb+wa | 2+4+1 |
| B | ub+vb+wb | 2+4+3 |
回答1
combinations 的逻辑不是很清楚。也许这有帮助 - paste
(str_c
来自 stringr
),'Column1','Column2' 到单个列,然后按 'ID' 分组,summarise
'ColumnA' 与 combn
ations 到paste
元素,同时从 Value
列获取 sum
,然后 filter
根据 'Column1' 中唯一的 values 取出不需要的 combinations
library(dplyr)
library(stringr)
df1 %>%
mutate(ColumnA = str_c(Column1, Column2)) %>%
group_by(ID) %>%
summarise(ColumnA = combn(ColumnA, n_distinct(Column1),
FUN = str_c, collapse = "+"),
Value = combn(Value, n_distinct(Column1), FUN = sum), .groups = 'drop') %>%
mutate(new1 = str_remove_all(ColumnA, ".(?=(\\+|$))|\\+"),
new2 = str_replace_all(new1, "(.)\\1+", "\\1")) %>%
filter(nchar(new2) == nchar(new1)) %>%
select(ID, Column1 = ColumnA, Value)
-输出
# A tibble: 12 × 3
ID Column1 Value
<chr> <chr> <int>
1 A xa+ya 9
2 A xa+yb 10
3 A xb+ya 6
4 A xb+yb 7
5 B ua+va+wa 17
6 B ua+va+wb 19
7 B ua+vb+wa 14
8 B ua+vb+wb 16
9 B ub+va+wa 10
10 B ub+va+wb 12
11 B ub+vb+wa 7
12 B ub+vb+wb 9
或者另一种选择是
library(tidyr)
df1 %>%
mutate(ColumnA = str_c(Column1, Column2)) %>%
group_by(ID) %>%
group_modify(~ bind_cols(
split(.x$ColumnA, .x$Column1) %>%
expand.grid %>%
unite(ColumnA, everything(), sep = "+"),
Value = split(.x$Value, .x$Column1) %>%
expand.grid %>%
rowSums(.)) ) %>%
ungroup
-输出
# A tibble: 12 × 3
ID ColumnA Value
<chr> <chr> <dbl>
1 A xa+ya 9
2 A xb+ya 6
3 A xa+yb 10
4 A xb+yb 7
5 B ua+va+wa 17
6 B ub+va+wa 10
7 B ua+vb+wa 14
8 B ub+vb+wa 7
9 B ua+va+wb 19
10 B ub+va+wb 12
11 B ua+vb+wb 16
12 B ub+vb+wb 9
数据
df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B",
"B", "B"), Column1 = c("x", "x", "y", "y", "u", "u", "v", "v",
"w", "w"), Column2 = c("a", "b", "a", "b", "a", "b", "a", "b",
"a", "b"), Value = c(4L, 1L, 5L, 6L, 9L, 2L, 7L, 4L, 1L, 3L)),
row.names = c(NA,
-10L), class = "data.frame")