Here is my data frame:
df2 <- structure(list(Code = c("ICB-9_label_1", "1", "2", "3",
"4", "5", "1", "ICB-10_label_2", "3", "4", "5",
"1", "2", "3", "3", "5", "1", "2",
"3", "4", "5", "1", "2", "3", "4",
"5", "1", "2", "3", "4", "5", "1",
"2", "3", "4", "5", "1", "2", "3",
"4", "5", "1", "2", "3", "4", "5",
"1"), Description = c("description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here")), row.names = c(NA, -47L), class = c("tbl_df",
"tbl", "data.frame"))
This is what the table looks like:
Code Description
ICB-9_label_1 description here
1 description here
2 description here
3 description here
4 description here
5 description here
1 description here
ICB-10_label_2 description here
3 description here
4 description here
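I want to create a third column called "Labels". It should say "ICB-9_label_1" until it reaches the row containing "ICB-10_label_2", and from that row on it should say "ICB-10_label_2". I don't want to overwrite the numbers in the first column, because the 1, 2, 3, 4, 5 values are important.
Answer 1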
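There are multiple ways to do this. One option is to extract the values of the rows that contain "label", returning NA for all other rows, and then use fill to replace each NA with the previous non-NA value. With .direction = 'downup', fill goes downward first and then upward, so any NA rows before the first label would take the first label.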
library(dplyr)
library(tidyr)
library(stringr)
df2 <- df2 %>%
  mutate(Labels = str_extract(Code, '.*label.*')) %>%
  fill(Labels, .direction = 'downup')
-output
df2
# A tibble: 47 × 3
Code Description Labels
<chr> <chr> <chr>
1 ICB-9_label_1 description here ICB-9_label_1
2 1 description here ICB-9_label_1
3 2 description here ICB-9_label_1
4 3 description here ICB-9_label_1
5 4 description here ICB-9_label_1
6 5 description here ICB-9_label_1
7 1 description here ICB-9_label_1
8 ICB-10_label_2 description here ICB-10_label_2
9 3 description here ICB-10_label_2
10 4 description here ICB-10_label_2
# … with 37 more rows
Or use base R with grep and cumsum:
transform(df2, Labels = grep('label', Code,
value = TRUE)[cumsum(grepl('label', Code))])
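To see why the grep/cumsum indexing works, it may help to break the one-liner into steps. The following is only a sketch on a shortened, made-up vector standing in for df2$Code; the names code, is_label, group_id, label_values and labels are illustrative and do not appear in the original answer.

```r
# Hypothetical, shortened stand-in for df2$Code
code <- c("ICB-9_label_1", "1", "2", "ICB-10_label_2", "3", "4")

# TRUE on the rows that hold a label, FALSE elsewhere
is_label <- grepl("label", code)

# Running count of labels seen so far: 1 1 1 2 2 2
group_id <- cumsum(is_label)

# The label strings themselves, in order of appearance
label_values <- grep("label", code, value = TRUE)

# Index the label strings by the running count to repeat each label
labels <- label_values[group_id]

print(data.frame(Code = code, Labels = labels))
```

Note that this relies on the first row being a label row. If the data started with non-label rows, cumsum would give 0 for them, and indexing with 0 drops elements, so the result would be shorter than the input.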