r - R: create a new column of labels or categories that repeats until it reaches a new label

Here is my data frame:

df2 <- structure(list(Code = c("ICB-9_label_1", "1", "2", "3", 
"4", "5", "1", "ICB-10_label_2", "3", "4", "5", 
"1", "2", "3", "3", "5", "1", "2", 
"3", "4", "5", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "1", 
"2", "3", "4", "5", "1", "2", "3", 
"4", "5", "1", "2", "3", "4", "5", 
"1"), Description = c("description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here")), row.names = c(NA, -47L), class = c("tbl_df", 
"tbl", "data.frame"))

This is what the table looks like:

Code             Description
ICB-9_label_1     description here          
1                 description here          
2                 description here          
3                 description here          
4                 description here          
5                 description here          
1                 description here          
ICB-10_label_2    description here          
3                 description here          
4                 description here

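I want to create a third column called "Labels". It should say "ICB-9_label_1" on every row until it reaches the row containing "ICB-10_label_2", and from there on it should say "ICB-10_label_2". I don't want to overwrite the numbers in the first column, because the values 1, 2, 3, 4, 5 are important.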

Answer 1

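There are several ways to do this. One option is to extract the value on rows that contain "label", so that all other rows return NA, and then use fill to replace each NA with the previous non-NA value (with .direction = 'downup', any NAs before the first label are then filled from the next value).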

library(dplyr)
library(tidyr)
library(stringr)
df2 <- df2 %>% 
  mutate(Labels = str_extract(Code, '.*label.*')) %>% 
  fill(Labels, .direction = 'downup')

-output

df2
# A tibble: 47 × 3
   Code           Description      Labels        
   <chr>          <chr>            <chr>         
 1 ICB-9_label_1  description here ICB-9_label_1 
 2 1              description here ICB-9_label_1 
 3 2              description here ICB-9_label_1 
 4 3              description here ICB-9_label_1 
 5 4              description here ICB-9_label_1 
 6 5              description here ICB-9_label_1 
 7 1              description here ICB-9_label_1 
 8 ICB-10_label_2 description here ICB-10_label_2
 9 3              description here ICB-10_label_2
10 4              description here ICB-10_label_2
# … with 37 more rows
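The fill() step above is a "last observation carried forward" operation. As a rough, dependency-free illustration of what it does (not how tidyr implements it), here is a base R sketch built on Reduce(accumulate = TRUE). The helper names fill_down and fill_downup are hypothetical, and the vector code is a shortened stand-in for the Code column, not the full df2.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
# (the "down" direction of tidyr::fill), computed as a running Reduce.
fill_down <- function(x) {
  Reduce(function(prev, cur) if (is.na(cur)) prev else cur, x, accumulate = TRUE)
}

# Hypothetical helper mimicking .direction = "downup": fill down first,
# then fill any leading NAs upward by filling the reversed vector down.
fill_downup <- function(x) rev(fill_down(rev(fill_down(x))))

# Shortened stand-in for df2$Code
code <- c("ICB-9_label_1", "1", "2", "ICB-10_label_2", "3", "4")

# Keep the value on label rows and NA elsewhere (the role str_extract plays above)
marked <- ifelse(grepl("label", code), code, NA_character_)

labels <- fill_downup(marked)
print(labels)  # ICB-9_label_1 three times, then ICB-10_label_2 three times
```

In a dplyr pipeline, tidyr::fill is the more readable choice; the sketch only makes the mechanics visible.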

Or, in base R, use grep together with cumsum:

transform(df2, Labels = grep('label', Code, 
       value = TRUE)[cumsum(grepl('label', Code))])
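To see why the grep/cumsum one-liner works, it helps to look at the intermediate vectors on a shortened Code column (the values below are a stand-in, not the full df2). The running count of label rows becomes an index into the vector of label names. The sketch also shows one possible guard, an assumption rather than part of the answer, for data whose first rows come before any label: there cumsum is 0, and indexing with 0 silently drops elements, so the result would be shorter than the data frame.

```r
code <- c("ICB-9_label_1", "1", "2", "ICB-10_label_2", "3", "4")

is_label <- grepl("label", code)                # TRUE on label rows only
idx      <- cumsum(is_label)                    # 1 1 1 2 2 2: which label each row belongs to
labs     <- grep("label", code, value = TRUE)   # the label names, in order of appearance
print(labs[idx])                                # each row mapped to its current label

# Guarded variant (an assumption, not part of the original answer):
# rows before the first label get index 0, and x[0] drops elements,
# so prepend NA and shift the index by one to keep the length intact.
code2   <- c("1", "2", "ICB-9_label_1", "3")
idx2    <- cumsum(grepl("label", code2))        # 0 0 1 1
guarded <- c(NA, grep("label", code2, value = TRUE))[idx2 + 1]
print(guarded)                                  # NA, NA, then ICB-9_label_1 twice
```

Applied to the transform() call above, the guarded index would read c(NA, grep('label', Code, value = TRUE))[cumsum(grepl('label', Code)) + 1].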
