所以我有一个问题,我的 .CSV 文件中的数据看起来像这样,这就是我需要它的方式,(唯一的区别是我需要 Time 和 Date 在 2 个单独的列中)。 (编辑)
这是我需要我的数据在 R Studio 中看起来像的目标格式,而不是将时间戳列分隔为 Date 和时间列。 (这是从EXCEL复制粘贴的)
timestamp axis1 axis2 axis3 VM standing stepping sitting
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
2022-03-17 11:29 0 0 0 0 0 0 1
但是我一开始导入的数据都集中在 1 列中,
data1 <- read_csv("AE1_07 DBG (2022-03-17)1sec.csv")
1
2022-03-17 11:29:00,0,0,0,0,0,0,1
2
2022-03-17 11:29:01,0,0,0,0,0,0,1
3
2022-03-17 11:29:02,0,0,0,0,0,0,1
4
2022-03-17 11:29:03,0,0,0,0,0,0,1
5
2022-03-17 11:29:04,0,0,0,0,0,0,1
6
2022-03-17 11:29:05,0,0,0,0,0,0,1
7
2022-03-17 11:29:06,0,0,0,0,0,0,1
8
2022-03-17 11:29:07,0,0,0,0,0,0,1
9
2022-03-17 11:29:08,0,0,0,0,0,0,1
10
2022-03-17 11:29:09,0,0,0,0,0,0,1
11
2022-03-17 11:29:10,0,0,0,0,0,0,1
12
2022-03-17 11:29:11,124,72,61,155.82,0,0,1
13
2022-03-17 11:29:12,195,121,101,250.73,0,0,1
14
2022-03-17 11:29:13,36,21,0,41.68,0,0,1
15
2022-03-17 11:29:14,72,184,91,217.53,0,0,1
16
2022-03-17 11:29:15,80,145,53,173.88,0,0,1
这是我在记事本中打开文件时的“原始”数据
timestamp,axis1,axis2,axis3,vectormagnitude,inclinestanding,inclinestepping,inclinesittinglying
2022-03-11 17:00:00,139,15,19,141.09,0,0,1
2022-03-11 17:00:01,183,15,10,183.89,0,0,1
2022-03-11 17:00:02,185,0,21,186.19,0,0,1
2022-03-11 17:00:03,184,12,20,185.47,0,0,1
2022-03-11 17:00:04,183,13,19,184.44,0,0,1
2022-03-11 17:00:05,179,12,21,180.63,0,0,1
回答1
如果您的第二个代码块是真正的原始文件,那么我们会遇到一些问题,因为在正确解析之前,交错的单数行将需要外部操作或 R 中性能较低的 readLines(.)
步骤。我将暂时跳过这种可能性......
如果第一个代码块是真正的原始文件,那么我们有两种可能性:
如果它真的是制表符分隔的(只有空格在 date 和时间之间的
timestamp
字段中),那么我们可以简单地使用read.table
。我将从一个主要包含标签的文件quux.tsv
开始。dat <- read.table("quux.tsv", header = TRUE, sep = "\t") dat # timestamp axis1 axis2 axis3 VM standing stepping sitting # 1 2022-03-17 11:29 0 0 0 0 0 0 1 # 2 2022-03-17 11:29 0 0 0 0 0 0 1 # 3 2022-03-17 11:29 0 0 0 0 0 0 1 # 4 2022-03-17 11:29 0 0 0 0 0 0 1 # 5 2022-03-17 11:29 0 0 0 0 0 0 1 # 6 2022-03-17 11:29 0 0 0 0 0 0 1 # 7 2022-03-17 11:29 0 0 0 0 0 0 1
我们可以使用以下方法拆分
timestamp
:dat$date <- sub(" .*", "", dat$timestamp) dat$time <- sub(".* ", "", dat$timestamp) dat$timestamp <- NULL dat # axis1 axis2 axis3 VM standing stepping sitting date time # 1 0 0 0 0 0 0 1 2022-03-17 11:29 # 2 0 0 0 0 0 0 1 2022-03-17 11:29 # 3 0 0 0 0 0 0 1 2022-03-17 11:29 # 4 0 0 0 0 0 0 1 2022-03-17 11:29 # 5 0 0 0 0 0 0 1 2022-03-17 11:29 # 6 0 0 0 0 0 0 1 2022-03-17 11:29 # 7 0 0 0 0 0 0 1 2022-03-17 11:29
但是,如果它是(多)空格分隔的,我将使用一个没有制表符的
quux.txt
文件,只有在您的第一个块中的多个空格。dat <- fread("quux.txt") # Warning in fread("quux.txt") : # Detected 8 column names but the data has 9 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file. dat # V1 timestamp axis1 axis2 axis3 VM standing stepping sitting # <IDat> <char> <int> <int> <int> <int> <int> <int> <int> # 1: 2022-03-17 11:29 0 0 0 0 0 0 1 # 2: 2022-03-17 11:29 0 0 0 0 0 0 1 # 3: 2022-03-17 11:29 0 0 0 0 0 0 1 # 4: 2022-03-17 11:29 0 0 0 0 0 0 1 # 5: 2022-03-17 11:29 0 0 0 0 0 0 1 # 6: 2022-03-17 11:29 0 0 0 0 0 0 1 # 7: 2022-03-17 11:29 0 0 0 0 0 0 1
那个警告有点烦人,但在这种情况下,它可以被忽略,甚至可以使用
suppressWarnings(fread("quux.txt"))
抑制。从那里,它已经拆分了timestamp
(方便,但表明格式存在“问题”,因此警告),您只需要重命名列。另一种方法使用单行预处理以在标题行中拆分单词
timestamp
以便大多数解析器可以正确读取它。Sys.which("sed") # sed # "c:\\rtools40\\usr\\bin\\sed.exe" system("sed -i -e '1 s/timestamp/time stamp/' quux.txt") # [1] 0
(大多数 unix-y 操作系统在路径中有
sed
,而 windows 则凭借此优点安装 Rtools ...您的位置可能会有所不同。)细分:
-i -e '1 s/timestamp/time stamp/' ^ edit/change in-place ^ evaluate the next expression ^ this expression only on line 1 ^ `-------' `--------' search/replace
每个文件只需执行一次;再次运行是无害的,但是应该没有变化,因为
timestamp
已经被改变了,再也找不到了。从这里,大多数读者会看到它就好了,例如
read.table("quux.txt", header = TRUE) # time stamp axis1 axis2 axis3 VM standing stepping sitting # 1 2022-03-17 11:29 0 0 0 0 0 0 1 # 2 2022-03-17 11:29 0 0 0 0 0 0 1 # 3 2022-03-17 11:29 0 0 0 0 0 0 1 # 4 2022-03-17 11:29 0 0 0 0 0 0 1 # 5 2022-03-17 11:29 0 0 0 0 0 0 1 # 6 2022-03-17 11:29 0 0 0 0 0 0 1 # 7 2022-03-17 11:29 0 0 0 0 0 0 1 readr::read_table("quux.txt") # -- Column specification ---------------------------------------------------------------------------------------------------------- # cols( # time = col_date(format = ""), # stamp = col_time(format = ""), # axis1 = col_double(), # axis2 = col_double(), # axis3 = col_double(), # VM = col_double(), # standing = col_double(), # stepping = col_double(), # sitting = col_double() # ) # # A tibble: 7 x 9 # time stamp axis1 axis2 axis3 VM standing stepping sitting # <date> <time> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 2022-03-17 11:29 0 0 0 0 0 0 1 # 2 2022-03-17 11:29 0 0 0 0 0 0 1 # 3 2022-03-17 11:29 0 0 0 0 0 0 1 # 4 2022-03-17 11:29 0 0 0 0 0 0 1 # 5 2022-03-17 11:29 0 0 0 0 0 0 1 # 6 2022-03-17 11:29 0 0 0 0 0 0 1 # 7 2022-03-17 11:29 0 0 0 0 0 0 1