python - 使用 pandas 的内置函数为 09:00-10:00/11:00-14:00/15:00-18:00 等字符串间隔生成 date_range

我一直在阅读论坛,在互联网上调查。但无法弄清楚如何应用 pandas 函数来恢复整个代码:

def get_time_and_date(schedule, starting_date, position):
    # calculate time and date for each start and ending time if the ending time < starting time, add one day to the ending.
    my_time = datetime.strptime(schedule.split('-')[position], '%H:%M')
    my_date = datetime.strptime(starting_date, '%Y-%m-%d')
    # get the starting hour for the range if we are calculating last interval
    if position == 1: 
        starting_hour = datetime.strptime(schedule.split('-')[0], '%H:%M')
        starting_hour = datetime(my_date.year, my_date.month, my_date.day, starting_hour.hour, 0)
    # convert unify my_time and my_date normalizing the minutes
    if hora.minute >= 30:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, my_time.hour, 30)
    else:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, hora.hour, 0)
    # if final time of the day < than starting time, means there is a day jump, so we add a day
    if position == 1 and my_hour_and_date < starting_hour: my_hour_and_date += timedelta(days=1)
    return my_hour_and_date


def get_time_interval_ranges(schedule, my_date):
    # get all match schedules if there are any
    schedules = schedule.split('/')
    intervals_list = []        
    # loop through al the schedules and add the range with the split separator "Separa aquí"
    for my_schedule in schedules:
        current_range = pd.date_range(start=get_time_and_date(my_schedule, my_date, 0), end=get_time_and_date(my_schedule, my_date, 1), freq="30min").strftime('%Y-%m-%d, %H:%M').to_list()        
        intervals_list += current_range
        intervals_list.append('separate_range_here')  
    return intervals_list


def generate_time_intervals(df, column_to_process, new_column):
    #generate range of times column
    df[new_column] = df.apply(lambda row: get_time_interval_ranges(row[column_to_process], row['my_date'], True), axis=1)
    return df

我相信有更好的方法可以做到这一点,但我不知道如何。我给第一个函数(generate_time_intervals) 的是一个带有一些列的数据框,但只有 Date (yyyy-mm-dd) 和时间表很重要。

当时间表是 09:00-15:00 时,这很容易,只需用“-”分隔并将其提供给内置函数 data_range。问题在于处理可怕的时间,例如标题上的时间或 09:17-16:24 之类的时间。

有没有什么办法可以在没有太多循环和我的代码中的排序的情况下处理这个问题?

编辑:

使用此输入:

Worker Date Schedule
Worker1 2022-05-01 09:00-10:00/11:00-14:00/15:00-18:00
Worker2 2022-05-01 09:37-15:38

我想要这个输出:

Date Interval Working Minutes
2022-05-01 09:00 30
2022-05-01 09:30 53
2022-05-01 10:00 30
2022-05-01 10:30 30
2022-05-01 11:00 60
2022-05-01 11:30 60
2022-05-01 12:00 60
2022-05-01 12:30 60
2022-05-01 13:00 60
2022-05-01 13:30 60
2022-05-01 14:00 30
2022-05-01 14:30 30
2022-05-01 15:00 60
2022-05-01 15:30 38
2022-05-01 16:00 30
2022-05-01 16:30 30
2022-05-01 17:00 30
2022-05-01 17:30 30
2022-05-01 18:00 0

回答1

使用日期时间:

df= pd.DataFrame({'schedule':['09:17-16:24','19:40-21:14']})
schedules = df.schedule.str.split('-',expand=True)
start = pd.to_datetime(schedules[0]).dt.round('H')
end = pd.to_datetime(schedules[1]).dt.round('H')

df['interval_out'] = start.dt.hour.astype(str) + ':00 - ' + end.dt.hour.astype(str) + ':00'

结果:

>>> df
      schedule
0  09:17-16:24
1  19:40-21:14

>>> schedules
       0      1
0  09:17  16:24
1  19:40  21:14

>>> start
0   2022-05-18 09:00:00
1   2022-05-18 20:00:00
Name: 0, dtype: datetime64[ns]
>>> end
0   2022-05-18 16:00:00
1   2022-05-18 21:00:00
Name: 1, dtype: datetime64[ns]

>>> df
      schedule   interval_out
0  09:17-16:24   9:00 - 16:00
1  19:40-21:14  20:00 - 21:00
>>>

当然,如果你想扩大它,四舍五入应该是地板和天花板......

编辑:尝试最初的问题......如果您阅读 Pandas 中的日期时间函数(现在我学到了......),它也会有所帮助:facepalm:

  • 将块扩展为单个项目开始/停止
  • 地板/天花板它们用于启动/停止
  • 使用方便的 pandas 函数计算间隔...
  • 将间隔分解为行
  • 计算最晚开始
  • 计算很快停止
  • 计算办公室里实际有多少人
  • 在插槽上分组数据,添加丢失的分钟数和工作分钟数 * 工人
  • 进行计算
df['timeblocks']= df.Schedule.str.split('/')
df2 = df.explode('timeblocks')
timeblocks = df2.timeblocks.str.split('-',expand=True)
df2['start'] = pd.to_datetime(df2.Date + " " + timeblocks[0])
df2['stop'] = pd.to_datetime(df2.Date + " " + timeblocks[1])
df2['start_slot'] = df2.start.dt.floor('30min')
df2['stop_slot'] = df2.stop.dt.ceil('30min')

df2['intervals'] = df2.apply(lambda x: pd.date_range(x.start_slot, x.stop_slot, freq='30min'), axis=1)
df3 = df2.explode('intervals')
df3['late_start'] = (df3.start>df3.intervals)*(df3.start-df3.intervals).dt.seconds/60
df3['soon_stop']= ((df3.stop>df3.intervals) & (df3.stop<(df3.intervals+pd.Timedelta('30min'))))*((df3.intervals+pd.Timedelta('30min'))-df3.stop).dt.seconds/60
df3['someone'] =  (df3.start<df3.intervals+pd.Timedelta('30min'))&(df3.stop>df3.intervals)#+pd.Timedelta('30min'))
df4 = df3.groupby('intervals').agg({'late_start':sum, 'soon_stop':sum, 'someone':sum})
df4['worked_time'] = df4.someone*30 - df4.late_start - df4.soon_stop

df4

>>> df4
                     late_start  soon_stop  someone  worked_time
intervals
2022-05-01 09:00:00         0.0        0.0        1         30.0
2022-05-01 09:30:00         7.0        0.0        2         53.0
2022-05-01 10:00:00         0.0        0.0        1         30.0
2022-05-01 10:30:00         0.0        0.0        1         30.0
2022-05-01 11:00:00         0.0        0.0        2         60.0
2022-05-01 11:30:00         0.0        0.0        2         60.0
2022-05-01 12:00:00         0.0        0.0        2         60.0
2022-05-01 12:30:00         0.0        0.0        2         60.0
2022-05-01 13:00:00         0.0        0.0        2         60.0
2022-05-01 13:30:00         0.0        0.0        2         60.0
2022-05-01 14:00:00         0.0        0.0        1         30.0
2022-05-01 14:30:00         0.0        0.0        1         30.0
2022-05-01 15:00:00         0.0        0.0        2         60.0
2022-05-01 15:30:00         0.0       22.0        2         38.0
2022-05-01 16:00:00         0.0        0.0        1         30.0
2022-05-01 16:30:00         0.0        0.0        1         30.0
2022-05-01 17:00:00         0.0        0.0        1         30.0
2022-05-01 17:30:00         0.0        0.0        1         30.0
2022-05-01 18:00:00         0.0        0.0        0          0.0

相似文章

最新文章