The raw viewing records look like this:
USERID | STARTTIME | ENDTIME | SERVICETYPE | CHANNELCODE | PROGRAMNAME |
---|---|---|---|---|---|
xxxxxxxxxxxxxxxxx1 | 2021-05-24 19:52:28 | 2021-05-24 23:56:27 | 1 | | 精灵宝可梦 |
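My boss wants an analysis of what each user watches in each time period, so I want to split every record into hourly rows like this: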
USERID | STARTTIME | ENDTIME | SERVICETYPE | CHANNELCODE | PROGRAMNAME | PERIODTIME |
---|---|---|---|---|---|---|
xxxxxxxxxxxxxxxxx1 | 2021-05-24 19:52:28 | 2021-05-24 20:00:00 | 1 | | 精灵宝可梦 | 2021-05-24 19:00:00 |
xxxxxxxxxxxxxxxxx1 | 2021-05-24 20:00:00 | 2021-05-24 21:00:00 | 1 | | 精灵宝可梦 | 2021-05-24 20:00:00 |
xxxxxxxxxxxxxxxxx1 | 2021-05-24 21:00:00 | 2021-05-24 22:00:00 | 1 | | 精灵宝可梦 | 2021-05-24 21:00:00 |
xxxxxxxxxxxxxxxxx1 | 2021-05-24 22:00:00 | 2021-05-24 23:00:00 | 1 | | 精灵宝可梦 | 2021-05-24 22:00:00 |
xxxxxxxxxxxxxxxxx1 | 2021-05-24 23:00:00 | 2021-05-24 23:56:27 | 1 | | 精灵宝可梦 | 2021-05-24 23:00:00 |
My current approach builds an hourly time series from each record's start and end time, creates the new rows in a for loop, and appends them into a new dataframe:

    # split_data
    ...
    for i in range(0, len_date-1):
        df_y['Period'] = date_rng[i]
        df_y['EndTime'] = date_rng[i+1]
        df_y['StartHour'] = date_rng[i]
        df_y['EndHour'] = date_rng[i+1]
        df_x = df_x.append(df_y)
    ...

The main function then needs another for loop that walks through every record:

    for i in range(len(df_tmp)):
        df_x = split_data(df_tmp.iloc[i])
        df_t = df_t.append(df_x)
    df_t

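This produces the result I want, but there are so many records that it never seems to finish running...
Is there a better way to make this more efficient?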
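
One possible direction, as a rough sketch rather than a tested fix for the real data: most of the time in the loops above probably goes to calling `append` once per row, which copies the whole dataframe each time (and `DataFrame.append` no longer exists in pandas 2.0 and later). The sketch below instead works out how many hour buckets each record touches, repeats each row that many times in one step, and clips the start and end times column-wise. The function name `split_by_hour` and the sample dataframe are made up for illustration, and it assumes the `STARTTIME`/`ENDTIME` columns parse cleanly with `pd.to_datetime`.

```python
import numpy as np
import pandas as pd

HOUR = pd.Timedelta(hours=1)

def split_by_hour(df):
    """Return one row per clock hour that each record overlaps (hypothetical helper)."""
    out = df.reset_index(drop=True)
    out["STARTTIME"] = pd.to_datetime(out["STARTTIME"])
    out["ENDTIME"] = pd.to_datetime(out["ENDTIME"])

    # First and last hour bucket touched by each record. Using ceil - 1h means
    # a record ending exactly on the hour does not get an empty trailing slice.
    first_hour = out["STARTTIME"].dt.floor(HOUR)
    last_hour = out["ENDTIME"].dt.ceil(HOUR) - HOUR
    # Number of hourly rows per record; zero-length records still keep one row.
    n = ((last_hour - first_hour) // HOUR + 1).clip(lower=1)

    # Repeat every record n times in a single step instead of appending in a loop.
    rep = out.iloc[np.repeat(np.arange(len(out)), n.to_numpy())].copy()
    # 0, 1, 2, ... within each original record.
    k = rep.groupby(level=0).cumcount().to_numpy()
    rep = rep.reset_index(drop=True)

    period = rep["STARTTIME"].dt.floor(HOUR) + pd.to_timedelta(k, unit="h")
    next_hour = period + HOUR
    rep["PERIODTIME"] = period
    # Clip each slice to its hour bucket: start = max(start, period),
    # end = min(end, period + 1 hour).
    rep["STARTTIME"] = rep["STARTTIME"].where(rep["STARTTIME"] >= period, period)
    rep["ENDTIME"] = rep["ENDTIME"].where(rep["ENDTIME"] <= next_hour, next_hour)
    return rep

# Sample record taken from the table in the question.
records = pd.DataFrame({
    "USERID": ["xxxxxxxxxxxxxxxxx1"],
    "STARTTIME": ["2021-05-24 19:52:28"],
    "ENDTIME": ["2021-05-24 23:56:27"],
    "SERVICETYPE": [1],
    "PROGRAMNAME": ["精灵宝可梦"],
})
result = split_by_hour(records)
print(result[["STARTTIME", "ENDTIME", "PERIODTIME"]])
```

Because nothing here iterates over rows in Python, the cost should grow with the number of output rows rather than with repeated dataframe copies. Whether that is fast enough for the full data set would need to be measured.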
1 swulling 2021-08-12 10:54:17 +08:00 Split the records into chunks and run them in multiple processes; if that is still not enough, add more machines.
2 princelai 2021-08-12 16:19:12 +08:00 [image reply, not preserved]