python - 将多列拆分为多列,pandas

我有一个数据框

df = pd.DataFrame({'≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
 '9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
 '14–15': {1: '290 4.4', 2: nan, 3: '473 3.6'},
 '16–17': {1: '194 2.9', 2: nan, 3: '252 1.9'},
 '18–20': {1: '185 2.8', 2: nan, 3: '184 1.4'},
 '≥21': {1: '52 0.8', 2: '0 0.0', 3: '144 1.1'}})

          ≤8       9–13    14–15    16–17    18–20      ≥21
1  3687 55.5  2234 33.6  290 4.4  194 2.9  185 2.8   52 0.8
2   838 66.5   419 33.3      NaN      NaN      NaN    0 0.0
3  8905 66.9  3362 25.2  473 3.6  252 1.9  184 1.4  144 1.1

我想把所有的列分成两列,所以有一个 int 列和一个 float 列(注意,我不在乎按类型分开,类型只是巧合)。我做到了拆分列,但我不知道如何将结果列表分配到新列中。我还想尽可能保持它为 pythonic/pandonic,所以我不想单独遍历每一列。

rev = gestation_cols.apply(lambda x: pd.Series([i for i in x.str.split(' ')]))

             ≤8          9–13       14–15       16–17       18–20         ≥21
0  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
1   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
2  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]
3  [1559, 48.6]  [1075, 33.5]  [209, 6.5]  [165, 5.1]  [173, 5.4]   [26, 0.8]

编辑:为清楚起见,我不想拆分单个列或单独拆分每一列。我知道我可以一一创建新列,这简直是不好的做法。我希望每一列都分成两列。

回答1

对于矢量版本,您可以暂时 https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html 并使用 https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html

df.stack().str.split().unstack()

输出:

≤8          9–13       14–15       16–17       18–20         ≥21
1  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
2   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
3  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]

要让多个列使用 expand=True 参数,您最终会得到一个 MultiIndex,然后您可以根据需要对其进行修改:

df.stack().str.split(expand=True).unstack()

输出:

0                                  1                             
     ≤8  9–13 14–15 16–17 18–20  ≥21    ≤8  9–13 14–15 16–17 18–20  ≥21
1  3687  2234   290   194   185   52  55.5  33.6   4.4   2.9   2.8  0.8
2   838   419   NaN   NaN   NaN    0  66.5  33.3   NaN   NaN   NaN  0.0
3  8905  3362   473   252   184  144  66.9  25.2   3.6   1.9   1.4  1.1

重组订单

(df.stack()
   .str.split(expand=True)
   .unstack()
   .swaplevel(axis=1)
   [df.columns]
)

输出:

≤8        9–13       14–15      16–17      18–20       ≥21     
      0     1     0     1     0    1     0    1     0    1    0    1
1  3687  55.5  2234  33.6   290  4.4   194  2.9   185  2.8   52  0.8
2   838  66.5   419  33.3   NaN  NaN   NaN  NaN   NaN  NaN    0  0.0
3  8905  66.9  3362  25.2   473  3.6   252  1.9   184  1.4  144  1.1

回答2

如果您遍历它应该会更快(您正在遍历列,这与行循环相比是可以的):

pd.concat({key : value.str.split(expand = True) 
           for key, value in df.items()}, 
           axis = 1)

     ≤8        9–13       14–15      16–17      18–20       ≥21
      0     1     0     1     0    1     0    1     0    1    0    1
1  3687  55.5  2234  33.6   290  4.4   194  2.9   185  2.8   52  0.8
2   838  66.5   419  33.3   NaN  NaN   NaN  NaN   NaN  NaN    0  0.0
3  8905  66.9  3362  25.2   473  3.6   252  1.9   184  1.4  144  1.1

当然,只有测试才能知道——当你可以在宽格式中运行该过程时,翻转到长(堆栈)然后回到宽(取消堆栈)感觉有点贵。