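I have a table showing the results of four colleagues trying to classify a number of objects as a, b, c, or d. If the colleagues agree on the classification, or if only one colleague was able to classify the object, I want to show that classification in a new column. If the colleagues disagree, I want to create a separate DataFrame showing those objects. At most two colleagues are assigned to classify any given object, so there is never a case where three colleagues disagree on a classification.
Showing the classification is easy when only one colleague could identify the object, but I am struggling with the case where two colleagues did. Given my novice Python skills, I have only gotten as far as the code below.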
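The end result I am looking for is "a" for the first row, "c" for the third row, and "d" for the fourth row. The second row would be set aside for manual classification by a more experienced colleague.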
import pandas as pd

df_test = pd.DataFrame({'check1': ['a', 'a', 'unknown', 'd'],
                        'check2': ['unknown', 'b', 'unknown', 'unknown'],
                        'check3': ['unknown', 'unknown', 'c', 'd'],
                        'check4': ['unknown', 'unknown', 'c', 'unknown']})

# Indicator columns: 1 if the colleague gave a classification, else 0
cols = ['check_ind', 'check1_ind', 'check2_ind', 'check3_ind', 'check4_ind']
for col in cols:
    df_test[col] = 0

checks = [('check1', 'check1_ind'), ('check2', 'check2_ind'),
          ('check3', 'check3_ind'), ('check4', 'check4_ind')]
rows = df_test.shape[0]
for r in range(rows):
    for c in checks:
        if df_test.iloc[r, df_test.columns.get_loc(c[0])] != 'unknown':
            df_test.iloc[r, df_test.columns.get_loc(c[1])] = 1

# Number of colleagues who classified each object
sumcolumn = df_test['check1_ind'] + df_test['check2_ind'] + df_test['check3_ind'] + df_test['check4_ind']
df_test['body_check'] = sumcolumn
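As a side note, the indicator loop above can probably be written without explicit iteration. This is a minimal sketch, assuming the same four check columns as in the question; the names check_cols and answered are illustrative and not part of the original post.

```python
import pandas as pd

df_test = pd.DataFrame({'check1': ['a', 'a', 'unknown', 'd'],
                        'check2': ['unknown', 'b', 'unknown', 'unknown'],
                        'check3': ['unknown', 'unknown', 'c', 'd'],
                        'check4': ['unknown', 'unknown', 'c', 'unknown']})

# Illustrative name: the columns that hold the colleagues' classifications
check_cols = ['check1', 'check2', 'check3', 'check4']

# True where a colleague gave a classification, False where it is 'unknown'
answered = df_test[check_cols].ne('unknown')

# Number of colleagues who classified each object (True counts as 1)
body_check = answered.sum(axis=1)
print(body_check.tolist())
```

This only counts how many colleagues answered; it does not yet say whether they agree, which is what the answers below address.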
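Answer 1
import numpy as np

# df is assumed to be the four-column frame from the question (check1..check4)
df.replace('unknown', np.nan, inplace=True)
df.apply(lambda x: x.dropna().unique()[0] if x.nunique() == 1 else 'No Consensus', axis=1)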
Output:
0               a
1    No Consensus
2               c
3               d
dtype: object
Usage:
df['consensus'] = df.apply(lambda x: x.dropna().unique()[0] if x.nunique() == 1 else np.nan, axis=1)
print(df)
...
  check1 check2 check3 check4 consensus
0      a    NaN    NaN    NaN         a
1      a      b    NaN    NaN       NaN
2    NaN    NaN      c      c         c
3      d    NaN      d    NaN         d
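The question also asks for a separate DataFrame holding the objects the colleagues disagree on, which the snippet above does not build. One possible way to get it is sketched below, reusing the same nunique idea. The names raw, n_labels and disputed are hypothetical, and using mask instead of replace to blank out 'unknown' is my own choice here, not part of this answer.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({'check1': ['a', 'a', 'unknown', 'd'],
                    'check2': ['unknown', 'b', 'unknown', 'unknown'],
                    'check3': ['unknown', 'unknown', 'c', 'd'],
                    'check4': ['unknown', 'unknown', 'c', 'unknown']})

# Turn 'unknown' into missing values so they are ignored below
checks = raw.mask(raw.eq('unknown'))

# Number of distinct non-missing labels per object
n_labels = checks.nunique(axis=1)

# Agreed label when exactly one distinct label exists, otherwise NaN
consensus = checks.apply(
    lambda row: row.dropna().iloc[0] if row.nunique() == 1 else np.nan,
    axis=1)

# Objects with more than one distinct label go to a separate frame
disputed = raw[n_labels > 1]
print(disputed)
```

Filtering on n_labels > 1 rather than on a missing consensus keeps objects that nobody classified out of the disputed frame, since those have zero labels rather than conflicting ones.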
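Answer 2
Something like this should do the trick:
import numpy as np

def function(series):
    # value_counts() ignores NaN, so only real classifications are counted
    val_counts = series.value_counts()
    if val_counts.size > 1:
        return 'No Consensus'
    else:
        return val_counts.index[0]

df_test.replace({'unknown': np.nan}).apply(function, axis=1)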
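One caveat: if a row contains only 'unknown', value_counts() comes back empty and indexing its first element raises an IndexError. The sample data has no such row, so this may never matter in practice. A hedged variant with a guard is sketched below; the name classify_row, the 'Unclassified' label and the small df_demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def classify_row(series):
    # value_counts() drops NaN by default
    val_counts = series.value_counts()
    if val_counts.size == 0:
        # Nobody classified this object (hypothetical label)
        return 'Unclassified'
    if val_counts.size > 1:
        return 'No Consensus'
    return val_counts.index[0]

# Illustrative data: the last row has no classification at all
df_demo = pd.DataFrame({'check1': ['a', 'a', 'unknown'],
                        'check2': ['unknown', 'b', 'unknown']})

result = df_demo.replace({'unknown': np.nan}).apply(classify_row, axis=1)
print(result.tolist())
```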
Answer 3
For an efficient vectorized approach, use DataFrame.mode (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mode.html):
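# df_test is assumed to hold only the four check columns here
df2 = (df_test
       .mask(df_test.eq('unknown'))
       .mode(axis=1)
       # ensure having a "1" column
       .reindex(columns=[0, 1])
       )
print(df2)
#    0    1
# 0  a  NaN
# 1  a    b
# 2  c  NaN
# 3  d  NaN

# A non-missing second mode means the row has two different labels
m = df2[1].notna()
df_test['consensus'] = df2[0].mask(m, 'No consensus')
print(df_test)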
Output:
    check1   check2   check3   check4     consensus
0        a  unknown  unknown  unknown             a
1        a        b  unknown  unknown  No consensus
2  unknown  unknown        c        c             c
3        d  unknown        d  unknown             d