I have a DataFrame with numeric columns.

Sample DataFrame:

Year | Code | Price |
---|---|---|
2022 | 530010 | 11728.7 |
2022 | 540060 | 4793.21 |
2022 | 514008 | -15665.40 |
2022 | 540860 | 6991.10 |
2022 | 540060 | 1382.00 |
I use the following function to map each code to a category and create a new column:
```python
# Function to map codes to departments
def department_group(Code):
    if (Code == '514008') or (Code == '215080') or (Code == '215980'):
        return 'Accounting and Administration'
    elif (Code == '515000'):
        return 'Customer Services'
    elif (Code == '540060') or (Code == '550010') or (Code == '550012') or (Code == '550028'):
        return 'Maintenance Department'
    elif (Code == '220000') or (Code == '220992') or (Code == '220095'):
        return 'Management'
    elif (Code == '550000') or (Code == '550055') or (Code == '550060') or (Code == '550065'):
        return 'Marketing Department'
    elif ((Code == '530010') or (Code == '540860') or (Code == '560016') or (Code == '570000')
          or (Code == '570010') or (Code == '570020')):
        return 'Sales Department'
    else:
        return Code

df['department'] = df['Code'].apply(department_group)
df['department'].value_counts()
```
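However, when I count the departments, the totals don't match the counts of my codes. Any suggestions, or a better way to do this?

Thanks in advance.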
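A likely cause of the mismatch, assuming `Code` is stored as integers (the question says the columns are numeric): the function compares `Code` with strings such as `'514008'`, and an integer never equals a string, so every row falls through to `else: return Code` and `value_counts()` ends up counting raw codes rather than departments. The sketch below uses a shortened, hypothetical copy of the function plus the sample rows to show what casting with `astype(str)` changes.

```python
import pandas as pd

# Sample rows from the question; Code is assumed to be an integer column
df = pd.DataFrame({
    "Year": [2022, 2022, 2022, 2022, 2022],
    "Code": [530010, 540060, 514008, 540860, 540060],
    "Price": [11728.7, 4793.21, -15665.40, 6991.10, 1382.00],
})

def department_group(code):
    # Hypothetical shortened version of the question's function:
    # like the original, it compares against string codes
    if code == "514008":
        return "Accounting and Administration"
    elif code == "540060":
        return "Maintenance Department"
    elif code in ("530010", "540860"):
        return "Sales Department"
    else:
        return code

# Integer codes never equal the string literals, so each row hits `else`
raw = df["Code"].apply(department_group)

# Casting to str first lets the string comparisons succeed
fixed = df["Code"].astype(str).apply(department_group)

print(raw.value_counts())
print(fixed.value_counts())
```

If `Code` were stored as floats instead, `astype(str)` would produce values like `'530010.0'`; casting with `astype(int)` first would avoid that.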
Answer 1

I would switch to `np.select()` to make it easier to read:
```python
import numpy as np

# Accounting and Administration
a_and_a = ['514008', '215080', '215980']
# Customer Services
customer_service = ['515000']
# Maintenance Department
m_d = ['540060', '550010', '550012', '550028']
# Management
management_lst = ['220000', '220992', '220095']
# Marketing Department
m_and_d = ['550000', '550055', '550060', '550065']
# Sales Department
s_d = ['530010', '540860', '560016', '570000', '570010', '570020']

# The lists hold strings, so compare against a string view of Code
# (assumes Code is numeric, as stated in the question)
code = df['Code'].astype(str)

cond_lst = [code.isin(a_and_a), code.isin(customer_service), code.isin(m_d),
            code.isin(management_lst), code.isin(m_and_d), code.isin(s_d)]
choice_lst = ['Accounting and Administration', 'Customer Services', 'Maintenance Department',
              'Management', 'Marketing Department', 'Sales Department']

# Rows that match none of the conditions get the default 'Missing'
df['Department'] = np.select(cond_lst, choice_lst, 'Missing')
df
```
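This works well because it is easy to extend and keep organized. If you prefer a `def`, that is still an option, but this layout makes the mapping easier to read and easier to debug.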
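Another option, sketched here under the same integer-`Code` assumption, keeps a single dictionary from department to codes (the name `DEPARTMENTS` is hypothetical), inverts it, and applies it with `Series.map`. Codes with no department become `NaN` and are then labelled `'Missing'`, matching the `np.select` default in the answer, and the inversion raises an error if a code is accidentally listed under two departments.

```python
import pandas as pd

# Hypothetical single source of truth: department -> codes (lists from the question)
DEPARTMENTS = {
    "Accounting and Administration": ["514008", "215080", "215980"],
    "Customer Services": ["515000"],
    "Maintenance Department": ["540060", "550010", "550012", "550028"],
    "Management": ["220000", "220992", "220095"],
    "Marketing Department": ["550000", "550055", "550060", "550065"],
    "Sales Department": ["530010", "540860", "560016", "570000", "570010", "570020"],
}

# Invert to code -> department, failing loudly on duplicate codes
code_to_dept = {}
for dept, codes in DEPARTMENTS.items():
    for code in codes:
        if code in code_to_dept:
            raise ValueError(f"code {code} listed under {code_to_dept[code]} and {dept}")
        code_to_dept[code] = dept

# Sample rows from the question; Code is assumed to be an integer column
df = pd.DataFrame({
    "Year": [2022, 2022, 2022, 2022, 2022],
    "Code": [530010, 540060, 514008, 540860, 540060],
    "Price": [11728.7, 4793.21, -15665.40, 6991.10, 1382.00],
})

# Cast to str so integer codes match the string keys; unknown codes map to NaN
df["Department"] = df["Code"].astype(str).map(code_to_dept).fillna("Missing")

# Each row gets exactly one label, so the department totals add up to the row count
print(df["Department"].value_counts())
print(pd.crosstab(df["Code"], df["Department"]))
```

`pd.crosstab` shows, for each code, which department it was assigned to, which makes it easy to see where per-code and per-department counts diverge.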