python - Function to group departments by code not giving the expected result

I have a DataFrame with a numeric column.

Sample DataFrame

Year Code Price
2022 530010 11728.7
2022 540060 4793.21
2022 514008 -15665.40
2022 540860 6991.10
2022 540060 1382.00

I use the following function to assign each code to a category and create a new column.

Function that buckets the codes into departments

def department_group (Code): 
        if (Code == '514008') or (Code == '215080') or  (Code == '215980'):
            return 'Accounting and Administration'
        elif (Code == '515000') :
            return 'Customer Services'
        elif (Code == '540060') or (Code == '550010')  or (Code == '550012')  or (Code == '550028') :
            return 'Maintenance Department'
        elif (Code == '220000') or (Code == '220992') or (Code == '220095') :
            return 'Management'
        elif (Code == '550000')  or (Code == '550055') or (Code == '550060') or (Code == '550065') :
            return 'Marketing Department'
        elif ((Code == '530010') or (Code == '540860') or (Code == '560016') or  (Code == '570000')
          or (Code == '570010') or (Code == '570020')) :
            return 'Sales Department'            
        else:
            return Code

df['department'] = df['Code'].apply(department_group )
df['department'].value_counts()

However, when I count the categories, the counts don't match my codes. Any suggestions, or a better way to solve this?

Thanks in advance

Answer 1

I would change this to np.select() to make it easier to read

import numpy as np

# Assumes df['Code'] holds strings; if it is numeric, convert it first with .astype(str)

# Accounting and Administration
a_and_a = ['514008', '215080', '215980']
# Customer Services
customer_service = ['515000']
# Maintenance Department
m_d = ['540060', '550010', '550012', '550028']
# Management
management_lst = ['220000', '220992', '220095']
# Marketing Department
m_and_d = ['550000', '550055', '550060', '550065']
# Sales Department
s_d = ['530010', '540860', '560016', '570000', '570010', '570020']

cond_lst = [
    df.Code.isin(a_and_a),
    df.Code.isin(customer_service),
    df.Code.isin(m_d),
    df.Code.isin(management_lst),
    df.Code.isin(m_and_d),
    df.Code.isin(s_d),
]
choice_lst = ['Accounting and Administration', 'Customer Services',
              'Maintenance Department', 'Management',
              'Marketing Department', 'Sales Department']
# Rows that match none of the conditions get the default label 'Missing'
df['Department'] = np.select(cond_lst, choice_lst, 'Missing')
df
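
One possible cause of the mismatch, given that the question describes Code as a numeric column, is that integer codes never compare equal to string literals such as '530010'. The sketch below rebuilds the sample rows with Code as integers (an assumption about how the data was loaded) and counts how many rows match the Sales Department list before and after converting the column to strings.

```python
import pandas as pd

# Sample rows from the question; Code is built as integers here (assumption).
df = pd.DataFrame({
    "Year": [2022] * 5,
    "Code": [530010, 540060, 514008, 540860, 540060],
    "Price": [11728.7, 4793.21, -15665.40, 6991.10, 1382.00],
})

# Sales Department codes, written as strings like in the answer.
s_d = ['530010', '540860', '560016', '570000', '570010', '570020']

# Integer values are compared against strings, so no row matches.
matches_before = int(df["Code"].isin(s_d).sum())

# After converting the column to strings, the comparison lines up.
df["Code"] = df["Code"].astype(str)
matches_after = int(df["Code"].isin(s_d).sum())

print(matches_before, matches_after)
```

If df['Code'].dtype reports an integer type in your data, converting the column (or writing the code lists as integers) should bring the counts back in line.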

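The np.select() approach works well because it is easy to extend and keep organized. If you prefer a def, that is an option too, but this version may be easier to read and may help resolve your problem.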
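
Another way to keep the grouping easy to extend, not part of the answer above, is a dictionary lookup with Series.map. This is only a sketch: the names departments and code_to_department are hypothetical, and the sample codes are assumed to be integers that get converted to strings before the lookup.

```python
import pandas as pd

# Department name -> list of codes (same groups as in the answer).
departments = {
    'Accounting and Administration': ['514008', '215080', '215980'],
    'Customer Services': ['515000'],
    'Maintenance Department': ['540060', '550010', '550012', '550028'],
    'Management': ['220000', '220992', '220095'],
    'Marketing Department': ['550000', '550055', '550060', '550065'],
    'Sales Department': ['530010', '540860', '560016', '570000', '570010', '570020'],
}

# Invert the mapping so each code points at its department.
code_to_department = {
    code: name for name, codes in departments.items() for code in codes
}

# Codes from the question's sample, assumed to be stored as integers.
df = pd.DataFrame({"Code": [530010, 540060, 514008, 540860, 540060]})

# Convert to strings, look up each code, and label unknown codes 'Missing'.
df["Department"] = (
    df["Code"].astype(str).map(code_to_department).fillna("Missing")
)

counts = df["Department"].value_counts()
print(counts)
```

Adding a department then only means adding one entry to the dictionary, with no new condition to write.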
