python - Extract specific dates from strings

I'm trying to extract some specific dates from text. The text looks like this:

'Shares of Luxury Goods Makers Slip on Russia Export Ban',
'By Investing.com\xa0-\xa0Mar 15, 2022 By Dhirendra Tripathi',
'Investing.com – Stocks of European retailers such as LVMH (PA:LVMH), Kering (PA:PRTP), H&M (ST:HMb), Moncler (MI:MONC) and Hermès (PA:HRMS) were all down around 4% Tuesday... ',
'',
'',
'',
' ',
'Europe Stocks Open Lower as Wider Sanctions, Covid Rebound Hit Mood',
'By Investing.com\xa0-\xa0Mar 15, 2022 By Geoffrey Smith\xa0',
'Investing.com -- European stock markets opened lower on Tuesday as a fresh round of EU sanctions, a rebound in Covid-19 cases and more signs of red-hot inflation all weighed on... ',
'',
'\xa0',

Obviously, from this small snippet I'd only want to extract: Mar 15, 2022 and Mar 15, 2022.

I have tried:

datefinder.find_dates(text)

dateutil.parser

The first returns all the dates I want, but also a lot of other "dates" that aren't actually in the text.

The second returns "String does not contain a date:".

Can anyone think of the best way to do this?

Answer 1

You can use a regular expression:

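import re

# The scraped byline; '\xa0' is a non-breaking space (no r prefix, so it is the real character, as in the scraped data)
line = 'By Investing.com\xa0-\xa0Mar 15, 2022 By Geoffrey Smith\xa0'

# Capitalized 3-letter word, space, 1-2 digit day, comma, space, 4-digit year
re_results = re.findall(r'[A-Z][a-z]{2} \d{1,2}, \d{4}', line)

for result in re_results:
    print(result)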

Output:

Mar 15, 2022

You can test the regex here: https://regexr.com/
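If you need actual date objects rather than strings, the matches can be passed to datetime.strptime. This is a minimal sketch assuming English month abbreviations (the %b directive is locale-dependent) and the "Mon D, YYYY" layout used in these bylines. strptime raises ValueError when the three letters are not a month abbreviation, so it also acts as a second filter on the regex hits.

```python
import re
from datetime import datetime

# Byline from the question, with a real non-breaking space (U+00A0)
line = 'By Investing.com\xa0-\xa0Mar 15, 2022 By Geoffrey Smith\xa0'

# Same pattern as above: "Mon D, YYYY" or "Mon DD, YYYY"
matches = re.findall(r'[A-Z][a-z]{2} \d{1,2}, \d{4}', line)

# %b = abbreviated month name (assumes an English/C locale),
# %d accepts 1- or 2-digit days when parsing, %Y = 4-digit year
dates = [datetime.strptime(m, '%b %d, %Y').date() for m in matches]

for d in dates:
    print(d.isoformat())
```

With the byline above this prints a single ISO date, 2022-03-15.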

Answer 2

Using re:

import re

x = ['Shares of Luxury Goods Makers Slip on Russia Export Ban',
     'By Investing.com\xa0-\xa0Mar 15, 2022 By Dhirendra Tripathi',
     'Investing.com – Stocks of European retailers such as LVMH (PA:LVMH), Kering (PA:PRTP), H&M (ST:HMb), Moncler (MI:MONC) and Hermès (PA:HRMS) were all down around 4% Tuesday... ',
     '',
     '',
     '',
     ' ',
     'Europe Stocks Open Lower as Wider Sanctions, Covid Rebound Hit Mood',
     'By Investing.com\xa0-\xa0Mar 15, 2022 By Geoffrey Smith\xa0',
     'Investing.com -- European stock markets opened lower on Tuesday as a fresh round of EU sanctions, a rebound in Covid-19 cases and more signs of red-hot inflation all weighed on... ',
     '',
     '\xa0', ]

for line in x:
    # re.search returns only the first match on each line (or None)
    m = re.search(r'\w{3} \d{1,2}, \d{4}', line)
    if m:
        print(m.group())

Output:

Mar 15, 2022
Mar 15, 2022

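Note that this only matches dates of the form [3 letters] [1-2 digits], [4 digits]. Strictly speaking, \w matches any word character (including digits and underscores), not just letters.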
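Because \w{3} accepts any three word characters, text such as "Top 10, 2022" would also be reported as a date. A stricter variant, sketched below, restricts the first token to English month abbreviations and uses re.findall so that every date on a line is returned, not just the first. The month list and the extra "Top 10, 2022 picks" line are illustrative assumptions, not part of the scraped data.

```python
import re

# Only accept English month abbreviations as the first token
# (assumption: the source always uses abbreviated English month names)
MONTHS = 'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec'
DATE_RE = re.compile(rf'\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b')

lines = [
    'Shares of Luxury Goods Makers Slip on Russia Export Ban',
    'By Investing.com\xa0-\xa0Mar 15, 2022 By Dhirendra Tripathi',
    'Europe Stocks Open Lower as Wider Sanctions, Covid Rebound Hit Mood',
    'By Investing.com\xa0-\xa0Mar 15, 2022 By Geoffrey Smith\xa0',
    'Top 10, 2022 picks',  # hypothetical line: r'\w{3} \d{1,2}, \d{4}' would match "Top 10, 2022"
]

# findall (rather than search) collects every date on each line
found = [m for line in lines for m in DATE_RE.findall(line)]
print(found)
```

This prints ['Mar 15, 2022', 'Mar 15, 2022']; the hypothetical "Top 10, 2022" line is no longer picked up. The \b before the month still matches after a non-breaking space, because U+00A0 is not a word character in Python's Unicode-aware regexes.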
