python - Django: upload a PDF, then run a script to scrape the PDF and output the results

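I am trying to build a Django web application that lets a user upload a PDF, then has a script scrape it and output and save certain text that the script extracts.

I was able to find a minimal file-upload example at https://github.com/axelpale/minimal-django-file-upload-example. I also have the script that scrapes the PDF. I don't know how to tie the two together to accomplish this.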

views.py

from django.shortcuts import redirect, render
from .models import Document
from .forms import DocumentForm

def my_view(request):
    print(f"Great! You're using Python 3.6+. If you fail here, use the right version.")
    message = 'Upload PDF'
    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            newdoc = Document(docfile=request.FILES['docfile'])
            newdoc.save()

            # Redirect to the document list after POST
            return redirect('my-view')
        else:
            message = 'The form is not valid. Fix the following error:'
    else:
        form = DocumentForm()  # An empty, unbound form

    # Load documents for the list page
    documents = Document.objects.all()

    # Render list page with the documents and the form
    context = {'documents': documents, 'form': form, 'message': message}
    return render(request, 'list.html', context)

forms.py

from django import forms

class DocumentForm(forms.Form):
    docfile = forms.FileField(label='Select a file')

models.py

from django.db import models

class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')

list.html

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>webpage</title>
    </head>
<body>
    <!-- Upload form. Note enctype attribute! -->
    <form action="{% url "my-view" %}" method="post" enctype="multipart/form-data">
        {% csrf_token %}
        {{ message }}
        <p>{{ form.non_field_errors }}</p>

        <!-- Select a file: text -->
        <p>{{ form.docfile.label_tag }} {{ form.docfile.help_text }}</p>

        <!-- choose file button -->
        <p>
            {{ form.docfile.errors }}
            {{ form.docfile }}
        </p>

        <!-- Upload button -->

        <p><input type="submit" value="Upload"/></p>
    </form>
</body>
</html>

Edit: added urls.py

urls.py

from django.urls import path
from .views import my_view

urlpatterns = [
    path('', my_view, name='my-view')
]

Scrape.py

I want to output and save Plan_Name.

import os
import pdfplumber
import re
directory = r'C:User/Ant_Esc/Desktop'

for filename in os.listdir(directory):
    if filename.endswith('.pdf'):
        fullpath = os.path.join(directory, filename)
        #print(fullpath)
        all_text = ""
        with pdfplumber.open(fullpath) as pdf:
            for page in pdf.pages:
                # extract_text() may return None for a page with no text layer
                text = page.extract_text() or ""
                #print(text)
                all_text += ' ' + text
        all_text = all_text.replace('\n', '')
        pattern = 'Plan Title/Name  .*? Program/Discipline'
        matches = re.findall(pattern, all_text, re.DOTALL)
        for match in matches:
            # str.removesuffix / str.removeprefix require Python 3.9+
            Plan_Name = match.removesuffix('Program/Discipline')
            Plan_Name = Plan_Name.removeprefix('Plan Title/Name  ')

Answer 1

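I have gone through your code. Could you confirm the following points? I think these are missing.

  1. Are you getting any errors with the code above?
  2. Add the URL entry in urls.py.
  3. Where are you calling Scrape.py from?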

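My suggestion: call Scrape.py from views.py once the file has been saved successfully, that is, right after newdoc.save(). Alternatively, call Scrape.py from the model by overriding its save() method and using super().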
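To call the scraper from the view or the model, Scrape.py first needs to expose a function that takes one file path, instead of looping over a hard-coded directory at import time. The following is only a rough sketch of that refactor, not a definitive implementation. The names `parse_plan_names` and `extract_plan_names` are hypothetical (they do not appear in the question), and a regex capture group stands in for the `removeprefix` / `removesuffix` calls.

```python
import re

# Same pattern as in Scrape.py (note the two spaces after "Name"),
# with a capture group around the part we want to keep.
PLAN_PATTERN = re.compile(r'Plan Title/Name  (.*?) Program/Discipline', re.DOTALL)


def parse_plan_names(all_text):
    """Return every plan name found in text already extracted from a PDF."""
    return [match.strip() for match in PLAN_PATTERN.findall(all_text)]


def extract_plan_names(pdf_path):
    """Read one PDF with pdfplumber and return the plan names found in it.

    Hypothetical helper: the name and signature are assumptions,
    not part of the original post.
    """
    import pdfplumber  # imported lazily so parse_plan_names works without it

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() may return None for a page with no text layer
            pages.append(page.extract_text() or '')
    # Mirror the original script: join pages with a space, drop newlines
    all_text = ' '.join(pages).replace('\n', '')
    return parse_plan_names(all_text)
```

With something like this in place, `my_view` could call `extract_plan_names(newdoc.docfile.path)` right after `newdoc.save()`; `docfile.path` is available when the default file-system storage is used. Because the view redirects after a successful POST, the result would have to be stored somewhere (for example in an extra field on `Document`, which the question's model does not have yet) to be shown on the list page. For the model-based alternative, an overridden `Document.save()` would call `super().save(*args, **kwargs)` first, so the file exists on disk, and then run the same helper.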

Let me know if you need more help.
