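I am trying to create a Django web application that lets a user upload a PDF, then has a script scrape it and output and save certain text that the script extracts.
I was able to find an upload example at https://github.com/axelpale/minimal-django-file-upload-example, and I have a script that scrapes the PDF. I don't know how to tie the two together to accomplish this.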
views.py
from django.shortcuts import redirect, render

from .models import Document
from .forms import DocumentForm


def my_view(request):
    print(f"Great! You're using Python 3.6+. If you fail here, use the right version.")
    message = 'Upload PDF'
    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            newdoc = Document(docfile=request.FILES['docfile'])
            newdoc.save()
            # Redirect to the document list after POST
            return redirect('my-view')
        else:
            message = 'The form is not valid. Fix the following error:'
    else:
        form = DocumentForm()  # An empty, unbound form
    # Load documents for the list page
    documents = Document.objects.all()
    # Render list page with the documents and the form
    context = {'documents': documents, 'form': form, 'message': message}
    return render(request, 'list.html', context)
forms.py
from django import forms


class DocumentForm(forms.Form):
    docfile = forms.FileField(label='Select a file')
models.py
from django.db import models


class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')
list.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>webpage</title>
</head>
<body>
<!-- Upload form. Note enctype attribute! -->
<form action="{% url "my-view" %}" method="post" enctype="multipart/form-data">
{% csrf_token %}
{{ message }}
<p>{{ form.non_field_errors }}</p>
<!-- Select a file: text -->
<p>{{ form.docfile.label_tag }} {{ form.docfile.help_text }}</p>
<!-- choose file button -->
<p>
{{ form.docfile.errors }}
{{ form.docfile }}
</p>
<!-- Upload button -->
<p><input type="submit" value="Upload"/></p>
</form>
</body>
</html>
Edit: added urls.py
urls.py
from django.urls import path
from .views import my_view
urlpatterns = [
    path('', my_view, name='my-view'),
]
Scrape.py
I want to output and save Plan_Name.
import os
import pdfplumber
import re

directory = r'C:User/Ant_Esc/Desktop'
for filename in os.listdir(directory):
    if filename.endswith('.pdf'):
        fullpath = os.path.join(directory, filename)
        #print(fullpath)
        all_text = ""
        with pdfplumber.open(fullpath) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                #print(text)
                all_text += ' ' + text
        all_text = all_text.replace('\n','')
        pattern ='Plan Title/Name .*? Program/Discipline'
        Plan_Name = re.findall(pattern, all_text,re.DOTALL)
        for i in Plan_Name:
            Plan_Name = i.removesuffix('Program/Discipline')
            Plan_Name = Plan_Name.removeprefix('Plan Title/Name ')
Answer 1
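I have gone through your code. Could you confirm the following points? I feel these are missing.
- Are you getting any errors with the code above?
- Is the URL entry added in urls.py?
- Where are you calling Scrape.py from?
My suggestion is to call Scrape.py from views.py once the file has been saved successfully by newdoc.save(), or to call it from the model by overriding save() and calling super().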
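To make that suggestion a bit more concrete, here is a rough sketch of how Scrape.py might be reshaped into functions that a view can call on one uploaded file, instead of looping over a hard-coded directory. The names extract_plan_name, scrape_plan_name and PLAN_NAME_RE are hypothetical, and so is the plan_name model field mentioned in the trailing comment; none of them are in your code. Unlike the original loop, this sketch returns the first match and strips surrounding whitespace, so adjust it if you rely on the old behaviour.

```python
import re

# Same markers as the pattern in Scrape.py; the capture group keeps only
# the text between them, so no removeprefix/removesuffix is needed.
PLAN_NAME_RE = re.compile(r'Plan Title/Name (.*?) Program/Discipline', re.DOTALL)


def extract_plan_name(all_text):
    """Return the first plan name found in the text, or None if nothing matches."""
    match = PLAN_NAME_RE.search(all_text.replace('\n', ''))
    return match.group(1).strip() if match else None


def scrape_plan_name(pdf_path):
    """Read every page of one PDF and return its plan name (or None)."""
    # Imported here so the regex helper above works without pdfplumber installed.
    import pdfplumber

    all_text = ''
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Guard against pages that yield no text (e.g. scanned images).
            all_text += ' ' + (page.extract_text() or '')
    return extract_plan_name(all_text)


# Possible use in views.py, right after the upload is saved.
# Assumption: Document has an extra, hypothetical CharField named plan_name.
#     newdoc.save()
#     newdoc.plan_name = scrape_plan_name(newdoc.docfile.path) or ''
#     newdoc.save(update_fields=['plan_name'])
```

The same call could instead live in an overridden Document.save() that calls super().save() first, so that the file exists on disk before it is opened. Keeping the regex in its own function also lets you check it against sample text without needing a PDF.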
Let me know if you need more help.