Commit f419be2

Created Python Script to Extract text from a PDF
1 parent 4157b3f commit f419be2

1 file changed: +19 -0 lines changed
Diff for: extract_text_from_pdf/extract_text_from_pdf.py

@@ -0,0 +1,19 @@
+# import module PyPDF2
+import PyPDF2
+# put 'example.pdf' in working directory
+# and open it in read binary mode
+pdfFileObj = open('example.pdf', 'rb')
+# call and store PdfFileReader
+# object in pdfReader
+pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
+# to print the total number of pages in pdf
+# print(pdfReader.numPages)
+# get specific page of pdf by passing
+# number since it stores pages in list
+# to access first page pass 0
+pageObj = pdfReader.getPage(0)
+# extract the page object
+# by extractText() function
+texts = pageObj.extractText()
+# print the extracted texts
+print(texts)
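
Note on the script above: `PdfFileReader`, `numPages`, `getPage()` and `extractText()` are the legacy camelCase names, which, to my knowledge, were removed in PyPDF2 3.0 in favour of `PdfReader`, `reader.pages` and `extract_text()`. The script also never closes `pdfFileObj`. The following is a minimal sketch of the same steps under those assumptions, not the committed code. It assumes the successor package `pypdf` is installed. The function name `extract_page_text` and the `reader_cls` parameter are hypothetical, the latter added only so the reader class can be swapped or stubbed.

```python
def extract_page_text(pdf_path, page_index=0, reader_cls=None):
    """Return the text of one page of a PDF (0-based page index).

    reader_cls is a hypothetical hook: any class that takes an open
    binary file and exposes a `.pages` sequence whose items have
    `extract_text()`. When omitted, pypdf's PdfReader is used
    (assumption: the `pypdf` package is installed).
    """
    if reader_cls is None:
        # Imported lazily so the module loads even without pypdf.
        from pypdf import PdfReader
        reader_cls = PdfReader

    # The context manager closes the file even if parsing fails.
    with open(pdf_path, "rb") as pdf_file:
        reader = reader_cls(pdf_file)
        pages = reader.pages
        if not 0 <= page_index < len(pages):
            raise IndexError(
                f"page_index {page_index} out of range for {len(pages)} page(s)"
            )
        # Extract while the file is still open; pages may be read on demand.
        return pages[page_index].extract_text()
```

With `example.pdf` in the working directory, `print(extract_page_text('example.pdf'))` should print the text of the first page, as the original script intends. Scanned or image-only pages may yield an empty string.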

0 commit comments