How to Extract Text from a PDF (Copy, Paste, Export)
Copy text from a PDF for editing, quoting, or translation. Works on normal and scanned PDFs.
Why extract text from a PDF?
PDFs are designed to look the same everywhere, which makes them perfect for sharing and printing. But that visual fidelity hides a problem: the text inside a PDF is often not really text. It can be a vector path that looks like the letter "A" but contains no letter "A" the computer can copy. Or it can be a flat image of a page that contains text a scanner saw but a machine cannot read.
When you need to actually do something with the content (quote it in an email, paste it into a translator, edit it in a document, or search through 200 pages for one paragraph), you need to extract the text. This guide shows you how.
The reasons people need to extract text from a PDF are practical and frequent:
- Quote a passage: Copy a paragraph from a research paper into an email without retyping it.
- Translate a document: Paste a foreign-language PDF into a translation tool. Translation engines need plain text.
- Edit and reformat: Pull text out of a PDF into Word or Google Docs to fix typos or update wording.
- Search inside scanned PDFs: A scanned book is a 300-page image. Extracting text with OCR makes it searchable.
- Repurpose content: Grab a chapter from an old PDF and put it in a new report.
- Data extraction: Pull tabular data out of a PDF report into a spreadsheet.
The challenge: PDFs come in two flavors, and they need different tools.
- Text-based PDFs (most modern PDFs): The text is real text, encoded in the file. Extraction is fast and recovers the characters as they are encoded, though not the visual formatting.
- Scanned PDFs (images of pages): The text is just pixels. You need OCR (Optical Character Recognition) to read it.
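If you have a terminal handy, one rough way to tell the two flavors apart is to run pdftotext (from poppler-utils, covered in Method 3) and check whether anything other than whitespace comes out. This is a minimal sketch under stated assumptions: the function names `has_real_text` and `pdf_kind` are hypothetical, and treating "no visible characters" as "scanned" is a heuristic, not a formal test.

```bash
# Heuristic (an assumption, not a formal test): a PDF counts as
# "text-based" if pdftotext pulls at least one visible character out
# of it, and "scanned" otherwise. Requires poppler-utils (pdftotext).

# Succeeds if the text on stdin contains any non-whitespace character.
# pdftotext separates pages with form feeds, which [:space:] covers.
has_real_text() {
  grep -q '[^[:space:]]'
}

# Prints "text", "scanned", or "error" for the PDF given as $1.
pdf_kind() {
  local pdf="$1" text
  # "-" as the output file makes pdftotext write to stdout.
  if ! text=$(pdftotext "$pdf" - 2>/dev/null); then
    echo "error"   # missing file, damaged PDF, or pdftotext not installed
    return 1
  fi
  if printf '%s' "$text" | has_real_text; then
    echo "text"
  else
    echo "scanned"
  fi
}
```

Run `pdf_kind report.pdf` to print `text` or `scanned`. An encrypted PDF opened without its password also reports `error`, because pdftotext refuses to open it.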
Method 1: Use UtilBoxx's Free PDF Text Extractor (Recommended)
The fastest, safest, and most private way to extract text is UtilBoxx's PDF Extract Text tool. It runs entirely in your browser, handles both text-based and scanned PDFs, and never sends your file to a server.
Here is how to use it:
- Go to utilboxx.com/en/tools/pdf/extract-text
- Click the upload area and select your PDF (or drag and drop)
- The tool detects whether your PDF has embedded text or is a scanned image
- For text PDFs, it copies the text directly. For scanned PDFs, it runs OCR in your browser.
- Copy the result to your clipboard, or download it as a .txt file
Why we recommend this method:
- 100% free, no account, no signup, no email gate
- Privacy-first: everything happens locally in your browser. The file never reaches a server.
- Handles both kinds of PDF: text-based and scanned (with OCR)
- Works on any device: Windows, Mac, Linux, ChromeOS, iOS, Android
- No watermarks, no daily limits
- Fast: text-based extraction is near-instant. OCR runs at a few seconds per page.
If you need to grab text out of a PDF, whether once in a while or all day long, this is the most flexible tool you can use without installing anything.
Method 2: Adobe Acrobat Pro (Paid)
Adobe Acrobat Pro is the heavyweight of the PDF world. Its "Export PDF" tool lets you convert a PDF to Word, Excel, plain text, or a variety of other formats. For text-based PDFs, the export is clean. For scanned PDFs, Acrobat runs a high-quality OCR engine that recognizes dozens of languages and preserves layout reasonably well.
The catch is the price. Acrobat Pro costs roughly $19.99 per month (about $240 per year) on a subscription. For a one-off text extraction, that is a poor trade. You also need a desktop install, which can be heavy on older machines.
Acrobat is worth it only if you already use it for editing, redaction, e-signatures, or form creation. Its OCR is excellent, but if text extraction is all you need, a browser-based tool does the job without the bill.
Method 3: Command line with pdftotext (Poppler)
If you are comfortable in a terminal, the open-source tool pdftotext from the poppler-utils package is the fastest CLI option. It is available on macOS (via Homebrew), Linux (via apt/dnf/pacman), and Windows (via Cygwin or WSL).
Install it with `brew install poppler` (macOS) or `sudo apt install poppler-utils` (Debian/Ubuntu), then:
```bash
# Extract text with default layout
pdftotext input.pdf output.txt

# Preserve layout as much as possible
pdftotext -layout input.pdf output.txt

# Extract text from a specific page range (pages 1-5)
pdftotext -f 1 -l 5 input.pdf output.txt

# Extract text from a scanned PDF by combining pdftotext with OCRmyPDF
ocrmypdf --skip-text input.pdf scanned-with-ocr.pdf
pdftotext scanned-with-ocr.pdf output.txt
```
The `pdftotext` tool is the workhorse of PDF text extraction in the open-source world. It is fast and easy to script, so running it over thousands of files in a batch is routine. For scanned PDFs, OCRmyPDF is the de-facto choice: it adds an OCR text layer to scanned PDFs while keeping the original page images.
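Because pdftotext processes one file per run, batch work usually means wrapping it in a loop. A minimal sketch, assuming poppler-utils is installed; `txt_path_for` and `extract_all` are hypothetical helper names, and writing each .txt next to its PDF is just one possible convention.

```bash
# Maps "dir/report.pdf" to "dir/report.txt".
txt_path_for() {
  printf '%s\n' "${1%.pdf}.txt"
}

# Runs pdftotext -layout on every *.pdf in a directory (default: the
# current one), reports failures without stopping the batch, and
# returns non-zero if any file failed.
extract_all() {
  local dir="${1:-.}" pdf failed=0
  for pdf in "$dir"/*.pdf; do
    # With no matches, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    if ! pdftotext -layout "$pdf" "$(txt_path_for "$pdf")"; then
      echo "failed: $pdf" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage: `extract_all ~/reports`. Continuing past a bad file, rather than aborting, is a deliberate choice here: in a large batch, one damaged PDF should not block the rest.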
Common questions
Can I extract text from a scanned PDF?
Yes, but you need OCR. UtilBoxx's PDF Extract Text tool runs OCR in your browser, so the scanned image is converted to searchable text without uploading your file anywhere. Adobe Acrobat Pro also runs OCR on scanned PDFs. On the command line, use OCRmyPDF to add a text layer, then pdftotext to dump the text.
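The command-line route can be folded into one text-first, OCR-second helper. This is a sketch under assumptions: both pdftotext and ocrmypdf are installed, the function name `extract_text` is hypothetical, and instead of a second pdftotext pass it uses OCRmyPDF's `--sidecar` option, which writes the recognized text straight to a file.

```bash
# Usage: extract_text input.pdf output.txt
extract_text() {
  local pdf="$1" out="$2" tmp status
  # 1. Fast path: use the embedded text layer if it has visible text.
  if pdftotext -layout "$pdf" "$out" 2>/dev/null &&
     grep -q '[^[:space:]]' "$out"; then
    return 0
  fi
  # 2. Fallback: OCR. --sidecar writes the recognized text to $out;
  #    the OCR'd PDF itself goes to a temporary file that is deleted.
  #    --force-ocr rasterizes and OCRs every page even if ocrmypdf
  #    detects an existing text layer (here an empty or unusable one).
  tmp=$(mktemp) || return 1
  ocrmypdf --force-ocr --sidecar "$out" "$pdf" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The ordering matters: OCR takes seconds per page and can misread characters, so the helper only falls back to it when the PDF has no usable text of its own.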
Does text extraction preserve the formatting?
Usually not. PDF text extraction gives you the words and paragraphs, but the visual formatting (bold, italics, font sizes, columns) is often lost. `pdftotext -layout` does a reasonable job of preserving column layout, and tools like Adobe's "Export to Word" preserve more visual structure, at the cost of being much heavier. For most use cases (quoting, translating, searching), plain text is enough.
Can I extract text from a password-protected PDF?
Yes, but you need the password. Password-protected PDFs can be opened with the password, then the text can be extracted normally. Most tools, including UtilBoxx, will prompt for the password when needed. If you do not have the password, the text is not accessible by design: this is a security feature, not a bug.
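With pdftotext, the password is passed as an option: `-upw` for the user (open) password, or `-opw` for the owner password, which governs permissions rather than opening the file. A minimal sketch, assuming poppler-utils is installed; the helper name `extract_protected` is hypothetical. A password given as a command-line argument may be visible to other users in the process list, so avoid this on shared machines.

```bash
# Usage: extract_protected locked.pdf output.txt
extract_protected() {
  local pdf="$1" out="$2" pw
  # -s: don't echo the password; -r: keep backslashes literal.
  # bash shows the -p prompt on stderr, and only when reading a terminal.
  read -rsp "Password for $pdf: " pw || return 1
  echo >&2
  # -upw supplies the user (open) password to pdftotext.
  pdftotext -upw "$pw" "$pdf" "$out"
}
```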
Will extraction work on every language?
Usually. Text-based extraction works for any language whose characters are properly encoded in the PDF. OCR works on any language the OCR engine has been trained on. UtilBoxx's browser-based OCR supports a wide range of Latin, Cyrillic, and East Asian scripts. Adobe Acrobat Pro's OCR also covers dozens of languages. For unusual scripts, command-line tools like Tesseract offer the broadest language coverage.
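On the command line, OCRmyPDF passes its `-l` option through to Tesseract, using Tesseract's language codes (such as `eng`, `deu`, `rus`, `jpn`, `chi_sim`) joined with `+` for multilingual pages. Each language pack is installed separately (on Debian/Ubuntu, for example, `tesseract-ocr-rus`). A sketch under those assumptions; `join_langs` and `ocr_in_languages` are hypothetical helper names.

```bash
# Joins language codes with "+", the separator Tesseract expects
# (e.g. "rus eng" -> "rus+eng").
join_langs() {
  local IFS=+
  printf '%s\n' "$*"
}

# Usage: ocr_in_languages scan.pdf scan.txt rus eng
# Writes the recognized text to the .txt file and a searchable copy of
# the PDF next to it (scan.ocr.pdf in the example above).
ocr_in_languages() {
  if [ "$#" -lt 3 ]; then
    echo "usage: ocr_in_languages in.pdf out.txt LANG..." >&2
    return 2
  fi
  local pdf="$1" out="$2"
  shift 2
  ocrmypdf -l "$(join_langs "$@")" --sidecar "$out" \
    "$pdf" "${out%.txt}.ocr.pdf"
}

# To see which language packs Tesseract can use on this machine:
#   tesseract --list-langs
```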
Is it safe to use an online text extractor?
It depends on the service. UtilBoxx processes everything in your browser: no upload, no server-side processing, no logs. With other tools, assume your file is being uploaded to a remote server and read their privacy policy carefully. Avoid uploading any document containing personal, financial, medical, or legally sensitive information to a text extractor you do not trust.
What is the difference between "copy text" and "extract text"?
In most tools the two are the same: the text content of the PDF. Some tools (like `pdftotext -layout`) try to preserve the visual layout in plain text. Others (like Adobe's "Export to Word") produce a structured document. UtilBoxx gives you clean plain text, ready to paste anywhere.
Conclusion
Extracting text from a PDF is a small task that comes up constantly and should not require a paid subscription or a software install. For most people, UtilBoxx's free PDF Extract Text tool is the obvious choice: it is private, fast, free, handles both text and scanned PDFs, and works in your browser.
If you already pay for Adobe Acrobat, its "Export PDF" feature is excellent. If you are scripting batch work, the combination of pdftotext and OCRmyPDF in the terminal is unbeatable.
For everything else, head to UtilBoxx PDF tools and you will find a complete, privacy-first toolkit for working with PDFs, all in your browser.