Introduction
Extracting images from PDFs is useful for document archiving, scanning, and data analysis. Some PDFs store scanned images of text, while others embed logos, charts, or photographs.
With PHP, we can:
✅ Extract images from PDFs with Imagick, and split PDFs into single pages with FPDI
✅ Save extracted images in JPEG, PNG, or WebP formats
✅ Convert PDFs into images for OCR processing
✅ Automate image extraction from multiple-page PDFs
By the end of this guide, you'll have a fully functional PHP script for extracting and processing images from PDFs. 🚀
1. Installing Required Libraries (FPDI, TCPDF, and Imagick)
To extract images from PDFs, we use:
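✔ FPDI – Imports pages from existing PDFs (it needs FPDF or TCPDF as its base class)
✔ TCPDF – Generates PDFs; optional, only needed if you use it instead of FPDF as FPDI's base
✔ Imagick – PHP extension for ImageMagick that renders PDF pages as images (through Ghostscript)
Install the PHP libraries via Composer (Recommended)
composer require setasign/fpdi
composer require setasign/fpdf
composer require tecnickcom/tcpdf   # optional: only if you use setasign\Fpdi\Tcpdf\Fpdi
Imagick is a PHP extension, not a Composer package. Install it, together with Ghostscript, through your system's package manager, for example on Debian/Ubuntu:
sudo apt install php-imagick ghostscript
🔹 On some distributions, ImageMagick's policy.xml blocks reading PDFs by default. If readImage() fails with a security policy error, the PDF entry in that file needs to be relaxed.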
Include them in your PHP script:
require 'vendor/autoload.php';
use setasign\Fpdi\Fpdi;
// Imagick is a global class provided by the extension, so it needs no "use" statement.
✅ Now, PHP is ready to extract images from PDFs.
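Before running the examples, it may help to check what PHP can actually see. The sketch below uses a hypothetical checkPdfToolchain() helper and assumes Composer's vendor/autoload.php is already required if you installed FPDI that way. A positive PDF result only means ImageMagick was built with a PDF coder; it does not prove that Ghostscript is installed or that ImageMagick's security policy allows reading PDFs.

```php
<?php
// Minimal sketch: report which parts of the PDF-to-image toolchain PHP can see.
// checkPdfToolchain() is a hypothetical helper, not part of any library.
function checkPdfToolchain(): array
{
    $imagick = extension_loaded('imagick');

    // queryFormats() lists formats ImageMagick was built with; reading PDFs
    // still needs Ghostscript and a security policy that allows the PDF coder.
    $pdfCoder = $imagick && in_array('PDF', Imagick::queryFormats('PDF'), true);

    // Check FPDF first: autoloading setasign\Fpdi\Fpdi without FPDF installed
    // is a fatal error, because Fpdi ultimately extends FPDF.
    $fpdf = class_exists('FPDF');
    $fpdi = $fpdf && class_exists('setasign\Fpdi\Fpdi');

    return ['imagick' => $imagick, 'pdf_coder' => $pdfCoder, 'fpdf' => $fpdf, 'fpdi' => $fpdi];
}

foreach (checkPdfToolchain() as $component => $ok) {
    echo str_pad($component, 10), $ok ? 'OK' : 'missing', PHP_EOL;
}
```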
2. Extracting Images from a PDF Using Imagick
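Imagick, the PHP binding for ImageMagick, renders PDF pages into raster images with the help of Ghostscript. It works on whole pages; it cannot pull individual embedded pictures out of a PDF.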
Example: Convert a PDF Page to an Image
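$imagick = new Imagick();
$imagick->setResolution(300, 300);                 // must be set before readImage()
$imagick->readImage('document.pdf[0]');            // read only the first page (zero-based index)
$imagick->setImageBackgroundColor('white');        // JPEG has no transparency, so flatten
$imagick->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE); // onto white instead of black
$imagick->setImageFormat('jpeg');
$imagick->writeImage('extracted_image.jpg');
echo "PDF page converted to an image!";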
Explanation:
✅ Loads the first page of document.pdf
✅ Sets resolution to 300 DPI for better quality (ImageMagick's default of 72 DPI looks blurry)
✅ Saves the image as extracted_image.jpg
🔹 Change [0] to read a different page ([1] for the second page, [2] for the third, and so on), or give a range such as [0-2].
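People number pages from 1, while ImageMagick's [n] suffix counts from 0 (and FPDI's importPage(), used later, counts from 1 again). A small helper can hide that off-by-one and also produce ImageMagick's [first-last] range syntax. pdfPageSpec() below is a hypothetical name for illustration, not part of Imagick:

```php
<?php
// Hypothetical helper: turn human (1-based) page numbers into the 0-based
// "file.pdf[n]" or "file.pdf[first-last]" syntax that ImageMagick reads.
function pdfPageSpec(string $file, int $firstPage, ?int $lastPage = null): string
{
    if ($firstPage < 1 || ($lastPage !== null && $lastPage < $firstPage)) {
        throw new InvalidArgumentException('Invalid page range');
    }
    $first = $firstPage - 1; // ImageMagick counts pages from 0
    if ($lastPage === null || $lastPage === $firstPage) {
        return "{$file}[{$first}]";
    }
    $last = $lastPage - 1;
    return "{$file}[{$first}-{$last}]";
}

echo pdfPageSpec('document.pdf', 1), PHP_EOL;    // document.pdf[0]
echo pdfPageSpec('document.pdf', 2, 4), PHP_EOL; // document.pdf[1-3]
```

With it, $imagick->readImage(pdfPageSpec('document.pdf', 2, 4)) would read the second through fourth pages.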
3. Converting Every Page of a Multi-Page PDF to Images
To convert every page, read the whole PDF and loop over its pages.
Example: Extract All Pages as Images
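$imagick = new Imagick();
$imagick->setResolution(300, 300);
$imagick->readImage('document.pdf');               // loads every page into memory
foreach ($imagick as $index => $page) {            // $index is zero-based
    $page->setImageBackgroundColor('white');       // flatten transparency for JPEG
    $page->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
    $page->setImageFormat('jpeg');
    $page->writeImage("page_$index.jpg");
}
$imagick->clear();                                 // release the pages' memory
echo "All pages converted to images!";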
Explanation:
✅ Loops through all PDF pages
✅ Saves each page as page_0.jpg, page_1.jpg, etc.
🔹 Use png or webp for different output formats (WebP output requires an ImageMagick build with WebP support).
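The loop above names files page_0.jpg, page_1.jpg, and so on. Once a document has ten or more pages, a plain alphabetical sort puts page_10.jpg before page_2.jpg. One possible fix, sketched with a hypothetical pageFileName() helper, is to number from 1 and zero-pad:

```php
<?php
// Hypothetical helper: 1-based, zero-padded names so that a plain
// alphabetical sort (file managers, PHP's sort()) keeps pages in reading order.
function pageFileName(int $index, string $ext = 'jpg', int $width = 3): string
{
    // $index is the 0-based position from the Imagick loop; readers expect page 1 first.
    return sprintf('page_%0' . $width . 'd.%s', $index + 1, $ext);
}

$unpadded = ['page_10.jpg', 'page_2.jpg', 'page_1.jpg'];
sort($unpadded);
echo implode(', ', $unpadded), PHP_EOL; // page_1.jpg, page_10.jpg, page_2.jpg

$padded = array_map(fn (int $i) => pageFileName($i), [9, 1, 0]);
sort($padded);
echo implode(', ', $padded), PHP_EOL;   // page_001.jpg, page_002.jpg, page_010.jpg
```

Inside the loop you would then call $page->writeImage(pageFileName($index)).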
4. Extracting Embedded Images from a PDF (Not Whole Pages)
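PDFs usually store photos, logos, and scans as separate embedded image objects. Imagick cannot extract these, because it only renders whole pages through Ghostscript. The pdfimages command-line tool from poppler-utils (for example, sudo apt install poppler-utils on Debian/Ubuntu) saves each embedded image at its original resolution, and PHP can call it with exec().
Example: Extract Embedded Images
// Imagick has no method for pulling embedded images out of a PDF,
// so this calls poppler-utils' pdfimages instead (installed separately).
$pdf = escapeshellarg('document.pdf');
exec("pdfimages -all $pdf image", $output, $status); // writes image-000.jpg, image-001.png, ...
if ($status !== 0) {
    exit("pdfimages failed with exit code $status");
}
echo "Embedded images extracted!";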
Explanation:
✅ Extracts only images (not entire pages)
✅ Saves them as separate image files
🔹 Useful for extracting logos, charts, or scanned signatures.
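Extracted files do not always match their extension, so it can be worth checking their first bytes before you store or serve them. This is a minimal sketch with a hypothetical detectImageMime() helper; it only recognizes JPEG, PNG, and WebP, and PHP's fileinfo extension (finfo) is a more complete option when it is available. The glob('image*') pattern assumes the extracted files start with "image", as in the example above.

```php
<?php
// Hypothetical helper: identify JPEG, PNG, or WebP data from its leading
// "magic" bytes instead of trusting the file extension.
function detectImageMime(string $bytes): ?string
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'image/jpeg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'image/png';
    }
    // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if (strlen($bytes) >= 12 && substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WEBP') {
        return 'image/webp';
    }
    return null; // unknown or unsupported (e.g. JPEG 2000, JBIG2, CCITT)
}

foreach (glob('image*') ?: [] as $file) {
    // 12 bytes are enough for all three signatures.
    $mime = detectImageMime((string) file_get_contents($file, false, null, 0, 12));
    echo $file, ': ', $mime ?? 'unknown', PHP_EOL;
}
```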
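5. Splitting a PDF into Pages with FPDI (Alternative Method)
FPDI imports pages from an existing PDF into a new one. It cannot produce images on its own, but it can split a PDF into single-page files that Imagick can then convert. FPDI's free parser may reject PDFs that use compressed cross-reference streams (common in newer files); setasign offers a commercial parser add-on for those.
Example: Split a PDF into Single-Page Files with FPDI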
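$pageCount = (new Fpdi())->setSourceFile('document.pdf'); // returns the number of pages
for ($i = 1; $i <= $pageCount; $i++) {
    $pdf = new Fpdi();                           // a fresh document for each output file
    $pdf->setSourceFile('document.pdf');
    $tplIdx = $pdf->importPage($i);              // FPDI counts pages from 1
    $size = $pdf->getTemplateSize($tplIdx);
    $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]); // keep the original page size
    $pdf->useTemplate($tplIdx);
    $pdf->Output('F', "extracted_page_$i.pdf");  // Output() closes the document, hence one Fpdi per page
}
echo "PDF pages extracted as separate files!";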
Explanation:
✅ Extracts each page separately as a new PDF file
✅ Can be combined with Imagick to convert into images
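To split out only some pages, for example pages 1 to 3 and 5, the loop can iterate over a list of page numbers instead of 1 to $pageCount. The hypothetical parsePageRange() helper below is one way to turn a string like "1-3,5" into that list, validated against the page count returned by setSourceFile():

```php
<?php
// Hypothetical helper: expand a page selection such as "1-3,5" into page
// numbers for FPDI's importPage(), which counts pages from 1.
function parsePageRange(string $spec, int $pageCount): array
{
    $pages = [];
    foreach (explode(',', $spec) as $part) {
        $part = trim($part);
        if (preg_match('/^(\d+)(?:-(\d+))?$/', $part, $m) !== 1) {
            throw new InvalidArgumentException("Bad page range: '$part'");
        }
        $from = (int) $m[1];
        $to = isset($m[2]) ? (int) $m[2] : $from; // a single page has no "-N" part
        if ($from < 1 || $to < $from || $to > $pageCount) {
            throw new InvalidArgumentException("Pages out of range: '$part'");
        }
        array_push($pages, ...range($from, $to));
    }
    return array_values(array_unique($pages)); // drop duplicates, keep first-seen order
}

echo implode(', ', parsePageRange('1-3,5', 10)), PHP_EOL; // 1, 2, 3, 5
```

Each number it returns can be passed straight to importPage().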
6. Converting a PDF to High-Quality PNG for OCR Processing
For OCR (Optical Character Recognition) or document archiving, use high-resolution PNG images.
Example: Convert PDF to High-Quality PNG
$imagick = new Imagick();
$imagick->setResolution(600, 600);
$imagick->readImage('document.pdf[0]');
$imagick->setImageFormat('png');
$imagick->writeImage('high_res_page.png');
echo "High-quality PNG generated!";
Why Use PNG?
✔ PNG is lossless, so text edges stay sharp instead of picking up JPEG compression artifacts that hurt OCR
✔ Ideal for digitizing scanned documents
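Resolution drives both OCR quality and cost. PDF page sizes are measured in points (1/72 inch), so the rendered width or height in pixels is points / 72 * DPI. The sketch below applies that formula to a US Letter page (612 x 792 pt); rasterSize() is a hypothetical helper, not an Imagick method:

```php
<?php
// Estimate the pixel size of a rasterized PDF page. PDF page sizes are given
// in points (1 pt = 1/72 inch), so pixels = points / 72 * DPI.
function rasterSize(float $widthPt, float $heightPt, int $dpi): array
{
    return [
        (int) round($widthPt / 72 * $dpi),
        (int) round($heightPt / 72 * $dpi),
    ];
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
foreach ([300, 600] as $dpi) {
    [$w, $h] = rasterSize(612, 792, $dpi);
    printf("%d DPI: %d x %d px (%.1f megapixels)\n", $dpi, $w, $h, $w * $h / 1e6);
}
// 300 DPI: 2550 x 3300 px (8.4 megapixels)
// 600 DPI: 5100 x 6600 px (33.7 megapixels)
```

Doubling the DPI quadruples the pixel count, and memory use and OCR time grow with it, so 300 DPI is often a reasonable starting point for engines such as Tesseract, with 600 DPI kept for very small print.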
7. Saving Extracted Images to a Database
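To store extracted images in MySQL, you can keep the raw bytes in a BLOB column or base64-encode them into a text column. This example uses base64, which makes the data larger but lets you drop it straight into a data URI later.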
Example: Save Images to a MySQL Database
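mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);    // throw exceptions on SQL errors
$conn = new mysqli("localhost", "root", "", "pdf_images");     // use a dedicated DB user in production
$imageData = file_get_contents('extracted_image.jpg');
$encoded = base64_encode($imageData);
$stmt = $conn->prepare("INSERT INTO images (image_data) VALUES (?)"); // image_data: MEDIUMTEXT/LONGTEXT (TEXT caps at 64 KB)
$stmt->bind_param("s", $encoded);
$stmt->execute();
echo "Image saved in database!";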
✅ Allows easy retrieval and display of extracted images.
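Base64 is convenient but not free: every 3 bytes become 4 characters, so the stored data is roughly 33% larger than the original file. The sketch below measures this for 1 MB of random bytes standing in for an extracted JPEG; base64Length() is a hypothetical helper implementing the 4 * ceil(n / 3) formula:

```php
<?php
// Base64 turns every 3 input bytes into 4 ASCII characters (padding the
// last group), so the encoded size is 4 * ceil(n / 3).
function base64Length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

$raw = random_bytes(1_000_000); // stand-in for a 1 MB extracted JPEG
$encoded = base64_encode($raw);

printf("raw: %d bytes, base64: %d bytes (+%.1f%%)\n",
    strlen($raw), strlen($encoded), (strlen($encoded) / strlen($raw) - 1) * 100);
// raw: 1000000 bytes, base64: 1333336 bytes (+33.3%)
```

If storage matters more than convenience, a BLOB column holding the raw bytes avoids this overhead.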
8. Displaying Extracted Images from the Database
Once images are stored, retrieve one and embed it as a data URI (this reuses $conn from the previous example).
Example: Display Extracted Image from Database
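$result = $conn->query("SELECT image_data FROM images LIMIT 1");
$row = $result->fetch_assoc();
if ($row) {                                       // null when the table is empty
    // base64 only uses [A-Za-z0-9+/=], so it is safe inside the src attribute
    echo '<img src="data:image/jpeg;base64,' . $row['image_data'] . '" alt="Extracted image">';
}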
✅ Dynamically displays extracted images.
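The snippet above always labels the data as image/jpeg. If the table also holds PNG or WebP images (for example, with a mime_type column stored next to image_data, which is an assumption and not part of the example above), a small hypothetical helper can build the tag with the right type, reject data that is not valid base64, and escape the alt text:

```php
<?php
// Hypothetical helper: build an <img> tag from base64 data read from the
// database, with an explicit MIME type so PNG and WebP rows display correctly.
function dataUriImg(string $base64, string $mime = 'image/jpeg', string $alt = ''): string
{
    if (!in_array($mime, ['image/jpeg', 'image/png', 'image/webp'], true)) {
        throw new InvalidArgumentException("Unsupported MIME type: $mime");
    }
    // Strict decoding rejects quotes, angle brackets, and other characters
    // outside the base64 alphabet, so the data cannot break out of src="...".
    if (base64_decode($base64, true) === false) {
        throw new InvalidArgumentException('Data is not valid base64');
    }
    return sprintf(
        '<img src="data:%s;base64,%s" alt="%s">',
        $mime,
        $base64,
        htmlspecialchars($alt, ENT_QUOTES) // alt text may come from user input
    );
}

echo dataUriImg(base64_encode('demo'), 'image/png', 'Page 1 "logo"'), PHP_EOL;
// <img src="data:image/png;base64,ZGVtbw==" alt="Page 1 &quot;logo&quot;">
```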
9. Extracting and Emailing PDF Images
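Extracted images can be sent as email attachments with PHPMailer (composer require phpmailer/phpmailer). By default PHPMailer uses PHP's mail() function; to send through an SMTP server instead, call isSMTP() and set properties such as Host, SMTPAuth, Username, and Password.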
Example: Email Extracted Images
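use PHPMailer\PHPMailer\PHPMailer;
$mail = new PHPMailer(true);                       // throw exceptions instead of failing silently
$mail->setFrom('noreply@example.com');             // placeholder addresses
$mail->addAddress('recipient@example.com');
$mail->Subject = 'Extracted PDF image';
$mail->Body = 'The extracted image is attached.';
$mail->addAttachment('extracted_image.jpg');
$mail->send();
echo "Image emailed successfully!";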
✅ Ideal for automated document processing systems.
10. Automating Image Extraction for Bulk PDFs
For multiple PDFs, use a loop to batch process images.
Example: Batch Extract Images from Multiple PDFs
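foreach (glob('pdfs/*.pdf') ?: [] as $file) {     // glob() can return false on error
    $imagick = new Imagick();
    $imagick->setResolution(300, 300);             // set before reading
    $imagick->readImage($file);                    // reads every page
    $base = pathinfo($file, PATHINFO_DIRNAME) . '/' . pathinfo($file, PATHINFO_FILENAME);
    foreach ($imagick as $index => $page) {        // one JPEG per page
        $page->setImageBackgroundColor('white');   // flatten transparency for JPEG
        $page->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
        $page->setImageFormat('jpeg');
        $page->writeImage("{$base}_page_$index.jpg");
    }
    $imagick->clear();                             // free memory before the next file
}
echo "Batch image extraction completed!";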
✅ Automates the process for multiple PDFs.
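A common shortcut for naming output files, str_replace('.pdf', '.jpg', $file), replaces every ".pdf" in the path, not just the extension, so a folder named archive.pdfs would silently become archive.jpgs. Building the name with pathinfo() avoids that. A sketch with a hypothetical pageImagePath() helper:

```php
<?php
// Hypothetical helper: derive "<dir>/<name>_page_<n>.<ext>" from a PDF path.
// pathinfo() only strips the final extension, unlike str_replace('.pdf', ...).
function pageImagePath(string $pdfPath, int $index, string $ext = 'jpg'): string
{
    $dir = pathinfo($pdfPath, PATHINFO_DIRNAME);   // "." when there is no directory
    $name = pathinfo($pdfPath, PATHINFO_FILENAME); // file name without the last extension
    return "{$dir}/{$name}_page_{$index}.{$ext}";
}

$path = 'archive.pdfs/report.pdf';
echo str_replace('.pdf', '.jpg', $path), PHP_EOL; // archive.jpgs/report.jpg (wrong directory)
echo pageImagePath($path, 0), PHP_EOL;            // archive.pdfs/report_page_0.jpg
```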
Best Practices for Extracting Images from PDFs in PHP
✔ Use Imagick (with Ghostscript) to render PDF pages as high-resolution images.
✔ Convert PDF pages to PNG for better OCR accuracy.
✔ Use FPDI to split out specific pages before converting them.
✔ Save extracted images in a database for easy retrieval.
✔ Automate extraction for bulk PDF processing.
Conclusion
With Imagick and FPDI, you can extract images from PDFs, process them, and store them in your PHP applications.
✅ Extract embedded images and full pages from PDFs.
✅ Convert PDFs into high-resolution images for OCR.
✅ Store, display, and email extracted images.
✅ Automate bulk image extraction for large datasets.
By implementing these techniques, you can efficiently process and manage images from PDFs in your PHP applications! 🚀