How to Extract Images from PDFs in PHP

Introduction

Extracting images from PDFs is useful for document archiving, scanning, and data analysis. Some PDFs store scanned images of text, while others embed logos, charts, or photographs.

With PHP, we can:

Render PDF pages as images with Imagick
Extract the images embedded in a PDF's pages
Split a PDF into single pages with FPDI
Save the results in JPEG, PNG, or WebP formats
Convert PDF pages into high-resolution images for OCR processing
Automate extraction across multi-page and multiple PDFs

By the end of this guide, you'll have working PHP examples for extracting and processing images from PDFs. 🚀

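1. Installing Required Libraries (Imagick, FPDI, and FPDF or TCPDF)

To work with PDFs, we use:

Imagick – PHP extension for ImageMagick; converts PDF pages into images (ImageMagick relies on Ghostscript to read PDFs)
FPDI – Imports pages from an existing PDF into a new PDF document
FPDF or TCPDF – The PDF generator FPDI builds on (the setasign\Fpdi\Fpdi class used below extends FPDF)

Install FPDI and FPDF via Composer (Recommended)

composer require setasign/fpdf setasign/fpdi

To build on TCPDF instead, install tecnickcom/tcpdf in place of setasign/fpdf and use the setasign\Fpdi\Tcpdf\Fpdi class.

Imagick is a compiled PHP extension, not a Composer package. Install it with PECL (pecl install imagick) or your system's package manager (for example, apt install php-imagick on Debian/Ubuntu), and install Ghostscript as well.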

Include them in your PHP script:

require 'vendor/autoload.php';

use setasign\Fpdi\Fpdi;
// Imagick is a global class provided by the extension, so it needs no use statement

Now, PHP is ready to extract images from PDFs.
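Because Imagick is a compiled extension that hands PDF reading to Ghostscript, it can help to confirm both are present before running the examples. A minimal sketch, assuming Ghostscript is on the PATH as `gs` (the usual name on Linux and macOS; Windows builds use a different executable name); `checkPdfToolchain()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: reports whether the tools used in this guide are available.
function checkPdfToolchain(): array
{
    $report = [
        'imagick'     => extension_loaded('imagick'),
        'pdf_format'  => false,
        'ghostscript' => null, // version string, or null if "gs" could not be run
    ];

    if ($report['imagick']) {
        // Lists the PDF format only if this ImageMagick build knows it; a security
        // policy can still refuse to read PDFs at runtime.
        $report['pdf_format'] = in_array('PDF', Imagick::queryFormats('PDF'), true);
    }

    // Assumes Ghostscript is on the PATH as "gs" (the usual Linux/macOS name).
    if (function_exists('exec')) {
        $output = [];
        exec('gs --version 2>&1', $output, $status);
        $report['ghostscript'] = $status === 0 ? trim($output[0] ?? '') : null;
    }

    return $report;
}

print_r(checkPdfToolchain());
```

On some Linux distributions ImageMagick ships with a security policy (policy.xml) that disables PDF reading; in that case readImage() fails with a security-policy error even when everything above reports as available.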

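2. Converting a PDF Page to an Image Using Imagick

Imagick renders PDF pages into raster images (ImageMagick hands the PDF to Ghostscript for this). It works on whole pages; it does not pull out the individual pictures embedded in a page.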

Example: Convert a PDF Page to an Image

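$imagick = new Imagick();
$imagick->setResolution(300, 300); // must be set before readImage() to take effect
$imagick->readImage('document.pdf[0]'); // load only the first page (zero-based index)
$imagick->setImageFormat('jpeg');
$imagick->writeImage('extracted_image.jpg');
$imagick->clear(); // free the memory used by the image

echo "PDF page converted to an image!";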

Explanation:

Loads the first page of document.pdf
Sets resolution to 300 DPI for better quality
Saves the image as extracted_image.jpg

🔹 Change [0] to render a different page. The index is zero-based, so [1] is the second page; a range such as [0-2] loads the first three pages.
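PDF pages are often rendered with a transparent background, and JPEG has no alpha channel, so a straight conversion can come out with a black background. A minimal sketch that flattens onto white before saving a JPEG and picks the output format from the file extension; `renderPdfPage()` and `imageFormatFromPath()` are hypothetical helper names, and the code assumes the Imagick extension with Ghostscript.

```php
<?php
// Hypothetical helper: maps an output file extension to an ImageMagick format name.
function imageFormatFromPath(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $formats = ['jpg' => 'jpeg', 'jpeg' => 'jpeg', 'png' => 'png', 'webp' => 'webp'];
    if (!isset($formats[$ext])) {
        throw new InvalidArgumentException("Unsupported output extension: .$ext");
    }
    return $formats[$ext];
}

// Hypothetical helper: renders one page (0-based) of a PDF to an image file.
function renderPdfPage(string $pdf, int $page, string $out, int $dpi = 300): void
{
    $imagick = new Imagick();
    $imagick->setResolution($dpi, $dpi);           // must be set before readImage()
    $imagick->readImage($pdf . '[' . $page . ']');  // e.g. "document.pdf[0]"

    $format = imageFormatFromPath($out);
    if ($format === 'jpeg') {
        // JPEG cannot store transparency: composite it over white instead of black
        $imagick->setImageBackgroundColor('white');
        $imagick->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
        $imagick->setImageCompressionQuality(90);
    }

    $imagick->setImageFormat($format);
    $imagick->writeImage($out);
    $imagick->clear(); // free the pixel data
}

// Usage: renderPdfPage('document.pdf', 0, 'extracted_image.jpg');
```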

3. Converting All Pages of a Multi-Page PDF to Images

To convert every page, load the whole PDF and loop through its pages.

Example: Extract All Pages as Images

$imagick = new Imagick();
$imagick->setResolution(300, 300);
$imagick->readImage('document.pdf'); // loads every page into memory at once

foreach ($imagick as $index => $page) {
    $page->setImageFormat('jpeg');
    $page->writeImage("page_$index.jpg");
}
$imagick->clear();

echo "All pages converted to images!";

Explanation:

Loops through all PDF pages
Saves each page as page_0.jpg, page_1.jpg, etc.

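🔹 Use 'png' or 'webp' in setImageFormat() (with a matching file extension) for other output formats. WebP output requires an ImageMagick build with WebP support.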
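Reading a long PDF at 300 DPI in one go can exhaust PHP's memory limit. A minimal sketch that first counts the pages with pingImage() (which, per the PHP manual, fetches basic attributes without reading the whole image into memory) and then loads one page at a time; it also zero-pads the file numbers so the files sort in page order. `pageFileName()` and `renderAllPages()` are hypothetical helper names, and Imagick with Ghostscript is assumed.

```php
<?php
// Hypothetical helper: zero-padded, 1-based names keep pages in order when sorted.
function pageFileName(string $prefix, int $index, string $ext): string
{
    return sprintf('%s_%03d.%s', $prefix, $index + 1, $ext);
}

// Hypothetical helper: renders every page one at a time to limit memory use.
// Assumes the Imagick extension with Ghostscript available for PDF input.
function renderAllPages(string $pdf, string $prefix, string $format = 'png', int $dpi = 150): int
{
    // pingImage() fetches basic attributes (including the page count) only
    $probe = new Imagick();
    $probe->pingImage($pdf);
    $pageCount = $probe->getNumberImages();
    $probe->clear();

    for ($i = 0; $i < $pageCount; $i++) {
        $page = new Imagick();
        $page->setResolution($dpi, $dpi);
        $page->readImage($pdf . '[' . $i . ']'); // only this page is held in memory
        $page->setImageFormat($format);
        $page->writeImage(pageFileName($prefix, $i, $format));
        $page->clear();
    }

    return $pageCount;
}

// Usage: renderAllPages('document.pdf', 'page', 'png'); // page_001.png, page_002.png, ...
```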

4. Extracting Embedded Images from a PDF (Not Whole Pages)

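PDFs store photos, logos, and scanned pages as separate image objects inside each page. Imagick cannot list or pull out these objects; it only renders whole pages. One dependable option is the pdfimages command-line tool from Poppler (the poppler-utils package on most Linux distributions), called from PHP.

Example: Extract Embedded Images

$pdf = 'document.pdf';
$prefix = 'embedded/image'; // output: embedded/image-000.jpg, embedded/image-001.png, ...
is_dir('embedded') || mkdir('embedded', 0755, true);

// -all writes JPEG (and JPEG 2000, JBIG2, CCITT) images in their native format; most others become PNG
exec('pdfimages -all ' . escapeshellarg($pdf) . ' ' . escapeshellarg($prefix), $output, $status);

echo $status === 0 ? "Embedded images extracted!" : "pdfimages failed with exit code $status";

Explanation:

Extracts only the embedded images (not entire pages), at their stored resolution
Saves them as separate, numbered image files
Requires the pdfimages binary on the server and permission to call exec()

🔹 Useful for extracting logos, photos, or scanned signatures. Charts drawn as vector graphics are not image objects, so render the page with Imagick for those.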
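Extracting embedded images usually also yields tiny images such as masks, spacers, or bullet icons. A minimal sketch that keeps only files above a size threshold, using PHP's built-in getimagesize(); `filterBySize()` and the 50×50 pixel default are illustrative assumptions, not values from this guide.

```php
<?php
// Hypothetical helper: returns the image files that are at least $minWidth x $minHeight pixels.
function filterBySize(array $paths, int $minWidth = 50, int $minHeight = 50): array
{
    $kept = [];
    foreach ($paths as $path) {
        if (!is_file($path)) {
            continue;
        }
        // @ silences notices for files that are not images; those return false
        $info = @getimagesize($path);
        if ($info === false) {
            continue;
        }
        [$width, $height] = $info;
        if ($width >= $minWidth && $height >= $minHeight) {
            $kept[] = $path;
        }
    }
    return $kept;
}

// Usage: $useful = filterBySize(glob('embedded/image-*'));
```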

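5. Splitting a PDF into Single Pages Using FPDI

FPDI imports pages from an existing PDF into a new PDF. It does not create images itself, but it is handy for splitting a document into single-page files.

Example: Save Each Page as a Separate PDF Using FPDI

$source = 'document.pdf';
$pageCount = (new Fpdi())->setSourceFile($source); // returns the number of pages

for ($i = 1; $i <= $pageCount; $i++) {
    $pdf = new Fpdi(); // a new document per page; Output() closes the current one
    $pdf->setSourceFile($source);
    $tplIdx = $pdf->importPage($i);
    $size = $pdf->getTemplateSize($tplIdx); // keep the original page size and orientation
    $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
    $pdf->useTemplate($tplIdx);

    $pdf->Output('F', "extracted_page_$i.pdf"); // 'F' = save to a local file
}

echo "PDF pages extracted as separate files!";

Explanation:

Extracts each page separately as a new PDF file (extracted_page_1.pdf, extracted_page_2.pdf, etc.)
Each file can then be converted into an image with Imagick
The free FPDI parser cannot read some PDFs that use newer compression features (compressed cross-reference streams); such files raise an exception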
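To pull out only some pages, for example "1-3,5", the same FPDI calls can copy the selected pages into one file. A minimal sketch; `parsePageRange()` and `extractPages()` are hypothetical names, and `extractPages()` assumes setasign/fpdi and setasign/fpdf are installed and Composer's vendor/autoload.php has already been required.

```php
<?php
use setasign\Fpdi\Fpdi;

// Hypothetical helper: turns "1-3,5" into [1, 2, 3, 5], validated against $pageCount.
function parsePageRange(string $spec, int $pageCount): array
{
    $pages = [];
    foreach (explode(',', $spec) as $part) {
        $part = trim($part);
        if (preg_match('/^(\d+)(?:-(\d+))?$/', $part, $m) !== 1) {
            throw new InvalidArgumentException("Invalid page range: '$part'");
        }
        $start = (int) $m[1];
        $end = isset($m[2]) ? (int) $m[2] : $start;
        if ($start < 1 || $end > $pageCount || $start > $end) {
            throw new OutOfRangeException("Pages must be within 1-$pageCount: '$part'");
        }
        $pages = array_merge($pages, range($start, $end));
    }
    return array_values(array_unique($pages)); // drop duplicates, keep first-seen order
}

// Hypothetical helper: copies the selected pages of $source into one new PDF.
function extractPages(string $source, string $spec, string $target): void
{
    $pdf = new Fpdi();
    $pageCount = $pdf->setSourceFile($source);

    foreach (parsePageRange($spec, $pageCount) as $pageNo) {
        $tplIdx = $pdf->importPage($pageNo);
        $size = $pdf->getTemplateSize($tplIdx); // keep each page's original size
        $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
        $pdf->useTemplate($tplIdx);
    }

    $pdf->Output('F', $target);
}

// Usage: extractPages('document.pdf', '1-3,5', 'selected_pages.pdf');
```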

6. Converting a PDF to High-Quality PNG for OCR Processing

For OCR (Optical Character Recognition) or document archiving, use high-resolution PNG images.

Example: Convert PDF to High-Quality PNG

$imagick = new Imagick();
$imagick->setResolution(600, 600); // higher DPI = sharper text, but far larger images
$imagick->readImage('document.pdf[0]');
$imagick->setImageFormat('png');
$imagick->writeImage('high_res_page.png');
$imagick->clear(); // release the large pixel buffer

echo "High-quality PNG generated!";

Why Use PNG?

PNG is lossless, so text edges stay free of the compression artifacts JPEG introduces, which helps OCR accuracy
Well suited to digitizing scanned documents
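Resolution has a large effect on cost. PDF page sizes are measured in points (1/72 inch), so the rendered width in pixels is width in points × DPI / 72, and doubling the DPI quadruples the pixel count. A small sketch of that arithmetic for a US Letter page (612 × 792 pt); `renderedPixelSize()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pixel dimensions of a page rendered at $dpi.
// PDF sizes are in points, and 1 point = 1/72 inch.
function renderedPixelSize(float $widthPt, float $heightPt, int $dpi): array
{
    return [
        (int) round($widthPt * $dpi / 72),
        (int) round($heightPt * $dpi / 72),
    ];
}

foreach ([300, 600] as $dpi) {
    [$w, $h] = renderedPixelSize(612, 792, $dpi); // US Letter: 8.5 x 11 in
    printf("%d DPI: %d x %d px (%.1f megapixels)\n", $dpi, $w, $h, $w * $h / 1e6);
}
```

This prints 2550 × 3300 px (8.4 megapixels) at 300 DPI and 5100 × 6600 px (33.7 megapixels) at 600 DPI. 300 DPI is a commonly recommended starting point for OCR; 600 DPI mainly pays off for very small print.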

7. Saving Extracted Images to a Database

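One simple way to store extracted images in a MySQL database is to base64-encode them into a text column. Use MEDIUMTEXT or LONGTEXT for that column: a plain TEXT column holds at most 64 KB, which many page images exceed.

Example: Save Images to a MySQL Database

// Assumed table: CREATE TABLE images (id INT AUTO_INCREMENT PRIMARY KEY, image_data MEDIUMTEXT NOT NULL)
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // throw exceptions on errors
$conn = new mysqli("localhost", "root", "", "pdf_images"); // placeholder credentials

$imageData = file_get_contents('extracted_image.jpg');
$encoded = base64_encode($imageData);

$stmt = $conn->prepare("INSERT INTO images (image_data) VALUES (?)");
$stmt->bind_param("s", $encoded);
$stmt->execute();

echo "Image saved in database!";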

Allows easy retrieval and display of extracted images.
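Base64 makes the stored data about a third larger (4 output bytes for every 3 input bytes). An alternative is to store the raw bytes in a binary column together with the MIME type. A minimal sketch using PDO instead of mysqli, so the same function can run against MySQL (for example with a MEDIUMBLOB column) or, for a quick test, SQLite; the `images` table layout and the `saveImage()` name are assumptions.

```php
<?php
// Hypothetical helper: stores an image file's raw bytes and MIME type; returns the new row id.
// Assumed table: images (id INTEGER PRIMARY KEY, mime VARCHAR(50), image_data BLOB)
function saveImage(PDO $db, string $path): int
{
    $info = getimagesize($path);
    if ($info === false) {
        throw new RuntimeException("Not a recognised image: $path");
    }

    $stmt = $db->prepare('INSERT INTO images (mime, image_data) VALUES (:mime, :data)');
    $stmt->bindValue(':mime', $info['mime']);
    $stmt->bindValue(':data', file_get_contents($path), PDO::PARAM_LOB); // raw bytes, no base64
    $stmt->execute();

    return (int) $db->lastInsertId();
}

// Usage with MySQL (credentials are placeholders):
// $db = new PDO('mysql:host=localhost;dbname=pdf_images', 'user', 'pass',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $id = saveImage($db, 'extracted_image.jpg');
```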

8. Displaying Extracted Images from the Database

Once images are stored, retrieve and display them in PHP.

Example: Display Extracted Image from Database

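// $conn is the mysqli connection from the previous example
$result = $conn->query("SELECT image_data FROM images LIMIT 1");
$row = $result->fetch_assoc();

if ($row) {
    // image_data already holds base64 text, so it can go straight into a data URI
    echo '<img src="data:image/jpeg;base64,' . $row['image_data'] . '" alt="Extracted image">';
}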

Dynamically displays extracted images.
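The example above hard-codes image/jpeg; if PNG or WebP images are stored as well, the data URI should carry each image's real MIME type. A small sketch, assuming the raw bytes and MIME type are available (as with a binary column that records the type); `imageTag()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds an <img> tag with an inline data URI.
function imageTag(string $bytes, string $mime, string $alt = ''): string
{
    $uri = 'data:' . $mime . ';base64,' . base64_encode($bytes);
    return sprintf(
        '<img src="%s" alt="%s">',
        htmlspecialchars($uri, ENT_QUOTES),
        htmlspecialchars($alt, ENT_QUOTES) // escape user-visible text for HTML
    );
}

// The bytes would normally come from the database; 'abc' is a stand-in.
echo imageTag('abc', 'image/png', 'Page 1'); // <img src="data:image/png;base64,YWJj" alt="Page 1">
```

Inline data URIs make the HTML as large as the images themselves, so for many or large images a separate script that sends the bytes with a Content-Type header is usually lighter.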

9. Extracting and Emailing PDF Images

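Extracted images can be sent as email attachments, for example with PHPMailer (composer require phpmailer/phpmailer).

Example: Email Extracted Images

use PHPMailer\PHPMailer\PHPMailer;
use PHPMailer\PHPMailer\Exception;

$mail = new PHPMailer(true); // true = throw exceptions on failure
try {
    $mail->setFrom('sender@example.com');       // placeholder addresses
    $mail->addAddress('recipient@example.com');
    $mail->Subject = 'Extracted PDF images';
    $mail->Body = 'The extracted images are attached.';
    $mail->addAttachment('extracted_image.jpg');
    $mail->send(); // uses PHP's mail() unless configured with isSMTP()
    echo "Image emailed successfully!";
} catch (Exception $e) {
    echo "Email could not be sent: {$mail->ErrorInfo}";
}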

Ideal for automated document processing systems.
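Attaching every page of a long PDF to one message can exceed what the mail server accepts, especially since attachments grow by roughly a third when base64-encoded for email. A minimal sketch that splits the files into batches, one message per batch; `batchBySize()` is a hypothetical helper, and the limit is left as a parameter because acceptable sizes depend on the mail provider.

```php
<?php
// Hypothetical helper: groups files into batches whose total size stays under $maxBytes.
// A single file larger than $maxBytes ends up in a batch of its own.
function batchBySize(array $paths, int $maxBytes): array
{
    $batches = [];
    $current = [];
    $currentSize = 0;

    foreach ($paths as $path) {
        $size = filesize($path);
        if ($current !== [] && $currentSize + $size > $maxBytes) {
            $batches[] = $current; // close the current batch before it overflows
            $current = [];
            $currentSize = 0;
        }
        $current[] = $path;
        $currentSize += $size;
    }

    if ($current !== []) {
        $batches[] = $current;
    }
    return $batches;
}

// Usage: foreach (batchBySize(glob('page_*.jpg'), 10 * 1024 * 1024) as $files) { /* one email per batch */ }
```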

10. Automating Image Extraction for Bulk PDFs

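For multiple PDFs, loop over the files and convert every page of each one. Writing the whole Imagick object once would save only a single page per PDF.

Example: Batch Extract Images from Multiple PDFs

$files = glob("pdfs/*.pdf");

foreach ($files as $file) {
    $imagick = new Imagick();
    $imagick->setResolution(150, 150); // choose the DPI you need; ImageMagick's default is 72
    $imagick->readImage($file); // loads every page of this PDF

    // pdfs/report.pdf -> pdfs/report_page_0.jpg, pdfs/report_page_1.jpg, ...
    $base = pathinfo($file, PATHINFO_DIRNAME) . '/' . pathinfo($file, PATHINFO_FILENAME);

    foreach ($imagick as $index => $page) {
        $page->setImageFormat('jpeg');
        $page->writeImage("{$base}_page_$index.jpg");
    }
    $imagick->clear();
}

echo "Batch image extraction completed!";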

Automates the process for multiple PDFs.
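In a real batch, one damaged or password-protected PDF should not stop the whole run. A minimal sketch that wraps each file's conversion in try/catch and collects the failures; `batchConvert()` is a hypothetical name, and the conversion step is passed in as a callable (for example a closure around the Imagick code above) so it can be swapped or tested.

```php
<?php
// Hypothetical helper: runs $convert on each file and records which ones failed.
function batchConvert(array $files, callable $convert): array
{
    $result = ['ok' => [], 'failed' => []];

    foreach ($files as $file) {
        try {
            $convert($file); // e.g. render the PDF's pages with Imagick
            $result['ok'][] = $file;
        } catch (Throwable $e) {
            // ImagickException and other errors land here; continue with the next file
            $result['failed'][$file] = $e->getMessage();
        }
    }

    return $result;
}

// Usage:
// $report = batchConvert(glob('pdfs/*.pdf'), function (string $file): void {
//     $imagick = new Imagick();
//     $imagick->setResolution(150, 150);
//     $imagick->readImage($file . '[0]'); // first page only, as an example
//     $imagick->setImageFormat('jpeg');
//     $imagick->writeImage(preg_replace('/\.pdf$/i', '.jpg', $file));
//     $imagick->clear();
// });
// print_r($report['failed']);
```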

Best Practices for Extracting Images from PDFs in PHP

Use Imagick for page rendering, and call setResolution() before reading the PDF.
Convert PDF pages to PNG for better OCR accuracy.
Use FPDI for extracting specific pages dynamically.
Save extracted images in a database for easy retrieval.
Automate extraction for bulk PDF processing.

Conclusion

With Imagick and FPDI (built on FPDF or TCPDF), you can turn PDF pages into images, split documents into single pages, and store and process the results in PHP.

Extract embedded images and full pages from PDFs.
Convert PDFs into high-resolution images for OCR.
Store, display, and email extracted images.
Automate bulk image extraction for large datasets.

By implementing these techniques, you can efficiently process and manage images from PDFs in your PHP applications! 🚀
