Nov 30, 2024

export pdf pages as bitmap files

setup

routinely, i store figures as vector graphics, each in its own pdf file. but sometimes figures are required as separate bitmap files such as jpeg, png, or tiff. acrobat pro is one way to go, but unfortunately, my work laptop doesn't have it.

solution

the plan

place all pdf files to export under one folder and batch process with a for loop
export each figure, specifically:
- read pdf,
- get page 1, since each pdf has only 1 page,
- export with a zoom factor of 4 (4 × 72 = 288 dpi), which should be large enough.
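
to get a feel for what a zoom factor means in pixels: pdf page sizes are measured in points (1/72 inch), and pymupdf renders one pixel per point when no matrix is given, so the zoom factor simply multiplies the page size. the sketch below is only a back-of-the-envelope estimate. `rendered_size` is a hypothetical helper (not part of pymupdf), the a4 page size is just an illustrative input, and the real pixmap may differ by a pixel because of rounding.

```python
def rendered_size(width_pt, height_pt, zoom):
    """estimate pixel size and dpi of a page rendered with fitz.Matrix(zoom, zoom)."""
    dpi = 72 * zoom                      # 72 dpi is the unzoomed resolution
    width_px = round(width_pt * zoom)    # points scale linearly with the zoom factor
    height_px = round(height_pt * zoom)
    return width_px, height_px, dpi


# an a4 page is roughly 595 x 842 points
print(rendered_size(595, 842, 4.0))  # → (2380, 3368, 288.0)
```

if the images come out too large or too small, this arithmetic gives a rough idea of which zoom factor to try next.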

programming language & module(s)

Python
pymupdf, relevant docs available at:
- pypi: https://pypi.org/project/PyMuPDF/;
- documentation: https://pymupdf.readthedocs.io/en/latest/
pyfiglet

file preps

the source pdf(s)

variables to customize

folder, the path of the folder that holds the pdf files,
image_ext, the bitmap file extension including the leading dot, e.g. '.jpg', '.jpeg'. note that '.tif(f)' is unfortunately not supported.
zoom_in_factor, the scale factor that controls the pixel size of the exported images.
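
since the script expects image_ext with a leading dot and only some formats work, a small check up front can catch typos before the batch starts. this is a minimal sketch, not part of the original script: `normalize_ext` and `KNOWN_EXTS` are hypothetical names, and the set only holds the extensions listed in the script's own comment, which is not necessarily everything pymupdf can write.

```python
# extensions taken from the comment in the script; an assumption, not an exhaustive list
KNOWN_EXTS = {'.png', '.jpg', '.jpeg', '.psd', '.ps'}


def normalize_ext(image_ext):
    """return image_ext in lower case with a leading dot, or raise if it is not in KNOWN_EXTS."""
    ext = image_ext.strip().lower()
    if not ext.startswith('.'):
        ext = '.' + ext
    if ext not in KNOWN_EXTS:
        raise ValueError('unsupported image extension: %r' % image_ext)
    return ext


print(normalize_ext('JPG'))  # → .jpg
```

calling `normalize_ext('tiff')` raises a ValueError, which matches the note above about tiff.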

the script

pdf_to_image.py

#!/usr/bin/env python3

def look_cool(slogan, font = 'slant'):
    import pyfiglet
    result = pyfiglet.figlet_format(slogan, font = font)
    print('\n' + result)


def pdf2img(folder, pdf_name, zoom_in_factor, image_ext):
    import os
    import fitz
    pdf_path = os.path.join(folder, pdf_name)
    print('\t%s' % pdf_path)
    pdf = fitz.open(pdf_path)
    # create a transformation matrix
    mat = fitz.Matrix(zoom_in_factor, zoom_in_factor)
    # render the 1st page to a pixmap
    pix = pdf[0].get_pixmap(matrix = mat)
    # save the pixmap as an image; swap only the trailing '.pdf' for image_ext
    image_name = os.path.splitext(pdf_name)[0] + image_ext
    pix.save(os.path.join(folder, image_name), jpg_quality = 100)
    pdf.close()


import os

folder = r'C:\Users\xiao\Desktop\pdf_to_export'
# options: '.png', '.jpg', '.jpeg', '.psd', '.ps'
image_ext = '.jpeg'
# tweak the number to get bitmaps of different zoom-in scales.
zoom_in_factor = 4.0

print('\nexporting the following pdfs as %s:' % image_ext)
pdfs = [item for item in os.listdir(folder) if item.endswith('.pdf')]
for pdf_name in pdfs:
    pdf2img(folder, pdf_name, zoom_in_factor, image_ext)

look_cool('All done !')

output

the exported images can be found in the same folder as the pdf docs.

note to self

this script assumes there's only 1 page in each pdf. tweak it in case more pages are to be exported.
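
one possible tweak for multi-page pdfs, as a rough sketch rather than a tested replacement: loop over the pages and add the page number to the file name. `pdf2imgs` and `page_image_name` are hypothetical names, and the `_p1`, `_p2`, ... suffix is just an assumed naming scheme. the pymupdf calls are the same ones the script already uses, plus iterating over the document to get its pages.

```python
import os


def page_image_name(pdf_name, page_number, image_ext):
    """build e.g. 'fig_p1.png' from 'fig.pdf', page 1 and '.png' (assumed naming scheme)."""
    stem = os.path.splitext(pdf_name)[0]
    return '%s_p%d%s' % (stem, page_number, image_ext)


def pdf2imgs(folder, pdf_name, zoom_in_factor, image_ext):
    """export every page of one pdf as a separate image in the same folder."""
    import fitz  # pymupdf, imported inside the function as in the script above
    pdf = fitz.open(os.path.join(folder, pdf_name))
    mat = fitz.Matrix(zoom_in_factor, zoom_in_factor)
    # iterating over the document yields its pages in order
    for index, page in enumerate(pdf):
        pix = page.get_pixmap(matrix=mat)
        image_name = page_image_name(pdf_name, index + 1, image_ext)
        pix.save(os.path.join(folder, image_name), jpg_quality=100)
    pdf.close()
```

swapping `pdf2img` for `pdf2imgs` in the for loop would then export `fig_p1.jpeg`, `fig_p2.jpeg`, and so on for a file named `fig.pdf`.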

Xiao

Some rights reserved

Except where otherwise noted, content on this page is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.