setup
routinely, i store figures as vector graphics, each in its own pdf file. but sometimes figures are required as separate bitmap files such as jpeg, png, or tiff. acrobat pro is one way to go, but unfortunately, my work laptop doesn't have it.
solution
the plan
- place all pdf files to export under one folder and batch process with a for loop
- export each figure, specifically:
  - read pdf,
  - get page 1, since each pdf has only 1 page,
  - export with a zoom factor of 4 - that should be large enough.
programming language & module(s)
- Python
- pymupdf, relevant docs available at:
- pypi: https://pypi.org/project/PyMuPDF/;
- documentation: https://pymupdf.readthedocs.io/en/latest/
- pyfiglet
file preps
- the source pdf(s)
variables to customize
- folder, for pdf file path,
- image_ext, for bitmap file extension including the leading dot, e.g. '.jpg', '.jpeg'. note that 'tif(f)' is unfortunately not supported.
- zoom_in_factor, controls the size of the exported images.
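to get a feel for what zoom_in_factor does to image size: pdf page sizes are measured in points (72 per inch), and a zoom factor of 1 renders at 72 dpi, so a factor of 4 corresponds to roughly 288 dpi. the sketch below only does that arithmetic. the helper name rendered_size and the 6 x 4 inch figure are made up for illustration, and the pixel counts pymupdf actually produces may differ by a pixel due to rounding.

```python
# pdf page sizes are measured in points, 72 points per inch
POINTS_PER_INCH = 72


def rendered_size(width_pt, height_pt, zoom_in_factor):
    """Return approximate (width_px, height_px, dpi) for a page rendered at the given zoom."""
    dpi = POINTS_PER_INCH * zoom_in_factor
    width_px = round(width_pt * zoom_in_factor)
    height_px = round(height_pt * zoom_in_factor)
    return width_px, height_px, dpi


# hypothetical figure: 6 x 4 inches, i.e. 432 x 288 points
print(rendered_size(432, 288, 4.0))  # -> (1728, 1152, 288.0)
```

if a journal asks for, say, 300 dpi, a zoom factor slightly above 4 (300 / 72 is about 4.17) would be the value to try.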
the script
pdf_to_image.py
#! /usr/bin/python

def look_cool(slogan, font = 'slant'):
    import pyfiglet
    result = pyfiglet.figlet_format(slogan, font = font)
    print('\n' + result)

def pdf2img(folder, pdf_name, zoom_in_factor, image_ext):
    import os
    import fitz
    pdf_path = os.path.join(folder, pdf_name)
    print('\t%s' % pdf_path)
    pdf = fitz.open(pdf_path)
    # create a transformation matrix
    mat = fitz.Matrix(zoom_in_factor, zoom_in_factor)
    # render the 1st page to a pixmap
    pix = pdf[0].get_pixmap(matrix = mat)
    # save the pixmap as an image
    image_name = pdf_name.replace('.pdf', image_ext)
    pix.save(os.path.join(folder, image_name), jpg_quality = 100)
    pdf.close()

import os

folder = r'C:\Users\xiao\Desktop\pdf_to_export'
# options: '.png', '.jpg', '.jpeg', '.psd', '.ps'
image_ext = '.jpeg'
# tweak the number to get bitmaps of different zoom-in scales.
zoom_in_factor = 4.0

print('\nexporting the following pdfs as %s:' % image_ext)
pdfs = [item for item in os.listdir(folder) if item.endswith('.pdf')]
for pdf_name in pdfs:
    pdf2img(folder, pdf_name, zoom_in_factor, image_ext)
look_cool('All done !')
output
the exported images can be found in the same folder as the pdf docs.
note to self
- this script assumes there's only 1 page in each pdf. tweak it in case more pages are to be exported.
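one possible tweak for multi-page pdfs, as a minimal sketch rather than a tested replacement for pdf2img: loop over every page and add the page number to the image name. the function names pdf2imgs and page_image_name, and the '_p<N>' naming scheme, are assumptions made up for this example. fitz is imported inside the function, as in the script above.

```python
import os


def page_image_name(pdf_name, page_number, image_ext):
    """Build e.g. 'fig1_p2.jpeg' from 'fig1.pdf', page 2 and '.jpeg'."""
    stem = os.path.splitext(pdf_name)[0]
    return '%s_p%d%s' % (stem, page_number, image_ext)


def pdf2imgs(folder, pdf_name, zoom_in_factor, image_ext):
    """Export every page of one pdf as a separate image (hypothetical variant of pdf2img)."""
    import fitz  # pymupdf
    pdf = fitz.open(os.path.join(folder, pdf_name))
    mat = fitz.Matrix(zoom_in_factor, zoom_in_factor)
    # iterating over the document yields its pages in order
    for page_number, page in enumerate(pdf, start=1):
        pix = page.get_pixmap(matrix=mat)
        image_name = page_image_name(pdf_name, page_number, image_ext)
        pix.save(os.path.join(folder, image_name), jpg_quality=100)
    pdf.close()
```

calling pdf2imgs in place of pdf2img inside the for loop should leave single-page pdfs working as before, except that their images get a '_p1' suffix.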