Installation

PyMuPDF can be installed from Python wheels for Windows (32-bit and 64-bit), Linux (64-bit, Intel and ARM) and Mac OSX (64-bit, Intel), for Python versions 3.7 and up:

  python -m pip install --upgrade pip
  python -m pip install --upgrade pymupdf

PyMuPDF does not support Python versions prior to 3.6. Older wheels can be found in this repository and on PyPI. We generally follow the official Python release schedule: once a Python version drops out of official support, we also stop generating wheels for it.
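A minimal sketch, not part of PyMuPDF itself, that checks whether the running interpreter meets the 3.7 minimum for current wheels and, if so, prints the two pip commands from above. The helper names wheels_available and pip_commands are hypothetical.

```python
import sys

# Minimum Python version targeted by current wheels, as stated above.
MIN_WHEEL_VERSION = (3, 7)


def wheels_available(version_info=sys.version_info) -> bool:
    """Return True if current PyMuPDF wheels target this Python version."""
    return tuple(version_info[:2]) >= MIN_WHEEL_VERSION


def pip_commands(python=sys.executable):
    """Return the two upgrade commands from the installation steps as argv lists."""
    return [
        [python, "-m", "pip", "install", "--upgrade", "pip"],
        [python, "-m", "pip", "install", "--upgrade", "pymupdf"],
    ]


if __name__ == "__main__":
    if wheels_available():
        for cmd in pip_commands():
            print(" ".join(cmd))
    else:
        print("No current wheels for this Python; look for older wheels on PyPI.")
```

Using sys.executable with "-m pip" makes sure the packages are installed into the same interpreter that runs the check.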

There are no mandatory external dependencies. However, some optional features are available only if additional components are installed:

  • Pillow is required for Pixmap.pil_save() and Pixmap.pil_tobytes().

  • fontTools is required for Document.subset_fonts().

  • pymupdf-fonts is a collection of additional fonts that can be used with the text output methods.

  • Tesseract-OCR is required for optical character recognition in images and document pages. Tesseract is separate software, not a Python package. To enable OCR functions in PyMuPDF, the software must be installed and the system environment variable "TESSDATA_PREFIX" must be defined and contain the path of the tessdata folder in the Tesseract installation. See below.
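The following sketch reports which of the optional components listed above appear to be available. It assumes the import names PIL (Pillow), fontTools and pymupdf_fonts (pymupdf-fonts); since Tesseract is not a Python package, it only checks that TESSDATA_PREFIX is set and points to an existing folder. The function name is hypothetical, and PyMuPDF performs its own detection independently of it.

```python
import importlib.util
import os

# Assumed import names for each optional Python component.
OPTIONAL_MODULES = {
    "Pillow": "PIL",
    "fontTools": "fontTools",
    "pymupdf-fonts": "pymupdf_fonts",
}


def optional_components(environ=os.environ):
    """Return {component name: True/False} for the optional components."""
    # find_spec returns None for a top-level module that cannot be found.
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }
    # Tesseract: only verify the environment variable and that the folder exists.
    prefix = environ.get("TESSDATA_PREFIX")
    status["Tesseract-OCR"] = bool(prefix) and os.path.isdir(prefix)
    return status


if __name__ == "__main__":
    for name, present in optional_components().items():
        print(f"{name:15} {'found' if present else 'missing'}")
```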

Note

You can install these additional components at any time, before or after installing PyMuPDF. PyMuPDF detects their presence during import or when the respective functions are used.

To install PyMuPDF from sources, follow these steps:

Step 1: Install MuPDF

For the open source GNU AGPL license, download the MuPDF sources from the MuPDF website.

If you are a commercial customer, please contact Artifex.

Install MuPDF following the instructions for your platform.

Step 2: Download and Generate PyMuPDF

Download the sources from https://pypi.org/project/PyMuPDF/#files and decompress them.

Adjust the setup.py script if necessary. In particular, make sure that include_dirs and library_dirs point to the folders of your MuPDF installation. The easiest way to do this is to set the environment variable "PYMUPDF_DIRS" to the name of a JSON file containing a dictionary with these two keys, each having a list of folder names as its value:

  {
    "include_dirs": ["folder1", "folder2", "folder3", ...],
    "library_dirs": ["folder1", "folder2", "folder3", ...]
  }
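A small helper can produce this JSON file from Python instead of writing it by hand. This is a sketch; the function name write_pymupdf_dirs is hypothetical and not part of setup.py.

```python
import json
import os


def write_pymupdf_dirs(path, include_dirs, library_dirs):
    """Write the JSON file that PYMUPDF_DIRS should point to and return its content."""
    config = {
        "include_dirs": [os.fspath(d) for d in include_dirs],
        "library_dirs": [os.fspath(d) for d in library_dirs],
    }
    # json.dump emits valid JSON: no trailing commas, Windows backslashes escaped.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

For example, write_pymupdf_dirs("pymupdf_dirs.json", ["/opt/mupdf/include"], ["/opt/mupdf/lib"]) with hypothetical MuPDF paths, then set PYMUPDF_DIRS to the full path of that file.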

Then run python setup.py install.
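If you prefer to drive the build from a script, the sketch below prepares the setup.py install command with PYMUPDF_DIRS set only for the build process. The function name and its run flag are hypothetical; by default it just returns the command and environment so you can inspect them.

```python
import os
import subprocess
import sys


def install_from_sources(source_dir, dirs_json, run=False):
    """Prepare (and optionally run) 'python setup.py install' with PYMUPDF_DIRS set."""
    env = dict(os.environ)
    # Absolute path, because setup.py runs with source_dir as working directory.
    env["PYMUPDF_DIRS"] = os.path.abspath(dirs_json)
    cmd = [sys.executable, "setup.py", "install"]
    if run:
        # Build inside the unpacked source folder; raises CalledProcessError on failure.
        subprocess.run(cmd, cwd=source_dir, env=env, check=True)
    return cmd, env
```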

Note

You can also install from the sources in the GitHub repository. These do not contain the pre-generated files fitz.py and fitz_wrap.c; instead, the installation script setup.py generates them. This requires SWIG to be installed on your system.
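Before building from the GitHub sources, a quick check can confirm that SWIG is reachable. This sketch assumes the executable is named swig and is on PATH; some platforms may package it under a different name.

```python
import shutil


def find_swig():
    """Return the full path of the swig executable, or None if it is not on PATH."""
    return shutil.which("swig")


if __name__ == "__main__":
    path = find_swig()
    print(f"SWIG found at {path}" if path else "SWIG not found; install it first.")
```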

Enabling Integrated OCR Support

If you do not intend to use OCR, skip this section. Otherwise, the following steps are required for both installation paths: from wheels and from sources.

PyMuPDF already contains all the logic needed to support OCR functions. However, it also needs Tesseract's language support data, so Tesseract-OCR must still be installed.

The location of the language support folder must currently [1] be communicated by storing it in the environment variable "TESSDATA_PREFIX".

To get working OCR functionality, complete this checklist:

  1. Install Tesseract.

  2. Locate Tesseract’s language support folder. Typically you will find it here:

    • Windows: C:\Program Files\Tesseract-OCR\tessdata

    • Unix systems: /usr/share/tesseract-ocr/4.00/tessdata

  3. Set the environment variable TESSDATA_PREFIX

    • Windows: set TESSDATA_PREFIX=C:\Program Files\Tesseract-OCR\tessdata

    • Unix systems: export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata
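Steps 2 and 3 can be scripted as below. This is a sketch: it only knows the two typical locations listed above (your installation may differ), and the helper names are hypothetical. It prints the command for you to run in your shell; it does not set the variable itself.

```python
import os
import shlex
import sys

# Typical tessdata locations from the checklist; actual paths may differ.
TYPICAL_TESSDATA = {
    "windows": r"C:\Program Files\Tesseract-OCR\tessdata",
    "unix": "/usr/share/tesseract-ocr/4.00/tessdata",
}


def typical_tessdata(platform=sys.platform):
    """Return the typical tessdata folder for the given platform string."""
    return TYPICAL_TESSDATA["windows" if platform.startswith("win") else "unix"]


def set_command(path, platform=sys.platform):
    """Return the cmd.exe or POSIX shell command that sets TESSDATA_PREFIX."""
    if platform.startswith("win"):
        # cmd.exe's set takes the rest of the line verbatim, spaces included.
        return f"set TESSDATA_PREFIX={path}"
    return f"export TESSDATA_PREFIX={shlex.quote(path)}"


if __name__ == "__main__":
    path = typical_tessdata()
    print(("found: " if os.path.isdir(path) else "not found: ") + path)
    print(set_command(path))
```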

Note

This must happen outside Python, before starting your script. Merely modifying os.environ from within the running script will not work!
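One way to satisfy this from Python is to start your OCR script as a new process whose environment already contains TESSDATA_PREFIX, so the variable is in place before that interpreter starts. This is a sketch; the launcher function and the script name in the usage note are hypothetical.

```python
import os
import subprocess
import sys


def run_with_tessdata(script_args, tessdata):
    """Run Python with script_args in a child process that has TESSDATA_PREFIX set."""
    # The child receives the variable at startup; the parent's os.environ is unchanged.
    env = dict(os.environ, TESSDATA_PREFIX=tessdata)
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, run_with_tessdata(["ocr_job.py"], r"C:\Program Files\Tesseract-OCR\tessdata") launches a hypothetical ocr_job.py with the variable already defined.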

Footnotes

1

In the next MuPDF version, it will be possible to pass this value as a parameter – directly in the OCR invocations.