Speed Comparsion ================ Thanks to the brilliant `xpdf reader`_ sources and the fact that pyxpdf is written in `cython`_ as Python C-API module makes it much faster than pure python based pdf parsers. Text Extraction --------------- Comparing text extraction (while maintaining layout) speed with popular `pdfminer.six`_ module. (python script used - compare.py_) `Running Python 3.6.9, gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, Ubuntu 18.04, on Azure Standard B2ms (2 vcpus, 8 GiB memory) [Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz]` .. code-block:: bash 'pdfminer_text' took: 0.9271 sec 'pyxpdf_text' took: 0.0424 sec 'pdfminer_text_100mb' took: 7.2833 sec 'pyxpdf_text_100mb' took: 0.3301 sec 'pdfminer_text_500mb' took: 36.5288 sec 'pyxpdf_text_500mb' took: 0.9786 sec ====== ============ ========== ============ Size pdfminer.six pyxpdf times faster ====== ============ ========== ============ 1 MB 0.9271 sec 0.0424 sec x21 100 MB 7.2833 sec 0.3301 sec x22 500 MB 36.5288 sec 0.9786 sec x37 ====== ============ ========== ============ **pyxpdf is atleast x20 times faster** .. _cython: https://cython.org/ .. _xpdf reader: https://www.xpdfreader.com/about.html .. _pdfminer.six: https://pdfminersix.readthedocs.io/en/latest/ .. _compare.py: https://github.com/ashutoshvarma/pyxpdf/blob/master/benchmark/compare.py