Output Devices

Output devices process PDF pages and generate or extract resources from them.

All output devices inherit from the base output device:

class pyxpdf.xpdf.PDFOutputDevice

Generic PDF Output Device

All PDF output devices inherit from this class.

get(self, int page_no, **kwargs)

Get the output for the page at index page_no.
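
As a rough illustration of the interface described above, the following sketch uses stand-in classes rather than pyxpdf itself. SketchOutputDevice and UppercaseTextDevice are hypothetical names, the limit keyword is invented, and the zero-based list of page strings is an assumption made so the example runs without a PDF. Only the get(page_no, **kwargs) shape comes from the text.

```python
class SketchOutputDevice:
    """Stand-in for a generic output device (not the pyxpdf class)."""

    def get(self, page_no, **kwargs):
        """Return the output for the page at index page_no."""
        raise NotImplementedError


class UppercaseTextDevice(SketchOutputDevice):
    """Hypothetical device that returns each page's text in upper case."""

    def __init__(self, pages):
        # Assumption: pages is a plain list of strings standing in for a document.
        self.pages = pages

    def get(self, page_no, **kwargs):
        text = self.pages[page_no]
        # Hypothetical keyword option, used only to show how **kwargs flows in.
        limit = kwargs.get("limit")
        # A limit of None leaves the text uncut.
        return text.upper()[:limit]


device = UppercaseTextDevice(["first page", "second page"])
print(device.get(0))
print(device.get(1, limit=6))
```

Each subclass decides what "output" means for a page; callers only rely on get() accepting a page index plus optional keyword arguments.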

Currently there are three Output devices implemented:

Page Iterator

To iterate over a PDF output device page by page, use page_iterator:

class pyxpdf.xpdf.page_iterator(output, **kwargs)

Iterate over PDF output devices by page.

  • output – PDF output device to iterate over

  • kwargs – optional arguments passed to the get() method of the output device


Iterate over page text from TextOutput

>>> text_out = TextOutput(doc)
>>> for page_text in page_iterator(text_out):
...     print(page_text)

Iterate over images from RawImageOutput with a specific crop_box

>>> image_out = RawImageOutput(doc)
>>> for image in page_iterator(image_out, crop_box=(0,0,500,500)):
...     image.show()    # pillow image
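
To make the relationship between page_iterator and get() concrete, here is a minimal sketch of an iterator with the same shape, written against stand-in objects rather than pyxpdf. sketch_page_iterator, FakeTextDevice, the num_pages attribute, and the prefix keyword are all hypothetical; how the real page_iterator discovers the page count is not stated here.

```python
def sketch_page_iterator(output, **kwargs):
    """Yield output.get(page_no, **kwargs) for each page, in order."""
    # Assumption: the device exposes its page count as `num_pages`.
    for page_no in range(output.num_pages):
        yield output.get(page_no, **kwargs)


class FakeTextDevice:
    """Hypothetical device backed by a list of strings."""

    def __init__(self, pages):
        self.pages = pages
        self.num_pages = len(pages)

    def get(self, page_no, **kwargs):
        # `prefix` is an invented option that shows kwargs being forwarded.
        return kwargs.get("prefix", "") + self.pages[page_no]


fake_out = FakeTextDevice(["alpha", "beta"])
for page_text in sketch_page_iterator(fake_out, prefix="> "):
    print(page_text)
```

Writing the sketch as a generator means each page's output is produced only when the loop asks for it, and the same keyword arguments reach every get() call.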