Output Devices
Output devices process PDF pages and generate or extract resources from them.
All output devices inherit from the base output device:
class pyxpdf.xpdf.PDFOutputDevice
    Generic PDF output device.
    All PDF output devices inherit from this class.

    get(self, int page_no, **kwargs)
        Get the output for the page at index page_no.
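To illustrate the contract described above, here is a minimal pure-Python sketch of a device exposing a get(page_no, **kwargs) method. It does not use pyxpdf: SketchOutputDevice, UpperTextOutput, the strip keyword, and the list-of-strings "document" are hypothetical stand-ins, and zero-based page numbering is an assumption.

```python
class SketchOutputDevice:
    """Hypothetical stand-in for a base output device (not pyxpdf's class)."""

    def __init__(self, pages):
        # `pages` is a plain list standing in for a loaded document.
        self.pages = pages

    def get(self, page_no, **kwargs):
        """Return the output for the page at index `page_no`."""
        raise NotImplementedError


class UpperTextOutput(SketchOutputDevice):
    """Hypothetical device that returns each page's text in upper case."""

    def get(self, page_no, **kwargs):
        # Assumption: pages are zero-indexed; `strip` is an illustrative option.
        text = self.pages[page_no]
        if kwargs.get("strip", False):
            text = text.strip()
        return text.upper()


device = UpperTextOutput([" first page ", "second page"])
first = device.get(0, strip=True)
second = device.get(1)
```

Each subclass only has to override get(); callers can then treat every device the same way and pass device-specific options as keyword arguments.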
Currently there are three Output devices implemented:
Page Iterator
To iterate over a PDF output device page by page, use page_iterator:
class pyxpdf.xpdf.page_iterator(output, **kwargs)
    Iterate over PDF output devices by page.

    Parameters
        output – PDF output device to iterate over
        kwargs – Optional arguments to pass to the get() method of the output device
    Examples
        Iterate over page text from TextOutput:

        >>> text_out = TextOutput(doc)
        >>> for page_text in page_iterator(text_out):
        ...     print(page_text)

        Iterate over images from RawImageOutput with a specific crop_box:

        >>> image_out = RawImageOutput(doc)
        >>> for image in page_iterator(image_out, crop_box=(0, 0, 500, 500)):
        ...     image.show()  # Pillow image
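The page-wise iteration pattern can be sketched in plain Python as a generator that forwards its keyword arguments to get(). This is not pyxpdf's implementation: iterate_pages, FakeTextOutput, the explicit num_pages argument, and the max_chars keyword are all hypothetical names chosen for illustration.

```python
def iterate_pages(output, num_pages, **kwargs):
    """Yield output.get(page_no, **kwargs) for each page index in turn."""
    for page_no in range(num_pages):
        yield output.get(page_no, **kwargs)


class FakeTextOutput:
    """Hypothetical device backed by a list of page strings."""

    def __init__(self, pages):
        self.pages = pages

    def get(self, page_no, **kwargs):
        # `max_chars` is an illustrative keyword, not a pyxpdf option.
        text = self.pages[page_no]
        max_chars = kwargs.get("max_chars")
        return text if max_chars is None else text[:max_chars]


pages = ["alpha page", "beta page", "gamma page"]
results = list(iterate_pages(FakeTextOutput(pages), len(pages), max_chars=4))
```

Because the generator yields one page at a time, only the page currently being processed needs to be held in memory.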