Output Devices
Output devices process PDF pages and generate or extract
resources from them.
All output devices inherit from the base output device:
class pyxpdf.xpdf.PDFOutputDevice
    Generic PDF output device. All PDF output devices inherit from this class.

    get(self, int page_no, **kwargs)
        Get the output of the page at index page_no.
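The "one base class, one get() per device" pattern described above can be illustrated with a small standalone sketch. It does not import pyxpdf or reproduce its implementation: SketchOutputDevice, UpperTextOutput, the pages list, and the upper keyword are all hypothetical names chosen for illustration, and zero-based page indexing is an assumption of the sketch.

```python
class SketchOutputDevice:
    """Hypothetical stand-in for a base output device (not pyxpdf's class)."""

    def __init__(self, pages):
        # `pages` is a plain list standing in for a loaded PDF document.
        self.pages = pages

    def get(self, page_no, **kwargs):
        # Each subclass decides what "output" means for a page.
        raise NotImplementedError


class UpperTextOutput(SketchOutputDevice):
    """Toy device: returns a page's text, optionally upper-cased."""

    def get(self, page_no, upper=False, **kwargs):
        # Assumption of this sketch: page_no is a zero-based list index.
        text = self.pages[page_no]
        return text.upper() if upper else text


device = UpperTextOutput(["first page", "second page"])
print(device.get(0))
print(device.get(1, upper=True))
```

Because every device exposes the same get(page_no, **kwargs) entry point, code that consumes a device does not need to know which kind of output it produces.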
Currently, three output devices are implemented:
Page Iterator
To iterate over a PDF output device page by page, use page_iterator:
class pyxpdf.xpdf.page_iterator(output, **kwargs)
    Iterate over a PDF output device page by page.

    Parameters
        output – PDF output device to iterate over
        kwargs – optional arguments passed to the get() method of the output device

    Examples

    Iterate over page text from TextOutput:

        >>> text_out = TextOutput(doc)
        >>> for page_text in page_iterator(text_out):
        ...     print(page_text)

    Iterate over images from RawImageOutput with a specific crop_box:

        >>> image_out = RawImageOutput(doc)
        >>> for image in page_iterator(image_out, crop_box=(0,0,500,500)):
        ...     image.show()  # pillow image
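How keyword arguments might travel from the iterator to each get() call can be sketched as a plain generator. This is a hypothetical illustration, not pyxpdf's code: sketch_page_iterator and EchoOutput are invented names, and the explicit num_pages argument is an assumption made here because the text does not say how the real page_iterator determines the page count.

```python
def sketch_page_iterator(output, num_pages, **kwargs):
    """Yield output.get(page_no, **kwargs) for each page (hypothetical helper)."""
    for page_no in range(num_pages):
        # Every keyword argument is forwarded unchanged to the device.
        yield output.get(page_no, **kwargs)


class EchoOutput:
    """Toy device: returns the page index and the keyword arguments it received."""

    def get(self, page_no, **kwargs):
        return (page_no, kwargs)


results = list(sketch_page_iterator(EchoOutput(), 2, crop_box=(0, 0, 500, 500)))
print(results)
```

Writing the iterator as a generator means each page's output is produced only when the loop asks for it, which keeps memory use low for long documents.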