Re: simple PDF inspection code

Steve White-12
Hi,

By way of investigating these questions, I wrote Java code based on
the 'pdfbox' library to spit out PDF file information, in particular,
the "ToUnicode" entry for each font on each page.

The code is really awful; it was mostly an experiment with the 'pdfbox'
interface.  Please, no aesthetic comments.

But it is useful to see just what text-conversion information is
packaged in a PDF file.
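For context on what that "ToUnicode" information looks like: it is a CMap stream whose bfchar and bfrange sections map a font's character codes to Unicode strings written as UTF-16BE hex. Below is a minimal, stdlib-only sketch that decodes just those two sections from the CMap text. The class name ToUnicodeCMap and its methods are hypothetical and not part of pdfbox. It assumes the CMap is already available as a string, that codes fit in a Java int, and it ignores codespacerange and other CMap operators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of pdfbox): decodes the bfchar and bfrange
 * sections of a ToUnicode CMap, given as text, into code -> Unicode string.
 */
public class ToUnicodeCMap {

    private static final Pattern BFCHAR =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    private static final Pattern BFRANGE =
            Pattern.compile("beginbfrange(.*?)endbfrange", Pattern.DOTALL);
    // A token is either a hex string <...> or an array [...] of hex strings
    private static final Pattern TOKEN =
            Pattern.compile("<[0-9A-Fa-f\\s]*>|\\[[^\\]]*\\]");

    public static Map<Integer, String> parse(String cmap) {
        Map<Integer, String> map = new TreeMap<>();

        // bfchar entries are pairs: <srcCode> <dstUtf16>
        Matcher sec = BFCHAR.matcher(cmap);
        while (sec.find()) {
            List<String> t = tokens(sec.group(1));
            for (int i = 0; i + 1 < t.size(); i += 2) {
                map.put(code(t.get(i)), utf16(t.get(i + 1)));
            }
        }

        // bfrange entries are triples: <lo> <hi> <dst>  or  <lo> <hi> [<d0> <d1> ...]
        sec = BFRANGE.matcher(cmap);
        while (sec.find()) {
            List<String> t = tokens(sec.group(1));
            for (int i = 0; i + 2 < t.size(); i += 3) {
                int lo = code(t.get(i));
                int hi = code(t.get(i + 1));
                String dst = t.get(i + 2);
                if (dst.startsWith("[")) {
                    // Array form: one destination string per code, in order
                    List<String> dsts = tokens(dst.substring(1, dst.length() - 1));
                    for (int c = lo; c <= hi && c - lo < dsts.size(); c++) {
                        map.put(c, utf16(dsts.get(c - lo)));
                    }
                } else {
                    // Single destination: bump its last UTF-16 unit for each successive code
                    String base = utf16(dst);
                    if (base.isEmpty()) continue;
                    String prefix = base.substring(0, base.length() - 1);
                    char last = base.charAt(base.length() - 1);
                    for (int c = lo; c <= hi; c++) {
                        map.put(c, prefix + (char) (last + (c - lo)));
                    }
                }
            }
        }
        return map;
    }

    private static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) out.add(m.group());
        return out;
    }

    // Strip "<", ">" and whitespace; an odd final digit is padded with 0, as in PDF hex strings
    private static String hex(String token) {
        String h = token.substring(1, token.length() - 1).replaceAll("\\s", "");
        return h.length() % 2 == 1 ? h + "0" : h;
    }

    private static int code(String token) {
        return Integer.parseInt(hex(token), 16);
    }

    // Destination strings are UTF-16BE bytes written as hex
    private static String utf16(String token) {
        String h = hex(token);
        byte[] bytes = new byte[h.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(h.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "2 beginbfchar", "<01> <0041>", "<02> <00660069>", "endbfchar",
                "1 beginbfrange", "<10> <12> <0061>", "endbfrange");
        parse(sample).forEach((c, s) -> System.out.printf("%02X -> %s%n", c, s));
    }
}
```

Running main on the small sample prints 01 -> A, 02 -> fi, 10 -> a, 11 -> b, 12 -> c, one per line. Note how a single code (02) can map to a multi-character string such as the "fi" ligature. Pulling the CMap stream out of each font in a real PDF is the part done with pdfbox and is not reproduced here.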
=======================================
To use it, you need:
* a Java SDK
* the Java 'pdfbox' library, and the path to its jar files
* the Java 'commons-logging' library etc.

To build:
javac -classpath /usr/share/java/pdfbox.jar -Xlint PDFView.java

To run:
java -classpath /usr/share/java/pdfbox.jar:/usr/share/java/commons-logging.jar:. PDFView pdf_file_path.pdf

PDFView.java (6K)