To investigate these questions, I wrote some Java code, based on
the 'pdfbox' library, that prints out information about a PDF file,
in particular the "ToUnicode" entry for each font on each page.
The code is really awful; it is just an experiment with the 'pdfbox'
interface. Please, no aesthetic comments.
But it is useful to see just what text-conversion information is
packaged in a PDF file.
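
For orientation: a "ToUnicode" entry is a CMap stream, and its
"bfchar" sections list pairs of <character code> <UTF-16BE text> in hex.
The sketch below is not PDFView.java and does not use 'pdfbox' at all. It is
a standard-library-only illustration of how such pairs might be read once
the stream text has been extracted. The class and method names are
made up for this example, the sample CMap fragment is hand-written, and
"bfrange" sections are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: read the bfchar entries of a ToUnicode CMap. */
final class ToUnicodeSketch {

    // One bfchar entry: <source code> <destination text as UTF-16BE hex>
    private static final Pattern ENTRY =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    // A whole "beginbfchar ... endbfchar" section
    private static final Pattern SECTION =
        Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);

    /** Returns character code -> Unicode text, in the order encountered. */
    static Map<Integer, String> parseBfChar(String cmap) {
        Map<Integer, String> map = new LinkedHashMap<>();
        Matcher section = SECTION.matcher(cmap);
        while (section.find()) {
            Matcher entry = ENTRY.matcher(section.group(1));
            while (entry.find()) {
                // Assumes codes of at most three bytes, so they fit in an int.
                int code = Integer.parseInt(entry.group(1), 16);
                map.put(code, decodeUtf16Hex(entry.group(2)));
            }
        }
        return map;
    }

    // Each group of four hex digits is one UTF-16 code unit.
    private static String decodeUtf16Hex(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-written fragment: code 01 maps to "H", code 02 to "fi".
        String sample =
            "2 beginbfchar\n" +
            "<01> <0048>\n" +
            "<02> <00660069>\n" +
            "endbfchar\n";
        parseBfChar(sample).forEach((code, text) ->
            System.out.printf("code 0x%02X -> \"%s\"%n", code, text));
    }
}
```

The second sample entry shows why this table matters for text conversion:
one character code (a ligature glyph, say) can map to several Unicode
characters.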
To use it, you need:
* a Java SDK
* the Java 'pdfbox' library, and the path to its jar files
* the Java 'commons-logging' library, and any other libraries 'pdfbox' depends on
To compile it:
    javac -classpath /usr/share/java/pdfbox.jar -Xlint PDFView.java