Scanned Image vs Saved Image
Fri, 27 Apr 2007 07:23:20 -070
Is there any sort of flag in a PDF document indicating whether it is a
scanned image or a document that has been saved after user input (forms)?
We're using a document management system and want to determine which PDF
documents are forms and which ones are scanned images.

Re: Scanned Image vs Saved Image
Fri, 27 Apr 2007 08:42:46 -070
There isn't any separate flag indicating such a state. All PDF files contain
text, graphics, and other objects. In general, you can't rely on the presence
or absence of text to tell the two apart: a scanned image file might contain
no text at all, or it might carry OCR text.

If you know or can control the source of the files, you can use that knowledge
to set up tests that examine the files and determine their type empirically.
For example, if all of your forms have live fields, you can check for the
presence of those objects and flag the files accordingly for your workflow
processing. If the forms are flattened, however, that will not work.
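
As a rough illustration of the live-field check, here is a Python sketch that uses only the standard library. It is a byte-level heuristic, not a real PDF parser: it looks for the `/FT` field-type key (`/Tx`, `/Btn`, `/Ch` or `/Sig`) that interactive field dictionaries carry, both in the raw file and inside any stream that inflates with zlib, because newer PDFs may pack dictionaries into compressed object streams. The function names are made up for this example, and encrypted files or other stream filters will defeat it; a PDF library (for instance pypdf's `PdfReader.get_fields()`) would be more reliable.

```python
import re
import zlib
from typing import Iterator

# Interactive field dictionaries name their type with /FT: /Tx, /Btn, /Ch or /Sig.
FIELD_TYPE = re.compile(rb"/FT\s*/(?:Tx|Btn|Ch|Sig)\b")
# Bytes between a "stream" keyword (not the tail of "endstream") and "endstream".
STREAM_BODY = re.compile(rb"(?<!end)stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def _searchable_chunks(pdf: bytes) -> Iterator[bytes]:
    """Yield the raw file, then every stream body that inflates with zlib."""
    yield pdf
    for match in STREAM_BODY.finditer(pdf):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. JPEG image data), so skip it


def has_live_form_fields(pdf: bytes) -> bool:
    """Heuristic: True if any interactive form field dictionary is found.

    Flattened forms no longer contain field dictionaries, so they
    return False here, just like scanned images.
    """
    return any(FIELD_TYPE.search(chunk) for chunk in _searchable_chunks(pdf))
```

To check a file on disk, read it in binary mode and pass the bytes in, e.g. `with open(path, "rb") as f: is_form = has_live_form_fields(f.read())`.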

Another option is to create all of your form files with a key metadata field
as an indicator (e.g., a document info property FormFields with a value of
Yes). PDF files from any other source wouldn't include that unique field and
value.
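
A matching check for the metadata marker could look like the following Python sketch (standard library only). It assumes the property is stored exactly as the literal string `/FormFields (Yes)`; a tool that writes the value as a hex or UTF-16 string would need a broader pattern. `has_form_marker` is a hypothetical name, and since a raw scan also sees earlier revisions of an incrementally updated file, a value that was later changed may still match.

```python
import re
import zlib

# Assumed encoding of the custom Info entry: a literal string, /FormFields (Yes)
MARKER = re.compile(rb"/FormFields\s*\(Yes\)")
# Bytes between a "stream" keyword (not the tail of "endstream") and "endstream".
STREAM_BODY = re.compile(rb"(?<!end)stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def has_form_marker(pdf: bytes) -> bool:
    """True if the hypothetical /FormFields (Yes) Info entry is found."""
    if MARKER.search(pdf):
        return True
    # The Info dictionary may sit in a compressed object stream (PDF 1.5+).
    for match in STREAM_BODY.finditer(pdf):
        try:
            if MARKER.search(zlib.decompressobj().decompress(match.group(1))):
                return True
        except zlib.error:
            continue  # not Flate-encoded, so skip it
    return False
```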