Making PDFs Portable: Integrating PDF and Java Technology

Encryption/Decryption
A popular PDF feature allows for encrypting document contents and setting access controls limiting who can view the unencrypted document. Specifically, a PDF document is encrypted with a master password and optionally a user password. If a user password has been provided, then a PDF reader such as Acrobat will prompt for a password before letting the document be viewed. The master password is required to change document permissions.

The PDF specification lets creators of PDF documents restrict certain operations when viewing the PDF in Acrobat. Some of the available document restrictions are:

  • Printing
  • Changing content
  • Extracting text

A full explanation of PDF document security lies outside the scope of this article; interested developers should consult the relevant sections of the PDF specification and evaluate its capabilities. The security model used in PDF documents is pluggable and lets different security handlers be employed when encrypting documents. As of this writing, PDFBox supports the "Standard" security handler, which is what most PDF documents use.
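
The restrictions listed above are stored by the Standard security handler as bits in a single permissions integer. The following is a minimal sketch of that idea, assuming the bit positions given in the PDF Reference (bit 3 for printing, bit 4 for changing content, bit 5 for extracting text, counting from 1 at the low-order end). The class and method names are hypothetical and are not part of PDFBox.

```java
// Sketch only: models the permission bits of the Standard security handler.
// PdfPermissionFlags and its methods are hypothetical, not a PDFBox API.
final class PdfPermissionFlags {

    // The PDF Reference numbers bits from 1 (low-order), so bit 3 is 1 << 2.
    static final int PRINT = 1 << 2;    // bit 3: print the document
    static final int MODIFY = 1 << 3;   // bit 4: change content
    static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text

    // Returns the flags with the given permission cleared.
    static int deny(int flags, int permission) {
        return flags & ~permission;
    }

    // Reports whether the given permission bit is set.
    static boolean allows(int flags, int permission) {
        return (flags & permission) != 0;
    }

    public static void main(String[] args) {
        int flags = PRINT | MODIFY | EXTRACT;
        flags = deny(flags, PRINT);
        System.out.println("print allowed: " + allows(flags, PRINT));     // false
        System.out.println("extract allowed: " + allows(flags, EXTRACT)); // true
    }
}
```

A conforming viewer is expected to honor these bits once the document has been opened; the flags themselves do not prevent a non-conforming tool from ignoring them.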

To encrypt a document, first assign it a security handler, then encrypt it with a master password and, optionally, a user password. For example, the following code encrypts a document so that a user can open it in Acrobat without entering a password (i.e., no user password is set), while the access controls prevent the document from being printed.


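import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.encryption.PDStandardEncryption;

// load the document
PDDocument pdf = PDDocument.load( "test.pdf" );
// create the encryption options and disallow printing
PDStandardEncryption encryptionOptions = new PDStandardEncryption();
encryptionOptions.setCanPrint( false );
pdf.setEncryptionDictionary( encryptionOptions );
// encrypt with a master password and no user password
pdf.encrypt( "master", null );
// save the encrypted document to the file system
pdf.save( "test-output.pdf" );
// release the resources held by the document
pdf.close();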

For a more complete example, reference the source code for the encryption utility included in the PDFBox distribution: org.pdfbox.Encrypt.

Many applications can generate PDF documents but don't allow control over the document's security options. PDFBox can be used here to intercept and encrypt the PDF before it's sent to the user.

Form Integration
When an application's output is a series of form field values, it is usually desirable to let the user save the form for record keeping. PDF technology is a great choice for this kind of output. A developer can write code to output PDF instructions manually to draw images, tables and text, or encapsulate the data in XML and use an XSL-FO engine to create a PDF document. However, these approaches can be time-consuming, error-prone and inflexible. A better approach for simple forms might be to create a template and, for any given set of input data, generate a filled-in document based on that template.

A form many of us may be familiar with is the Employment Eligibility Verification, or I-9 form: http://uscis.gov/graphics/formsfee/forms/files/i-9.pdf

Using one of the example applications distributed with PDFBox, the form field names can be listed:


java org.pdfbox.examples.fdf.PrintFields i-9.pdf

Another example utility populates a given field with textual data:


java org.pdfbox.examples.fdf.SetField i-9.pdf NAME1 Smith

Opening the PDF document in Acrobat shows that the "Last Name" field has been filled in. This functionality can be recreated in code:


// load the form and look up its AcroForm
PDDocument pdf = PDDocument.load( "i-9.pdf" );
PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
// set the "Last Name" field and save a copy
PDField field = acroForm.getField( "NAME1" );
field.setValue( "Smith" );
pdf.save( "i-9-copy.pdf" );

It's also possible to extract the values of a form field that has been previously populated, as below:


// NAME1 is the "Last Name" field of the I-9 form
PDField field = acroForm.getField( "NAME1" );
System.out.println( "Last Name=" + field.getValue() );

Acrobat offers the option of exporting and importing form data in a special file format called "Forms Data Format." These files come in two flavors, FDF and XFDF. FDF stores the form data in the same syntax as PDF, while XFDF stores the data as XML. PDFBox handles both FDF and XFDF data with a single object: FDFDocument. The following snippet shows how to export FDF data for the I-9 form above:


// load the form and look up its AcroForm
PDDocument pdf = PDDocument.load( "i-9.pdf" );
PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
// export the current field values and save them as FDF
FDFDocument fdf = acroForm.exportFDF();
fdf.save( "exportedData.fdf" );
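
To make the XFDF flavor more concrete, here is a minimal sketch that writes field values in the XML layout XFDF uses (an xfdf root element containing fields, each field holding a value). It uses only the standard Java library and does not go through PDFBox. The XfdfSketch class is hypothetical, and the sketch assumes flat field names with no hierarchy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: XfdfSketch is a hypothetical helper, not a PDFBox class.
final class XfdfSketch {

    // Escapes the characters that are not allowed verbatim in XML text or attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds an XFDF document with one <field> element per map entry.
    static String toXfdf(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<xfdf xmlns=\"http://ns.adobe.com/xfdf/\" xml:space=\"preserve\">\n");
        xml.append("  <fields>\n");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            xml.append("    <field name=\"").append(escape(entry.getKey())).append("\">")
               .append("<value>").append(escape(entry.getValue())).append("</value>")
               .append("</field>\n");
        }
        xml.append("  </fields>\n");
        xml.append("</xfdf>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("NAME1", "Smith"); // field name taken from the I-9 example
        System.out.print(toXfdf(fields));
    }
}
```

In a real application the export should go through FDFDocument as shown above; the sketch is only meant to illustrate what the XML flavor of the data looks like.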

PDFBox Form Integration Steps

  1. Create PDF Form Template using Acrobat or other visual tool
  2. Track the name of each desired form field
  3. Store the template PDF where the application can access it
  4. When the PDF is requested, use PDFBox to parse the template PDF
  5. Populate the required form fields
  6. Stream the PDF back to the user
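
Steps 2 and 5 above depend on the application and the template agreeing on field names. As a minimal sketch of one way to catch mismatches before populating the form, the following code compares submitted data against the tracked field names. It uses only the standard Java library; TemplateFieldCheck is a hypothetical helper, and apart from NAME1 the field names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: TemplateFieldCheck is a hypothetical helper, not part of PDFBox.
final class TemplateFieldCheck {

    // Names that were submitted but that the template does not define.
    static Set<String> unknownFields(Set<String> templateFields, Map<String, String> submitted) {
        Set<String> unknown = new TreeSet<String>(submitted.keySet());
        unknown.removeAll(templateFields);
        return unknown;
    }

    // Template fields for which no non-empty value was submitted.
    static Set<String> missingFields(Set<String> templateFields, Map<String, String> submitted) {
        Set<String> missing = new TreeSet<String>();
        for (String name : templateFields) {
            String value = submitted.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // NAME1 comes from the I-9 example; NAME2 and NAME9 are made-up names.
        Set<String> templateFields = new TreeSet<String>();
        templateFields.add("NAME1");
        templateFields.add("NAME2");

        Map<String, String> submitted = new LinkedHashMap<String, String>();
        submitted.put("NAME1", "Smith");
        submitted.put("NAME9", "value for a misspelled field name");

        System.out.println("unknown: " + unknownFields(templateFields, submitted)); // [NAME9]
        System.out.println("missing: " + missingFields(templateFields, submitted)); // [NAME2]
    }
}
```

The set of template field names could be collected once with the PrintFields utility shown earlier and stored alongside the template.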

Utilities
Besides the library APIs mentioned above, PDFBox also has a set of command-line utilities. Table 2 lists the class name of each utility along with a short description.

Remarks
The PDF specification weighs in at 1,172 pages, so implementing it is quite an undertaking. As such, PDFBox is distributed with the proviso that it is a work in progress, with new features being added over time. Its main weakness is in creating PDF documents from scratch. However, several other open source Java projects can be used to fill the gap. For instance, the Apache FOP project lets programmers generate a PDF from an XSL-FO document, a specialized XML format that describes the layout of the PDF. Also, iText provides a high-level API for creating document elements such as tables and lists.

The next version of PDFBox will add support for the object streams and cross-reference streams introduced in PDF 1.5. Support for embedding fonts and images will follow. Hopefully, through efforts like PDFBox, robust support for PDF technology can be made available to Java applications.

References

  • PDFBox: www.pdfbox.org/
  • Apache FOP: http://xml.apache.org/fop/
  • iText: www.lowagie.com/iText/
  • PDF Reference: http://partners.adobe.com/asn/tech/pdf/specifications.jsp
  • Jakarta Lucene: http://jakarta.apache.org/lucene/

Ben Litchfield is a business systems consultant within the development & integration practice at LPA Systems. He has been the lead developer of PDFBox for the past two years. Ben holds a BS in Software Engineering from the Rochester Institute of Technology. He has been providing solutions for enterprise applications for the past five years.
