Sunday, April 17, 2016

PDF text layout made easy with PDFBox-Layout

More than a decade ago I was using iText to create PDF documents from scratch. It was quite easy to use, and did all the stuff I needed, like organizing text in paragraphs, performing word wrapping and marking up text with bold and italic. But at some point Bruno Lowagie - the developer of iText - switched from open source to a proprietary license, for reasons I do understand.

So when I now had to do some PDF processing for a new project, I was looking for an alternative. PDFBox is definitely the best open source choice, since it is quite mature. But when I searched for how to do layout, I found a lot of people looking for exactly those features, and the common answer was: you have to do it on your own! Say what? Ouch. There must be someone out there who already wrote that stuff... Sure there is, but Google did not find him. So I started to write some simple word wrapping. And some simple pagination. And some simple markup for easy highlighting with bold and italic. Don't get me wrong: the stuff I wrote is neither sophisticated nor complete. It is dead simple, and does the things I need. But just in case someone out there may find it useful, I made it public under the MIT license on GitHub.


PDFBox-Layout

PDFBox-Layout acts as a layer on top of PDFBox that performs some basic layout operations for you:
  • word wrapping
  • text alignment
  • paragraphs
  • pagination
The API actually has two parts: the (low-level) text layout API, and the document layout API.

The Text Layout API

The text layout API is intended for direct use with the low-level PDFBox API. You may organize text into blocks, do word wrapping and alignment, and highlight text with markup. This means that most features described in the remainder of this article may be used directly with PDFBox, without the document layout API. For more details on this API see the Text API Wiki page. What the document layout API gives you on top of that is paragraph layout and pagination.

The Document Layout API

The ingredients of the document layout API are documents, paragraphs and layouts. It is meant for easily creating complete PDF documents from scratch, and performs things like word wrapping, paragraph layout and pagination for you.
Let's start with a simple example:

Document document = new Document(Constants.A4, 40, 60, 40, 60);

Paragraph paragraph = new Paragraph();
paragraph.addText("Hello Document", 20, PDType1Font.HELVETICA);
document.add(paragraph);

final OutputStream outputStream = 
    new FileOutputStream("hellodoc.pdf");
document.save(outputStream);

We start by creating a Document, which acts as a container for elements such as paragraphs. You specify the media box (A4 in this case) and the left, right, top and bottom margins of the document. The margins are applied to each page. After that we create a paragraph, which is a container for text fragments. We add the text "Hello Document" with the font HELVETICA and size 20 to the paragraph. That's it, let's save it to a file. The result looks like this:

hello

Word Wrapping

As already said, you can also perform word wrapping with PDFBox-Layout. Just use the method setMaxWidth() to set a maximum width, and the text container will do its best to not exceed the maximum width by word wrapping the text:

Paragraph paragraph = new Paragraph();
paragraph.addText(
    "This is some slightly longer text wrapped to a width of 100.", 
    11, PDType1Font.HELVETICA);
paragraph.setMaxWidth(100);
document.add(paragraph);

wrapped1

If you do not specify an explicit max width, the document's media box and the margins dictate the max width for a paragraph. This means you may just write text, text and more text without the need for any line breaks, and the layout will do the word wrapping in order to fit the paragraph into the page boundaries.
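To make the idea of word wrapping more concrete, here is a minimal sketch of a greedy wrapping algorithm in plain Java. It is not the code PDFBox-Layout uses; the class name WordWrapSketch is made up for illustration, and the width is counted in characters, whereas a real PDF layout would measure each word with the font's metrics.

```java
import java.util.ArrayList;
import java.util.List;

class WordWrapSketch {

    // Greedy wrap: keep appending words to the current line until the next
    // word would exceed maxWidth, then start a new line.
    // ASSUMPTION: width is measured in characters, not in font units.
    static List<String> wrap(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            // Flush the current line if the word (plus a space) does not fit.
            if (current.length() > 0
                    && current.length() + 1 + word.length() > maxWidth) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            // A word longer than maxWidth simply gets a line of its own.
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "This is some slightly longer text wrapped to a width of 20.";
        for (String line : wrap(text, 20)) {
            System.out.println(line);
        }
    }
}
```

The sketch never breaks inside a word, so a single over-long word may still exceed the maximum width, which matches the "will do its best" wording above.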

Text-Alignment

You can also specify a text alignment on the paragraph:

Paragraph paragraph = new Paragraph();
paragraph.addText(
    "This is some slightly longer text wrapped to a width of 100.", 
    11, PDType1Font.HELVETICA);
paragraph.setMaxWidth(100);
paragraph.setAlignment(Alignment.Right);
document.add(paragraph);

wrapped-right

The alignment tells the draw method what to do with extra horizontal space, where the extra space is the difference between the width of the text container and the width of the line. This means that the alignment only has an effect when there are multiple lines. Currently, Left, Center and Right alignment are supported.
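The "extra horizontal space" rule can be written down in a few lines. The following is only a sketch of the arithmetic, not the library's draw method; the class AlignmentSketch and its local Alignment enum are stand-ins invented for this example.

```java
class AlignmentSketch {

    // Local stand-in for the three supported alignments.
    enum Alignment { Left, Center, Right }

    // Returns the horizontal offset at which a line starts inside its
    // container, given the container width and the width of the line.
    static float offset(Alignment alignment, float containerWidth, float lineWidth) {
        // Extra space is what remains of the container after the line.
        float extra = Math.max(0f, containerWidth - lineWidth);
        switch (alignment) {
            case Right:
                return extra;      // push the line to the right edge
            case Center:
                return extra / 2f; // split the extra space evenly
            default:
                return 0f;         // Left: start at the container's left edge
        }
    }

    public static void main(String[] args) {
        System.out.println(offset(Alignment.Right, 100f, 80f));
        System.out.println(offset(Alignment.Center, 100f, 80f));
    }
}
```

A line that is exactly as wide as its container has no extra space, so all three alignments give the same offset of zero.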

Layout

The paragraphs in a document are sized and positioned using a layout strategy. By default, paragraphs are stacked vertically by the VerticalLayout. If a paragraph’s width is smaller than the page width, you can specify an alignment with a layout hint:

document.add(paragraph, 
    new VerticalLayoutHint(Alignment.Left, 10, 10, 20, 0));

You can combine text and paragraph alignment any way you want:

aligned

An alternative to the vertical layout is the Column-Layout, which allows you to arrange the paragraphs in multiple columns on a page.

Document document = 
    new Document(Constants.A4, 40, 60, 40, 60);
 
Paragraph title = new Paragraph();
title.addMarkup("*This Text is organized in Columns*", 
    20, BaseFont.Times);
document.add(title, VerticalLayoutHint.CENTER);
document.add(new VerticalSpacer(5));

// use column layout from now on
document.add(new ColumnLayout(2, 10));

Paragraph paragraph1 = new Paragraph();
paragraph1.addMarkup(text1, 11, BaseFont.Times);
document.add(paragraph1);
...

column
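As a rough illustration of what a column layout has to compute, the sketch below derives the column width and the left edge of each column from the page width, the horizontal margins, the number of columns and the spacing between them. It assumes that the two arguments of ColumnLayout(2, 10) are the column count and the column spacing; the class ColumnSketch and its methods are hypothetical and not part of the library.

```java
class ColumnSketch {

    // Width of a single column after removing margins and inter-column spacing.
    static float columnWidth(float pageWidth, float marginLeft, float marginRight,
                             int columns, float spacing) {
        float usable = pageWidth - marginLeft - marginRight;
        return (usable - spacing * (columns - 1)) / columns;
    }

    // x coordinate of the left edge of each column, measured from the page's left edge.
    static float[] columnOffsets(float pageWidth, float marginLeft, float marginRight,
                                 int columns, float spacing) {
        float width = columnWidth(pageWidth, marginLeft, marginRight, columns, spacing);
        float[] offsets = new float[columns];
        for (int i = 0; i < columns; i++) {
            offsets[i] = marginLeft + i * (width + spacing);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Simple numbers chosen for readability, not the real A4 width in points.
        float[] offsets = columnOffsets(210f, 40f, 60f, 2, 10f);
        System.out.println(columnWidth(210f, 40f, 60f, 2, 10f));
        System.out.println(offsets[0] + ", " + offsets[1]);
    }
}
```

Each paragraph added after the column layout would then be wrapped to the column width and placed at the offset of the current column.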

You may also set an absolute position on an element. If this is set, the layout will ignore this element, and render it directly at the given position:

Paragraph footer = new Paragraph();
footer.addMarkup("This is some example footer", 6, BaseFont.Times);
footer.setAbsolutePosition(new Position(20, 20));
document.add(footer);

Pagination

As you add more and more paragraphs to the document, the layout automatically creates a new page whenever the content does not fit completely on the current page. Elements use different strategies for splitting across multiple pages. Text is simply split by lines. Images may decide to either split or, if they fit completely on the next page, introduce a vertical spacer so that they are drawn on the next page. In any case, you can always insert a NEW_PAGE element to trigger a new page.
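The "text is simply split by lines" strategy can be sketched as follows. This is an illustration of the idea only, under the simplifying assumption that every line has the same height; the class PaginationSketch is invented for this example and does not reflect the library's internals.

```java
import java.util.ArrayList;
import java.util.List;

class PaginationSketch {

    // Distributes already-wrapped lines over pages. Each page takes as many
    // lines as fit into its usable height; the rest moves to the next page.
    // ASSUMPTION: all lines share one fixed line height.
    static List<List<String>> paginate(List<String> lines, float pageHeight, float lineHeight) {
        // Always place at least one line per page to guarantee progress.
        int linesPerPage = Math.max(1, (int) Math.floor(pageHeight / lineHeight));
        List<List<String>> pages = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerPage) {
            int end = Math.min(lines.size(), start + linesPerPage);
            pages.add(new ArrayList<>(lines.subList(start, end)));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            lines.add("line " + i);
        }
        List<List<String>> pages = paginate(lines, 30f, 12f);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("page " + (i + 1) + ": " + pages.get(i));
        }
    }
}
```

An element that should not be split, such as an image, would instead be moved to the next page as a whole when it does not fit in the remaining space.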

Markup

Often you want to use just some basic text styling: use a bold font here, some words emphasized with italic there, and that's it. Let's say we want to use different font types for the following sentence:

"Markup supports bold, italic, and even mixed markup."

If you want to do that using the standard API, it would look like this:

Paragraph paragraph = new Paragraph();
paragraph.addText("Markup supports ", 11, PDType1Font.HELVETICA);
paragraph.addText("bold", 11, PDType1Font.HELVETICA_BOLD);
paragraph.addText(", ", 11, PDType1Font.HELVETICA);
paragraph.addText("italic", 11, PDType1Font.HELVETICA_OBLIQUE);
paragraph.addText(", and ", 11, PDType1Font.HELVETICA);
paragraph.addText("even ", 11, PDType1Font.HELVETICA_BOLD);
paragraph.addText("mixed", 11, PDType1Font.HELVETICA_BOLD_OBLIQUE);
paragraph.addText(" markup", 11, PDType1Font.HELVETICA_OBLIQUE);
paragraph.addText(".\n", 11, PDType1Font.HELVETICA);
document.add(paragraph);

That's annoying, isn't it? That's what the markup API is intended for. Use * to mark bold content, and _ for italic. Let's do the same example with markup:

Paragraph paragraph = new Paragraph();
paragraph.addMarkup(
    "Markup supports *bold*, _italic_, and *even _mixed* markup_.\n", 
    11, 
    PDType1Font.HELVETICA, 
    PDType1Font.HELVETICA_BOLD,
    PDType1Font.HELVETICA_OBLIQUE,
    PDType1Font.HELVETICA_BOLD_OBLIQUE);
document.add(paragraph);

To make things even easier, you may specify only the font family instead:

paragraph = new Paragraph();
paragraph.addMarkup(
    "Markup supports *bold*, _italic_, and *even _mixed* markup_.\n",
    11, BaseFont.Helvetica);

markup
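To show how such markup maps onto the fragment-by-fragment version of the example, here is a small stand-alone sketch of a toggle-based parser: every * flips the bold flag and every _ flips the italic flag. It is an assumption about how markup of this kind can be processed, not the library's actual parser, and it ignores details such as escaping; MarkupSketch and Fragment are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class MarkupSketch {

    // A run of text together with the style flags that apply to it.
    static final class Fragment {
        final String text;
        final boolean bold;
        final boolean italic;

        Fragment(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            return (bold ? "B" : "-") + (italic ? "I" : "-") + ":" + text;
        }
    }

    // Splits markup into fragments; '*' toggles bold, '_' toggles italic.
    // ASSUMPTION: no escape sequences, every marker character is a toggle.
    static List<Fragment> parse(String markup) {
        List<Fragment> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        for (char c : markup.toCharArray()) {
            if (c == '*' || c == '_') {
                // Close the running fragment before the style changes.
                if (current.length() > 0) {
                    fragments.add(new Fragment(current.toString(), bold, italic));
                    current.setLength(0);
                }
                if (c == '*') {
                    bold = !bold;
                } else {
                    italic = !italic;
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            fragments.add(new Fragment(current.toString(), bold, italic));
        }
        return fragments;
    }

    public static void main(String[] args) {
        String markup = "Markup supports *bold*, _italic_, and *even _mixed* markup_.\n";
        for (Fragment fragment : parse(markup)) {
            System.out.print(fragment);
            System.out.println();
        }
    }
}
```

For the sentence used in this section, the sketch produces the same nine fragments as the hand-written addText() version, each of which would then be rendered with the plain, bold, italic or bold-italic font.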


That’s it

This was a short overview of what PDFBox-Layout can do for you. Have a look at the Wiki and the examples for more information and some visual impressions.

20 comments:

  1. Hi, first of all thanks for the library, it helps a lot. But I have a problem: how do I change the BaseFont to a font I provide?

    Replies
    1. Hi, it's very nice to hear that it is actually helpful :-)

      Concerning your question: I guess you are targeting the Paragraph.addMarkup() method, is that correct? If so, there is an additional signature that allows you to specify every single (PD)Font directly; the signature with the font family is just a short cut (see the last two examples in the article).

      Or do you want to define your own BaseFonts or font families? If that is what you want, I would ask you to open an issue and describe your needs.

      Best regards
      Ralf

  2. Hi, normally I do something like this to load a new font:

    File file = new File("Verdana.ttf");
    PDFont font = PDTrueTypeFont.load(document, file, true);

    but this is not allowed with the library because of this declaration:

    Document document = new Document(Constants.A4, hMargin, hMargin,
    vMargin, vMargin);

    Loading a TrueType font needs a PDDocument, so my question is how to load a .ttf font file and use it in paragraph.addMarkup().

    Thank you very much in advance

    Replies
    1. Hi,
      I just added a method Document.getPDDocument() that makes the PDDocument available before rendering. I guess this is necessary for any task where you need the PDDocument before rendering. Using that I was able to load and use any TTF.

      I will make a release in the next few days; I'm still working on indentation and lists, which was much more work than I thought ;-)

      Hope this helps
      Ralf

    2. Release 0.6.0 has just been published.

    3. Wow, thanks a lot, works like a charm. My new question is how to make a page landscape; normally this is something easy like this:

      doc = new PDDocument();
      PDFont font = PDType1Font.HELVETICA;
      PDPage page = new PDPage();
      page.setMediaBox(PDPage.PAGE_SIZE_A4);
      page.setRotation(90);
      doc.addPage(page);

      I tried to do the same thing with this:

      float hMargin = 30;
      float vMargin = 30;
      Document document = new Document(PDRectangle.A6, hMargin, hMargin,
      vMargin, vMargin);
      document.getPDDocument().getPage(0).setRotation(90);

      but I got an index out of bounds exception, because at this moment the document doesn't have any pages yet. I tried to move the call

      document.getPDDocument().getPage(0).setRotation(90);

      elsewhere, but the result is the same, except in the render listener: when I add the rotation there, it creates the page in landscape, but with all the content rotated, like in this file:

      https://drive.google.com/file/d/0B49ypOKAaMl6VGJoX3U2Wmdlbjg/view?usp=sharing

      How do I make a page landscape?

      Thanks in advance

    4. Hi,
      I've opened an issue on that.

      In the meantime you may use the following workaround. As far as I know, you can either use rotation or a different media box to render landscape; according to the PDF docs this is equivalent. Given that, you can do the following:

      final PDRectangle A4_LANDSCAPE =
          new PDRectangle(Constants.A4.getHeight(), Constants.A4.getWidth());

      document.addRenderListener(new RenderListener() {

          @Override
          public void beforePage(RenderContext renderContext) throws IOException {
              PDPage page = renderContext.getPage();
              page.setMediaBox(A4_LANDSCAPE);
              renderContext.resetPositionToUpperLeft();
          }

          ...
      });

      Note the call to resetPositionToUpperLeft(); it is (currently) necessary to make this work.

      Regards
      Ralf

    5. Hi, works wonderfully, your work is awesome, thanks a lot.

  3. First of all I want to thank you for your work. I use the library for all my PDF needs and it's wonderful.

    I came across a problem that I don't think is caused by the library, but I think you're the best person to ask.
    When I use an overlay on a landscape page, the background PDF that I use gets rotated by 90 degrees.
    Even if I rotate the original PDF of the background, the effect is the same.

    Here is an example: https://drive.google.com/file/d/0B49ypOKAaMl6ckJEOEJSVDNhLXc/view?usp=sharing

    Do you know a way to fix this?

  4. Here is my class: https://raw.githubusercontent.com/corrortiz/ConCons/master/src/com/aohys/copiaIMSS/Utilidades/Reportes/MedicamentosPDF.java

    Replies
    1. Well, if you create a page format using PageFormat.with().A5().landscape(), this actually means: create an A5 portrait page and rotate it by 90 degrees. That's perfectly fine according to the PDF standard, but you can also create a format using the A5 landscape dimensions directly, without rotation. I guess that is the case for the PDF you are using for the overlay... and it gets rotated by the first PDF's rotation matrix.

      Try creating your PDF using the landscape dimensions directly without rotation:
      PageFormat.with().mediaBox(new PDRectangle( Constants.A5.getHeight(), Constants.A5.getWidth()))

      Does that help?

      Regards
      Ralf

    2. If this does not help, would you mind passing me your overlay PDF in order to reproduce the problem?

    3. Your code solved the issue; here is how I implemented it:

      PageFormat a5_landscape =
      PageFormat.with().mediaBox(
      new PDRectangle( Constants.A5.getHeight(),Constants.A5.getWidth()))
      .orientation(Orientation.Landscape)
      .margins(hMargin, hMargin, 100f, 130f)
      .build();

      Document document = new Document(a5_landscape);

      Putting code here hurts the eye °w°. Thanks for everything, and if I could help in some way I would love to.

  5. PDFClown provides that kind of features, but hasn't seen any commit for more than a year. I'm so glad I found your library! Thank you very much for your work!

  6. Thanks for your work on this library. Is there any way to make paragraphs justified?

  7. Hi,

    Thank you for your library, it is very useful.
    Do you have an example showing how to draw a line?

    thanks in advance!

  8. Hi,

    Does Markup support HTML tags?

    thanks!

  9. Dear Ralf,

    first of all, I want to thank you for this great job. We are currently using the pdfbox2-layout.

    Regarding your pagination mechanism, it is mentioned that text is simply split by lines. Fair enough. In practice, we add text or markup to a paragraph, and the paragraph is then added to the document. Is there a way to indicate that a paragraph must not be divided across multiple pages and should be treated as a block?

    Thank you very much.

    Best

  10. Thank you so much for this! It's really awesome and helps to create PDFs quickly. Love your work :)

  11. Thank you so much for this lovely wrapper on top of PDFBox. I am looking for some help with a PDF that I am generating. It's a combination of text and some form fields. How can I generate that using PDFBox-Layout?

    I am trying to get the instance of PDDocument via document.getPDDocument() and then add a PDAcroForm; however, it does not seem to work. Is there a sample that I can refer to? Please suggest, thanks for your amazing work. Much appreciated.
