PDF text layout made easy with PDFBox-Layout
More than a decade ago I was using iText to create PDF documents from scratch. It was quite easy to use, and did all the stuff I needed, like organizing text in paragraphs, performing word wrapping and marking up text with bold and italic. But at some point Bruno Lowagie, the developer of iText, switched it from the LGPL/MPL to the AGPL (with a commercial license as the alternative), for reasons I do understand.
So when I recently had to do some PDF processing for a new project, I was looking for an alternative. PDFBox is definitely the best open source choice, since it is quite mature. But when I searched for how to do layout, I found a lot of people looking for exactly those features, and the common answer was: you have to do it on your own! Say what? Ouch. There must be someone out there who has already written that stuff... Sure there is, but Google did not find him. So I started to write some simple word wrapping. And some simple pagination. And some simple markup for easy highlighting with bold and italic. Don't get me wrong: the stuff I wrote is neither sophisticated nor complete. It is drop-dead simple, and does the things I need. But just in case someone out there finds it useful, I made it public under the MIT license on GitHub.
PDFBox-Layout
PDFBox-Layout acts as a layer on top of PDFBox that performs some basic layout operations for you:
- word wrapping
- text alignment
- paragraphs
- pagination
The Text Layout API
The text layout API is intended for direct use with the low-level PDFBox API. You may organize text into blocks, do word wrapping and alignment, and highlight text with markup. That means most features described in the remainder of this article can be used directly with PDFBox, without the document layout API. For more details on this API see the Text API Wiki page. What the document layout API gives you on top of that is paragraph layout and pagination.
The Document Layout API
The ingredients of the document layout API are documents, paragraphs and layouts. It is intended to make it easy to create complete PDF documents from scratch, and it takes care of word wrapping, paragraph layout and pagination for you.
Let's start with a simple example:
Document document = new Document(Constants.A4, 40, 60, 40, 60);
Paragraph paragraph = new Paragraph();
paragraph.addText("Hello Document", 20, PDType1Font.HELVETICA);
document.add(paragraph);
// try-with-resources closes the file once the PDF has been written
try (OutputStream outputStream = new FileOutputStream("hellodoc.pdf")) {
    document.save(outputStream);
}
We start by creating a Document, which acts as a container for elements such as paragraphs. You specify the media box (A4 in this case) and the left, right, top and bottom margins of the document; the margins are applied to each page. After that we create a paragraph, which is a container for text fragments, and add the text "Hello Document" with the font HELVETICA and size 20 to it. That's it, now let's save it to a file.
Word Wrapping
As already said, you can also perform word wrapping with PDFBox-Layout. Just use the method setMaxWidth() to set a maximum width, and the text container will do its best not to exceed it by wrapping the text:
Paragraph paragraph = new Paragraph();
paragraph.addText(
        "This is some slightly longer text wrapped to a width of 100.",
        11, PDType1Font.HELVETICA);
paragraph.setMaxWidth(100);
document.add(paragraph);
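To give an idea of what "doing its best" means, here is a minimal greedy word-wrapping sketch in plain Java. It is not PDFBox-Layout's actual implementation. The width function is a parameter, standing in for real font metrics, and a word that is wider than the maximum simply gets a line of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class GreedyWrap {
    private GreedyWrap() {}

    /**
     * Breaks text into lines no wider than maxWidth, measuring candidate
     * lines with the supplied width function. A single word that is wider
     * than maxWidth is placed on a line of its own rather than being split.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty input
            }
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (line.length() > 0 && width.applyAsDouble(candidate) > maxWidth) {
                lines.add(line.toString()); // current line is full: emit it
                line.setLength(0);
                line.append(word);          // the word starts the next line
            } else {
                line.setLength(0);
                line.append(candidate);     // the word still fits
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With a toy width of 5 units per character and a maximum width of 100, the sentence from the example wraps into "This is some", "slightly longer text", "wrapped to a width" and "of 100.". With real fonts, PDFBox reports string widths in thousandths of the font size, so the width function would be something like fontSize * font.getStringWidth(s) / 1000.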
If you do not specify an explicit max width, the document's media box and margins dictate the max width of a paragraph. That means you can just write text, more text and even more text without any line breaks, and the layout will do the word wrapping needed to fit the paragraph into the page boundaries.
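As a quick sanity check of that default, the available width is simply the media box width minus the left and right margins. The sketch below (plain Java, hypothetical names, not library code) does the arithmetic for A4, which is 210 by 297 mm, or about 595.3 by 841.9 points at 72 points per inch. With the margins from the first example (40 left, 60 right, 40 top, 60 bottom), that leaves about 495.3 points of line width and 741.9 points of height per page.

```java
final class ContentArea {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    private ContentArea() {}

    /** Converts millimetres to PDF points (1/72 inch). */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Width available to a paragraph when no explicit max width is set. */
    static double maxWidth(double pageWidth, double marginLeft, double marginRight) {
        return pageWidth - marginLeft - marginRight;
    }

    /** Height available for content on each page. */
    static double maxHeight(double pageHeight, double marginTop, double marginBottom) {
        return pageHeight - marginTop - marginBottom;
    }
}
```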
Text-Alignment
As you might have already seen, you can specify a text alignment on the paragraph:
Paragraph paragraph = new Paragraph();
paragraph.addText(
        "This is some slightly longer text wrapped to a width of 100.",
        11, PDType1Font.HELVETICA);
paragraph.setMaxWidth(100);
paragraph.setAlignment(Alignment.Right);
document.add(paragraph);
The alignment tells the draw method what to do with extra horizontal space, where the extra space is the difference between the width of the text container and the width of a line. This means that the alignment only has an effect if there are multiple lines. Currently, Left, Center and Right alignment are supported.
Layout
The paragraphs in a document are sized and positioned using a layout strategy. By default, paragraphs are stacked vertically by the VerticalLayout. If a paragraph’s width is smaller than the page width, you can specify an alignment with a layout hint:
document.add(paragraph, new VerticalLayoutHint(Alignment.Left, 10, 10, 20, 0));
You can combine text alignment and paragraph alignment any way you want.
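The two alignments nest: the layout hint places the paragraph inside the page's content area, and the paragraph's own alignment places each line inside the paragraph. The following plain-Java sketch computes the x coordinate of a line from both, under that assumption. It is an illustration, not the library's code, and the Align enum and AlignmentMath class are hypothetical.

```java
enum Align { LEFT, CENTER, RIGHT }

final class AlignmentMath {
    private AlignmentMath() {}

    /** Offset of an item of itemWidth inside a container of containerWidth. */
    static double offset(Align align, double containerWidth, double itemWidth) {
        double extra = Math.max(0, containerWidth - itemWidth); // extra horizontal space
        switch (align) {
            case CENTER:
                return extra / 2;
            case RIGHT:
                return extra;
            default:
                return 0; // LEFT: no shift
        }
    }

    /** x of a line: paragraph placed in the content area, then line placed in the paragraph. */
    static double lineX(double contentX, double contentWidth, Align paragraphAlign,
                        double paragraphWidth, Align textAlign, double lineWidth) {
        return contentX
                + offset(paragraphAlign, contentWidth, paragraphWidth)
                + offset(textAlign, paragraphWidth, lineWidth);
    }
}
```

For example, a 100 pt wide, right-aligned paragraph in a 500 pt content area starting at x = 40, containing a centered 60 pt line, puts that line at 40 + 400 + 20 = 460.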
An alternative to the vertical layout is the Column-Layout, which allows you to arrange the paragraphs in multiple columns on a page.
Document document = new Document(Constants.A4, 40, 60, 40, 60);
Paragraph title = new Paragraph();
title.addMarkup("*This Text is organized in Columns*", 20, BaseFont.Times);
document.add(title, VerticalLayoutHint.CENTER);
document.add(new VerticalSpacer(5));
// use column layout from now on
document.add(new ColumnLayout(2, 10));
Paragraph paragraph1 = new Paragraph();
// text1 holds the paragraph text (not shown)
paragraph1.addMarkup(text1, 11, BaseFont.Times);
document.add(paragraph1);
...
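To see how much room each column gets, here is a small geometry sketch in plain Java. It assumes that the two arguments of ColumnLayout(2, 10) are the number of columns and the spacing between them in points. That is an assumption on our part, so check the library before relying on it. The Columns class is hypothetical.

```java
final class Columns {
    private Columns() {}

    /** Width of each of n equally sized columns separated by spacing. */
    static double columnWidth(double contentWidth, int n, double spacing) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one column");
        }
        return (contentWidth - (n - 1) * spacing) / n;
    }

    /** x coordinate of the left edge of column i (0-based). */
    static double columnX(double contentX, double contentWidth, int n, double spacing, int i) {
        return contentX + i * (columnWidth(contentWidth, n, spacing) + spacing);
    }
}
```

With about 495 points of content width, two columns and a 10 point gap, each column is (495 - 10) / 2 = 242.5 points wide.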
But you may also set an absolute position on an element. If this is set, the layout will ignore this element, and render it directly at the given position:
Paragraph footer = new Paragraph();
footer.addMarkup("This is some example footer", 6, BaseFont.Times);
footer.setAbsolutePosition(new Position(20, 20));
document.add(footer);
Pagination
As you add more and more paragraphs to the document, the layout automatically creates a new page whenever the content does not fit completely on the current page. Elements have different strategies for dividing themselves across multiple pages: text is simply split by lines, while images may either split or, if they fit completely on the next page, introduce a vertical spacer so that they are drawn on the next page. In any case, you can always insert a NEW_PAGE element to trigger a new page.
Markup
Often you just want some basic text styling: a bold font here, some words emphasized with italic there, and that's it. Let's say we want to use different fonts for the following sentence: "Markup supports bold, italic, and even mixed markup."
If you want to do that using the standard API, it would look like this:
Paragraph paragraph = new Paragraph();
paragraph.addText("Markup supports ", 11, PDType1Font.HELVETICA);
paragraph.addText("bold", 11, PDType1Font.HELVETICA_BOLD);
paragraph.addText(", ", 11, PDType1Font.HELVETICA);
paragraph.addText("italic", 11, PDType1Font.HELVETICA_OBLIQUE);
paragraph.addText(", and ", 11, PDType1Font.HELVETICA);
paragraph.addText("even ", 11, PDType1Font.HELVETICA_BOLD);
paragraph.addText("mixed", 11, PDType1Font.HELVETICA_BOLD_OBLIQUE);
paragraph.addText(" markup", 11, PDType1Font.HELVETICA_OBLIQUE);
paragraph.addText(".\n", 11, PDType1Font.HELVETICA);
document.add(paragraph);
That's annoying, isn't it? That's what the markup API is intended for. Use * to mark bold content, and _ for italic. Let's do the same example with markup:
Paragraph paragraph = new Paragraph();
paragraph.addMarkup(
        "Markup supports *bold*, _italic_, and *even _mixed* markup_.\n", 11,
        PDType1Font.HELVETICA, PDType1Font.HELVETICA_BOLD,
        PDType1Font.HELVETICA_OBLIQUE, PDType1Font.HELVETICA_BOLD_OBLIQUE);
document.add(paragraph);
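To make the semantics of * and _ concrete, here is a small parser sketch in plain Java. It turns the markup string into runs of text, each tagged with bold and italic flags. It only models the toggling behaviour visible in the example and is not the library's parser, which may handle escaping or other details differently. The Markup and Run types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Markup {
    private Markup() {}

    /** A piece of text with uniform styling. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            return (bold ? "B" : "") + (italic ? "I" : "") + "[" + text + "]";
        }
    }

    /** Splits markup into runs: '*' toggles bold, '_' toggles italic. */
    static List<Run> parse(String markup) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        for (char c : markup.toCharArray()) {
            if (c == '*' || c == '_') {
                if (current.length() > 0) { // flush the text styled so far
                    runs.add(new Run(current.toString(), bold, italic));
                    current.setLength(0);
                }
                if (c == '*') {
                    bold = !bold;
                } else {
                    italic = !italic;
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            runs.add(new Run(current.toString(), bold, italic));
        }
        return runs;
    }
}
```

For the example sentence this yields nine runs, which correspond one-to-one to the nine addText() calls in the standard-API version. "mixed" comes out as bold and italic, and " markup" as italic only.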
To make things even easier, you may specify only the font family instead:
paragraph = new Paragraph();
paragraph.addMarkup(
        "Markup supports *bold*, _italic_, and *even _mixed* markup_.\n", 11,
        BaseFont.Helvetica);
document.add(paragraph);
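Specifying only the family is enough because, once the markup has been reduced to bold and italic flags, choosing the concrete font is a simple four-way lookup. Here is a minimal sketch that uses the PostScript names of the standard 14 PDF fonts. The FontFamily enum is hypothetical and is not the library's BaseFont.

```java
enum FontFamily {
    HELVETICA("Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique"),
    TIMES("Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic"),
    COURIER("Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique");

    private final String plain;
    private final String bold;
    private final String italic;
    private final String boldItalic;

    FontFamily(String plain, String bold, String italic, String boldItalic) {
        this.plain = plain;
        this.bold = bold;
        this.italic = italic;
        this.boldItalic = boldItalic;
    }

    /** Returns the PostScript name of the family member for the given style flags. */
    String fontFor(boolean isBold, boolean isItalic) {
        if (isBold && isItalic) {
            return boldItalic;
        }
        if (isBold) {
            return bold;
        }
        return isItalic ? italic : plain;
    }
}
```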
Hi, first of all thanks for the library, it helps a lot. But I have a problem: how do I replace the BaseFont with a font I provide?
Hi, it's very nice to hear that it is actually helpful :-)
Concerning your question: I guess you are targeting the Paragraph.addMarkup() method, is that correct? If so, there is an additional signature that allows you to specify every single (PD)Font directly; the signature with the font family is just a shortcut (see the last two examples in the article).
Or do you want to define your own BaseFonts or font families? If this is what you want, I would ask you to open an issue and describe your needs.
Best regards
Ralf
Hi, normally I do something like this to load a new font:
File file = new File("Verdana.ttf");
PDFont font = PDTrueTypeFont.load(document, file, true);
but this is not allowed with the library, because of this declaration:
Document document = new Document(Constants.A4, hMargin, hMargin,
vMargin, vMargin);
To load a TrueType font I need a PDDocument, so my question is: how do I load a .ttf font file and use it in paragraph.addMarkup()?
Thank you very much in advance.
Hi,
I just added a method Document.getPDDocument() that makes the PDDocument available before rendering. I guess this is necessary for any task where you need the PDDocument before rendering. Using that, I was able to load and use any TTF.
I will make a release in the next few days; I'm still working on indentation and lists, which turned out to be much more work than I thought ;-)
Hope this helps
Ralf
Release 0.6.0 has just been published.
Wow, thanks a lot, works like a charm! My new question is how to make a page landscape. Normally this is as easy as this:
doc = new PDDocument();
PDFont font = PDType1Font.HELVETICA;
PDPage page = new PDPage();
page.setMediaBox(PDPage.PAGE_SIZE_A4);
page.setRotation(90);
doc.addPage(page);
I tried to do the same thing with this:
float hMargin = 30;
float vMargin = 30;
Document document = new Document(PDRectangle.A6, hMargin, hMargin,
vMargin, vMargin);
document.getPDDocument().getPage(0).setRotation(90);
but I got an index-out-of-bounds exception, because at that point the document doesn't have any pages yet. I tried to move the
document.getPDDocument().getPage(0).setRotation(90);
elsewhere, but I get the same exception, except in the render listener. When I add the rotation there, it creates the page in landscape, but with all the content rotated, like in this file:
https://drive.google.com/file/d/0B49ypOKAaMl6VGJoX3U2Wmdlbjg/view?usp=sharing
How do I make a page landscape?
Thanks in advance.
Hi,
I've opened an issue on that.
In the meantime you may use the following workaround. As far as I know, you can render landscape either with a rotation or with a different media box; according to the PDF docs, the two are equivalent. Given that, you can do the following:
final PDRectangle A4_LANDSCAPE = new
PDRectangle(Constants.A4.getHeight(), Constants.A4.getWidth());
document.addRenderListener(new RenderListener() {
@Override
public void beforePage(RenderContext renderContext) throws IOException {
PDPage page = renderContext.getPage();
page.setMediaBox(A4_LANDSCAPE);
renderContext.resetPositionToUpperLeft();
}
...
});
Note the call to resetPositionToUpperLeft(): it is (currently) necessary to make this work.
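The workaround boils down to swapping the width and height of the media box instead of rotating the page. Here is the arithmetic in isolation as a plain-Java sketch. PageSize is a hypothetical stand-in for PDRectangle, and A4 is roughly 595.28 by 841.89 points.

```java
final class PageSize {
    final float width;
    final float height;

    PageSize(float width, float height) {
        this.width = width;
        this.height = height;
    }

    /** Same page with the longer side horizontal, without any rotation. */
    PageSize landscape() {
        return width >= height ? this : new PageSize(height, width);
    }

    boolean isLandscape() {
        return width > height;
    }
}
```

This mirrors new PDRectangle(Constants.A4.getHeight(), Constants.A4.getWidth()) in the listener above: the content is laid out on a wide page rather than being drawn on a portrait page and rotated afterwards.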
Regards
Ralf
Hi, works wonderfully! Your work is awesome, thanks a lot.
First of all, I want to thank you for your work. I use the library for all my PDF needs and it's wonderful.
I came across a problem that I don't think is caused by the library, but I think you're the best person to ask.
When I use an overlay on a landscape page, the background PDF that I use gets rotated by 90 degrees.
Even if I rotate the original background PDF, the effect is the same.
Here is an example: https://drive.google.com/file/d/0B49ypOKAaMl6ckJEOEJSVDNhLXc/view?usp=sharing
Do you know a way to fix this?
Here is my class: https://raw.githubusercontent.com/corrortiz/ConCons/master/src/com/aohys/copiaIMSS/Utilidades/Reportes/MedicamentosPDF.java
Well, if you create a page format using PageFormat.with().A5().landscape(), this actually means: create an A5 portrait page and rotate it by 90 degrees. That's perfectly fine for the PDF standard, but you can also create a format using the A5 landscape dimensions directly, without rotation. I guess that is the case for the PDF you are using for the overlay... and it gets rotated by the first PDF's rotation matrix.
Try creating your PDF using the landscape dimensions directly, without rotation:
PageFormat.with().mediaBox(new PDRectangle( Constants.A5.getHeight(), Constants.A5.getWidth()))
Does that help?
Regards
Ralf
If this does not help, would you mind passing me your overlay PDF in order to reproduce the problem?
Your code solved the issue; here is how I implemented it:
PageFormat a5_landscape =
PageFormat.with().mediaBox(
new PDRectangle( Constants.A5.getHeight(),Constants.A5.getWidth()))
.orientation(Orientation.Landscape)
.margins(hMargin, hMargin, 100f, 130f)
.build();
Document document = new Document(a5_landscape);
Putting code here hurts the eye °w°. Thanks for everything, and if I can help in some way I would love to.
PDFClown provides that kind of feature, but it hasn't seen a commit in more than a year. I'm so glad I found your library! Thank you very much for your work!
Thanks for your work on this library. Is there any way to make paragraphs justified?
You can use Alignment for that.
Hi,
thank you for your library, it is very useful.
Do you have an example showing how to draw a line?
thanks in advance!
Hi,
Does markup support HTML tags?
thanks!
Dear Ralf,
First of all, I want to thank you for this great work. We are currently using pdfbox2-layout.
Regarding your pagination mechanism, the article mentions that text is simply split by lines. Fair enough. In practice, we add text or markup to a paragraph, and the paragraph is added to the document. Is there a way to indicate that a paragraph cannot be divided across multiple pages and should be treated as a block?
Thank you very much.
Best
Thank you so much for this! It's really awesome and helps to create PDFs quickly. Love your work :)
Thank you so much for this lovely wrapper on top of PDFBox. I am looking for some help with a PDF that I am generating. It's a combination of text and some form fields. How can I generate that using PDFBoxLayout?
I am trying to get the PDDocument instance via document.getPDDocument() and then add a PDAcroForm, but it does not seem to work. Is there a sample that I can refer to? Please advise, and thanks for your amazing work. Much appreciated.
Hi,
This library is very good. Are you still maintaining it?
Kim
Is it possible to add a table using PDFBox 2.0.8?
How do I add an image using PDFBoxLayout?
I tried ImageElement, but the image is not displayed in the PDF file. Can anyone please help with this issue?
Thanks in advance
Hi,
Please see the code snippet below. This works for me:
final ImageElement imageElement = new ImageElement(inputStream);
if (maxWidth != null) {
float scale = imageElement.getWidth() / maxWidth.floatValue();
if (scale > 1) {
imageElement.setWidth(imageElement.getWidth() / scale);
imageElement.setHeight(imageElement.getHeight() / scale);
}
}
if (alignment == null) {
this.document.add(imageElement, new VerticalLayoutHint(Alignment.Left));
} else {
this.document.add(imageElement, new VerticalLayoutHint(alignment));
}
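The scaling arithmetic in that snippet can be isolated into a small pure function. That makes it easy to check that images are only ever scaled down and keep their aspect ratio. This is a hypothetical helper written for illustration, not part of PDFBox-Layout.

```java
final class ImageFit {
    private ImageFit() {}

    /** Returns {width, height} scaled down to fit maxWidth, keeping the aspect ratio. */
    static float[] fitWidth(float width, float height, float maxWidth) {
        float scale = width / maxWidth;
        if (scale <= 1) {
            return new float[] {width, height}; // already fits: never scale up
        }
        return new float[] {width / scale, height / scale};
    }
}
```

A 400 by 300 image with a maximum width of 200 becomes 200 by 150, while a 100 by 50 image is left unchanged.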
Is it possible to add the paragraph to a PDPageContentStream?
Hi,
I am trying to add a custom font to pdfbox-layout version 1.0.0 like this:
PDFont font = PDTrueTypeFont.loadTTF(document.getPDDocument(),file);
It throws a NullPointerException in StyledText:
@Override
public float getWidth() throws IOException {
if (width == null) {
width = getFontDescriptor().getSize()
* getFontDescriptor().getFont().getStringWidth(getText())
/ 1000;
width += leftMargin;
width += rightMargin;
}
return width;
}
The font descriptor returned by getFontDescriptor() has no font; it contains only the size.
Can you help me with this issue?
Hi,
I am trying to draw a line separator in pdfbox-layout version 1.0.0.
Can you please advise how I can get the dynamic "y" position to pass to the drawline(x,y,x1,y1) function? I can only get the Position object after render(); without rendering, null is returned.
Hi,
Can I write text to another column directly?
My requirement is like the example below: centered, with all colons (:) aligned on the same vertical line.
Name: John
Age: 35
Address: California
Hi, thanks a lot for your library, which is an excellent enhancement. Is there any way to achieve a fully justified (left and right) paragraph like in MS Word? Please advise.
Chinnaswamy
Singapore