Welcome to the iText documentation pages. On this site, you'll find all the information you need to get started with iText:
- a plethora of examples
- the FAQ
- the keywords list
- the Javadoc API information
- a new book about iText (under construction; not available for free)
If you have comments or suggestions, please visit the contact page at itextpdf.com.
Documenting a software project that is very much alive, with new functionality added frequently, is work that is constantly under construction. You can reward the work done on the documentation by buying the book about iText.
For instance: Can I replace one word with another in an existing PDF?
Aandi Inston answers: Change the original. Re-make the PDF. The task of editing the PDF is as grim as it appears, so it is vital to keep originals, to get originals, for editing. I mean, perhaps this was a "Microsoft Word" document first. Get the "Microsoft Word" document. Replace the word. Make the PDF one more time. If it was a different program to make the PDF, still get the original file, not the PDF. If you cannot get the original document, perhaps you can copy/paste from Acrobat into Word to make a new Word document. If you must change in Acrobat, you will need to type the word 1000 times. PDF files are not for this purpose.
Paulo Soares answers: It's not possible with any tool unless you understand or control the way the PDF was made and even then with lots of limitations. Before you say that it can be done in Acrobat, it can't. Just look at all the reflow problems.
Leonard Rosenthol answers: Changing text in a PDF is an EXTREMELY NON-TRIVIAL process for the average PDF... made even more complex if the text being changed is larger/smaller than the text it is replacing (the norm). In other words - don't even bother...
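To make the size problem concrete, here is a minimal sketch of what happens when a replacement word is wider than the original. It is a toy model, not PDF syntax and not an iText API: the glyph widths, the coordinates, and the function names are all hypothetical. The point is only that neighbouring text sits at fixed positions and does not move out of the way.

```python
# Toy model: hypothetical glyph widths in text-space units.
# Real fonts define their own widths; these numbers are made up.
GLYPH_WIDTH = {"a": 6, "b": 6, "c": 6, "o": 6, "r": 4, "t": 4}


def word_width(word):
    """Sum the (hypothetical) widths of the glyphs in a word."""
    return sum(GLYPH_WIDTH[ch] for ch in word)


def overlap_with_next(new_word, x_start, next_x):
    """Return how far new_word runs into the next fixed-position item.

    x_start is where the word begins; next_x is where the following
    item was placed when the PDF was made. Returns 0 if the word fits.
    """
    new_end = x_start + word_width(new_word)
    return max(0, new_end - next_x)


# "cat" starts at x=100 and the next word was placed at x=120.
print(word_width("cat"))                        # 16
print(overlap_with_next("cat", 100, 120))       # 0
print(overlap_with_next("acrobat", 100, 120))   # 18
```

In this sketch, "acrobat" ends 18 units past the start of the next word, so the two would be drawn on top of each other unless everything after it were repositioned by hand.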
Mark Storer answers: PDF is a _DISPLAY_ format. It is designed from top to bottom to display the same way no matter what. The biggest problem faced by a general PDF editor is that there are no particular order or format requirements. Also the structure of a document gets lost.
- Format:
- Text in a PDF can be an image, a collection of lines with no apparent meaning to anything but an OCR program (or a human), or actual characters in just about any encoding you can imagine. The byte value 0x61 (ASCII 'a') might be an 'a', or it could be ANYTHING else. PDF allows each of its font resources to define their own, possibly customized, encoding. It is also possible to draw text using 'glyph indexes' into a font's list of characters. There's no particular order to the way fonts list their characters. Anything goes. When embedding a font into a PDF, it's legal to drop anything you don't need, including the information that converts glyph indexes into characters.
- Order:
- The word on the top of the page may be the last one drawn... It's perfectly legal to draw all the 'a' characters, then all the 'b's, and so on. That isn't a realistic example, but there are PDFs out there that have drawn all the text from one particular font (bold and italic text are separate fonts in PDF), then all the text from the next.
- Structure:
- There are no such things as paragraphs or text alignment within PDF, just specific locations. 'Draw this thing at these coordinates'. Just as there are no paragraphs within PDF, there are no tables, headers, footers, indexes, or anything else. There is only 'draw this there'.
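The three points above can be illustrated with a small simulation. This is a sketch under stated assumptions, not real PDF content-stream syntax and not an iText API: the font names, the encoding tables, the coordinates, and the helper functions are all hypothetical. Each "draw operation" only says which bytes to draw, with which font, at which position.

```python
# Toy model only: hypothetical structures, not PDF syntax or an iText API.

# Format: each font resource carries its own byte-to-character mapping.
FONT_ENCODINGS = {
    "F1": {0x61: "H", 0x62: "e", 0x63: "l", 0x64: "o"},  # 0x61 is not 'a' here
    "F2": {0x01: "W", 0x02: "o", 0x03: "r", 0x04: "l", 0x05: "d"},
}

# Order: operations appear in drawing order, not reading order.
# Structure: each one is just (font, x, y, raw bytes), 'draw this there'.
DRAW_OPS = [
    ("F2", 100, 700, bytes([0x01, 0x02, 0x03, 0x04, 0x05])),  # drawn first
    ("F1", 50, 700, bytes([0x61, 0x62, 0x63, 0x63, 0x64])),   # drawn second
]


def decode(font, raw):
    """Map raw bytes to characters using that font's own table."""
    table = FONT_ENCODINGS[font]
    return "".join(table[b] for b in raw)


def naive_ascii(ops):
    """Pretend every byte is a Latin-1 character, ignoring the fonts."""
    return " ".join(raw.decode("latin-1") for _font, _x, _y, raw in ops)


def text_in_stream_order(ops):
    """Decode correctly, but keep the order in which things were drawn."""
    return " ".join(decode(font, raw) for font, _x, _y, raw in ops)


def text_in_reading_order(ops):
    """Guess reading order: top to bottom (larger y first), then left to right."""
    ordered = sorted(ops, key=lambda op: (-op[2], op[1]))
    return " ".join(decode(font, raw) for font, _x, _y, raw in ordered)


print("Hello" in naive_ascii(DRAW_OPS))   # False
print(text_in_stream_order(DRAW_OPS))     # World Hello
print(text_in_reading_order(DRAW_OPS))    # Hello World
```

Even in this tiny model, finding the word "Hello" requires knowing each font's encoding and guessing the reading order from coordinates. In a real PDF the encoding information may have been dropped and the sorting heuristic may be wrong, which is why a general search-and-replace is unreliable.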
Bruno concludes: if you are a developer and you get an assignment to edit a PDF file, don't accept the assignment. The requirement is part of a design that is completely wrong. Send the person who made the design back to the drawing board. You could throw the PDF Reference at him, but make sure not to hit any vital organs (the PDF Reference has over 1,200 pages).
