Convert PDF to Word in Java: Example

This example reads the text of each PDF page with iText and writes it to a Word (.docx) file with Apache POI.

Required Jars:
1. itextpdf-5.4.4
2. xmlbeans-xpath-2.3.0
3. xmlbeans-2.6.0
4. poi-3.9
5. dom4j-1.6.1
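6. poi-ooxml-schemas-3.9
7. poi-ooxml-3.9

Note: poi, poi-ooxml and poi-ooxml-schemas should all come from the same POI release; mixing releases (for example poi-3.9 with poi-ooxml-3.7) can fail at run time.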

Java Program:

package in.javadomain;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class ConvertPdf2Word {

	public static void main(String[] args) throws IOException {
		System.out.println("Document conversion started");
		String pdf = "D:\\javadomain.pdf";
		XWPFDocument doc = new XWPFDocument();
		PdfReader reader = new PdfReader(pdf);
		try {
			PdfReaderContentParser parser = new PdfReaderContentParser(reader);
			int pageCount = reader.getNumberOfPages();
			// iText page numbers start at 1
			for (int i = 1; i <= pageCount; i++) {
				// Extract the plain text of page i; layout, tables and images are not carried over
				TextExtractionStrategy strategy = parser.processContent(i,
						new SimpleTextExtractionStrategy());
				String text = strategy.getResultantText();
				// One Word paragraph per PDF page
				XWPFParagraph p = doc.createParagraph();
				XWPFRun run = p.createRun();
				run.setText(text);
				// Page break between pages, but not after the last one
				if (i < pageCount) {
					run.addBreak(BreakType.PAGE);
				}
			}
			// Write the document; the stream is closed even if write() fails
			try (FileOutputStream out = new FileOutputStream("D:\\javadomain.docx")) {
				doc.write(out);
			}
		} finally {
			// Release the PDF file handle
			reader.close();
		}
		System.out.println("Document converted successfully");
	}
}
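
The program above puts the whole text of a page into a single run. Line breaks inside that text are plain newline characters, and depending on the POI version Word may not show them as separate lines. Here is a minimal sketch of one way around that, assuming you want one Word line per extracted line: split the page text first, then write each piece separately. The class name PageTextLines and the method split are made up for this illustration and are not part of iText or POI; only the JDK is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of iText or POI): splits extracted page text into lines.
class PageTextLines {

	// Splits on \r\n, \r or \n, strips trailing whitespace, and drops empty lines.
	static List<String> split(String pageText) {
		List<String> lines = new ArrayList<String>();
		if (pageText == null) {
			return lines;
		}
		for (String line : pageText.split("\\r\\n|\\r|\\n")) {
			String cleaned = line.replaceAll("\\s+$", "");
			if (!cleaned.isEmpty()) {
				lines.add(cleaned);
			}
		}
		return lines;
	}

	public static void main(String[] args) {
		// Sample text standing in for strategy.getResultantText()
		String sample = "Invoice 42\r\nItem  Qty\n\nPen   3  \n";
		for (String line : split(sample)) {
			System.out.println("[" + line + "]");
		}
	}
}
```

Inside the page loop, each returned line could then be written with run.setText(line) followed by run.addBreak(), both of which are existing XWPFRun methods. This keeps line boundaries only; table structure and fonts are still lost.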

Input: [pdf file]
[image: pdf input]

Output: [word file]
[image: word output]


8 comments

  • dhanush

    Can you please send me the code to convert from Word to PDF using iText?

    • Naveen

      The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.

  • Dhanush

    Can you please send me the code to convert from doc to PDF?

    • Naveen

      The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.

  • poonam

    I am unable to keep the exact format when converting a PDF to doc or docx if the PDF content is tabular.
    The structure gets distorted. Can you please help?

  • Docx4j is the only open-source API I have found that converts docx to PDF efficiently without compromising format and styling. The catch is that it does not handle spaces and tabs in documents, which leaves the problem unsolved. I have done a lot of research in this area and have not been able to find a single Java API that converts doc or docx to PDF without compromising format and styling.

  • I'm not a developer; I always use this free online PDF to Word converter (http://www.online-code.net/pdf-to-word.html).

  • Bavaraj

    I am using your code but am getting the following error. Can you please help me?
    Exception in thread "main" java.lang.NoSuchMethodError: org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR.getPictList()Ljava/util/List;
    at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:75)
    at org.apache.poi.xwpf.usermodel.XWPFParagraph.createRun(XWPFParagraph.java:266)
    at com.tcs.ConvertPdf2Word.main(ConvertPdf2Word.java:27)
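
A NoSuchMethodError like the one above usually means a class was compiled against one version of a jar while a different version is loaded at run time, for example poi-ooxml and poi-ooxml-schemas taken from different POI releases. Here is a minimal sketch for checking which jar a class is actually loaded from. It uses only the JDK; the class name WhichJar is made up for this illustration, and the POI class names are passed as arguments rather than compiled in.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
class WhichJar {

	// Returns the jar or directory a class was loaded from, or a note if that is unknown.
	static String locationOf(String className) {
		try {
			// Load without running static initializers
			Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
			CodeSource source = c.getProtectionDomain().getCodeSource();
			// Classes from the JDK's bootstrap loader typically have no code source
			return source == null ? "(no code source, probably a JDK class)"
					: String.valueOf(source.getLocation());
		} catch (ClassNotFoundException e) {
			return "(not on the classpath)";
		}
	}

	public static void main(String[] args) {
		// Each argument is a fully qualified class name
		for (String name : args) {
			System.out.println(name + " -> " + locationOf(name));
		}
	}
}
```

Run it with the same classpath as the converter and pass org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR and org.apache.poi.xwpf.usermodel.XWPFRun as arguments. If the two jars printed carry different POI version numbers, replacing them with matching releases is the first thing to try.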
