Reading an HTML File, Parsing It and Converting It to a PDF File With the Pdfbox Library

In this article, we will read an HTML file from a specified folder and replace variables with their actual values.

By Erkin Karanlık · Nov. 22, 23 · Tutorial

In this article, we will read an HTML file from a folder we specify, parse it, and replace the variables in its content with their real values. Then we will convert the modified HTML to a PDF file with the latest version of the "openhtmltopdf-pdfbox" library.

I hope it will be a reference for those who need it on this subject. You can easily do the conversion in your own Java projects. You can see an example project below.

First, we create an input folder, from which we will read the input HTML file, and an output folder, to which we will write the PDF file.

We put the HTML file under the input folder and define a placeholder key in it that will be replaced. In this example, the key is #NAME#. In Java, you can replace this key with any value sent from outside.

Plain Text
 
input folder :  \ConvertHtmlToPDF\input

output folder:  \ConvertHtmlToPDF\output
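
Both folders must exist before the example runs, because FileOutputStream does not create missing parent folders. As a minimal sketch, they could also be created from Java with java.nio.file.Files. The FolderSetup class name and the default base folder are illustrative assumptions, not part of the example project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the example project.
final class FolderSetup {

    // Creates <base>/input and <base>/output if they do not exist yet
    // and returns the two paths in that order.
    static Path[] createFolders(Path base) throws IOException {
        Path input = Files.createDirectories(base.resolve("input"));
        Path output = Files.createDirectories(base.resolve("output"));
        return new Path[] {input, output};
    }

    public static void main(String[] args) throws IOException {
        // Assumed base folder; adjust it to your own environment.
        Path base = Path.of(args.length > 0 ? args[0] : "ConvertHtmlToPDF");
        for (Path folder : createFolders(base)) {
            System.out.println("Ready: " + folder.toAbsolutePath());
        }
    }
}
```

Files.createDirectories does nothing if the folder already exists, so the helper can be called on every start.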




HTML
 
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html>
<html lang="tr">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </meta>
    <title>Convert HTML to PDF</title>
    <style type="text/css">
      body {
        font-family: "Times New Roman", Times, serif;
        font-size: 40px;
      }
    </style>
  </head>
  <body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">
    <table width="700" border="0" cellspacing="0" cellpadding="0" style="background-color: #1859AB;
                                 
                                 color: white;
                                 font-size: 14px;
                                 border-radius: 1px;
                                 line-height: 1em; height: 30px;">
      <tbody>
        <tr>
          <td>
            <strong style="color:#F8CD00;">   Hello </strong>#NAME#
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>


Creating a New Project

We create a new Spring project. I am using IntelliJ IDEA.

Controller 

To replace a key with a value in the HTML, we will send the value from outside. We will write a REST service for this.

We create the "ConvertHtmlToPdfController.java" class under the controller folder and add a GET method called "convertHtmlToPdf" to it. The value is passed to this method dynamically as a path variable, as follows.

Java
 
package com.works.controller;

import com.works.service.ConvertHtmlToPdfService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("")
public class ConvertHtmlToPdfController {

    private final ConvertHtmlToPdfService convertHtmlToPdfService;

    public ConvertHtmlToPdfController(ConvertHtmlToPdfService convertHtmlToPdfService) {
        this.convertHtmlToPdfService = convertHtmlToPdfService;
    }

    // The value is read from the URL path, so @RequestBody is not needed here.
    @GetMapping("/convertHtmlToPdf/{variableValue}")
    public ResponseEntity<String> convertHtmlToPdf(@PathVariable String variableValue) {
        try {
            return ResponseEntity.ok(convertHtmlToPdfService.convertHtmlToPdf(variableValue));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

}
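
Because the value travels in the URL path, a caller has to percent-encode characters such as spaces. The following is a minimal client-side sketch of building such a request URI with the JDK. The EndpointUri class is hypothetical, and the host and port (localhost:8080, Spring Boot's default port) are assumptions about where the application runs.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical client-side helper, not part of the example project.
final class EndpointUri {

    // Builds the request URI for the given value. The multi-argument URI
    // constructor percent-encodes characters that are illegal in a path,
    // such as spaces.
    static URI forValue(String value) throws URISyntaxException {
        // Assumption: the application runs locally on port 8080.
        return new URI("http", null, "localhost", 8080,
                "/convertHtmlToPdf/" + value, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // prints: http://localhost:8080/convertHtmlToPdf/John%20Doe
        System.out.println(forValue("John Doe").toASCIIString());
    }
}
```

The resulting address can be opened in a browser or called with any HTTP client. Note that a slash inside the value is not encoded by this constructor and would change the path.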


Service 

Java
 
package com.works.service.impl;

import com.works.service.ConvertHtmlToPdfService;
import com.works.util.ConvertHtmlToPdfUtil;
import io.micrometer.common.util.StringUtils;
import org.springframework.stereotype.Service;

@Service("convertHtmlToPdfService")
public class ConvertHtmlToPdfServiceImpl implements ConvertHtmlToPdfService {

    private String setVariableValue(String htmlContent, String key, String value) {

        // String.replace substitutes literally; replaceAll would interpret
        // "$" and "\" in the value as replacement syntax.
        if (StringUtils.isNotEmpty(value)) {
            htmlContent = htmlContent.replace("#" + key + "#", value);
        } else {
            htmlContent = htmlContent.replace("#" + key + "#", "");
        }

        return htmlContent;
    }

    @Override
    public String convertHtmlToPdf(String variableValue) throws Exception {
        String inputFile = "/convertHtmlToPDF/input/input.html";
        String outputFile = "/convertHtmlToPDF/output/output.pdf";
        // Folder that contains times.ttf; htmlConvertToPdf appends the file name itself.
        String fontFile = "/convertHtmlToPDF/input";

        try {
            String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);
            htmlContent = setVariableValue(htmlContent, "NAME", variableValue);
            ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);

        } catch (Exception e) {
            throw new Exception("convertHtmlToPdf - An error was received in the service : ", e);
        }
        return "success";
    }
}


The ConvertHtmlToPdfService.java interface declares a method called convertHtmlToPdf, which takes a String variableValue as input.

In the convertHtmlToPdf service method:

The "inputFile" variable holds the path of the HTML file to read under the input folder.

The "outputFile" variable holds the path of the PDF file to write under the output folder.

The font file is also read from outside the application; here it is placed under the input folder. We assign its location to the "fontFile" variable.

In the line below, the path of the input file is passed to the "ConvertHtmlToPdfUtil.readFileAsString" method to read the HTML file in the input folder.

             String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);


Java
 
package com.works.util;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;

public class ConvertHtmlToPdfUtil {

    public static void safeCloseBufferedReader(BufferedReader bufferedReader) throws Exception {
        try {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
        } catch (IOException e) {
            throw new Exception("safeCloseBufferedReader  - the method got an error. " + e.getMessage());
        }
    }

    public static String readFileAsString(String filePath) throws Exception {
        BufferedReader br = null;
        String encoding = "UTF-8";

        try {

            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

            StringBuilder fileContentBuilder = new StringBuilder();
            String line;

            while ((line = br.readLine()) != null) {
                if (fileContentBuilder.length() > 0) {
                    fileContentBuilder.append(System.getProperty("line.separator"));
                }
                fileContentBuilder.append(line);
            }

            return fileContentBuilder.toString();

        } catch (Exception e) {
            // Propagate the failure instead of silently returning null.
            throw new Exception("readFileAsString - the method got an error. " + e.getMessage(), e);
        } finally {
            safeCloseBufferedReader(br);
        }
    }

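    public static OutputStream htmlConvertToPdf(String html, String filePath, String fonts) throws Exception {
        OutputStream os = null;
        try {
            // Creates (or overwrites) the output PDF file.
            os = new FileOutputStream(filePath);
            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();
            pdfBuilder.useFastMode();
            // No base URI is passed because the HTML references no external resources.
            pdfBuilder.withHtmlContent(html, null);
            // "fonts" is expected to be the folder that contains times.ttf.
            String fontPath = fonts;
            pdfBuilder.useFont(new File(concatPath(fontPath, "times.ttf")), "Times", null, null, false);
            pdfBuilder.toStream(os);
            pdfBuilder.run();
            // Close here so that a failure while closing is reported to the caller.
            os.close();
        } catch (Exception e) {
            throw new Exception(e.getMessage(), e);
        } finally {
            // Safety net: closing an already closed stream has no effect.
            try {
                if (os != null) {
                    os.close();
                }
            } catch (IOException e) {
                // Ignored: nothing useful can be done if closing fails here.
            }
        }
        // Note: the stream is already closed when it is returned.
        return os;
    }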

    public static String concatPath(String path, String... subPathArr) {
        for (String subPath : subPathArr) {
            if (!path.endsWith(File.separator)) {
                path += File.separator;
            }
            path += subPath;
        }

        return path;
    }
}


In the ConvertHtmlToPdfUtil.readFileAsString method, the HTML file is opened with FileInputStream. InputStreamReader decodes its bytes into characters using the UTF-8 encoding, and BufferedReader buffers those characters.

The BufferedReader is read line by line, as seen in the code block below, and the whole HTML content is collected into a single string. When we are done, we close the reader with the safeCloseBufferedReader method.

Java
 
            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));
            StringBuilder fileContentBuilder = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                if (fileContentBuilder.length() > 0) {
                    fileContentBuilder.append(System.getProperty("line.separator"));
                }
                fileContentBuilder.append(line);
            }
            return fileContentBuilder.toString();
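
Since the example project targets Java 17, the same read could also be written with java.nio.file.Files. This is only a sketch of an alternative, not the code used in the project; the ReadHtmlFile class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative to readFileAsString, not part of the example project.
final class ReadHtmlFile {

    // Reads the whole file as UTF-8 text. Unlike the line-by-line loop,
    // this keeps the file's original line separators.
    static String readFileAsString(String filePath) throws IOException {
        return Files.readString(Path.of(filePath), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for input.html.
        Path sample = Files.createTempFile("input", ".html");
        Files.writeString(sample, "<p>Hello #NAME#</p>", StandardCharsets.UTF_8);
        System.out.println(readFileAsString(sample.toString()));
    }
}
```

Files.readString opens and closes the file itself, so no separate close helper is needed.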


We send the HTML content to the setVariableValue method, which replaces the key with the value we sent to the service from outside. The key marked as #NAME# in the HTML is replaced with that value.

Java
 
    private String setVariableValue(String htmlContent, String key, String value) {
        // String.replace substitutes literally; replaceAll would interpret
        // "$" and "\" in the value as replacement syntax.
        if (StringUtils.isNotEmpty(value)) {
            htmlContent = htmlContent.replace("#" + key + "#", value);
        } else {
            htmlContent = htmlContent.replace("#" + key + "#", "");
        }
        return htmlContent;
    }
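
Note that the value is inserted into the HTML as-is. Since the renderer expects well-formed markup, a value containing characters such as & or < could break parsing. The following sketch shows one possible way to guard against that: it escapes the value first and substitutes it with String.replace, which works literally (with replaceAll, a "$" or "\" in the value is interpreted as replacement syntax). The PlaceholderFiller class and its method names are hypothetical, not part of the example project.

```java
import java.util.Map;

// Hypothetical helper, not part of the example project.
final class PlaceholderFiller {

    // Escapes the characters that have a special meaning in XML/HTML text.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Replaces every #KEY# placeholder with its escaped value.
    // A null value is replaced with an empty string.
    static String fill(String html, Map<String, String> values) {
        String result = html;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            String value = entry.getValue() == null ? "" : escape(entry.getValue());
            result = result.replace("#" + entry.getKey() + "#", value);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<td>Hello #NAME#</td>";
        // prints: <td>Hello Tom &amp; Jerry</td>
        System.out.println(fill(html, Map.of("NAME", "Tom & Jerry")));
    }
}
```

Taking a map instead of a single key also lets one call fill several placeholders at once.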


After the replacement, we call the ConvertHtmlToPdfUtil.htmlConvertToPdf method to render the HTML content as a PDF file. The method takes the HTML content, the output file path, and the font location as inputs, as can be seen below.

Java
 
            ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);


Contents of the ConvertHtmlToPdfUtil.htmlConvertToPdf method:

First, we create a new FileOutputStream. This creates the output.pdf file at the path we specified.

Next, we create a PdfRendererBuilder object.

Java
 
            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();

The PdfRendererBuilder class is in the com.openhtmltopdf.pdfboxout package, which is provided by the openhtmltopdf-pdfbox library. Therefore, we must add this library to the pom.xml file as follows.

pom.xml

XML
 
<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>1.0.10</version>
</dependency>

The complete pom.xml of the example project is as follows.

XML
 
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.1</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.works</groupId>
	<artifactId>convertHtmlToPDF</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>convertHtmlToPDF</name>
	<description>convertHtmlToPDF</description>
	<properties>
		<java.version>17</java.version>
		<spring-cloud.version>2022.0.3</spring-cloud.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>com.openhtmltopdf</groupId>
			<artifactId>openhtmltopdf-pdfbox</artifactId>
			<version>1.0.10</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-starter-zookeeper-discovery</artifactId>
			<version>4.0.1</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
				<configuration>
					<image>
						<builder>paketobuildpacks/builder-jammy-base:latest</builder>
					</image>
				</configuration>
			</plugin>
		</plugins>
	</build>

</project>


After the PdfRendererBuilder object is created, we pass the HTML content to 'withHtmlContent' and the font file to 'useFont'. We set the output stream with 'toStream'. Finally, we start the conversion with the 'run' method.

Java
 
            pdfBuilder.useFastMode();
            pdfBuilder.withHtmlContent(html, null);
            String fontPath = fonts;
            pdfBuilder.useFont(new File(concatPath(fontPath, "times.ttf")), "Times", null, null, false);
            pdfBuilder.toStream(os);
            pdfBuilder.run();


After the pdfBuilder.run() method is executed, we should see that the output.pdf file has been created under the output folder.

Thus, we have seen how smoothly HTML can be converted to PDF with the openhtmltopdf-pdfbox library.
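
To confirm the result without opening a viewer, one simple check is to look at the first bytes of the output file, since PDF files start with the "%PDF-" header. The sketch below uses only the JDK. The PdfFileCheck class is hypothetical, and the default path is taken from the outputFile value used in the service.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical check, not part of the example project.
final class PdfFileCheck {

    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the file exists and starts with the "%PDF-" header.
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] start = in.readNBytes(PDF_HEADER.length);
            return Arrays.equals(start, PDF_HEADER);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass another path as the first argument if needed.
        Path output = Path.of(args.length > 0 ? args[0] : "/convertHtmlToPDF/output/output.pdf");
        System.out.println(output + " looks like a PDF: " + looksLikePdf(output));
    }
}
```

This only verifies the header, not that the document renders correctly, so it is a quick sanity check rather than a full validation.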

