Category Archives: XML

Java: Properly Indenting XML String

PROBLEM

Let’s assume we want to perform a pretty print on the following XML string:-

<root>
   
<name>Coco Puff</name>
        <total>10</total>    </root>

We have the following code to create a formatted XML string:-

String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new StringSource(xml), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}

However, the generated output looks like just the original XML string:-

<root>
   
<name>Coco Puff</name>
        <total>10</total>    </root>

Why?

SOLUTION

You may think the above code is wrong, but it is not. Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.

String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

    XPath xPath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                  document,
                                                  XPathConstants.NODESET);

    for (int i = 0; i < nodeList.getLength(); ++i) {
        Node node = nodeList.item(i);
        node.getParentNode().removeChild(node);
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new DOMSource(document), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}

When we run the above code, we will get the intended output:-

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>