Embracing the Messiness in Search of Epic Solutions

Java: Properly Indenting XML String

Posted

in

PROBLEM

Let’s assume we want to perform a pretty print on the following XML string:-

<root>
    
<name>Coco Puff</name>
        <total>10</total>    </root>

We have the following code to create a formatted XML string:-

String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new StringSource(xml), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}

However, the generated output looks like just the original XML string:-

<root>
    
<name>Coco Puff</name>
        <total>10</total>    </root>

Why?

SOLUTION

You may think the above code is wrong, but it is not. Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.

String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

    XPath xPath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                  document,
                                                  XPathConstants.NODESET);

    for (int i = 0; i &lt; nodeList.getLength(); ++i) {
        Node node = nodeList.item(i);
        node.getParentNode().removeChild(node);
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new DOMSource(document), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}

When we run the above code, we will get the intended output:-

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>

Tags:

Comments

2 responses to “Java: Properly Indenting XML String”

  1. Valentyn Kolesnikov Avatar
    Valentyn Kolesnikov

    Underscore-java library has a static method U.formatXml(xml).

  2. Jan Mehlhorn Avatar
    Jan Mehlhorn

    Perfect. Had the same problem.
    Worked like a charm.

Leave a Reply