Background
In a previous post I showed how to parse XML from a String.
So let's parse a simple String that contains a Google Play app name, for example:
- <AppName>Temple Run</AppName>
public static void main(String args[]) throws SAXException, IOException, ParserConfigurationException {
    String xmlString = "<AppName>Temple Run</AppName>";
    DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // Wrap the String in an InputSource so the parser can read it like a file
    Document doc = db.parse(new InputSource(new StringReader(xmlString)));
    // The document's first child is the <AppName> element
    System.out.println(doc.getFirstChild().getTextContent());
}
The output is as expected: Temple Run
Now let's change our input XML string (the app name) as follows:
- <AppName>Angels & Demons</AppName>
Run the code again with the above XML String as input. You will get the following exception:
[Fatal Error] :1:18: The entity name must immediately follow the '&' in the entity reference.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 18; The entity name must immediately follow the '&' in the entity reference.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at StringXMLParser.main(StringXMLParser.java:20)
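If you would rather inspect the failure than let it escape main, you can catch org.xml.sax.SAXParseException, which carries the same line and column numbers shown in the trace above. This is a minimal sketch; the class UnescapedAmpersandDemo and its parseError helper are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class UnescapedAmpersandDemo {

    // Hypothetical helper: returns null if the XML parses, otherwise a short
    // description of where the parser gave up.
    static String parseError(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // SAXParseException records the position of the offending character
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError("<AppName>Temple Run</AppName>"));
        System.out.println(parseError("<AppName>Angels & Demons</AppName>"));
    }
}
```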
The reason is that '&' is a special character: you need to escape it in a String before parsing that String as XML (the same goes for HTML). In its escaped form '&' becomes '&amp;', so the input should look like this:
- <AppName>Angels &amp; Demons</AppName>
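As a quick check, parsing the hand-escaped string gives back the original app name: the parser decodes &amp; to &, so your code never sees the entity. This minimal sketch reads the root with getDocumentElement() rather than getFirstChild(), which keeps working if a comment or DOCTYPE ever appears before the root element; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedInputDemo {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The input contains &amp;; the parser hands back a plain &
        System.out.println(rootText("<AppName>Angels &amp; Demons</AppName>")); // prints: Angels & Demons
    }
}
```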
Special Characters in XML
The special characters in XML and their escaped forms are:
- & - &amp;
- < - &lt;
- > - &gt;
- " - &quot;
- ' - &apos;
These so-called special characters must be escaped because they have a special meaning in XML, and when they appear unescaped in data they lead to parsing errors like the one shown above. For example, the & character begins an entity reference (such as &lt;), which is how XML refers to other entities.
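To see what that special meaning amounts to, the sketch below feeds the parser each of the five predefined entities plus two character references (&#38; in decimal and &#x26; in hexadecimal, both meaning &) and prints the decoded text. The class EntityReferenceDemo and its rootText helper are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityReferenceDemo {

    // Parses an XML string and returns the (decoded) text of its root element.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &lt; &gt; &amp; &quot; &apos; are the five predefined entities;
        // &#38; and &#x26; are character references for '&'
        String xml = "<Text>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos; &#38; &#x26;</Text>";
        System.out.println(rootText(xml)); // prints: <b> & "q" 'a' & &
    }
}
```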
Escaping Input for XML in Java
You could write your own code to find these special characters in the input and replace them with their escaped versions. For this tutorial I am going to use the StringEscapeUtils class from Apache Commons Lang, which provides escaping for several formats such as XML, HTML, Java and CSV.
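For comparison, a hand-written escaper for the five characters above can be quite short. This is a sketch under the assumption that you only need the five predefined entities; XmlEscaper.escapeXml is a hypothetical helper, not the Commons Lang implementation.

```java
public class XmlEscaper {

    // Hypothetical helper (not the Commons Lang implementation): replaces the
    // five XML special characters with their predefined entities in one pass.
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Angels & Demons")); // prints: Angels &amp; Demons
    }
}
```

Walking the string once avoids a classic bug with chained String.replace calls: if '<' is replaced with "&lt;" before '&' is replaced with "&amp;", the result becomes "&amp;lt;". The sketch also copies through characters that XML 1.0 does not allow at all (such as most control characters), so it is not a complete replacement for a library.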
As usual, I am using Ivy as my dependency manager and Eclipse as my IDE. To install and configure Apache Ivy, refer to the link in the "Related Links" section at the bottom.
My ivy.xml file looks like the following:
<ivy-module version="2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://ant.apache.org/ivy/schemas/ivy.xsd">
    <info organisation="OpenSourceForGeeks" module="XMLEscaper" status="integration">
    </info>
    <dependencies>
        <dependency org="org.apache.commons" name="commons-lang3" rev="3.3.2"/>
    </dependencies>
</ivy-module>
Now let's get to the code:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.commons.lang3.StringEscapeUtils;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class StringXMLParser {
    public static void main(String args[]) throws SAXException, IOException, ParserConfigurationException {
        String appNameInput = "Angels & Demons";
        System.out.println("App Name Before Escaping : " + appNameInput);
        // Replace &, <, >, " and ' with &amp;, &lt;, &gt;, &quot; and &apos;
        // (escapeXml is deprecated since commons-lang3 3.3 in favor of escapeXml10/escapeXml11)
        String escapedInput = StringEscapeUtils.escapeXml(appNameInput);
        System.out.println("App Name After Escaping : " + escapedInput);
        // Only the escaped value is spliced into the XML string
        String xmlString = "<AppName>" + escapedInput + "</AppName>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xmlString)));
        // The parser turns &amp; back into &
        System.out.println(doc.getFirstChild().getTextContent());
    }
}
Compile and run the code above. You should get the following output:
App Name Before Escaping : Angels & Demons
App Name After Escaping : Angels &amp; Demons
Angels & Demons
No exception this time. I have shown this demo only for the '&' special character, but the same approach works for all the special characters listed in the "Special Characters in XML" section above.
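If you are building the XML rather than concatenating strings, an alternative (a different technique from StringEscapeUtils, and not what this post uses) is to let the JDK's DOM and javax.xml.transform APIs handle escaping: text set with setTextContent is stored raw and escaped only when the document is serialized. The sketch below round-trips a value containing all five special characters. DomEscapingDemo, toXml and rootText are hypothetical names, and the exact serialized form (for example whether quotes inside element text get escaped) is up to the JDK's serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomEscapingDemo {

    // Builds <AppName>...</AppName> through the DOM API and serializes it.
    // The raw text is stored as-is; the serializer escapes it on output.
    static String toXml(String appName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("AppName");
            root.setTextContent(appName);
            doc.appendChild(root);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML back and returns the root element's decoded text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "<Tom & \"Jerry's\">";
        String xml = toXml(input);
        System.out.println(xml);           // escaped markup produced by the serializer
        System.out.println(rootText(xml)); // the original input again
    }
}
```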
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater-than character is legal, but it is a good habit to replace it (for example, the sequence ]]> is not allowed in element text).
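A small well-formedness check makes the note concrete: a bare ">" in element text parses, a bare "<" does not. This is a minimal sketch; IllegalCharsDemo and isWellFormed are hypothetical names, and installing org.xml.sax.helpers.DefaultHandler as the error handler is just a way to keep the parser from printing its own "[Fatal Error]" line.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IllegalCharsDemo {

    // Returns true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<Cmp>5 > 3</Cmp>"));    // true: '>' is legal in text
        System.out.println(isWellFormed("<Cmp>3 < 5</Cmp>"));    // false: '<' must be written as &lt;
        System.out.println(isWellFormed("<Cmp>3 &lt; 5</Cmp>")); // true
    }
}
```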