Is URLEncoder.encode(string, "UTF-8") a bad validation?


In a portion of my J2EE/Java code, I URL-encode the output of getRequestURI() to sanitize it against XSS attacks, but Fortify SCA flags that as poor validation.


The key point is that you need to convert HTML special characters to HTML entities. This is also called "HTML escaping" or "XML escaping". Basically, the characters <, >, ", & and ' need to be replaced by &lt;, &gt;, &quot;, &amp; and &#39;.
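As a rough illustration of that mapping, here is a minimal hand-written escaper. It is a sketch, not a hardened library: the names HtmlEscaper and escape are hypothetical, and returning an empty string for null input is an assumption of this sketch.

```java
// Hypothetical helper: escapes the five HTML/XML special characters in one pass.
public final class HtmlEscaper {

    private HtmlEscaper() {
    }

    /**
     * Replaces &, <, >, " and ' with &amp;, &lt;, &gt;, &quot; and &#39;.
     * A null input yields "" (an assumption of this sketch).
     */
    public static String escape(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);        // all other characters pass through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
        System.out.println(escape("<script>alert('x')</script>"));
    }
}
```

Doing the replacement in a single pass avoids the double-escaping bug you can get by chaining String.replace() calls in the wrong order (replacing < first and & second turns < into &amp;lt;).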

URL encoding does not do that. URL encoding converts URL special characters to percent-encoded values. This is not HTML escaping.
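To see the difference, the sketch below runs sample strings through java.net.URLEncoder the same way the question does. The output contains percent-encoded sequences such as %3C, never HTML entities such as &lt;. It also shows that a request URI (here a made-up /app/page) gets its slashes encoded as %2F, so the redisplayed value no longer matches the original. The class name and inputs are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodingDemo {

    // URL-encodes a value as in the question: URLEncoder.encode(s, "UTF-8").
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Percent-encoding, not HTML entities.
        System.out.println(urlEncode("<script>"));   // prints %3Cscript%3E
        // A (hypothetical) request URI is mangled: slashes become %2F.
        System.out.println(urlEncode("/app/page"));  // prints %2Fapp%2Fpage
    }
}
```

URLEncoder is meant for encoding values that go into a URL, such as a query-string parameter. Since Java 10 there is also an URLEncoder.encode(String, Charset) overload that avoids the checked exception.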

In web applications, HTML escaping is normally done on the view side, exactly where you're redisplaying user-controlled input. In a Java EE web application, how you do that depends on the view technology you're using.

  1. If the webapp is using modern Facelets view technology, then you don't need to escape it yourself. Facelets already does that implicitly (for example, h:outputText escapes its value unless you set escape="false").

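  2. If the webapp is using legacy JSP view technology, then you need to ensure that you're using the JSTL <c:out> tag or the fn:escapeXml() function to redisplay user-controlled input, shown here for an example request parameter named foo:

    <%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
    <%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
    <c:out value="${param.foo}" />
    <input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />
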
  3. If the webapp is very legacy or badly designed and uses servlets or scriptlets to print HTML, then you have a bigger problem. There are no built-in tags or functions, let alone standard Java methods, that can escape HTML entities. You should either write an escape() method yourself or use Apache Commons Lang's StringEscapeUtils#escapeHtml() (named escapeHtml4() in Commons Lang 3; newer code would use the same method from Apache Commons Text). Then you need to ensure that you're using it everywhere you're printing user-controlled input.

    out.print("<p>" + StringEscapeUtils.escapeHtml(request.getParameter("foo")) + "</p>");

    It would be much better to redesign that legacy webapp to use JSP with JSTL.