Beware: WebUtility.UrlEncode vs HttpUtility.UrlEncode

Whilst experimenting with hash-based message authentication code (HMAC) request signing for a REST API I’m working on, I noticed that sometimes a signature would fail to validate server side, despite both ends following the exact same hashing algorithm. Upon closer inspection, it turned out that the client side URL encoding method was returning lowercase hex values, whereas the server side, when computing the string for hashing, was returning uppercase hex values. Fair enough! The client was written in PHP and the server is .Net. There’s no strict requirement for the casing of URL encoded values (RFC 3986 recommends uppercase but treats both casings as equivalent), so some differences should be expected across platforms.

But would you expect that difference between two methods in the .Net framework? Experience tells you yes, but we can be hopeful all the same!

In most cases it makes no difference if you have %AA or %aa, but when you’re hashing a string, it makes all the difference. Armed with this knowledge, I just made it a requirement that all URL encoded values used to calculate signatures are to be in uppercase, à la Amazon’s approach (scroll down to the table under the heading ‘Calculating a Signature’). For those of you screaming that lowercase is generally safer than uppercase, in this case we’re OK because we’re only concerned with hex characters (A-F), and at that point in time it wasn’t practical to rewrite the server side processing just for this one test.
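
To make the point concrete, here is a minimal sketch showing that two strings differing only in the casing of one percent-encoded value produce different signatures. The `Sign` helper, the HMAC-SHA256 choice, the Base64 output and the placeholder secret are all assumptions for illustration, not the signing scheme used in the project described here.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SignatureCasingDemo
{
    // Hypothetical helper: HMAC-SHA256 over a UTF-8 string, returned as Base64.
    public static string Sign(string stringToSign, string secret)
    {
        using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret)))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign));
            return Convert.ToBase64String(hash);
        }
    }

    public static void Main()
    {
        const string secret = "not-a-real-secret"; // placeholder key for the demo

        // The same URL encoded value, differing only in hex casing (%3C vs %3c).
        string upper = Sign("param2=there%40is%3Ca+space", secret);
        string lower = Sign("param2=there%40is%3ca+space", secret);

        Console.WriteLine(upper == lower); // False: the signatures no longer match
    }
}
```

Any keyed hash would behave the same way: the input bytes differ, so the output differs.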

Fast forward a few months and I’m at a stage where I need to use this approach for real. In this case it’s for calls between an ASP.Net MVC app and a WCF service, both .Net 4.5 projects running on Azure. I moved the code for calculating and verifying signatures into a common library so that it can be shared between the roles. I didn’t want to have to reference the System.Web assembly in this shared assembly just for the sake of the encode/decode methods, and found that in .Net 4.5 the System.Net.WebUtility class (introduced in .Net 4.0) gained methods for encoding and decoding URLs. Perfect! Except now all my tests failed. The code had worked perfectly in my tests before the move, but now the signature was failing to validate.

In my tests, I was creating the hash using System.Web.HttpUtility.UrlEncode, as it was already used in other places, but when I changed them to use System.Net.WebUtility.UrlEncode, the tests passed. WTF. We have two methods, both native .Net code, that encode URL values. What causes the difference?

Digging into the code for each implementation, I found that each version uses its own method for converting an int to a hex value (and a few other utility functions). Take a look at the code for each of those methods…

System.Net.WebUtility:

private static char IntToHex(int n)
{
    if (n <= 9)
        return (char) (n + 48);      // 48 is '0', so 0-9 map to '0'-'9'
    else
        return (char) (n - 10 + 65); // 65 is 'A', so 10-15 map to 'A'-'F'
}

System.Web.Util.HttpEncoderUtility (an internal class):

public static char IntToHex(int n)
{
    if (n <= 9)
        return (char) (n + 48);      // 48 is '0', so 0-9 map to '0'-'9'
    else
        return (char) (n - 10 + 97); // 97 is 'a', so 10-15 map to 'a'-'f'
}

Spot the difference? That ‘+ 65’ in the first code block will result in an uppercase alpha character, but ‘+ 97’ will result in a lowercase character. You can perform a very simple test and see the results for yourself. The following code:

var test1 = WebUtility.UrlEncode("http://www.test.com/?param1=22&param2=there@is<a space");
var test2 = HttpUtility.UrlEncode("http://www.test.com/?param1=22&param2=there@is<a space");

Will result in:

test1 -> http%3A%2F%2Fwww.test.com%2F%3Fparam1%3D22%26param2%3Dthere%40is%3Ca+space
test2 -> http%3a%2f%2fwww.test.com%2f%3fparam1%3d22%26param2%3dthere%40is%3ca+space
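
The arithmetic behind that difference can be sketched in isolation. The code below is an illustrative re-creation, not the framework source: the `alphaOffset` parameter and the `PercentEncodeByte` helper are my own additions so that both variants can be exercised side by side on a single byte.

```csharp
using System;

public static class HexCasingDemo
{
    // Same shape as the two IntToHex methods, with the letter offset made a parameter.
    private static char IntToHex(int n, int alphaOffset)
    {
        return n <= 9 ? (char)(n + '0') : (char)(n - 10 + alphaOffset);
    }

    // Hypothetical helper: percent-encodes one byte as '%' plus two hex digits.
    public static string PercentEncodeByte(byte b, bool upper)
    {
        int offset = upper ? 'A' : 'a'; // 65 or 97
        return "%" + IntToHex(b >> 4, offset) + IntToHex(b & 0x0F, offset);
    }

    public static void Main()
    {
        Console.WriteLine(PercentEncodeByte((byte)'<', upper: true));  // %3C
        Console.WriteLine(PercentEncodeByte((byte)'<', upper: false)); // %3c
    }
}
```

Digits 0-9 are unaffected by the offset, which is why values such as %40 and %26 come out identical from both methods.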

Curiously, the help pages for both methods state the following:

For example, when embedded in a block of text to be transmitted in a URL, the characters < and > are encoded as %3c and %3e.

Which would lead one to believe that both methods return lowercase characters for any hex values.

So what’s the solution here? Well, you can just choose one method and stick to it, hoping the implementation never changes in the future. Or you can massage the result to fit your requirements and make your code resilient to changes in casing. For example:

var queryString = WebUtility.UrlEncode("http://www.test.com/?key1=something@something<something!");
queryString = Regex.Replace(queryString, "(%[0-9A-F]{2})", c => c.Value.ToLowerInvariant());

Remember that you still need to choose a casing and stick to it. This regex won’t help you if your client creates a signature based on lowercase hex values and your server uses uppercase.
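
If you would rather not care which encoder produced the string, one option is to match escapes in either casing and force them to the casing you have agreed on. This is a rough sketch under that assumption; the `PercentEncodingCase` class and `ToUpperHex` method are hypothetical names, and uppercase is chosen here only because it matches the requirement mentioned earlier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PercentEncodingCase
{
    // Matches a percent sign followed by two hex digits in either casing.
    private static readonly Regex Escape = new Regex("%[0-9A-Fa-f]{2}");

    // Upper-cases only the escape sequences; all other characters are left alone.
    public static string ToUpperHex(string encoded)
    {
        return Escape.Replace(encoded, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // The 'a' after %3c is ordinary text and must stay lowercase.
        Console.WriteLine(ToUpperHex("there%40is%3ca+space%2f")); // there%40is%3Ca+space%2F
    }
}
```

Running the same normalisation on both client and server before hashing means the two ends only need to agree on the target casing, not on the encoder.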

For those of you still reading, here are two more gotchas when encoding / decoding URL values.

  1. When deciding on which of the two implementations to use, HttpUtility has an overload that allows you to specify which encoding to use, whereas WebUtility will always use UTF-8. HttpUtility uses UTF-8 by default, but in some cases you may need to override this.
  2. You also need to be aware that some implementations will convert a space character into a ‘+’ and others will use ‘%20’. The ‘+’ is generally used in form data, although both of the .Net methods above use ‘+’.
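
The space handling in the second point can be seen with a short sketch. `WebUtility.UrlEncode` and `Uri.EscapeDataString` are real framework methods; the `EncodeWithPercent20` helper is a hypothetical name for one way of getting ‘%20’ out of the form-style encoder, assuming the other side of your signature expects that form.

```csharp
using System;
using System.Net;

public static class SpaceEncodingDemo
{
    // Hypothetical helper: form-style encoding, then spaces rewritten as %20.
    public static string EncodeWithPercent20(string value)
    {
        // A literal '+' in the input has already been escaped as %2B here,
        // so any '+' left in the output can only stand for a space.
        return WebUtility.UrlEncode(value).Replace("+", "%20");
    }

    public static void Main()
    {
        Console.WriteLine(WebUtility.UrlEncode("a b+c"));  // a+b%2Bc
        Console.WriteLine(EncodeWithPercent20("a b+c"));   // a%20b%2Bc
        Console.WriteLine(Uri.EscapeDataString("a b+c"));  // a%20b%2Bc
    }
}
```

Whichever form you pick, it has the same consequence as hex casing: both ends must produce the same bytes before hashing.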

Update – 26th May 2014

Thanks to Reddit user DaRKoN_ for mentioning this:

“MVC 6 has no dependency on System.Web. The result is a leaner framework, with faster startup time and lower memory consumption.”
http://www.asp.net/vnext/overview/aspnet-vnext/overview

This will help reduce the confusion for newer projects!

3 thoughts on “Beware: WebUtility.UrlEncode vs HttpUtility.UrlEncode”

  1. You mention that the solution is to use a regular expression as below:
    Regex.Replace(queryString, "(%[0-9a-f]{2})", c => c.Value.ToLowerInvariant());

    But you might need to change the “[0-9a-f]” to “[0-9A-F]”.

    Thanks for your solution, it helped me a lot!

    • Hi Louis,

      Thanks for the comment. You’re right, the regular expression will depend on which version of the UrlEncode you used. Thanks for highlighting that.

  2. We have some silliness with WebUtility.UrlEncode(). Take a string with some patterns like # for a large string. This string will be acceptable in Windows, but the encoding will not work.
