URL Encoder/Decoder


Info

Why encode?

Sometimes, you may want to include reserved or disallowed characters in a portion of a URL. To do this without affecting any normal processing of a URL and without making the entire thing invalid, you'll need to use URL encoding.

About the process

When a character is encoded, you'll notice that it's converted into one or more "triplets". A triplet consists of a % sign followed by two hexadecimal digits, e.g. %FF. For example, the character = encodes to one triplet, %3D, but the Chinese character 你 encodes to three: %E4%BD%A0.

Each character can be represented by a number (its code point). The size of this number determines how many bytes, and therefore how many triplets, are needed to represent it. This tool uses UTF-8 to turn characters into bytes, which is the recommended choice.
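
To make this concrete, here is a minimal sketch that turns text into its UTF-8 bytes and writes one triplet per byte. The helper name toTriplets is made up for this example and is not part of the tool; it assumes an environment that provides the standard TextEncoder API (modern browsers and Node.js).

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of the input.
function toTriplets(text) {
	var bytes = new TextEncoder().encode(text); // UTF-8 bytes
	var out = "";
	for (var i = 0; i < bytes.length; i++) {
		var hex = bytes[i].toString(16).toUpperCase();
		out += "%" + (hex.length < 2 ? "0" + hex : hex);
	}
	return out;
}

console.log(toTriplets("="));  // %3D (one byte, one triplet)
console.log(toTriplets("你")); // %E4%BD%A0 (three bytes, three triplets)
```

Unlike a real encoder, this sketch encodes every character, including ones that never need it, so it is only useful for seeing how bytes map to triplets.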

URL processing, reserved and unreserved characters

RFC 3986 defines 18 "reserved" characters. These characters act as delimiters and affect how URLs are processed:

:/?#[]
@!$&'(
)*+,;=

If you want to use any of these characters in a URL without them affecting URL processing you'll need to URL encode them. For example:

/* This URL uses many reserved characters */
http://toolpond.com/request.php?data=[# This needs to be URL encoded! #]

/* But all we need to encode is the final portion of this URL */
http://toolpond.com/request.php?data=%5B%23%20This%20needs%20to%20be%20URL%20encoded!%20%23%5D
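
In JavaScript, the same result can be sketched with the native encodeURIComponent function, applied only to the value and not to the whole URL. The variable names here are illustrative.

```javascript
var base = "http://toolpond.com/request.php?data=";
var value = "[# This needs to be URL encoded! #]";

// Encode only the final portion, then append it to the untouched base.
var url = base + encodeURIComponent(value);
console.log(url);
// http://toolpond.com/request.php?data=%5B%23%20This%20needs%20to%20be%20URL%20encoded!%20%23%5D

// Without encoding, the # starts a fragment and the value is cut short.
console.log(new URL(base + value).hash !== ""); // true
console.log(new URL(url).hash === "");          // true
```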

RFC 3986 also defines 66 "unreserved" characters. These characters do not affect how URLs are processed in any way and can thus be used freely without ever needing to be URL encoded:

A-Z a-z 0-9 - . _ ~
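
If you want to check whether a piece of text can go into a URL as-is, a regular expression over the unreserved set is enough. This is a small sketch; the names UNRESERVED_ONLY and needsEncoding are invented for the example.

```javascript
// Matches strings made up only of RFC 3986 unreserved characters.
var UNRESERVED_ONLY = /^[A-Za-z0-9\-._~]*$/;

// Hypothetical helper: true if at least one character would need a triplet
// (or is a reserved character that could be read as a delimiter).
function needsEncoding(text) {
	return !UNRESERVED_ONLY.test(text);
}

console.log(needsEncoding("report_v1.2~draft"));   // false
console.log(needsEncoding("a b"));                 // true
console.log(encodeURIComponent("AZaz09-._~"));     // AZaz09-._~ (unchanged)
```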

The only other character allowed in URLs that hasn't been mentioned yet is the % character itself. It is only allowed as part of URL encoding, which means a literal % must itself be URL encoded (as %25). That's why clicking encode multiple times produces %2525252525.... Please note that any % in your URLs must be followed by two hexadecimal digits; RFC 3986 treats uppercase and lowercase digits as equivalent but recommends uppercase.
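
The repeated-encoding effect is easy to reproduce with the native encodeURIComponent function. The sketch below also includes a simple check that every % is followed by two hexadecimal digits; hasValidTriplets is a made-up name, not something the tool provides.

```javascript
// Encoding an already-encoded string encodes each % again as %25.
var once = encodeURIComponent("%");     // "%25"
var twice = encodeURIComponent(once);   // "%2525"
var thrice = encodeURIComponent(twice); // "%252525"
console.log(once, twice, thrice);

// Hypothetical check: every % must be followed by two hex digits.
function hasValidTriplets(text) {
	return !/%(?![0-9A-Fa-f]{2})/.test(text);
}

console.log(hasValidTriplets("100%25")); // true
console.log(hasValidTriplets("100%"));   // false
console.log(hasValidTriplets("%zz"));    // false
```

A check like this is worth running before decoding untrusted input, since decodeURIComponent throws a URIError when it meets a stray %.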

Do I need to encode reserved characters?

If you do not URL encode reserved characters that are meant as data in the query part (or other parts) of a URL, you may produce a malformed URL, or one that is parsed differently from what you intended.
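
Here is a short sketch of what can go wrong, using the standard URLSearchParams API to parse a query string. The parameter name and value are invented for the example.

```javascript
var value = "Tom&Jerry";

// Unencoded: the & is read as a parameter separator.
var broken = new URLSearchParams("name=" + value);
console.log(broken.get("name"));  // Tom
console.log(broken.has("Jerry")); // true

// Encoded: the & survives as part of the value.
var safe = new URLSearchParams("name=" + encodeURIComponent(value));
console.log(safe.get("name"));    // Tom&Jerry
```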

Notes

This tool uses UTF-8.

Spaces are often represented in URLs with a + sign. This tool offers the option to encode spaces as pluses and to decode pluses back into spaces. If you do not select this option, spaces will be encoded to and decoded from %20 as normal.

Hexadecimal characters within triplets should be uppercase as per the spec.
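
The notes on pluses and uppercase hex can be seen in a few lines of JavaScript. This is only a sketch: URLSearchParams is a standard API that follows the plus-for-space convention, while uppercaseTriplets is a hypothetical helper written for this example.

```javascript
// URLSearchParams uses the form-encoding convention: space <-> "+".
console.log(new URLSearchParams({ q: "a b" }).toString()); // q=a+b
console.log(new URLSearchParams("q=a+b").get("q"));        // a b

// The generic functions use %20 and leave "+" alone.
console.log(encodeURIComponent("a b")); // a%20b
console.log(decodeURIComponent("a+b")); // a+b

// Hypothetical normalizer: uppercase the hex digits of every triplet.
function uppercaseTriplets(text) {
	return text.replace(/%[0-9a-f]{2}/gi, function (m) {
		return m.toUpperCase();
	});
}
console.log(uppercaseTriplets("%e4%bd%a0")); // %E4%BD%A0
```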

Code

This tool was written in JavaScript. In the script below, the output of the native JavaScript functions is adjusted to comply with RFC 3986. These functions also give you the option to encode reserved characters and to represent spaces as pluses.

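function url_encode(text, reserved, plus) {
	var tmp;
	if (reserved === true) {
		/* Encode reserved characters. encodeURIComponent leaves ! ' ( ) *
		   untouched, so convert those to uppercase triplets by hand
		   (this avoids the deprecated escape function) */
		tmp = encodeURIComponent(text).replace(/[!'()*]/g, function (c) {
			return "%" + c.charCodeAt(0).toString(16).toUpperCase();
		});
	} else {
		/* Basic encode. encodeURI escapes [ and ], which are reserved
		   characters, so restore them */
		tmp = encodeURI(text).replace(/%5B/g, "[").replace(/%5D/g, "]");
	}
	/* Optionally represent spaces as pluses */
	if (plus === true) {
		tmp = tmp.replace(/%20/g, "+");
	}
	return tmp;
}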

function url_decode(text, reserved, plus) {
	if (reserved === true) {
		/* Must convert the plus first, before %2B is decoded into a literal + */
		if (plus === true) {
			text = text.replace(/\+/g, " ");
		}
		/* Decode everything, including reserved characters */
		return decodeURIComponent(text);
	}
	/* Basic decode (apply fixes): decodeURI would decode the triplets for
	   [ ] * ! ' ( ), so escape their % as %25 first to keep them encoded.
	   The i flag also catches lowercase triplets such as %5b */
	text = text.replace(/%(5B|5D|2A|21|27|28|29)/gi, "%25$1");
	var tmp = decodeURI(text);
	if (plus === true) {
		tmp = tmp.replace(/\+/g, " ");
	}
	return tmp;
}
