MySQL: Replacing URL Escape Sequences

Posted in: MySQL, Technical Track

So you want to store URLs in MySQL, and the URLs have those annoying %20%27%7C%26%5E%2B%2D%25 symbols? And you want to be able to show your users some kind of human-readable information. You might want to consider using this trick. Take this list of commonly escaped characters as an example:

%20 - space
%27 - '
%7C - |
%26 - &
%5E - ^
%2B - +
%2D - -
%25 - %

So, how about we do some search’n’replace on that?

mysql> SET @url:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @url as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have 
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat 
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% | 
+--------------------------+----------+
1 row in set (0.01 sec)

mysql>

We can easily turn this into a function:

mysql> CREATE DATABASE foo;
Query OK, 1 row affected (0.03 sec)

mysql> USE foo;
Database changed
mysql> CREATE FUNCTION url_replace 
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1024)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER ;

mysql> SELECT url_replace('%20%27%7C%26%5E%2B%2D%25') as bar;
+----------+
| bar      |
+----------+
|  '|&^+-% | 
+----------+
1 row in set (0.00 sec)

mysql>

I hope this is useful to you guys. Cheers!

Useful links:

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

5 Comments. Leave new

Hi!

It’s a nice try, but I don’t think this covers all cases. Lookhere (rfc 1738):

“…octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits…

and


Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

So, this basically says that *any* character may be encoded, and that some *must* be encoded. So in order to do it right, you actually have to decode every occurrence of %HH.

In addition, (can’t find which RFC that is), the space character (ascii 32) maybe encoded as +

As far as I can see the only way to do this reliably and correct is to loop over all characters. Ugly, slow…but correct. So I guess I would recommend everybody to deal with this in the application before putting it into the db. That, or write a UDF (for performance reasons).

kind regards,
Roland Bouman

Reply

@Roland,

Thanks for your feedback! Indeed, replacing every occurrence of %HH is a requirement for full compliance (as well as parsing correctly the space character). This snippet is an (incomplete) example.

Cheers!

Reply

Yes not complete. The example as given doesn’t parse in MySQL.

Reply
walmley web design
February 1, 2011 8:34 am

I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:

REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”), “something else we are looking for”, “replace second instance”)

This is the syntax I used to detect a boolean value:

REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)

Hope this helps!

Reply

It is a bit heavy. In my case, I put a REPLACE() within a CONCAT_WS() so I can browse all my field within a same space. One level of REPLACE() works fine but as soon as I add a second REPLACE(), it crash. could it be posssible to do a PHP function to use within the CONCAT_WS() to works like a REPLACE() ?

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *