So you want to store URLs in MySQL, and the URLs have those annoying %20%27%7C%26%5E%2B%2D%25
symbols? And you want to be able to show your users some kind of human-readable information. You might want to consider using this trick. Take this list of commonly escaped characters as an example:
%20 - space %27 - ' %7C - | %26 - & %5E - ^ %2B - + %2D - - %25 - %
So, how about we do some search’n’replace on that?
mysql> SET @url:='%20%27%7C%26%5E%2B%2D%25'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT @url as original, -> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( -> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( -> @test,'%20',' '), -> '%27','\''), -> '%7C','|'), -- REPLACE() is case sensitive -> '%7c','|'), -- so we have -> '%26','&'), -> '%5E','^'), -> '%5e','^'), -- to repeat -> '%2D','-'), -> '%2d','-'), -- ourselves -> '%2B','+'), -> '%2b','+'), -- sometimes -> '%25','%') as replaced; +--------------------------+----------+ | original | replaced | +--------------------------+----------+ | %20%27%7C%26%5E%2B%2D%25 | '|&^+-% | +--------------------------+----------+ 1 row in set (0.01 sec) mysql>
We can easily turn this into a function:
mysql> CREATE DATABASE foo; Query OK, 1 row affected (0.03 sec) mysql> USE foo; Database changed mysql> CREATE FUNCTION url_replace -> (url VARCHAR(1024)) -> RETURNS VARCHAR(1024) -> BEGIN -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( -> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( -> @test,'%20',' '), -> '%27','\''), -> '%7C','|'), -> '%7c','|'), -> '%26','&'), -> '%5E','^'), -> '%5e','^'), -> '%2D','-'), -> '%2d','-'), -> '%2B','+'), -> '%2b','+'), -> '%25','%'); -> END$$ Query OK, 0 rows affected (0.02 sec) mysql> DELIMITER ; mysql> SELECT url_replace('%20%27%7C%26%5E%2B%2D%25') as bar; +----------+ | bar | +----------+ | '|&^+-% | +----------+ 1 row in set (0.00 sec) mysql>
I hope this is useful to you guys. Cheers!
Useful links:
5 Comments. Leave new
Hi!
It’s a nice try, but I don’t think this covers all cases. Lookhere (rfc 1738):
“…octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits…
and
”
Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
”
So, this basically says that *any* character may be encoded, and that some *must* be encoded. So in order to do it right, you actually have to decode every occurrence of %HH.
In addition, (can’t find which RFC that is), the space character (ascii 32) maybe encoded as +
As far as I can see the only way to do this reliably and correct is to loop over all characters. Ugly, slow…but correct. So I guess I would recommend everybody to deal with this in the application before putting it into the db. That, or write a UDF (for performance reasons).
kind regards,
Roland Bouman
@Roland,
Thanks for your feedback! Indeed, replacing every occurrence of %HH is a requirement for full compliance (as well as parsing correctly the space character). This snippet is an (incomplete) example.
Cheers!
Yes not complete. The example as given doesn’t parse in MySQL.
I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:
REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”), “something else we are looking for”, “replace second instance”)
This is the syntax I used to detect a boolean value:
REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)
Hope this helps!
It is a bit heavy. In my case, I put a REPLACE() within a CONCAT_WS() so I can browse all my field within a same space. One level of REPLACE() works fine but as soon as I add a second REPLACE(), it crash. could it be posssible to do a PHP function to use within the CONCAT_WS() to works like a REPLACE() ?