If you ever participated in some forum or message board , then you know what’s BBCode.
What’s BBCode ?
BBCode is short for Bulletin Board Code and used to format posts in forums or message boards as a fast way to achieve formating without the need to add any complex HTML code beside allowing safer posts (no JS code for example). BBCode uses tags indicated by rectangular brackets surrounding a keyword. At the time of displaying as part of a webpage , those tags are converted to HTML.
A simple example of BBCode would be to make some text bold , this can be simply written like that [b]Bold Text[/b] or italic [i]italic text[/i]. In HTML , this can be done by font tags (deprecated in newer HTML versions) or CSS text properties. I tell you what! I’m writing this blog and I’m even using some BBCode tags to format text.
Case defined
One of our clients had a database that contained forum posts stored inside the database and the rows included various BBCode tags. They decided to migrate to a new platform and it was required that the BBCode stored to be migrated to its equivalent HTML ones.
The requirement was to export the data to flatfiles and the code to be migrated on the fly. Converting them at the App side wasn’t an option for the client due to the limitations of the destination platform.
Talk the talk
We determined that we need to commit the following actions :
1- Determine the BBcode tags that are stored in the table(s). This was an easy task since they had a list of supported BBCode tags.
Although not exactly the list that we had , the following page lists some of the famous BBCode operators
2- Figure out the equivalent HTML for each BBCode.
Some of the tags were pretty simple HTML code such as Bold , Italic formatting so it was easy to figure it out. Some were subject to the client direction since they have specific platform; examples were images and video posts.
We conducted some research on the different tags and made a list of the BBCode tags and their “equivalent” HTML ones.
3- After that , we needed to find out a way to convert the BBCode data to HTML on the fly while writing the output to flatfiles.
Walk the walk
Because T-SQL is a bit limited when it comes to string manipulations compared to .NET , we opted to use CLR to use .Net regular expressions and take advantage of its abilities to fast parse large amounts of text to find specific character patterns.
I’m not going to dip which is better, TSQL or CLR , to use for different tasks but it is generally accepted that CLR excels for procedural, computational work ; for data access the easier and direct T-SQL wins. The following whitepaper, Using CLR Integration in SQL Server 2005, provides more info about that.
Now we have the BBCode tag , the equivalent HTML and determined that CLR is the way to , we just need to code!
I used Visual C# express 2010 to create a class DLL so that I can use later as a CLR function. The code uses namespace System.Text.RegularExpressions for regular expression functionality.
Parts of the code I saw on some forums online, can’t recall exactly where but I wrote more to fulfill the need.
The following is the main function containing regular expressions definitions. The Whole code file is attached as a CShap file.
static Converter_bbcode() { try { BBCode_parser.Add(new RegexFormatter(@"", string.Empty)); // Font BBCode_parser.Add(new RegexFormatter(@"\[b(?:\s*)\]((.|\n)*?)\[/b(?:\s*)\]", "<strong>$1</strong>")); BBCode_parser.Add(new RegexFormatter(@"\[bold(?:\s*)\]((.|\n)*?)\[/bold(?:\s*)\]", "<strong>$1</strong>")); BBCode_parser.Add(new RegexFormatter(@"\[i(?:\s*)\]((.|\n)*?)\[/i(?:\s*)\]", "<span style="font-style: italic">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[italic(?:\s*)\]((.|\n)*?)\[/italic(?:\s*)\]", "<span style="font-style: italic">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[u(?:\s*)\]((.|\n)*?)\[/u(?:\s*)\]", "<span style="text-decoration: underline">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[underline(?:\s*)\]((.|\n)*?)\[/underline(?:\s*)\]", "<span style="text-decoration: underline">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[s(?:\s*)\]((.|\s)*?)\[/s(?:\s*)\]", "<span style="text-decoration: line-through">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[strike(?:\s*)\]((.|\s)*?)\[/strike(?:\s*)\]", "<span style="text-decoration: line-through">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[size=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/size(?:\s*)\]", "<font>$3</font>")); //color BBCode_parser.Add(new RegexFormatter(@"\[color=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/color(?:\s*)\]", "<span style="color:$1">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[background=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/background(?:\s*)\]", "<span style="background-color:$1">$3</span>")); //highlight BBCode_parser.Add(new RegexFormatter(@"\[highlight(?:\s*)\]((.|\n)*?)\[/highlight(?:\s*)\]", "<span style="background-color:yellow">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[highlight=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/highlight(?:\s*)\]", "<span style="background-color:$1">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[hl(?:\s*)\]((.|\n)*?)\[/hl(?:\s*)\]", "<span style="background-color:yellow">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[hl=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/hl(?:\s*)\]", "<span style="background-color:$1">$3</span>")); //blink BBCode_parser.Add(new RegexFormatter(@"\[blink(?:\s*)\]((.|\n)*?)\[/blink(?:\s*)\]", "<span style="text-decoration: blink">$1</span>")); //Align BBCode_parser.Add(new RegexFormatter(@"\[left(?:\s*)\]((.|\n)*?)\[/left(?:\s*)]", "<div style="text-align:left">$1</div>")); BBCode_parser.Add(new RegexFormatter(@"\[center(?:\s*)\]((.|\n)*?)\[/center(?:\s*)]", "<div style="text-align:center">$1</div>")); BBCode_parser.Add(new RegexFormatter(@"\[right(?:\s*)\]((.|\n)*?)\[/right(?:\s*)]", "<div style="text-align:right">$1</div>")); //URL BBCode_parser.Add(new RegexFormatter(@"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]", "<a href="https://www.$1" title="$1">$1</a>")); BBCode_parser.Add(new RegexFormatter(@"\[url(?:\s*)\]((.|\n)*?)\[/url(?:\s*)\]", "<a href="$1" title="$1">$1</a>")); BBCode_parser.Add(new RegexFormatter("\\[url=(?:\"|"|")((.|\\n)*?)(?:\\s*)(?:\"|"|")\\]((.|\\n)*?)\\[/url(?:\\s*)\\]", "<a href="$1" title="$1">$3</a>")); BBCode_parser.Add(new RegexFormatter(@"\[url=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/url(?:\s*)\]", "<a href="$1" title="$1">$3</a>")); //Email BBCode_parser.Add(new RegexFormatter(@"\[email(?:\s*)\]((.|\n)*?)\[/email(?:\s*)\]", "<a href="mailto:$1">$1</a>")); BBCode_parser.Add(new RegexFormatter(@"\[email=""((.|\n)*?)(?:\s*)""\]((.|\n)*?)\[/email(?:\s*)\]", "<a href="mailto:$1" title="$3">$3</a>")); BBCode_parser.Add(new RegexFormatter(@"\[email=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/email(?:\s*)\]", "<a href="mailto:$1" title="$3">$3</a>")); //Image BBCode_parser.Add(new RegexFormatter(@"\[img(?:\s*)\]((.|\n)*?)\[/img(?:\s*)\]", "<img src="$1" alt="image" class="bbcode-image" />")); BBCode_parser.Add(new RegexFormatter(@"\[img align=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/img(?:\s*)\]", "<img src="$3" align="$1" alt="image" class="bbcode-image" />")); BBCode_parser.Add(new RegexFormatter(@"\[img=((.|\n)*?),((.|\n)*?),((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/img(?:\s*)\]", "<img width="$1" height="$3" align="$5" src="$7" alt="image" class="bbcode-image" />")); BBCode_parser.Add(new RegexFormatter(@"\[img=((.|\n)*?),((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/img(?:\s*)\]", "<img width="$1" height="$3" src="$5" alt="image" class="bbcode-image" />")); BBCode_parser.Add(new RegexFormatter(@"\[image=((.|\n)*?)(?:\s*)\]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" alt="image" class="bbcode-image" /></a></div>")); BBCode_parser.Add(new RegexFormatter(@"\[imagecenter=((.|\n)*?)(?:\s*)\]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="center" alt="image" class="bbcode-image" /></a></div>")); BBCode_parser.Add(new RegexFormatter(@"\[imageleft=((.|\n)*?)(?:\s*)\]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="left" alt="image" class="bbcode-image" /></a></div>")); BBCode_parser.Add(new RegexFormatter(@"\[imageright=((.|\n)*?)(?:\s*)\]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="right" alt="image" class="bbcode-image" /></a></div>")); BBCode_parser.Add(new RegexFormatter(@"\[imagelink src=((.|\n)*?)url='((.|\n)*?)\’]((.|\n)*?)\[/imagelink(?:\s*)\]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$3" /><img src="$3" alt="image" class="bbcode-image" /></a></div>")); //Flash BBCode_parser.Add(new RegexFormatter(@"\[flash(?:\s*)\]((.|\n)*?)\[/flash(?:\s*)\]", "")); BBCode_parser.Add(new RegexFormatter(@"\[flash=((.|\n)*?),((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/flash(?:\s*)\]", "")); BBCode_parser.Add(new RegexFormatter(@"\[\*(?:\s*)]\s*([^\[]*)", "<li>$1</li>")); //List BBCode_parser.Add(new RegexFormatter(@"\[list(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", "<ul class="bbcode-list">$1</ul>")); BBCode_parser.Add(new RegexFormatter(@"\[list=1(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false)); BBCode_parser.Add(new RegexFormatter(@"\[list=i(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[list=I(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ol(?:\s*)\]((.|\n)*?)\[/ol(?:\s*)\]", "<ul class="bbcode-list">$1</ul>")); BBCode_parser.Add(new RegexFormatter(@"\[ol=1(?:\s*)\]((.|\n)*?)\[/ol(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ol=i(?:\s*)\]((.|\n)*?)\[/ol(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ol=I(?:\s*)\]((.|\n)*?)\[/ol(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ol(?:\s*)\]((.|\n)*?)\[/ol(?:\s*)\]", "<ol>$1</ol>")); BBCode_parser.Add(new RegexFormatter(@"\[dir(?:\s*)\]((.|\n)*?)\[/dir(?:\s*)\]", "$1")); BBCode_parser.Add(new RegexFormatter(@"\[li(?:\s*)\]((.|\n)*?)\[/li(?:\s*)\]", "<li>$1</li>")); BBCode_parser.Add(new RegexFormatter(@"\[ul(?:\s*)\]((.|\n)*?)\[/ul(?:\s*)\]", "<ul class="bbcode-list">$1</ul>")); BBCode_parser.Add(new RegexFormatter(@"\[ul=1(?:\s*)\]((.|\n)*?)\[/ul(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ul=i(?:\s*)\]((.|\n)*?)\[/ul(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[ul=I(?:\s*)\]((.|\n)*?)\[/ul(?:\s*)\]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false)); BBCode_parser.Add(new RegexFormatter(@"\[list=a(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", "<span>$1</span>", false)); BBCode_parser.Add(new RegexFormatter(@"\[list=A(?:\s*)\]((.|\n)*?)\[/list(?:\s*)\]", "<span>$1</span>", false)); BBCode_parser.Add(new SearchReplaceFormatter("\r", "")); BBCode_parser.Add(new SearchReplaceFormatter("\n\n", "<br /><br />")); BBCode_parser.Add(new SearchReplaceFormatter("\n", "<br />")); //code BBCode_parser.Add(new RegexFormatter(@"\[/code]((.|\n)*?)\[/code(?:\s*)]", "<div class="bbc-codetitle">Code:</div><div class="bbc-codecontent"><pre>$1</pre></div>")); BBCode_parser.Add(new RegexFormatter(@"\[php language="(?:s*)"][/php][/php]((.|\n)*?)\[/php(?:\s*)]", "<div class="bbc-codetitle">PHP Code:</div><div class="bbc-codecontent"><pre>$1</pre></div>")); //quote BBCode_parser.Add(new RegexFormatter(@"\[quote=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/quote(?:\s*)\]", "<strong>$1</strong> said: <blockquote>$3</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[quote=((.|\n)*?)(?:\s*)\]", "<blockquote>$1</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[quote member=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/quote(?:\s*)\]", "<strong>$1</strong> said: <blockquote>$3</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[quote(?:\s*)\]((.|\n)*?)\[/quote(?:\s*)]", "<blockquote>$1</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[blockquote=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/blockquote(?:\s*)\]", "<strong>$1</strong> said: <blockquote>$3</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[blockquote=((.|\n)*?)(?:\s*)\]", "<blockquote>$1</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[blockquote(?:\s*)\]((.|\n)*?)\[/blockquote(?:\s*)]", "<blockquote>$1</blockquote>")); BBCode_parser.Add(new RegexFormatter(@"\[spoiler(?:\s*)\]((.|\n)*?)\[/spoiler(?:\s*)]", "<span class="bbc-spoiler">$1</span>")); BBCode_parser.Add(new RegexFormatter(@"\[indent(?:\s*)\]((.|\n)*?)\[/indent(?:\s*)]", "<div class="bbc-indent">$1</div>")); BBCode_parser.Add(new RegexFormatter(@"\[hr(?:\s*)\]\[/hr(?:\s*)]", "<hr />")); BBCode_parser.Add(new RegexFormatter(@"\[hr(?:\s*)\]", "<hr />")); BBCode_parser.Add(new RegexFormatter(@"\[rule=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/rule(?:\s*)\]", "<div style="height: 0pt;border-top: 1px solid $3;margin: auto;width: $1"></div>")); //Styles BBCode_parser.Add(new RegexFormatter(@"\[size=+((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/size(?:\s*)\]", "<span style="font-size:$1em">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[size=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/size(?:\s*)\]", "<span style="font-size:$1">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[font=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/font(?:\s*)\]", "<span style="font-family:$1">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[align=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/align(?:\s*)\]", "<span style="text-align:$1">$3</span>")); BBCode_parser.Add(new RegexFormatter(@"\[float=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/float(?:\s*)\]", "<span style="float:$1">$3</div>")); BBCode_parser.Add(new RegexFormatter(@"\[\*(?:\s*)]\s*([^\[]*)", "<li>$1</li>")); //LINK BBCode_parser.Add(new RegexFormatter(@"\[link(?:\s*)\]((.|\n)*?)\[/link(?:\s*)\]", "<a href="$1" target="_blank" title="$1">$1</a>")); BBCode_parser.Add(new RegexFormatter(@"\[link=((.|\n)*?)(?:\s*)\]((.|\n)*?)\[/link(?:\s*)\]", "<a href="$1" target="_blank" title="$1">$3</a>")); BBCode_parser.Add(new RegexFormatter(@"\[link=((.|\n)*?)(?:\s*)\]", "<a href="$1" target="_blank" title="$1">$1</a>")); //Youtube BBCode_parser.Add(new RegexFormatter(@"\[youtube(?:\s*)\]((.|\n)*?)\[/youtube(?:\s*)\]", "")); } catch (Exception) { return; } }
There are two important requirements when building the code:
1- .Net framework version
You need to build the dll to work with SQL server CLR base; VS 2010 , by default, builds projects against .net framework 4.0 which is NOT supported even with SQL server 2008 R2
SQL Server 2008 & 2008 R2 still load 2.0 CLR ;SQL Server 2008 uses .NET Framework 3.5 SP1 which is just an update of framework 2.0 base with some new assemblies.
You can verify the CLR version by running this code
SELECT * FROM sys.dm_clr_properties
Result on my computer which is SQL server 2008 R2
name—————-value
———————————————————————————–
directory—————-C:\Windows\Microsoft.NET\Framework64\v2.0.50727\
version—————- v2.0.50727
state—————- CLR is initialized
If you build the DLL against .NET framework 4.0 and try to load into SQL server, you’ll get this error
Msg 6257, Level 16, State 1, Line 1
CREATE ASSEMBLY for assembly ‘bbcode_parser’ failed because the assembly is built for an unsupported version of the Common Language Runtime.
If you are using IDE to build the DLL then you can define the target framework from project properties, application tab as in the screenshot here
If you are compiling the code using C# compiler , then having only .net 3.5 should be enough; all you want to do is to browse to Microsoft.NET\Framework\ folder under the system directory to compile the code.
Microsoft Windows [Version 6.1.7601] Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Windows\Microsoft.NET\Framework\v3.5>csc /out:d:\temp\bbcode_parser.dll /target:library d:\temp\code.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.5420
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.
2- Assembly access permission
Depending on the assembly role, we specify its access permissions. Usually SAFE access permission should be enough if the assembly does NOT access external system resources like in our case. SAFE is the most restrictive permission set; code executed by an assembly with SAFE permissions cannot access external system resources such as files, the network, environment variables, or the registry.
I have a blog where I had to sign the assembly because it needed access to external resources (Web resources) , we don’t need that here. More info about creating an assembly.
Just for information
IDE: You can sign the assembly from project properties > signing tab
C# compiler : you’ll need to specify key file using parameter /keyfile
You’ll need first to create a key file using Strong Name Tool (Sn.exe)
Using the assembly inside SQL server
With the DLL in hand , all we do is to create an assembly inside SQL server , build a CLR function on top of it and call the function.
1- Ensure CLR is enabled
Exec master..Sp_configure ‘clr enabled’, 1 Go Reconfigure Go
2- Create the assembly with access permission=SAFE
CREATE assembly bbcode_parser FROM ‘d:\temp\bbcode_parser.dll’ WITH PERMISSION_SET=SAFE GO
3-Create the CLR function
CREATE function bbcode_parser (@bbcode NVARCHAR(MAX),@scrubHtml bit) returns nvarchar(max) AS external name bbcode_parser.Converter_bbcode.BBCode_To_HTML_Convert ; Go
Testing the function
We can test the function by feeding it plain BBCode. Here are some examples and the result.
Browser should render the effects of the real output
set nocount on GO Select dbo.bbcode_parser (‘[hl]Text with yellow background[/hl]’,0) Go Select dbo.bbcode_parser (‘[u] Underline text[/u]’,0) Go Select dbo.bbcode_parser (‘[link]https://www.pythian.com[/link]’,0) Go select dbo.bbcode_parser (‘[ol][li]{point1}[/li] [li]{point2}[/li][/ol]’,0)
Text with yellow background
Underline text
- {point1}
- {point2}
I hope that’s helpful for any of you and I apologize if it is a bit long, just trying to make it clear
Cheers
2 Comments. Leave new
[…] SQL Server expert Muhammad Mawla illuminates the BBCode corner with respect to […]
how ‘s Convert HTML to BBcode