Stripping HTML From A String In ASP (VBScript)
Many web languages have built-in functions for stripping out HTML, but VBScript, the Visual Basic dialect used in classic ASP pages, has no equivalent. Here are two functions that take a string as input and return it with all HTML tags removed.
The first function uses a regular expression to find anything between a < and the next > and replaces it with an empty string. Any lone < or > characters that remain are then replaced with their HTML entities, &lt; and &gt;.
Function stripHTML(strHTML)
    'Strips the HTML tags from strHTML
    Dim objRegExp, strOutput
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<(.|\n)+?>"
    'Replace all HTML tag matches with the empty string
    strOutput = objRegExp.Replace(strHTML, "")
    'Replace any remaining < and > with &lt; and &gt;
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, ">", "&gt;")
    stripHTML = strOutput 'Return the value of strOutput
    Set objRegExp = Nothing
End Function
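To see what the pattern <(.|\n)+?> actually matches, here is a small sketch that applies the same RegExp settings to two sample strings. The helper name RemoveTags and the sample strings are illustrative only, and the script assumes it is run under Windows Script Host with cscript (in an ASP page you would use Response.Write instead of WScript.Echo). The (.|\n) alternation is needed because . in VBScript regular expressions does not match a newline, so a tag that wraps across lines would otherwise survive. The second sample shows a limitation: the lazy match starts at the first < it finds, so a stray < in ordinary text pairs with the > of the next real tag and swallows the text in between.

```vbscript
' Hypothetical helper: same pattern and flags as stripHTML, but without
' the &lt;/&gt; encoding step, so the raw effect of the regex is visible.
Function RemoveTags(strText)
    Dim objRE
    Set objRE = New RegExp
    objRE.IgnoreCase = True
    objRE.Global = True
    objRE.Pattern = "<(.|\n)+?>"
    RemoveTags = objRE.Replace(strText, "")
    Set objRE = Nothing
End Function

Dim sMultiLine, sStray
' A tag split across two lines is still removed, thanks to (.|\n)
sMultiLine = "<a" & vbLf & "href=""page.asp"">link</a>"
' The stray < matches up to the > of <i>, so " b " is lost
sStray = "a < b <i>c</i>"

WScript.Echo RemoveTags(sMultiLine)   ' link
WScript.Echo RemoveTags(sStray)       ' a c
```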
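The second function is a more involved alternative that uses Split and Join instead of a regular expression (both functions are named stripHTML, so use one or the other). It is slightly more processor-intensive but gives more consistent results. It splits the string on every < character, removes everything up to the first > in each piece, puts the < back on any piece that has no >, and then joins the pieces back together. Because each > is paired with the nearest < before it, a stray < in ordinary text (as in "a < b <i>c</i>") is kept and encoded instead of swallowing everything up to the next tag.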
Function stripHTML(strHTML)
    'Strips the HTML tags from strHTML using split and join
    'Ensure that strHTML contains something
    If Len(strHTML) = 0 Then
        stripHTML = strHTML
        Exit Function
    End If
    Dim arysplit, i, j, strOutput
    arysplit = Split(strHTML, "<")
    'Element 0 holds the text before the first <. If that text is
    'non-empty, skip it and start at element 1 (j = 1). If the string
    'begins with a tag, element 0 is empty: start at 0 (j = 0); the
    'extra < this adds is snipped off after the join
    If Len(arysplit(0)) > 0 Then j = 1 Else j = 0
    'Loop through each remaining element of the array
    For i = j To UBound(arysplit)
        'Do we find a matching > sign?
        If InStr(arysplit(i), ">") > 0 Then
            'If so, snip out all the text between the start of the string
            'and the > sign
            arysplit(i) = Mid(arysplit(i), InStr(arysplit(i), ">") + 1)
        Else
            'No matching >, so the < was literal text: put it back
            arysplit(i) = "<" & arysplit(i)
        End If
    Next
    'Rejoin the array into a single string
    strOutput = Join(arysplit, "")
    'Snip out the leading < added when j = 0 (Mid from 1 changes nothing)
    strOutput = Mid(strOutput, 2 - j)
    'Convert any remaining > and < to &gt; and &lt;
    strOutput = Replace(strOutput, ">", "&gt;")
    strOutput = Replace(strOutput, "<", "&lt;")
    stripHTML = strOutput
End Function
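The split-and-join loop can be written a little more compactly. The sketch below is a hypothetical variant, not the original function: the name StripTagsSplit is made up for illustration, and it assumes it is run under Windows Script Host with cscript. Because element 0 of the Split result is always the text that precedes the first < (possibly an empty string), the loop can simply start at index 1, which removes the need for the j flag and the final Mid call. It also relies on Split returning an empty array for an empty string, so the separate length check is not needed. The second sample, with a stray < before a real tag, shows where this approach differs from the regular-expression version: the stray < is kept and encoded.

```vbscript
' Hypothetical simplified variant of the split-and-join stripHTML.
Function StripTagsSplit(strHTML)
    Dim arySplit, i, pos, strOutput
    ' Element 0 is the text before the first "<", so it never needs trimming
    arySplit = Split(strHTML, "<")
    For i = 1 To UBound(arySplit)
        pos = InStr(arySplit(i), ">")
        If pos > 0 Then
            ' Drop the tag body: everything up to and including the first >
            arySplit(i) = Mid(arySplit(i), pos + 1)
        Else
            ' No closing > in this piece, so the < was literal text
            arySplit(i) = "<" & arySplit(i)
        End If
    Next
    strOutput = Join(arySplit, "")
    ' Encode any literal angle brackets that survived
    strOutput = Replace(strOutput, ">", "&gt;")
    strOutput = Replace(strOutput, "<", "&lt;")
    StripTagsSplit = strOutput
End Function

WScript.Echo StripTagsSplit("<p>Hello <b>world</b></p>")   ' Hello world
WScript.Echo StripTagsSplit("a < b <i>c</i>")              ' a &lt; b c
```

Starting at index 1 gives the same output as the j-based loop: when the input begins with a tag, the original adds a < to the empty element 0 and then strips it off again with Mid, which this variant simply never does.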