Stripping HTML From A String In ASP Visual Basic .NET

17 January, 2008 | VB

Many web languages have built-in functions to strip out HTML, but Microsoft’s Visual Basic language lacks this functionality. To that end, here are a couple of functions that take a string as input and return it with the HTML tags removed.

The first example uses a regular expression to find anything between a < and a > and replaces it with nothing. It then converts any remaining lone < and > characters to their HTML entities, &lt; and &gt;.

Function stripHTML(strHTML)
'Strips the HTML tags from strHTML
 
  Dim objRegExp, strOutput
  Set objRegExp = New RegExp
 
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "<(.|\n)+?>"
 
  'Replace all HTML tag matches with the empty string
  strOutput = objRegExp.Replace(strHTML, "")
 
  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
 
  stripHTML = strOutput 'Return the value of strOutput
 
  Set objRegExp = Nothing
End Function
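
To see what the pattern above actually matches, here is a minimal sketch that applies only the tag-removal step. RemoveTags is a hypothetical helper name, not part of the original function, and the snippet assumes it is run under Windows Script Host (cscript.exe); in an ASP page you would use Response.Write instead of WScript.Echo.

```vbscript
' Hypothetical helper (not from the original post): applies only the
' tag-matching pattern used by stripHTML so its effect can be inspected.
Function RemoveTags(strInput)
  Dim objRegExp
  Set objRegExp = New RegExp

  objRegExp.Global = True
  ' +? is lazy: each match stops at the first > that follows the <
  objRegExp.Pattern = "<(.|\n)+?>"

  RemoveTags = objRegExp.Replace(strInput, "")
  Set objRegExp = Nothing
End Function

' Assumes Windows Script Host; swap in Response.Write for ASP.
WScript.Echo RemoveTags("<p>Hello <b>world</b></p>")  ' Hello world
WScript.Echo RemoveTags("1 < 2")                      ' 1 < 2  (no closing >, so nothing matches)
WScript.Echo RemoveTags("1 < 2 <b>x</b>")             ' 1 x    (the stray < swallows "< 2 <b>")
```

The last call shows the main weakness of this approach: a lone < that appears before a real tag is treated as the start of that tag, so the text between them is lost.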

The next function is a more involved version that uses Split and Join. It is slightly more processor intensive, but it yields more consistent results than the previous method, for example when the text contains a stray < that is not part of a tag. The idea is to pull the string apart at each < character, strip the tag from the start of each piece, and then stick the pieces back together again.

Function stripHTML(strHTML)
  'Strips the HTML tags from strHTML using split and join
 
  'Ensure that strHTML contains something
  If len(strHTML) = 0 then
    stripHTML = strHTML
    Exit Function
  End If
 
  Dim arysplit, i, j, strOutput
 
  arysplit = split(strHTML, "<")
 
  'If the string does not start with <, the first piece is plain
  'text, so we start iterating from the 2nd array position
  if len(arysplit(0)) > 0 then j = 1 else j = 0
 
  'Loop through each instance of the array
  for i=j to ubound(arysplit)
    'Do we find a matching > sign?
    if instr(arysplit(i), ">") then
      'If so, snip out all the text between the start of the string
      'and the > sign
      arysplit(i) = mid(arysplit(i), instr(arysplit(i), ">") + 1)
    else
      'Ah, the < was nonmatching, so put it back
      arysplit(i) = "<" & arysplit(i)
    end if
  next
 
  'Rejoin the array into a single string
  strOutput = join(arysplit, "")
 
  'Snip out the leading < added to the empty first piece (only when j = 0)
  strOutput = mid(strOutput, 2-j)
 
  'Convert < and > to &lt; and &gt;
  strOutput = replace(strOutput,">","&gt;")
  strOutput = replace(strOutput,"<","&lt;")
 
  stripHTML = strOutput
End Function
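
The split and join version is easier to follow once you can see the pieces that Split produces. The sketch below is only an illustration: DescribeSplit is a hypothetical helper name, and the snippet assumes Windows Script Host (cscript.exe) for output.

```vbscript
' Hypothetical helper (not from the original post): shows each piece
' that Split(strInput, "<") returns, wrapped in square brackets.
Function DescribeSplit(strInput)
  Dim aryPieces, i, strOut
  aryPieces = Split(strInput, "<")
  strOut = ""

  For i = 0 To UBound(aryPieces)
    strOut = strOut & "[" & aryPieces(i) & "]"
  Next

  DescribeSplit = strOut
End Function

' Assumes Windows Script Host; swap in Response.Write for ASP.
WScript.Echo DescribeSplit("Hi <b>there</b>")  ' [Hi ][b>there][/b>]
WScript.Echo DescribeSplit("<p>x</p>")         ' [][p>x][/p>]
WScript.Echo DescribeSplit("1 < 2 <b>x</b>")   ' [1 ][ 2 ][b>x][/b>]
```

Every piece after the first began with a < in the original string. If a piece contains a >, everything up to and including it is the tag and gets cut off; if it does not, the < was a stray character and is put back. When the input starts with a tag, the first piece is empty, which is why the function tracks j and trims one character at the end. In the third example the piece " 2 " has no >, so the split and join function keeps "< 2 " instead of discarding it.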