Stripping HTML From A String In ASP Visual Basic .NET

January 17th, 2008

Many web languages have built-in functions to strip out HTML, but Microsoft’s Visual Basic language lacks this functionality. To that end, here are a couple of functions that take a string as input and return a string with no HTML tags in it.

The first example uses a regular expression to find anything between a < and a > and replaces it with nothing. It then replaces any < and > characters that are left over with their HTML entities, &lt; and &gt;.

Function stripHTML(strHTML)
  'Strips the HTML tags from strHTML
 
  Dim objRegExp, strOutput
  Set objRegExp = New RegExp
 
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  'Match a < followed, lazily, by one or more characters
  '(newlines included) up to the next >
  objRegExp.Pattern = "<(.|\n)+?>"
 
  'Replace all HTML tag matches with the empty string
  strOutput = objRegExp.Replace(strHTML, "")
 
  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
 
  stripHTML = strOutput 'Return the value of strOutput
 
  Set objRegExp = Nothing
End Function
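
As a quick check of how the pattern behaves, here is a minimal sketch that applies the same regular expression to a sample string containing a stray < character. It assumes the script is run from the command line with cscript under Windows Script Host, so it uses WScript.Echo for output (in an ASP page you would use Response.Write instead). The sample string is made up for illustration.

```vbscript
Option Explicit

Dim objRegExp, strSample, strOutput

'Hypothetical sample: one real tag pair plus a literal "<" in the text
strSample = "<b>1 < 2</b> and 3 > 2"

Set objRegExp = New RegExp
objRegExp.Global = True
objRegExp.Pattern = "<(.|\n)+?>"

'Same steps as the function above: remove matches, then encode leftovers
strOutput = objRegExp.Replace(strSample, "")
strOutput = Replace(strOutput, "<", "&lt;")
strOutput = Replace(strOutput, ">", "&gt;")

WScript.Echo strOutput

Set objRegExp = Nothing
```

Here the pattern first removes <b>, then treats everything from the stray < up to the next > (that is, "< 2</b>") as if it were a tag, so the 2 disappears and the script prints "1  and 3 &gt; 2". This is worth keeping in mind when the input may contain literal < characters.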

The next function is a more complicated version that uses the split and join method. It is slightly more processor-intensive, but it yields more consistent results than the previous method, because a < that is not followed by a > before the next < is treated as literal text and kept. The idea is that it first pulls the string apart at the < marks, cuts the tag text off the front of each piece, and then sticks the pieces back together again.

Function stripHTML(strHTML)
  'Strips the HTML tags from strHTML using split and join
 
  'Ensure that strHTML contains something
  If len(strHTML) = 0 then
    stripHTML = strHTML
    Exit Function
  End If
 
  Dim arysplit, i, j, strOutput
 
  arysplit = split(strHTML, "<")
 
  'If there is text before the first <, it sits in element 0 and holds
  'no tag, so start iterating from the 2nd array position
  if len(arysplit(0)) > 0 then j = 1 else j = 0
 
  'Loop through each instance of the array
  for i=j to ubound(arysplit)
    'Do we find a matching > sign?
    if instr(arysplit(i), ">") then
      'If so, snip out all the text between the start of the string
      'and the > sign
      arysplit(i) = mid(arysplit(i), instr(arysplit(i), ">") + 1)
    else
      'Ah, the < was nonmatching, so put it back
      arysplit(i) = "<" & arysplit(i)
    end if
  next
 
  'Rejoin the array into a single string
  strOutput = join(arysplit, "")
 
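  'If the loop started at index 0, the empty first element became a
  'lone <, so snip it out (when j = 1 this leaves the string unchanged)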
  strOutput = mid(strOutput, 2-j)
 
  'Convert < and > to &lt; and &gt;
  strOutput = replace(strOutput,">","&gt;")
  strOutput = replace(strOutput,"<","&lt;")
 
  stripHTML = strOutput
End Function
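
To make the split step concrete, the following minimal sketch prints the pieces that Split produces for a sample string. As an assumption, it is written for cscript under Windows Script Host (hence WScript.Echo); the sample string and variable names are illustrative only.

```vbscript
Option Explicit

Dim arySplit, i

'Hypothetical sample with text before the first tag
arySplit = Split("Hello <b>world</b>", "<")

'Print each piece in brackets so empty pieces are visible
For i = 0 To UBound(arySplit)
  WScript.Echo i & ": [" & arySplit(i) & "]"
Next
```

This prints "0: [Hello ]", "1: [b>world]" and "2: [/b>]". Element 0 is the text before the first <, so the function leaves it alone (j = 1). Every later piece begins with tag text up to its first >, which is what the Mid call cuts away. If the input begins with a tag, element 0 is an empty string and the loop starts at index 0 instead.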

Categories: VB