Posts Tagged ‘encoding’

Preparing HTML And PHP Code For Publishing On Websites

April 1st, 2009

A while ago I wrote about Adding Code To WordPress Blogs And Comments, and I decided that the encoding described there needed a bit of code to do it automatically.

So here it is. The listing below was itself prepared with the text processor.

<form method="post" action="http://talkincode.com/examples/text-process/text.php">
    <textarea name="text" rows="10" cols="80" wrap="off"></textarea>
    <input type="submit" value="Process" />
</form>
 
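<?php
if ( isset($_POST["text"]) ) {
    $text = $_POST["text"];
    // Remove slashes added by magic quotes (only on older PHP set-ups with it enabled).
    if ( function_exists("get_magic_quotes_gpc") && get_magic_quotes_gpc() ) {
        $text = stripslashes( $text );
    }
    // Each pattern in $input is replaced by the item at the same position in $output,
    // and the patterns run in order, each on the result of the one before.
    $input  = array ( "/&/", "/'/", "/\"/", "/</", "/>/", "/\t/", "/(?<=\s)\x20|\x20(?=\s)/", "/^\s$/m", "/&/", "/\r\n/" );
    $output = array ( "&amp;", "&#39;", "&quot;", "&lt;", "&gt;", "&nbsp;&nbsp;&nbsp;&nbsp;", "&nbsp;", "&nbsp;<br />", "&amp;", "<br />" );
    $temp = preg_replace($input, $output, $text);
    echo '<div style="border:1px solid grey;">'.$temp.'</div>';
}
?>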

There seems to be rather a lot going on here, but the process is quite simple. The preg_replace() function can take arrays for both its pattern and its replacement arguments (here $input and $output). When you do this the arrays are matched up by position, so the second item in the input array is replaced by the second item in the output array. The patterns are applied one after another, each to the result of the previous one, so their order matters.
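A minimal sketch of that pairing, using two made-up patterns rather than the processor's own. Because each pattern runs on the output of the one before it, every a first becomes b, and then every b (including the new ones) becomes c.

```php
<?php
// Patterns and replacements are paired by position:
// "/a/" is replaced with "b", then "/b/" is replaced with "c".
$patterns     = array( "/a/", "/b/" );
$replacements = array( "b", "c" );

// The second pattern runs on the result of the first,
// so the "b" produced from "a" is turned into "c" as well.
$result = preg_replace( $patterns, $replacements, "ab" );

echo $result, "\n"; // cc
```

This ordering is what lets the processor encode the & characters a second time near the end of its list.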

So here is the list of patterns, in the order they run, and what each one is replaced with.

  • /&/ This matches any ampersand and replaces it with its encoded form, &amp;.
  • /'/ Find single quotes and encode them with &#39;.
  • /\"/ Find double quotes and encode them with &quot;.
  • /</ Find less-than signs and encode them with &lt;.
  • />/ Same as above but the other way around, in this case the equivalent is &gt;.
  • /\t/ Next we start on white space. First, every tab character is replaced with four &nbsp; characters, like this: &nbsp;&nbsp;&nbsp;&nbsp;
  • /(?<=\s)\x20|\x20(?=\s)/ Next we look for any space character that has a white space character either before or after it, and replace it with a non-breaking space, &nbsp;. A single space between two words is left alone.
  • /^\s$/m This matches any line with nothing on it (the m modifier makes ^ and $ match at the start and end of each line, not just of the whole text). Such a line must contain a single &nbsp; character, and to keep the line break as it was posted we add a <br /> tag, so the final output is &nbsp;<br />.
  • /&/ Now that everything is encoded, we encode all of the & characters a second time. When the script prints the result to an HTML page, the browser turns each &amp; back into &, so you see the encoded text (for example &lt; rather than <), ready to copy into your post.
  • /\r\n/ Finally, we find all of the \r\n line breaks and convert them to <br /> tags. You might want to change this to just \n if your text uses Unix-style line endings.
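The steps above can be tried on sample input with a small sketch. It wraps the same two arrays in a function; the name encode_for_posting() is hypothetical and not part of the original script. The sample strings show the double-encoded entities, the handling of runs of spaces, and the <br /> tag that is added after the second & pass and so stays unencoded.

```php
<?php
// Hypothetical helper: the processor's two arrays wrapped in a function
// so that each step can be tried on sample input.
function encode_for_posting( $text ) {
    $input  = array( "/&/", "/'/", "/\"/", "/</", "/>/", "/\t/",
                     "/(?<=\s)\x20|\x20(?=\s)/", "/^\s$/m", "/&/", "/\r\n/" );
    $output = array( "&amp;", "&#39;", "&quot;", "&lt;", "&gt;", "&nbsp;&nbsp;&nbsp;&nbsp;",
                     "&nbsp;", "&nbsp;<br />", "&amp;", "<br />" );
    return preg_replace( $input, $output, $text );
}

// Markup characters are encoded, then the second "/&/" pass encodes
// the ampersands of those entities again.
echo encode_for_posting( "<p>'a'</p>" ), "\n";
// &amp;lt;p&amp;gt;&amp;#39;a&amp;#39;&amp;lt;/p&amp;gt;

// Runs of spaces become &nbsp; (re-encoded too); a lone space is left alone.
echo encode_for_posting( "a  b c" ), "\n";
// a&amp;nbsp;&amp;nbsp;b c

// The \r\n line break becomes a real <br /> tag.
echo encode_for_posting( "x\r\ny" ), "\n";
// x<br />y
```

Viewed in a browser, the first result reads &lt;p&gt;&#39;a&#39;&lt;/p&gt;, which is exactly the text you want to paste into the post.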

Before we do any of this we pass the text through the stripslashes() function. This is because PHP's magic quotes setting, when it is turned on, adds a backslash before every " and ' character (and before backslashes) in POST data. This call removes them.
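A small sketch of what stripslashes() does, using made-up strings. It also shows why the call should only run when slashes were actually added: magic quotes was removed in PHP 5.4, and on a server without it stripslashes() would strip backslashes that belong to the posted code.

```php
<?php
// With magic quotes on, It's arrives in $_POST as It\'s.
$posted = "It\\'s";                  // the string It\'s
echo stripslashes( $posted ), "\n";  // It's

// Without magic quotes, the same call damages genuine escapes in code.
$code = "echo \"a\\n\";";            // the string echo "a\n";
echo stripslashes( $code ), "\n";    // echo "an";
```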

You can try out the processor by copying some code into its text box. The form posts to the text process example page (http://talkincode.com/examples/text-process/text.php), which you can also visit directly to play around with the tool.

Adding Code To WordPress Blogs And Comments

February 5th, 2009

WordPress is a pretty neat blogging platform, but it falls over quite spectacularly when you try to put code in posts. I write a lot of code for Talk In Code, so I have had to work out what needs to be encoded to make code examples work.

For code examples on Talk In Code I use the <code> tag and encode the following characters.

  • < into &lt;
  • > into &gt;
  • " into &quot;
  • ' into &#39;
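Encoding these four characters by hand is easy to get wrong, so here is a minimal sketch that does it with PHP's strtr(). The function name encode_for_wordpress() is hypothetical. Like the list above, it leaves & characters alone.

```php
<?php
// Hypothetical helper: encode the four characters listed above so a
// snippet can be pasted inside <code> tags in the WordPress HTML editor.
function encode_for_wordpress( $code ) {
    // strtr() with an array replaces each key with its value in a single
    // pass, so replaced text is never scanned again.
    return strtr( $code, array(
        "<"  => "&lt;",
        ">"  => "&gt;",
        "\"" => "&quot;",
        "'"  => "&#39;",
    ) );
}

echo encode_for_wordpress( "echo '<br />';" ), "\n";
// echo &#39;&lt;br /&gt;&#39;;
```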

Note: You must be in HTML mode in your WordPress editor or everything will be double encoded.

If these characters are left unencoded, WordPress will either treat them as markup (i.e. a <br /> will cause a line break) or convert them into non-standard characters. For example, typing a ' (single quote) is straightforward, but when your users copy and paste the code to try it for themselves, they find that the curly quote characters WordPress gives them make the examples fail. So every time you type a ' you have to encode it as &#39;. The following example shows why typing a single quote will break your code examples.

echo ‘Hello World’;

The same thing applies to double quotes, as in the following example.

echo “Hello World”;

WordPress will also try to guess what you are doing and add tags where you don't want them. The effect of this is to break your code tags if you leave any blank lines in them. Take the following example of a 4-line snippet of code, with a blank line between line 2 and line 4.

line 1
line 2

line 4

The snippet breaks because WordPress sees the blank line and adds paragraph tags where it thinks you wanted them. To stop this, put a &nbsp; (non-breaking space) character on every blank line. The following example fixes the previous one.

line 1
line 2
 
line 4
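Filling in blank lines can also be automated. A minimal sketch, assuming the snippet uses \n line endings; the function name fill_blank_lines() is hypothetical.

```php
<?php
// Hypothetical helper: put a &nbsp; on every blank line so WordPress does
// not treat it as a paragraph break. Assumes \n line endings.
function fill_blank_lines( $code ) {
    // With the m modifier, ^ and $ match at the start and end of each line;
    // a line holding only spaces or tabs (or nothing) becomes a single &nbsp;.
    return preg_replace( "/^[ \t]*$/m", "&nbsp;", $code );
}

echo fill_blank_lines( "line 1\nline 2\n\nline 4" ), "\n";
// line 1
// line 2
// &nbsp;
// line 4
```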