Category: PHP Strings

Randomising The Middle Of Words In PHP

18 November, 2008 | PHP Strings | No comments

I was sent an email the other day that contained some text where the start and end letter of each word were left alone, but the middle of each word was randomized. The weird part was that the text was still readable, which is supposedly due to the way in which the brain processes words.

I wondered if I could replicate this using a PHP script. All I would need to do is split the sentence into its component words and loop through those words, randomizing the middle of each one. Clearly, it is not possible to mix up the middle of a word less than four characters long, so a check would be needed for this. This is what I came up with:

function mixWordMiddle($string)
{
 $string = explode(' ',$string);
 foreach ( $string as $pos=>$word ) {
  // words of three characters or fewer have no middle to shuffle
  if ( strlen($word) > 3 ) {
   $chars = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);
   // collect everything between the first and last character
   $tmpArray = array_slice($chars, 1, -1);
   // shuffle once, after the whole middle has been collected
   shuffle($tmpArray);
   $string[$pos] = $chars[0].implode('', $tmpArray).$chars[count($chars)-1];
  }
 }
 echo implode(' ',$string);
}

I then tried plugging in the following text about evolution.

$string = 'In biology, evolution is the changes in the inherited traits of a population of organisms from one generation to the next. These changes are caused by a combination of three main processes: variation, reproduction, and selection.';

And came up with something like the following.

In bliygoo, eoutivoln is the cganhes in the iethirned titras of a piaplouotn of oargnsims form one gneoeatirn to the nxte. Thsee cagnhes are ceusad by a cmibitoonan of there main persocses: voaitanri, rteunodpoirc, and stoneleic.

Which is actually quite difficult to read. I thought that this might be because I had used a bit of text with too many long words, so I selected another:

$string = 'A giant Saudi oil tanker seized by pirates in the Indian Ocean is nearing the coast of Somalia, the US Navy says.';

This produced the following text.

A ganit Suadi oil taeknr seezid by ptaiers in the Ianidn Oecan is nraneig the cosat of Smiolaa, the US Navy syas.

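This is just a test script, so it doesn’t take punctuation into account: a trailing comma or full stop is treated as the last character of the word, which means the real last letter gets shuffled into the middle. Even allowing for that, the text it produces is still difficult to read, which leads me to be skeptical of the claims in the email I received.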
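One way to leave punctuation alone is sketched below. It is only an illustration, not part of the original script: the function name mixWordMiddlePunct() is made up for this example, and it assumes that a word is simply a run of four or more ASCII letters, so anything else (commas, full stops, spaces, accented letters) stays where it is.

```php
<?php
// Hypothetical variant: shuffle only the inner letters of each run of letters.
// Assumption: a "word" is four or more consecutive ASCII letters.
function mixWordMiddlePunct($string)
{
    return preg_replace_callback('/[A-Za-z]{4,}/', function ($match) {
        $word = $match[0];
        // everything between the first and last letter
        $middle = str_split(substr($word, 1, -1));
        shuffle($middle);
        return $word[0] . implode('', $middle) . substr($word, -1);
    }, $string);
}

echo mixWordMiddlePunct('In biology, evolution is the changes in the inherited traits.');
```

Because the regular expression only matches letters, the comma after "biology" and the final full stop are never part of a shuffled word, so the true first and last letters stay in place.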

Simple Swear Filter In PHP

30 September, 2008 | PHP Strings | No comments

Use the following function to filter out words from user input. It works by having a pre-set array of words that are to be excluded. This array is then looped through and each item is used to replace any instances of that word within the text. The regular expression uses the \b assertion, which matches at any word boundary. This way you don’t get the middle of words being filtered out when they are not meant to be.

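Older versions of PHP allowed the e modifier of the preg_replace() function to run PHP code within the replacement, but that modifier was deprecated in PHP 5.5 and removed in PHP 7. The function here uses preg_replace_callback() instead. In the callback we count the number of characters in the matched word and use this to create a string of stars (*) of equal length.

function filterwords($text){
 $filterWords = array('gosh','darn','poo');
 foreach($filterWords as $word){
  // preg_quote() stops special characters in a word being read as regex syntax
  $pattern = '/\b'.preg_quote($word, '/').'\b/i';
  $text = preg_replace_callback($pattern, function($match){
   // $match[0] is the matched word; swap it for stars of equal length
   return str_repeat('*', strlen($match[0]));
  }, $text);
 }
 return $text;
}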

Take the following text, run through this function.

echo filterwords('Darn, I have a mild form of torretts, poo!');

It produces the following result.

****, I have a mild form of torretts, ***!
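The loop runs one regular expression per filtered word. A possible variation, sketched here with a made-up function name (filterWordsOnce()) and the same example word list, joins all of the words into a single alternation so the text is only scanned once. Treat it as a sketch rather than a drop-in replacement.

```php
<?php
// Hypothetical single-pass variant of the word filter.
function filterWordsOnce($text, array $filterWords = array('gosh', 'darn', 'poo'))
{
    // Escape each word so it is matched literally inside the pattern.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $filterWords);

    // Builds a pattern such as /\b(?:gosh|darn|poo)\b/i
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    return preg_replace_callback($pattern, function ($match) {
        // Replace the matched word with stars of the same length.
        return str_repeat('*', strlen($match[0]));
    }, $text);
}

echo filterWordsOnce('Darn, I have a mild form of torretts, poo!');
```

The word boundaries still apply to every alternative, so a word like "pool" is left untouched.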

What To Do When get_html_translation_table() And htmlspecialchars() Don’t Work

17 September, 2008 | PHP Strings | No comments

I found a little problem today when processing a bit of text from a non-English site. The text was being loaded properly, but because it was in UTF-8 encoding, htmlspecialchars() and the table from get_html_translation_table() did not encode the foreign characters. They simply had no effect. This is because PHP strings are plain byte sequences with no native Unicode support, and in PHP versions before 5.4 these functions assume ISO-8859-1 unless told otherwise, so they cannot match multi-byte UTF-8 characters.

To get around this, use the utf8_decode() function on the string to convert it from UTF-8 to ISO-8859-1, which the translation table can handle. Be aware that any character with no ISO-8859-1 equivalent is replaced with a question mark.

// convert from utf8 to ISO-8859-1
$string = utf8_decode($string);
 
// translate HTML entities, using a table in the same encoding as the decoded string
$trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
$string = strtr($string, $trans);

I hope this helps anyone having the same issue. PHP 6 was expected to add native Unicode support, but that release was never completed. From PHP 5.4 onwards these functions default to UTF-8 instead, so this is worth looking at again on newer versions.
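An alternative that avoids the lossy conversion is to tell the encoding function what the input is. The sketch below passes 'UTF-8' as the third argument of htmlentities(). The sample string is made up for the example and is written with byte escapes so that it does not depend on how the script file itself is saved.

```php
<?php
// "caf\xC3\xA9 & cr\xC3\xA8me" is the UTF-8 byte sequence for "café & crème"
$string = "caf\xC3\xA9 & cr\xC3\xA8me";

// Name the input encoding explicitly instead of relying on the default.
$encoded = htmlentities($string, ENT_QUOTES, 'UTF-8');

echo $encoded; // prints: caf&eacute; &amp; cr&egrave;me
```

Since nothing is converted to ISO-8859-1 along the way, characters outside that character set are not turned into question marks.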

Work Out Size In Bytes Of A PHP String

15 September, 2008 | PHP Strings | No comments

I found this very handy function on the php.net site in the user comments for the strlen() function. It accepts a string in ASCII or UTF-8 format and finds out how long that string is in bytes.

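The function works by going through the string and adding up how many bytes each character represents. For normal ASCII values this is a single byte, so 1 is added to the total. The original UTF-8 scheme allowed characters of up to 6 bytes (modern UTF-8 stops at 4), so the rest of the function works out how many bytes a character takes up by testing the high bits of its first byte with AND calculations. Note that PHP’s own strlen() already counts bytes rather than characters unless mbstring function overloading is enabled, so for valid UTF-8 the two should agree.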

/**
* Count the number of bytes of a given string.
* Input string is expected to be ASCII or UTF-8 encoded.
* Warning: the function doesn't return the number of chars
* in the string, but the number of bytes.
* See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
* for information on UTF-8.
*
* @param string $str The string to compute number of bytes
*
* @return The length in bytes of the given string.
*/
function strBytes($str){
 // STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT
 
 // Number of bytes to scan (strlen() counts bytes, not characters)
 $strlen_var = strlen($str);
 
 // string bytes counter
 $d = 0;
 
 /*
 * Iterate over the first byte of every character in the string.
 * Each case adds the full length of the character and then skips
 * its continuation bytes so that they are not counted a second time.
 */
 for($c = 0; $c < $strlen_var; ++$c){
  $ord_var_c = ord($str[$c]);
  switch(true){
  case(($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)):
   // characters U-00000000 - U-0000007F (same as ASCII)
   $d++;
   break;
  case(($ord_var_c & 0xE0) == 0xC0):
   // characters U-00000080 - U-000007FF, mask 110XXXXX
   $d+=2;
   $c+=1;
   break;
  case(($ord_var_c & 0xF0) == 0xE0):
   // characters U-00000800 - U-0000FFFF, mask 1110XXXX
   $d+=3;
   $c+=2;
   break;
  case(($ord_var_c & 0xF8) == 0xF0):
   // characters U-00010000 - U-001FFFFF, mask 11110XXX
   $d+=4;
   $c+=3;
   break;
  case(($ord_var_c & 0xFC) == 0xF8):
   // characters U-00200000 - U-03FFFFFF, mask 111110XX
   $d+=5;
   $c+=4;
   break;
  case(($ord_var_c & 0xFE) == 0xFC):
   // characters U-04000000 - U-7FFFFFFF, mask 1111110X
   $d+=6;
   $c+=5;
   break;
  default:
   // control characters and any stray bytes count as one byte each
   $d++;
  }
 }
 return $d;
}

This function is useful if you want to know how large a string is in bytes, but have only a small amount of control over how the string will be presented. For example, if you download a web page and want to know how large it is in bytes you can pass the content of the page into this function.

You might think that the Content-Length header could be used here, but you can’t rely on this header to be returned from every site. Some sites will simply omit the line, whilst others will just put a default amount there.
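For comparison, here is a minimal sketch of the built-in strlen() on an ASCII string and a UTF-8 string. It assumes mbstring function overloading is not in use (that feature was removed in PHP 8), and the sample strings are made up for the example, with the non-ASCII characters written as byte escapes.

```php
<?php
$ascii = 'wibble';
// "na\xC3\xAFve \xE2\x82\xAC5" is "naïve €5" in UTF-8:
// ï takes 2 bytes and € takes 3 bytes.
$utf8 = "na\xC3\xAFve \xE2\x82\xAC5";

echo strlen($ascii), "\n"; // prints 6
echo strlen($utf8), "\n";  // prints 11 (8 characters, 11 bytes)
```

If the two numbers ever differ from what a byte-counting function reports, the input is probably not valid UTF-8.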

Using PHP To Split A String Into Characters

2 September, 2008 | PHP Strings | No comments

Use the following code to split a string into an array of characters.

$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);

It uses the preg_split() PHP function, which takes a number of parameters. These are as follows:

  1. The regular expression to be used. In this case the empty pattern matches the empty string at every position, so the string is split between every character.
  2. The string to be split.
  3. The limit on the number of pieces returned. In this case -1 means no limit, so the function will work for any size of string.
  4. The last parameter can be a flag or series of flags combined with the | operator. In this case the PREG_SPLIT_NO_EMPTY flag is used. This prevents the function from returning any empty strings, which would otherwise appear at the start and end of the array. Spaces in your string are not empty strings, so they are still returned as elements.

To give an example, take the following string variable.

$str = 'wibble';

This can be passed into the code and printed out like this.

$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>'.print_r($chars,true).'</pre>';

This will print out the following:

Array
(
 [0] => w
 [1] => i
 [2] => b
 [3] => b
 [4] => l
 [5] => e
)
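Two related options are sketched below as assumptions about what might suit your data, not as part of the original tip. For single-byte text the built-in str_split() gives the same result without a regular expression. For UTF-8 text, adding the u modifier to the pattern makes preg_split() split on whole characters rather than on individual bytes.

```php
<?php
$str = 'wibble';

// str_split() breaks the string into one-byte pieces.
$bytes = str_split($str);

// "h\xC3\xA9" is "hé" in UTF-8; the u modifier keeps the two bytes of é together.
$utf8Chars = preg_split('//u', "h\xC3\xA9", -1, PREG_SPLIT_NO_EMPTY);

print_r($bytes);
print_r($utf8Chars);
```

Without the u modifier the second call would return three elements, because é would be split into its two separate bytes.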