Archive for the ‘PHP Strings’ Category

Extract Keywords From A Text String With PHP

May 21st, 2009 Tech 1 comment

A common issue I have come across in the past is that I have a CMS, or an old copy of WordPress, and I need to create a set of keywords to be used in the meta keywords field. To solve this I put together a simple function that runs through a string and returns the most commonly used words in it as an array, keyed by word with the number of occurrences as the value. The function currently returns the top 10 words, but you can change that quite easily.

The first thing the function defines is a list of "stop" words. This is a list of words that occur quite a bit in English text and would therefore interfere with the outcome of the function. Any word of three characters or fewer is ignored as well. The function also uses a variant of the slug function to remove any odd characters that might be in the text.

function commonWords($string){
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');
 
    $string = preg_replace('/\s+/', ' ', $string); // collapse every run of white space into a single space
    $string = trim($string); // trim the string
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too
    $string = strtolower($string); // make it lowercase
 
    // Match each run of letters that is followed by white space. Because of the
    // lookahead, the final word is only counted if white space follows it.
    preg_match_all('/([a-z]*?)(?=\s)/i', $string, $matchWords);
    $matchWords = $matchWords[0];
    foreach ( $matchWords as $key=>$item ) {
        // drop empty matches, stop words and words of three characters or fewer
        if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
            unset($matchWords[$key]);
        }
    }
    // count how many times each remaining word occurs
    $wordCountArr = array();
    if ( is_array($matchWords) ) {
        foreach ( $matchWords as $key => $val ) {
            $val = strtolower($val);
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr); // most frequent words first
    $wordCountArr = array_slice($wordCountArr, 0, 10); // keep the top 10
    return $wordCountArr;
}

Here is an example of the function in action.

$text = "This is some text. This is some text. Vending Machines are great.";
$words = commonWords($text);
echo implode(',', array_keys($words));

This produces the following output.

some,text,machines,vending
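
As a rough sketch of another way to do the counting, the built-in str_word_count() and array_count_values() functions can replace the regular expression and the counting loop. The function name commonWordsBuiltin(), the $limit parameter and the shortened stop list are assumptions made for this illustration and are not part of the function above. Unlike the function above, this version also counts the last word of the text.

```php
<?php
// Hypothetical helper for illustration: keyword counting with built-in functions.
function commonWordsBuiltin($string, $limit = 10)
{
    // Shortened stop list, for illustration only.
    $stopWords = array('about', 'from', 'that', 'this', 'what', 'when', 'where', 'will', 'with');

    // With format 1, str_word_count() returns every word as an array element.
    $words = str_word_count(strtolower($string), 1);

    $kept = array();
    foreach ($words as $word) {
        // Same length rule as above: skip words of three characters or fewer.
        if (strlen($word) > 3 && !in_array($word, $stopWords)) {
            $kept[] = $word;
        }
    }

    // Build the word => count map in one call, then sort by count, highest first.
    $counts = array_count_values($kept);
    arsort($counts);

    return array_slice($counts, 0, $limit, true);
}

$text = "This is some text. This is some text. Vending Machines are great.";
echo implode(',', array_keys(commonWordsBuiltin($text))), "\n";
```

Note that str_word_count() treats apostrophes and hyphens as part of a word, so hyphenated terms are kept whole rather than split.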

Hide And Unhide Code With PHP

April 27th, 2009 Tech No comments

If you are selling a system the last thing you want is for people to copy the system and pass it on for free. There are numerous ways to implement parts of the system that will stop this from happening.

By far the easiest is to create a section of code that is hidden, the removal of which will cause the application to fall over. It could even be as simple as a link back to your site, so that even if your application is given away for free, that link will always be present.

This method involves the use of a function called eval(), which takes PHP code as a string and interprets it to produce output. Here is an example that prints a link to Talk In Code.

$code = "echo \"<a href='http://www.talkincode.com/' title'Talk In Code'>Talk In Code</a>\";";
eval($code);

So let's use some code to hide this from anyone who might be reading our source code. First we pass this string through our hiding function to produce text that is not human-readable. This function is called obfuscate() and works by taking each character in turn and converting it into its ASCII code.

function obfuscate($text) {
    $length = strlen($text);
    $scrambled = '';
    
    for ($i = 0; $i < $length; ++$i) {
        $scrambled .= ord($text[$i]). ' ';
    }
    
    return $scrambled;
}
$code = "echo \"<a href='http://www.talkincode.com/' title'Talk In Code'>Talk In Code</a>\";";
 
$obf = obfuscate($code);
echo $obf;

This will print out the following:

101 99 104 111 32 34 60 97 32 104 114 101 102 61 39 104 116 116 112 58 47 47 119 119 119 46 116 97 108 107 105 110 99 111 100 101 46 99 111 109 47 39 32 116 105 116 108 101 39 84 97 108 107 32 73 110 32 67 111 100 101 39 62 84 97 108 107 32 73 110 32 67 111 100 101 60 47 97 62 34 59

We can store this as a variable until we next need it. To run this code we need to convert it back into something that eval() can understand. To do this we use the opposite of obfuscate(), called unobfuscate(). This function takes a string of space-separated ASCII codes and converts them back into their character equivalents. Note that we also trim the text so that the trailing space left by obfuscate() does not end up in the result.

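function unobfuscate($scrambled) {
    $text = '';
 
    // trim first so the trailing space added by obfuscate() does not produce an empty item
    $bits = explode(' ', trim($scrambled));
    
    foreach ( $bits as $bit ) {
        $text .= chr((int) $bit); // convert each ASCII code back into its character
    }
 
    return trim($text);
}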

We can then transform our hidden code into PHP code, which is then passed to the eval() function and run.

$code = '101 99 104 111 32 34 60 97 32 104 114 101 102 61 39 104 116 116 112 58 47 47 119 119 119 46 116 97 108 107 105 110 99 111 100 101 46 99 111 109 47 39 32 116 105 116 108 101 39 84 97 108 107 32 73 110 32 67 111 100 101 39 62 84 97 108 107 32 73 110 32 67 111 100 101 60 47 97 62 34 59';
$code = unobfuscate($code);
eval($code);

This produces the following output.

<a href='http://www.talkincode.com/' title'Talk In Code'>Talk In Code</a>
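
A quick way to check that the two steps really are opposites is a round trip on a short string. The sketch below is only an illustration: obfuscateSketch() and unobfuscateSketch() are hypothetical names, and they differ slightly from the functions above in that implode() is used, so no trailing space is produced and no trimming is needed.

```php
<?php
// Hypothetical variant of obfuscate(): one ASCII code per character, joined by spaces.
function obfuscateSketch($text)
{
    $codes = array();
    $length = strlen($text);
    for ($i = 0; $i < $length; ++$i) {
        $codes[] = ord($text[$i]);
    }
    return implode(' ', $codes); // implode() leaves no trailing space
}

// Hypothetical variant of unobfuscate(): turn each code back into its character.
function unobfuscateSketch($scrambled)
{
    $text = '';
    foreach (explode(' ', $scrambled) as $code) {
        $text .= chr((int) $code);
    }
    return $text;
}

$original  = 'echo "Hello";';
$scrambled = obfuscateSketch($original);
$restored  = unobfuscateSketch($scrambled);

echo $scrambled, "\n";               // 101 99 104 111 32 34 72 101 108 108 111 34 59
var_dump($restored === $original);   // bool(true)
```

The restored string can then be passed to eval() in the same way as in the example above.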

Beware that doing this sort of thing will probably slow down your application, especially if you try to eval() a large block of code. A single link like this is probably as far as I would personally go as there are much better ways of verifying that a piece of software is properly licensed.

Delete Trailing Commas In PHP

April 14th, 2009 Tech 1 comment

Converting an array of information into a string is easy, but when you are doing this for insertion into a database, trailing commas are going to mess up your SQL statements.

Take the following example, which takes an array of values and converts them into a string of values. This practice is quite common in PHP database manipulation.

$values = array('one', 'two', 'three', 'four', 'five');
$string = '';
 
foreach ( $values as $val ) {
    $string .= '"'.$val.'", ';
}
 
echo $string; // prints "one", "two", "three", "four", "five",

Obviously we need to strip the trailing comma from the end of this string. To do this you can use the following function.

function deleteTrailingCommas($str)
{
    return trim(preg_replace('/(.*?)((,|\s)*)$/m', '$1', $str));
}

This function uses a regular expression to match any run of commas or white space characters between the main bulk of the text and the end of the string, and returns only the main bulk of the text. The trailing commas are not returned.

Here is another example:

$string = '"one", , ,  , , , ,,';
echo $string;
$string = deleteTrailingCommas($string);
echo $string;

This prints out the following:

"one", , ,  , , , ,,
"one"

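If the string is being built from an array in the first place, the trailing comma can be avoided entirely. The following is a small sketch of that idea using implode(), plus rtrim() with a character list as a simpler way of stripping trailing commas and white space from a single-line string. The variable names $quoted and $trimmed are only for illustration.

```php
<?php
$values = array('one', 'two', 'three', 'four', 'five');

// Quote each value first, then let implode() put the separator between items only.
$quoted = array();
foreach ($values as $val) {
    $quoted[] = '"' . $val . '"';
}
$string = implode(', ', $quoted);
echo $string, "\n"; // "one", "two", "three", "four", "five"

// rtrim() with a character list removes any mix of commas and white space from the end.
$trimmed = rtrim('"one", , ,  , , , ,,', ", \t\r\n");
echo $trimmed, "\n"; // "one"
```

The regular expression version is still the one to reach for when every line of a multi-line string needs cleaning, since rtrim() only looks at the very end of the string.
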
Disemvoweling PHP Function

April 7th, 2009 Tech 1 comment

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text. It involves simply removing the vowels from the text so that it is almost, but not entirely, unreadable.

Use the following function to disemvowel a string of text.

function disemvowel($string)
{
    return str_replace(array('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'), '', $string);
}

As an example, the first sentence on this post:

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text.

would appear like this:

Dsmvwlng s tchnq sd n blgs nd frms t cnsr ny pst r cmmnt tht cntns spm r thr nwntd txt.

Which doesn’t make a lot of sense, but is still kind of readable. This technique kills unwanted comments without removing the text entirely.
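
The same result can be had with a single case-insensitive character class instead of listing every vowel twice. This is just a sketch of the alternative, and disemvowelRegex() is a hypothetical name used for the illustration.

```php
<?php
// Hypothetical alternative to disemvowel(): remove vowels with one regular expression.
function disemvowelRegex($string)
{
    // The i modifier makes the character class match upper and lower case vowels.
    return preg_replace('/[aeiou]/i', '', $string);
}

echo disemvowelRegex('Unwanted text'), "\n"; // nwntd txt
```

For a fixed list of single characters str_replace() is perfectly adequate; the regular expression is mainly shorter to read and easier to extend.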

Check out the Wikipedia page on Disemvoweling for more information on the origins of this method.

Preparing HTML And PHP Code For Publishing On Websites

April 1st, 2009 Tech No comments

I talked a while ago about Adding Code To Wordpress Blogs And Comments, but I decided that it needed a bit of code to do this automatically.

So here it is, prepared by the text processor.

<form method="post" action="http://talkincode.com/examples/text-process/text.php">
    <textarea name="text" rows="10" cols="80" wrap="off"></textarea>
    <input type="submit" value="Process" />
</form>
 
<?php
if ( isset($_POST["text"]) ) {
    $text   = $_POST["text"];
    $text   = stripslashes( $text );
    $input  = array ( "/&/", "/'/", "/\"/", "/</", "/>/", "/\t/", "/(?<=\s)\x20|\x20(?=\s)/", "/^\s$/m", "/&/", "/\r\n/" );
    $output = array ( "&amp;", "&#39;", "&quot;", "&lt;", "&gt;", "&nbsp;&nbsp;&nbsp;&nbsp;", "&nbsp;", "&nbsp;<br />", "&amp;", "<br />" );
    $temp = preg_replace($input, $output, $text);
    echo '<div style="border:1px solid grey;">'.$temp.'</div>';
}
?>

There seems to be rather a lot going on here, but the process is quite simple. The preg_replace() function can take arrays for both the pattern and the replacement parameters. When you do this the arrays are matched up by position, so that anything matching the second pattern in the input array is replaced by the second item in the output array. The patterns are applied one after another, in order, each working on the result of the previous one.
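
To see the array form on its own, here is a cut-down sketch with only three patterns. The sample string and the variable names $encoded and $doubled are assumptions for the illustration. The second call shows why the order of the patterns matters: encoding & after < encodes the ampersand of the new entity as well.

```php
<?php
// Patterns and replacements are paired by position and applied in order.
$input  = array('/&/', '/</', '/>/');
$output = array('&amp;', '&lt;', '&gt;');

$encoded = preg_replace($input, $output, '<b>Tom & Jerry</b>');
echo $encoded, "\n"; // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

// Reversing the order double-encodes: < becomes &lt;, then its & becomes &amp;.
$doubled = preg_replace(array('/</', '/&/'), array('&lt;', '&amp;'), '<');
echo $doubled, "\n"; // &amp;lt;
```

This ordering effect is what the second /&/ pattern in the processor relies on.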

So here is a list of the things I am matching for and what they are replaced with.

  • /&/ This matches for any ampersand, we replace this with the encoded variant of &amp;.
  • /'/ Find single quotes and encode them with &#39;.
  • /\"/ Find double quotes and encode them with &quot;.
  • /</ Find the opening angle bracket of any tag and encode it with &lt;.
  • />/ Same as above but the other way around, in this case the equivalent is &gt;.
  • /\t/ Next we start matching for white space, the first is to find all tab characters and replace them with four &nbsp; characters, like this &nbsp;&nbsp;&nbsp;&nbsp;
  • /(?<=\s)\x20|\x20(?=\s)/ Next we look for any space character that has a white space character before or after it and replace it with a non-breaking space, &nbsp;.
  • /^\s$/m This matches for any line with nothing on it. These must be replaced with a single &nbsp; character, but in order to keep the code as it was posted we add a <br /> tag, the final output would be &nbsp;<br />.
  • /&/ Now that everything is encoded we encode all of the & characters a second time. When the script prints the content to an HTML page the browser decodes one level, so what appears on the page is the encoded text (for example &lt;) rather than the character itself.
  • /\r\n/ Finally, we find all of the new line characters and convert them to <br /> tags. You might want to change this to just \n if your text uses Unix-style line endings.

Before we do any of this we pass the text through the stripslashes() function. This is because, on servers with magic quotes enabled, text sent over POST has backslashes added before the " and ' characters. This call just removes them.

You can try out the processor if you want by copying some code into the following text box.

This will output to the text process example page. You can also visit this page directly and play around with the tool.