Posts Tagged ‘function’

Extract Keywords From A Text String With PHP

May 21st, 2009 Tech 1 comment

A common issue I have come across in the past is that I have a CMS, or an old copy of WordPress, and I need to create a set of keywords to be used in the meta keywords field. To solve this I put together a simple function that runs through a string and returns the most commonly used words as an array, keyed by word with the number of occurrences as the value. It currently returns the top 10 words, but you can change that quite easily.

The first thing the function defines is a list of "stop" words. This is a list of words that occur quite a bit in English text and would therefore interfere with the outcome of the function. The function also uses a variant of the slug function to remove any odd characters that might be in the text.

function commonWords($string){
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');

    $string = preg_replace('/\s+/', ' ', $string); // collapse each run of whitespace into a single space
    $string = trim($string); // trim the string
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
    $string = strtolower($string); // make it lowercase

    // match each run of letters that is followed by whitespace; the last word of the
    // trimmed string has nothing after it, so it is not counted
    preg_match_all('/([a-z]*?)(?=\s)/i', $string, $matchWords);
    $matchWords = $matchWords[0];
    foreach ( $matchWords as $key=>$item ) {
        // throw away empty matches, stop words and words of three letters or fewer
        if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
            unset($matchWords[$key]);
        }
    }
    // count how many times each remaining word occurs
    $wordCountArr = array();
    if ( is_array($matchWords) ) {
        foreach ( $matchWords as $key => $val ) {
            $val = strtolower($val);
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr); // highest counts first
    $wordCountArr = array_slice($wordCountArr, 0, 10); // keep the top 10
    return $wordCountArr;
}

Here is an example of the function in action.

$text = "This is some text. This is some text. Vending Machines are great.";
$words = commonWords($text);
echo implode(',', array_keys($words));

This produces the following output.

some,text,machines,vending
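For comparison, here is a minimal sketch of a more compact variant built on array_count_values(). The function name commonWordsCompact(), the $limit parameter and the shortened stop word list are all made up for this illustration and are not part of the function above. It also behaves slightly differently: it counts the last word of the string and it ignores digits and dashes.

```php
<?php
// Hypothetical helper: a compact take on commonWords() using PHP built-ins.
function commonWordsCompact($string, $limit = 10)
{
    // A shortened stop word list, for illustration only.
    $stopWords = array('about', 'from', 'that', 'this', 'what', 'when', 'where', 'will', 'with');

    // Lowercase the text and pull out every run of letters.
    preg_match_all('/[a-z]+/', strtolower($string), $matches);

    // Drop stop words and words of three letters or fewer.
    $words = array();
    foreach ($matches[0] as $word) {
        if (strlen($word) > 3 && !in_array($word, $stopWords)) {
            $words[] = $word;
        }
    }

    // Count occurrences, sort by count (highest first) and keep the top $limit.
    $counts = array_count_values($words);
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

$text = "This is some text. This is some text. Vending Machines are great.";
print_r(commonWordsCompact($text));
```

Passing a different $limit is one way to change how many keywords come back without editing the function body.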

Validate EAN13 Barcodes

April 30th, 2009 Tech No comments

EAN13 barcodes are commonly used to label products in Europe. If you want to know more about how they work then please view the Wikipedia entry on European Article Numbers.

EAN13 barcodes actually carry 12 digits of data and are validated by using a check digit, which is placed at the end, making the code 13 digits long. The check digit is worked out by the following process:

  • Add up the digits in the even positions (2nd, 4th, 6th and so on) and multiply this number by 3.
  • Add up the digits in the odd positions (1st, 3rd, 5th and so on) and add this to the result from the even positions.
  • Divide the total by 10 and keep the remainder (modulo).
  • If the remainder is not 0 then subtract it from 10. The result is the check digit.
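As a rough worked example of that arithmetic, the sketch below runs the steps for the 12 digits 501054800186, one of the codes used in the tests later in this post. The variable names are only for illustration.

```php
<?php
// Worked example of the check digit steps for one 12 digit code.
$digits = '501054800186';

$evenSum = 0; // digits in even positions (2nd, 4th, ...)
$oddSum  = 0; // digits in odd positions (1st, 3rd, ...)
for ($i = 0; $i < 12; $i++) {
    if ($i % 2 == 1) {
        $evenSum += (int) $digits[$i]; // string index 1 is the 2nd digit
    } else {
        $oddSum += (int) $digits[$i];
    }
}

$total     = $evenSum * 3 + $oddSum; // 11 * 3 + 27 = 60
$remainder = $total % 10;            // 60 % 10 = 0
$check     = ($remainder == 0) ? 0 : 10 - $remainder;

echo $check; // prints 0
```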

Here is a function that runs through those steps, but also checks the length of the barcode. If it is 13 digits long then it returns both the original check digit and the calculated check digit. If the barcode is 12 digits long then it returns only the calculated check digit. In both cases the 12 digit barcode (without the check digit) is also returned. Any other length returns false.

function validateEan13($digits)
{
    // Work on a string so that each digit can be read by its position
    $digits = (string) $digits;
    if ( !ctype_digit($digits) ) {
        // Only digits are allowed in an EAN13 barcode
        return false;
    }

    $originalcheck = false;
    if ( strlen($digits) == 13 ) {
        $originalcheck = substr($digits, -1);
        $digits = substr($digits, 0, -1);
    } elseif ( strlen($digits) != 12 ) {
        // Invalid EAN13 barcode
        return false;
    }

    // Add the digits in the even positions together (2nd, 4th, 6th...)
    $even = $digits[1] + $digits[3] + $digits[5] + $digits[7] + $digits[9] + $digits[11];
    // Multiply this result by 3
    $even = $even * 3;

    // Add the digits in the odd positions together (1st, 3rd, 5th...)
    $odd = $digits[0] + $digits[2] + $digits[4] + $digits[6] + $digits[8] + $digits[10];

    // Add two totals together
    $total = $even + $odd;

    // Calculate the checksum
    // Divide total by 10 and store the remainder
    $checksum = $total % 10;
    // If result is not 0 then subtract it from 10
    if($checksum != 0){
        $checksum = 10 - $checksum;
    }

    // Return results.
    if ( $originalcheck !== false ) {
        return array('barcode'=>$digits, 'checksum'=>$checksum, 'originalcheck'=>$originalcheck);
    } else {
        return array('barcode'=>$digits, 'checksum'=>$checksum);
    }
}

To test this I ran a few codes through the function.

echo '<pre>';
// barcodes are passed as strings so that large values and leading zeros survive intact
// two normal barcodes
print_r(validateEan13('5023920187205'));
print_r(validateEan13('5010548001860'));
// one short barcode to work out checksum
print_r(validateEan13('501054800186'));
// a normal barcode
print_r(validateEan13('5034504935778'));
// the same barcode with a broken number
print_r(validateEan13('5034504735778'));
// two random numbers, one of which is not long enough to be an EAN13 barcode
print_r(validateEan13('7233897438712'));
var_dump(validateEan13('3345345345'));
echo '</pre>';

This prints out the following results, which I have annotated for your convenience.

// two normal barcodes
Array
(
    [barcode] => 502392018720
    [checksum] => 5
    [originalcheck] => 5
)
Array
(
    [barcode] => 501054800186
    [checksum] => 0
    [originalcheck] => 0
)
// one short barcode to work out checksum
Array
(
    [barcode] => 501054800186
    [checksum] => 0
)
 
// a normal barcode
Array
(
    [barcode] => 503450493577
    [checksum] => 8
    [originalcheck] => 8
)
// the same barcode with a broken number
Array
(
    [barcode] => 503450473577
    [checksum] => 4
    [originalcheck] => 8
)
// two random numbers, one of which is not long enough to be an EAN13 barcode
Array
(
    [barcode] => 723389743871
    [checksum] => 4
    [originalcheck] => 2
)
bool(false)
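If all you need is a yes or no answer, the same calculation can be wrapped up in a function that returns a boolean. This is only a sketch: the name isValidEan13() is made up for this example, and it assumes the barcode is passed in as a string of 13 digits.

```php
<?php
// Hypothetical helper: true only for a 13 digit string whose last digit
// matches the calculated check digit.
function isValidEan13($barcode)
{
    $barcode = (string) $barcode;
    if (strlen($barcode) != 13 || !ctype_digit($barcode)) {
        return false;
    }

    // Digits in even positions (string index 1, 3, 5...) are weighted by 3.
    $total = 0;
    for ($i = 0; $i < 12; $i++) {
        $weight = ($i % 2 == 1) ? 3 : 1;
        $total += (int) $barcode[$i] * $weight;
    }

    // The outer modulo turns a result of 10 back into 0.
    $checksum = (10 - ($total % 10)) % 10;
    return $checksum == (int) $barcode[12];
}

var_dump(isValidEan13('5034504935778')); // bool(true)
var_dump(isValidEan13('5034504735778')); // bool(false)
var_dump(isValidEan13('3345345345'));    // bool(false)
```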

Categories: PHP

Print Array Without Trailing Commas In PHP

April 24th, 2009 Tech No comments

I have previously talked about Removing commas from the end of strings, but it is also possible to use the implode() function to do the same sort of thing.

implode() takes two parameters, the separator and the array, and returns a string with each array item separated with the separator. The following example shows how this function works.

$array = array(1,2,3,4,5,6);
$list = implode(',', $array);

The $list variable will now contain the string "1,2,3,4,5,6". However, things tend to become messy again when you have an array with empty items in it.

$array = array(1,2,3,4,5,6,'','','');
$list = implode(',', $array);

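The $list variable will now contain the string "1,2,3,4,5,6,,,". So to solve this issue we need to use the array_filter() function to clear out any blank array items before passing the output to the implode() function. Be aware that array_filter() with no callback removes every value that PHP treats as false, which includes 0 and "0" as well as empty strings. The following example shows this in action.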

$array = array(1,2,3,4,5,6,'','','');
$list = implode(',', array_filter($array));

The $list variable will now contain the string "1,2,3,4,5,6", which is the string we are looking for.
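One thing to watch is what array_filter() does to zeros. The sketch below shows the default behaviour next to a callback that only removes empty strings and null values. The $keepNonBlank name is just for illustration, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
$array = array(0, 1, 2, '', null, '0');

// With no callback, array_filter() removes every "falsy" value,
// so 0 and '0' are lost along with the blanks.
echo implode(',', array_filter($array)), "\n"; // prints 1,2

// An explicit callback keeps anything that is not an empty string or null.
$keepNonBlank = function ($item) {
    return $item !== '' && $item !== null;
};
echo implode(',', array_filter($array, $keepNonBlank)), "\n"; // prints 0,1,2,0
```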

PHP Function To Detect A Prime Number

April 9th, 2009 Tech 1 comment

A prime number is a natural number which has exactly two distinct divisors: 1 and itself. So if you take the number 11, it can only be divided to get a whole number if it is divided by 1 or 11. Dividing it by any other number always leaves a fraction.

The following function uses a method called trial division to detect if a number is prime or not.

function is_prime($number)
{
    // 1 is not prime (and neither is anything smaller)
    if ( $number < 2 ) {
        return false;
    }
    // 2 is the only even prime number
    if ( $number == 2 ) {
        return true;
    }
    // a number that is not prime always has a divisor no bigger than its
    // square root, so testing can stop there
    $x = floor(sqrt($number));
    for ( $i = 2 ; $i <= $x ; ++$i ) {
        if ( $number % $i == 0 ) {
            // found a divisor, so the number is not prime
            return false;
        }
    }

    // no divisor was found
    return true;
}

The function first detects if the number is 1 (not prime) or if it is 2 (prime). These are two exceptions to the rules that follow and must be caught before proceeding. The function then divides the number by every whole number from 2 up to the square root of that number. If any of the divisions leaves no remainder, then the original number is not a prime. Otherwise, it is a prime.

Here is an example bit of script that finds all of the prime numbers between 0 and 1,000,000.

$start = 0;
$end =   1000000;
for($i = $start; $i <= $end; $i++)
{
    if(is_prime($i))
    {
        echo '<strong>'.$i.'</strong>, ';
    }
}

Obviously this takes a little while to run!
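When every prime in a range is needed, a sieve is usually quicker than testing each number on its own. The following is a minimal sketch of the Sieve of Eratosthenes, which is a different technique from the trial division used above. The function name primesUpTo() is made up for this example, and the array it builds needs one entry per number, so very large limits will use a lot of memory.

```php
<?php
// Hypothetical helper: Sieve of Eratosthenes, returns all primes up to $limit.
function primesUpTo($limit)
{
    if ($limit < 2) {
        return array();
    }
    // Assume every number is prime until it is shown to be a multiple.
    $isPrime = array_fill(0, $limit + 1, true);
    $isPrime[0] = $isPrime[1] = false;

    for ($i = 2; $i * $i <= $limit; $i++) {
        if ($isPrime[$i]) {
            // Cross out every multiple of $i, starting from $i squared.
            for ($j = $i * $i; $j <= $limit; $j += $i) {
                $isPrime[$j] = false;
            }
        }
    }
    // The keys still marked true are the primes.
    return array_keys($isPrime, true, true);
}

echo implode(', ', primesUpTo(30)); // prints 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
```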

Also, this function is only useful if you want to check integers. If your number is higher than the maximum value of an integer, PHP will use a float to store the number, which causes false positives. To find the maximum value of an integer on your system use the following code.

echo PHP_INT_MAX;
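To see the switch to a float happen, the short sketch below adds 1 to PHP_INT_MAX and checks the type of the result. The isSafeInteger() helper is hypothetical and simply shows one way of refusing anything that is not a genuine integer before testing it.

```php
<?php
$big = PHP_INT_MAX + 1; // too large for an integer, so PHP stores a float

var_dump(is_int(PHP_INT_MAX)); // bool(true)
var_dump(is_float($big));      // bool(true)

// Hypothetical guard: only genuine integers should be tested for primality.
function isSafeInteger($number)
{
    return is_int($number);
}

var_dump(isSafeInteger(11));   // bool(true)
var_dump(isSafeInteger($big)); // bool(false)
```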

Categories: PHP

Disemvoweling PHP Function

April 7th, 2009 Tech 1 comment

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text. It involves simply removing the vowels from the text so that it is almost, but not entirely, unreadable.

Use the following function to disemvowel a string of text.

function disemvowel($string)
{
    return str_replace(array('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'), '', $string);
}

As an example, the first sentence of this post:

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text.

would appear like this:

Dsmvwlng s tchnq sd n blgs nd frms t cnsr ny pst r cmmnt tht cntns spm r thr nwntd txt.

This doesn’t make a lot of sense, but is still kind of readable. This technique kills unwanted comments without removing the text entirely.
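The same result can also be had from a single case-insensitive regular expression instead of listing every vowel twice. This is just a sketch of an alternative, and the name disemvowelRegex() is made up for this example.

```php
<?php
// Hypothetical variant of disemvowel() using a case-insensitive pattern.
function disemvowelRegex($string)
{
    return preg_replace('/[aeiou]/i', '', $string);
}

echo disemvowelRegex('Vending Machines are great.'); // prints Vndng Mchns r grt.
```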

Check out the Wikipedia page on Disemvoweling for more information on the origins of this method.