My Final Post

June 2nd, 2009 Tech 3 comments

The time has come for me to stop writing on Talk In Code. It has been a fun couple of years, and I have learnt a lot, but as of today I will not be able to write any more posts on this blog. I want to thank everyone who has posted comments and contributed to the site since I have been writing.

You might see the blog being updated irregularly in the future, as I have now passed the reins on to someone else.

This is Tech, signing off.

Categories: General

Extract Keywords From A Text String With PHP

May 21st, 2009 Tech No comments

A common issue I have come across in the past is that I have a CMS, or an old copy of WordPress, and I need to create a set of keywords for the meta keywords field. To solve this I put together a simple function that runs through a string and returns the most commonly used words in it as an array. The number of words returned is currently set to 10, but you can change that quite easily.

The first thing the function defines is a list of "stop" words. This is a list of words that occur quite a bit in English text and would therefore interfere with the outcome of the function. The function also uses a variant of the slug function to remove any odd characters that might be in the text.

function commonWords($string){
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');
 
    $string = preg_replace('/\s\s+/', ' ', $string); // collapse runs of whitespace into a single space
    $string = trim($string); // trim the string
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
    $string = strtolower($string); // make it lowercase
 
    // match each run of letters that is followed by whitespace
    // (the last word of the trimmed string has no whitespace after it, so it is not matched)
    preg_match_all('/([a-z]*?)(?=\s)/i', $string, $matchWords);
    $matchWords = $matchWords[0];
    foreach ( $matchWords as $key=>$item ) {
        if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
            unset($matchWords[$key]);
        }
    }
    $wordCountArr = array();
    if ( is_array($matchWords) ) {
        foreach ( $matchWords as $key => $val ) {
            $val = strtolower($val);
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr);
    $wordCountArr = array_slice($wordCountArr, 0, 10);
    return $wordCountArr;                
}

Here is an example of the function in action.

$text = "This is some text. This is some text. Vending Machines are great.";
$words = commonWords($text);
echo implode(',', array_keys($words));

This produces the following output.

some,text,machines,vending
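
The same counting idea can also be sketched with PHP's built-in array functions. This is only a rough alternative, not a replacement for the function above: the name commonWordsSketch(), the $limit parameter and the shortened stop word list are all made up for illustration. Unlike the original it splits on whitespace, so the last word of the text ("great" in the example) is counted too.

```php
<?php
// Hypothetical sketch: count the most common words using built-in array functions.
function commonWordsSketch($string, $limit = 10)
{
    // Shortened stop word list, for illustration only
    $stopWords = array('about', 'from', 'that', 'this', 'what', 'when', 'where', 'will', 'with');

    // Lowercase, then turn anything that is not a letter, digit or dash into a space
    $string = preg_replace('/[^a-z0-9-]+/', ' ', strtolower($string));

    // Split on whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

    // Keep words longer than three characters that are not stop words
    $kept = array();
    foreach ($words as $word) {
        if (strlen($word) > 3 && !in_array($word, $stopWords)) {
            $kept[] = $word;
        }
    }

    $counts = array_count_values($kept); // word => number of occurrences
    arsort($counts);                     // highest counts first
    return array_slice($counts, 0, $limit, true);
}

$counts = commonWordsSketch("This is some text. This is some text. Vending Machines are great.");
echo implode(',', array_keys($counts));
```

The order of words that share the same count depends on the PHP version, so treat the order of the tied entries as unspecified.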

Excel Document Scanning With Zend_Search_Lucene

May 11th, 2009 Tech 1 comment

Zend_Search_Lucene offers some powerful document scanning capabilities, and there are a few different formats that are useful for the search engine to index.

To allow the indexing and searching of Excel documents using Zend_Search_Lucene you need to use the Zend_Search_Lucene_Document_Xlsx class. However, to use this class you must have the Zip module installed with PHP. For Windows users this means editing your php.ini file and uncommenting the following line:

extension=php_zip.dll

For Linux users you will need to recompile PHP with the --enable-zip configure option.
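
Before indexing anything it can be worth checking that the Zip module really is available. The following is a small sketch of such a check; the function name zipSupportAvailable() is made up for this example, while extension_loaded() and class_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper: report whether PHP's Zip extension is usable.
function zipSupportAvailable()
{
    // The zip extension provides the ZipArchive class
    return extension_loaded('zip') && class_exists('ZipArchive');
}

if ( !zipSupportAvailable() ) {
    echo "The Zip extension is not loaded, so .xlsx files cannot be read.\n";
}
```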

Create and/or open the index in the normal way and you can index Excel documents using the following code.

$filename = 'C:\Book1.xlsx';
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$index->addDocument($doc);

You can now set up a query and search for the document in the following way, although you would normally expect the input string to be some kind of user input.

$queryStr = 'wibble';
$userQuery = Zend_Search_Lucene_Search_QueryParser::parse($queryStr);
 
$query = new Zend_Search_Lucene_Search_Query_Boolean();
$query->addSubquery($userQuery, true);
 
 
$hits = $index->find($query);
 
foreach ( $hits as $hit ) {
    echo $hit->score.'<br />';
    echo $hit->filename.'<br />';
}

The score is always returned with a hit object. Other properties available to display are filename, title, subject, creator, keywords, description, lastModifiedBy, revision, modified and created. However, some of these depend on the contents of the document. Keywords and subjects, for example, are optional in an Excel document, so you will need to check that a property exists before displaying it. The following code looks for the existence of the keywords property before trying to print it out.

if ( isset($hit->keywords) ) {
    echo $hit->keywords.'<br />';
}

By default, this function indexes the document meta data, and tokenises the contents of the document so that they can be searched, but it does not store the contents themselves. The loadXlsxFile() function has a second optional parameter which is by default set to false. If this is set to true the contents of the Excel document will also be stored in the index. You can then use the following code to print out the contents of the document.

echo $hit->body.'<br />';

Bear in mind that this output will not contain any row or column information and will therefore look like a dump of the data.

Multi Page Forms In PHP

May 5th, 2009 Tech No comments

Multi page forms are just as they sound: a single form spread across multiple pages. They are useful in terms of usability as they break up an otherwise dauntingly big form into smaller chunks. They are also useful if you want to process some of the results in order to determine what forms the user sees on later steps.

There are two ways in which it is possible to do this using PHP.

The first (and simplest) is just to cycle through the items submitted on a previous form and print them out as hidden fields. Our first page source code will look like this:

<form action="form2.php" method="get">
Name: <input type="text" name="name" />
<br />
<input type="submit" value="Proceed">
</form>

On submitting the form we are taken to form2.php, which asks the user a different question and prints out the hidden fields. Because we used a get request for our first form we need to use the $_GET array.

<form action="end.php" method="get">
Colour: <input type="text" name="colour" />
<br />
<?php
foreach ( $_GET as $key=>$value ) {
    if ( $key!="submit" ) {
        // escape both the field name and its value before printing them
        $key = htmlentities($key);
        $value = htmlentities(stripslashes(strip_tags($value)));
        echo "\t<input type=\"hidden\" name=\"$key\" value=\"$value\">\n";
    }
}
?>
<input type="submit" value="Proceed">
</form>

This same code can be used on the different pages of the form. This method is fine, and works quite well, but it doesn’t account for users going back through the form and resubmitting a previous item. The $_GET array will only contain information about the previous forms.

To make this more user friendly, and robust, we need to employ a session to store our form values as the user goes through the form. Using sessions means that the user can cycle back and forward through the forms with no ill effect on the form data. The following code will take the input of the previous form and save it as a PHP session.

session_start();
foreach ( $_GET as $key=>$value ) {
    if ( $key!="submit" ) {
        $value = htmlentities(stripslashes(strip_tags($value)));
        $_SESSION[$key] = $value;
    }
}

This must be included on every page of our multi page form. If it is not then the data will simply not be saved for that step. Your users can now move back and forward through the forms, saving the information as they go. You need to supply your own back button for this to work, because the browser's back and forward buttons leave the page without submitting the current form, so anything typed into it is not saved.
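
To show why the session approach copes with users going back, here is a small sketch that uses plain arrays as stand-ins for $_GET and $_SESSION so it can be run on its own. The helper name saveFormStep() is made up for this example, and stripslashes() is left out on the assumption that magic quotes are switched off.

```php
<?php
// Hypothetical helper: copy one step's submitted values into a store such as $_SESSION.
function saveFormStep(array $submitted, array $store)
{
    foreach ( $submitted as $key => $value ) {
        // skip the submit button and anything that is not a simple string
        if ( $key === 'submit' || !is_string($value) ) {
            continue;
        }
        $store[$key] = htmlentities(strip_tags($value), ENT_QUOTES, 'UTF-8');
    }
    return $store;
}

// Plain arrays stand in for $_SESSION and for the $_GET data of each request
$session = array();
$session = saveFormStep(array('name' => 'Alice'), $session);  // step 1
$session = saveFormStep(array('colour' => 'blue'), $session); // step 2
$session = saveFormStep(array('name' => 'Bob'), $session);    // user went back and changed step 1

print_r($session);
```

Because each step simply overwrites its own keys, resubmitting an earlier step updates that answer and leaves the answers from later steps in place.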

Finally, when testing this code I found that using GET rather than POST was beneficial in terms of usability. This is mainly because if you use POST requests and the user clicks the back button in their browser they will be asked if they want to resubmit the information for that form.

Does anyone else have any ideas about how to do this? If so then post a comment and suggest it. You can even put a post in the forum!

Categories: PHP

Validate EAN13 Barcodes

April 30th, 2009 Tech No comments

EAN13 barcodes are commonly used to label products in Europe. If you want to know more about how they work then please view the Wikipedia entry on European Article Numbers.

EAN13 barcodes actually carry 12 digits of data and are validated by using a check digit, which is placed at the end, making the code 13 digits long. The check digit is worked out by the following process:

  • Add up all of the digits in even positions (the 2nd, 4th, 6th and so on, counting from the left) and multiply this number by 3.
  • Add up all of the digits in odd positions (the 1st, 3rd, 5th and so on) and add this result to the result of the even positions.
  • Divide the number by 10 and keep the remainder (modulo).
  • If the remainder is not 0 then subtract it from 10. The result is the check digit.
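
The steps above can be sketched as a short loop, with one barcode worked through by hand in the comments. The function names ean13CheckDigit() and isValidEan13() are made up for this example, and the barcodes are passed as strings so that each digit can be read by its position.

```php
<?php
// Hypothetical helper: work out the check digit for the first 12 digits of an EAN13 code.
function ean13CheckDigit($digits)
{
    $total = 0;
    for ( $i = 0; $i < 12; $i++ ) {
        // $i is zero-based, so an odd $i is an even position (2nd, 4th, ...)
        $weight = ($i % 2 === 1) ? 3 : 1;
        $total += (int) $digits[$i] * $weight;
    }
    // The outer "% 10" turns a result of 10 (remainder 0) into 0
    return (10 - ($total % 10)) % 10;
}

// Hypothetical helper: true if a 13 digit string ends in the correct check digit.
function isValidEan13($barcode)
{
    $barcode = (string) $barcode;
    if ( preg_match('/^\d{13}$/', $barcode) !== 1 ) {
        return false;
    }
    return ean13CheckDigit(substr($barcode, 0, 12)) === (int) substr($barcode, -1);
}

// Worked example for 501054800186:
//   even positions: 0 + 0 + 4 + 0 + 1 + 6 = 11, and 11 * 3 = 33
//   odd positions:  5 + 1 + 5 + 8 + 0 + 8 = 27
//   33 + 27 = 60, and 60 % 10 = 0, so the check digit is 0
var_dump(ean13CheckDigit('501054800186')); // int(0)
var_dump(isValidEan13('5034504935778'));   // bool(true)
var_dump(isValidEan13('5034504735778'));   // bool(false)
```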

Here is a function that runs through those steps, but also checks to see what length the barcode is. If it is 13 digits long then it returns both the original check digit and the calculated check digit. If the barcode is 12 digits long then it returns just the calculated check digit (the checksum). In both cases the original 12 digit barcode is also returned, and any other length returns false.

function validateEan13($digits)
{
    // Cast to a string so that each digit can be read by its position
    $digits = (string) $digits;
    $originalcheck = false;
    if ( strlen($digits) == 13 ) {
        $originalcheck = substr($digits, -1);
        $digits = substr($digits, 0, -1);
    } elseif ( strlen($digits) != 12 ) {
        // Invalid EAN13 barcode
        return false;
    }
 
    // Add even numbers together
    $even = $digits[1] + $digits[3] + $digits[5] + $digits[7] + $digits[9] + $digits[11];
    // Multiply this result by 3
    $even = $even * 3;
    
    // Add odd numbers together
    $odd = $digits[0] + $digits[2] + $digits[4] + $digits[6] + $digits[8] + $digits[10];
    
    // Add two totals together
    $total = $even + $odd;
    
    // Calculate the checksum
    // Divide total by 10 and store the remainder
    $checksum = $total % 10;
    // If the remainder is not 0 then subtract it from 10
    if($checksum != 0){
        $checksum = 10 - $checksum;
    }
 
    // Return results.
    if ( $originalcheck !== false ) {
        return array('barcode'=>$digits, 'checksum'=>$checksum, 'originalcheck'=>$originalcheck);
    } else {
        return array('barcode'=>$digits, 'checksum'=>$checksum);
    }
}

To test this I ran a few codes through the function.

echo '<pre>';
// two normal barcodes (passed as strings so that no digits are lost)
print_r(validateEan13('5023920187205'));
print_r(validateEan13('5010548001860'));
// one short barcode to work out checksum
print_r(validateEan13('501054800186'));
// a normal barcode
print_r(validateEan13('5034504935778'));
// the same barcode with a broken number
print_r(validateEan13('5034504735778'));
// two random numbers, one of which is not long enough to be an EAN13 barcode
print_r(validateEan13('7233897438712'));
var_dump(validateEan13('3345345345'));
echo '</pre>';

This prints out the following results, which I have annotated for your convenience.

// two normal barcodes
Array
(
    [barcode] => 502392018720
    [checksum] => 5
    [originalcheck] => 5
)
Array
(
    [barcode] => 501054800186
    [checksum] => 0
    [originalcheck] => 0
)
// one short barcode to work out checksum
Array
(
    [barcode] => 501054800186
    [checksum] => 0
)
 
// a normal barcode
Array
(
    [barcode] => 503450493577
    [checksum] => 8
    [originalcheck] => 8
)
// the same barcode with a broken number
Array
(
    [barcode] => 503450473577
    [checksum] => 4
    [originalcheck] => 8
)
// two random numbers, one of which is not long enough to be an EAN13 barcode
Array
(
    [barcode] => 723389743871
    [checksum] => 4
    [originalcheck] => 2
)
bool(false)

Categories: PHP