Archive

Posts Tagged ‘string’

Extract Keywords From A Text String With PHP

May 21st, 2009 1 comment

A common issue I have come across in the past is that I have a CMS system, or an old copy of WordPress, and I need to create a set of keywords to be used in the meta keywords field. To solve this I put together a simple function that runs through a string and picks out the most commonly used words in that list as an array. This is currently set to be 10, but you can change that quite easily.

The first thing the function defines is a list of "stop" words. This is a list of words that occur quite a bit in English text and would therefore interfere with the outcome of the function. The function also uses a variant of the slug function to remove any odd characters that might be in the text.

function commonWords($string){
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
 
    $string = preg_replace('/ss+/i', '', $string);
    $string = trim($string); // trim the string
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
    $string = strtolower($string); // make it lowercase
 
    preg_match_all('/([a-z]*?)(?=s)/i', $string, $matchWords);
    $matchWords = $matchWords[0];
    foreach ( $matchWords as $key=>$item ) {
        if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
            unset($matchWords[$key]);
        }
    }
    $wordCountArr = array();
    if ( is_array($matchWords) ) {
        foreach ( $matchWords as $key => $val ) {
            $val = strtolower($val);
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr);
    $wordCountArr = array_slice($wordCountArr, 0, 10);
    return $wordCountArr;                
}

Here is an example of the function in action.

$text = "This is some text. This is some text. Vending Machines are great.";
$words = commonWords($text);
echo implode(',', array_keys($words));

This produces the following output.

some,text,machines,vending

Convert A String To A Date Value In MySQL

April 21st, 2009 No comments

There are numerous ways to print out dates and times, and many hours of programming are spent in converting dates from one format to another.

To do this in MySQL you use the STR_TO_DATE() function, which has two parameters. The first parameter is the time to be parsed and the second is the format of that time. Here is a simple example that converts one date format to a MySQL formatted date string.

SELECT STR_TO_DATE('[21/Apr/2009:07:14:50 +0100]', '[%d/%b/%Y:%H:%i:%S +0100]');

This outputs 2009-04-21 07:14:50.

Using this function is quite intuitive and means that you can convert between time formats very easily. Here are a few more examples of this function.

SELECT STR_TO_DATE('21/04/2009', '%d/%m/%Y'); // 2009-04-21
SELECT STR_TO_DATE('04/21/2009', '%m/%d/%Y'); // 2009-04-21
SELECT STR_TO_DATE('2009-04-21 14', '%Y-%m-%d %H'); // 2009-04-21 14:00:00

If you enter a date that can’t be translated then you will get an error, take the following code that tries to convert a minute value of 87.

SELECT STR_TO_DATE('2009-04-21 14:87', '%Y-%m-%d %H:%i');

The following error message is returned.

Incorrect datetime value: '2009-04-21 14:87' for function str_to_date

This error can be quite easy to reproduce. For example, if you wanted to parse a time that was in 24 format, you would use the %H for hours. Using the %h value for hours will produce exactly this error.

The following table lists all of the different parameters that are involved in the date formatting features within MySQL. Use this to parse your own dates.

Specifier Description
%a Abbreviated weekday name (Sun..Sat)
%b Abbreviated month name (Jan..Dec)
%c Month, numeric (0..12)
%D Day of the month with English suffix (0th, 1st, 2nd, 3rd, …)
%d Day of the month, numeric (00..31)
%e Day of the month, numeric (0..31)
%f Microseconds (000000..999999)
%H Hour (00..23)
%h Hour (01..12)
%I Hour (01..12)
%i Minutes, numeric (00..59)
%j Day of year (001..366)
%k Hour (0..23)
%l Hour (1..12)
%M Month name (January..December)
%m Month, numeric (00..12)
%p AM or PM
%r Time, 12-hour (hh:mm:ss followed by AM or PM)
%S Seconds (00..59)
%s Seconds (00..59)
%T Time, 24-hour (hh:mm:ss)
%U Week (00..53), where Sunday is the first day of the week
%u Week (00..53), where Monday is the first day of the week
%V Week (01..53), where Sunday is the first day of the week; used with %X
%v Week (01..53), where Monday is the first day of the week; used with %x
%W Weekday name (Sunday..Saturday)
%w Day of the week (0=Sunday..6=Saturday)
%X Year for the week where Sunday is the first day of the week, numeric, four digits; used with %V
%x Year for the week, where Monday is the first day of the week, numeric, four digits; used with %v
%Y Year, numeric, four digits
%y Year, numeric (two digits)
%% A literal % character
%x x, for any x not listed above
Categories: MySQL Tags: , , , , , , ,

Disemvoweling PHP Function

April 7th, 2009 1 comment

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text. It involves simply removing the vowels from the text so that it is almost, but not entirely, unreadable.

Use the following function to disemvowel a string of text.

function disemvowel($string)
{
    return str_replace(array('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'), '', $string);
}

As an example, the first sentence on this post:

Disemvoweling is a technique used on blogs and forums to censor any post or comment that contains spam or other unwanted text.

would appear like this:

Dsmvwlng s tchnq sd n blgs nd frms t cnsr ny pst r cmmnt tht cntns spm r thr nwntd txt.

Which doesn’t make a lot of sense, but is still kind of readable. This technique kills unwanted comments without removing the text entirely.

Check out the Wikipedia page on Disemvoweling for more information on the origins or this method.

Search A Table With JavaScript

March 10th, 2009 3 comments

Using server side scripts to search for things can be as complex or as simple as the situation requires. However, if you have a table of results and you just want to enable a simple JavaScript search on that table then this might be the script for you.

To search a table using JavaScript you need to split the table into bits, this can be done using the getElementsByTagName() function, which takes the name of the element that you want to capture. So to grab all of the rows of a table as an array you need to pass the value of tr.

var rows = document.getElementsByTagName("tr");

We can then iterate through these rows, grabbing the column that you want to search on, with the following code.

<form action="#" method="get" onsubmit="return false;">
<input type="text" size="30" name="q" id="q" value="" onkeyup="doSearch();" />
</form>

Next, the table. Note that I have added an additional row to the end of this table. This will be used to display a note to the user if they have entered a query that isn’t found.

<table>
<tr><td>One</td></tr>
<tr><td>Two</td></tr>
<tr><td>Three</td></tr>
<tr><td>Four</td></tr>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
<tr><td>Seven</td></tr>
<tr><td>Eight</td></tr>
<tr style="display:none;" id="noresults">
 <td>(no listings that start with "<span id="qt"></span>")</td>
</tr>
</table>

The first thing we need to do in our search function is to prepare the search term. This turns the query string to lowercase, which we can then match to the table column.

var q = document.getElementById("q");
var v = q.value.toLowerCase();

Now we can go through each row value and try to match it to the value in the query string. If it matches then we display the row, if not then we hide it.

  for ( var i = 0; i < rows.length; i++ ) {
    var fullname = rows[i].getElementsByTagName("td");
    fullname = fullname[0].innerHTML.toLowerCase();
    if ( fullname ) {
        if ( v.length == 0 || (v.length < 3 && fullname.indexOf(v) == 0) || (v.length >= 3 && fullname.indexOf(v) > -1 ) ) {
        rows[i].style.display = "";
      } else {
        rows[i].style.display = "none";
      }
    }
  }

Here is the full function, including the code to implement the no results note.

<script type="text/javascript">
//<!--
function doSearch() {
  var q = document.getElementById("q");
  var v = q.value.toLowerCase();
  var rows = document.getElementsByTagName("tr");
  var on = 0;
  for ( var i = 0; i < rows.length; i++ ) {
    var fullname = rows[i].getElementsByTagName("td");
    fullname = fullname[0].innerHTML.toLowerCase();
    if ( fullname ) {
        if ( v.length == 0 || (v.length < 3 && fullname.indexOf(v) == 0) || (v.length >= 3 && fullname.indexOf(v) > -1 ) ) {
        rows[i].style.display = "";
        on++;
      } else {
        rows[i].style.display = "none";
      }
    }
  }
  var n = document.getElementById("noresults");
  if ( on == 0 && n ) {
    n.style.display = "";
    document.getElementById("qt").innerHTML = q.value;
  } else {
    n.style.display = "none";
  }
}
//-->
</script>

Here is a working example of the code, with valid HTML.

Convert A Date Into Timestamp In JavaScript

February 27th, 2009 1 comment

I have previously talked about using PHP to convert from a date to a time, but what about doing the same in JavaScript?

To get the unix timestamp using JavaScript you need to use the getTime() function of the build in Date object. As this returns the number of milliseconds then we must divide the number by 1000 and round it in order to get the timestamp in seconds.

Math.round(new Date().getTime()/1000);

To convert a date into a timestamp we need to use the UTC() function of the Date object. This function takes 3 required parameters and 4 optional parameters. The 3 required parameters are the year, month and day, in that order. The optional 4 parameters are the hours, minutes, seconds and milliseconds of the time, in that order.

To create the time do something like this:

var datum = new Date(Date.UTC('2009','01','13','23','31','30'));
return datum.getTime()/1000;

This prints out 1234567890. Here is a function to simplify things:

function toTimestamp(year,month,day,hour,minute,second){
 var datum = new Date(Date.UTC(year,month-1,day,hour,minute,second));
 return datum.getTime()/1000;
}

Notice that when adding in the month you need to minus the value by 1. This is because the function requires a month value between 0 and 11. However, there is an easier way of getting the timestamp by using the parse() function. This function returns the timestamp in milliseconds, so we need to divide it by 1000 in order to get the timestamp.

function toTimestamp(strDate){
 var datum = Date.parse(strDate);
 return datum/1000;
}

This can be run by using the following, not that the date must be month/day when writing the date like this.

alert(toTimestamp('02/13/2009 23:31:30'));

Or even this:

alert(toTimestamp('2009 02 13 23:31:30'));