Archive

Posts Tagged ‘url’

Using The e Modifier In PHP preg_replace

August 20th, 2008 10 comments

The PHP function preg_replace() is powerful in its own right, but extra depth can be added with the inclusion of the e modifier. Take the following bit of code, which replaces each run of lower-case letters in a string with the letter X.

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z]*)/", "X", $something);
echo $something; // XX1XX2XX3XX (the * also matches the empty string after each run, hence the doubled X)

This is simple enough, but using the e modifier allows us to use PHP functions within the replace parameters. The following bit of code turns all letters upper case in a string of random letters by using the strtoupper() PHP function.

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z]*)/e", "strtoupper('\\1')", $something);
echo $something; // DF1GDF2GDF3SGDFG

Here is another example, but in this case the full string is repeated after the modified string.

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z0-9]*)/e", "strtoupper('\\1').'\\1'", $something);
echo $something; // DF1GDF2GDF3SGDFGdf1gdf2gdf3sgdfg

Notice that when using the e modifier it is important to quote the replacement properly with single and double quotes. This is because the replacement as a whole is evaluated as PHP, so if you don’t put single quotes around the backreferences then you will get PHP complaining about undefined constants.
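The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions the examples above need a different approach. A minimal sketch of the same two replacements using preg_replace_callback() might look like this. The variable names $upper and $doubled are made up for illustration, and the patterns use + instead of * so that empty matches are skipped.

```php
<?php
// Sketch: the same upper-casing without the e modifier.
$something = 'df1gdf2gdf3sgdfg';

$upper = preg_replace_callback(
    '/[a-z]+/',
    function ($matches) {
        // $matches[0] holds the full text matched by the pattern
        return strtoupper($matches[0]);
    },
    $something
);
echo $upper, "\n"; // DF1GDF2GDF3SGDFG

// Second example: the upper-cased match followed by the original match
$doubled = preg_replace_callback(
    '/[a-z0-9]+/',
    function ($matches) {
        return strtoupper($matches[0]) . $matches[0];
    },
    $something
);
echo $doubled, "\n"; // DF1GDF2GDF3SGDFGdf1gdf2gdf3sgdfg
```

Because the callback is ordinary PHP rather than a string that gets evaluated, there is no quoting of backreferences to worry about.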

For a more complex example I modified the createTextLinks() function that I wrote about recently on the site. The function originally found any URL strings within a larger string and turned them into links. The modified function returns the same thing, except that the link text has been shortened using the shortenurl() function.

$longurl = "there is the new site http://www.google.co.uk/search?aq=f&num=100&hl=en&client=firefox-a&channel=s&rls=org.mozilla%3Aen-US%3Aofficial";
 
function createShortTextLinks($str='') {
 
 if($str=='' or !preg_match('/(http|www\.|@)/im', $str)){
  return $str;
 }
 
 // replace links:
 $str = preg_replace("/([ \t]|^)www\./im", "\\1http://www.", $str);
 $str = preg_replace("/([ \t]|^)ftp\./im", "\\1ftp://ftp.", $str);
 
 $str = preg_replace("/(https?:\/\/[^ )\r\n!]+)/eim", "'<a href=\"\\1\" title=\"\\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/(ftp:\/\/[^ )\r\n!]+)/eim", "'<a href=\"\\1\" title=\"\\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/([-a-z0-9_]+(\.[_a-z0-9-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)+))/eim", "'<a href=\"mailto:\\1\" title=\"Email \\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/(\&)/im","\\1amp;", $str);
 
 return $str;
}
 
function shortenurl($url){
 if(strlen($url) > 45){
  return substr($url, 0, 30)."[...]".substr($url, -15);
 }else{
  return $url;
 }
}
 
echo createShortTextLinks($longurl);
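Since the e modifier evaluates the replacement as PHP code, it is risky with user-supplied text and is not available at all from PHP 7.0 onwards. The following is a rough sketch of the link-shortening step written with preg_replace_callback() instead. The name createShortTextLinksCallback() is hypothetical, the sketch only covers the http, https and ftp links (not the www. prefixing or the email addresses), and it escapes each URL with htmlspecialchars() rather than replacing every & in the whole string at the end.

```php
<?php
// Same helper as in the post: keep the first 30 and last 15 characters.
function shortenurl($url) {
    if (strlen($url) > 45) {
        return substr($url, 0, 30) . "[...]" . substr($url, -15);
    }
    return $url;
}

// Hypothetical name; covers only the http(s)/ftp part of the original function.
function createShortTextLinksCallback($str = '') {
    return preg_replace_callback(
        '/(?:https?|ftp):\/\/[^ )\r\n!]+/i',
        function ($matches) {
            $url = $matches[0];
            // Escape & and quotes for use inside HTML attributes and text
            $safe = htmlspecialchars($url, ENT_QUOTES);
            $text = htmlspecialchars(shortenurl($url), ENT_QUOTES);
            return '<a href="' . $safe . '" title="' . $safe . '">' . $text . '</a>';
        },
        $str
    );
}

echo createShortTextLinksCallback('there is the new site http://www.google.co.uk/search?aq=f&num=100');
```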

Tidy Up A URL With PHP

August 4th, 2008 No comments

Lots of applications require a user to input a URL and lots of problems occur as a result. I was recently looking for something that would take a URL as an input and allow me to make sure that it was formatted properly. There wasn’t anything that did this so I decided to write it myself.

The following function takes in a URL as a string and tries to clean it up. It essentially does this by splitting it apart with the parse_url() function and then putting it back together again. In order to make sure that this works you need to put a scheme in front of the URL, so the first thing the function does (after trimming the string) is to check that a scheme exists. If it doesn’t then the function adds http:// onto the start.

function tidyUrl($url){
 // trim the string
 $url = trim($url);
 // check for a schema and if there isn't one then add it
 if(!preg_match('/^(https?|ftp):\/\//i', $url)){
  $url = 'http://'.$url;
 }
 // parse the url
 $parsed = @parse_url($url);
 if(!is_array($parsed)){
  return false;
 }
 // rebuild url
 $url = isset($parsed['scheme']) ? $parsed['scheme'].':'.((strtolower($parsed['scheme']) == 'mailto') ? '' : '//') : '';
 $url .= isset($parsed['user']) ? $parsed['user'].(isset($parsed['pass']) ? ':'.$parsed['pass'] : '').'@' : '';
 $url .= isset($parsed['host']) ? $parsed['host'] : '';
 $url .= isset($parsed['port']) ? ':'.$parsed['port'] : '';
 // if no path exists then add a slash
 if(isset($parsed['path'])){
  $url .= (substr($parsed['path'],0,1) == '/') ?   $parsed['path'] : ('/'.$parsed['path']);
 }else{
  $url .= '/';
 };
 // append query
 $url .= isset($parsed['query']) ? '?'.$parsed['query'] : '';
 // return url string
 return $url;
}

The parse_url() function should return an array if it is successful. If it doesn’t, the function checks for this and returns false.
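To make the splitting step more concrete, here is a small sketch of what parse_url() returns for a full URL and for a bare host. The example URLs are made up for illustration. Note that the function above rebuilds everything except the fragment.

```php
<?php
// Example URL is made up for illustration.
$parts = parse_url('http://user:secret@www.example.com:8080/path/page.php?id=5#top');

print_r($parts);
// Keys present for this input: scheme, host, port, user, pass, path, query, fragment

// A host with no path: 'path' is simply absent from the array,
// which is the case tidyUrl() covers by appending '/'.
$bare = parse_url('http://www.example.com');
var_dump(isset($bare['path'])); // bool(false)
```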

This function is also useful if you want to keep a standard format to any URL that you store. To make this easier in the long term you should store any domain URL with the trailing slash. If none is added by the user then the function adds it onto the end.

Categories: PHP Strings

Shortening Long URLs With PHP

June 13th, 2008 2 comments

Printing out a full URL for a link will sometimes mess up your formatting, especially if your URL is quite long. This might be the case if you are linking to a Google search page, or have an automated script that shows numerous URLs of indeterminate length. The following function shortens any URL longer than 45 characters by keeping the first 30 and the last 15 characters and joining them with a simple string.

function shortenurl($url){
 if(strlen($url) > 45){
  return substr($url, 0, 30)."[...]".substr($url, -15);
 }else{
  return $url;
 }
}

You can use the function in the following way.

// long URL, in this case a Google search query
$longurl = "http://www.google.co.uk/search?aq=f&num=100&hl=en&client=firefox-a&channel=s&rls=org.mozilla%3Aen-US%3Aofficial&q=talk+in+code&btnG=Search&meta=";
$shorturl = shortenurl($longurl);
echo '<a href="'.$longurl.'" title="'.$longurl.'">'.$shorturl.'</a>';

The link text will be printed as follows.

http://www.google.co.uk/search[...]nG=Search&meta=

Categories: PHP Strings

Preparing A URL With PHP

June 12th, 2008 No comments

There might be many instances where you will create a program in PHP that takes a URL as input and does something with the address. This might be a site analysis or an image resize, but whatever the use is, you need to be sure that the URL will work or at least has the same format.

What users tend to leave out of a URL string is the http:// bit at the start. You could validate the URL to force the user to do this, but you will end up annoying a few people. By far the best way of making sure that the URL has the http:// bit at the start is by adding it behind the scenes. A simple way to do this is to remove the http:// from the string, even if it isn’t there, and then add it back on.

$url = str_replace('http://','',$url);
$url = 'http://'.$url;

This way you ensure that http:// is there, regardless of whether the user entered it, and you don’t have to use any complicated substr()/strpos() combinations to figure out what the URL looks like. Of course this doesn’t account for the https protocol, but you can test for this on the original input, before http:// is added, by treating the string as an array.

if($url[4].$url[5].$url[6]=='s:/'){
 // https
}
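One thing to watch with the str_replace() approach is that it removes every occurrence of http:// in the string, including one inside a query string. As an alternative, here is a minimal sketch that only looks at the start of the string and leaves an existing http:// or https:// prefix alone. The function name prepareUrl() is hypothetical.

```php
<?php
// Hypothetical helper name; a sketch, not the post's original code.
function prepareUrl($url) {
    $url = trim($url);
    // Keep an existing http:// or https:// prefix (case-insensitive)
    if (preg_match('/^https?:\/\//i', $url)) {
        return $url;
    }
    // Otherwise assume plain http
    return 'http://' . $url;
}

echo prepareUrl('www.example.com'), "\n";         // http://www.example.com
echo prepareUrl('https://www.example.com'), "\n"; // https://www.example.com
```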

Once you have the URL prepared there is no guarantee that it actually points to a viable resource. To check this you can use the PHP 5 function get_headers() to check that the resource returns a valid response code. The get_headers() function returns an array containing the headers sent back from the URL. The first item in the array is always the status line, so if the URL is valid you will get a response of HTTP/1.1 200 OK, and you just have to check for this.
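A rough sketch of that check could look like the following. Both function names are hypothetical, and the sketch assumes that URL wrappers are enabled so that get_headers() can make the request. The status line test is kept in its own function so it accepts HTTP/1.0 as well as HTTP/1.1 responses.

```php
<?php
// Hypothetical helper: true when a status line such as "HTTP/1.1 200 OK" reports 200.
function isOkStatusLine($statusLine) {
    return (bool) preg_match('/^HTTP\/\d+(?:\.\d+)?\s+200\b/', $statusLine);
}

// Hypothetical helper wrapping get_headers().
function urlResponds($url) {
    $headers = @get_headers($url); // false on failure
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    return isOkStatusLine($headers[0]);
}

var_dump(isOkStatusLine('HTTP/1.1 200 OK'));        // bool(true)
var_dump(isOkStatusLine('HTTP/1.1 404 Not Found')); // bool(false)
```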

One small note of interest is that the get_headers() call can have a small delay and it doesn’t really understand locations that redirect to a different source, so it might not work all of the time. However, if it doesn’t work then the chances are that your script won’t work either!

Categories: PHP Strings

Check Backlinks With PHP

June 5th, 2008 No comments

Backlinks are an important part of search engine optimisation and are also useful in seeing what sort of things are popular on your site. If you have a list of known backlinks that you want to keep track of then you can do it manually, or you can get a script to do it for you.

The following function takes two arguments, the first being the remote URL and the second being the URL you are checking for. The function works by doing a small amount of initial formatting on the URL you are checking for and then downloading the remote page in little bits and seeing if each bit contains a link. If it does then the function breaks out and returns true. If the link isn’t there then it returns false. Because each 1024-byte chunk is tested on its own, a link that straddles two chunks can be missed.

function check_back_link($remote_url, $your_link) {
  $match_pattern = preg_quote(rtrim($your_link, "/"), "/");
  $found = false;
  if($handle = @fopen($remote_url, "r")){
    while(!feof($handle)){
      $part = fread($handle, 1024);
      if(preg_match("/<a(.*)href=[\"']".$match_pattern."(\/?)[\"'](.*)>(.*)<\/a>/", $part)){
        $found = true;
        break;
      }
    }
    fclose($handle);
  }
  return $found;
}

Here is an example of the function in use.

if(check_back_link('http://www.google.com','http://www.talkincode.com')){
  echo 'link found';
}else{
  echo 'link NOT found';
};
// this prints 'link NOT found' - and probably will forever!

To get the most out of this function it is best to have a plain text file full of all of the links that you think you have (one per line) and use the file() function in PHP to load all of the URLs into an array. You can then loop through this array and see which sites have a link to your site and which don’t.

Also, if you are looking at a lot of URLs then you might want to include a call to the set_time_limit() function with a parameter of something like 300 at the top of the script. This stops the script from timing out after 30 seconds (the default), as your script will probably take a little longer than this to run.
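The loop over a list of URLs might be sketched as follows. The function name checkBacklinkList() and the file name backlinks.txt are made up for illustration, and the checker is passed in as a callable so that check_back_link() (or anything with the same two arguments) can be plugged in.

```php
<?php
// Hypothetical helper: $checker is any callable taking ($remoteUrl, $yourLink),
// for instance the check_back_link() function from the post.
function checkBacklinkList(array $remoteUrls, $yourLink, $checker) {
    $results = array();
    foreach ($remoteUrls as $remoteUrl) {
        $remoteUrl = trim($remoteUrl);
        if ($remoteUrl === '') {
            continue; // skip blank lines
        }
        $results[$remoteUrl] = (bool) call_user_func($checker, $remoteUrl, $yourLink);
    }
    return $results;
}

// Usage, assuming a file named backlinks.txt (made-up name) with one URL per line:
// set_time_limit(300);
// $urls = file('backlinks.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// print_r(checkBacklinkList($urls, 'http://www.talkincode.com', 'check_back_link'));
```

The result is an array keyed by remote URL with true or false for each, which makes it easy to list the sites that no longer link to you.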

Categories: PHP