
Posts Tagged ‘xml’

XML Sitemap Date Format In PHP

April 3rd, 2009 No comments

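To format the current timestamp in W3C Datetime encoding (as used in sitemap.xml files) use the following format string. The T must be escaped with a backslash, otherwise date() replaces it with the timezone abbreviation.

echo date('Y-m-d\TH:i:sP', time());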

As of PHP5 you can also use the c format character to print the exact same string.

echo date('c',time());

These would both print out the following:

2009-04-03T11:49:00+01:00
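In a sitemap you usually want the modification time of a page rather than the current time. The following is a minimal sketch of that, assuming a helper called sitemapLastmod() (a made-up name, not part of PHP) and a fixed Europe/London timezone so that the offset is predictable. It relies on the built-in DATE_W3C constant, which holds the same format string as above.

```php
<?php
// Hypothetical helper (not part of PHP or of the original post): format a
// Unix timestamp as a W3C Datetime string, e.g. for a <lastmod> element.
function sitemapLastmod($timestamp)
{
    // DATE_W3C is a built-in constant equal to 'Y-m-d\TH:i:sP'
    return date(DATE_W3C, $timestamp);
}

// Assumption: the site runs on UK time. Pick whichever zone suits you.
date_default_timezone_set('Europe/London');

// 1238755740 is 3 April 2009, 10:49:00 UTC, which is 11:49:00 in the UK.
echo '<lastmod>' . sitemapLastmod(1238755740) . '</lastmod>' . "\n";
// prints <lastmod>2009-04-03T11:49:00+01:00</lastmod>
```

In a real sitemap script the timestamp would typically come from filemtime() or from a date column in your database.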

Categories: PHP

Populating A TileList On Creation Complete Using XML In Flex

November 7th, 2008 2 comments

A TileList is part of a group of elements that allow you to add components in a specific order and orientation. The TileList controls the display of a number of items set out as tiles, so it is best suited to displaying images as thumbnails.

There are many ways to do this, but none of the examples on the Flex site seemed to be very useful, or very well explained. What I wanted to do was to create a TileList that displayed the tiles in a certain way and used an XML file to fill up the list of items with images, each image having a label associated with it.

The first thing to do is to create the TileList element.

<mx:TileList id="imageTileList"
itemRenderer="CustomItemRenderer"
dataProvider="{theImages}"
width="200"
height="400"
columnCount="2"
creationComplete="initList();"/>

This contains three important attributes, which I have described here.

itemRenderer="CustomItemRenderer"
This attribute is used to tell the TileList how to display each element within the list. The CustomItemRenderer refers to a file called CustomItemRenderer.mxml that contains the following:

<?xml version="1.0" encoding="utf-8"?>
<mx:VBox xmlns:mx="http://www.adobe.com/2006/mxml"
horizontalAlign="center"
verticalAlign="middle"
verticalGap="0"
width="80"
height="100"
paddingRight="5"
paddingLeft="5"
paddingTop="5"
paddingBottom="5"
>
 
<mx:Image height="50" width="50" source="{data.strThumbnail}" />
<mx:Label height="20" width="75" text="{data.title}" textAlign="center" color="0x000000" fontWeight="normal" />
</mx:VBox>

This tells the TileList that each item should be kept in a VBox element, which contains an Image element to contain the image and a Label element that contains the label for that image. The source and the text attributes for the Image and Label elements respectively are used to tell the elements what data is to go where.

dataProvider="{theImages}"
This is a reference to an ArrayCollection element that provides a mechanism to access the data within the TileList element. This ArrayCollection element looks like the following and needs to be placed as a direct child of the Application element.

<mx:ArrayCollection id="theImages"></mx:ArrayCollection>

creationComplete="initList();"
This is an event handler that allows us to populate the TileList just after it has been added to the application. The initList() function call is where everything is put together.

The next step is to add a reference to the XML file that contains the information we want to populate the TileList with. Place this as a child of the Application element.

<mx:Model id="items" source="items.xml" />

This references a file called items.xml, which has the following contents.

<?xml version="1.0" encoding="utf-8"?>
<items>
 <image id="1">
    <title>Image 1</title>
    <strThumbnail>one.png</strThumbnail>
 </image>
 <image id="2">
    <title>Image 2</title>
    <strThumbnail>two.png</strThumbnail>
 </image>
 <image id="3">
    <title>Image 3</title>
    <strThumbnail>three.png</strThumbnail>
 </image>
 <image id="4">
    <title>Image 4</title>
    <strThumbnail>four.png</strThumbnail>
 </image>    
</items>

Finally, we are ready to write some code to get this thing working. The first thing we need to do is create a little helper class that will allow us to convert the XML into a usable format. Create a file called ItemListObject.as and put the following contents in it.

package
{
 [Bindable]
 public class ItemListObject extends Object
 {
  public function ItemListObject() {
   super();
  }
 
  public var title:String = new String();
  public var strThumbnail:String = new String();
 }
}

Next, you can create your Script element, which contains two main parts. The first is an import statement that makes the ItemListObject class (from the ItemListObject.as file) available to the script, and the second is the all-important initList() function.

<mx:Script>
 <![CDATA[
 import ItemListObject;
 
 // copy each image entry from the items model into the data provider
 public function initList():void
 {
  for each ( var node:Object in items.image ) {
   var temp:ItemListObject = new ItemListObject();
   temp.strThumbnail = node.strThumbnail;
   temp.title = node.title;
   theImages.addItem(temp);
  }
 }
 ]]>
</mx:Script>

The initList() function works by looping over the image entries loaded from the items.xml file. For each item it creates an ItemListObject and fills in the thumbnail and title (used by the Image and Label elements) before adding it to the TileList data provider. If everything is working properly then this should produce the following result.

TileList final result


Convert A sitemap.xml File To A HTML Sitemap With PHP

August 13th, 2008 No comments

I have already talked about converting a sitemap.xml file into a urllist.txt file, but what if you want to create an HTML sitemap? If you have a sitemap.xml file then you can use this to spider your site, scrape the contents of each page and populate the HTML file with this information.

The following code does this. For every page it looks for the <title> tag, the description meta tag and the first <h2> tag on the page. These items are then used to construct a segment of HTML for that page.

<?php
$header = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>HTML Sitemap</title>
</head>
<body>';
 
set_time_limit(400);
 
$currentElement = '';
$currentLoc = '';
 
$map = "<h1>HTML Sitemap</h1>"."\n";
 
function parsePage($data){
 global $map;
 /*
 if you want to trap a certain file extension then use the syntax below...
 stripos($data,".php")>0
 stripos($data,".htm")>0
 stripos($data,".asp")>0
 */
 if(stripos($data,".pdf")>0){
  // if the url is a pdf document.
  $map .= '<p><a href="'.$data.'">PDF document.</a></p>'."\n";
  $map .= '<p>A pdf document.</p>'."\n";
 }elseif(stripos($data,".txt")>0){
  // if the url is a text document
  $map .= '<p><a href="'.$data.'">Text document.</a></p>'."\n";
  $map .= '<p>A text document.</p>'."\n";
 }else{
  // try to open it anyway...
  // make sure that you can read the file
  if($urlh = @fopen($data, 'rb')){
   $contents = '';
   // check php version (stream_get_contents() needs PHP 5 or later)
   if(version_compare(PHP_VERSION,'5.0.0','>=')){
    $contents = stream_get_contents($urlh);
   }else{
    while(!feof($urlh)){
     $contents .= fread($urlh, 8192);
    };
   };
 
   // find the title (non-greedy and case-insensitive; empty string if missing)
   $title = '';
   if(preg_match('/<title\b[^>]*>\s*(.*?)\s*<\/title>/is',$contents,$match)){
    $title = $match[1];
   };
 
   // find the first h2 tag
   $header = '';
   if(preg_match('/<h2\b[^>]*>(.*?)<\/h2>/is',$contents,$match)){
    $header = strip_tags($match[1]);
   };
 
   if(strlen($title)>0 && strlen($header)>0){
    // print the title and h2 tag in combo
    $map .= '<p class="link"><a href="'.str_replace('&','&amp;',$data).'" title="'.(strlen($header)>0?trim($header):trim($title)).'">'.trim($title).(strlen($header)>0?" - ".trim($header):'').'</a></p>'."\n";
   }elseif(strlen($title)>0){
    $map .= '<p class="link"><a href="'.str_replace('&','&amp;',$data).'" title="'.trim($title).'">'.trim($title).'</a></p>'."\n";
   }elseif(strlen($header)>0){
    $map .= '<p class="link"><a href="'.str_replace('&','&amp;',$data).'" title="'.trim($header).'">'.trim($header).'</a></p>'."\n";
   };
 
   // find description (empty string if the page has none)
   $description = '';
   if(preg_match('/<meta\s+name="description"\s+content="(.*?)"\s*\/?>/is',$contents,$match)){
    $description = $match[1];
   };
 
   // print description
   if(strlen($description)>0){
    $map .= '<p class="desc">'.trim($description).'</p>'."\n";
   };
   // close the file
   fclose($urlh);
  };
 };
};
 
/////////// XML PARSE FUNCTIONS HERE /////////////
// the start element function
function startElement($xmlParser,$name,$attribs){
 global $currentElement;
 $currentElement = $name;
};
 
// the end element function
function endElement($parser,$name){
 global $currentElement,$currentLoc;
 if($currentElement == 'loc'){
  parsePage($currentLoc);
  $currentLoc = '';
 };
 $currentElement = '';
};
 
// the character data function
function characterData($parser,$data){
 global $currentElement,$currentLoc;
 // if the current element is loc then it will be a url
 if($currentElement == 'loc'){
  $currentLoc .= $data;
 };
};
 
// create parse object
$xml_parser = xml_parser_create();
// turn off case folding!
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING, false);
// set start and end element functions
xml_set_element_handler($xml_parser,"startElement","endElement");
// set character data function
xml_set_character_data_handler($xml_parser,"characterData");
 
// open xml file
if(!($fp = fopen('sitemap.xml',"r"))){
 die("could not open XML input");
};
 
// read the file - print error if something went wrong.
while($data = fread($fp,4096)){
 if(!xml_parse($xml_parser,$data,feof($fp))){
  die(sprintf("XML error: %s at line %d",xml_error_string(xml_get_error_code($xml_parser)),xml_get_current_line_number($xml_parser)));
 };
};
 
// close file
fclose($fp);
 
$footer = '</body>
</html>';
 
// write output to a file
$fp = fopen('sitemap.html',"w+");
fwrite($fp,$header.$map.$footer);
fclose($fp);
 
// print output
echo $header.$map.$footer;
?>

This script prints out the sitemap and also saves the sitemap to a file for later use. This is essential as the script can take a long time to run due to all of the page accessing that it has to do.
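Regular expressions can be brittle against real-world markup, so here is a hedged sketch of how the title, first h2 and description lookups could be done with PHP's DOMDocument class instead. The function name extractPageInfo() and the sample markup are made up for illustration; this is an alternative to the regular expressions in the script, not a drop-in replacement for the whole of it.

```php
<?php
// Hypothetical helper (not from the original script): pull the title, the
// first <h2> and the meta description out of an HTML string with DOMDocument.
function extractPageInfo($html)
{
    $info = array('title' => '', 'heading' => '', 'description' => '');
    if (trim($html) == '') {
        return $info; // nothing to parse
    }

    // Real pages are rarely valid, so keep the parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $info;
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }

    $headings = $doc->getElementsByTagName('h2');
    if ($headings->length > 0) {
        // textContent drops nested tags, much like strip_tags()
        $info['heading'] = trim($headings->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) == 'description') {
            $info['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    return $info;
}

// Made-up sample page, used only to show the shape of the result.
$sample = '<html><head><title> Example page </title>'
        . '<meta name="Description" content="A short summary." /></head>'
        . '<body><h2>First <em>heading</em></h2><h2>Second</h2></body></html>';
print_r(extractPageInfo($sample));
```

The trade-off is that a DOM parser copes with attributes in any order and tags in any case, at the cost of building a full document tree for every page.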

This script is fairly complicated and has gone through several versions since I first created it, so if you find any improvements or bugs then let me know and I will incorporate them.

Categories: PHP

Convert A sitemap.xml File To A urllist.txt File Using PHP

August 12th, 2008 1 comment

If you already have a script that produces a sitemap.xml file there is no point in adapting it so that it also creates a urllist.txt file. A simpler solution is to use the sitemap.xml file itself to create urllist.txt. The following script does exactly this.

$lines = file('sitemap.xml');
$allMatches = array();
 
foreach($lines as $line){
 // pick out every <loc> value on this line (non-greedy, so several per line work)
 preg_match_all('/<loc>(.*?)<\/loc>/',trim($line),$matches);
 foreach($matches[1] as $url){
  if($url != ''){
   $allMatches[] = $url;
  };
 };
};
 
$list = '';
foreach($allMatches as $url){
 $list .= $url."\n";
};
$fh = fopen('urllist.txt',"w+");
fwrite($fh,$list);
fclose($fh);
 
// print out list to provide some feedback...
echo $list;

The script works by first loading the sitemap.xml file into an array using the file() function. The script then goes through all of the items in the array and picks out everything between the <loc> tags and puts these into an array. It then adds these to a file called urllist.txt but also prints out the output to provide some indication that the script has run. This can be removed if you want to incorporate it into a larger script.
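The line-by-line approach assumes that each opening and closing loc tag pair sits on one line. As a hedged alternative, the sketch below reads the URLs with PHP's SimpleXML extension, which does not care how the file is laid out. The function name sitemapUrls() and the sample sitemap are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original script): return every <loc>
// value found in a sitemap, given the XML as a string.
function sitemapUrls($xml)
{
    $urls = array();
    $sitemap = simplexml_load_string($xml);
    if ($sitemap === false) {
        return $urls; // the XML was not well formed
    }
    foreach ($sitemap->url as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc != '') {
            $urls[] = $loc;
        }
    }
    return $urls;
}

// Made-up sample data; in practice use file_get_contents('sitemap.xml').
$sample = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url><loc>http://www.example.com/</loc></url><url>
  <loc>
   http://www.example.com/about.html
  </loc>
 </url>
</urlset>';

// One URL per line, ready to be written to urllist.txt
echo implode("\n", sitemapUrls($sample)) . "\n";
```

The resulting array can be written out with fopen() and fwrite() exactly as in the script above.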

Categories: PHP

Downloading Alexa Data With PHP

January 23rd, 2008 4 comments

It is widely known that the visitor numbers that Alexa offers are far from accurate, but Alexa also provides an XML feed that exposes all of the data it holds, which is more than just visitor numbers. By passing the correct parameters to this feed you can find out related links, contact and domain information, the Alexa rank, associated keywords and Dmoz listings.

As an example here is a feed URL for getting information about the bbc.co.uk page.

http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url=www.bbc.co.uk

So to get information about any site all you have to do is pass the correct URL to this address.
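As a rough sketch of what "passing the correct URL" means, the address can be assembled with PHP's http_build_query() function. The parameter names and values here are simply copied from the example URL above (I have not seen them documented anywhere), and the function name alexaFeedUrl() is made up for illustration.

```php
<?php
// Hypothetical helper: build the feed address for any site. The parameters
// are copied from the example URL above; only 'url' changes per site.
function alexaFeedUrl($site)
{
    $params = array(
        'cli'    => 10,
        'dat'    => 'nsa',
        'ver'    => 'quirk-searchstatus',
        'uid'    => '19700101000000',
        'userip' => '127.0.0.1',
        'url'    => $site,
    );
    // http_build_query() URL-encodes each value and joins the pairs with '&'
    return 'http://xml.alexa.com/data?' . http_build_query($params, '', '&');
}

echo alexaFeedUrl('www.bbc.co.uk') . "\n";
```

Keeping the parameters in an array makes it easier to change one of them later without editing a long string by hand.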

To get this information in a usable form with PHP you can use the curl functions. To download the Alexa feed into PHP use the following code:

$url = 'www.bbc.co.uk';
$querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url='.urlencode($url);
$ch = curl_init();
$user_agent = "Mozilla/4.0";
curl_setopt ($ch, CURLOPT_URL, $querystring);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
$alexaXml = curl_exec($ch);
curl_close($ch);

You now have a variable called $alexaXml that contains all of the information you need. You could use some of the XML parsing options within PHP, but a simpler method is to extract the information you need using regular expressions. Here are a few examples.

To get the Alexa popularity.
preg_match('/<POPULARITY URL="(.*?)" TEXT="(.*?)"\/>/i',$alexaXml,$match);
echo "<p>Popularity: ";
if(count($match)>0){
  echo $match[2];
}else{
  echo 0;
}
echo '</p>';

To get the Alexa links.
preg_match('/LINKSIN NUM="(.*?)"/i',$alexaXml,$match);
echo "<p>Links: ";
if(count($match)>0){
  echo $match[1];
}else{
  echo 0;
}
echo '</p>';

To get the Dmoz categories.
preg_match_all('/CAT\sID="(.*)"/U',$alexaXml,$match);
echo "<p>Dmoz cats: ";
if(count($match[1])){
  echo '<pre>'.print_r($match[1],true).'</pre>';
}else{
  echo 0;
}
echo '</p>';
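For comparison, here is a hedged sketch of the XML parsing route using SimpleXML and XPath. It assumes that $alexaXml holds only the XML body (so CURLOPT_HEADER would have to be switched off) and it relies only on the element and attribute names used in the regular expressions above. The nesting in the sample document and the function name alexaSummary() are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): read the popularity,
// inbound link count and Dmoz categories from the feed with SimpleXML.
function alexaSummary($xml)
{
    $summary = array('popularity' => 0, 'links' => 0, 'categories' => array());
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return $summary; // not well-formed XML, e.g. HTTP headers included
    }

    // '//' finds an element at any depth, so no particular nesting is assumed
    $nodes = $feed->xpath('//POPULARITY');
    if ($nodes) {
        $summary['popularity'] = (int) $nodes[0]['TEXT'];
    }

    $nodes = $feed->xpath('//LINKSIN');
    if ($nodes) {
        $summary['links'] = (int) $nodes[0]['NUM'];
    }

    $nodes = $feed->xpath('//CAT');
    if ($nodes) {
        foreach ($nodes as $cat) {
            $summary['categories'][] = (string) $cat['ID'];
        }
    }

    return $summary;
}

// Made-up sample data: the element and attribute names come from the regular
// expressions above, but the nesting and the values are invented.
$sample = '<ALEXA><SD><POPULARITY URL="bbc.co.uk/" TEXT="44"/>'
        . '<LINKSIN NUM="123456"/></SD>'
        . '<DMOZ><CAT ID="Top/News"/><CAT ID="Top/Regional/Europe"/></DMOZ>'
        . '</ALEXA>';
print_r(alexaSummary($sample));
```

This is more code than the regular expressions, but it does not depend on the order of the attributes or on how the feed happens to be split across lines.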

You can also see the data directly by printing off a couple of links.
echo '<a href="http://www.alexa.com/data/ds/linksin?q=link%3A'.urlencode($url).'&amp;url=http%3A//'.urlencode($url).'/" title="Alexa Links">Links</a>';
echo '<br />';
echo '<a href="http://www.alexa.com/data/details/traffic_details/'.urlencode($url).'" title="Alexa Data">Data</a>';

There is more information available than this. To see everything that you can extract just copy the URL at the top into a browser window and view the output directly. I suggest doing this in Firefox because of the nice way in which it displays XML.

Categories: PHP