Extract Links From an HTML File With PHP
Use the following function to extract all of the links from an HTML string.
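function linkExtractor($html){
    $linkArray = array();
    // Group 1 captures the href value, group 2 the content between <a ...> and </a>.
    // The "i" modifier ignores case; "s" lets "." match newlines so multi-line links are found.
    if(preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/is',$html,$matches,PREG_SET_ORDER)){
        foreach($matches as $match){
            // Store each link as array(location, text).
            array_push($linkArray,array($match[1],$match[2]));
        }
    }
    return $linkArray;
}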
To use it, read a web page or file into a string and pass that string to the function. The following example fetches a web page with the PHP cURL functions and then passes the response body to the function to retrieve the links.
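$url = 'http://www.talkincode.com';
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12');
curl_setopt($ch,CURLOPT_HEADER,0);         // leave response headers out of the output
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); // return the body as a string instead of printing it
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,0); // do not follow redirects
curl_setopt($ch,CURLOPT_TIMEOUT,120);      // give up after 120 seconds
$html = curl_exec($ch);
curl_close($ch);
// curl_exec() returns false on failure, so stop before parsing.
if($html === false){
    die('Could not fetch '.$url);
}
// Escape the dump so link text that contains markup is displayed rather than rendered.
echo '<pre>'.htmlspecialchars(print_r(linkExtractor($html),true)).'</pre>';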
The function returns an array in which each element is itself a two-element array: the link location (the href value) at index 0 and the content between the opening and closing link tags at index 1.
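Regular expressions can miss links in unusual or malformed markup, so it may be worth comparing the result with a parser-based approach. The following is a minimal sketch, not part of the original function, that uses PHP's DOM extension and returns the same array-of-pairs shape. The name linkExtractorDom and the $sample string are assumptions made up for this illustration. One difference to be aware of: it returns the plain text of each link, whereas the regex version returns the inner HTML.

```php
<?php
// Hypothetical helper (not from the original article): same return shape as
// linkExtractor(), but built on PHP's DOM extension instead of a regex.
function linkExtractorDom(string $html): array
{
    $links = [];
    if ($html === '') {
        // DOMDocument::loadHTML() rejects an empty string, so return early.
        return $links;
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors without an href, such as <a name="top">.
        if ($anchor->hasAttribute('href')) {
            $links[] = [$anchor->getAttribute('href'), trim($anchor->textContent)];
        }
    }
    return $links;
}

// Made-up sample input: each result is a pair of (location, text).
$sample = '<p><a href="/about">About <b>us</b></a> <a name="top">Top</a> <a href="http://example.com/">Example</a></p>';
foreach (linkExtractorDom($sample) as [$href, $text]) {
    echo $href . ' => ' . $text . PHP_EOL;
}
```

Because both functions return the same structure, the foreach loop at the end should work unchanged with the output of linkExtractor() as well.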
Comments
Comment from Mark James
Date: September 4, 2008, 9:27 am
Cool Scripts.