PHP Classes

PHP Scraper: Extract structured data from remote HTML pages

Ratings: 52%
Unique user downloads: 6,196 total, 1 this week
Download rankings: 359 all time, 560 this week (down)
Version: phpscraper 1.0.0
License: GNU General Publi...
PHP version: 3
Categories: HTML, Web services
Description

This class fetches remote HTML pages and parses them to extract structured information into arrays.

It takes a model that describes the structure of a given page and uses it to clip out the relevant fields of information.
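
To make the idea of a page model concrete, here is a minimal sketch of model-driven extraction. It is not the code from class_scraper.php, whose internals are not shown on this page. The function name sketch_extract, the sample markup, and the sample values are all hypothetical, and the sketch assumes plain tags without attributes, so its model uses complete tags such as <td>.

```php
<?php
// Hypothetical helper, NOT part of class_scraper.php. It walks the model line
// by line: "<field>" marks a value to capture, and every other line is literal
// text that must appear next in the HTML.
function sketch_extract(string $html, string $start, string $end, string $model): array
{
    // Keep only the region between the start and end patterns.
    $from = strpos($html, $start);
    if ($from === false) {
        return [];
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    $region = $to === false ? substr($html, $from) : substr($html, $from, $to - $from);

    // One step per non-empty model line.
    $steps = array_values(array_filter(array_map('trim', explode("\n", $model)), 'strlen'));
    if (count(array_diff($steps, ['<field>'])) === 0) {
        return []; // no literal text to anchor on
    }

    $rows = [];
    $pos = 0;
    while (true) {
        $row = [];
        $capture_from = null; // offset where a pending <field> begins
        foreach ($steps as $step) {
            if ($step === '<field>') {
                $capture_from = $pos;
                continue;
            }
            $hit = strpos($region, $step, $pos);
            if ($hit === false) {
                return $rows; // the model no longer matches: stop
            }
            if ($capture_from !== null) {
                $row[] = trim(substr($region, $capture_from, $hit - $capture_from));
                $capture_from = null;
            }
            $pos = $hit + strlen($step);
        }
        $rows[] = $row;
    }
}

// Made-up sample markup and values, for illustration only.
$s_sample = '<th>Adj Close</th><tr><td>2010-01-04</td><td>133.90</td></tr>'
          . '<tr><td>2010-01-05</td><td>134.69</td></tr><small>end</small>';
$s_sample_model = "<tr>\n<td>\n<field>\n</td>\n<td>\n<field>\n</td>\n</tr>";

print_r(sketch_extract($s_sample, 'Adj Close', '<small>', $s_sample_model));
```

Each pass over the model yields one row, and each <field> yields one value in that row. Real pages usually carry attributes and irregular whitespace, which this sketch does not handle.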

Author

Name: Antonio Rodrigues
Classes: 1 package
Country: Portugal
All time rank: 4585 in Portugal
Week rank: 411 (up) 3 in Portugal (up)

Recommendations

What is the best PHP web content crawler class?
Extracting content by passing the URL of a web site

Example

<?php
require_once 'class_scraper.php';

// Fetch the remote page as an HTML string
$o_sc = new scraper();
$s_url = 'http://finance.yahoo.com/q/hp?s=AMZN';
$s_user_agent = 'Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0';
$s_html = $o_sc->browse($s_url, $s_user_agent);

// Text that delimits the start and end of the region to parse
$s_start_pattern = "Adj Close";
$s_end_pattern = "<small>";

// Model of one table row: each <field> marks a value to capture
$s_model = '<tr
<td
<field>
</td>
<td
<field>
</td>
<td
<field>
</td>
<td
<field>
</td>
<td
<field>
</td>
<td
<field>
</td>
<td
<field>
</td>
/tr>';

// Extract every row between the two delimiters that matches the model
$a_result = $o_sc->extract($s_html, $s_start_pattern, $s_end_pattern, $s_model);
print_r($a_result);

?>
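
For comparison, the same kind of table can also be read with PHP's built-in DOM extension instead of a text model. This is a different technique from the one this class uses, shown only as a hedged sketch. The function name rows_from_table and the inline sample markup and values are made up for illustration, and a real page would need its own XPath expression.

```php
<?php
// Made-up sample standing in for a fetched page; real markup will differ.
$s_sample_html = '<table>
<tr><th>Date</th><th>Adj Close</th></tr>
<tr><td>2010-01-04</td><td>133.90</td></tr>
<tr><td>2010-01-05</td><td>134.69</td></tr>
</table>';

// Hypothetical helper: returns one array of cell texts per data row.
function rows_from_table(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid: collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // "//tr[td]" selects only rows that contain <td> cells, skipping header rows.
    foreach ($xpath->query('//tr[td]') as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

print_r(rows_from_table($s_sample_html));
```

A DOM parser tolerates attributes and whitespace changes that break literal text patterns, at the cost of requiring the DOM extension to be enabled.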


Files

File               Role     Description
class_scraper.php  Class    Main class
test_scraper.php   Example  Example (accessible without login)

Version control: 0%
User Ratings (all time)

Utility: 77%
Consistency: 77%
Documentation: -
Examples: 64%
Tests: -
Videos: -
Overall: 52%
Rank: 2285

User Comments (4)

superb
7 years ago (muabshir), 70%

Thanks you have done good work helpful save my time almost 2 ...
12 years ago (Shoaib Jilani), 65%

Good class just comment out the "Clean html" lines to allow n...
15 years ago (Marios Kaintatzis), 67%

it's not flexible enough.
16 years ago (Tudor Pop), 17%