Creating Web Scrapers

Tim Bond

Seattle PHP Meetup - July 11, 2017

What is a Web Scraper?

Things I'm Scraping Right Now

  • Apartment listings
  • HTC's Repair Status Page
  • My eBay Feedback
  • TV Schedules
  • Podcasts
  • Wait times
  • Tons more

Get the Raw Data

  • Look for AJAX requests
  • View source
    • HTML Tags
    • Embedded JSON
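
When the page source carries its data as embedded JSON, it can often be lifted out without parsing any HTML tags. The following is a minimal sketch, not a definitive implementation: the page markup, the `window.__DATA__` variable name, and the `extractEmbeddedJson()` helper are all hypothetical, invented for illustration.

```php
<?php
// Hypothetical page source: the data sits in a <script> tag as a JSON literal.
$html = <<<'HTML'
<html><body>
<script>window.__DATA__ = {"listings":[{"title":"1BR in Ballard","rent":1450},{"title":"Studio in Fremont","rent":1200}]};</script>
</body></html>
HTML;

/**
 * Find "<varName> = {...};" directly before a closing </script> tag
 * and decode the object. Returns null when nothing usable is found.
 */
function extractEmbeddedJson(string $html, string $varName): ?array
{
    $pattern = '/' . preg_quote($varName, '/') . '\s*=\s*(\{.*?\});\s*<\/script>/s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

$data = extractEmbeddedJson($html, 'window.__DATA__');
foreach ($data['listings'] ?? [] as $listing) {
    echo $listing['title'], ': $', $listing['rent'], "\n";
}
```

The same `json_decode()` step applies to the body of an AJAX response once you have found the request in the browser's network tab.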

Extract the Data

  • explode()
  • SimpleXML
  • DOMDocument
  • regex

Often a combination of two or more
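
As one possible illustration of combining tools, the sketch below uses DOMDocument (with DOMXPath) to locate table cells and then a regex to split each cell's text. The schedule markup, the `parseSchedule()` helper, and the "Title (NN min)" text format are assumptions made up for this example, not taken from a real site.

```php
<?php
// Hypothetical TV schedule markup.
$html = <<<'HTML'
<html><body>
<table id="schedule">
  <tr><td class="time">8:00 PM</td><td class="show">News (30 min)</td></tr>
  <tr><td class="time">9:00 PM</td><td class="show">Movie (120 min)</td></tr>
</table>
</body></html>
HTML;

function parseSchedule(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table[@id="schedule"]//tr') as $tr) {
        // Step 1 (DOM): pick the cells out of each row.
        $time = $xpath->query('td[@class="time"]', $tr)->item(0);
        $show = $xpath->query('td[@class="show"]', $tr)->item(0);
        if ($time === null || $show === null) {
            continue;
        }
        // Step 2 (regex): split "Title (NN min)" into its parts.
        if (preg_match('/^(.*?)\s*\((\d+) min\)$/', trim($show->textContent), $m)) {
            $rows[] = [
                'time'    => trim($time->textContent),
                'title'   => $m[1],
                'minutes' => (int) $m[2],
            ];
        }
    }
    return $rows;
}

$rows = parseSchedule($html);
foreach ($rows as $row) {
    echo "{$row['time']}  {$row['title']} ({$row['minutes']} min)\n";
}
```

Letting the DOM parser handle the tag structure and keeping the regex for short, predictable strings tends to be less brittle than running a regex over the whole page.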

Explode Example

<body>
<div>Irrelevant data</div>
<!-- Details -->
<div>Important data</div>
<!-- Details End -->
</body>

// Keep everything after the opening marker...
$html = explode('<!-- Details -->', $html)[1];
// ...then keep everything before the closing marker.
$html = explode('<!-- Details End -->', $html)[0];

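SimpleXML Example

<div>
    <ul>
        <li>One</li>
        <li>Two</li>
        <li>Three</li>
    </ul>
</div>

// $string holds the markup above; it must be well-formed XML.
$xml = simplexml_load_string($string);
// The root element is <div>, so $xml->ul->li iterates over each <li>.
foreach ($xml->ul->li as $li) {
    echo "$li\n";
}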

Warnings

  • Act like a browser
  • Cache
  • Ongoing development
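
The first two warnings can be sketched in code. This is only one way to do it, under stated assumptions: `fetchWithBrowserHeaders()` and `cachedFetch()` are hypothetical helper names, the User-Agent string is just an example of a browser-like value, and the cache is a plain directory of files keyed by a hash of the URL.

```php
<?php
/**
 * "Act like a browser": send the kind of headers a browser would send.
 * Requires the cURL extension.
 */
function fetchWithBrowserHeaders(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '', // accept any compression cURL supports
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-US,en;q=0.5',
        ],
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

/**
 * "Cache": reuse a saved copy while it is younger than $ttl seconds,
 * so repeated runs during development do not hit the site again.
 * $fetch is any callable that takes a URL and returns the body.
 */
function cachedFetch(string $url, int $ttl, string $cacheDir, callable $fetch): string
{
    $file = $cacheDir . '/' . sha1($url) . '.cache';
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }
    $body = $fetch($url);
    file_put_contents($file, $body);
    return $body;
}
```

A call such as `cachedFetch($url, 3600, sys_get_temp_dir(), 'fetchWithBrowserHeaders')` would then download a page at most once an hour. Passing the fetcher as a callable also makes it easy to swap in a stub while working on the parsing code.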

Non-Published APIs

Packet Capture for Android

Questions