Scrape the web with Goutte
One of last month's adventures was building a web scraper. Working to a tight schedule, I poked around the tubes and decided to give Goutte a whirl. Goutte is a simple wrapper around Guzzle and a handful of Symfony components (such as BrowserKit and DomCrawler). In theory, this makes grabbing a webpage as simple as:
    use Goutte\Client;

    $client = new Client();
    $crawler = $client->request('GET', 'http://www.symfony.com/blog/');
    $titles = $crawler->filter('h2.post > a')->each(function ($node) {
        return $node->text();
    });
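To get a feel for what the `h2.post > a` selector picks out without hitting the network, here is a rough sketch that runs the same kind of query against an inline HTML string. It deliberately swaps Goutte for PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`), so it is an illustration of the selection step rather than of Goutte itself. The sample markup and the hand-written XPath translation of the CSS selector are my own assumptions.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = <<<'HTML'
<html><body>
  <h2 class="post"><a href="/blog/first">First post</a></h2>
  <h2 class="post featured"><a href="/blog/second">Second post</a></h2>
  <h2 class="sidebar"><a href="/about">About</a></h2>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Hand-written XPath equivalent of the CSS selector 'h2.post > a':
// an <a> that is a direct child of an <h2> whose class list contains "post".
$query = "//h2[contains(concat(' ', normalize-space(@class), ' '), ' post ')]/a";

$titles = [];
foreach ($xpath->query($query) as $node) {
    $titles[] = trim($node->textContent);
}

print_r($titles);
```

Only the first two links match, because the third heading does not carry the `post` class. With Goutte, the CSS selector is handed to `filter()` directly, so there is no need to write the XPath by hand.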