用PHP的Guzzle库编写的图片爬虫程序

使用 PHP 的 Guzzle 库编写一个图片爬虫程序是一个非常常见的任务，Guzzle 是一个流行的 HTTP 请求库，允许你轻松地发送请求和处理响应。

下面是一个使用 Guzzle 编写的图片爬虫程序示例。此程序将从指定的网页中提取图片链接并将图片下载到本地。

https://i-blog.csdnimg.cn/direct/c5683707eb7a46128dbc59d98af5d6cb.png#pic_center" alt="在这里插入图片描述" />

1、安装 Guzzle

首先，确保你已经安装了 Guzzle 库。你可以通过 Composer 安装 Guzzle：

composer require guzzlehttp/guzzle

2、创建图片爬虫程序

接下来，我们创建一个 PHP 文件 image_scraper.php，该文件会爬取指定网页中的图片链接，并将其下载到本地。

代码示例：

php"><?phprequire 'vendor/autoload.php';use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Exception\GuzzleException;
use Symfony\Component\DomCrawler\Crawler;// 创建 Guzzle 客户端
$client = new Client();// 下载图片函数
function downloadImage($url, $savePath) {global $client;try {// 发送 GET 请求获取图片数据$response = $client->get($url, ['sink' => $savePath]);echo "下载完成: $savePath\n";} catch (RequestException $e) {echo "下载失败: {$e->getMessage()}\n";}
}// 爬取网页中的图片链接
function scrapeImages($url, $saveDir) {global $client;try {// 发送 GET 请求获取网页内容$response = $client->get($url);$html = (string) $response->getBody();// 使用 DomCrawler 提取图片标签中的 src 属性$crawler = new Crawler($html);$images = $crawler->filter('img')->each(function (Crawler $node) {return $node->attr('src');});// 确保保存目录存在if (!is_dir($saveDir)) {mkdir($saveDir, 0777, true);}// 下载每一张图片foreach ($images as $index => $imageUrl) {$imageUrl = filter_var($imageUrl, FILTER_VALIDATE_URL) ? $imageUrl : $url . '/' . ltrim($imageUrl, '/');$imagePath = $saveDir . '/image_' . ($index + 1) . '.jpg';downloadImage($imageUrl, $imagePath);}} catch (GuzzleException $e) {echo "请求失败: {$e->getMessage()}\n";}
}// 主程序
$url = 'https://example.com';  // 替换为要爬取的网页 URL
$saveDir = 'downloaded_images';  // 图片保存目录
scrapeImages($url, $saveDir);

代码说明：

Guzzle 客户端：
- 使用 new Client() 创建一个 Guzzle HTTP 客户端实例，用于发送请求。
downloadImage 函数：
- 这个函数接收图片的 URL 和保存路径，发送 GET 请求获取图片并将其保存到指定路径。
- sink 选项告诉 Guzzle 直接将响应的内容保存到文件中。
scrapeImages 函数：
- 发送 GET 请求获取网页 HTML 内容。
- 使用 Symfony\Component\DomCrawler\Crawler 类解析网页并提取所有 <img> 标签的 src 属性值，获取图片的 URL。
- filter('img') 用于选择网页中的所有图片标签。
- each 方法用于遍历每个图片节点，提取其 src 属性并保存到数组中。
- 为每个图片 URL 下载并保存图片，保存路径为 downloaded_images/image_1.jpg 等。
相对路径问题：
- 如果图片链接是相对路径，代码会自动将它转换为绝对路径。filter_var($imageUrl, FILTER_VALIDATE_URL) 判断 URL 是否为有效的绝对路径，如果不是，则拼接基 URL。
文件夹创建：
- mkdir($saveDir, 0777, true) 会创建保存图片的目录，如果目录不存在的话。
错误处理：
- 使用 try-catch 捕获请求失败或下载失败的错误，并打印错误消息。