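Preface:
I wrote this project because I wanted to sneak in some manga reading during a busy stretch, so I decided to write a crawler for practice. I ran into a few problems along the way. As in the previous project, this one uses Selenium to simulate browser actions so that all the images load, uses XPath to parse out the image URLs, and finally saves the images locally.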
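1. Goal
Use Selenium to crawl manga images and save them locally.
2. Preparation
The libraries and browser needed for this project are the same as in the previous section, so I won't repeat them here.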
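3. Analysis
The site to crawl is https://www.shenmanhua.com/glgsqgdspqc/. Open it and click the first chapter, and you will see the page shown in Figure 1.
Figure 1
First, note that the page URL is https://www.shenmanhua.com/glgsqgdspqc/1.html. Pages are turned by clicking, and after a click the URL stays the same. Moving on to the next chapter, however, increases the number at the end of the URL by 1. So each chapter's URL is simply the main page URL followed by page.html, where page is the chapter number: url = https://www.shenmanhua.com/glgsqgdspqc/ + page.html. The manga has only 79 chapters, so page ranges from 1 to 79. Iterating page over range(1, 80) and joining it with the main URL and the suffix ".html" gives the URL of every chapter. All that remains is to extract the images from each chapter.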
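Next, let's see how to extract the images, as shown in Figure 2:
Figure 2
The image sits in an img tag under the div with class=mh_comicpic, but there is only one. Click to jump to the next page and, as Figure 3 shows, the image is in the same place and again there is only one.
Figure 3
At this point we could use Selenium to simulate mouse clicks on the image to move to the next page, and use XPath to parse the image out of the corresponding tag each time. But there is a problem: the number of pages differs from chapter to chapter, so it is hard to know how many clicks are needed to collect every image in a chapter.
There is a way around this: each time an image is extracted, also read the current page number and the maximum page number and compare them. If the current page is below the maximum, click again; if they are equal, stop and move on to the next chapter's URL. This works, but it is overly complicated, so I don't recommend it. After all, the philosophy of Python is simplicity and elegance. So let me introduce a simpler and more elegant method.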
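For comparison, the page-counting approach could look roughly like the sketch below. The four callables `get_current`, `get_total`, `get_image`, and `click_next` are hypothetical stand-ins for whatever Selenium lookups and clicks the page would need; they are assumptions for illustration and are not taken from the site's markup.

```python
from typing import Callable, List


def collect_by_clicking(
    get_current: Callable[[], int],
    get_total: Callable[[], int],
    get_image: Callable[[], str],
    click_next: Callable[[], None],
) -> List[str]:
    """Collect one image per page, clicking until the last page is reached."""
    images = [get_image()]
    # Cap the loop at the page count so a counter that never advances
    # cannot keep the crawler clicking forever.
    for _ in range(get_total()):
        if get_current() >= get_total():
            break
        click_next()
        images.append(get_image())
    return images
```

Even in this reduced form, every page needs two extra lookups and a click, which is the bookkeeping that the continuous-reading approach avoids.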
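The images above were obtained by clicking. Can we take a different approach and, instead of clicking, choose continuous reading, as shown in Figure 4?
Figure 4
With continuous reading selected, there is no more jumping between images: a single page contains all the image tags, and the images appear as you scroll down. So can we now select the tags that hold the images and extract them all directly with XPath?
The answer is no.
Why?
Look closely at the figure above. The scrollbar has reached the third image, and the third image and the link under its tag are already present, but the data for the fourth image and everything after it has not appeared yet.
The site uses Ajax to load the images, so we need to scroll down to trigger the Ajax requests that fetch them. (The click-to-read mode described earlier also triggers Ajax to fetch the image, just through a click.)
As in the JD.com project in the previous section, we use Selenium to scroll slowly to the bottom so that every image finishes loading. Once the complete HTML is available, we can extract all the images from the corresponding tags.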
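The slow scroll can also be written so that it stops when the bottom of the page is reached, rather than after a fixed number of steps. The sketch below is one possible variant, not the code used in this project: `scroll_to_bottom` and its `step`, `pause`, and `max_steps` parameters are names made up here, and `driver` is assumed to be any object with Selenium's `execute_script(script, *args)` method.

```python
import time


def scroll_to_bottom(driver, step=100, pause=0.1, max_steps=2000):
    """Scroll down in small steps until the bottom of the page is reached.

    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps:
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        steps += 1
        # Give lazily loaded images a moment to arrive and extend the page.
        time.sleep(pause)
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset"
            " >= document.body.scrollHeight - 1;"
        )
        if at_bottom:
            break
    return steps
```

Because the page height is read again after every step, the loop keeps going while newly loaded images make the page longer. If images load more slowly than `pause`, the check may succeed too early, so the pause may need tuning.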
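4. Getting the URLs
As the analysis above explained, to get the URLs of all chapters we only need to iterate page over range(1, 80) and join it with the main URL and the string ".html". The code is as follows (this snippet and the ones below are meant to demonstrate the idea):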
    for page in range(1, 80):
        url = 'https://www.shenmanhua.com/glgsqgdspqc/' + str(page) + '.html'
        print(url)
The output is as follows:
    https://www.shenmanhua.com/glgsqgdspqc/1.html
    https://www.shenmanhua.com/glgsqgdspqc/2.html
    https://www.shenmanhua.com/glgsqgdspqc/3.html
    https://www.shenmanhua.com/glgsqgdspqc/4.html
    https://www.shenmanhua.com/glgsqgdspqc/5.html
    https://www.shenmanhua.com/glgsqgdspqc/6.html
    https://www.shenmanhua.com/glgsqgdspqc/7.html
    https://www.shenmanhua.com/glgsqgdspqc/8.html
    https://www.shenmanhua.com/glgsqgdspqc/9.html
    https://www.shenmanhua.com/glgsqgdspqc/10.html
    https://www.shenmanhua.com/glgsqgdspqc/11.html
    https://www.shenmanhua.com/glgsqgdspqc/12.html
    https://www.shenmanhua.com/glgsqgdspqc/13.html
    https://www.shenmanhua.com/glgsqgdspqc/14.html
    https://www.shenmanhua.com/glgsqgdspqc/15.html
    https://www.shenmanhua.com/glgsqgdspqc/16.html
    https://www.shenmanhua.com/glgsqgdspqc/17.html
    https://www.shenmanhua.com/glgsqgdspqc/18.html
    https://www.shenmanhua.com/glgsqgdspqc/19.html
    https://www.shenmanhua.com/glgsqgdspqc/20.html
    https://www.shenmanhua.com/glgsqgdspqc/21.html
    https://www.shenmanhua.com/glgsqgdspqc/22.html
    https://www.shenmanhua.com/glgsqgdspqc/23.html
    https://www.shenmanhua.com/glgsqgdspqc/24.html
    https://www.shenmanhua.com/glgsqgdspqc/25.html
    https://www.shenmanhua.com/glgsqgdspqc/26.html
    https://www.shenmanhua.com/glgsqgdspqc/27.html
    https://www.shenmanhua.com/glgsqgdspqc/28.html
    https://www.shenmanhua.com/glgsqgdspqc/29.html
    https://www.shenmanhua.com/glgsqgdspqc/30.html
    https://www.shenmanhua.com/glgsqgdspqc/31.html
    https://www.shenmanhua.com/glgsqgdspqc/32.html
    https://www.shenmanhua.com/glgsqgdspqc/33.html
    https://www.shenmanhua.com/glgsqgdspqc/34.html
    https://www.shenmanhua.com/glgsqgdspqc/35.html
    https://www.shenmanhua.com/glgsqgdspqc/36.html
    https://www.shenmanhua.com/glgsqgdspqc/37.html
    https://www.shenmanhua.com/glgsqgdspqc/38.html
    https://www.shenmanhua.com/glgsqgdspqc/39.html
    https://www.shenmanhua.com/glgsqgdspqc/40.html
    https://www.shenmanhua.com/glgsqgdspqc/41.html
    https://www.shenmanhua.com/glgsqgdspqc/42.html
    https://www.shenmanhua.com/glgsqgdspqc/43.html
    https://www.shenmanhua.com/glgsqgdspqc/44.html
    https://www.shenmanhua.com/glgsqgdspqc/45.html
    https://www.shenmanhua.com/glgsqgdspqc/46.html
    https://www.shenmanhua.com/glgsqgdspqc/47.html
    https://www.shenmanhua.com/glgsqgdspqc/48.html
    https://www.shenmanhua.com/glgsqgdspqc/49.html
    https://www.shenmanhua.com/glgsqgdspqc/50.html
    https://www.shenmanhua.com/glgsqgdspqc/51.html
    https://www.shenmanhua.com/glgsqgdspqc/52.html
    https://www.shenmanhua.com/glgsqgdspqc/53.html
    https://www.shenmanhua.com/glgsqgdspqc/54.html
    https://www.shenmanhua.com/glgsqgdspqc/55.html
    https://www.shenmanhua.com/glgsqgdspqc/56.html
    https://www.shenmanhua.com/glgsqgdspqc/57.html
    https://www.shenmanhua.com/glgsqgdspqc/58.html
    https://www.shenmanhua.com/glgsqgdspqc/59.html
    https://www.shenmanhua.com/glgsqgdspqc/60.html
    https://www.shenmanhua.com/glgsqgdspqc/61.html
    https://www.shenmanhua.com/glgsqgdspqc/62.html
    https://www.shenmanhua.com/glgsqgdspqc/63.html
    https://www.shenmanhua.com/glgsqgdspqc/64.html
    https://www.shenmanhua.com/glgsqgdspqc/65.html
    https://www.shenmanhua.com/glgsqgdspqc/66.html
    https://www.shenmanhua.com/glgsqgdspqc/67.html
    https://www.shenmanhua.com/glgsqgdspqc/68.html
    https://www.shenmanhua.com/glgsqgdspqc/69.html
    https://www.shenmanhua.com/glgsqgdspqc/70.html
    https://www.shenmanhua.com/glgsqgdspqc/71.html
    https://www.shenmanhua.com/glgsqgdspqc/72.html
    https://www.shenmanhua.com/glgsqgdspqc/73.html
    https://www.shenmanhua.com/glgsqgdspqc/74.html
    https://www.shenmanhua.com/glgsqgdspqc/75.html
    https://www.shenmanhua.com/glgsqgdspqc/76.html
    https://www.shenmanhua.com/glgsqgdspqc/77.html
    https://www.shenmanhua.com/glgsqgdspqc/78.html
    https://www.shenmanhua.com/glgsqgdspqc/79.html
That gives us the URLs of all the chapters.
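5. Getting the images
The method for extracting the images was analyzed in detail above: switch to continuous reading mode, scroll down to load all the images, then extract them with XPath. The code is as follows: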
    import time

    from lxml import etree
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # URL of chapter 79; only one chapter is fetched here to demonstrate the idea.
    url = 'https://www.shenmanhua.com/glgsqgdspqc/79.html'
    browser = webdriver.Chrome()
    browser.get(url)
    # Switch the reading mode to continuous reading.
    # find_element(By.XPATH, ...) is used because the older
    # find_element_by_xpath helper was removed in Selenium 4.3.
    button = browser.find_element(By.XPATH, '/html/body/div[2]/div[3]/select[2]/option[3]')
    button.click()
    # Scroll down slowly to load every image. Jumping straight to the
    # bottom does not load all of them.
    for y in range(180):
        js = 'window.scrollBy(0,100)'
        browser.execute_script(js)
        time.sleep(0.1)
    # Use XPath to extract all image URLs in this chapter.
    html = etree.HTML(browser.page_source)
    images = html.xpath("//div[@class='mh_comicpic']/img/@src")
    # Print the URL of each page image in the chapter.
    for image in images:
        print(image)
    browser.close()
The output is as follows:
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F1.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F2.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F3.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F4.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F5.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F6.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F7.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F8.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F9.jpg-noresize.webp
This gives us every image in a chapter.
6. Complete code
    import time

    import requests
    from lxml import etree
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    def main():
        print('Crawler started!')
        print('Crawling...')
        for page in range(1, 80):
            url = get_url(page)
            html = get_page(url, page)
            # Skip the chapter if the page could not be loaded.
            if html is not None:
                get_image(html, page)
        print('Crawler finished!')


    def get_url(page):
        url = 'https://www.shenmanhua.com/glgsqgdspqc/' + str(page) + '.html'
        return url


    def get_page(url, page):
        # The chapter number is passed in directly. Slicing it out of the URL
        # with url[-6] only yields the last digit for chapters 10 and above.
        print('Crawling chapter %s...' % page)
        try:
            browser.get(url)
            # Switch to continuous reading. By default each page shows a single
            # image, and the site appears to load the next one via Ajax when a
            # page number is clicked, which is more awkward to script.
            button = browser.find_element(By.XPATH, '/html/body/div[2]/div[3]/select[2]/option[3]')
            button.click()
            # Scroll down slowly to load every image. Jumping straight to the
            # bottom does not load all of them.
            for y in range(180):
                js = 'window.scrollBy(0,100)'
                browser.execute_script(js)
                time.sleep(0.1)
            return etree.HTML(browser.page_source)
        except Exception as error:
            print('Chapter %s failed! Failing URL:' % page, url, error)
            return None


    def get_image(html, page):
        images = html.xpath("//div[@class='mh_comicpic']/img/@src")
        # Name each file by chapter and page number, e.g. 1-1.jpg, 1-2.jpg, ...
        chapter = str(page) + '-'
        for i, image in enumerate(images, start=1):
            print(image)
            image_name = chapter + str(i) + '.jpg'
            response = requests.get(image)
            with open(image_name, 'wb') as f:
                f.write(response.content)
        print('Chapter %s done!' % page)


    if __name__ == "__main__":
        browser = webdriver.Chrome()
        main()
        browser.close()
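The complete code saves every image with a .jpg name, while the image URLs shown in the output end in .jpg-noresize.webp. If matching file extensions matter, the name could be derived from the URL instead. The helper below is only a sketch using the standard library: `image_filename` and its `default_ext` parameter are hypothetical names and are not part of the original script.

```python
import os
from urllib.parse import unquote, urlparse


def image_filename(chapter, index, image_url, default_ext=".jpg"):
    """Build a name like '79-1.webp' from the chapter, index, and URL suffix."""
    # Decode percent-escapes such as %2F before looking for the extension.
    path = unquote(urlparse(image_url).path)
    ext = os.path.splitext(path)[1].lower()
    # Fall back to the default when the URL has no usable extension.
    if not ext or len(ext) > 5 or not ext[1:].isalnum():
        ext = default_ext
    return "%d-%d%s" % (chapter, index, ext)
```

For the first image of chapter 79 this returns 79-1.webp, so in get_image the line that builds image_name could call image_filename(page, i, image) instead.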
The output is as follows:
    Crawler started!
    Crawling...
    Crawling chapter 1...
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F1.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F2.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F3.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F4.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F5.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F6.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F7.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F8.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F9.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F10.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F11.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F12.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F13.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F14.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F15.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F16.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F17.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F18.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F19.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F20.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F21.jpg-noresize.webp
    Chapter 1 done!
    Crawling chapter 2...
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F1.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F2.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F3.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F4.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F5.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F6.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F7.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F8.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F9.jpg-noresize.webp
    https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F10.jpg-noresize.webp
    Chapter 2 done!
    .....
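The images saved locally are shown in Figure 5:
Figure 5
With that, all the images have been crawled and the project is done! (Look at the bottom-left corner of Figure 5; some things are just meant to be...)
Closing words:
I got a lot out of writing this little crawler project, especially revisiting a manga I love, which brought back many memories.
Youth is all the more precious because of its regrets.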