A General-Purpose Crawler Framework Built on requests

news/2024/11/16 13:27:31/

A general-purpose framework for fetching web pages with requests.get()

Usage:

1. Copy the code and save it as get_url.py.

2. In a new .py file, import get_url.

3. r = get_url.request_get_text('https://******')    # returns a string

r = get_url.request_get_content('https://******')  # returns bytes
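As a minimal sketch of what to do with the two return types, the helpers below write a string result to a UTF-8 text file and a bytes result to a binary file. The names save_text, save_bytes and main, the output file names, and the https://example.com URL are all hypothetical and only serve the illustration; main() assumes get_url.py from step 1 is importable and is not called automatically.

```python
from pathlib import Path


def save_text(text: str, path: str) -> int:
    """Write a str (e.g. from request_get_text) to a UTF-8 file.

    Returns the number of characters written.
    """
    return Path(path).write_text(text, encoding="utf-8")


def save_bytes(data: bytes, path: str) -> int:
    """Write bytes (e.g. from request_get_content) to a file unchanged.

    Returns the number of bytes written.
    """
    return Path(path).write_bytes(data)


def main() -> None:
    # Assumption: get_url.py (step 1) sits next to this script.
    import get_url

    # Placeholder URL; replace it with the page you actually want.
    url = "https://example.com"

    html = get_url.request_get_text(url)    # str, suited to HTML or JSON
    save_text(html, "page.html")

    raw = get_url.request_get_content(url)  # bytes, suited to images or files
    save_bytes(raw, "page.bin")
```

The string form is the one to hand to an HTML parser, while the bytes form should be written as-is so that images and other binary downloads are not corrupted by decoding.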

import random

import requests


def header():
    """Return a headers dict with a randomly chosen mobile user-agent."""
    headers_list = [
        {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 10; SM-G981B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.109 Safari/537.36 CrKey/1.54.248666'},
        {'user-agent': 'Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.188 Safari/537.36 CrKey/1.54.250320'},
        {'user-agent': 'Mozilla/5.0 (BB10; Touch) AppleWebKit/537.10+ (KHTML, like Gecko) Version/10.0.9.2372 Mobile Safari/537.10+'},
        {'user-agent': 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+'},
        {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.3; en-us; SM-N900T Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
        {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.1; en-us; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
        {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.0; en-us; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 7.0; SM-G950U Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G965U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.1.0; SM-T837A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.80 Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; U; en-us; KFAPWI Build/JDQ39) AppleWebKit/535.19 (KHTML, like Gecko) Silk/3.13 Safari/535.19 Silk-Accelerated=true'},
        {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; LGMS323 Build/KOT49I.MS32310c) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 550) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Mobile Safari/537.36 Edge/14.14263'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 10 Build/MOB31T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 5X Build/OPR4.170623.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 6P Build/OPP3.170518.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)'},
        {'user-agent': 'Mozilla/5.0 (MeeGo; NokiaN9) AppleWebKit/534.13 (KHTML, like Gecko) NokiaBrowser/8.5.0 Mobile Safari/534.13'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 9; Pixel 3 Build/PQ1A.181105.017.A1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.158 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 10; Pixel 4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 11; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.181 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
        {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1'},
        {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
        {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1'},
    ]
    return random.choice(headers_list)


def request_get_text(url):
    """Fetch url and return the response body as a string."""
    try:
        r = requests.get(url, headers=header(), timeout=5)
        r.raise_for_status()
        # apparent_encoding is an attribute guessed from the body, not a method.
        r.encoding = r.apparent_encoding
        r.close()
        return r.text
    except requests.RequestException:
        # The first attempt failed: retry up to 3 times, each with a new user-agent.
        for i in range(3):
            r = requests.get(url, headers=header(), timeout=5)
            if r.status_code == 200:
                r.encoding = r.apparent_encoding
                r.close()
                return r.text
        # No retry returned 200: raise an HTTPError for the last response.
        r.raise_for_status()


def request_get_content(url):
    """Fetch url and return the response body as bytes."""
    try:
        r = requests.get(url, headers=header(), timeout=5)
        r.raise_for_status()
        r.close()
        return r.content
    except requests.RequestException:
        # The first attempt failed: retry up to 3 times, each with a new user-agent.
        for i in range(3):
            r = requests.get(url, headers=header(), timeout=5)
            if r.status_code == 200:
                r.close()
                return r.content
        # No retry returned 200: raise an HTTPError for the last response.
        r.raise_for_status()


if __name__ == '__main__':
    pass
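The retry logic is repeated in both fetch functions, and a network error during a retry is not caught. One possible way to factor it out is a small generic helper. This is only a sketch under stated assumptions: retry and its parameters (attempts, delay, exceptions) are hypothetical names that are not part of get_url.py, and the linear backoff is an added idea rather than something the code above does.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func() up to `attempts` times and return its first successful result.

    Only the listed exception types trigger another attempt. After the
    final failed attempt the last exception is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            # Linear backoff: wait delay, then 2 * delay, and so on.
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")
```

With such a helper, a call could look like retry(lambda: get_url.request_get_text(url), exceptions=(requests.RequestException,)), which keeps the waiting and re-raising policy in one place instead of in each fetch function.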


http://www.ppmy.cn/news/508580.html
