Fetching Mobike bike locations in an area: a multithreaded Python implementation

2024/11/17 19:58:45

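The requests imitate the Mobike WeChat mini program: fill in the request headers and form data, send them to the server with a POST request, and read the response.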
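The form data goes out with the content type `application/x-www-form-urlencoded`, which is the body format `requests` produces when `data=` is given a dict. A minimal standard-library sketch of what such a body looks like; the three fields are modeled on the crawler's form data, and the values are illustrative placeholders, not real Mobike data.

```python
from urllib.parse import urlencode

# Placeholder values for a few of the fields the crawler sends.
form = {
    "citycode": "010",
    "longitude": "116.2500",
    "latitude": "39.9100",
}

# x-www-form-urlencoded body: key=value pairs joined by '&',
# with reserved characters percent-escaped by urlencode().
body = urlencode(form)
print(body)
```
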

Anti-anti-scraping measures: rotating User-Agent strings (mobile User-Agents), proxy IPs, and a 0.1 s sleep between requests.
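The 0.1 s sleep in the crawler is applied in the main loop between thread starts. If requests were instead issued from several worker threads, a shared throttle is one way to keep the same spacing. A minimal sketch under that assumption; `Throttle` is a hypothetical helper, not part of the original code.

```python
import threading
import time


class Throttle:
    """Hypothetical helper: space out calls to wait() by at least `interval`
    seconds, across every thread that shares the same instance."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot while holding the lock...
        with self._lock:
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self.interval
        # ...then sleep outside the lock so other threads can reserve theirs.
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


start = time.monotonic()
throttle = Throttle(0.1)
done = []
done_lock = threading.Lock()


def worker(n: int) -> None:
    throttle.wait()  # a real worker would send its request after this returns
    with done_lock:
        done.append(n)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(f"{len(done)} calls in {elapsed:.2f} s")  # four 0.1 s gaps, so at least ~0.4 s
```

Reserving slots under the lock but sleeping outside it keeps one slow thread from blocking the others while still guaranteeing the minimum gap.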

The code has two parts: a multithreaded proxy-IP fetcher and a multithreaded crawler.

1. Fetching proxy IPs with multiple threads

# get_ip_pools.py (imported by the crawler in part 2)
import socket
import threading
from urllib import request

import requests
from bs4 import BeautifulSoup as bs

# init: global socket timeout, so a dead proxy fails after 5 s
socket.setdefaulttimeout(5)
test_url = "http://ip.chinaz.com/getip.aspx"


# request the xicidaili URL and return the parsed page
def request_to_get(url):
    header = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Host": "www.xicidaili.com",
        "Referer": "http://www.xicidaili.com/",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    }
    response = requests.get(url, headers=header).content
    content = str(response, encoding="utf-8")
    bs_obj = bs(content, "html.parser")
    return bs_obj


# read ip and port from the table; return a list of {"http": "http://ip:port"}
def find_ip_port(bs_obj):
    proxys = []
    rows = bs_obj.find_all("tr")
    for row in rows[1:]:  # skip the header row
        tds = row.find_all("td")
        proxy_host = "http://" + tds[1].text + ":" + tds[2].text
        proxys.append({"http": proxy_host})
    return proxys


# check whether a proxy works; append it to alright_proxys if it does
def check_ip(alright_proxys, proxy):
    try:
        proxy_support = request.ProxyHandler(proxy)
        opener = request.build_opener(proxy_support)
        opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")]
        # use this thread's own opener; request.install_opener() sets one
        # process-wide opener that the other checking threads would overwrite
        opener.open(test_url).read()
        alright_proxys.append(proxy)  # list.append is thread-safe in CPython
    except Exception:
        pass  # unreachable or too slow: treat it as a bad proxy


# test all proxies concurrently and return the ones that can be used
def return_ok_proxys(proxys):
    alright_proxys = []
    threads = []
    for proxy in proxys:
        t = threading.Thread(target=check_ip, args=(alright_proxys, proxy))
        t.start()
        threads.append(t)
    # wait for every check to finish instead of sleeping a fixed 5 s
    for t in threads:
        t.join()
    return alright_proxys


# main function
def main_function():
    url = "http://www.xicidaili.com/nn/"
    bs_obj = request_to_get(url)
    proxys = find_ip_port(bs_obj)
    alright_proxys = return_ok_proxys(proxys)
    return alright_proxys
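`return_ok_proxys` starts one thread per proxy and then waits before reading the shared list. The same fan-out and collect step can also be written with `concurrent.futures.ThreadPoolExecutor`, which caps the number of threads and hands back results only after every check has finished. A minimal sketch; `fake_check` and `filter_working` are hypothetical names, and `fake_check` stands in for a real network test so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_check(proxy: dict) -> bool:
    """Hypothetical stand-in for a real proxy test: ports ending in 0 'work'."""
    return proxy["http"].endswith("0")


def filter_working(proxies, check, max_workers=32):
    # map() runs check() on worker threads and yields results in input order;
    # leaving the with-block waits for every submitted check to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, proxies))
    return [p for p, ok in zip(proxies, results) if ok]


candidates = [
    {"http": "http://10.0.0.1:8080"},
    {"http": "http://10.0.0.2:3128"},
    {"http": "http://10.0.0.3:80"},
]
working = filter_working(candidates, fake_check)
print(working)
```

A real `check` would wrap the network call in `try`/`except` and return `False` on failure, the same way `check_ip` swallows exceptions for dead proxies.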

2. The multithreaded crawler

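long and alt are the longitude and latitude (alt means latitude here, not altitude); look up the range for your area yourself, for example on Baidu. city_code is the standard city code (the script below uses '010'). wxcode can simply be a fake value: I didn't know this at first, but found out later by searching on Baidu.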
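The crawler samples the bounding box on a grid with a step of 1/2000 degree (0.0005°, roughly 55 m of latitude) by looping over integer indices and dividing by 2000. A minimal sketch of the same grid with the bounds used in the script below; `grid_points` is a hypothetical helper, and it converts the bounds with `round()` rather than `int()` so a product such as `116.33 * 2000` that could land a hair below an integer is not truncated.

```python
def grid_points(start_lon, end_lon, start_lat, end_lat, steps_per_degree=2000):
    """Yield (longitude, latitude) strings on a regular grid.

    Works in integer grid indices so the step never accumulates float error.
    As in the original loop, the end bounds are exclusive.
    """
    lon_range = range(round(start_lon * steps_per_degree), round(end_lon * steps_per_degree))
    lat_range = range(round(start_lat * steps_per_degree), round(end_lat * steps_per_degree))
    for i in lon_range:
        for j in lat_range:
            yield f"{i / steps_per_degree:.6f}", f"{j / steps_per_degree:.6f}"


points = list(grid_points(116.25, 116.33, 39.91, 39.92))
print(len(points))  # 3200 = 160 longitude steps x 20 latitude steps
print(points[0])
```

Each point becomes one request, so the grid size (here 3200) is also the number of threads the crawler starts.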

import json
import random
import threading
import time

import requests

from get_ip_pools import main_function  # part 1, saved as get_ip_pools.py

url = "https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do"
bike_id = set()
bike_lock = threading.Lock()  # guards bike_id and t_count across threads
user_agents = [
    'Mozilla/5.0 (Linux; U; Android 5.1; zh-cn; m1 metal Build/LMY47I) AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 Chrome/37.0.0.0 MQQBrowser/7.6 Mobile Safari/537.36',
    'Mozilla/5.0 (Linux; Android 5.1.1; vivo X7 Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.2564.116 Mobile Safari/537.36 baiduboxapp/8.6.5 (Baidu; P1 5.1.1)',
    'Mozilla/5.0 (Linux; Android 6.0; MP1512 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/35.0.1916.138 Mobile Safari/537.36 T7/7.4 baiduboxapp/8.4 (Baidu; P1 6.0)',
    'Mozilla/5.0 (Linux; U; Android 4.4.4; zh-cn; X9007 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 Chrome/37.0.0.0 MQQBrowser/7.6 Mobile Safari/537.36',
    'Mozilla/5.0 (iPhone 6s; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 MQQBrowser/7.6.0 Mobile/14E304 Safari/8536.25 MttCustomUA/2 QBWebViewType/1 WKType/1',
    'Mozilla/5.0 (Linux; U; Android 6.0.1; zh-cn; vivo Xplay6 Build/MXB48T) AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 Chrome/37.0.0.0 MQQBrowser/7.6 Mobile Safari/537.36',
    'Mozilla/5.0 (Linux; Android 6.0.1; SM-A9000 Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.2564.116 Mobile Safari/537.36 baiduboxapp/8.6.5 (Baidu; P1 6.0.1)',
    'Mozilla/5.0 (Linux; Android 6.0.1; vivo X9Plus Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.2564.116 Mobile Safari/537.36 baiduboxapp/8.6.5 (Baidu; P1 6.0.1)',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Mobile/14C92 MicroMessenger/6.5.9 NetType/WIFI Language/zh_CN',
    'Mozilla/5.0 (Linux; Android 7.1.1; OPPO R11t Build/NMF26X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043307 Safari/537.36 MicroMessenger/6.5.8.1060 NetType/WIFI Language/zh_CN',
    'Mozilla/5.0 (iPhone 6s; CPU iPhone OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 MQQBrowser/7.5.1 Mobile/13G36 Safari/8536.25 MttCustomUA/2 QBWebViewType/1 WKType/1',
]
t_count = 1


# build the request headers and form data for one grid point
def init_datas(user_agent, city_code, long_itude, lat_itude):
    header = {
        'host': 'mwx.mobike.com',
        'content-type': 'application/x-www-form-urlencoded',
        'opensrc': 'list',
        'moblieno': '',
        'wxcode': 'fake_wxcode',  # a fake placeholder value is accepted
        'platform': '3',
        'accept-language': 'zh-cn',
        'subsource': '',
        'lang': 'zh',
        'user-agent': user_agent,
        'time': str(int(time.time() * 1000)),  # current timestamp in ms
        'citycode': city_code,
    }
    datas = {
        'verticalAccuracy': 10,
        'speed': -1,
        'horizontalAccuracy': 65,
        'accuracy': 65,
        'citycode': city_code,
        'wxcode': 'fake_wxcode',
        'longitude': long_itude,
        'latitude': lat_itude,
    }
    return header, datas


# query one grid point and record every bike ID not seen before
def main(city_code, long, lat, ip):
    global t_count
    one_agent = random.choice(user_agents)  # rotate mobile User-Agents
    header, datas = init_datas(one_agent, city_code, long, lat)
    try:
        # note: requests selects a proxy by URL scheme; this URL is https and the
        # pool entries are keyed "http", so they are not used for this request
        data = requests.post(url, headers=header, data=datas, proxies=ip).content
        data = json.loads(str(data, encoding='utf-8'))
        for i in data['object']:
            with bike_lock:  # check and insert as one step
                is_new = i['distId'] not in bike_id
                bike_id.add(i['distId'])
            if is_new:
                print(i['distId'])
    except Exception as e:
        print(e)
    with bike_lock:
        t_count += 1


city_code = '010'
start_long = "116.250000000000"
start_alt = "39.910000000000"
end_long = "116.330000000000"
end_alt = "39.92000000000"
threads = []
thread_count = 0
ip_pools = main_function()
# step through the box in 1/2000-degree (0.0005°) increments
for i in range(int(float(start_long) * 2000), int(float(end_long) * 2000)):
    for j in range(int(float(start_alt) * 2000), int(float(end_alt) * 2000)):
        long_itude = str(float(i) / 2000.0) + "00000000000"
        lat_itude = str(float(j) / 2000.0) + "00000000000"
        t = threading.Thread(target=main, args=(city_code, long_itude, lat_itude, random.choice(ip_pools)))
        t.start()
        time.sleep(0.1)
        threads.append(t)
        thread_count += 1
        if thread_count == 128:  # wait for each batch of 128 threads
            for t in threads:
                t.join()
            thread_count = 0
            threads = []
# join the last, partial batch before reporting the total
for t in threads:
    t.join()
print(len(bike_id))
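When several threads record bike IDs in one shared collection, the "not seen yet?" check and the insert have to happen as a single step, or two threads can both treat the same ID as new. A minimal standalone sketch of that pattern, using a set guarded by a lock and a worker pool in place of manually joined batches. The JSON strings are made-up samples shaped like the fields the crawler reads (`object`, `distId`), not real API responses, and `fake_fetch` is a hypothetical stand-in for the POST request.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

# Made-up responses, one per grid cell, with only the fields the crawler reads.
SAMPLE_RESPONSES = {
    0: '{"object": [{"distId": "A1"}, {"distId": "A2"}]}',
    1: '{"object": [{"distId": "A2"}, {"distId": "B7"}]}',
    2: '{"object": []}',
    3: '{"message": "no object field"}',
}

seen_ids = set()
seen_lock = threading.Lock()


def fake_fetch(cell: int) -> str:
    """Hypothetical stand-in for the POST request; returns raw JSON text."""
    return SAMPLE_RESPONSES[cell]


def handle_cell(cell: int) -> None:
    data = json.loads(fake_fetch(cell))
    # .get() tolerates a response without an "object" list instead of raising KeyError
    for bike in data.get("object") or []:
        with seen_lock:
            # check and insert under one lock so no two threads both see an ID as new
            is_new = bike["distId"] not in seen_ids
            seen_ids.add(bike["distId"])
        if is_new:
            print(bike["distId"])


with ThreadPoolExecutor(max_workers=8) as pool:
    # list() drains the results, re-raising any exception from a worker here
    list(pool.map(handle_cell, SAMPLE_RESPONSES))

print(len(seen_ids))  # A2 appears twice but is counted once
```

Leaving the `with` block waits for every submitted cell, so the total is printed only after all workers finish; with manual batches, the last partial batch needs its own join for the same guarantee.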

3. Screenshots of the results


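I couldn't find the output of my own run and didn't feel like running it again, so here is a classmate's run instead (I wrote the code for that classmate in the first place; how selfless of me).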


