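Continuing from the previous post: Google Maps tile crawler
Automatic Clash node switching
To keep any single IP address from sending requests too frequently, the crawler switches the Clash node automatically. A background loop picks a random node from the proxy group every 5 seconds.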
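    import json
    import random
    import time

    import requests

    # clash_headers is defined elsewhere in the complete program.

    def change_node(is_stop):
        while True:
            _r = requests.get("http://127.0.0.1:11053/proxies", headers=clash_headers, verify=False)
            # Keep only the nodes whose names start with "Lv"
            proxy_list = [proxy for proxy in _r.json()["proxies"]["XXX"]["all"] if proxy.startswith("Lv")]
            proxy = random.choice(proxy_list)
            payload = json.dumps({"name": proxy})
            # Replace XXX with the name of your own proxy group
            requests.put("http://127.0.0.1:11053/proxies/XXX", headers=clash_headers, data=payload, verify=False)
            time.sleep(5)
            # Stop once the writer process has set the shared flag
            if is_stop.value:
                print("Finished.")
                break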
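Because the node is chosen at random, the loop above can pick the node that is already active, in which case the IP does not change for that round. A minimal sketch of one way to avoid this is shown below. It is not part of the original script: `pick_next_node` is a hypothetical helper, and it assumes the group entry in the `/proxies` response carries a `now` field naming the active node next to the `all` list.

```python
import random


def pick_next_node(group_info, prefix="Lv", rng=random):
    """Pick a random node from a Clash group, skipping the active one when possible.

    group_info is assumed to look like one group entry of the /proxies
    response, e.g. {"now": "Lv1-HK", "all": ["Lv1-HK", "Lv2-JP", "DIRECT"]}.
    """
    candidates = [name for name in group_info.get("all", []) if name.startswith(prefix)]
    if not candidates:
        raise ValueError(f"no node name starts with {prefix!r}")
    current = group_info.get("now")
    others = [name for name in candidates if name != current]
    # Fall back to the full candidate list when the active node is the only match.
    return rng.choice(others or candidates)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    group = {"now": "Lv1-HK", "all": ["Lv1-HK", "Lv2-JP", "Lv3-SG", "DIRECT"]}
    print(pick_next_node(group))
```

Inside `change_node`, the result of `_r.json()["proxies"]["XXX"]` could be passed to this helper in place of the list comprehension and `random.choice` call.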
Multiprocess crawler
- get_tile: downloads the tiles
- write_to_db: writes the downloaded tiles to the database
- change_node: switches the Clash node
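The real `write_to_db` ships with the complete program and is not shown in this post. To illustrate how the consumer side of such a pipeline can work, here is a rough sketch under several assumptions: the database is SQLite, each queue item is an `(x, y, z, data)` tuple, and the table is called `tiles`. None of these details are confirmed by the original code, and the function name `write_to_db_sketch` is hypothetical.

```python
import multiprocessing
import queue
import sqlite3


def write_to_db_sketch(data_queue, db_path, total, is_stop, timeout=30):
    """Drain (x, y, z, data) tuples from the queue into SQLite.

    Stops after `total` rows, or when the queue stays empty for `timeout`
    seconds, then raises the shared stop flag. Schema and item layout are
    assumptions made for this sketch.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles "
        "(x INTEGER, y INTEGER, z INTEGER, data BLOB, PRIMARY KEY (x, y, z))"
    )
    written = 0
    try:
        while written < total:
            try:
                x, y, z, data = data_queue.get(timeout=timeout)
            except queue.Empty:
                break  # producers went quiet; stop instead of waiting forever
            conn.execute("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", (x, y, z, data))
            conn.commit()
            written += 1
    finally:
        conn.close()
        is_stop.value = 1  # lets a loop like change_node exit
    return written


if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put((1, 2, 3, b"fake tile bytes"))
    flag = multiprocessing.Value("I", 0)
    print(write_to_db_sketch(q, ":memory:", 1, flag, timeout=5), flag.value)
```

Keeping all database writes in a single process means the tile downloaders never contend for the database file; they only push results onto the queue.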
    import multiprocessing
    from multiprocessing import Process, Queue

    def main():
        # Shared flag: write_to_db receives it, and change_node polls it to know when to exit
        is_stop = multiprocessing.Value("I", 0)
        db_path, db_name, tile_list = init_db()
        total = len(tile_list)
        print(total)
        # Create the task queue
        data_queue = Queue()
        process_list = []
        p_number = 10
        # step is rounded up, so p_number slices already cover the whole list
        step = total // p_number + 1
        for i in range(p_number):
            process_list.append(Process(target=get_tile, args=(data_queue, db_path, db_name, tile_list[i * step:(i + 1) * step],)))
        process_list.append(Process(target=write_to_db, args=(data_queue, db_path, total, is_stop)))
        process_list.append(Process(target=change_node, args=(is_stop,)))
        for p in process_list:
            p.start()
        for p in process_list:
            p.join()
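The slicing arithmetic in `main` can be isolated into a small helper, which makes it easier to check that every tile is assigned to exactly one worker. The sketch below is an illustration only: `split_tasks` is a hypothetical name, and it uses ceiling division rather than the `total // p_number + 1` step from the original, so chunk sizes can differ slightly when `total` is an exact multiple of the worker count.

```python
def split_tasks(items, n_workers):
    """Split items into at most n_workers contiguous slices of near-equal size."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    step = -(-len(items) // n_workers)  # ceiling division
    if step == 0:
        return []  # nothing to distribute
    return [items[i:i + step] for i in range(0, len(items), step)]


if __name__ == "__main__":
    chunks = split_tasks(list(range(25)), 10)
    print([len(chunk) for chunk in chunks])
```

Each returned slice could then be handed to one `get_tile` process, so no worker is started with an empty task list.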
Complete program
Download: https://download.csdn.net/download/this_is_id/90343579