A Python scraper example for Baidu Translate

2025/3/22 20:04:31

(Java jobs are really hard to find this year. If anyone kind is hiring for senior Java roles in Guangzhou, please refer me. Many thanks.)

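I spent a week learning Python from scratch, and once I had picked some up I kept wanting to scrape something; it just didn't feel right otherwise. So I spent a day putting together this Baidu Translate scraper example. Most of that day went into finding the token and sign parameters and debugging the requests. The code is a bit messy since it's only a demo, but it works.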

```python
import re
from urllib.parse import urlencode
import js2py  # pip install js2py requests
import requests

url = "https://fanyi.baidu.com/#zh/en/"
session = requests.Session()
headers = {'Content-Type': 'application/x-www-form-urlencoded',
           'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1'}
cookies = {'BAIDUID': '624820D8D9163F370A491E7CA70C23D4:SL=0:NR=10:FG=1'}

# Load the page and pull token and gtk out of its inline JavaScript
response = session.get(url, headers=headers, cookies=cookies)
print(dict(response.cookies))
html = response.content.decode()
with open('baidu.html', 'w', encoding='utf-8') as f:
    f.write(html)  # keep a copy of the page for inspection
token = re.search(r"token:\s*'([a-f0-9]+)'", html).group(1)
gtk = re.search(r"gtk:\s*'([^']+)'", html).group(1)
print(token)
print(gtk)

# Compute sign: load public.js and the sign function into js2py
context = js2py.EvalJs()
with open('public.js', 'r', encoding='utf-8') as f:  # local JS file, not included in the post
    context.execute(f.read())
context.wd = '好好学习,天天向上'
context.token = token
context.gtk = gtk
sug_response = session.post("https://fanyi.baidu.com/sug", data={'kw': context.wd}, headers=headers)
print(sug_response.json())  # the /sug suggestion endpoint needs no sign
context.execute("""
function n(r, o) {
  for (var t = 0; t < o.length - 2; t += 3) {
    var e = o.charAt(t + 2);
    e = e >= "a" ? e.charCodeAt(0) - 87 : Number(e);
    e = "+" === o.charAt(t + 1) ? r >>> e : r << e;
    r = "+" === o.charAt(t) ? r + e & 4294967295 : r ^ e;
  }
  return r;
}
function a(r) {
  var a = r.length;
  a > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(a / 2) - 5, 10) + r.substr(-10, 10));
  var m = gtk.split("."), S = Number(m[0]) || 0, s = Number(m[1]) || 0, c = [], v = 0;
  for (var F = 0; F < r.length; F++) {  // UTF-8 encode r into the byte array c
    var p = r.charCodeAt(F);
    128 > p ? c[v++] = p : (2048 > p ? c[v++] = p >> 6 | 192 : (55296 === (64512 & p) && F + 1 < r.length && 56320 === (64512 & r.charCodeAt(F + 1)) ? (p = 65536 + ((1023 & p) << 10) + (1023 & r.charCodeAt(++F)), c[v++] = p >> 18 | 240, c[v++] = p >> 12 & 63 | 128) : c[v++] = p >> 12 | 224, c[v++] = p >> 6 & 63 | 128), c[v++] = 63 & p | 128);
  }
  // "+-a^+6" and "+-3^+b+-f" are the decoded String.fromCharCode(...) constants
  var A = "+-a^+6", b = "+-3^+b+-f";
  for (var w = S, D = 0; D < c.length; D++) w += c[D], w = n(w, A);
  return w = n(w, b), w ^= s, 0 > w && (w = (2147483647 & w) + 2147483648), w %= 1e6, w.toString() + "." + (w ^ S);
}
var sign = a(wd);
""")
print(context.sign)

# Send the translation request
url = 'https://fanyi.baidu.com/basetrans'
data = {"query": context.wd, "from": "zh", "to": "en", "token": token, "sign": context.sign}
print(cookies)
print(urlencode(data))  # the form body that requests will send
# session.headers only holds the session-level defaults, e.g.
# {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'};
# headers passed to an individual request (headers=...) are merged on top and take precedence
print(session.headers)
response = requests.post(url, headers=headers, cookies=cookies, data=data)
print(response.json())
"""Output for wd='全家':
{}
3d7980a56760ca30e97aeeeda8e8fc6d
320305.131321201
{'errno': 0, 'data': [{'k': '全家福', 'v': '(全家合影) a photograph of the whole family; (中餐菜名) ho'}, {'k': '全家团聚', 'v': '动. whole family gather'}], 'logid': 2318810217}
681757.951340
{'BAIDUID': '624820D8D9163F370A491E7CA70C23D4:SL=0:NR=10:FG=1'}
query=%E5%85%A8%E5%AE%B6&from=zh&to=en&token=3d7980a56760ca30e97aeeeda8e8fc6d&sign=681757.951340
{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'errno': 0, 'from': 'zh', 'to': 'en', 'trans': [{'dst': 'whole family', 'prefixWrap': 0, 'result': [[0, 'whole family', ['0|6'], [], ['0|6'], ['0|12']]], 'src': '全家'}], 'dict': {'symbols': [{'word_symbol': 'quán jiā', 'parts': [{'part_name': '名', 'means': [{'text': 'the whole family', 'word_mean': 'the whole family'}]}]}], 'word_name': '全家', 'from': 'green', 'word_means': ['the whole family']}, 'keywords': []}"""
```
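The two JSON replies printed above carry the useful text in just a few fields. A small helper for pulling it out might look like the sketch below. The function names are hypothetical, and the field names (data/k/v for /sug, errno/trans/dst for /basetrans) are read off the sample output only, not from any documented Baidu API; treating a non-zero errno as a failure is likewise an assumption.

```python
from typing import Any, Dict, List, Tuple


def sug_pairs(sug_json: Dict[str, Any]) -> List[Tuple[str, str]]:
    """(keyword, gloss) pairs from a /sug reply shaped like {'data': [{'k': ..., 'v': ...}]}."""
    return [(item["k"], item["v"]) for item in sug_json.get("data", [])]


def basetrans_text(trans_json: Dict[str, Any]) -> str:
    """Join the 'dst' strings of a /basetrans reply; a non-zero errno is assumed to mean failure."""
    if trans_json.get("errno", 0) != 0:
        raise RuntimeError(f"basetrans failed with errno={trans_json['errno']}")
    return "\n".join(item["dst"] for item in trans_json.get("trans", []))


# Trimmed copies of the two replies printed in the output above
sug_reply = {'errno': 0, 'data': [{'k': '全家福', 'v': '(全家合影) a photograph of the whole family; (中餐菜名) ho'},
                                  {'k': '全家团聚', 'v': '动. whole family gather'}]}
trans_reply = {'errno': 0, 'from': 'zh', 'to': 'en', 'trans': [{'dst': 'whole family', 'src': '全家'}]}

for keyword, gloss in sug_pairs(sug_reply):
    print(keyword, "->", gloss)
print(basetrans_text(trans_reply))  # whole family
```

In the script above, these would be called as sug_pairs(sug_response.json()) and basetrans_text(response.json()).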

The latest Python release, 3.13, does not support the js2py module, so I switched to Python 3.8.


http://www.ppmy.cn/news/1581212.html
