reader-lm: a small model for converting HTML to Markdown


news/2024/12/22 23:30:28/


Reference:
https://huggingface.co/jinaai/reader-lm-0.5b

Online demo:
https://colab.research.google.com/drive/1wXWyj5hOxEHY6WeHbOwEzYAC0WB1I5uA#scrollTo=0mG9ISzHOuKK

Input URL: https://www.galaxy-geely.com/E5
Result:
[Image: screenshot of the converted Markdown output]

Code:

# pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "jinaai/reader-lm-0.5b"
device = "cuda"  # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# example html content
html_content = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": html_content}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0, do_sample=False, repetition_penalty=1.08)
print(tokenizer.decode(outputs[0]))
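
The example above feeds the model a hard-coded HTML string, while the demo takes a URL as input. A minimal sketch of how a page could be fetched and turned into the `messages` list is shown below. The helper names `fetch_html`, `strip_noise`, and `build_messages` are hypothetical and not part of reader-lm or transformers. Removing `<script>`/`<style>` blocks and comments is an assumed, optional pre-cleaning step to shorten the input, not something the model is known to require.

```python
import re
import urllib.request


def fetch_html(url, timeout=30.0):
    """Download a page and return its HTML as text (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def strip_noise(html):
    """Assumed optional pre-cleaning: drop script/style blocks and comments."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    html = re.sub(r"(?s)<!--.*?-->", "", html)
    return html.strip()


def build_messages(html):
    """Wrap the HTML in the single-turn chat format used in the code above."""
    return [{"role": "user", "content": html}]


if __name__ == "__main__":
    # Requires network access; the URL is the one used as input in this post.
    page = fetch_html("https://www.galaxy-geely.com/E5")
    messages = build_messages(strip_noise(page))
    print(len(messages[0]["content"]), "characters of HTML ready for the model")
```

The resulting `messages` can be passed to `tokenizer.apply_chat_template` exactly as in the code above. For long pages, `max_new_tokens` may need to be raised so the Markdown output is not cut off.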

http://www.ppmy.cn/news/1528093.html
