Python Web Scraping: Beautiful Soup Examples (Day 21)

2025/3/13 14:44:50

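🎈🎈Author's homepage: 喔的嘛呀🎈🎈
🎈🎈Column: Python web scraping study (python爬虫学习)🎈🎈
✨✨Thanks for stopping by! I hope good luck stays with everyone reading this, every day, and that you stay happy!✨✨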

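Contents

I. Extracting Stock Information (http://quote.stockstar.com/)

1. Open the web page

2. Choose the information technology sector and click into it, then copy the page URL: http://quote.stockstar.com/stock/industry_I.shtml

3. Press F12 to open the developer tools, analyze the page structure, and locate the markup that holds the data we want

4. Extract and analyze the located elements

5. Analysis done, time to write the code

(1) Parse the HTML with Beautiful Soup:

(2) Find the table that contains the stock information:

(3) Extract the rows from the table:

(4) Loop over the rows and extract the stock information:

(5) Complete code

6. Results

II. Extracting the Sina News Hot List

III. Conclusion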


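I. Extracting Stock Information (http://quote.stockstar.com/)

1. Open the web page

2. Choose the information technology sector and click into it, then copy the page URL: http://quote.stockstar.com/stock/industry_I.shtml

3. Press F12 to open the developer tools, analyze the page structure, and locate the markup that holds the data we want

As the figure above shows, all the data we want to scrape sits inside the "box box02" container.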
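The `class="box box02"` attribute actually gives the `div` two separate classes, `box` and `box02`. A minimal sketch (the HTML string is a trimmed, hypothetical stand-in for the real page; the `box01` block is made up for contrast) shows that Beautiful Soup's `class_` argument matches any single class, while a CSS selector can require both:

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for the stockstar page: only the container
# structure is kept; the "box box01" block is invented for comparison.
html = """
<div class="box box01"><p>other block</p></div>
<div class="box box02">
  <div class="bg_box" id="dataTable"><div class="con"><table id="table1"></table></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any one of an element's classes, so "box" hits both divs
print(len(soup.find_all("div", class_="box")))  # 2

# "box02" alone is enough to pick the container we want
container = soup.find("div", class_="box02")
print(container.find("table")["id"])  # table1

# A CSS selector can require both classes at once
print(len(soup.select("div.box.box02")))  # 1
```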

4. Extract and analyze the located elements

<div class="box box02">
  <div class="bg_box" id="dataTable">
    <div class="con">
      <!-- This is the table holding the stocks; it is what we need to extract -->
      <table width="100%" border="0" cellpadding="0" cellspacing="0" class="trHover" id="table1">
        <thead class="tbody_right">
          <tr><td width="6%" class="align_center"><a href="javascript:void(0)" sort="0" target="_self" class="newup">代码</a></td><td width="24%" class="align_center">简称</td><td width="17.5%" class="align_right"><a href="javascript:void(0)" sort="1" target="_self">流通市值(万元)</a></td><td width="17.5%" class="align_right"><a href="javascript:void(0)" sort="2" target="_self">总市值(万元)</a></td><td width="17.5%" class="align_right"><a href="javascript:void(0)" sort="3" target="_self">流通股本(万元)</a></td><td width="17.5%" class="align_right"><a href="javascript:void(0)" sort="4" target="_self">总股本(万元)</a></td></tr>
        </thead>
        <tbody class="tbody_right" id="datalist">
          <!-- start: every row from start to end holds one stock's data; we loop over them and simply print them at the end -->
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000004.shtml">000004</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000004.shtml">国华网安</a></td><td class="align_right ">190063.58</td><td class="align_right ">199232.32</td><td class="align_right ">12628.81</td><td class="align_right ">13238.03</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000032.shtml">000032</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000032.shtml">深桑达A</a></td><td class="align_right ">1166377.73</td><td class="align_right ">2058568.25</td><td class="align_right ">64476.38</td><td class="align_right ">113795.92</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000158.shtml">000158</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000158.shtml">常山北明</a></td><td class="align_right ">1224653.26</td><td class="align_right ">1235730.73</td><td class="align_right ">158428.62</td><td class="align_right ">159861.67</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000409.shtml">000409</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000409.shtml">云鼎科技</a></td><td class="align_right ">364104.22</td><td class="align_right ">581661.43</td><td class="align_right ">42337.70</td><td class="align_right ">67635.05</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000503.shtml">000503</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000503.shtml">国新健康</a></td><td class="align_right ">905071.39</td><td class="align_right ">991065.37</td><td class="align_right ">89877.99</td><td class="align_right ">98417.61</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000555.shtml">000555</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000555.shtml">神州信息</a></td><td class="align_right ">1166774.74</td><td class="align_right ">1170929.32</td><td class="align_right ">97231.23</td><td class="align_right ">97577.44</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000676.shtml">000676</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000676.shtml">智度股份</a></td><td class="align_right ">914176.97</td><td class="align_right ">915255.50</td><td class="align_right ">127500.28</td><td class="align_right ">127650.70</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000682.shtml">000682</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000682.shtml">东方电子</a></td><td class="align_right ">1222620.92</td><td class="align_right ">1222743.03</td><td class="align_right ">134059.31</td><td class="align_right ">134072.70</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000839.shtml">000839</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000839.shtml">ST国安</a></td><td class="align_right ">764366.14</td><td class="align_right ">764366.14</td><td class="align_right ">391982.64</td><td class="align_right ">391982.64</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000889.shtml">000889</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000889.shtml">ST中嘉</a></td><td class="align_right ">148744.29</td><td class="align_right ">160105.78</td><td class="align_right ">86984.97</td><td class="align_right ">93629.11</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000948.shtml">000948</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000948.shtml">南天信息</a></td><td class="align_right ">528632.72</td><td class="align_right ">539879.79</td><td class="align_right ">38614.52</td><td class="align_right ">39436.07</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000971.shtml">000971</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000971.shtml">ST高升</a></td><td class="align_right ">134618.70</td><td class="align_right ">166725.83</td><td class="align_right ">84665.85</td><td class="align_right ">104859.01</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/000997.shtml">000997</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/000997.shtml">新 大 陆</a></td><td class="align_right ">1788947.49</td><td class="align_right ">1798885.70</td><td class="align_right ">102636.12</td><td class="align_right ">103206.29</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002063.shtml">002063</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002063.shtml">远光软件</a></td><td class="align_right ">943962.50</td><td class="align_right ">1024941.65</td><td class="align_right ">175457.71</td><td class="align_right ">190509.60</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002065.shtml">002065</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002065.shtml">东华软件</a></td><td class="align_right ">1625588.07</td><td class="align_right ">1795070.13</td><td class="align_right ">290283.58</td><td class="align_right ">320548.24</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002093.shtml">002093</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002093.shtml">国脉科技</a></td><td class="align_right ">713868.17</td><td class="align_right ">714317.50</td><td class="align_right ">100686.62</td><td class="align_right ">100750.00</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002095.shtml">002095</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002095.shtml">生 意 宝</a></td><td class="align_right ">392665.13</td><td class="align_right ">394243.20</td><td class="align_right ">25170.84</td><td class="align_right ">25272.00</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002123.shtml">002123</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002123.shtml">梦网科技</a></td><td class="align_right ">650474.06</td><td class="align_right ">757978.52</td><td class="align_right ">68687.86</td><td class="align_right ">80039.97</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002131.shtml">002131</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002131.shtml">利欧股份</a></td><td class="align_right ">1309578.84</td><td class="align_right ">1515714.58</td><td class="align_right ">584633.41</td><td class="align_right ">676658.29</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002148.shtml">002148</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002148.shtml">北纬科技</a></td><td class="align_right ">249222.00</td><td class="align_right ">308537.10</td><td class="align_right ">45148.91</td><td class="align_right ">55894.40</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002153.shtml">002153</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002153.shtml">石基信息</a></td><td class="align_right ">1121434.43</td><td class="align_right ">1913164.88</td><td class="align_right ">159976.38</td><td class="align_right ">272919.38</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002174.shtml">002174</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002174.shtml">游族网络</a></td><td class="align_right ">916019.16</td><td class="align_right ">917717.77</td><td class="align_right ">91419.08</td><td class="align_right ">91588.60</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002195.shtml">002195</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002195.shtml">岩山科技</a></td><td class="align_right ">1674280.22</td><td class="align_right ">1694554.91</td><td class="align_right ">565635.21</td><td class="align_right ">572484.77</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002197.shtml">002197</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002197.shtml">证通电子</a></td><td class="align_right ">491566.64</td><td class="align_right ">565213.89</td><td class="align_right ">53431.16</td><td class="align_right ">61436.29</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002212.shtml">002212</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002212.shtml">天融信</a></td><td class="align_right ">811854.37</td><td class="align_right ">823376.63</td><td class="align_right ">116813.58</td><td class="align_right ">118471.46</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002230.shtml">002230</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002230.shtml">科大讯飞</a></td><td class="align_right ">10642815.24</td><td class="align_right ">11280510.86</td><td class="align_right ">218448.59</td><td class="align_right ">231537.58</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002232.shtml">002232</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002232.shtml">启明信息</a></td><td class="align_right ">637335.59</td><td class="align_right ">637335.59</td><td class="align_right ">40854.85</td><td class="align_right ">40854.85</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002235.shtml">002235</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002235.shtml">安妮股份</a></td><td class="align_right ">319414.68</td><td class="align_right ">334413.21</td><td class="align_right ">55357.83</td><td class="align_right ">57957.23</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002238.shtml">002238</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002238.shtml">天威视讯</a></td><td class="align_right ">898866.26</td><td class="align_right ">898866.26</td><td class="align_right ">80255.92</td><td class="align_right ">80255.92</td></tr>
          <tr><td class="align_center select"><a href="//stock.quote.stockstar.com/002247.shtml">002247</a></td><td class="align_center"><a href="//stock.quote.stockstar.com/002247.shtml">聚力文化</a></td><td class="align_right ">108114.83</td><td class="align_right ">142946.17</td><td class="align_right ">64354.07</td><td class="align_right ">85087.00</td></tr>
          <!-- end -->
        </tbody>
        <tbody>
          <tr id="has_fyStock_data" class="noSelect no_trHover">
            <td colspan="12" class="time notSelect">
              <span class="fl" id="latesttime_span">数据时间:2024-03-29</span>
              <div class="fenye fr" id="divPageControl1">共<strong>422</strong>条记录<span><em>1</em></span>
                <a href="/stock/industry_I_0_0_2.html" target="_self"><em>2</em></a>
                <a href="/stock/industry_I_0_0_3.html" target="_self"><em>3</em></a>
                <a href="/stock/industry_I_0_0_4.html" target="_self"><em>4</em></a>
                <a href="/stock/industry_I_0_0_5.html" target="_self"><em>5</em></a>
                <em>...</em>
                <a href="/stock/industry_I_0_0_15.html" target="_self"><em>15</em></a>
                <a href="/stock/industry_I_0_0_2.html" target="_self" class="n"><em>下一页</em></a>
                到第<input type="text" class="page_input" id="txtPageNumber" onkeydown="if (event.keyCode == 13){PagedControl.GoToThePage('/stock/industry_I_0_0_{0}.html');return false;}">页
                <a href="javascript:void(0);" onclick="PagedControl.GoToThePage('/stock/industry_I_0_0_{0}.html');return false;"><em>确定</em></a>
              </div>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
  </div>
</div>
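The excerpt shows that `table.find_all('tr')` will return three kinds of rows: the header row in `<thead>`, one data row per stock, and the pagination row in the second `<tbody>`. The sketch below parses a trimmed copy of the excerpt (only two data rows are kept, and the header and pagination cells are shortened) and prints how many `<td>` cells each row has. That is why skipping the first row and requiring at least six cells leaves only the stock rows:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the excerpt above: header row, two data rows, pagination row
html = """
<table class="trHover" id="table1">
  <thead class="tbody_right"><tr>
    <td>代码</td><td>简称</td><td>流通市值(万元)</td><td>总市值(万元)</td><td>流通股本(万元)</td><td>总股本(万元)</td>
  </tr></thead>
  <tbody class="tbody_right" id="datalist">
    <tr><td><a href="//stock.quote.stockstar.com/000004.shtml">000004</a></td><td>国华网安</td>
        <td>190063.58</td><td>199232.32</td><td>12628.81</td><td>13238.03</td></tr>
    <tr><td><a href="//stock.quote.stockstar.com/000032.shtml">000032</a></td><td>深桑达A</td>
        <td>1166377.73</td><td>2058568.25</td><td>64476.38</td><td>113795.92</td></tr>
  </tbody>
  <tbody><tr id="has_fyStock_data"><td colspan="12">数据时间:2024-03-29</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="trHover").find_all("tr")

for i, row in enumerate(rows):
    print(i, len(row.find_all("td")))
# 0 6   <- header row, skipped by rows[1:]
# 1 6
# 2 6
# 3 1   <- pagination row, dropped by the len(cells) >= 6 check

stock_rows = [r for r in rows[1:] if len(r.find_all("td")) >= 6]
print(len(stock_rows))  # 2
```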

5. Analysis done, time to write the code

(1) Parse the HTML with Beautiful Soup:


import requests
from bs4 import BeautifulSoup

url = "http://quote.stockstar.com/stock/industry_I.shtml"
response = requests.get(url)
response.encoding = 'gbk'  # the page is GBK-encoded, so decode it as gbk
soup = BeautifulSoup(response.text, 'html.parser')
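Setting `response.encoding = 'gbk'` matters because `response.text` decodes the raw bytes with whatever encoding `requests` settled on. An alternative, sketched here on local bytes so it runs offline, is to hand Beautiful Soup the raw bytes (`response.content` in the real script) together with its `from_encoding` argument. The GBK snippet below is made up for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in for response.content: a tiny page encoded as GBK bytes
raw = "<table><tr><td>国华网安</td></tr></table>".encode("gbk")

# Decoding the bytes with the wrong codec produces mojibake
garbled = raw.decode("latin-1")
print("国华网安" in garbled)  # False

# Passing bytes plus from_encoding lets Beautiful Soup decode them correctly
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")
print(soup.td.text)  # 国华网安
```

`requests` also offers `response.apparent_encoding`, a guess based on the content itself, which can be assigned to `response.encoding` when you don't know a site's charset in advance.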

(2) Find the table that contains the stock information:


table = soup.find('table', class_='trHover')  # the stock table has class "trHover" (and id "table1")
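`soup.find()` returns `None` when nothing matches, for example if the site changes its layout or serves an error page instead of the quote list; the next step, `table.find_all('tr')`, would then fail with `AttributeError`. A small defensive sketch: the helper name `find_stock_table` is hypothetical, and it simply tries the class first and falls back to the `id="table1"` seen in the excerpt:

```python
from bs4 import BeautifulSoup


def find_stock_table(soup):
    """Return the stock table, or None if the page has no such table.

    Hypothetical helper: tries class="trHover" first, then id="table1".
    """
    table = soup.find("table", class_="trHover")
    if table is None:
        table = soup.find("table", id="table1")
    return table


good = BeautifulSoup('<table class="trHover" id="table1"><tr><td>x</td></tr></table>', "html.parser")
bad = BeautifulSoup("<p>placeholder error page</p>", "html.parser")

print(find_stock_table(good)["id"])  # table1

if find_stock_table(bad) is None:
    print("stock table not found; check the URL, encoding, or page layout")
```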

(3) Extract the rows from the table:

rows = table.find_all('tr')  # every <tr>: header row, stock rows, and the pagination row at the bottom
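Because the excerpt puts every stock row inside `<tbody id="datalist">`, a CSS selector can fetch exactly those rows, which makes the header-skipping and cell-count checks unnecessary. A sketch, assuming the `id` stays as shown in the excerpt (the HTML below is a shortened copy of it):

```python
from bs4 import BeautifulSoup

html = """
<table class="trHover" id="table1">
  <thead><tr><td>代码</td><td>简称</td></tr></thead>
  <tbody id="datalist">
    <tr><td>000004</td><td>国华网安</td></tr>
    <tr><td>000032</td><td>深桑达A</td></tr>
  </tbody>
  <tbody><tr id="has_fyStock_data"><td colspan="12">数据时间:2024-03-29</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Only the direct <tr> children of <tbody id="datalist">
data_rows = soup.select("tbody#datalist > tr")
codes = [row.find("td").get_text(strip=True) for row in data_rows]
print(codes)  # ['000004', '000032']
```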

(4) Loop over the rows and extract the stock information:

for row in rows[1:]:  # skip the header row
    cells = row.find_all('td')
    if len(cells) >= 6:  # make sure there are enough cells (this also skips the pagination row)
        stock_code = cells[0].text.strip()
        stock_name = cells[1].text.strip()
        circulation_market_value = cells[2].text.strip()
        total_market_value = cells[3].text.strip()
        circulation_stock = cells[4].text.strip()
        total_stock = cells[5].text.strip()
        print(f"Code: {stock_code}, Name: {stock_name}, Float market value: {circulation_market_value}, Total market value: {total_market_value}, Float shares: {circulation_stock}, Total shares: {total_stock}")
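The loop keeps every value as text. If you want to compute with the numbers, a small cleaning step helps: the numeric columns convert with `float()` (per the table header, the market-value columns are in 万元, i.e. units of 10,000 yuan), and some names on the page contain padding spaces, such as `新 大 陆` and `生 意 宝`. A sketch with a hypothetical `clean_row` helper, fed with values copied from the excerpt:

```python
def clean_row(cells):
    """Turn the six cell strings of one row into a dict (hypothetical helper)."""
    code, name, float_mv, total_mv, float_shares, total_shares = cells
    return {
        "code": code,
        # "新 大 陆" -> "新大陆": drop the padding spaces inside short names
        "name": "".join(name.split()),
        # market values are listed in 万元 (10,000 yuan)
        "float_market_value": float(float_mv),
        "total_market_value": float(total_mv),
        "float_shares": float(float_shares),
        "total_shares": float(total_shares),
    }


row = clean_row(["000997", "新 大 陆", "1788947.49", "1798885.70", "102636.12", "103206.29"])
print(row["name"])  # 新大陆
print(row["total_market_value"] * 10_000)  # total market value in yuan
```

Inside the loop above it could be called as `clean_row([c.text.strip() for c in cells[:6]])`.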

(5) Complete code

import requests
from bs4 import BeautifulSoup

url = "http://quote.stockstar.com/stock/industry_I.shtml"
response = requests.get(url)
response.encoding = 'gbk'  # set the encoding to gbk; without it the Chinese text comes out garbled
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_='trHover')
rows = table.find_all('tr')

for row in rows[1:]:  # skip the header row
    cells = row.find_all('td')
    if len(cells) >= 6:  # make sure there are enough cells (this also skips the pagination row)
        stock_code = cells[0].text.strip()
        stock_name = cells[1].text.strip()
        circulation_market_value = cells[2].text.strip()
        total_market_value = cells[3].text.strip()
        circulation_stock = cells[4].text.strip()
        total_stock = cells[5].text.strip()
        print(f"Code: {stock_code}, Name: {stock_name}, "
              f"Float market value: {circulation_market_value}, Total market value: {total_market_value}, "
              f"Float shares: {circulation_stock}, Total shares: {total_stock}")
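Printing is fine for a demo; to keep the results, the same fields can be written to a CSV file with the standard `csv` module. A sketch that writes to an in-memory buffer so it runs without touching the disk; the two records are copied from the excerpt, and the English column names are illustrative:

```python
import csv
import io

records = [
    ("000004", "国华网安", "190063.58", "199232.32", "12628.81", "13238.03"),
    ("000032", "深桑达A", "1166377.73", "2058568.25", "64476.38", "113795.92"),
]
header = ["code", "name", "float_market_value", "total_market_value",
          "float_shares", "total_shares"]

# For a real file: open("stocks.csv", "w", newline="", encoding="utf-8-sig")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(records)

print(buffer.getvalue())
```

In the script above you would append `(stock_code, stock_name, ...)` to `records` inside the loop instead of printing. The `utf-8-sig` encoding writes a BOM, which helps Excel recognize the file as UTF-8 and display the Chinese names correctly.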

That's all it takes to extract the stock information from the table.
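The script only reads the first page. The pagination block at the bottom of the excerpt reports 共422条记录 (422 records) spread over 15 pages and links to later pages as `/stock/industry_I_0_0_2.html`, `/stock/industry_I_0_0_3.html`, and so on. Assuming that pattern holds for every page (the excerpt only shows pages 2 to 5 and 15) and that each page lists 30 stocks like the first one, the page URLs could be generated like this:

```python
import math
from urllib.parse import urljoin

BASE_URL = "http://quote.stockstar.com/stock/industry_I.shtml"


def page_urls(total_records=422, per_page=30):
    """Build the list-page URLs (assumes the industry_I_0_0_{n}.html pattern)."""
    pages = math.ceil(total_records / per_page)
    urls = [BASE_URL]  # page 1 is the URL we already scraped
    for n in range(2, pages + 1):
        urls.append(urljoin(BASE_URL, f"/stock/industry_I_0_0_{n}.html"))
    return urls


urls = page_urls()
print(len(urls))  # 15
print(urls[1])    # http://quote.stockstar.com/stock/industry_I_0_0_2.html
```

Each URL could then go through the same `requests.get` / `BeautifulSoup` / row loop, ideally with a pause between requests.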

6. Results

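II. Extracting the Sina News Hot List

The steps are the same as above:

Open the page, press F12, find the markup for the data we want to scrape, analyze it, and write the code.

Here we extract the markup of the part highlighted in blue: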

<div class="blk_main_card">
  <!-- 热榜 (hot list) -->
  <!-- blk_main_li is the parent element -->
  <div class="blk_main_li" tab-type="tab-cont">
    <ul class="uni-blk-list02 list-a list-0427" style="padding-top: 7px;">
      <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711786917" data="0" target="_blank">小米汽车遭遇上百余名消费者投诉</a></li>
      <li><a href="https://sinanews.sina.cn/native_page/quanzi_914931027323416577.html" data="1" target="_blank">偷点外卖就不要写真实姓名了</a></li>
      <li id="hot_list_ad">
        <a id="hotlist_index_3" href="https://s.weibo.com/weibo?q=%E5%93%AA%E4%BA%9B%E4%BA%BA%E5%AE%B9%E6%98%93%E5%BE%97%E7%99%BE%E6%97%A5%E5%92%B3" data="2" target="_blank">哪些人容易得百日咳</a>
        <ins class="sinaads sinaads-fail" id="sinaads-right-hotlist" data-ad-pdps="PDPS000000067800" data-ad-width="360" data-ad-height="26" data-ad-type="embed" style="display:none" data-ad-status="done"></ins>
        <script>
          (sinaads = window.sinaads || []).push({
            params: {
              element: document.getElementById("PDPS000000067800"),
              sinaads_success_handler: function () {
                var ads = document.getElementById("sinaads-right-hotlist");
                var _news = document.getElementById("hotlist_index_3");
                var hot_list_ad = document.getElementById("hot_list_ad");
                _news.style.display = "none";
                ads.style.display = "block";
                hot_list_ad.classList.add("hotlist_have_ad")
              },
              sinaads_fail_handler: function () {
                console.log('sinaads_fail_handler')
              }
            }
          })
        </script>
      </li>
      <!-- the hot-list news items are all contained in <li> tags -->
      <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790585" data="3" target="_blank">杭州东站</a></li>
      <li><a href="https://sinanews.sina.cn/native_page/quanzi_914336910352965633.html" data="4" target="_blank">2024中国网络媒体论坛</a></li>
      <li><a href="https://sinanews.sina.cn/native_page/quanzi_914966334487650305.html" data="5" target="_blank">雷军能不能生产一下相机</a></li>
      <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790450" data="6" target="_blank">医院取精室里都有些什么</a></li>
      <li><a href="https://k.sina.com.cn/article_5756451891_m1571c7c3303301b0u4.html?from=news&amp;subch=onews" data="7" target="_blank">警方辟谣面具男用病毒针扎人</a></li>
      <li><a href="https://finance.sina.cn/2024-03-30/detail-inaqawts0171984.d.html" data="8" target="_blank">殡葬用品店否认南通烧纸普遍2层楼高</a></li>
      <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790306" data="9" target="_blank">花间令女性群像没有郑合惠子</a></li>
      <li><a class="fe661" href="https://sinanews.sina.cn/h5/top_news_list.d.html" data="10" target="_blank">点击查看更多实时热点</a></li>
    </ul>
  </div>
</div>
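Two details in this markup affect the scraping loop: the last `<li>` is not a headline but the 点击查看更多实时热点 ("click to see more live hot topics") link with class `fe661`, and each `<a>` carries a `data` attribute that appears to be the 0-based position in the list. A sketch of filtering on those, using a shortened copy of the markup and assuming the class name and `data` attribute stay as shown:

```python
from bs4 import BeautifulSoup

html = """
<div class="blk_main_li"><ul>
  <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711786917" data="0">小米汽车遭遇上百余名消费者投诉</a></li>
  <li><a href="https://sinanews.sina.cn/native_page/quanzi_914931027323416577.html" data="1">偷点外卖就不要写真实姓名了</a></li>
  <li><a class="fe661" href="https://sinanews.sina.cn/h5/top_news_list.d.html" data="10">点击查看更多实时热点</a></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

headlines = []
for li in soup.find("div", class_="blk_main_li").find_all("li"):
    a = li.find("a")
    # skip items without a link and the "see more" link (class fe661)
    if a is None or "fe661" in (a.get("class") or []):
        continue
    # data="0" -> rank 1, data="1" -> rank 2, ...
    headlines.append((int(a["data"]) + 1, a.get_text(strip=True)))

for rank, title in headlines:
    print(rank, title)
```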

After analyzing it, write the code:

import requests
from bs4 import BeautifulSoup

# page URL
url = 'https://news.sina.com.cn/'
# send a GET request and get the response
response = requests.get(url)
# parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# find the parent element that holds the hot-list news
hot_news_parent = soup.find('div', class_='blk_main_li')
# find all hot-list news items
hot_news_list = hot_news_parent.find_all('li')
# loop over the hot list and extract the information
for news_item in hot_news_list:
    # extract the news title and link
    news_title = news_item.a.text.strip()  # title text with leading/trailing whitespace removed
    news_link = news_item.a['href']  # news link
    # print the title and link
    print(f"Title: {news_title}\nLink: {news_link}\n")
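To reuse the headlines elsewhere, you could collect them into a list of dicts and serialize them with the standard `json` module; `ensure_ascii=False` keeps the Chinese titles readable instead of escaping them as `\uXXXX`. A sketch with two entries copied from the markup above:

```python
import json

hot_news = [
    {"title": "小米汽车遭遇上百余名消费者投诉",
     "link": "https://sinanews.sina.cn/native_zt/yingyanlandingpage1711786917"},
    {"title": "杭州东站",
     "link": "https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790585"},
]

text = json.dumps(hot_news, ensure_ascii=False, indent=2)
print(text)

# The JSON text round-trips back to the same data
restored = json.loads(text)
print(restored == hot_news)  # True
```

In the scraping loop you would append `{"title": news_title, "link": news_link}` instead of printing, then write `text` to a file opened with `encoding="utf-8"`.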

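Results:

III. Conclusion

Working through today's examples should deepen your understanding of Beautiful Soup and how to use it. When scraping, remember to follow the site's crawler rules (such as its robots.txt), and don't send requests too often or scrape excessively, so you don't burden the site. Also keep a learning mindset: keep exploring and trying new techniques and methods to make your scrapers more capable and efficient. That applies to whatever you do. I wish all of you, brothers and sisters, ever more success on your own path!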
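One concrete way to follow a site's crawler rules and avoid hammering it is to check its `robots.txt` with the standard `urllib.robotparser` module and to pause between requests. The sketch below parses a made-up `robots.txt` inline so it runs offline (for a real site you would call `set_url(...)` and `read()` instead). The `polite_get` helper, the `example.com` URLs, and the 1-second delay are illustrative assumptions, not rules published by stockstar or Sina:

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

robots = RobotFileParser()
robots.parse(rules)

print(robots.can_fetch("*", "http://example.com/stock/industry_I.shtml"))  # True
print(robots.can_fetch("*", "http://example.com/private/data.html"))       # False


def polite_get(fetch, urls, delay=1.0):
    """Call fetch(url) for each allowed URL, pausing `delay` seconds after each call."""
    results = []
    for url in urls:
        if not robots.can_fetch("*", url):
            continue  # skip paths the site asks crawlers to avoid
        results.append(fetch(url))
        time.sleep(delay)
    return results
```

With `requests`, `fetch` could be something like `lambda u: requests.get(u, timeout=10)`.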


