Python Web Scraping Notes (Part 1)

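Table of Contents

- HTML
  - Basics
    - Tags and underlines
    - Unordered and ordered lists
    - Tables
      - Adding borders
  - HTML attributes
    - The a tag (links)
      - The target attribute
      - Line breaks and horizontal rules
    - Images
      - Setting width and height
  - HTML blocks: block and inline elements
    - Block and inline elements
    - Block element examples
      - Inline element examples
  - Forms
    - The form tag
      - The type attribute
      - The placeholder attribute
      - The value attribute
      - Using the span tag
      - Radio buttons
    - The label tag
      - The for attribute
      - Checkboxes (multiple choice)
      - The submit button
    - The form tag
    - CSS
    - Three ways to add CSS
    - Selectors
    - Element, class, ID, and universal selectors
    - Child selectors (nesting) and descendant selectors
    - Descendant selector example
    - Pseudo-class selectors
  - CSS properties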


```python
import requests

response = requests.get("http://books.toscrape.com/")
# response.ok is True when the status code is below 400
if response.ok:
    print(response.text)
else:
    print("Request failed")
```

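With a User-Agent header. Sites such as Douban may reject requests that do not look like they come from a browser, so we send a browser's User-Agent string: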

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.62"
}
response = requests.get("https://movie.douban.com/top250", headers=headers)
print(response.status_code)
```

```python
from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/116.0.0.0"}
response = requests.get("http://books.toscrape.com/", headers=headers)
content = response.text
soup = BeautifulSoup(content, "html.parser")

# Each book's price sits in <p class="price_color">
all_prices = soup.find_all("p", attrs={"class": "price_color"})
for price in all_prices:
    # print(price) would show the whole tag; price.text / price.string give only its text
    # [2:] drops the currency prefix, which typically arrives mis-decoded as "Â£"
    print(price.string[2:])
```
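The block above needs network access and bs4. To try just the extraction step offline, here is a minimal sketch of the same idea using Python's built-in `html.parser`. The `PriceParser` class and the HTML snippet are illustrative assumptions modeled on the `price_color` markup, not a copy of the real page.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <p class="price_color"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "price_color" in classes:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())


# Hypothetical sample markup, shaped like a product listing
sample = """
<div><p class="price_color">£51.77</p><p class="instock availability">In stock</p></div>
<div><p class="price_color">£53.74</p></div>
"""

parser = PriceParser()
parser.feed(sample)
# The text is decoded correctly here, so only the one-character "£" needs stripping
numbers = [float(p.lstrip("£")) for p in parser.prices]
print(numbers)  # [51.77, 53.74]
```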

```python
from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/116.0.0.0"}

# The Top 250 list has 10 pages of 25 movies; "start" is the offset of a page's first movie
for start_num in range(0, 250, 25):
    response = requests.get(f"https://movie.douban.com/top250?start={start_num}", headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    all_titles = soup.find_all("span", attrs={"class": "title"})
    for title in all_titles:
        title_string = title.string
        # The second title span holds the original-language title, prefixed with "/"; skip it
        if "/" not in title_string:
            print(title_string)
```
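The loop relies on two small pieces of logic. The first is the `start` offsets (0, 25, 50, up to 225) that page through the list. The second is the `"/"` filter. Both can be checked offline with plain functions. `page_urls` and `keep_main_titles` are hypothetical helpers, and the sample titles are made up.

```python
def page_urls(total=250, page_size=25):
    """Build one URL per page; 'start' is the index of the first movie on that page."""
    base = "https://movie.douban.com/top250?start={}"
    return [base.format(start) for start in range(0, total, page_size)]


def keep_main_titles(titles):
    """Drop the secondary titles, which contain a '/' separator."""
    return [t for t in titles if "/" not in t]


urls = page_urls()
print(len(urls), urls[0], urls[-1])

# Made-up sample: each movie has a main title span and a "/ original title" span
sample_titles = ["Main title A", "\xa0/\xa0Original title A", "Main title B"]
print(keep_main_titles(sample_titles))  # ['Main title A', 'Main title B']
```

Note that this filter would also drop a main title that happened to contain a "/" itself.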


HTML

Basics

Tags and underlines

Unordered and ordered lists

Tables

Adding borders
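When scraping, an HTML table is usually turned into a list of rows. Below is a minimal sketch using the standard library's `html.parser`. It assumes well-formed markup in which every cell sits inside a `<tr>`. The `TableParser` class and the sample table are illustrative, not from the original post. The `border` attribute only affects how the table is displayed, so it does not change the parsed result.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


sample = """
<table border="1">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Book A</td><td>51.77</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample)
print(parser.rows)  # [['Name', 'Price'], ['Book A', '51.77']]
```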

HTML attributes

The a tag (links)

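The target attribute

1. `_self`: open the link in the current window (the default)
2. `_blank`: open the link in a new window
3. `_parent`: open the link in the parent window
4. `_top`: open the link in the top-level window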
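A scraper often collects links together with their `target` so that it can tell which ones open a new window. Below is a minimal sketch with the standard library's `html.parser`. The `LinkParser` class and the sample links are illustrative assumptions. A missing `target` is reported as the default, `_self`.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, target) for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            # No target attribute means the browser default, _self
            self.links.append((a.get("href"), a.get("target") or "_self"))


sample = (
    '<a href="https://example.com">same window</a> '
    '<a href="https://example.org" target="_blank">new window</a>'
)
parser = LinkParser()
parser.feed(sample)
print(parser.links)  # [('https://example.com', '_self'), ('https://example.org', '_blank')]
```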

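Line breaks and horizontal rules

A line break is written `<br>` and a horizontal rule is written `<hr>`.

Images

Setting width and height

The `width` and `height` attributes of `<img>` set the displayed size of the image.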
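When scraping images you usually want the `src` along with any declared size. Below is a minimal sketch using the standard library's `html.parser`. The `ImageParser` class and the file names are hypothetical. Note that attribute values come back as strings, not numbers.

```python
from html.parser import HTMLParser


class ImageParser(HTMLParser):
    """Record src, width and height of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # Values are strings, or None when the attribute is absent
            self.images.append({"src": a.get("src"), "width": a.get("width"), "height": a.get("height")})


parser = ImageParser()
parser.feed('<img src="cat.jpg" width="200" height="100"><img src="dog.png">')
for img in parser.images:
    print(img)
```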

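HTML blocks: block and inline elements

Block and inline elements

Block element examples

Block elements (for example `<div>`, `<p>`, `<h1>`) start on a new line and are used for structure or layout.

Inline element examples

Inline elements (for example `<span>`, `<a>`, `<strong>`) stay within the flow of the text; they are used to style or mark up part of a piece of text.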
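This distinction matters when turning a scraped page into plain text. Block elements should start a new line, and inline elements should not. Below is a minimal sketch that assumes a hand-picked and incomplete set of block tags. `TextExtractor` and `BLOCK_TAGS` are hypothetical names.

```python
from html.parser import HTMLParser

# Incomplete, hand-picked set of block-level tags (an assumption for this sketch)
BLOCK_TAGS = {"div", "p", "h1", "h2", "h3", "ul", "ol", "li", "table", "tr"}


class TextExtractor(HTMLParser):
    """Join text, inserting a line break around block elements only."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def lines(self):
        text = "".join(self.parts)
        return [line.strip() for line in text.split("\n") if line.strip()]


extractor = TextExtractor()
extractor.feed('<div>First <span>block</span></div><p>Second <b>block</b> with <a href="#">a link</a></p>')
print(extractor.lines())  # ['First block', 'Second block with a link']
```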

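Forms

The form tag

The type attribute

On an `<input>` tag, the `type` attribute specifies what kind of input it is.

Reference: w3cschool.cn/html5/html5-input.html

A text input gives the page a box the user can type content into.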

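The placeholder attribute

`placeholder` shows hint text inside an empty field; the hint disappears once the user starts typing.

The value attribute

`value` pre-fills the field, for example with "Please enter content"; unlike a placeholder, this is the field's actual value rather than a hint.

Using the span tag

Radio buttons

If every option has `name="gender"`, only one of them can be selected; without the shared name, all of them can be checked at once.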
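For a scraper, the `name` attribute is what tells you which radio buttons belong together. Below is a minimal sketch that groups radio inputs by `name` using the standard library's `html.parser`. The `RadioGroups` class, the `(unnamed N)` labels and the sample values are assumptions for illustration. It relies on HTML's default value of `"on"` for a radio button that has no `value` attribute.

```python
from collections import defaultdict
from html.parser import HTMLParser


class RadioGroups(HTMLParser):
    """Group radio buttons by their name attribute."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)
        self.unnamed = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "radio":
            name = a.get("name")
            if name is None:
                # A radio without a name shares no group with any other radio
                self.unnamed += 1
                name = f"(unnamed {self.unnamed})"
            # A radio without a value attribute is submitted as "on"
            self.groups[name].append(a.get("value", "on"))


snippet = (
    '<input type="radio" name="gender" value="male">Male'
    '<input type="radio" name="gender" value="female">Female'
    '<input type="radio">A<input type="radio">B'
)
parser = RadioGroups()
parser.feed(snippet)
print(dict(parser.groups))
```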

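The label tag

The for attribute

A label's `for` attribute is normally set to the `id` of an input. This binds the two, so clicking the label focuses that input (or toggles it, for a checkbox or radio button).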
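The same `for`/`id` link lets a scraper work out what each form field means from its visible label. Below is a minimal sketch using the standard library's `html.parser`. `LabelParser` and the sample markup are hypothetical, and the sketch assumes each label contains only plain text.

```python
from html.parser import HTMLParser


class LabelParser(HTMLParser):
    """Pair each <label for="..."> with the <input> whose id matches."""

    def __init__(self):
        super().__init__()
        self.input_types = {}  # input id -> input type
        self.label_text = {}   # label's for value -> label text
        self.current_for = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self.current_for = a.get("for")
        elif tag == "input" and a.get("id"):
            # An input without a type attribute is a text input
            self.input_types[a["id"]] = a.get("type") or "text"

    def handle_endtag(self, tag):
        if tag == "label":
            self.current_for = None

    def handle_data(self, data):
        if self.current_for:
            self.label_text[self.current_for] = data.strip()


parser = LabelParser()
parser.feed('<label for="pwd">Password:</label><input type="password" id="pwd"><label>Gender:</label>')
pairs = {text: parser.input_types.get(target) for target, text in parser.label_text.items()}
print(pairs)  # {'Password:': 'password'}
```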

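The password type: `type="password"` creates a field that masks the characters as they are typed.

Checkboxes (multiple choice)

The submit button

`type="submit"` creates a button that submits the form.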

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>HTML practice</title>
</head>
<body>
<form action="#">
  <label for="user">Username:</label><input type="text" id="user" name="username" placeholder="Enter your username"><br><br>
  <label for="pwd">Password:</label><input type="password" id="pwd" name="password" placeholder="Enter your password"><br><br>
  <label>Gender:</label><input type="radio" name="gender" value="male">Male <input type="radio" name="gender" value="female">Female <input type="radio" name="gender" value="other">Other<br><br>
  <label>Hobbies:</label><input type="checkbox" name="hobby" value="sing">Singing <input type="checkbox" name="hobby" value="dance">Dancing <input type="checkbox" name="hobby" value="rap">Rap <input type="checkbox" name="hobby" value="basketball">Basketball<br><br>
  <input type="submit">
</form>
</body>
</html>
```

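Note: the `#` in `action="#"` is only a placeholder; a working form needs a server address there, i.e. an API.

The form tag

Where the submitted data goes is set by the form's `action` attribute.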
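A form without a `method` attribute is submitted with GET. The browser URL-encodes each `name=value` pair of the filled-in or checked fields and appends them to the `action` URL as a query string. Below is a minimal sketch of that encoding using the standard library's `urllib.parse.urlencode`. `build_get_url`, the endpoint URL and all field values are hypothetical.

```python
from urllib.parse import urlencode


def build_get_url(action, fields):
    """Return the URL a GET form submission would request (hypothetical helper)."""
    # doseq=True turns a list into repeated keys, like several checked checkboxes
    return f"{action}?{urlencode(fields, doseq=True)}"


fields = {
    "username": "alice",
    "gender": "female",               # a radio group sends one value
    "hobby": ["sing", "basketball"],  # a checkbox group can send several
}
url = build_get_url("https://example.com/api/register", fields)
print(url)
# https://example.com/api/register?username=alice&gender=female&hobby=sing&hobby=basketball
```

When scraping, `requests.get(action, params=fields)` builds the same kind of query string. For forms that use `method="post"`, `requests.post(action, data=fields)` sends the fields form-encoded in the request body instead.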

CSS

```html
<!DOCTYPE html>
<html>
<head>
<style>
  p {
    color: blue;      /* make the text of every p element blue */
    font-size: 16px;
  }
</style>
</head>
<body>
<p>This text is styled with CSS</p>
</body>
</html>
```

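Three ways to add CSS

CSS can be added in three ways: inline (a `style` attribute on the element), internal (a `<style>` block in `<head>`), or external (a separate `.css` file loaded with `<link>`).

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>HTML practice</title>
<!-- External style (commented out): <link rel="stylesheet" href="./style.css"> -->
<style>p { color: blue; font-size: 16px; } /* internal style: all p text blue */ h2 { color: green; }</style>
</head>
<body>
<p>This text is styled with CSS</p>
<h1 style="color: red;">This level-1 heading uses an inline style</h1>
<h2>This level-2 heading uses the internal style</h2>
<h3>This level-3 heading would be styled by the external style.css</h3>
</body>
</html>
```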

Selectors

Element, class, ID, and universal selectors

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>Document</title>
<style>
  h2 { color: aqua; }                       /* element selector */
  .highlight { background-color: yellow; }  /* class selector */
  #header { font-size: 55px; }              /* ID selector */
  * { font-family: 'Franklin Gothic Medium', 'Arial Narrow', Arial, sans-serif; }  /* universal selector */
</style>
</head>
<body>
<h1>Different types of CSS selectors</h1><h2>This is an element selector example</h2>
<h3 class="highlight">This is a class selector example</h3><h3>This is another h3, without the class, so it is not highlighted</h3>
<h4 id="header">This is an ID selector example</h4>
</body>
</html>
```
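These selector kinds are also how scrapers locate elements; BeautifulSoup's `select()` method, for instance, accepts CSS selectors. To show exactly what each simple selector matches, here is a minimal hand-rolled sketch. The `matches` function and `ElementCollector` class are hypothetical. They handle only single simple selectors, with no combinators.

```python
from html.parser import HTMLParser


def matches(selector, tag, attrs):
    """Test one simple selector: 'tag', '.class', '#id' or '*'."""
    if selector == "*":
        return True  # universal selector matches every element
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return tag == selector  # element selector


class ElementCollector(HTMLParser):
    """Record (tag, attributes) for every start tag."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


collector = ElementCollector()
collector.feed('<h2>a</h2><h3 class="highlight">b</h3><h3>c</h3><h4 id="header">d</h4>')
found = {sel: [tag for tag, attrs in collector.elements if matches(sel, tag, attrs)]
         for sel in ["h2", ".highlight", "#header", "*"]}
print(found)
```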

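Child selectors (nesting) and descendant selectors

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>Document</title>
<style>.father > .son { color: yellowgreen; } /* child selector */</style>
</head>
<body>
<div class="father"><p class="son">This is a child selector example</p></div>
</body>
</html>
```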

Descendant selector example

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>Document</title>
<style>
  #header { font-size: 55px; }            /* ID selector */
  .father > .son { color: yellowgreen; }  /* child selector */
</style>
</head>
<body>
<div class="father">
  <p class="son">This is a child selector example</p>
</div>
<div>
  <p class="grandson">This is a descendant selector example</p>
</div>
</body>
</html>
```

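Descendants include children, but children do not include grandchildren. The "descendant selector example" line does not change color for two reasons. First, `<p class="grandson">` is not a child of `.father` (it is not inside `.father` at all), so `.father>.son` does not apply. Second, the page defines no descendant rule. A descendant rule such as `.father .grandson` would match it only if it were nested, at any depth, inside `.father`.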
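The difference between the two combinators comes down to "direct parent" versus "any ancestor". Below is a minimal sketch that records each element's ancestors with a stack and then checks both rules. Unlike the page above, this sample nests the grandson inside `.father` so that the two selectors can be compared. `TreeParser`, `child_match` and `descendant_match` are hypothetical names. The sketch matches on classes only and assumes well-formed HTML where every non-void tag is closed.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # tags with no closing tag


class TreeParser(HTMLParser):
    """For each element, record its classes and the classes of all its ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []     # class sets of the currently open elements, outermost first
        self.elements = []  # (own classes, ancestor class sets) per element

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.elements.append((classes, list(self.stack)))
        if tag not in VOID_TAGS:
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()  # assumes well-formed HTML


def child_match(parent_cls, cls, element):
    """'.parent_cls > .cls': the direct parent must have parent_cls."""
    own, ancestors = element
    return cls in own and bool(ancestors) and parent_cls in ancestors[-1]


def descendant_match(ancestor_cls, cls, element):
    """'.ancestor_cls .cls': any ancestor may have ancestor_cls."""
    own, ancestors = element
    return cls in own and any(ancestor_cls in a for a in ancestors)


parser = TreeParser()
parser.feed('<div class="father"><p class="son">child</p>'
            '<div><p class="grandson">grandchild</p></div></div>')
son = next(e for e in parser.elements if "son" in e[0])
grandson = next(e for e in parser.elements if "grandson" in e[0])
print(child_match("father", "son", son))                 # True
print(child_match("father", "grandson", grandson))       # False
print(descendant_match("father", "grandson", grandson))  # True
```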

Pseudo-class selectors

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>Document</title>
<style>
  #header { font-size: 55px; }                  /* ID selector */
  #element:hover { background-color: purple; }  /* :hover applies only while the mouse is over the element */
</style>
</head>
<body>
<h3 id="element">This is a pseudo-class selector example</h3>
</body>
</html>
```