Python Web Scraper Example
Key components:
Import the required libraries:
python
import requests
from bs4 import BeautifulSoup
Send an HTTP request:
python
url = "http://www.example.com"
response = requests.get(url)
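A bare `requests.get(url)` call waits indefinitely if the server never answers, and it does not raise on HTTP error statuses such as 404 or 500. The following is a minimal sketch of a more defensive request, not a definitive implementation. The helper name `fetch_html`, the 10-second timeout, and the `User-Agent` string are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the body of `url` as text, raising on network or HTTP errors.

    `fetch_html` is a hypothetical helper; the timeout value and the
    User-Agent string below are illustrative assumptions.
    """
    headers = {"User-Agent": "example-scraper/0.1"}  # assumed identifier
    response = requests.get(url, headers=headers, timeout=timeout)
    # Turn 4xx/5xx responses into requests.HTTPError instead of
    # silently parsing an error page.
    response.raise_for_status()
    return response.text


# Example call (performs a real network request):
# html = fetch_html("http://www.example.com")
```

Callers can catch `requests.RequestException`, the base class for timeouts, connection failures, and HTTP errors raised by `raise_for_status()`.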
Parse the HTML response:
python
soup = BeautifulSoup(response.text, "html.parser")
Extract the data:
python
# Find all <article> elements whose class attribute is "article"
articles = soup.find_all("article", class_="article")
# Extract the title and content of each article
for article in articles:
    title = article.find("h1").text
    content = article.find("div", class_="content").text
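`find()` returns `None` when no matching element exists, so calling `.text` on its result raises `AttributeError` for any article that lacks an `<h1>` or a content `<div>`. The sketch below shows one way to guard against that by skipping incomplete articles and collecting the rest in a list. The function name `extract_articles` and the sample markup are invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this sketch.
SAMPLE_HTML = """
<article class="article">
  <h1>First title</h1>
  <div class="content">First body</div>
</article>
<article class="article">
  <div class="content">No heading here</div>
</article>
"""


def extract_articles(html):
    """Return a list of (title, content) tuples, skipping incomplete articles."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="article"):
        heading = article.find("h1")
        body = article.find("div", class_="content")
        # find() returns None when the element is missing.
        if heading is None or body is None:
            continue
        rows.append((heading.get_text(strip=True), body.get_text(strip=True)))
    return rows


print(extract_articles(SAMPLE_HTML))  # [('First title', 'First body')]
```

Collecting the results in a list also keeps every article available after the loop, rather than only the last `title` and `content`.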
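Store or process the data:
python
# Store the data in a database
import sqlite3
conn = sqlite3.connect("data.db")
c = conn.cursor()
# Create the table on first run so the INSERT below does not fail (TEXT columns assumed)
c.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, content TEXT)")
# title and content come from the extraction loop; run this INSERT once per article
c.execute("INSERT INTO articles (title, content) VALUES (?, ?)", (title, content))
conn.commit()
conn.close()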
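When several articles have been scraped, they can be written in one batch rather than one `execute` call at a time. The following is a rough sketch under stated assumptions: the helper `save_articles`, the two-column `TEXT` schema, and the sample rows are all illustrative, and an in-memory database stands in for `data.db` so the example leaves no file behind.

```python
import sqlite3


def save_articles(conn, rows):
    """Insert (title, content) tuples into an `articles` table.

    Hypothetical helper; the TEXT column types are an assumption.
    """
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (title TEXT, content TEXT)"
        )
        conn.executemany(
            "INSERT INTO articles (title, content) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")  # swap in "data.db" to persist to disk
save_articles(conn, [("Title 1", "Content 1"), ("Title 2", "Content 2")])
print(conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0])  # 2
conn.close()
```

The `?` placeholders are kept from the original query so that scraped text is passed as parameters and never interpolated into the SQL string.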
Example:
Scrape article titles and content:
python
import requests
from bs4 import BeautifulSoup
url = "http://www.example.com/articles"
# Send the HTTP request
response = requests.get(url)
# Parse the HTML response
soup = BeautifulSoup(response.text, "html.parser")
# Find all articles
articles = soup.find_all("article")
for article in articles:
    # Extract the title and content
    title = article.find("h1").text
    content = article.find("div", class_="content").text
    # Print the title and content
    print(f"Title: {title}")
    print(f"Content: {content}")
Output:
console
Title: Article Title 1
Content: Article content 1...
Title: Article Title 2
Content: Article content 2...
...