
Data acquisition

Saving the images of a single Xiaohongshu post locally

import os
import re

import requests

# Create the output folder if it does not exist yet
os.makedirs('img', exist_ok=True)

headers = {
    # Paste the cookie string from your own logged-in browser session here
    'cookie': 'your-cookie-here',
    'referer': 'https://www.xiaohongshu.com/user/profile/610dfa37000000000101d41d?xsec_token=ABP8LcWXwugkR5oQ8uZ0mjGEMHvgi_rbK6e_n6v22wQpM=&xsec_source=pc_note',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36',
}

url = 'https://www.xiaohongshu.com/explore/67958c98000000002a00e6a9?xsec_token=ABYXrvCI5tqlTaTgrn5w86OSaVPFq8ONY6XjTmQLUDNoc=&xsec_source=pc_user'
response = requests.get(url=url, headers=headers, timeout=10)
html = response.text

# The post title and image URLs are exposed as Open Graph <meta> tags
old_title = re.findall('<meta name="og:title" content="(.*?)">', html)[0]
# Remove characters that are not allowed in file names
title = re.sub(r'[\\/:*?"<>|\n]', '', old_title)
print(title)
img_list = re.findall('<meta name="og:image" content="(.*?)">', html)

# Save each image as img/<title>1.jpg, img/<title>2.jpg, ...
for num, img in enumerate(img_list, start=1):
    print(img)
    img_content = requests.get(url=img, timeout=10).content
    with open(os.path.join('img', f'{title}{num}.jpg'), 'wb') as f:
        f.write(img_content)
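Because the script parses the page with regular expressions, the parsing and naming steps can be pulled out into small pure functions and checked offline against a saved HTML string. The following is a minimal sketch under that idea; the helper names `extract_meta`, `safe_filename` and `image_paths` are hypothetical, and the `html.unescape` call assumes the attribute values may be HTML-escaped (the original script does not unescape them).

```python
import html
import os
import re


def extract_meta(page: str, name: str) -> list[str]:
    """Return the content="..." value of every <meta name="{name}"> tag, HTML-unescaped."""
    # Same shape as the regex in the script, with the tag name escaped
    pattern = r'<meta name="' + re.escape(name) + r'" content="(.*?)">'
    return [html.unescape(value) for value in re.findall(pattern, page)]


def safe_filename(title: str, max_len: int = 80) -> str:
    """Remove characters Windows forbids in file names (and line breaks), then trim."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]', '', title).strip()
    return cleaned[:max_len] or 'untitled'


def image_paths(title: str, count: int, folder: str = 'img') -> list[str]:
    """Build <folder>/<title>1.jpg, <title>2.jpg, ... without a hard-coded backslash."""
    base = safe_filename(title)
    return [os.path.join(folder, f'{base}{i}.jpg') for i in range(1, count + 1)]


if __name__ == '__main__':
    # Made-up sample page, not real Xiaohongshu output
    sample = (
        '<meta name="og:title" content="Wallpaper: blue/green?">'
        '<meta name="og:image" content="https://example.com/a.jpg">'
        '<meta name="og:image" content="https://example.com/b.jpg">'
    )
    title = extract_meta(sample, 'og:title')[0]
    images = extract_meta(sample, 'og:image')
    for url, path in zip(images, image_paths(title, len(images))):
        print(url, '->', path)
```

Using `os.path.join` instead of `'img\\' + ...` keeps the paths valid on Linux and macOS as well as Windows, and the length cap guards against very long post titles.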

The result of running the single-post script is shown in the figure below:

 

Collecting multiple posts:

import os
import re

import requests
from DrissionPage import ChromiumPage

# Create the output folder if it does not exist yet
os.makedirs('img', exist_ok=True)

headers = {
    # Paste the cookie string from your own logged-in browser session here
    'cookie': 'your-cookie-here',
    'referer': 'https://www.xiaohongshu.com/user/profile/610dfa37000000000101d41d?xsec_token=ABP8LcWXwugkR5oQ8uZ0mjGEMHvgi_rbK6e_n6v22wQpM=&xsec_source=pc_note',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36',
}

# Open a browser and start listening for the search API before loading the page
dp = ChromiumPage()
dp.listen.start('web/v1/search/notes')
# keyword=壁纸 ("wallpaper"), percent-encoded twice
dp.get('https://www.xiaohongshu.com/search_result/?keyword=%25E5%25A3%2581%25E7%25BA%25B8&source=web_explore_feed&type=51')
# Wait for the first matching packet and read its JSON body
r = dp.listen.wait()
json_data = r.response.body
items = json_data['data']['items']

for item in items:
    try:
        id_ = item['id']
        token = item['xsec_token']
        url = f'https://www.xiaohongshu.com/explore/{id_}?xsec_token={token}&xsec_source=pc_user'
        print(url)
        response = requests.get(url=url, headers=headers, timeout=10)
        html = response.text
        old_title = re.findall('<meta name="og:title" content="(.*?)">', html)[0]
        # Remove characters that are not allowed in file names
        title = re.sub(r'[\\/:*?"<>|\n]', '', old_title)
        print(title)
        img_list = re.findall('<meta name="og:image" content="(.*?)">', html)
        for num, img in enumerate(img_list, start=1):
            print(img)
            img_content = requests.get(url=img, timeout=10).content
            with open(os.path.join('img', f'{title}{num}.jpg'), 'wb') as f:
                f.write(img_content)
    except Exception as e:
        # Report the error and move on (e.g. entries without id/xsec_token, or unparsable pages)
        print(e)
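The search step boils down to two pieces of string handling: the `keyword` in the search URL is percent-encoded twice (`壁纸`, "wallpaper", becomes `%25E5%25A3%2581%25E7%25BA%25B8`), and each note URL is built from an item's `id` and `xsec_token`. The sketch below reproduces both offline, assuming the response keeps the `data` → `items` shape used in the script. `build_search_url` and `note_urls` are hypothetical names, and skipping items without an `id` or `xsec_token` is an explicit filter standing in for the script's try/except for that case.

```python
from urllib.parse import quote

SEARCH_BASE = 'https://www.xiaohongshu.com/search_result/'
NOTE_BASE = 'https://www.xiaohongshu.com/explore/'


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword twice, matching the search URL used in the script."""
    encoded = quote(quote(keyword))  # '壁纸' -> '%E5%A3%81...' -> '%25E5%25A3%2581...'
    return f'{SEARCH_BASE}?keyword={encoded}&source=web_explore_feed&type=51'


def note_urls(json_data: dict) -> list[str]:
    """Turn the data -> items list of a search response into note page URLs."""
    urls = []
    for item in json_data.get('data', {}).get('items', []):
        id_ = item.get('id')
        token = item.get('xsec_token')
        if not id_ or not token:
            # Assumed filter: entries lacking an id or xsec_token are not notes we can open
            continue
        urls.append(f'{NOTE_BASE}{id_}?xsec_token={token}&xsec_source=pc_user')
    return urls


if __name__ == '__main__':
    print(build_search_url('壁纸'))
    # Made-up sample data, not a real API response
    fake_response = {'data': {'items': [
        {'id': 'note123', 'xsec_token': 'token456'},
        {'id': 'note789'},  # no xsec_token, so it is skipped
    ]}}
    print(note_urls(fake_response))
```

Note that `dp.listen.wait()` is called only once in the script, so only the first batch of results returned by the search API is processed.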

 

