使用 Scrapy 的 ImagesPipeline 下载图片（使用合四一上尺工凡等字样为唱名的是）-eolink官网

使用 Scrapy 的 ImagesPipeline 下载图片（使用合四一上尺工凡等字样为唱名的是）

下载百度贴吧-动漫壁纸吧所有图片

定义item

Spider

spider 只需要得到图片的url，必须以列表的形式给管道处理

class PictureSpiderSpider(scrapy.Spider):

name = 'picture_spider'

allowed_domains = ['tieba.baidu.com']

start_urls = ['https://tieba.baidu.com/f?kw=%E5%8A%A8%E6%BC%AB%E5%A3%81%E7%BA%B8']

def parse(self, response):

# 贴吧中一页帖子的ID和标题

theme_urls = re.findall(r'',

response.text, re.S)

for theme in theme_urls:

# 帖子的url

theme_url = 'https://tieba.baidu.com/p/' + theme[0]

# 进入各个帖子

yield scrapy.Request(url=theme_url, callback=self.parse_theme)

# 贴吧下一页的url

next_url = re.findall(

r'下一页>',

response.text, re.S)

if next_url:

next_url = self.start_urls[0] + '&pn=' + next_url[0]

yield scrapy.Request(url=next_url)

# 下载每个帖子里的所有图片

def parse_theme(self, response):

item = PostBarItem()

# 每个贴子一页图片的缩略图的url

pic_ids = response.xpath('//img[@class="BDE_Image"]/@src').extract()

# 用列表来装图片的url

item['pic_urls'] = []

for pic_url in pic_ids:

# 取出每张图片的名称

item['pic_name'] = pic_url.split('/')[-1]

# 图片URL

url = 'http://imgsrc.baidu.com/forum/pic/item/' + item['pic_name']

# 将url添加进列表

item['pic_urls'].append(url)

# 将item交给pipelines下载

yield item

# 下完一页图片后继续下一页

next_url = response.xpath('//a[contains(text(),"下一页")]/@href').extract_first()

if next_url:

yield scrapy.Request('https://tieba.baidu.com' + next_url, callback=self.parse_theme)

ImagesPipeline

from scrapy.pipelines.images import ImagesPipeline

继承ImagesPipeline，重写get_media_requests()和file_path()方法

from scrapy.pipelines.images import ImagesPipeline

import scrapy

class PostBarPipeline(ImagesPipeline):

# 需要headers的网站，再使用

headers = {

'User-Agent': '',

'Referer': '',

}

def get_media_requests(self, item, info):

for pic_url in item['pic_urls']:

# 为每个url生成一个Request

yield scrapy.Request(pic_url)

# 需要请求头的时候，添加headers参数

# yield scrapy.Request(pic_url, headers=self.headers)

def file_path(self, request, response=None, info=None):

# 重命名(包含后缀名)，若不重写这函数，图片名为哈希

pic_path = request.url.split('/')[-1]

return pic_path

settings文件

运行结果

Python接口自动化之文件上传/下载接口怎么实现

340 2022-06-19

使用 Scrapy 的 ImagesPipeline 下载图片（使用合四一上尺工凡等字样为唱名的是）

java中的接口是类吗

Spring中的aware接口详情

Python接口自动化之文件上传/下载接口怎么实现

推荐文章

接口调用是什么意思？几种常用接口调用方式

接口设计原则

8款在线 API 接口文档管理工具

api管理系统是什么？

什么是接口调试？接口调试的步骤有哪些？

api 接口管理系统有哪些？

接口测试有几种测试方法

API文档生成工具有哪些？

微服务和api网关区别

交换机配置步骤

最近发表

热评文章

在线接口文档管理工具推荐，支持在线测试，HTTP接口

开源的在线接口文档wiki工具Mindoc的介绍与使

如何优雅的进行接口设计？接口设计的六大原则是什么？

什么是API测试,api检测公司

遇到百度网址安全中心提醒您该页面可能存在钓鱼欺诈信息

软件接口设计怎么做？前后端分离软件接口设计思路

使用 Scrapy 的 ImagesPipeline 下载图片（使用合四一上尺工凡等字样为唱名的是）

微信扫一扫：分享

推荐文章

最近发表

热评文章