Novel Download Scripts

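Download the chromedriver build that matches the version of your Chrome browser (a chromedriver download mirror hosted in China can be used).

Copy chromedriver.exe into the Scripts directory of your Python installation, for example C:\Anaconda3\Scripts\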
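Before running any of the scripts it can help to confirm that the driver is actually reachable. The following is a minimal sketch using only the standard library; `find_chromedriver` is a hypothetical helper name (not part of Selenium), and the directory in the usage lines is just the example path mentioned above.

```python
import os
import shutil


def find_chromedriver(extra_dirs=()):
    """Return the full path of chromedriver if it can be found, else None.

    Looks on PATH first, then in any extra directories that are passed in.
    Hypothetical helper for illustration only.
    """
    found = shutil.which("chromedriver")
    if found:
        return found
    for directory in extra_dirs:
        for name in ("chromedriver.exe", "chromedriver"):
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    path = find_chromedriver([r"C:\Anaconda3\Scripts"])
    print(path or "chromedriver not found; copy it into a directory on PATH")
```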

The scripts were refined step by step, so the last download script is the most general and the most polished.

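    from selenium import webdriver
    from selenium.webdriver.common.by import By

    web = webdriver.Chrome()

    full_text = "小说:穿越种田之将门妻"   # "Novel: <title>"
    full_text = full_text + "\n" + "\n" + "\n"

    home_url = ""       # base URL of the chapter pages; the value is missing in the source
    chapter_start = 1   # not defined in the source; set to the first chapter you want
    chapter_end = 39    # not defined in the source; 39 is a guess from a leftover "#39" fragment
    start_page_id = 0   # offset between the chapter number and the page id in the URL

    for i in range(chapter_start, chapter_end + 1):
        page_id = i + start_page_id
        url = home_url + str(page_id) + ".html"
        # print("第" + str(i) + "章")
        # separator line plus a "Chapter i" heading
        full_text = full_text + "\n" + "\n" + "\n" + "======================" + "\n" + "第" + str(i) + "章" + "\n"
        web.get(url)
        # find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium releases removed
        content_tag = web.find_element(By.ID, "content")
        # content_tag = web.find_element_by_class_name("panel panel-default panel-readcontent")
        content = content_tag.text
        full_text = full_text + content

    print(full_text)
    web.close()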
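The script above assumes that chapter pages are numbered consecutively, so each URL is the base URL plus a page id plus ".html". As a small sketch of that idea, the URL construction and the chapter heading can be pulled into functions that are easy to check without a browser. The function names and the `example.invalid` host are placeholders for illustration, not taken from the original.

```python
def build_chapter_urls(home_url, chapter_start, chapter_end, start_page_id=0):
    """Yield (chapter_number, url) pairs for consecutively numbered pages.

    Each URL has the form <home_url><chapter + start_page_id>.html
    """
    for chapter in range(chapter_start, chapter_end + 1):
        page_id = chapter + start_page_id
        yield chapter, f"{home_url}{page_id}.html"


def chapter_header(chapter):
    """Blank lines, a separator, and the "第N章" heading used by the script."""
    return "\n\n\n======================\n第" + str(chapter) + "章\n"


if __name__ == "__main__":
    # example.invalid is a placeholder host, not the site from the article
    for chapter, url in build_chapter_urls(
        "https://example.invalid/book/", 1, 3, start_page_id=100
    ):
        print(chapter, url)
```

Setting `start_page_id` lets chapter 1 map to whatever page id the site happens to use for its first chapter.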

================================

Extract each chapter's URL from the chapter list page, then download the text of that chapter

================================

    # ========================================
    # Method 1: digits to Chinese numerals.
    # Flawed: for example, 10 is converted to 一零
    # ========================================
    def num_to_char(num):
        """Convert a number to Chinese numerals digit by digit."""
        num = str(num)
        num_dict = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
                    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}
        shu = []
        for i in num:
            shu.append(num_dict[i])
        new_str = "".join(shu)
        return new_str


    # ========================================
    # Method 2: number to Chinese numerals, much more complete
    # ========================================
    # -------------------------------------------------------------------------------
    # Name: num2chinese
    # Author: yunhgu
    # Date: 2021/8/24 14:51
    # Description:
    # -------------------------------------------------------------------------------
    _MAPPING = ('零', '一', '二', '三', '四', '五', '六', '七', '八', '九')
    _P0 = ('', '十', '百', '千')
    _S4, _S8, _S16 = 10 ** 4, 10 ** 8, 10 ** 16
    _MIN, _MAX = 0, 9999999999999999


    class NotIntegerError(Exception):
        pass


    class OutOfRangeError(Exception):
        pass


    class Num2Chinese:
        def convert(self, number: int):
            """
            :param number: non-negative integer
            :return: Chinese numeral string
            """
            return self._to_chinese(number)

        def _to_chinese(self, num):
            if not str(num).isdigit():
                raise NotIntegerError('%s is not an integer.' % num)
            if num < _MIN or num > _MAX:
                raise OutOfRangeError('%d out of range [%d, %d]' % (num, _MIN, _MAX))
            if num < _S4:
                return self._to_chinese4(num)
            elif num < _S8:
                return self._to_chinese8(num)
            else:
                return self._to_chinese16(num)

        @staticmethod
        def _to_chinese4(num):
            assert (0 <= num < _S4)
            if num < 10:
                return _MAPPING[num]
            else:
                # collect the digits, least significant first
                lst = []
                while num >= 10:
                    lst.append(num % 10)
                    num = num // 10
                lst.append(num)
                c = len(lst)  # number of digits
                result = ''
                for idx, val in enumerate(lst):
                    if val != 0:
                        result += _P0[idx] + _MAPPING[val]
                        if idx < c - 1 and lst[idx + 1] == 0:
                            result += '零'
                # the string was built back to front; note that 110 comes out as 一百十
                return result[::-1].replace('一十', '十')

        def _to_chinese8(self, num):
            assert (num < _S8)
            to4 = self._to_chinese4
            if num < _S4:
                return to4(num)
            else:
                mod = _S4
                high, low = num // mod, num % mod
                if low == 0:
                    return to4(high) + '万'
                else:
                    if low < _S4 // 10:
                        return to4(high) + '万零' + to4(low)
                    else:
                        return to4(high) + '万' + to4(low)

        def _to_chinese16(self, num):
            assert (num < _S16)
            to8 = self._to_chinese8
            mod = _S8
            high, low = num // mod, num % mod
            if low == 0:
                return to8(high) + '亿'
            else:
                if low < _S8 // 10:
                    return to8(high) + '亿零' + to8(low)
                else:
                    return to8(high) + '亿' + to8(low)


    # ========================================
    # Extract each chapter's URL from the list page, then download the chapter text
    # ========================================
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    web = webdriver.Chrome()
    num2chinese = Num2Chinese()

    full_text = "小说:掌家小娘子"   # "Novel: <title>"
    full_text = full_text + "\n" + "\n" + "\n"
    print(full_text)

    list_url = ""       # URL of the chapter list page; the value is missing in the source
    chapter_start = 1   # not defined in the source; set to the first chapter you want
    chapter_end = 306   # not defined in the source; 306 is a guess from a leftover "#306" fragment

    for i in range(chapter_start, chapter_end + 1):
        chinese_chapter_id = num2chinese.convert(i)   # Chinese numerals
        # chinese_chapter_id = str(i)                 # Arabic numerals
        chinese_chapter_name = "第" + chinese_chapter_id + "章"
        # The converter yields 百十 for x10..x19; rewrite it as 百一十 to match the link text.
        # (The original guarded this with str.find(), which returns -1, a truthy value,
        # when nothing is found, so the guard never skipped anything.)
        chinese_chapter_name = chinese_chapter_name.replace("百十", "百一十")
        # print(chinese_chapter_name)
        web.get(list_url)   # go back to the list page to pick up the chapter's URL
        try:
            url = web.find_element(By.PARTIAL_LINK_TEXT, chinese_chapter_name).get_attribute("href")
        except NoSuchElementException:
            url = ""
        # print(url)
        if url:
            web.get(url)
            # //*[@id="content"]
            # content_tag = web.find_elements_by_css_selector("dd")[2]
            # content_tag = web.find_element_by_id("contents")
            # content_tag = web.find_element_by_class_name("container body-content")
            content_tag = web.find_element(By.XPATH, '//*[@id="center"]')
            content = content_tag.text
        else:
            content = "不提供下载"   # "download not available"
        chapter_text = "\n" + "\n" + "\n" + "======================" + "\n" + "第" + str(i) + "章" + "\n"
        chapter_text = chapter_text + content
        print(chapter_text)
        full_text = full_text + chapter_text

    # print(full_text)
    web.close()
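The script patches chapter names after conversion because the converter produces 百十 where the link text uses 百一十. As an alternative sketch, a small converter limited to chapter-sized numbers (0 to 9999) can emit the expected form directly. `chapter_number_to_chinese` is a hypothetical name, and the 0 to 9999 limit is an assumption made to keep the example short; it is not a replacement for the full `Num2Chinese` class.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def chapter_number_to_chinese(n):
    """Convert 0..9999 to Chinese numerals, e.g. 110 -> 一百一十, 10 -> 十."""
    if not isinstance(n, int) or not 0 <= n <= 9999:
        raise ValueError("expected an integer between 0 and 9999")
    if n == 0:
        return DIGITS[0]
    digits = str(n)
    parts = []
    zero_pending = False  # a zero was skipped and may need a single 零
    for pos, ch in enumerate(digits):
        d = int(ch)
        if d == 0:
            zero_pending = True
            continue
        if zero_pending and parts:
            parts.append("零")  # one 零 stands for any run of inner zeros
        zero_pending = False
        parts.append(DIGITS[d] + UNITS[len(digits) - 1 - pos])
    text = "".join(parts)
    # 10..19 are conventionally written 十, 十一, ... without the leading 一
    if text.startswith("一十"):
        text = text[1:]
    return text


if __name__ == "__main__":
    for n in (5, 10, 11, 101, 110, 306, 1001):
        print(n, "第" + chapter_number_to_chinese(n) + "章")
```

Sites differ in how they write chapter numbers, so the output should still be compared with the actual link text on the list page.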

================================

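Supports chapters that are split across several pages

Performance optimizations

Writes the output to a file automatically

Adds downloading of side-story (extra) chapters

Cleaner code logic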

================================
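The code for this version is not included here, so the following is only a sketch of how two of the listed features, multi-page chapters and automatic file output, might fit together. The `fetch_page(chapter, page)` callback is a hypothetical interface standing in for the browser calls; it is assumed to return the page text, or `None` when the page does not exist. Writing each chapter as soon as it is downloaded also avoids building one very large string in memory.

```python
import os
import tempfile


def download_chapter(fetch_page, chapter, max_pages=50):
    """Collect the text of every page of one chapter.

    fetch_page(chapter, page) is supplied by the caller (hypothetical
    interface) and returns the page text, or None if there is no such page.
    """
    pages = []
    for page in range(1, max_pages + 1):
        text = fetch_page(chapter, page)
        if text is None:
            break
        pages.append(text)
    return "\n".join(pages)


def save_novel(path, title, chapters, fetch_page):
    """Write the novel to `path`, one chapter at a time."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("小说:" + title + "\n\n\n")
        for chapter in chapters:
            # same placeholder text as the earlier script ("download not available")
            body = download_chapter(fetch_page, chapter) or "不提供下载"
            out.write("\n\n\n======================\n第%d章\n" % chapter)
            out.write(body)


if __name__ == "__main__":
    # Stand-in for the website: chapter 1 has two pages, chapter 2 has one,
    # chapter 3 is missing.
    fake_site = {
        (1, 1): "page 1 of chapter 1",
        (1, 2): "page 2 of chapter 1",
        (2, 1): "chapter 2",
    }
    target = os.path.join(tempfile.gettempdir(), "novel_demo.txt")
    save_novel(target, "demo", [1, 2, 3], lambda c, p: fake_site.get((c, p)))
    with open(target, encoding="utf-8") as f:
        print(f.read())
```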

Handles pages where the XPath of the chapter body is not fixed and depends on the page content

================================
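The code for this version is not shown either. One way to cope with a body element whose location varies is to try several candidate XPaths in order and keep the first one that yields text. The sketch below demonstrates the idea with the standard library's `xml.etree.ElementTree` on a small well-formed snippet, so it runs without a browser; the candidate ids are the ones that appear in the earlier scripts, and `extract_body` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Candidate locations of the chapter body, tried in order. These ids come
# from the earlier scripts; a real site may need different ones.
CANDIDATE_XPATHS = [
    ".//*[@id='content']",
    ".//*[@id='contents']",
    ".//*[@id='center']",
]


def extract_body(root, xpaths=CANDIDATE_XPATHS):
    """Return the text of the first candidate element that exists and is non-empty."""
    for xpath in xpaths:
        node = root.find(xpath)
        if node is None:
            continue
        text = "".join(node.itertext()).strip()
        if text:
            return text
    return None


if __name__ == "__main__":
    page = ET.fromstring(
        "<html><body><div id='nav'>menu</div>"
        "<div id='center'><p>chapter text</p></div></body></html>"
    )
    print(extract_body(page))
```

With Selenium the same loop could call `web.find_elements(By.XPATH, xpath)`, which returns an empty list when nothing matches, and read `.text` from the first element found.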
