Python Web Scraping (Case 2): A Day at X凰 (Python Scraping Notes)

Study notes. PS: Why is this post Case 2 when there is no Case 1 on my blog? Because post 1 got locked. Heartbreaking.

Scraping news titles and links

I want to use XPath to get the X凰X闻 [...]

with open('html_data.txt', 'r') as f:
    html = f.read()

In that case, we can open the file in binary mode and then decode the bytes as UTF-8.
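A minimal sketch of the read-as-bytes-then-decode step. The file name `sample_html_data.txt` and its contents are made up for illustration, and the helper names are hypothetical. Passing `encoding='utf-8'` to `open` in text mode should give the same string.

```python
import os
import tempfile


def read_utf8(path):
    """Read the file as raw bytes, then decode those bytes as UTF-8."""
    with open(path, 'rb') as f:
        return f.read().decode('utf-8')


def read_utf8_text_mode(path):
    """Alternative: let open() do the decoding by naming the encoding."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Hypothetical sample file standing in for html_data.txt.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_html_data.txt')
with open(sample_path, 'wb') as f:
    f.write('<h2><a href="//example.com/a">新闻标题</a></h2>'.encode('utf-8'))

html = read_utf8(sample_path)
print(html)
```

Either way, the decoding no longer depends on the platform's default encoding, which is what plain `open(path, 'r')` falls back to.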

The complete code is as follows:

# -*- coding: utf-8 -*-
from lxml import etree
import pymysql


class FenghuangXpath:
    def __init__(self):
        # Connect to the local MySQL database and open a cursor.
        self.db = pymysql.connect(host='127.0.0.1',
                                  port=3306,
                                  user='root',
                                  password='19970928',
                                  database='datacup',
                                  charset='utf8')
        self.cur = self.db.cursor()

    def get_page(self):
        # Read the saved page as bytes, then decode it as UTF-8.
        with open('html_data.txt', 'rb') as f:
            html = f.read().decode('utf-8')
        self.parse_page(html)

    def parse_page(self, html):
        link_xpath = \
            '//ul[@class="news-stream-basic-news-list"]/li[@class="news-stream-newsStream-news-item-has-image clearfix news_item"]//h2/a/@href'
        name_xpath = \
            '//ul[@class="news-stream-basic-news-list"]/li[@class="news-stream-newsStream-news-item-has-image clearfix news_item"]//h2/a/text()'
        parse_html = etree.HTML(html)
        link_list = parse_html.xpath(link_xpath)
        # The original comprehension is truncated in the source
        # (['for i in link_list]). ASSUMPTION: it prepended a scheme to
        # protocol-relative hrefs; 'https:' is a guess.
        link_list = ['https:' + i if i.startswith('//') else i
                     for i in link_list]
        name_list = parse_html.xpath(name_xpath)
        # Pair each title with its link; a list is safe to pass to executemany.
        data_zip = list(zip(name_list, link_list))
        self.write_data(data_zip)

    def write_data(self, data_zip):
        sql = 'insert into news_table(name, news_link) values(%s, %s);'
        try:
            self.cur.executemany(sql, data_zip)
            self.db.commit()
        except Exception as e:
            # Undo the partial insert and report what went wrong.
            self.db.rollback()
            print('Error message:', e)

    def main(self):
        self.get_page()
        self.cur.close()
        self.db.close()


if __name__ == '__main__':
    fengh = FenghuangXpath()
    fengh.main()
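To see what the two XPath expressions select, here is a small sketch that runs the same path against made-up markup. It swaps in the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed input, in place of `lxml`; the sample markup, the `example.com` links, and the `extract_rows` helper are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup that mimics the list structure targeted above.
SAMPLE = """
<html><body>
<ul class="news-stream-basic-news-list">
  <li class="news-stream-newsStream-news-item-has-image clearfix news_item">
    <div><h2><a href="//example.com/news/1">First headline</a></h2></div>
  </li>
  <li class="news-stream-newsStream-news-item-has-image clearfix news_item">
    <div><h2><a href="//example.com/news/2">Second headline</a></h2></div>
  </li>
  <li class="other-item">
    <div><h2><a href="//example.com/ad">Skipped item</a></h2></div>
  </li>
</ul>
</body></html>
"""

# Same structure as the XPath in the post, minus the trailing @href / text()
# steps, which ElementTree does not support; those are read from the element.
ITEM_PATH = (
    './/ul[@class="news-stream-basic-news-list"]'
    '/li[@class="news-stream-newsStream-news-item-has-image clearfix news_item"]'
    '//h2/a'
)


def extract_rows(markup):
    """Return (title, link) pairs for every matching <a> element."""
    root = ET.fromstring(markup.strip())
    rows = []
    for a in root.findall(ITEM_PATH):
        rows.append((a.text.strip(), a.get('href')))
    return rows


rows = extract_rows(SAMPLE)
print(rows)
```

Reading the title and the link from the same `<a>` element keeps each pair aligned, whereas zipping two separately extracted lists relies on both lists having the same length and order. Real pages are rarely well-formed XML, which is one reason to parse them with `etree.HTML` from `lxml` as the post does.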

Check the data in the news_table table:

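Great, everything was imported! But do you know why the id doesn't start from 1? When I created news_table, I made id the primary key with auto-increment, so MySQL fills in the id even when we don't pass one. What does that have to do with the ids starting at 7? To save effort, I only sent the name and news_link fields from Python and let MySQL fill in the id field for me.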

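So the first time the 3 news records were imported into news_table, the ids did start from 1. But since I'm a fool who typed the code wrong n times, the imported records kept coming out wrong, and I kept deleting the rows in news_table. Deleting rows does not reset the auto-increment counter, so on the 3rd import, when the data was finally right, MySQL assigned ids starting from 7 (1 to 3 and 4 to 6 had already been used). A sad story...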
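The delete-and-reimport effect can be reproduced in a few lines. This sketch uses the standard library's `sqlite3` with an `AUTOINCREMENT` column as a stand-in for MySQL's auto-increment, so it illustrates the behavior without being the MySQL setup from the post; the row contents are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute(
    'CREATE TABLE news_table ('
    'id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, news_link TEXT)'
)

# Three made-up records, inserted without an id each time.
rows = [('title %d' % n, '//example.com/%d' % n) for n in range(1, 4)]
sql = 'INSERT INTO news_table(name, news_link) VALUES (?, ?)'

first_ids = []
for attempt in range(3):
    # Throw away the previous attempt, then import the records again.
    cur.execute('DELETE FROM news_table')
    cur.executemany(sql, rows)
    conn.commit()
    cur.execute('SELECT MIN(id) FROM news_table')
    first_ids.append(cur.fetchone()[0])

print(first_ids)
conn.close()
```

The smallest id after each attempt should be 1, then 4, then 7, matching the story above. In MySQL, `TRUNCATE TABLE news_table` (or `ALTER TABLE news_table AUTO_INCREMENT = 1` on an empty table) should reset the counter if ids starting from 1 are wanted.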
