2022-08-22
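Python Web Scraper: Downloading iQiyi VIP Videos
I. Third-party libraries
requests >>> pip install requests (sends HTTP requests to access websites)
tqdm >>> pip install tqdm (progress bar module)
II. Development environment
Version: Python 3.8
Editor: PyCharm 2021.2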
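III. Module installation problems
- How to install third-party Python modules:
Press Win + R, type cmd, then run the install command: pip install module_name (if installation is slow, you can switch to a mirror in China)
- Why installation fails:
- Failure one: 'pip' is not recognized as an internal command
Fix: add Python to the PATH environment variable
- Failure two: a large amount of red error output (read timed out)
Fix: the network connection timed out, so switch to a mirror
Tsinghua mirror: pip install -i <Tsinghua mirror URL> module_name
- Failure three: cmd reports that the module is already installed, or that installation succeeded, but PyCharm still cannot import it
Fix: you may have several Python installations (Anaconda or plain Python; one is enough), so uninstall one of them
Alternatively, the Python interpreter in PyCharm is not configured correctly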
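For failure three, it can help to check which interpreter is actually running and whether that interpreter can see the module. The following is a minimal sketch using only the standard library; `module_status` is a hypothetical helper name and is not part of the original post.

```python
import importlib.util
import sys


def module_status(name):
    """Return (interpreter_path, is_importable) for a top-level module name."""
    # find_spec returns None when this interpreter cannot locate the module.
    found = importlib.util.find_spec(name) is not None
    return sys.executable, found


if __name__ == "__main__":
    for name in ("requests", "tqdm"):
        interpreter, found = module_status(name)
        print(f"{name}: {'found' if found else 'missing'} in {interpreter}")
```

Running this once from cmd and once from PyCharm shows whether the two use different interpreters. If they do, installing with `python -m pip install module_name` from the interpreter PyCharm uses is one way to make sure the module lands in the right place.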
IV. Configuring the Python interpreter in PyCharm
1. Choose File >>> Settings >>> Project >>> Python Interpreter
3. Add the path of your Python installation
V. Installing plugins in PyCharm
1. Choose File >>> Settings >>> Plugins
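VI. Basic scraper approach
Scraping video
m3u8: a video streaming playlist format
The m3u8 URL is the sum of the ts segment URLs: the playlist it returns lists the link of every ts segment
Why segments: they save bandwidth
With an mp4, a single request to the video site fetches the whole file
Serving segments instead relieves pressure on the server
Implementing a video scraper
Analyze the data source (the m3u8 URL)
1. Send the request (access the site)
2. Get the data
3. Parse the data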
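The parsing step can be sketched without any network access. The example below assumes a standard m3u8 playlist in which every line starting with `#` is a tag and every other non-empty line is a segment URL; `extract_segment_urls` and the `example.com` URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin


def extract_segment_urls(m3u8_text, base_url=""):
    """Return the media segment URLs listed in an m3u8 playlist, in order."""
    urls = []
    for line in m3u8_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments (e.g. #EXTM3U, #EXTINF).
        if not line or line.startswith("#"):
            continue
        # Relative entries are resolved against the playlist URL if one is given.
        urls.append(urljoin(base_url, line) if base_url else line)
    return urls


sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
https://example.com/video/seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""

segments = extract_segment_urls(sample, "https://example.com/video/index.m3u8")
print(segments)
```

Filtering on the leading `#` rather than on a regular expression keeps the intent explicit, and resolving relative entries covers playlists that list only file names.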
VII. Complete code
import re

import requests
from tqdm import tqdm

headers = {
    # Cookie string copied from your own logged-in browser session
    'cookie': '<your cookie string>',
    # The origin and referer values are missing from the source text
    'origin': '',
    'referer': '',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}
# The request URL is missing from the source text
url = ''
# Send the request and decode the JSON response
response = requests.get(url=url, headers=headers)
json_data = response.json()
# Pull the m3u8 playlist text out of the response
m3u8 = json_data['data']['program']['video'][1]['m3u8']
# Strip the playlist tags (lines beginning with #E), leaving only ts links
ts_list = re.sub('#E.*', '', m3u8)
ts_list = ts_list.split()
# Download every ts segment and append it to one output file
for ts in tqdm(ts_list):
    ts_data = requests.get(ts).content
    with open('远山淡影.mp4', mode='ab') as f:
        f.write(ts_data)
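Because the script opens the output file in append mode, running it a second time adds the segments to whatever the first run left behind. One possible variant is sketched below. It assumes the download function is passed in as a callable, so the logic can be tried offline; `save_segments` and the fake segment data are hypothetical and not part of the original code.

```python
import os
import tempfile


def save_segments(urls, out_path, fetch):
    """Fetch each segment in order and write the bytes to one output file.

    `fetch` is any callable mapping a URL to bytes, for example
    lambda u: requests.get(u, timeout=10).content
    """
    total = 0
    # Opening once in 'wb' avoids appending to a stale file from an earlier run.
    with open(out_path, "wb") as f:
        for url in urls:
            data = fetch(url)
            f.write(data)
            total += len(data)
    return total


# Offline demonstration with fake segment data instead of real network calls.
fake = {"a.ts": b"abc", "b.ts": b"de"}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ts")
written = save_segments(["a.ts", "b.ts"], demo_path, fake.__getitem__)
print(written)
```

Passing `fetch` in also leaves room to add a timeout or retries in one place without touching the loop that writes the file.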