Scraping web page titles with multiple threads in Python and writing them to Excel

Reader submission, 2022-08-26



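#!/usr/bin/env python
# coding: utf-8

# In[1]:
import threading

import pandas as pd
import requests
from bs4 import BeautifulSoup

# In[2]:
# Workbook with a "URL" column listing the sites to look up
df = pd.read_excel("网站对应名字.xlsx")

# In[16]:
sites = df.URL
data_count = len(sites)
thread_count = 16
threads = []
n_loops = range(thread_count)

# In[17]:
# One result slot per site; each thread writes only to its own indices
names = [None] * data_count

# In[18]:
def get_url_title(site):
    """Return the <title> text of a page, or a placeholder if the request or parse fails."""
    try:
        # The timeout is an addition (value chosen arbitrarily) so that one
        # unresponsive site cannot stall a thread forever
        html = requests.get(site, timeout=10)
        # Name the parser explicitly to avoid the "no parser specified" warning
        soup = BeautifulSoup(html.content, "html.parser")
        return soup.find("title").text
    except Exception:
        return "网址有误"  # placeholder meaning "invalid URL"

# In[19]:
def write_title(start):
    # Begin at index `start` and step by thread_count, so the threads
    # cover disjoint sets of rows. The module-level names are only read
    # or mutated in place, so no `global` statement is needed.
    for i in range(start, data_count, thread_count):
        names[i] = get_url_title(sites[i])
        print(i, names[i])

# In[20]:
def main():
    for i in n_loops:
        t = threading.Thread(target=write_title, args=(i,))
        threads.append(t)
    # Start all threads
    for i in n_loops:
        threads[i].start()
    # Wait for all threads to finish
    for i in n_loops:
        threads[i].join()

# In[21]:
if __name__ == '__main__':
    main()

# In[22]:
names

# In[10]:
names

# In[11]:
len(names)

# In[12]:
df.info

# In[23]:
import multiprocessing
print(multiprocessing.cpu_count())

# In[ ]: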
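The script above fills `names` but stops before the step the title promises, writing the results back to Excel. The following is a minimal sketch of how that last step might look, not the author's own code. It repackages the same strided split as a function so that the fetcher can be swapped out. The names `collect_titles` and `save_titles`, the `Title` column, and the output path are all hypothetical. `save_titles` relies only on the standard pandas `DataFrame.copy` and `DataFrame.to_excel` methods, and writing `.xlsx` files normally requires an engine such as openpyxl to be installed.

```python
import threading


def collect_titles(sites, fetch, thread_count=16):
    """Apply fetch to every site using the strided split from the script.

    Worker k handles indices k, k + thread_count, k + 2 * thread_count, ...
    so no two threads ever write to the same slot.
    """
    sites = list(sites)
    names = [None] * len(sites)

    def worker(start):
        for i in range(start, len(sites), thread_count):
            names[i] = fetch(sites[i])

    threads = [
        threading.Thread(target=worker, args=(k,)) for k in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names


def save_titles(df, names, path):
    """Add the titles as a new column and write the sheet to path.

    Assumes df is a pandas DataFrame with one row per entry in names.
    "Title" is a hypothetical column name, not one taken from the article.
    """
    out = df.copy()
    out["Title"] = names
    out.to_excel(path, index=False)
    return out
```

With the article's objects this could be called as `names = collect_titles(df.URL, get_url_title)` followed by `save_titles(df, names, "titles.xlsx")`, where `titles.xlsx` is only an example file name. Writing to a new file rather than over the input workbook keeps the original list of URLs intact if a run goes wrong.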

