Python自动化办公--Pandas玩转Excel【一】python处理Excel实现自动化办公教学（含实战）【一】python处理Excel实现自动化办公教学（含实战）-eolink官网

Python自动化办公--Pandas玩转Excel【一】python处理Excel实现自动化办公教学（含实战）【一】python处理Excel实现自动化办公教学（含实战）

python处理Excel实现自动化办公教学（含实战）【一】

python处理Excel实现自动化办公教学（含实战）【二】

python处理Excel实现自动化办公教学（数据筛选、公式操作、单元格拆分合并、冻结窗口、图表绘制等）【三】

python入门之后须掌握的知识点（模块化编程、时间模块）【一】

python入门之后须掌握的知识点（excel文件处理+邮件发送+实战：批量化发工资条）【二】

1.基础温故【Pandas】

1.1 创建文件

import pandas as pddf = pd.DataFrame()df.to_excel('001.xlsx') #可以指定路径#df.to_excel('H:\\Anaconda\\001.xlsx')df = pd.DataFrame({'id':[1,2,3],'name':['a','b','c']})df.to_excel('001-data.xlsx')df = pd.DataFrame({'id':[1,2,3],'name':['a','b','c']})df = df.set_index('id')df.to_excel('001-data-index.xlsx')

第一幅图索引默认在A列，通过set_index把ID设置为索引。

1.2 读取excel中的数据

脏数据处理：第一行错误数据，或者没有数据

import pandas as pdpeople = pd.read_excel('people001.xlsx')print(people.shape)print(people.columns)# 默认打印3行print(people.head())print(people.head(3))# 默认打印5行print(people.tail())#脏数据处理：第一行错误数据，或者没有数据#存在空行会自动识别并跳过，获取列名people = pd.read_excel('people002.xlsx',header=1)print(people.columns)#脏数据处理：第一行没有列名，添加列名people = pd.read_excel('people003.xlsx',header=None)people.columns = ['ID', 'Type', 'Title', 'FirstName', 'MiddleName', 'LastName']people = people.set_index('ID',inplace=True)people.to_excel('output.xlsx')

其中在colums中是把列名和索引区别的，

people = people.set_index('ID',inplace=True)#设置完index后，print(people.columns)#显示'Type', 'Title', 'FirstName', 'MiddleName', 'LastName'

再次读取时：id还是会当作列

这时候在读取的时候需要设置index，即可。

import pandas as pdpeople = pd.read_excel('people001.xlsx'，index_col="ID")

1.3 生成列、行、单元格（Series）

Series和python中的字典类似，下面是几种创建方法：

import pandas as pdd = { 'x':100, 'y':200, 'z':300,}print(d.values())print(d.keys())s1 = pd.Series(d)print(s1.index)L1 = [100,200,300]L2 = ['x','y','z']s2 = pd.Series(L1,index=L2)print(s2.index)s3 = pd.Series([100,200,300],index=['x','y','z'])print(s3.index)

创建一个简单的列表：行列不同形式添加。

index是默认对齐的方式，如果不相同会用NaN填充。

1.4 自动填充功能【数据区域读取填充数字】

1.4.1 数值填充

原始数据：只有name（书名）进行填充数据

数据区域不是定格，无法自动识别

import pandas as pdbooks = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None)#usecols='C,D,E,F',填充完再设置index_colprint(books)#NaN填充的dtype是float64

import pandas as pdbooks = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None)for i in books.index: books["ID"].at[i]=i+1print(books)

为了显示为整型，先把类型设置为str

import pandas as pdbooks = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None,dtype={"ID":str,"InStore":str,"Date":str})for i in books.index: books["ID"].at[i]=i+1print(books)

import pandas as pdbooks = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None,dtype={"ID":str,"InStore":str,"Date":str})for i in books.index: books["ID"].at[i]=i+1 books["InStroe"].at[i]="yes" if i%2==0 else "no"print(books)

import pandas as pdfrom datetime import date, timedeltabooks = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None,dtype={"ID":str,"InStore":str,"Date":str})start=date(2018,1,1)for i in books.index: books["ID"].at[i]=i+1 books["InStroe"].at[i]="yes" if i%2==0 else "no" books["Date"].at[i]=start+timedelta(days=i) #没有年月 month year；时分秒有 #books["Date"].at[i]=date(start.year+i,start.month,start.day)print(books)

月份相加需要计算一下，定义个子函数

import pandas as pdfrom datetime import date, timedeltadef add_month[d, md): yd=md/12 m=d.month+md%12 if m!= 12: yd+=m/12 m=m%12 return date(d.year + yd,m, d.day)books = pd.read_excel('books.xlsx',skiprows=4,usecols='C:F',index_col=None,dtype={"ID":str,"InStore":str,"Date":str})start=date(2018,1,1)for i in books.index: books["ID"].at[i]=i+1 books["InStroe"].at[i]="yes" if i%2==0 else "no" books["Date"].at[i]=start+timedelta(days=i) #没有年月 month year；时分秒有 #books["Date"].at[i]=date(start.year+i,start.month,start.day) #books["Date"].at[i]=add_month(start,i)#print(books)books.set_index("ID",inplace=True)books.to_excel("output/xlsx")

还有一种写法不改series直接改单元格写法如下：

for i in books.index: booksat[i,"ID"]]=i+1 books.at[i,"InStroe"]="yes" if i%2==0 else "no" books.at[i,"Date"]=start+timedelta(days=i) #没有年月 month year；时分秒有 #books["Date"].at[i]=date(start.year+i,start.month,start.day) #books["Date"].at[i]=add_month(start,i)#print(books)

1.4.2 计算填充（列操作）

列相乘，操作符重载【不用循环计算更方便】

循环：【不从头到尾计算，部分区域计算采用单元格计算】

价格加2 使用apply

lambda：

1.5 排序，多重排序

ascending默认从小到大排序：【true 从大到小 false从小到大】

1.6 数据筛选、过滤

找出年龄【18，30】分数【60，90】之间的

import pandas as pddef validate_age(a): return 18 <= a <= 30 #pandas特有写法def level_b(s): return 60 <= s < 90students = pd.read_excel('Students.xlsx', index_col='ID')#id作为indexstudents = students.loc[students['Age'].apply(validate_age)].loc[students.Score.apply(level_b)] # 两种语法students = students.loc[students.Age.apply(validate_age)].loc[students.Score.apply(level_b)] # 两种语法print(students)

loc与iloc功能介绍:数据切片。通过索引来提取数据集中相应的行数据or列数据(可以是多行or多列)总结不同: 1. loc函数通过调用index名称的具体值来取数据 2. iloc函数通过行序号来取数据 3.取多行数据时iloc不包含末尾 4.对数据进行筛选使用loc函数，当使用loc函数时，如果index不具有特定意义，而且重复，那么提取的数据需要进一步处理,可用.reset index()函数重置index相同: . 5.【】中无逗号时，默认取行

筛选出来的结果：

Name Age ScoreID4 Student_004 27 738 Student_008 21 619 Student_009 18 8519 Student_019 19 86

换一种写法：lambda

import pandas as pd# def validate_age(a):# return 18 <= a <= 30# def level_b(s):# return 60 <= s < 90students = pd.read_excel('Students.xlsx', index_col='ID')students = students.loc[students['Age'].apply( lambda a:18 <= a <= 30)] .loc[students.Score.apply(lambda s:60 <= s < 90)] # 两种语法print(students)

2.数据可视化

2.1 柱状图

Field	Number
Agriculture	12,318
Business and Management	200,312
Education	19,483
Engineering	216,932
Fine and Applied Arts	59,736
Health Professions	33,947
Humanities	17,664
Mathematics and Computer Sciences	141,651
Other/Unspecified Subject Areas	185,107
Physical and Life Sciences	75,385
Social Sciences	81,304

import pandas as pdimport matplotlib.pyplot as pltstudents = pd.read_excel('Students1.xlsx')students.sort_values(by='Number', inplace=True, ascending=False)students.index = range(0, len(students))print(students)plt.bar(students['Field'], students['Number'], color='orange', width=0.7)#plt.xticks(students['Field'], rotation='90') #rotation旋转plt.title('International Student by Field', fontsize=16)plt.xlabel('Field')plt.ylabel('Number')plt.tight_layout() #j紧凑型，避免下标显示不全plt.show()

pandas中inplace参数在很多函数中都会有，它的作用是:是否在原对象基础上进行修改

inplace = True:不创建新的对象，直接对原始对象进行修改; inplace = False:对数据进行修改，创建并返回新的对象承载其修改结果。

默认是False，即创建新的对象进行修改，原对象不变，和深复制和浅复制有些类似。

或者直接用pandas自带的：

import pandas as pdimport matplotlib.pyplot as pltstudents = pd.read_excel('C:/Temp/Students.xlsx')students.sort_values('Number', inplace=True, ascending=False)print(students)students.plot.bar(x='Field', y='Number', color='blue', title='International Students by Field')plt.tight_layout()plt.show()

2.2 分组柱图深度优化(比较图）

Field	2016	2017
Agriculture	12,318	12,602
Business and Management	200,312	200,754
Communications and Journalism	21,160	21,913
Education	19,483	17,993
Engineering	216,932	230,711
Fine and Applied Arts	59,736	61,506
Humanities	17,664	17,561
Intensive English	40,877	30,309
Legal Studies and Law Enforcement	15,077	15,306
Math and Computer Science	141,651	167,180
Physical and Life Sciences	75,385	76,838
Social Sciences	81,304	83,046
Other Fields of Study	81,318	87,577
Undeclared	26,675	21,131

import numpy as npimport pandas as pdimport matplotlib.pyplot as pltstudents = pd.read_excel('Students2.xlsx')students.sort_values(by='2017', inplace=True, ascending=False)print(students)students.plot.bar('Field', ['2016', '2017'], color=['orange', 'Red'])plt.title('International Students by Field', fontsize=16,fontweight="bold")plt.xlabel('Field', fontweight='bold')plt.ylabel('Number', fontweight='bold')plt.tight_layout()ax = plt.gca() #坐标轴移动修改ax.set_xticklabels(students['Field'], rotation=40, ha='right') #默认中心旋转plt.gcf().subplots_adjust(left=0.2, bottom=0.42) #画布大小调整plt.show()

推荐第一个

import numpy as npimport pandas as pdimport matplotlib.pyplot as pltstudents = pd.read_excel('Students2.xlsx')students.sort_values(by='2017', inplace=True, ascending=False)students.index = range(0, len(students))print(students)bar_width = 0.7x_pos = np.arange(len(students) * 2, step=2)plt.bar(x_pos, students['2016'], color='green', width=bar_width)plt.bar(x_pos + bar_width, students['2017'], color='blue', width=bar_width)plt.legend()plt.xticks(x_pos + bar_width / 2, students['Field'], rotation='90')plt.title('International Student by Field', fontsize=16)plt.xlabel('Field')plt.ylabel('Number')plt.tight_layout()plt.show()

2.3 叠加柱状图

用户总量从大到小排序：

import pandas as pdimport matplotlib.pyplot as pltusers = pd.read_excel('Users.xlsx')users['Total'] = users['Oct'] + users['Nov'] + users['Dec']users.sort_values(by='Total', inplace=True, ascending=False)print(users)users.plot.bar(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)# users.plot.barh(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)#水平柱状图堆积plt.tight_layout()plt.show()

users.sort_values(by='Total', inplace=True, ascending=Ture)users.plot.barh(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)#水平柱状图堆积

2.4 饼图

其中2016 2017是字符串，避免pandas误认为数字。

import pandas as pdimport matplotlib.pyplot as pltstudents = pd.read_excel('Students3.xlsx', index_col='From')print(students)# counterclock顺逆时针，startangle开始点确认students['2017'].plot.pie(fontsize=8, counterclock=False, startangle=-270)plt.title('Source of International Students', fontsize=16, fontweight='bold')plt.ylabel('2017', fontsize=12, fontweight='bold')plt.show()

2.5 折现趋势图，叠加区域图

import pandas as pdimport matplotlib.pyplot as pltweeks = pd.read_excel('Orders.xlsx', index_col='Week')print(weeks)weeks.plot(y=['Accessories', 'Bikes', 'Clothing', 'Components'])weeks.plot.area(y=['Accessories', 'Bikes', 'Clothing', 'Components'])plt.title('Sales Trends', fontsize=16, fontweight='bold')plt.xticks(weeks.index, fontsize=8)plt.show()

2.6 散点图直方图密度图

import pandas as pdimport matplotlib.pyplot as pltpd.options.display.max_columns = 999#所有列都会显示homes = pd.read_excel('home_data.xlsx')# print(homes.head())print(homes.corr())#相关性homes.plot.scatter(x='sqft_living', y='price')plt.figure()homes.sqft_living.plot.kde() #密度图plt.figure()homes.sqft_living.plot.hist(bins=100) #区间设置plt.xticks(range(0, max(homes.sqft_living), 500), fontsize=8, rotation=90) #面积# homes.price.plot.hist(bins=200)# plt.xticks(range(0, max(homes.price), 100000), fontsize=8, rotation=90) #房价plt.show()

密度图：

Python接口自动化之文件上传/下载接口怎么实现

342 2022-09-04

Python自动化办公--Pandas玩转Excel【一】python处理Excel实现自动化办公教学（含实战）【一】python处理Excel实现自动化办公教学（含实战）

java中的接口是类吗

Spring中的aware接口详情

Python接口自动化之文件上传/下载接口怎么实现

推荐文章

接口调用是什么意思？几种常用接口调用方式

接口设计原则

8款在线 API 接口文档管理工具

api管理系统是什么？

什么是接口调试？接口调试的步骤有哪些？

api 接口管理系统有哪些？

接口测试有几种测试方法

API文档生成工具有哪些？

微服务和api网关区别

交换机配置步骤

最近发表

热评文章

在线接口文档管理工具推荐，支持在线测试，HTTP接口

开源的在线接口文档wiki工具Mindoc的介绍与使

如何优雅的进行接口设计？接口设计的六大原则是什么？

什么是API测试,api检测公司

遇到百度网址安全中心提醒您该页面可能存在钓鱼欺诈信息

软件接口设计怎么做？前后端分离软件接口设计思路