Python: Summarizing and Computing Descriptive Statistics (Statistics in Python)

Axis Indexes with Duplicate Labels

    import numpy as np
    import pandas as pd

    # Axis indexes with duplicate labels
    obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
    obj

    a    0
    a    1
    b    2
    b    3
    c    4
    dtype: int64

    obj.index.is_unique

    False

    obj['a']   # a duplicated label selects a Series
    obj['c']   # a unique label selects a scalar

    4

    # Generate a random matrix
    df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
    df
    # df.loc['b']

              0         1         2
    a -1.442180 -2.276836  0.316662
    a -0.672861  0.644555 -0.593982
    b  3.645069 -0.690898 -1.010551
    b  0.590857 -0.285636  1.329229

Summarizing and Computing Descriptive Statistics

    # 5.3 Summarizing and computing descriptive statistics
    df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                       [np.nan, np.nan], [0.75, -1.3]],
                      index=['a', 'b', 'c', 'd'],
                      columns=['one', 'two'])
    df

        one  two
    a  1.40  NaN
    b  7.10 -4.5
    c   NaN  NaN
    d  0.75 -1.3

    df.sum()

    one    9.25
    two   -5.80
    dtype: float64

    # Passing axis='columns' or axis=1 sums across the columns,
    # giving one value per row:
    df.sum(axis='columns')

    a    1.40
    b    2.60
    c    0.00
    d   -0.55
    dtype: float64

    # NA values are excluded automatically unless the entire slice (here, a
    # row or column) is NA; for sum, an all-NA row yields 0, as in row c above.
    # The skipna option disables this behavior:
    df.mean(axis='columns', skipna=False)

    a      NaN
    b    1.300
    c      NaN
    d   -0.275
    dtype: float64

    # idxmax returns the index label at which each column reaches its maximum:
    df.idxmax()

    one    b
    two    d
    dtype: object

    # Accumulations, such as the cumulative sum:
    df.cumsum()

        one  two
    a  1.40  NaN
    b  8.50 -4.5
    c   NaN  NaN
    d  9.25 -5.8

    # Some methods are neither reductions nor accumulations. describe is one
    # example: it produces multiple summary statistics in one shot.
    df.describe()

                one       two
    count  3.000000  2.000000
    mean   3.083333 -2.900000
    std    3.493685  2.262742
    min    0.750000 -4.500000
    25%    1.075000 -3.700000
    50%    1.400000 -2.900000
    75%    4.250000 -2.100000
    max    7.100000 -1.300000

    # On non-numeric data, describe produces a different set of statistics:
    obj = pd.Series(['a', 'a', 'b', 'c'] * 4)
    obj.describe()

    count     16
    unique     3
    top        a
    freq       8
    dtype: object

Correlation and Covariance

    conda install pandas-datareader

    # Correlation and covariance
    # Load the data
    price = pd.read_pickle('C:/file/code-sample-ok/python_practise/pydata-book-2nd-edition/examples/yahoo_price.pkl')
    volume = pd.read_pickle('C:/file/code-sample-ok/python_practise/pydata-book-2nd-edition/examples/yahoo_volume.pkl')

    # Alternatively, download the data with pandas-datareader
    # (requires network access):
    import pandas_datareader.data as web
    all_data = {ticker: web.get_data_yahoo(ticker)
                for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']}
    price = pd.DataFrame({ticker: data['Adj Close']
                          for ticker, data in all_data.items()})
    volume = pd.DataFrame({ticker: data['Volume']
                           for ticker, data in all_data.items()})

    price.head()

                     AAPL        GOOG         IBM       MSFT
    Date
    2010-01-04  27.990226  313.062468  113.304536  25.884104
    2010-01-05  28.038618  311.683844  111.935822  25.892466
    2010-01-06  27.592626  303.826685  111.208683  25.733566
    2010-01-07  27.541619  296.753749  110.823732  25.465944
    2010-01-08  27.724725  300.709808  111.935822  25.641571

    # Now compute the percent changes of the prices
    returns = price.pct_change()
    returns.tail()

                    AAPL      GOOG       IBM      MSFT
    Date
    2016-10-17 -0.000680  0.001837  0.002072 -0.003483
    2016-10-18 -0.000681  0.019616 -0.026168  0.007690
    2016-10-19 -0.002979  0.007846  0.003583 -0.002255
    2016-10-20 -0.000512 -0.005652  0.001719 -0.004867
    2016-10-21 -0.003930  0.003011 -0.012474  0.042096

    returns['MSFT'].corr(returns['IBM'])
    returns['MSFT'].cov(returns['IBM'])

    8.870655479703546e-05

    # Attribute access is an equivalent, more concise way to select the columns:
    returns.MSFT.corr(returns.IBM)

    0.4997636114415114

    # DataFrame's corr and cov methods, on the other hand, return the full
    # correlation or covariance matrix as a DataFrame:
    returns.corr()
    returns.cov()

              AAPL      GOOG       IBM      MSFT
    AAPL  0.000277  0.000107  0.000078  0.000095
    GOOG  0.000107  0.000251  0.000078  0.000108
    IBM   0.000078  0.000078  0.000146  0.000089
    MSFT  0.000095  0.000108  0.000089  0.000215

Using DataFrame's corrwith method, you can compute the correlations between a DataFrame's columns or rows and another Series or DataFrame. Passing a Series returns a Series of correlation values, computed for each column:

    returns.corrwith(returns.IBM)

    AAPL    0.386817
    GOOG    0.405099
    IBM     1.000000
    MSFT    0.499764
    dtype: float64

Passing a DataFrame computes the correlations between columns with matching names. Here, the percent changes are correlated with volume:

    returns.corrwith(volume)

    AAPL   -0.075565
    GOOG   -0.007067
    IBM    -0.204849
    MSFT   -0.092950
    dtype:
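The price and volume data used here come from a local pickle file or a download, so the numbers are hard to reproduce on another machine. The following is a minimal sketch on synthetic data: the tickers `X`, `Y`, `Z`, the random seed, and the noise levels are all made up for illustration and are not market data. It shows how `corr`, `cov`, and `corrwith` relate to one another.

```python
import numpy as np
import pandas as pd

# Synthetic daily "returns" for three hypothetical tickers (not real data).
rng = np.random.default_rng(0)
base = rng.normal(0, 0.01, size=250)
returns = pd.DataFrame({
    "X": base,
    "Y": 0.5 * base + rng.normal(0, 0.01, size=250),  # partly driven by X
    "Z": rng.normal(0, 0.01, size=250),               # generated independently
})

# Series-level correlation and covariance between two columns.
corr_xy = returns["X"].corr(returns["Y"])
cov_xy = returns["X"].cov(returns["Y"])

# Pearson correlation is the covariance scaled by both standard deviations.
corr_manual = cov_xy / (returns["X"].std() * returns["Y"].std())

# DataFrame-level: full matrices, and every column correlated with one Series.
corr_matrix = returns.corr()
cov_matrix = returns.cov()
corr_with_x = returns.corrwith(returns["X"])

print(corr_xy, corr_manual)
print(corr_with_x)
```

The two printed correlation values should agree up to floating-point error, and the `X` entry of `corr_with_x` should be 1, just as IBM correlates perfectly with itself in the output above.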

Unique Values, Value Counts, and Membership

    # Unique values, value counts, and membership
    obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
    uniques = obj.unique()
    uniques

    array(['c', 'a', 'd', 'b'], dtype=object)

    # value_counts computes how often each value occurs in a Series:
    obj.value_counts()

    a    3
    c    3
    b    2
    d    1
    dtype: int64

    # value_counts is also available as a top-level pandas function:
    pd.value_counts(obj.values, sort=False)

    a    3
    c    3
    d    1
    b    2
    dtype: int64

    # isin performs a vectorized set-membership check and can be used to
    # filter a Series or a DataFrame column down to a subset of values:
    obj
    mask = obj.isin(['b', 'c'])
    mask
    obj[mask]

    0    c
    5    b
    6    b
    7    c
    8    c
    dtype: object

    # Related to isin is the Index.get_indexer method, which gives you an
    # index array from an array of possibly non-distinct values into another
    # array of distinct values:
    to_match = pd.Series(['c', 'a', 'b', 'b', 'c', 'a'])
    unique_vals = pd.Series(['c', 'b', 'a'])
    pd.Index(unique_vals).get_indexer(to_match)

    array([0, 2, 1, 1, 0, 2], dtype=int64)

    # Compute a histogram over multiple related columns in a DataFrame.
    data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                         'Qu2': [2, 3, 1, 2, 3],
                         'Qu3': [1, 5, 2, 4, 4]})
    data

       Qu1  Qu2  Qu3
    0    1    2    1
    1    3    3    5
    2    4    1    2
    3    3    2    4
    4    4    3    4

    # Pass pandas.value_counts to the DataFrame's apply function
    # result = data.apply(pd.value_counts).fillna(0)
    result = data.apply(pd.value_counts)
    result

       Qu1  Qu2  Qu3
    1  1.0  1.0  1.0
    2  NaN  2.0  1.0
    3  2.0  2.0  NaN
    4  2.0  NaN  2.0
    5  NaN  NaN  1.0

In the result, the row labels are the distinct values occurring across all of the columns, and each entry is the count of that value in the given column; NaN means the value never occurs in that column.
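The steps in this section can be gathered into one self-contained script. This is a sketch under one assumption that differs from the text: it calls the `Series.value_counts` method through a lambda rather than passing the top-level `pd.value_counts`, which, as far as I know, is deprecated in recent pandas releases. The variable names `counts`, `filtered`, and `positions` are introduced only for this example.

```python
import pandas as pd

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# Frequency of each distinct value.
counts = obj.value_counts()

# Keep only the entries whose value is 'b' or 'c'.
filtered = obj[obj.isin(['b', 'c'])]

# For each value in to_match, its position in the array of distinct values.
to_match = pd.Series(['c', 'a', 'b', 'b', 'c', 'a'])
unique_vals = pd.Index(['c', 'b', 'a'])
positions = unique_vals.get_indexer(to_match)

# Per-column value counts; values absent from a column are counted as 0.
data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})
result = (data.apply(lambda col: col.value_counts())
              .fillna(0)
              .astype(int)
              .sort_index())

print(counts)
print(filtered)
print(positions)
print(result)
```

Filling the missing counts with 0 and casting to integers is a design choice for readability. Keep the NaN values, as in the output above, if you need to tell "never occurs" apart from a genuine count.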

316 2022-08-24
