Python: binning values into ranges with cut

Reader submission, 2022-08-24



Discretization and Binning

import numpy as np
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# Discretize the ages: cut them into the ranges defined by the bin edges
cats = pd.cut(ages, bins)
cats

    [(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
    Length: 12
    Categories (4, interval[int64]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]

# pandas returns a special Categorical object. The output shows the bins
# computed by pandas.cut. You can treat it like an array of strings naming
# the bins. Internally it holds a categories array with the distinct
# category names, plus a label for each age in the codes attribute:
cats.codes
cats.categories
# Note: pd.value_counts is deprecated in pandas 2.1 and later;
# pd.Series(cats).value_counts() is the replacement there.
pd.value_counts(cats)

    (18, 25]     5
    (35, 60]     3
    (25, 35]     3
    (60, 100]    1
    dtype: int64

# Choose which side of each interval is closed: right=False closes the left side
pd.cut(ages, [18, 26, 36, 61, 100], right=False)

    [[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
    Length: 12
    Categories (4, interval[int64]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]

# Give the bins names
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
pd.cut(ages, bins, labels=group_names)

    [Youth, Youth, Youth, YoungAdult, Youth, ..., YoungAdult, Senior, MiddleAged, MiddleAged, YoungAdult]
    Length: 12
    Categories (4, object): [Youth < YoungAdult < MiddleAged < Senior]

# Passing an integer instead of explicit edges gives that many equal-width bins
data = np.random.rand(20)
pd.cut(data, 4, precision=2)

data = np.random.randn(1000)  # Normally distributed
cats = pd.qcut(data, 4)       # Cut into quartiles
cats
pd.value_counts(cats)

# qcut also accepts custom quantiles (numbers between 0 and 1, inclusive)
pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.])

Detecting and Filtering Outliers

# Filtering or transforming outliers is largely a matter of applying array
# operations. Consider a DataFrame with some normally distributed data:
data = pd.DataFrame(np.random.randn(1000, 4))
data.describe()

                     0            1            2            3
    count  1000.000000  1000.000000  1000.000000  1000.000000
    mean     -0.042219    -0.021445    -0.007653    -0.007220
    std       0.949976     0.987228     1.030498     0.997743
    min      -3.194414    -3.108915    -3.645860    -3.481593
    25%      -0.702411    -0.699059    -0.714481    -0.689237
    50%      -0.048305     0.001385     0.029337     0.022671
    75%       0.617693     0.619308     0.685055     0.674289
    max       3.023720     2.859053     3.189940     3.525865

# Suppose you want to find the values in one column whose absolute value exceeds 3
col = data[2]
col[np.abs(col) > 3]

    30    -3.645860
    334   -3.018842
    489   -3.183867
    536   -3.140963
    929    3.082067
    957    3.189940
    Name: 2, dtype: float64

# To select every row that contains a value above 3 or below -3,
# call any on the boolean DataFrame (axis=1 checks across each row):
data[(np.abs(data) > 3).any(axis=1)]

                0         1         2         3
    9    0.582317 -0.658090 -0.207434  3.525865
    30  -0.080332  0.599947 -3.645860  0.255475
    252 -1.528975 -1.559625  0.336788 -3.333767
    334  0.581893 -1.116332 -3.018842 -0.298748
    359 -0.048478 -3.108915  1.117755 -0.152780
    489 -0.274138  1.188742 -3.183867  1.050471
    536  1.741426 -2.214074 -3.140963 -1.509976
    702 -3.194414  0.077839 -1.733549  0.235425
    732  3.023720 -1.105312  0.105141  0.995257
    760  0.062528  2.368010  0.452649 -3.481593
    929 -0.071320  0.164293  3.082067 -0.516982
    957  0.617599 -0.843849  3.189940  0.070978

# Cap the values so they stay within the interval -3 to 3:
data[np.abs(data) > 3] = np.sign(data) * 3
data.describe()

                     0            1            2            3
    count  1000.000000  1000.000000  1000.000000  1000.000000
    mean     -0.042048    -0.021336    -0.006936    -0.006930
    std       0.949274     0.986893     1.026570     0.993388
    min      -3.000000    -3.000000    -3.000000    -3.000000
    25%      -0.702411    -0.699059    -0.714481    -0.689237
    50%      -0.048305     0.001385     0.029337     0.022671
    75%       0.617693     0.619308     0.685055     0.674289
    max       3.000000     2.859053     3.000000     3.000000

# np.sign(data) produces 1 and -1 depending on whether each value is positive or negative:
np.sign(data).head()

         0    1    2    3
    0  1.0 -1.0  1.0 -1.0
    1 -1.0  1.0 -1.0 -1.0
    2  1.0 -1.0 -1.0  1.0
    3 -1.0 -1.0  1.0 -1.0
    4 -1.0  1.0  1.0  1.0
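Returning to the binning section, the effect of `right=False` is easiest to see when the bin edges stay the same and only the closed side changes. The sketch below reuses the `ages`, `bins` and `group_names` values from the post; the variable names `closed_right`, `closed_left` and `counts` are illustrative only.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']

# Default (right=True): intervals look like (18, 25], so age 25 is a 'Youth'
closed_right = pd.cut(ages, bins, labels=group_names)

# right=False: intervals look like [18, 25), so age 25 moves up one bin
closed_left = pd.cut(ages, bins, labels=group_names, right=False)

# codes holds the integer bin index assigned to each age
print(closed_right.codes.tolist())
print(closed_left.codes.tolist())

# Count the members of each named bin, keeping the bins in category order
counts = pd.Series(closed_right).value_counts(sort=False)
print(counts)
```

Only the third age (25) sits exactly on an edge, so it is the only value whose bin changes between the two calls.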
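The post calls both `pd.cut(data, 4)` and `pd.qcut(data, 4)` without showing how their results differ. A minimal sketch of the contrast, assuming a seeded NumPy generator (the seed and the variable names are choices made for this example, not part of the original):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, used only for reproducibility
data = rng.standard_normal(1000)

# pd.cut with an integer: 4 bins of equal width between the min and max,
# so the middle bins of normally distributed data hold most of the points
equal_width = pd.cut(data, 4)
width_counts = pd.Series(equal_width).value_counts(sort=False)

# pd.qcut with an integer: edges are sample quantiles, so each of the
# 4 bins holds the same number of observations
equal_count = pd.qcut(data, 4)
quartile_counts = pd.Series(equal_count).value_counts(sort=False)

print(width_counts)
print(quartile_counts)
```

In short, `cut` fixes the bin edges and lets the counts vary, while `qcut` fixes the counts and lets the edges vary.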
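The capping step above assigns `np.sign(data) * 3` through a boolean mask. As an alternative that the original post does not use, `DataFrame.clip` applies the same bounds in a single call. The sketch below checks that the two approaches agree on seeded random data; the seed and variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
data = pd.DataFrame(rng.standard_normal((1000, 4)))

# Approach from the post: overwrite out-of-range values with +3 or -3
capped = data.copy()
capped[np.abs(capped) > 3] = np.sign(capped) * 3

# Alternative: clip bounds every value to [-3, 3] and returns a new DataFrame
clipped = data.clip(lower=-3, upper=3)

same = capped.equals(clipped)
max_abs = float(clipped.abs().max().max())
print(same, max_abs)
```

Unlike the masked assignment, `clip` leaves `data` itself unchanged unless its result is assigned back.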

