NumPy

User submission 247 2022-08-22


NumPy

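NumPy provides multidimensional array objects, which store homogeneous data (or heterogeneous data, in the form of structured arrays), together with optimized functions and methods for operating on them.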

See Wikipedia: NumPy

NumPy

Type: module

Provides

  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation

Documentation is available in two forms: docstrings provided with the code, and a loose standing reference guide, available from the NumPy homepage. We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities.

For some objects, np.info(obj) may provide additional help (it retrieves information about functions, classes, and modules). This is particularly true if you see the line "Help on ufunc object:" at the top of the help() page. Ufuncs are implemented in C, not Python, for speed. The native Python help() does not know how to view their help, but our np.info() function does.
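A minimal sketch of calling np.info from a script rather than interactively: its `output` parameter (sys.stdout when left as None) accepts a file-like object, so the help text can be captured with io.StringIO. np.add is used here only as an example ufunc.

```python
import io

import numpy as np

# np.info writes to sys.stdout by default; pass a file-like `output` to capture it
buf = io.StringIO()
np.info(np.add, output=buf)
text = buf.getvalue()

# np.add is a ufunc, i.e. one of the C-implemented functions mentioned above
print(isinstance(np.add, np.ufunc))  # True
print(text.splitlines()[0])          # first line of the captured help text
```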

To search for documents containing a keyword, do::

import numpy as np
np.lookfor('keyword')  # note: np.lookfor was removed in NumPy 2.0

General-purpose documents like a glossary and help on the basic concepts of numpy are available under the doc sub-module::

from numpy import doc
help(doc)

Available subpackages
---------------------
doc
    Topical documentation on broadcasting, indexing, etc.
lib
    Basic functions used by several sub-packages.
random
    Core Random Tools
linalg
    Core Linear Algebra Tools
fft
    Core FFT routines
polynomial
    Polynomial tools
testing
    NumPy testing tools
f2py
    Fortran to Python Interface Generator.
distutils
    Enhancements to distutils with support for
    Fortran compilers support and more.

Utilities
---------
test
    Run numpy unittests
show_config
    Show numpy build configuration
dual
    Overwrite certain functions with high-performance Scipy tools
matlib
    Make everything matrices.
__version__
    NumPy version string

Here are a few examples:

import numpy as np
from numpy import doc
help(doc)
help(doc.creation)
doc.basics?  # IPython-only syntax
help(np.lib)

ndarray overview

Translated from the Quickstart tutorial. NumPy's main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank.

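For example, the coordinates of a point in 3D space, [1, 2, 1], form an array with one axis; that axis has 3 elements, so its length is 3. In contrast, the array

[[ 1., 0., 0.],
 [ 0., 1., 2.]]

has rank 2 (it is 2-dimensional): its first dimension (axis) has length 2, and its second has length 3.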

NumPy's array class is called ndarray.

ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the dimensions of the array, a tuple of integers giving the length of the array along each axis. For a matrix with n rows and m columns, shape is (n, m).
ndarray.size: the total number of elements of the array, equal to the product of the elements of shape.
ndarray.dtype: an object describing the type of the elements in the array.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of type float64 has itemsize 8 (=64/8), and an array of type int32 has itemsize 4 (=32/8).
ndarray.data: the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.
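The attributes listed above can be checked on the 2 x 3 array from the rank example. This is a small sketch; the explicit int32 array is added only to illustrate the itemsize arithmetic, and nbytes (total bytes, size * itemsize) is one extra attribute not listed above.

```python
import numpy as np

# The 2 x 3 float64 array from the text: 2 rows (axis 0), 3 columns (axis 1)
a = np.array([[1., 0., 0.],
              [0., 1., 2.]])
print(a.ndim)      # 2
print(a.shape)     # (2, 3)
print(a.size)      # 6, the product of the shape entries
print(a.dtype)     # float64
print(a.itemsize)  # 8 bytes = 64 / 8

# An explicit int32 array: each element occupies 32 / 8 = 4 bytes
b = np.array([1, 2, 3], dtype=np.int32)
print(b.itemsize)  # 4
print(b.nbytes)    # 12 = size * itemsize
```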

Here are some examples:

z = np.array([[ 0, 1, 2, 3, 4],
              [ 5, 6, 7, 8, 9],
              [10, 11, 12, 13, 14]])
t = np.array([z, 2 * z + 1])
t

array([[[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]], [[ 1, 3, 5, 7, 9], [11, 13, 15, 17, 19], [21, 23, 25, 27, 29]]])

print('z.ndim = ', z.ndim)
print('t.ndim = ', t.ndim)

z.ndim = 2
t.ndim = 3

print('z.shape = ', z.shape)
print('t.shape = ', t.shape)

z.shape = (3, 5)
t.shape = (2, 3, 5)

print('z.size = ', z.size)
print('t.size = ', t.size)

z.size = 15
t.size = 30

t.dtype.name

'int32'

t.itemsize

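4

The default integer type is platform-dependent. These outputs come from a platform whose default integer is int32; on 64-bit Linux and macOS, and on Windows since NumPy 2.0, the default is int64, so t.dtype.name returns 'int64' and t.itemsize returns 8.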

type(t)

numpy.ndarray

ndarray indexing

z

array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])

z[0]  # the first row

array([0, 1, 2, 3, 4])

z[0, 2]  # the third element of the first row

2

t[0]

array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])

t[0][2]

array([10, 11, 12, 13, 14])

t[0, 2]

array([10, 11, 12, 13, 14])

t[0, 2, 3]

13

t[0, :2, 2:4]

array([[2, 3], [7, 8]])
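A short sketch of the indexing rules used above, rebuilding z with np.arange(15).reshape(3, 5) (which yields the same values): a tuple with one index per axis selects the same element as chained indexing, and each integer index removes one axis while slices keep theirs.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)   # same values as z in the text
t = np.array([z, 2 * z + 1])      # shape (2, 3, 5)

# One index per axis in a single tuple ...
print(t[0, 2, 3])                 # 13
# ... selects the same element as chained indexing
print(t[0][2][3])                 # 13
# Integers and slices can be mixed; the integer 0 drops the first axis
print(t[0, :2, 2:4].shape)        # (2, 2)
# A full slice on axis 0 picks element [0, 0] from both 3 x 5 blocks
print(t[:, 0, 0])                 # [0 1]
```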

For comparison, with Python lists:

e = [1, 2, 3, 4]
p = [e, e]
p[0][0]

1

p[0,0]  # this syntax is invalid for lists

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 p[0,0]  # this syntax is invalid for lists

TypeError: list indices must be integers or slices, not tuple
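A minimal sketch of the practical difference: with nested lists, extracting a "column" needs a loop or comprehension, while converting the same data to an ndarray allows tuple indexing and column slicing directly.

```python
import numpy as np

e = [1, 2, 3, 4]
p = [e, e]

# Nested lists: one index at a time, and a column needs a comprehension
col_list = [row[1] for row in p]

# The same data as an ndarray supports tuple indexing and column slices
arr = np.array(p)
col_arr = arr[:, 1]

print(col_list, col_arr)  # [2, 2] [2 2]
print(arr[0, 0])          # 1
```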

ndarray supports vectorized operations

Operations on every element

z

array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])

z.sum()  # sum of all elements

105

z.sum(axis = 0)  # sum along axis 0, i.e. column-wise sum; the result is like a row vector of the matrix

array([15, 18, 21, 24, 27])

z.sum(axis = 1)  # sum along axis 1, i.e. row-wise sum; the result plays the role of a column vector (returned as a 1-D array)

array([10, 35, 60])
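To make the axis argument concrete, here is a small sketch that recomputes both sums by hand: axis=0 collapses the rows (adding the row vectors together), axis=1 collapses the columns (one total per row). keepdims=True is shown as the way to get an actual 3 x 1 column instead of a 1-D result.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)  # same values as z in the text

# axis=0: add the three row vectors together
by_rows = z[0] + z[1] + z[2]
# axis=1: one total per row
by_cols = np.array([row.sum() for row in z])

print(np.array_equal(z.sum(axis=0), by_rows))  # True
print(np.array_equal(z.sum(axis=1), by_cols))  # True
print(z.sum(axis=1).shape)                     # (3,), a 1-D array
print(z.sum(axis=1, keepdims=True).shape)      # (3, 1), a true column
```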

z.std()  # standard deviation of all elements

4.3204937989385739

z.std(axis = 0)

array([ 4.0824829, 4.0824829, 4.0824829, 4.0824829, 4.0824829])

z.cumsum()  # cumulative sum over all elements, in flattened order

array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105], dtype=int32)

z * 2  # like multiplying a matrix by a scalar

array([[ 0, 2, 4, 6, 8], [10, 12, 14, 16, 18], [20, 22, 24, 26, 28]])

z ** 2

array([[ 0, 1, 4, 9, 16], [ 25, 36, 49, 64, 81], [100, 121, 144, 169, 196]], dtype=int32)

np.sqrt(z)

array([[ 0. , 1. , 1.41421356, 1.73205081, 2. ], [ 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ], [ 3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739]])

y = np.arange(10)  # like Python's range, but returns an array
y

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)  # 4 evenly spaced values from 0 to 2, both endpoints included
c = a - b
c

array([ 1. , 1.33333333, 1.66666667, 4. ])
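Note that np.linspace(0, 2, 4) produces 4 points, i.e. 3 equal intervals, because the endpoint is included by default. This sketch contrasts it with endpoint=False and with np.arange, which fixes the step instead of the count.

```python
import numpy as np

# linspace(start, stop, num): num points, stop included by default,
# so 4 points span 3 intervals of width (2 - 0) / 3
b = np.linspace(0, 2, 4)            # 0, 2/3, 4/3, 2
print(np.allclose(np.diff(b), 2 / 3))  # True

# endpoint=False excludes stop: 4 intervals of width 0.5
print(np.linspace(0, 2, 4, endpoint=False))  # 0, 0.5, 1, 1.5

# arange(start, stop, step) fixes the step; stop is excluded
print(np.arange(0, 2, 0.5))         # 0, 0.5, 1, 1.5
```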

# universal functions (ufuncs) are applied element-wise to the whole array
a = np.linspace(-np.pi, np.pi, 100)
b = np.sin(a)
c = np.cos(a)

b = np.array([1,2,3,4])
a = np.array([4,5,6,7])
print('a + b = ', a + b)
print('a - b = ', a - b)
print('a * b = ', a * b)
print('a / b = ', a / b)
print('a // b = ', a // b)
print('a % b = ', a % b)

a + b = [ 5 7 9 11]
a - b = [3 3 3 3]
a * b = [ 4 10 18 28]
a / b = [ 4. 2.5 2. 1.75]
a // b = [4 2 2 1]
a % b = [0 1 0 3]
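The floor division and remainder results above are consistent element by element: a == b * (a // b) + a % b. A short sketch checking this identity, and showing np.divmod, which returns both parts in one call:

```python
import numpy as np

a = np.array([4, 5, 6, 7])
b = np.array([1, 2, 3, 4])

# Element-wise quotient and remainder
q, r = a // b, a % b
print(np.array_equal(b * q + r, a))  # True

# np.divmod computes (a // b, a % b) together
q2, r2 = np.divmod(a, b)
print(np.array_equal(q, q2) and np.array_equal(r, r2))  # True
```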

For non-numeric arrays

a = np.array(list('python'))
a

array(['p', 'y', 't', 'h', 'o', 'n'], dtype='

b = np.array(list('numpy'))
b

array(['n', 'u', 'm', 'p', 'y'], dtype='

a + b

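---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 a + b

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('

Note also that a has 6 elements and b has 5, so the shapes (6,) and (5,) are not compatible for element-wise addition in any case. Concatenating the corresponding lists works: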

list(a) + list(b)

['p', 'y', 't', 'h', 'o', 'n', 'n', 'u', 'm', 'p', 'y']
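Two array-level alternatives, sketched under the assumption that the goal is either list-style joining or per-element string concatenation: np.concatenate joins the arrays end to end like list(a) + list(b), and np.char.add (available in NumPy 1.x and 2.x) concatenates strings element-wise, which requires equal or broadcastable shapes, so only the first 5 letters of a are used here.

```python
import numpy as np

a = np.array(list('python'))
b = np.array(list('numpy'))

# End-to-end join, the array counterpart of list(a) + list(b)
joined = np.concatenate([a, b])
print(joined.size)        # 11

# Element-wise string concatenation of two arrays with the same shape
pairs = np.char.add(a[:5], b)
print(pairs.tolist())     # ['pn', 'yu', 'tm', 'hp', 'oy']
```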

Linear algebra

from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
a.transpose()

array([[ 1. , 3. , 5. ], [ 2. , 4. , 9. ], [ 3. , 6.7, 5. ]])

inv(a)

array([[-2.27683616, 0.96045198, 0.07909605], [ 1.04519774, -0.56497175, 0.1299435 ], [ 0.39548023, 0.05649718, -0.11299435]])

b = np.array([3, 2, 1])
solve(a, b)  # solve the linear system ax = b

array([-4.83050847, 2.13559322, 1.18644068])

c = rand(3, 3)  # create a 3x3 random matrix
c

array([[ 0.98539238, 0.62602057, 0.63592577], [ 0.84697864, 0.86223698, 0.20982139], [ 0.15532627, 0.53992238, 0.65312854]])

np.dot(a, c)  # matrix multiplication

array([[ 3.14532847, 3.97026167, 3.01495417], [ 7.38477771, 8.94448958, 7.1230241 ], [ 13.32640097, 13.58984759, 8.33366406]])
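The results above can be verified numerically. This sketch checks that the solution satisfies a @ x == b up to rounding, that inv(a) @ a is the identity, and that the `@` operator gives the same matrix product as np.dot for 2-D arrays. The seeded generator np.random.default_rng(0) replaces rand only so the check is reproducible.

```python
import numpy as np
from numpy.linalg import inv, solve

a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([3, 2, 1])

x = solve(a, b)
# The solution satisfies a @ x == b up to floating-point rounding
print(np.allclose(a @ x, b))                # True
# inv(a) @ a is (numerically) the 3 x 3 identity
print(np.allclose(inv(a) @ a, np.eye(3)))   # True

# For 2-D arrays, `a @ c` is the same matrix product as np.dot(a, c)
c = np.random.default_rng(0).random((3, 3))
print(np.allclose(a @ c, np.dot(a, c)))     # True
```

For solving a system, solve(a, b) is generally preferable to computing inv(a) @ b, since it avoids forming the inverse explicitly.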

Creating arrays

See np.doc.creation? (IPython). There are 5 general mechanisms for creating arrays:

1) Conversion from other Python structures (e.g., lists, tuples)
2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
3) Reading arrays from disk, either from standard or custom formats
4) Creating arrays from raw bytes through the use of strings or buffers
5) Use of special library functions (e.g., random)
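One small example per mechanism, as a sketch: the "disk" file in 3) is simulated with an in-memory io.StringIO buffer, and np.random.default_rng with seed 42 is an arbitrary choice for 5).

```python
import io

import numpy as np

# 1) From Python structures (tuples and lists may be mixed)
from_list = np.array([(1, 2), [3, 4]])
# 2) Intrinsic creation functions
intrinsic = np.zeros((2, 2)) + np.arange(2)
# 3) Reading a text file; an in-memory buffer stands in for a file on disk
from_text = np.loadtxt(io.StringIO("1 2\n3 4"))
# 4) From raw bytes
from_bytes = np.frombuffer(b"\x01\x02\x03\x04", dtype=np.uint8).reshape(2, 2)
# 5) Special library functions, e.g. a seeded random generator
rng = np.random.default_rng(42)
from_random = rng.random((2, 2))

for arr in (from_list, intrinsic, from_text, from_bytes, from_random):
    print(arr.shape, arr.dtype)
```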

import numpy as np
x = np.array([2,3,1,0])
x1 = np.array([[1,2.0],[0,0],(1+1j,3.)])  # note mix of tuple and lists, and types
x2 = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])
y = np.zeros((2, 3))
y1 = np.ones((2,3))
y2 = np.arange(10)
y3 = np.arange(2, 10, dtype=float)  # np.float was removed in NumPy 1.24; use float or np.float64
y4 = np.arange(2, 10, 0.2)
y5 = np.linspace(1., 4., 6)  # 6 evenly spaced values from 1 to 4, both endpoints included
z = np.indices((3,3))
r = [x, x1, x2, y, y1, y2, y3, y4, y5, z]
s = 'x, x1, x2, y, y1, y2, y3, y4, y5, z'.split(', ')
for i in range(len(r)):
    print('%s = ' % s[i])
    print('')
    print(r[i])
    print(75 * '=')

x = 

[2 3 1 0]
===========================================================================
x1 = 

[[ 1.+0.j 2.+0.j]
 [ 0.+0.j 0.+0.j]
 [ 1.+1.j 3.+0.j]]
===========================================================================
x2 = 

[[ 1.+0.j 2.+0.j]
 [ 0.+0.j 0.+0.j]
 [ 1.+1.j 3.+0.j]]
===========================================================================
y = 

[[ 0. 0. 0.]
 [ 0. 0. 0.]]
===========================================================================
y1 = 

[[ 1. 1. 1.]
 [ 1. 1. 1.]]
===========================================================================
y2 = 

[0 1 2 3 4 5 6 7 8 9]
===========================================================================
y3 = 

[ 2. 3. 4. 5. 6. 7. 8. 9.]
===========================================================================
y4 = 

[ 2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2 4.4 4.6 4.8
 5. 5.2 5.4 5.6 5.8 6. 6.2 6.4 6.6 6.8 7. 7.2 7.4 7.6 7.8
 8. 8.2 8.4 8.6 8.8 9. 9.2 9.4 9.6 9.8]
===========================================================================
y5 = 

[ 1. 1.6 2.2 2.8 3.4 4. ]
===========================================================================
z = 

[[[0 0 0]
  [1 1 1]
  [2 2 2]]

 [[0 1 2]
  [0 1 2]
  [0 1 2]]]
===========================================================================

Tip: on the order parameter:

order specifies the order in which elements are stored in memory: 'C' means C-like (row-major), and 'F' means Fortran-like (column-major).
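A small sketch of what order changes: the logical values stay the same, but the strides (bytes to step to the next element along each axis) and the memory sequence differ. ravel(order='K') reads the elements in the order they sit in memory; the 2 x 3 float64 array is an arbitrary example.

```python
import numpy as np

g = np.arange(6, dtype=np.float64).reshape(2, 3)
c = np.array(g, order='C')  # row-major copy
f = np.array(g, order='F')  # column-major copy

# Same logical values, different memory layouts
print(np.array_equal(c, f))                              # True
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True

# float64 elements are 8 bytes each
print(c.strides)  # (24, 8): the next row starts 3 elements later
print(f.strides)  # (8, 16): the next row is the adjacent element

# Elements in memory order: rows one after another vs. columns one after another
print(c.ravel(order='K'))  # values 0, 1, 2, 3, 4, 5
print(f.ravel(order='K'))  # values 0, 3, 1, 4, 2, 5
```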

g = np.ones((2,3,4), dtype = 'i', order = 'C')  # np.zeros() takes the same arguments
g

array([[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]], [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]], dtype=int32)

# ones_like takes another array and returns an array of ones with the same shape
h = np.ones_like(g, dtype = 'float16', order = 'C')  # np.zeros_like() works the same way
h

array([[[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]], [[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]]], dtype=float16)

Notes:

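An array is homogeneous in composition, length, and size along every dimension (for example, every row has the same length), and only one data type (numpy.dtype) is allowed for the whole array.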
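A sketch of the consequences: mixed inputs are upcast to one common dtype, and a ragged nested list has no rectangular shape, so with an explicit dtype=object NumPy stores the inner lists as Python objects in a 1-D array instead of building a 2-D numeric array.

```python
import numpy as np

# One dtype per array: mixed inputs are converted to a common type
print(np.array([1, 2.5]).dtype)         # float64: the int becomes a float
print(np.array([1, 2.5 + 1j]).dtype)    # complex128
print(np.array([1, 'a']).dtype.kind)    # U: everything becomes a string
print(np.array([1, 'a'])[0] == '1')     # True: 1 is stored as the text '1'

# Rows of different lengths cannot form a 2-D array; with dtype=object
# the result is a 1-D array holding the two Python lists
ragged = np.array([[1, 2], [3]], dtype=object)
print(ragged.shape)                     # (2,)
```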

NumPy dtype objects

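dtype | Description | Example
t | Bit field | t4 (4 bits)
b | Boolean | b1 (True or False)
i | Integer | i8 (64 bits)
u | Unsigned integer | u8 (64 bits)
f | Floating point | f8 (64 bits)
c | Complex floating point | c16 (128 bits)
O | Object | O (pointer to an object)
S, a | String | S24 (24 characters)
U | Unicode | U24 (24 Unicode characters)
V | Other | V12 (12-byte data block)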
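The type codes in the table can be passed straight to np.dtype. In this sketch the number after the kind letter is a size in bytes (so i8 is 8 bytes = 64 bits), except for U, where it counts characters stored at 4 bytes each; the bit-field code t is left out.

```python
import numpy as np

codes = ['b1', 'i8', 'u8', 'f8', 'c16', 'O', 'S24', 'U24', 'V12']
for code in codes:
    dt = np.dtype(code)
    # kind is the one-letter category; itemsize is the size in bytes
    print(f"{code:>4} -> {str(dt):>10}  kind={dt.kind!r}  itemsize={dt.itemsize}")
```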

Structured arrays

Structured arrays allow us to use different NumPy data types, at least per column (field).

np.info(np.dtype)

dtype()
dtype(obj, align=False, copy=False)

Create a data type object.

A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.

Parameters
----------
obj
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler would output
    for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
    or a comma-separated string. If a struct dtype is being created,
    this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
    Make a new copy of the data-type object. If ``False``, the result
    may just be a reference to a built-in data-type object.

See also
--------
result_type

Examples
--------
Using array-scalar type:

>>> np.dtype(np.int16)
dtype('int16')

Structured type, one field name 'f1', containing int16:

>>> np.dtype([('f1', np.int16)])
>>> np.dtype([('f1', [('f1', np.int16)])])
>>> np.dtype([('f1', np.uint), ('f2', np.int32)])
>>> np.dtype([('a','f8'),('b','S10')])
>>> np.dtype("i4, (2,3)f8")
>>> np.dtype([('hello',(np.int,3)),('world',np.void,10)])
>>> np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
>>> np.dtype({'names':['gender','age'], 'formats':['S1',np.uint8]})
dtype([('gender', '|S1'), ('age', '|u1')])

Offsets in bytes, here 0 and 25:

>>> np.dtype({'surname':('S25',0),'age':(np.uint8,25)})
dtype([('surname', '|S25'), ('age', '|u1')])

Methods:
  newbyteorder -- newbyteorder(new_order='S')

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)
s

array([(b'Smith', 45, 1.83000004, [0, 1]), (b'Jones', 53, 1.72000003, [2, 2])], dtype=[('Name', 'S10'), ('Age', '

s['Name']

array([b'Smith', b'Jones'], dtype='|S10')

s['Age']

array([45, 53])

s["Height"].mean()

1.7750001

s[1]

(b'Jones', 53, 1.72000003, [2, 2])

s[1]['Age']

53
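Because each field access returns an ordinary array, field values can drive record selection and ordering. A short sketch on the same records; the filter threshold 50 is arbitrary, and the itemsize shows the fields are packed without padding by default.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

print(dt.names)     # field names in declaration order
print(dt.itemsize)  # 26 = 10 (S10) + 4 (i4) + 4 (f4) + 2 * 4 (i4 x 2)

# A boolean mask built from one field selects whole records
print(s[s['Age'] > 50]['Name'])              # [b'Jones']
# Reorder whole records by one field
print(s[np.argsort(s['Height'])]['Name'])    # [b'Jones' b'Smith']
```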

Code vectorization

r = np.array([[1,2,3],[2,3,4],[3,4,5],[4,5,6]])
s = np.array([[2,3,4],[3,4,5],[4,5,6],[6,7,8]])

Simple arithmetic

r + s

array([[ 3, 5, 7], [ 5, 7, 9], [ 7, 9, 11], [10, 12, 14]])

r * s

array([[ 2, 6, 12], [ 6, 12, 20], [12, 20, 30], [24, 35, 48]])

r % s

array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]], dtype=int32)

s // r

array([[2, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=int32)
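What "vectorized" replaces is an explicit loop over indices. This sketch computes r * s element by element with nested Python loops and confirms it matches the one-line array expression.

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
s = np.array([[2, 3, 4], [3, 4, 5], [4, 5, 6], [6, 7, 8]])

# Explicit loops: the element-wise product that `r * s` computes in one step
loop_result = np.empty_like(r)
for i in range(r.shape[0]):
    for j in range(r.shape[1]):
        loop_result[i, j] = r[i, j] * s[i, j]

print(np.array_equal(loop_result, r * s))  # True
```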

Broadcasting is supported


r

array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])

2 * r + 3

array([[ 5, 7, 9], [ 7, 9, 11], [ 9, 11, 13], [11, 13, 15]])

f = np.array([9,8,7])
f

array([9, 8, 7])

r + f

array([[10, 10, 10], [11, 11, 11], [12, 12, 12], [13, 13, 13]])
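Why r + f works: shapes are compared from the trailing dimension, a missing leading dimension counts as 1, and two sizes are compatible when they are equal or one of them is 1. The helper broadcast_shape below is a hypothetical sketch of that rule written for illustration; the test compares it with NumPy's own np.broadcast_shapes (NumPy 1.20 or later).

```python
import numpy as np

def broadcast_shape(*shapes):
    """Hypothetical helper: the result shape under NumPy's broadcasting rule."""
    ndim = max(len(s) for s in shapes)
    # Pad shorter shapes with leading 1s so all shapes align on the right
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for sizes in zip(*padded):
        non_one = {n for n in sizes if n != 1}
        if len(non_one) > 1:
            raise ValueError(f"incompatible shapes: {shapes}")
        result.append(non_one.pop() if non_one else 1)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))  # (4, 3): like r + f above
print(broadcast_shape((4, 3), ()))    # (4, 3): like 2 * r + 3 with a scalar
try:
    broadcast_shape((4, 3), (4,))     # trailing sizes 3 and 4 clash
except ValueError as exc:
    print(exc)
```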

# r.T is shorthand for r.transpose(), the transpose
np.shape(r.T)

(3, 4)

def f(x):
    return 3 * x + 5

f(r.T)

array([[ 8, 11, 14, 17], [11, 14, 17, 20], [14, 17, 20, 23]])
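f(r.T) works because f only uses arithmetic operators, which are already vectorized. A function with an `if` on its argument would fail for arrays (an array's truth value is ambiguous); a common workaround is np.where. The piecewise function g below is a hypothetical example of that pattern.

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])

def f(x):
    # Only arithmetic operators, so scalars and arrays both work
    return 3 * x + 5

def g(x):
    # Hypothetical piecewise function: 3x + 5 where x > 3, else 0.
    # `if x > 3:` would raise for arrays, so the branch uses np.where.
    return np.where(x > 3, 3 * x + 5, 0)

print(f(2))       # 11
print(f(r.T)[0])  # 8, 11, 14, 17
print(g(r.T)[0])  # 0, 0, 0, 17
```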

np.sin(r)

array([[ 0.84147098, 0.90929743, 0.14112001], [ 0.90929743, 0.14112001, -0.7568025 ], [ 0.14112001, -0.7568025 , -0.95892427], [-0.7568025 , -0.95892427, -0.2794155 ]])

np.sin(np.pi)

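1.2246467991473532e-16

The result is not exactly 0 because np.pi is only the closest double-precision approximation of π; compare such results with np.isclose(np.sin(np.pi), 0) rather than with ==.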

ufunc

Layout (memory layout)

x = np.random.standard_normal((5, 10000000))
y = 2 * x + 3  # linear equation y = a * x + b
C = np.array((x, y), order='C')
F = np.array((x, y), order='F')
x = 0.0; y = 0.0  # memory clean-up

C[:2].round(2)

array([[[ 0.67, 0.29, 1.54, ..., 0.07, 2.64, -0.65], [ 0.4 , -0.63, 1.43, ..., 1.11, 0.93, -0.52], [-0.41, 2.23, -1.16, ..., -1.66, 0.07, 0.21], [ 1.46, 1.22, 0.2 , ..., -0.56, 2.36, -1.65], [-0.39, 1.73, -0.24, ..., -1.45, 0.43, -0.41]], [[ 4.34, 3.58, 6.08, ..., 3.15, 8.28, 1.69], [ 3.79, 1.73, 5.86, ..., 5.22, 4.87, 1.97], [ 2.17, 7.46, 0.67, ..., -0.32, 3.15, 3.42], [ 5.93, 5.44, 3.4 , ..., 1.89, 7.72, -0.3 ], [ 2.22, 6.46, 2.51, ..., 0.1 , 3.85, 2.18]]])

%timeit C.sum()

135 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit F.sum()

134 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

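When summing all array elements, the two memory layouts show no significant difference. In the following cases, however, the difference is significant.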

%timeit C[0].sum(axis=0)

128 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit C[0].sum(axis=1)

66.5 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit F.sum(axis=0)

1.06 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit F.sum(axis=1)

2.12 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

F = 0.0; C = 0.0 # memory clean-up

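As the timings show, operations on a few large vectors perform better than operations on many small vectors: the elements of each large vector are stored in adjacent memory locations, which explains the relative performance advantage. Overall, however, the operations on the F-ordered array are much slower than on its C-ordered counterpart.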
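The "adjacent memory locations" explanation can be checked through strides. This sketch builds smaller stand-ins for C and F (the last axis is reduced to 1000, an arbitrary size) and compares one row vector of each: in C order its elements are 8 bytes apart, in F order they are 2 * 5 * 8 = 80 bytes apart.

```python
import numpy as np

# Small stand-ins for the (2, 5, N) arrays C and F above, with N = 1000
x = np.ones((5, 1000))
C = np.array((x, 2 * x + 3), order='C')
F = np.array((x, 2 * x + 3), order='F')

# C order: a row is one contiguous run of float64 values
print(C[0, 0].strides, C[0, 0].flags['C_CONTIGUOUS'])  # (8,) True
# F order: consecutive elements of the same row are 80 bytes apart
print(F[0, 0].strides, F[0, 0].flags['C_CONTIGUOUS'])  # (80,) False
```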

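Choosing a suitable memory layout can therefore speed up code execution considerably; in the timings above, the fastest and slowest reductions differ by more than an order of magnitude.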

Conclusion:

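Basic data types (integers, floats, strings) provide the primitive data types. Standard data structures (tuples, lists, dictionaries, sets) provide various operations on collections of data. Arrays (the numpy.ndarray class) enable vectorized code, making it more concise, convenient, and fast.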

Recommended resources:

Python essentials: http://python.org/doc/
NumPy documentation: http://docs.scipy.org/doc/
SciPy Lecture Notes: http://scipy-lectures.org/index.html

Explore interesting things!

