Sharing a Python MPI tutorial: A Python Introduction to Parallel Programming with MPI 1.0.2 documentation (mpi python)

As the title says:

I happened to come across an MPI tutorial for Python, "A Python Introduction to Parallel Programming with MPI 1.0.2 documentation".

The address is as follows:

Communication

The Trapezoidal Rule

Recommended material on the trapezoidal rule:

# trapSerial.py
# example to run: python trapSerial.py 0.0 1.0 10000

import numpy
import sys
import time

# takes in command-line arguments [a,b,n]
a = float(sys.argv[1])
b = float(sys.argv[2])
n = int(sys.argv[3])

def f(x):
    return x * x

def integrateRange(a, b, n):
    '''Numerically integrate with the trapezoid rule on the interval from
    a to b with n trapezoids.
    '''
    integral = -(f(a) + f(b)) / 2.0
    # n+1 endpoints, but n trapezoids
    #for x in numpy.linspace(a, b, n + 1):
    #    integral = integral + f(x)
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    integral = integral * (b - a) / n
    return integral

begin_time = time.time()
integral = integrateRange(a, b, n)
end_time = time.time()

print("With n =", n, "trapezoids, our estimate of the integral from",
      a, "to", b, "is", integral)
print("total run time :", end_time - begin_time)

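This is the serial, single-machine version. Compared with the original tutorial code, the explicit summation loop has been replaced by a vectorized NumPy computation, which makes it run faster.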
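To make the loop-versus-vectorized change concrete, here is a minimal sketch that puts both variants side by side and checks that they agree. The function names `integrate_loop` and `integrate_vectorized` are hypothetical and are not part of the tutorial; the arithmetic is the same as in `integrateRange` above.

```python
import numpy

def f(x):
    return x * x

def integrate_loop(a, b, n):
    """Trapezoid rule with an explicit Python loop over the n + 1 endpoints."""
    integral = -(f(a) + f(b)) / 2.0
    for x in numpy.linspace(a, b, n + 1):
        integral = integral + f(x)
    return integral * (b - a) / n

def integrate_vectorized(a, b, n):
    """Same rule, but f is applied to the whole endpoint array at once."""
    integral = -(f(a) + f(b)) / 2.0
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    return integral * (b - a) / n

loop_result = integrate_loop(0.0, 1.0, 10000)
vectorized_result = integrate_vectorized(0.0, 1.0, 10000)
print(loop_result, vectorized_result)
```

Both calls should print values close to 1/3, the exact integral of x * x on [0, 1]. The vectorized form only works because `f` accepts a NumPy array; a function written with scalar-only operations would need adapting first.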

The improved MPI code:

# trapParallel_1.py
# example to run: mpiexec -n 4 python trapParallel_1.py 0.0 1.0 10000

import numpy
import sys
import time
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# takes in command-line arguments [a,b,n]
a = float(sys.argv[1])
b = float(sys.argv[2])
n = int(sys.argv[3])

# we arbitrarily define a function to integrate
def f(x):
    return x * x

# this is the serial version of the trapezoidal rule
# parallelization occurs by dividing the range among processes
def integrateRange(a, b, n):
    integral = -(f(a) + f(b)) / 2.0
    # n+1 endpoints, but n trapezoids
    # for x in numpy.linspace(a, b, n + 1):
    #     integral = integral + f(x)
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    integral = integral * (b - a) / n
    return integral

# local_n is the number of trapezoids each process will calculate
# n need not be a multiple of size: the total actually used is local_n * size
local_n = n // size

# h is the step size, based on the local_n * size trapezoids actually used
h = (b - a) / (local_n * size)

# we calculate the interval that each process handles
# local_a is the starting point and local_b is the endpoint
local_a = a + rank * local_n * h
local_b = local_a + local_n * h

# initializing variables. mpi4py requires that we pass numpy objects.
recv_buffer = numpy.zeros(size)

if rank == 0:
    begin_time = time.time()

# perform local computation. Each process integrates its own interval
integral = integrateRange(local_a, local_b, local_n)

# communication
# root node receives results from all processes and sums them
if rank == 0:
    recv_buffer[0] = integral
    for i in range(1, size):
        comm.Recv(recv_buffer[i:i+1], ANY_SOURCE)
    total = numpy.sum(recv_buffer)
else:
    # all other processes send their result
    comm.Send(integral, dest=0)

# root process prints results
if comm.rank == 0:
    end_time = time.time()
    print("With n =", n, "trapezoids, our estimate of the integral from",
          a, "to", b, "is", total)
    print("total run time :", end_time - begin_time)
    print("total size: ", size)

Run command:

mpiexec -np 4 python trapParallel_1.py 0 1000000 100000000

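The improved code above already handles the load-balancing problem mentioned in the original tutorial, but it does so by changing the total number of trapezoids so that the total is divisible by the number of processes.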
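A small sketch of what that rounding does to the trapezoid count. The helper name `effective_trapezoids` is hypothetical; it only repeats the `local_n = n // size` arithmetic from the code above.

```python
def effective_trapezoids(n, size):
    """Return (local_n, effective_n) for the round-down scheme.

    local_n is the number of trapezoids per process and effective_n is the
    total number of trapezoids actually used, which may be smaller than n.
    """
    local_n = n // size
    return local_n, local_n * size

local_n, effective_n = effective_trapezoids(1099, 100)
print(local_n, effective_n)  # 10 1000
```

With 1099 requested trapezoids and 100 processes, 99 trapezoids are silently dropped. Note also that when n is smaller than size, local_n becomes 0 and the step size computation in the code above divides by zero.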

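Suppose the total number of trapezoids is 1099 and the number of processes is 100. How many trapezoids should each process be assigned? Below is another approach that leaves the total number of trapezoids unchanged and distributes them across the processes as evenly as possible.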
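For the 1099 and 100 case, the counts can be worked out with a few lines of NumPy. This is a sketch of one way to do the split, giving every rank n // size trapezoids and one extra to each of the last n % size ranks; the helper name `split_counts` is hypothetical.

```python
import numpy

def split_counts(n, size):
    """Trapezoids per rank: n // size each, plus one extra for the last n % size ranks."""
    counts = numpy.full(size, n // size, dtype=numpy.int64)
    remainder = n % size
    if remainder != 0:
        counts[-remainder:] += 1
    return counts

counts = split_counts(1099, 100)
print(counts.min(), counts.max(), counts.sum())  # 10 11 1099
```

Here rank 0 gets 10 trapezoids and the other 99 ranks get 11 each, so no rank differs from another by more than one trapezoid and the total stays at 1099.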

Improved code:

# trapParallel_2.py
# example to run: mpiexec -n 4 python trapParallel_2.py 0.0 1.0 10000

import numpy
import sys
import time
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# takes in command-line arguments [a,b,n]
a = float(sys.argv[1])
b = float(sys.argv[2])
n = int(sys.argv[3])

# we arbitrarily define a function to integrate
def f(x):
    return x * x

# this is the serial version of the trapezoidal rule
# parallelization occurs by dividing the range among processes
def integrateRange(a, b, n):
    integral = -(f(a) + f(b)) / 2.0
    # n+1 endpoints, but n trapezoids
    # for x in numpy.linspace(a, b, n + 1):
    #     integral = integral + f(x)
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    integral = integral * (b - a) / n
    return integral

# h is the step size. n is the total number of trapezoids
h = (b - a) / n

# local_n[i] is the number of trapezoids process i will calculate
# size need not divide n: the last n % size processes get one extra trapezoid
local_n = numpy.zeros(size, dtype=numpy.int32)
local_n[:] = n // size
if n % size != 0:
    local_n[-(n % size):] += 1

# we calculate the interval that each process handles
# local_a is the starting point and local_b is the endpoint
# (the offset is added to a so that a nonzero lower bound also works)
local_a = a + numpy.sum(local_n[:rank]) * h
local_b = local_a + local_n[rank] * h

# initializing variables. mpi4py requires that we pass numpy objects.
recv_buffer = numpy.zeros(size)

if rank == 0:
    begin_time = time.time()

# perform local computation. Each process integrates its own interval
integral = integrateRange(local_a, local_b, local_n[rank])

# communication
# root node receives results from all processes and sums them
if rank == 0:
    recv_buffer[0] = integral
    for i in range(1, size):
        comm.Recv(recv_buffer[i:i+1], ANY_SOURCE)
    total = numpy.sum(recv_buffer)
else:
    # all other processes send their result
    comm.Send(integral, dest=0)

# root process prints results
if comm.rank == 0:
    end_time = time.time()
    print("With n =", n, "trapezoids, our estimate of the integral from",
          a, "to", b, "is", total)
    print("total run time :", end_time - begin_time)
    print("total size: ", size)

The core load-balancing code is:

# h is the step size. n is the total number of trapezoids
h = (b - a) / n

# local_n[i] is the number of trapezoids process i will calculate
# size need not divide n: the last n % size processes get one extra trapezoid
local_n = numpy.zeros(size, dtype=numpy.int32)
local_n[:] = n // size
if n % size != 0:
    local_n[-(n % size):] += 1

# we calculate the interval that each process handles
# local_a is the starting point and local_b is the endpoint
# (the offset is added to a so that a nonzero lower bound also works)
local_a = a + numpy.sum(local_n[:rank]) * h
local_b = local_a + local_n[rank] * h

Run command:

mpiexec --oversubscribe -np 100 python trapParallel_2.py 0 1000000 1099

This last approach balances the computational load better.
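One way to sanity-check the even split without launching MPI is to loop over the ranks in a single process and compare the summed partial integrals with the serial result. The sketch below does that; `simulate_ranks` and `integrate_range` are hypothetical names, and the sketch adds the lower bound `a` to each rank's starting point so that it also holds when `a` is not 0. It assumes n is at least as large as size, so every rank gets at least one trapezoid.

```python
import numpy

def f(x):
    return x * x

def integrate_range(a, b, n):
    """Trapezoid rule on [a, b] with n trapezoids (vectorized)."""
    integral = -(f(a) + f(b)) / 2.0
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    return integral * (b - a) / n

def simulate_ranks(a, b, n, size):
    """Compute each rank's partial integral in a plain loop and return their sum."""
    h = (b - a) / n
    local_n = numpy.full(size, n // size, dtype=numpy.int64)
    if n % size != 0:
        local_n[-(n % size):] += 1
    total = 0.0
    for rank in range(size):
        # each simulated rank starts where the previous ranks' trapezoids end
        local_a = a + numpy.sum(local_n[:rank]) * h
        local_b = local_a + local_n[rank] * h
        total += integrate_range(local_a, local_b, local_n[rank])
    return total

serial = integrate_range(1.0, 2.0, 1099)
parallel = simulate_ranks(1.0, 2.0, 1099, 100)
print(serial, parallel)
```

The two printed values should agree up to floating-point rounding, since the sub-intervals tile [a, b] with the same step size h that the serial version uses.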

=================================================

How should the improvements above be adapted to collective communication?

Collective Communication

The Parallel Trapezoidal Rule 2.0

Approach 1, the collective-communication version of trapParallel_1.py:

# trapParallel_1.py
# example to run: mpiexec -n 4 python trapParallel_1.py 0.0 1.0 10000

import numpy
import sys
import time
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# takes in command-line arguments [a,b,n]
a = float(sys.argv[1])
b = float(sys.argv[2])
n = int(sys.argv[3])

# we arbitrarily define a function to integrate
def f(x):
    return x * x

# this is the serial version of the trapezoidal rule
# parallelization occurs by dividing the range among processes
def integrateRange(a, b, n):
    integral = -(f(a) + f(b)) / 2.0
    # n+1 endpoints, but n trapezoids
    #for x in numpy.linspace(a, b, n + 1):
    #    integral = integral + f(x)
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    integral = integral * (b - a) / n
    return integral

# local_n is the number of trapezoids each process will calculate
# n need not be a multiple of size: the total actually used is local_n * size
local_n = n // size

# h is the step size, based on the local_n * size trapezoids actually used
h = (b - a) / (local_n * size)

# we calculate the interval that each process handles
# local_a is the starting point and local_b is the endpoint
local_a = a + rank * local_n * h
local_b = local_a + local_n * h

# initializing variables. mpi4py requires that we pass numpy objects.
#integral = numpy.zeros(1)
total = numpy.zeros(1)

if rank == 0:
    begin_time = time.time()

# perform local computation. Each process integrates its own interval
integral = integrateRange(local_a, local_b, local_n)

# communication
# root node receives results with a collective "reduce"
comm.Reduce(integral, total, op=MPI.SUM, root=0)

# root process prints results
if comm.rank == 0:
    end_time = time.time()
    print("With n =", n, "trapezoids, our estimate of the integral from",
          a, "to", b, "is", total[0])
    print("total run time :", end_time - begin_time)
    print("total size: ", size)

Run command:

mpiexec --oversubscribe -np 100 python trapParallel_1.py 0 1000000 1099

Approach 2, the collective-communication version of trapParallel_2.py:

# trapParallel_2.py
# example to run: mpiexec -n 4 python trapParallel_2.py 0.0 1.0 10000

import numpy
import sys
import time
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# takes in command-line arguments [a,b,n]
a = float(sys.argv[1])
b = float(sys.argv[2])
n = int(sys.argv[3])

# we arbitrarily define a function to integrate
def f(x):
    return x * x

# this is the serial version of the trapezoidal rule
# parallelization occurs by dividing the range among processes
def integrateRange(a, b, n):
    integral = -(f(a) + f(b)) / 2.0
    # n+1 endpoints, but n trapezoids
    #for x in numpy.linspace(a, b, n + 1):
    #    integral = integral + f(x)
    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))
    integral = integral * (b - a) / n
    return integral

# h is the step size. n is the total number of trapezoids
h = (b - a) / n

# local_n[i] is the number of trapezoids process i will calculate
# size need not divide n: the last n % size processes get one extra trapezoid
local_n = numpy.zeros(size, dtype=numpy.int32)
local_n[:] = n // size
if n % size != 0:
    local_n[-(n % size):] += 1

# we calculate the interval that each process handles
# local_a is the starting point and local_b is the endpoint
# (the offset is added to a so that a nonzero lower bound also works)
local_a = a + numpy.sum(local_n[:rank]) * h
local_b = local_a + local_n[rank] * h

# initializing variables. mpi4py requires that we pass numpy objects.
#integral = numpy.zeros(1)
total = numpy.zeros(1)

if rank == 0:
    begin_time = time.time()

# perform local computation. Each process integrates its own interval
integral = integrateRange(local_a, local_b, local_n[rank])

# communication
# root node receives results with a collective "reduce"
comm.Reduce(integral, total, op=MPI.SUM, root=0)

# root process prints results
if comm.rank == 0:
    end_time = time.time()
    print("With n =", n, "trapezoids, our estimate of the integral from",
          a, "to", b, "is", total[0])
    print("total run time :", end_time - begin_time)
    print("total size: ", size)

Run command:

mpiexec --oversubscribe -np 100 python trapParallel_2.py 0 1000000 1099
