PyTorch Automatic Mixed Precision Training (What Is PyTorch)

1 torch.cuda.amp mixed precision training
2 Autocasting
2.1 torch.autocast
2.2 torch.cuda.amp.autocast
3 Gradient Scaling
3.1 Usage example

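1 torch.cuda.amp mixed precision training

Mixed precision training lets some operations run in float32 (single precision) and others in float16 (half precision). Automatic mixed precision training ordinarily uses torch.autocast and torch.cuda.amp.GradScaler together. The two are modular, however, and each can be used on its own if needed. For the principles behind mixed precision, see: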

2 Autocasting

2.1 torch.autocast

torch.autocast(device_type, enabled=True, **kwargs)

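Instances of autocast act as context managers or decorators that allow regions of a script to run in mixed precision. Inside these regions, each op runs in an op-specific dtype chosen by autocast, which improves performance while preserving accuracy. See the Autocast Op Reference for details. autocast should wrap only the forward pass of the network, including the loss computation. Wrapping the backward pass is not recommended: backward ops run in the same dtype that autocast used for the corresponding forward ops.

autocast can be used in two forms:

Context manager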

```python
import torch
import torch.optim as optim

# Net, data, and loss_fn are assumed to be defined elsewhere.
# Creates model and optimizer in default precision
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

for input, target in data:
    optimizer.zero_grad()

    # Enables autocasting for the forward pass (model + loss)
    with torch.autocast(device_type="cuda"):
        output = model(input)
        loss = loss_fn(output, target)

    # Exits the context manager before backward()
    loss.backward()
    optimizer.step()
```

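Decorator

```python
import torch
import torch.nn as nn

class AutocastModel(nn.Module):
    ...
    # Runs the whole forward pass under autocast
    @torch.autocast(device_type="cuda")
    def forward(self, input):
        ...
```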

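Floating-point tensors produced inside an autocast region may be float16. After leaving the region, combining them with floating-point tensors of a different dtype can raise a type mismatch error. If that happens, cast the tensors produced in the autocast region back to float32 (or another dtype, if desired) before using them.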

```python
import torch

# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with torch.autocast(device_type="cuda"):
    # torch.mm is on autocast's list of ops that should run in float16.
    # Inputs are float32, but the op runs in float16 and produces float16 output.
    # No manual casts are required.
    e_float16 = torch.mm(a_float32, b_float32)
    # Also handles mixed input types
    f_float16 = torch.mm(d_float32, e_float16)

# After exiting autocast, calls f_float16.float() to use with d_float32
g_float32 = torch.mm(d_float32, f_float16.float())
```

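autocast(enabled=False) subregions can be nested inside autocast-enabled regions. Disabling autocast locally is useful, for example, when a subregion must run in a particular dtype, because it gives explicit control over the execution type. Inside the subregion, inputs coming from the surrounding region should be cast to that dtype before use.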

```python
import torch

# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with torch.autocast(device_type="cuda"):
    e_float16 = torch.mm(a_float32, b_float32)

    with torch.autocast(device_type="cuda", enabled=False):
        # Calls e_float16.float() to ensure float32 execution
        # (necessary because e_float16 was created in an autocasted region)
        f_float32 = torch.mm(c_float32, e_float16.float())

    # No manual casts are required when re-entering the autocast-enabled region.
    # torch.mm again runs in float16 and produces float16 output, regardless of input types.
    g_float16 = torch.mm(d_float32, f_float32)
```

Parameters

device_type (string, required) -- whether to use the "cuda" or "cpu" device.
enabled (bool, optional, default=True) -- whether autocast should be enabled in the region.
dtype (torch.dtype, optional) -- whether to use torch.float16 or torch.bfloat16.
cache_enabled (bool, optional, default=True) -- whether the weight cache inside autocast should be enabled.
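The device_type, dtype, and enabled parameters can be tried without a GPU. The following is a minimal sketch that runs autocast on the CPU with torch.bfloat16; it assumes a PyTorch build with CPU autocast support, and the tensor names are illustrative only.

```python
import torch

# Illustrative float32 inputs on the CPU.
a_float32 = torch.rand((8, 8))
b_float32 = torch.rand((8, 8))

# device_type selects the backend; dtype selects the lower-precision type.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # torch.mm is on autocast's lower-precision list, so the result is bfloat16.
    c_low = torch.mm(a_float32, b_float32)

# enabled=False makes the region a no-op: ops keep the dtype of their inputs.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=False):
    c_full = torch.mm(a_float32, b_float32)

print(c_low.dtype, c_full.dtype)
```

Passing enabled as a variable is a convenient way to switch mixed precision on and off from a single configuration flag without changing the training code.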

2.2 torch.cuda.amp.autocast

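Equivalent to torch.autocast("cuda", args...). Recent PyTorch releases mark this alias as deprecated in favor of torch.amp.autocast("cuda", args...).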

3 Gradient Scaling

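If the forward pass of a particular op has float16 inputs, the backward pass of that op produces float16 gradients. Gradient values with small magnitudes may not be representable in float16. Such values flush to zero ("underflow"), so the update for the corresponding parameters is lost. To prevent underflow, gradient scaling multiplies the network's loss by a scale factor and invokes the backward pass on the scaled loss. The gradients flowing backward through the network are then scaled by the same factor. In other words, the gradient values have a larger magnitude, so they do not flush to zero. Before the optimizer updates the parameters, each parameter's gradient (its .grad attribute) should be unscaled so that the scale factor does not interfere with the learning rate.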
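The underflow described above can be reproduced with the standard library alone. The sketch below uses Python's struct module (format "e" is IEEE half precision) rather than PyTorch; the gradient value 1e-8 and the helper name to_float16 are chosen purely for illustration.

```python
import struct

def to_float16(x: float) -> float:
    """Round a Python float to IEEE half precision and convert it back."""
    return struct.unpack("e", struct.pack("e", x))[0]

grad = 1e-8        # a small gradient value, chosen for illustration
scale = 65536.0    # the default init_scale of GradScaler

unscaled = to_float16(grad)         # below float16's smallest value: becomes 0.0
scaled = to_float16(grad * scale)   # representable in float16 once scaled
recovered = scaled / scale          # unscaling done in ordinary float precision

print(unscaled, scaled, recovered)
```

The unscaled value is lost entirely, while the scaled value survives the trip through float16 and can be divided back down to roughly the original magnitude.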

torch.cuda.amp.GradScaler(init_scale=65536.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000, enabled=True)

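GradScaler has two key methods:

GradScaler.step(optimizer)

Internally invokes unscale_(optimizer), unless unscale_() was already called explicitly for this optimizer earlier in the iteration. As part of unscale_(), the gradients are checked for infs/NaNs. If no inf/NaN gradients are found, optimizer.step() is invoked using the unscaled gradients. Otherwise, optimizer.step() is skipped to avoid corrupting the parameters.

GradScaler.update(new_scale=None)

Updates the scale factor. If any optimizer step was skipped, the scale is multiplied by backoff_factor to reduce it. If growth_interval consecutive iterations have occurred without a skipped step, the scale is multiplied by growth_factor to increase it. Passing new_scale sets the scale value manually. (new_scale is not used directly; it is used to fill GradScaler's internal scale tensor. So if new_scale is a tensor, later in-place changes to that tensor do not affect the scale that GradScaler uses internally.)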
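The update rule can be modeled in a few lines of plain Python. This is a simplified sketch of the behavior described above, not GradScaler's actual implementation; the class ScaleTracker and its attribute good_steps are hypothetical names introduced for illustration.

```python
class ScaleTracker:
    """Hypothetical, simplified model of the scale update rule described above."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.good_steps = 0  # consecutive iterations without a skipped step

    def update(self, step_skipped, new_scale=None):
        if new_scale is not None:
            # Manual override of the scale value.
            self.scale = float(new_scale)
            return
        if step_skipped:
            # inf/NaN gradients were found: shrink the scale, restart the count.
            self.scale *= self.backoff_factor
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.growth_interval:
                # Enough consecutive clean iterations: grow the scale.
                self.scale *= self.growth_factor
                self.good_steps = 0


tracker = ScaleTracker(init_scale=1024.0, growth_interval=3)
tracker.update(step_skipped=True)        # one skipped step halves the scale
for _ in range(3):
    tracker.update(step_skipped=False)   # the third clean step doubles it
print(tracker.scale)
```

With the default values, a single skipped step halves the scale immediately, while growing it back takes 2000 clean iterations, so the scale settles just below the point where gradients overflow.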

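3.1 Usage example

```python
import torch
import torch.optim as optim

# Net, data, and loss_fn are assumed to be defined elsewhere.
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)
scaler = torch.cuda.amp.GradScaler(enabled=True)

for input, target in data:
    optimizer.zero_grad()

    # Enables autocasting for the forward pass (model + loss)
    with torch.autocast(device_type="cuda"):
        output = model(input)
        loss = loss_fn(output, target)

    # Exits the context manager before backward()
    # Backward pass on the scaled loss
    scaler.scale(loss).backward()
    # Unscales the gradients, then calls optimizer.step() unless infs/NaNs were found
    scaler.step(optimizer)
    # Updates the scale factor for the next iteration
    scaler.update()
```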
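The note that step() skips its internal unscale_() when unscale_() was already called matters for gradient clipping, which should act on unscaled gradients. The following is a minimal sketch of that pattern. The toy nn.Linear model, the random data, and max_norm=1.0 are assumptions for illustration, and scaling and autocast are switched off automatically when no GPU is available so that the loop also runs on the CPU.

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
# float16 on the GPU; bfloat16 is only a placeholder when autocast is disabled.
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

# Toy model and data, hypothetical stand-ins for Net() and a real data loader.
model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
data = [(torch.randn(8, 4, device=device), torch.randn(8, 1, device=device))
        for _ in range(3)]

# When disabled, the scaler's methods pass through to the plain loss and optimizer.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

for input, target in data:
    optimizer.zero_grad()
    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_cuda):
        output = model(input)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()
    # Unscale first so the clipping threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # step() sees that unscale_() already ran and does not unscale again.
    scaler.step(optimizer)
    scaler.update()
```

Clipping before unscaling would compare the threshold against gradients inflated by the scale factor, so nearly every step would be clipped.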

清澈的爱，只为中国

2022-08-31
