Why torch.cuda.empty_cache cannot free GPU memory

2024/11/24 11:33:50

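It is well known that in PyTorch, simply calling del on a tensor does not give its GPU memory back to the driver: the freed block stays in PyTorch's caching allocator, so del has to be paired with torch.cuda.empty_cache(). There are exceptions, however. For example, if the tensor belongs to a computation graph that is still alive, its memory cannot be released at all. Here is an example: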

import torch

def active_bytes():
    # bytes currently held by live tensors in PyTorch's CUDA allocator
    stats = torch.cuda.memory_stats()
    current_active_byte = stats['active_bytes.all.current']
    return current_active_byte

# initial usage
print("Init usage {}".format(active_bytes()))

# vanilla tensor
x = torch.randn((256, 128), device='cuda')
w = torch.randn((128, 512), device='cuda')
l = torch.matmul(x, w)
print("Vanilla tensor {}".format(active_bytes()))
del x
print("Vanilla tensor: del x {}".format(active_bytes()))
del w
print("Vanilla tensor: del w {}".format(active_bytes()))
l = l.cpu()
print("Vanilla tensor: l = l.cpu() {}".format(active_bytes()))

# requires_grad=True
x = torch.randn((256, 128), device='cuda', requires_grad=True)
w = torch.randn((128, 512), device='cuda', requires_grad=True)
l = torch.matmul(x, w)
print("requires_grad=True {}".format(active_bytes()))
del x
print("requires_grad=True: del x {}".format(active_bytes()))
del w
print("requires_grad=True: del w {}".format(active_bytes()))
l = l.cpu()
print("requires_grad=True: l = l.cpu() {}".format(active_bytes()))

Running the code above gives:

Init usage 0
Vanilla tensor 917504
Vanilla tensor: del x 786432
Vanilla tensor: del w 524288
Vanilla tensor: l = l.cpu() 0
requires_grad=True 917504
requires_grad=True: del x 917504
requires_grad=True: del w 917504
requires_grad=True: l = l.cpu() 393216
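The byte counts in this output can be checked by hand, since each float32 element takes 4 bytes. The snippet below is a small sketch of that arithmetic; the helper name tensor_bytes is made up for illustration and is not part of PyTorch.

```python
import math
import torch

def tensor_bytes(shape, dtype=torch.float32):
    """Bytes needed for a dense tensor of the given shape and dtype (hypothetical helper)."""
    element_size = torch.empty((), dtype=dtype).element_size()
    return math.prod(shape) * element_size

x_bytes = tensor_bytes((256, 128))  # x
w_bytes = tensor_bytes((128, 512))  # w
l_bytes = tensor_bytes((256, 512))  # l = x @ w

print(x_bytes, w_bytes, l_bytes)    # 131072 262144 524288
print(x_bytes + w_bytes + l_bytes)  # 917504
print(w_bytes + l_bytes)            # 786432
print(x_bytes + w_bytes)            # 393216
```

Read against the output, 917504 is x, w and l together, 786432 is w plus l, and 524288 is l alone. The final 393216 in the requires_grad=True run is exactly x plus w: the GPU copy of l is gone, but the two inputs are still there.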

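The output shows that once the leaf nodes x and w are created with requires_grad=True, del no longer frees the memory they occupy. My understanding is that x and w both take part in backward and therefore belong to the computation graph (the matmul node saves them in order to compute gradients); since the graph is still "alive" at that point, their memory cannot be released. So can I move the whole graph somewhere else? The answer is no. Once PyTorch's computation graph has been built (that is, once forward has run), it does not change location. In other words, if you ran forward on a particular GPU, you can only run backward on that same GPU.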
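If only the values of l are needed and backward will never be called, one way around this is to drop the graph before copying to the CPU. The sketch below is my own illustration rather than part of the original experiment: leftover_after_cpu_copy is a hypothetical helper, and it falls back to the CPU (always reporting 0 bytes) when CUDA is not available.

```python
import torch

def active_bytes():
    """Bytes held by live tensors on the current CUDA device (0 if CUDA is unavailable)."""
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.memory_stats().get("active_bytes.all.current", 0)

def leftover_after_cpu_copy(detach):
    """Return (graph_dropped, extra_active_bytes) after moving l to the CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    before = active_bytes()
    x = torch.randn((256, 128), device=device, requires_grad=True)
    w = torch.randn((128, 512), device=device, requires_grad=True)
    l = torch.matmul(x, w)
    del x, w
    if detach:
        # detach() yields a tensor without grad_fn, so the CPU copy
        # holds no reference to the graph that saved x and w
        l = l.detach().cpu()
    else:
        # the CPU copy keeps a grad_fn that still points at the GPU graph
        l = l.cpu()
    return l.grad_fn is None, active_bytes() - before

print(leftover_after_cpu_copy(detach=False))
print(leftover_after_cpu_copy(detach=True))
```

On a setup like the one above I would expect the detach=True call to report less leftover GPU memory than the detach=False call, but the exact numbers may differ between PyTorch versions. Running the forward pass under torch.no_grad(), or calling backward() without retain_graph=True, should likewise let the saved tensors go.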

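This mechanism has a very practical consequence: before training, you had better estimate how much GPU memory your model will need and dutifully pick a GPU with enough of it (NVIDIA rejoices).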
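A rough first estimate can be scripted. The sketch below counts only parameters and buffers, which is a lower bound: activations, gradients and optimizer state come on top. The helpers parameter_bytes and devices_with_room are hypothetical names, and nn.Linear(128, 512) merely stands in for a real model.

```python
import torch
from torch import nn

def parameter_bytes(model):
    """Bytes taken by parameters and buffers only (a lower bound on training memory)."""
    tensors = list(model.parameters()) + list(model.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)

def devices_with_room(required_bytes):
    """Indices of visible CUDA devices with at least required_bytes free."""
    if not torch.cuda.is_available():
        return []
    fits = []
    for index in range(torch.cuda.device_count()):
        free_bytes, _total_bytes = torch.cuda.mem_get_info(index)
        if free_bytes >= required_bytes:
            fits.append(index)
    return fits

model = nn.Linear(128, 512)  # stand-in model, for illustration only
need = parameter_bytes(model)
print(need, devices_with_room(need))
```

In practice the requirement passed to devices_with_room should be several times the parameter size, with the multiplier depending on batch size and optimizer.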
