GPU Support#
A crucial feature of PyTorch is its support for GPUs (Graphics Processing Units). A GPU can perform many thousands of small operations in parallel, which makes it well suited for the large matrix operations in neural networks, with typical speedups of 10x to 100x over a CPU. PyTorch implements a lot of functionality for supporting GPUs, including NVIDIA GPUs via CUDA and Apple GPUs via Metal.
import torch
Backends#
PyTorch can be configured to use different backends for computation. The default backend is the CPU, but PyTorch also supports GPUs and other hardware accelerators. The available backends are the following (a quick programmatic check follows the list).
CUDA - Devices compatible with the CUDA programming framework, such as NVIDIA GPUs on Windows and Linux.
MPS - Devices compatible with the Metal programming framework, such as Apple GPUs on macOS.
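As an overview, you can query both whether PyTorch was built with a backend and whether a matching device is currently usable. Below is a minimal sketch using the same checks we apply individually in the sections that follow.
# Compiled-in support vs. currently usable device, for each backend
print("CUDA built:", torch.backends.cuda.is_built(), "| available:", torch.cuda.is_available())
print("MPS built:", torch.backends.mps.is_built(), "| available:", torch.mps.is_available())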
Checking for CUDA#
Let’s check whether you have an NVIDIA GPU available on your Windows/Linux computer.
torch.cuda.is_available()
True
If the above command returns False, the reason could be one of the following.
You don’t have an NVIDIA GPU.
You don’t have the correct version of CUDA installed.
You don’t have the correct version of PyTorch installed.
Refer to the PyTorch website for the correct versions of PyTorch and CUDA to install, in case you have an NVIDIA GPU.
if not torch.cuda.is_available():
    if not torch.backends.cuda.is_built():
        print("PyTorch was not built with CUDA support")
    else:
        print("No Nvidia GPU on this machine and/or CUDA is not installed properly")
Checking for MPS#
Let’s check whether you have an MPS-compatible GPU available on your macOS computer.
torch.mps.is_available()
True
If the above command returns False, the reason could be one of the following.
You don’t have an MPS-compatible GPU.
PyTorch was not built with MPS enabled.
Your macOS version is lower than 12.3 (see the version check below).
if not torch.mps.is_available():
    if not torch.backends.mps.is_built():
        print("PyTorch was not built with MPS enabled.")
    else:
        print("The current MacOS version is not 12.3+ and/or you do not have an MPS-enabled device on this machine.")
Using a GPU#
By default, all tensors you create are stored on the CPU. You can push a tensor to the GPU by using the function .to(...), where the argument is the device you want to use. It is good practice to define a device object in your code that points to the GPU if you have one, and otherwise to the CPU. This way, you can write device-agnostic code that runs on the correct device based on availability. Let’s try it below.
device = torch.device("cuda" if torch.cuda.is_available() else
                      "mps" if torch.mps.is_available() else
                      "cpu")
print("Using device:", device)
Using device: mps
Now let’s create a tensor and push it to the device.
x = torch.zeros(2, 3)
x = x.to(device)
print(x)
tensor([[0., 0., 0.],
        [0., 0., 0.]], device='mps:0')
Alternatively, you can also use the device argument while creating a tensor.
y = torch.ones(2, 3, device=device)
print(y)
tensor([[1., 1., 1.],
        [1., 1., 1.]], device='mps:0')
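The reverse direction works the same way. For example, converting a tensor to a NumPy array only works for CPU tensors, so a tensor on the GPU first has to be moved back with .cpu() (a shorthand for .to("cpu")):
z = y.cpu()       # copy the tensor back to CPU memory
print(z.device)   # cpu
print(z.numpy())  # NumPy conversion requires a CPU tensor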
In case you have a GPU, you should now see the attribute device='cuda:0' or device='mps:0' printed next to your tensor. The zero indicates that this is the first GPU device on your computer. PyTorch also supports multi-GPU systems, but you will only need them once you train very large networks.
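On a multi-GPU machine, the individual devices can be addressed by their index, e.g. "cuda:1" for the second GPU. A short sketch, assuming at least two CUDA GPUs are installed:
if torch.cuda.device_count() > 1:
    a = torch.ones(2, 3, device="cuda:0")  # tensor on the first GPU
    b = torch.ones(2, 3, device="cuda:1")  # tensor on the second GPU
    # Tensors must live on the same device before they can be combined
    print(a + b.to("cuda:0"))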
Setting the seed on all devices#
When generating random numbers, the seeds of the CPU and the GPU are not synchronized. Hence, we need to set the seed on the GPU separately to ensure reproducible code. Note that due to different GPU architectures, running the same code on different GPUs does not guarantee the same random numbers. Still, we don’t want our code to give a different output every time we run it on the exact same hardware. Hence, we also set the seed on the GPU.
# Setting the seed on the CPU
torch.manual_seed(42)

# GPU operations have a separate seed we also want to set
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
    torch.cuda.manual_seed_all(42)

# Some operations on a GPU are implemented stochastically for efficiency
# We want to ensure that all operations are deterministic on the GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# MPS operations have a separate seed we also want to set
if torch.mps.is_available():
    torch.mps.manual_seed(42)
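To verify that this gives reproducible results, we can re-apply the seeds, draw the same random tensor twice, and compare. The helper reseed below is our own name for this demo, not a PyTorch function:
def reseed(seed=42):
    # Re-apply the seed on all backends, exactly as above
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    if torch.mps.is_available():
        torch.mps.manual_seed(seed)

reseed()
first = torch.rand(3, device=device)
reseed()
second = torch.rand(3, device=device)
print(torch.equal(first.cpu(), second.cpu()))  # True: same seed, same numbers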
Speed comparison#
Let’s compare the runtime of a large matrix multiplication on the CPU with the same operation on the GPU. Depending on the size of the operation and the CPU/GPU in your system, the speedup of this operation can be more than 50x. As matmul operations are very common in neural networks, we can already see the great benefit of training a network on a GPU. The time estimate can be relatively noisy here because we haven’t run the operation multiple times.
import time
x_cpu = torch.randn(5000, 5000)
## CPU version
start_time = time.time()
_ = torch.matmul(x_cpu, x_cpu)
end_time = time.time()
print(f"CPU time: {(end_time - start_time):6.5f}s")
CPU time: 0.80048s
## GPU version
x_cuda = x_cpu.to(device)
_ = torch.matmul(x_cuda, x_cuda) # First operation to 'burn in' GPU
# CUDA is asynchronous, so we need to use different timing functions
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
_ = torch.matmul(x_cuda, x_cuda)
end.record()
torch.cuda.synchronize() # Waits for everything to finish running on the GPU
print(f"GPU time: {0.001 * start.elapsed_time(end):6.5f}s") # Milliseconds to seconds
GPU time: 0.02454s
## MPS version
x_mps = x_cpu.to("mps")
_ = torch.matmul(x_mps, x_mps) # First operation to 'burn in' MPS
# MPS is asynchronous, so we need to use different timing functions
start = torch.mps.Event(enable_timing=True)
end = torch.mps.Event(enable_timing=True)
start.record()
_ = torch.matmul(x_mps, x_mps)
end.record()
torch.mps.synchronize() # Waits for everything to finish running on the MPS device
print(f"MPS time: {0.001 * start.elapsed_time(end):6.5f}s") # Milliseconds to seconds
MPS time: 0.09917s
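Since a single run gives a noisy estimate, a more robust comparison averages over several repetitions and synchronizes with the device before reading the clock. Below is a sketch of such a loop; time_matmul is our own helper, shown here for the CPU and CUDA cases (for MPS, replace the synchronization calls with torch.mps.synchronize()).
def time_matmul(x, n_runs=10):
    # Average the matmul runtime over n_runs repetitions
    if x.is_cuda:
        torch.cuda.synchronize()  # make sure previous GPU work has finished
    start_time = time.time()
    for _ in range(n_runs):
        _ = torch.matmul(x, x)
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for all queued GPU work to finish
    return (time.time() - start_time) / n_runs

print(f"Average CPU time: {time_matmul(x_cpu):6.5f}s")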
Conclusion#
In this tutorial, we have seen how to check for GPU support, how to push tensors to the GPU, and how to set the seed on the GPU. We have also seen that the GPU can be much faster than the CPU for large matrix operations.