Tensors#

Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with enhanced capabilities. This tutorial will guide you through the basic capabilities of tensors, helping you transition from NumPy to PyTorch. As a prerequisite, we recommend being familiar with the NumPy package, as the core of PyTorch is built on very similar concepts. So, let’s start by importing PyTorch.

import torch

Creation#

Tensors are multidimensional arrays that can represent scalars, vectors, matrices, and even higher-dimensional arrays. You can create tensors from Python lists, NumPy arrays, and other PyTorch tensors. Most of the common functions you know from NumPy can be used on tensors as well. In fact, since NumPy arrays are so similar to tensors, we can convert most tensors to NumPy arrays and back.

Initialization#

Let’s first look at different ways of creating a tensor. There are many options; the simplest is to call torch.FloatTensor, passing the desired shape as input argument. This creates a tensor of float values with the specified shape. You can also create tensors of different types, such as torch.IntTensor or torch.BoolTensor. (Note that torch.Tensor is an alias for the default tensor type, which is torch.FloatTensor.)

x = torch.FloatTensor(2, 3, 4)

print(x)
tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

The tensor constructors (torch.FloatTensor, torch.IntTensor, torch.BoolTensor, …) allocate memory for the desired shape, but reuse whatever values happen to already be in that memory, so the contents are effectively uninitialized. Their use is discouraged in favor of the following factory methods, which provide more options, such as specifying the data type.

  • torch.empty - Creates a tensor with uninitialized memory

  • torch.zeros - Creates a tensor filled with zeros

  • torch.ones - Creates a tensor filled with ones

  • torch.rand - Creates a tensor with random values uniformly sampled between 0 and 1

  • torch.randn - Creates a tensor with random values sampled from a normal distribution

  • torch.arange - Creates a tensor containing the values \(N, N+1, N+2, \ldots, M-1\) for a given start \(N\) and (exclusive) end \(M\)

  • torch.tensor - Creates a tensor from the provided (nested) list or array, always copying the data

  • torch.as_tensor - Creates a tensor from the provided data, sharing memory with the input when possible (e.g., for NumPy arrays); Python lists are still copied

Let’s see some examples.

# Create a tensor filled with zeros with the shape [2, 3, 4]
x = torch.zeros(2, 3, 4)

# Create a tensor from a (nested) list
x = torch.tensor([[1, 2], [3, 4]])

# Create a tensor with random values between 0 and 1 with the shape [2, 3, 4]
x = torch.rand(2, 3, 4)
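
For illustration, a couple more factory calls: the dtype argument controls the element type, and torch.arange excludes its end value.

# Create an integer tensor filled with zeros by specifying a dtype
x = torch.zeros(2, 3, dtype=torch.int64)

# Create a tensor with the values 0, 1, ..., 5 (the end value 6 is excluded)
y = torch.arange(6)

print(x.dtype)  # torch.int64
print(y)        # tensor([0, 1, 2, 3, 4, 5])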

Note

The PyTorch documentation recommends using the factory methods over the tensor constructors.

Shape#

You can obtain the shape of a tensor in the same way as in numpy (x.shape) or using the method x.size(). Both will return a torch.Size object (subclass of tuple) containing the dimensions of the tensor.

x = torch.zeros(2, 3, 4)

# Numpy style
shape = x.shape

# PyTorch style
size = x.size()

# Both return the dimensions of the tensor
dim1, dim2, dim3 = x.shape
dim1, dim2, dim3 = x.size()

print("Shape:", x.shape)
print(" Size:", size)
print(" Dims:", dim1, dim2, dim3)
Shape: torch.Size([2, 3, 4])
 Size: torch.Size([2, 3, 4])
 Dims: 2 3 4

Conversion from/to NumPy#

You can easily convert a PyTorch tensor to a NumPy array and vice versa. By default, the NumPy array and PyTorch tensor share the same memory location, so changing one will change the other. This is useful when you want to use a library that only accepts NumPy arrays and you have a PyTorch tensor.

The conversion from a numpy array to a tensor is done by calling the function torch.from_numpy().

import numpy as np

array = np.array([[1, 2], [3, 4]])
tensor = torch.from_numpy(array)

The conversion from a tensor to a numpy array is done by calling the method .numpy() on the tensor.

tensor = torch.arange(4)
array  = tensor.numpy()

In both cases, the NumPy array and the PyTorch tensor will share their storage.
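
To see the shared storage in action, here is a small sketch: modifying the NumPy array in place also changes the tensor created with torch.from_numpy (and the same holds in the other direction).

array = np.array([[1, 2], [3, 4]])
tensor = torch.from_numpy(array)

array[0, 0] = 100  # modify the NumPy array in place

print(tensor)  # the change is visible in the tensor as well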

Note

The PyTorch to NumPy conversion is performed only if the tensor is on the CPU, does not require grad, does not have its conjugate bit set, and is a dtype and layout that NumPy supports.

Setting the optional argument force=True in .numpy() will force the conversion to be performed even if the tensor does not meet the above requirements. In this case, if the tensor is not on the CPU or the conjugate or negative bit is set, it will not share its storage with the returned NumPy array.
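
For example, a tensor that requires gradients cannot be converted directly; plain .numpy() raises an error and you would normally call .detach() first. A minimal sketch, assuming a recent PyTorch version that supports the force argument:

x = torch.ones(3, requires_grad=True)

# x.numpy() would raise a RuntimeError because the tensor requires grad
array = x.numpy(force=True)  # roughly equivalent to x.detach().cpu().numpy() here

print(array)  # [1. 1. 1.]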

Reshaping#

You can reshape a tensor using the method .view() or .reshape(). These methods return a new tensor with the same data but a different shape. For example, a tensor of size (2, 3) can be reorganized into any other shape with the same number of elements, such as a tensor of size (6) or (3, 2).

x = torch.arange(6)

y = x.reshape(2, 3)

print(x, '\n')
print(y)
tensor([0, 1, 2, 3, 4, 5]) 

tensor([[0, 1, 2],
        [3, 4, 5]])

A reshape operation reuses the same memory location whenever possible. This means that the data is not copied, and any changes to the tensor returned by a reshape operation will affect the original tensor and vice versa. If you want to avoid this, you can use the method .clone() to create a copy of the tensor.
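
A small sketch of this behavior: modifying the reshaped tensor also modifies the original, unless you clone it first.

x = torch.arange(6)

y = x.reshape(2, 3)          # shares memory with x
z = x.reshape(2, 3).clone()  # independent copy

y[0, 0] = 100  # also changes x[0]
z[0, 1] = -1   # does not affect x

print(x)  # tensor([100,   1,   2,   3,   4,   5])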

Note

When performing a reshape, one of the provided dimensions can be -1, in which case its size is inferred to keep the number of elements the same.
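
For example, reshaping a tensor with 6 elements to (-1, 2) infers the missing dimension as 3.

x = torch.arange(6)

y = x.reshape(-1, 2)  # the -1 is inferred as 3

print(y.shape)  # torch.Size([3, 2])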

Transpose#

You can transpose a tensor using the method .t() or .transpose(). The method .t() returns the transpose of a 2D tensor, while .transpose() can swap any two dimensions of a multidimensional tensor. If the input is a strided tensor, the transposed tensor shares the same memory location, so changing the content of one changes the content of the other.

x = torch.randn(2, 3)

y = x.t() # Only works for 2D tensors

z = x.transpose(0, 1) # Works for any number of dimensions

print(x, '\n')
print(y, '\n')
print(z)
tensor([[-0.3375,  0.5714,  1.5208],
        [-1.0878,  0.6243, -0.9859]]) 

tensor([[-0.3375, -1.0878],
        [ 0.5714,  0.6243],
        [ 1.5208, -0.9859]]) 

tensor([[-0.3375, -1.0878],
        [ 0.5714,  0.6243],
        [ 1.5208, -0.9859]])

Permutation#

You can permute the dimensions of a tensor using the method .permute(). The permuted tensor shares the same memory as the original, so changing its content will affect the original tensor and vice versa.

x = torch.randn(2, 3, 5)

y = x.permute(2, 0, 1)

print(x.shape)
print(y.shape)
torch.Size([2, 3, 5])
torch.Size([5, 2, 3])

View vs Copy#

All the operations presented above change the shape of a tensor. They will try to avoid copying the data whenever possible, but sometimes a copy is necessary. To understand when a copy is made, you need to understand the concept of contiguous memory.

Note

A tensor is contiguous when its elements are stored in memory in the same order as they appear when iterating over the tensor. This happens naturally when a tensor is created or reshaped without altering the original memory layout.

You can check if a tensor is contiguous by calling the method .is_contiguous().

x = torch.randn(2, 3, 4)
y = x.reshape(2, 12)

print('Is x contiguous?', x.is_contiguous())
print('Is y contiguous?', y.is_contiguous())
Is x contiguous? True
Is y contiguous? True

Certain operations like transpose or permute can rearrange the way data is accessed without changing the underlying memory layout. This makes the tensor non-contiguous.

x = torch.randn(2, 3, 4)
t = x.transpose(0, 2)
p = x.permute(0, 2, 1)

print('Is t contiguous?', t.is_contiguous())
print('Is p contiguous?', p.is_contiguous())
Is t contiguous? False
Is p contiguous? False

If a tensor is not contiguous in memory, it is not possible to perform a reshape operation without copying the data. In this case, the method .view() will raise an error, whereas the method .reshape() will return a copy of the tensor with the same data but contiguous in memory.

x = torch.randn(2, 3, 4)
t = x.transpose(0, 2)

t.reshape(6, 4)  # This will make a copy of the data

t.view(6, 4)  # This will raise a RuntimeError
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 6
      2 t = x.transpose(0, 2)
      4 t.reshape(6, 4)  # This will make a copy of the data
----> 6 t.view(6, 4)  # This will raise a RuntimeError

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
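
If you do want to use .view() on such a tensor, one option is to first create a contiguous copy with .contiguous(). A short sketch, continuing with the tensor t from above:

t_contig = t.contiguous()  # copies the data into a contiguous memory layout

print(t_contig.is_contiguous())   # True
print(t_contig.view(6, 4).shape)  # .view() now works: torch.Size([6, 4])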

Operations#

Most operations that exist in NumPy also exist in PyTorch. A full list of operations can be found in the PyTorch documentation. We will review the most common operations here.

Element-wise operations#

Element-wise operations are applied to each element of the tensor independently. For example, the operators +, -, *, /, and ** can be used to perform element-wise addition, subtraction, multiplication, division, and exponentiation, respectively.

x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)

y = x1 + x2  # New tensor is created

Calling x1 + x2 creates a new tensor containing the sum of the two inputs. However, we can also use in-place operations that are applied directly to the memory of a tensor. In-place operations are usually marked with an underscore suffix, e.g., add_ instead of add. They return the modified tensor to allow for method chaining.

x1.add_(x2)  # x1 is overwritten with x1 + x2, no new tensor is created
tensor([[1.0651, 0.5864, 1.7805],
        [1.3788, 1.1274, 0.6446]])

Similar to NumPy, the comparison operators ==, !=, >, <, >=, <=, the logical operators on boolean tensors &, |, ^, and unary operations like abs(), sqrt(), exp(), log(), and neg() are also applied element-wise.
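
A small sketch of such element-wise operations:

x = torch.tensor([-1.0, 0.0, 2.0])
y = torch.tensor([ 1.0, 0.0, 3.0])

print(x == y)             # tensor([False,  True, False])
print(x.abs())            # tensor([1., 0., 2.])
print((x > 0) | (y > 0))  # tensor([ True, False,  True])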

Reduction operations#

Reduction operations are used to reduce the number of elements in a tensor. For example, the method .sum() will return the sum of all elements in a tensor.

x = torch.tensor([[1, 2], 
                  [3, 4]])

torch.sum(x)
tensor(10)

You can specify the dimension along which the reduction should be performed with the argument dim.

  • dim=0 - The reduction will be performed along the first dimension.

  • dim=1 - The reduction will be performed along the second dimension.

x = torch.tensor([[1, 2], 
                  [3, 4]])

y = torch.sum(x, dim=0)

print(y)
tensor([4, 6])

By default, reduction operations reduce the number of dimensions.

print('Original shape:', x.shape)
print('Shape after sum:', y.shape)
Original shape: torch.Size([2, 2])
Shape after sum: torch.Size([2])

You can keep the original dimensions by setting keepdim=True.

x = torch.tensor([[1, 2], [3, 4]])

y = torch.sum(x, dim=0, keepdim=True)

print('Original shape:', x.shape)
print('Shape after sum:', y.shape)
Original shape: torch.Size([2, 2])
Shape after sum: torch.Size([1, 2])

Note

In reduction operations, NumPy uses the arguments axis and keepdims, whereas PyTorch uses the arguments dim and keepdim.

Matrix operations#

PyTorch provides a wide range of matrix operations, which are essential for neural networks.

  • torch.matmul: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions. If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product. For higher dimensional inputs, the function supports broadcasting (for details see the documentation). It can also be written as a @ b, similar to numpy.

  • torch.mm: Performs the matrix product over two matrices, but doesn’t support broadcasting (see documentation).

  • torch.bmm: Performs the matrix product with a batch dimension. Given the first tensor \(T\) (\(b\times n\times m\)) and the second tensor \(R\) (\(b\times m\times p\)), the output \(O\) (\(b\times n\times p\)) is calculated by performing \(b\) matrix multiplications \(O_i = T_i R_i\), where \(T_i\) and \(R_i\) are the \(i\)-th matrices of the input tensors.

  • torch.einsum: Performs matrix multiplications and more (i.e., sums of products) using the Einstein summation convention.

Usually, we use torch.matmul or torch.bmm.

x = torch.tensor([[0, 1, 2], [3, 4, 5]])
y = torch.tensor([[5, 6], [7, 8], [9, 10]])

x @ y # or torch.matmul(x, y)
tensor([[ 25,  28],
        [ 88, 100]])
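
For completeness, a small sketch of torch.bmm with a batch of matrices (the values are random, so only the shapes are meaningful here):

T = torch.randn(10, 2, 3)  # batch of 10 matrices of shape (2, 3)
R = torch.randn(10, 3, 4)  # batch of 10 matrices of shape (3, 4)

O = torch.bmm(T, R)        # O[i] = T[i] @ R[i]

print(O.shape)  # torch.Size([10, 2, 4])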

Indexing#

Indexing in PyTorch allows you to access and manipulate specific elements, slices, or sub-tensors within a tensor. It works similarly to NumPy but includes features designed for deep learning workflows.

Basic indexing#

You can access elements of a tensor using the square bracket notation. For a tensor with \(n\) dimensions, you need to provide \(n\) indices to access a single element. Each index must be within the range of the corresponding dimension. You can also use negative indices to access elements from the end of the tensor. This is pretty much the same as in NumPy.

x = torch.tensor([[0, 1, 2], 
                  [3, 4, 5]])

x[0, 1]
tensor(1)
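
Negative indices count from the end of each dimension, just like in NumPy.

x[-1, -1]  # last element of the last row
tensor(5)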

Slicing#

Slicing in PyTorch allows you to extract a portion of a tensor by specifying ranges for each dimension. This is useful for working with sub-tensors without copying data, enabling efficient manipulation and computation. The general slicing syntax is start:stop:step where start is the index where the slice starts (inclusive), stop is the index where the slice ends (exclusive), and step is the step size between indices. You can omit parts to use their defaults:

  • : selects the entire dimension.

  • start: slices from the index start to the end of the dimension.

  • :stop slices from 0 to the index stop.

  • ::step selects the entire dimension with a step size of step.

x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])

a = x[:2, :]  # Get the first two rows and all columns
b = x[:, 1]   # Get the second column
c = x[::2]    # Get the first and third row

Slices do not copy the data, but return a view of the original tensor, so they can be assigned new values to modify the original tensor.

x[1:, 1:] = 0

print(x)
tensor([[1, 2, 3],
        [4, 0, 0],
        [7, 0, 0]])

The ellipsis ... simplifies slicing for higher dimensions. It expands to as many colons as needed to represent the remaining dimensions. For example, if you have a 4D tensor, x[..., 0] is equivalent to x[:, :, :, 0].
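
A short sketch with a 4D tensor:

x = torch.zeros(2, 3, 4, 5)

print(x[..., 0].shape)                        # torch.Size([2, 3, 4])
print(torch.equal(x[..., 0], x[:, :, :, 0]))  # True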

Boolean masking#

Boolean masking allows you to access elements of a tensor that satisfy a certain condition. This works in the following way.

  • You create a tensor of boolean values, called the mask, with True values indicating the elements to be selected. The mask tensor must have the same shape as the tensor that you want to access.

  • You use the mask tensor to index the original tensor, using the square bracket notation result = tensor[mask].

  • The result is a 1D tensor containing the elements of the original tensor that correspond to True values in the mask tensor.

Boolean masking always returns a new tensor (a copy); the selected elements cannot, in general, be expressed as a view of the original storage.

tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])

mask = tensor > 5
result = tensor[mask]

print(mask, '\n')
print(result)
tensor([[False, False, False],
        [False, False,  True],
        [ True,  True,  True]]) 

tensor([6, 7, 8, 9])

You can also use boolean masks to assign new values to the original tensor. For example, tensor[mask] = 0 will set to zero all elements of tensor that correspond to True values in the mask tensor.

tensor[mask] = 0

print(tensor)
tensor([[1, 2, 3],
        [4, 5, 0],
        [0, 0, 0]])

You can also combine conditions using logical operators like & (and), | (or), and ~ (not). For example, mask = (tensor > 0) & (tensor < 10) will create a mask tensor with True values for elements between 0 and 10.
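
Continuing with the (already modified) tensor from above, a small sketch combining two conditions; note that each condition must be parenthesized.

mask = (tensor > 0) & (tensor < 4)

print(tensor[mask])
tensor([1, 2, 3])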

Advanced indexing#

Advanced indexing allows you to access elements of a tensor using lists of indices. You provide a list of indices for each dimension, and the result is a new tensor with the elements at those indices. The shape of the result is the same as the shape of the index lists.

In the following example, you have a 2D tensor x and two lists of indices rows and cols, which provide the rows and columns of the elements you want to access. The syntax x[rows, cols] allows you to access the elements x[rows[0], cols[0]], x[rows[1], cols[1]], etc.

x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])

rows = torch.tensor([0, 1, 2])
cols = torch.tensor([2, 1, 0])

x[rows, cols] # Returns: x[0, 2], x[1, 1], x[2, 0]
tensor([3, 5, 7])

You can also use advanced indexing to assign new values to the original tensor. For example, x[rows, cols] = 0 will set to zero the elements of x at the positions specified by the index lists.

x[rows, cols] = 0

print(x)
tensor([[1, 2, 0],
        [4, 0, 6],
        [0, 8, 9]])

The indices are not required to be unique or sorted. You may have repetitions in the index lists, and in this case, the same value will be repeated in the result tensor. Moreover, there are no restrictions on the shape of the index lists, but they must be broadcastable to a common shape. Here is an example.

x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])

rows = torch.tensor([0, 1])
cols = torch.tensor([1, 2])

x[rows[:, None], cols]  # index lists broadcast to shape (2, 2)
tensor([[2, 3],
        [5, 6]])

The above example is based on the fact that the index lists are expanded to the same shape before indexing the tensor x. The result tensor is created by selecting the elements in the expanded index lists, which are shown below.

torch.broadcast_tensors(rows[:, None], cols)
(tensor([[0, 0],
         [1, 1]]),
 tensor([[1, 2],
         [1, 2]]))

Conclusion#

In this tutorial, we have covered the basics of PyTorch, including tensors, reshaping, conversion from/to NumPy, operations, and indexing. The concepts presented here are very similar to NumPy, so if you are already familiar with NumPy, you should be able to transition to PyTorch easily. In the next tutorial, we will cover more advanced topics, such as automatic differentiation and optimization.