PyTorch

Concepts

Backward Function

Broadcast

Broadcast: tensors with different shapes cannot directly do element-wise operations. For example, 2 * [3, 4] mixes a scalar with a vector of shape (2,). The solution: broadcasting duplicates the smaller operand so the shapes match, i.e. [2, 2] * [3, 4] = [6, 8].

Broadcastable: two tensors are broadcastable if, aligning their shapes from the trailing dimension, every pair of dimensions is either equal, or one of them is 1, or one of them does not exist (and each tensor has at least one dimension). In other words, the element-wise operation becomes possible after expanding the smaller tensor along the mismatched axes.

x=torch.empty(5,7,3)
y=torch.empty(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)

x=torch.empty((0,))
y=torch.empty(2,2)
# x and y are not broadcastable, because x does not have at least 1 dimension

# can line up trailing dimensions
x=torch.empty(5,3,4,1)
y=torch.empty(  3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist

# but:
x=torch.empty(5,2,4,1)
y=torch.empty(  3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
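A quick sanity check of broadcasting in action (my own example): the result shape takes the larger size along each aligned dimension.

torch.tensor(2) * torch.tensor([3, 4])
# tensor([6, 8])
x = torch.empty(5, 3, 4, 1)
y = torch.empty(  3, 1, 1)
(x + y).shape
# torch.Size([5, 3, 4, 1])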

Contiguous

Some operations, such as tensor.transpose(), turn a contiguous tensor into a non-contiguous one.

t = torch.arange(12).reshape(3,4)
t
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])
t.stride()
# (4, 1)
t2 = t.transpose(0,1)
t2
# tensor([[ 0,  4,  8],
#         [ 1,  5,  9],
#         [ 2,  6, 10],
#         [ 3,  7, 11]])
t2.stride()
# (1, 4)
t.data_ptr() == t2.data_ptr() # they share memory
# True
t.is_contiguous(),t2.is_contiguous() # t is contiguous, but t2 is not
# (True, False)

In the example above, transpose does not create a new tensor in memory; it only changes the stride. Calling view on a non-contiguous tensor throws an error, because view never copies data and can only reinterpret memory whose layout is compatible with the requested shape.

The .contiguous() function copies the data into a new, contiguous block of memory:

t3 = t2.contiguous()
t3
# tensor([[ 0,  4,  8],
#         [ 1,  5,  9],
#         [ 2,  6, 10],
#         [ 3,  7, 11]])
t3.data_ptr() == t2.data_ptr() # t3 got newly allocated memory
# False
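Because t2 is non-contiguous, view fails on it directly; it works after contiguous(), and reshape handles the copy automatically. A short sketch:

t2.view(12)
# raises a RuntimeError (view size is not compatible with the tensor's size and stride)
t2.contiguous().view(12)
# tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
t2.reshape(12)  # reshape copies only when a view is impossible
# tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])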

Functions

Initialization

# generate a 3D tensor whose innermost dimension has size 4
torch.randn([2, 3, 4])

View Preserves the Order

# reshape dimensions; -1 means that dimension's size is inferred automatically
# torch.reshape() may copy memory (it returns a view when possible); view() never copies and errors when a view is impossible
torch.randn([2, 3, 4]).view(-1, 4)
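# demonstration of "view preserves the order" (a small sketch of my own): view only regroups
# elements in their existing row-major order; compare with transpose, which reorders them
x = torch.arange(6).view(2, 3)
# tensor([[0, 1, 2],
#         [3, 4, 5]])
x.view(3, 2)
# tensor([[0, 1],
#         [2, 3],
#         [4, 5]])
x.t()  # transpose changes the element order, view does not
# tensor([[0, 3],
#         [1, 4],
#         [2, 5]])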
# Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument size.
torch.empty((2,3), dtype=torch.int64)
# tensor([[ 9.4064e+13,  2.8000e+01,  9.3493e+13],
#         [ 7.5751e+18,  7.1428e+18,  7.5955e+18]])
# add an extra dimension at position dim=1
torch.randn([2, 3, 4]).unsqueeze(dim=1).shape
# torch.Size([2, 1, 3, 4])
# note that clone() is differentiable (gradients flow back to the source tensor); call requires_grad_() after clone if the copy itself should track gradients
x = torch.rand(3)
x.clone().requires_grad_()
# generate values in the half-open interval [start, end) with a given step size
torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
# generate steps evenly spaced values over the closed interval [start, end]
torch.linspace(3, 10, steps=5)
# tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
torch.linspace(-10, 10, steps=5)
# tensor([-10.,  -5.,   0.,   5.,  10.])
torch.linspace(start=-10, end=10, steps=5)
# tensor([-10.,  -5.,   0.,   5.,  10.])
torch.linspace(start=-10, end=10, steps=1)
# tensor([-10.])

Dimensional Operation

For indexing:

# Einsum is all you need
# https://www.youtube.com/watch?v=pkVwUVEHmfI
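# a small einsum sketch (my own example, not from the video): batched matrix multiplication
# A: (b, i, k), B: (b, k, j) -> C: (b, i, j); equivalent to torch.bmm(A, B)
A = torch.randn(2, 3, 4)
B = torch.randn(2, 4, 5)
C = torch.einsum('bik,bkj->bij', A, B)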
# torch.stack: given tensors of the same size, put them in a list [t, t, t] and stack them into a single tensor (the sizes must match; to join tensors of different sizes along an existing dimension, use torch.cat instead). dim specifies the position of the added dimension
t = torch.tensor([1, 1, 2])
stacked = torch.stack([t, t, t], dim=-1)
# if you torch.stack([x, y, z], dim=-1) where x, y, z are coordinate arrays, dim=-1 puts each point's coordinates in the innermost dimension
# t.shape, stacked.shape, stacked
# (torch.Size([3]),
#  torch.Size([3, 3]),
#  tensor([[1, 1, 1],
#          [1, 1, 1],
#          [2, 2, 2]]))
# note that slicing excludes the end index
a = np.array([1, 2, 3, 4])
a[:3]
# array([1, 2, 3])
a[2:3]
# array([3])
# create an axis of size 1 at the inner dimension
labels[:,None]
# rearrange order of dimensions
x = torch.randn(2, 3, 5)
x.size()
# torch.Size([2, 3, 5])
torch.permute(x, (2, 0, 1)).size()
# torch.Size([5, 2, 3])
# generate coordinates from possible values of X, Y, Z (or X, Y)
X, Y, Z = torch.meshgrid(X, Y, Z)

X = torch.tensor([1, 2, 3, 4])
Y = torch.tensor([4, 5, 6])
X, Y = torch.meshgrid(X, Y)
# X = tensor([
#   [1, 1, 1],
#   [2, 2, 2],
#   [3, 3, 3],
#   [4, 4, 4],
# ])
# Y = tensor([
#   [4, 5, 6],
#   [4, 5, 6],
#   [4, 5, 6],
#   [4, 5, 6],
# ])
# gather values along a dimension: each output element is read from input at the position specified by the index tensor along dim
torch.gather(input, dim, index, *, sparse_grad=False, out=None) → Tensor

# out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
# out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
# out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
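For example, gathering along dim=1 picks, for each row of input, the columns named by index:

t = torch.tensor([[1, 2], [3, 4]])
torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
# tensor([[1, 1],
#         [4, 3]])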

Arithmetic Operation

# out = input + element-wise-mult(tensor1, tensor2) * constant-value
t = torch.tensor([[1., 2., 3.]])
t1 = torch.tensor([[1., 1., 1.]])
t2 = torch.tensor([[3., 2., 1.]])
print(torch.addcmul(t, tensor1=t1, tensor2=t2, value=0.1))
# tensor([[1.3000, 2.2000, 3.1000]])
a = torch.randn(10)
# a
# tensor([-0.8286, -0.4890,  0.5155,  0.8443,  0.1865, -0.1752, -2.0595,
#          0.1850, -1.1571, -0.4243])
# cumulative sum
torch.cumsum(a, dim=0)
# tensor([-0.8286, -1.3175, -0.8020,  0.0423,  0.2289,  0.0537, -2.0058,
#         -1.8209, -2.9780, -3.4022])
# interpolate, see https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html
F.interpolate(...)
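A minimal sketch (assuming F is torch.nn.functional): upsample a 1-D signal by a factor of 2 with nearest-neighbor interpolation; interpolate expects explicit batch and channel dimensions.

import torch.nn.functional as F

x = torch.arange(4, dtype=torch.float32).view(1, 1, 4)  # (batch, channel, length)
F.interpolate(x, scale_factor=2, mode='nearest')
# tensor([[[0., 0., 1., 1., 2., 2., 3., 3.]]])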

Memory Operation

# register a buffer on a module: not a parameter (no gradient updates, not in .parameters()), but saved in the state_dict along with the parameters
nn.Module.register_buffer(name, buff, persistent=True)
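A minimal sketch of a module that registers a buffer (the class name and buffer shape are made up for illustration):

import torch
import torch.nn as nn

class RunningMean(nn.Module):
    def __init__(self, dim=10):
        super().__init__()
        # the buffer is saved in the state_dict and moved by .to(device)/.cuda(),
        # but it is not returned by .parameters() and receives no gradient updates
        self.register_buffer("mean", torch.zeros(dim))

    def forward(self, x):
        return x - self.mean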

C++ Functions

2018/12/09: PyTorch CFFI (C Foreign Function Interface) is deprecated in favor of C++ extensions as of PyTorch v1.0.

// See: https://pytorch.org/cppdocs/notes/tensor_creation.html
torch::Tensor tensor = torch::randint(/*high=*/10, {5, 5});
torch::Tensor tensor = torch::randint(/*low=*/3, /*high=*/10, {5, 5});

auto options =
  torch::TensorOptions()
    .dtype(torch::kFloat32)
    .layout(torch::kStrided)
    .device(torch::kCUDA, 1)
    .requires_grad(true);

torch::Tensor tensor = torch::full({3, 4}, /*value=*/123, options);
assert(tensor.dtype() == torch::kFloat32);
assert(tensor.layout() == torch::kStrided);
assert(tensor.device().type() == torch::kCUDA); // or device().is_cuda()
assert(tensor.device().index() == 1);
assert(tensor.requires_grad());

Torch Script

This Article is awesome!

Pytorch Lightning

rank: the global process id
local_rank: the GPU index within the current node
world_size: the total number of ranks (processes)
node: a physical machine

trainer = pl.Trainer(accelerator='ddp', gpus=8, num_nodes=10, max_epochs=20)
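To make these terms concrete, a minimal sketch that prints them at runtime with plain torch.distributed (assumes a torchrun launch, which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables; Lightning performs this setup internally):

import os
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()                      # global process id across all nodes
local_rank = int(os.environ["LOCAL_RANK"])  # GPU index on the current node
world_size = dist.get_world_size()          # total number of ranks
print(f"rank={rank} local_rank={local_rank} world_size={world_size}")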

See this article
