PyTorch

Concepts

Backward Function

Broadcast

Broadcast: tensors with different shapes cannot directly do element-wise operations. For example, 2 * [3, 4] mixes a scalar with a vector of shape (2,). The solution: broadcasting duplicates the smaller operand so the shapes match, i.e. [2, 2] * [3, 4] = [6, 8].

Broadcastable: two tensors are broadcastable if, aligning their shapes from the trailing dimension, every pair of dimensions is either equal, or one of them is 1, or one of them does not exist (and each tensor has at least one dimension). In other words, the element-wise operation becomes possible after expanding the smaller tensor along the mismatched axes.

x=torch.empty(5,7,3)
y=torch.empty(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)

x=torch.empty((0,))
y=torch.empty(2,2)
# x and y are not broadcastable, because x does not have at least 1 dimension

# can line up trailing dimensions
x=torch.empty(5,3,4,1)
y=torch.empty(  3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist

# but:
x=torch.empty(5,2,4,1)
y=torch.empty(  3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
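A quick sanity check of broadcasting in action (my own example): the result shape takes the larger size along each aligned dimension.

torch.tensor(2) * torch.tensor([3, 4])
# tensor([6, 8])
x = torch.empty(5, 3, 4, 1)
y = torch.empty(  3, 1, 1)
(x + y).shape
# torch.Size([5, 3, 4, 1])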

Contiguous

Some operations, such as tensor.transpose(), turn a contiguous tensor into a non-contiguous one.

t = torch.arange(12).reshape(3,4)
t
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])
t.stride()
# (4, 1)
t2 = t.transpose(0,1)
t2
# tensor([[ 0,  4,  8],
#         [ 1,  5,  9],
#         [ 2,  6, 10],
#         [ 3,  7, 11]])
t2.stride()
# (1, 4)
t.data_ptr() == t2.data_ptr() # they share memory
# True
t.is_contiguous(),t2.is_contiguous() # t is contiguous, but t2 is not
# (True, False)

In the example above, transpose does not create a new tensor in memory; it only changes the stride. Calling view on a non-contiguous tensor throws an error, because view never copies data and can only reinterpret memory whose layout is compatible with the requested shape.

The .contiguous() function copies the data into a new, contiguous block of memory:

t3 = t2.contiguous()
t3
# tensor([[ 0,  4,  8],
#         [ 1,  5,  9],
#         [ 2,  6, 10],
#         [ 3,  7, 11]])
t3.data_ptr() == t2.data_ptr() # t3 got newly allocated memory
# False
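Because t2 is non-contiguous, view fails on it directly; it works after contiguous(), and reshape handles the copy automatically. A short sketch:

t2.view(12)
# raises a RuntimeError (view size is not compatible with the tensor's size and stride)
t2.contiguous().view(12)
# tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
t2.reshape(12)  # reshape copies only when a view is impossible
# tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])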

Functions

Initialization

# generate a 3D tensor whose innermost dimension has size 4
torch.randn([2, 3, 4])

View Preserves the Order

# reshape dimensions; -1 means that dimension's size is inferred automatically
# torch.reshape() may copy memory (it returns a view when possible); view() never copies and errors when a view is impossible
torch.randn([2, 3, 4]).view(-1, 4)
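# demonstration of "view preserves the order" (a small sketch of my own): view only regroups
# elements in their existing row-major order; compare with transpose, which reorders them
x = torch.arange(6).view(2, 3)
# tensor([[0, 1, 2],
#         [3, 4, 5]])
x.view(3, 2)
# tensor([[0, 1],
#         [2, 3],
#         [4, 5]])
x.t()  # transpose changes the element order, view does not
# tensor([[0, 3],
#         [1, 4],
#         [2, 5]])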
# Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument size.
torch.empty((2,3), dtype=torch.int64)
# tensor([[ 9.4064e+13,  2.8000e+01,  9.3493e+13],
#         [ 7.5751e+18,  7.1428e+18,  7.5955e+18]])
# add an extra dimension at position dim=1
torch.randn([2, 3, 4]).unsqueeze(dim=1).shape
# torch.Size([2, 1, 3, 4])
# note that clone() is differentiable (gradients flow back to the source tensor); call requires_grad_() after clone if the copy itself should track gradients
x = torch.rand(3)
x.clone().requires_grad_()
# generate values in the half-open interval [start, end) with a given step size
torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
# generate steps evenly spaced values over the closed interval [start, end]
torch.linspace(3, 10, steps=5)
# tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
torch.linspace(-10, 10, steps=5)
# tensor([-10.,  -5.,   0.,   5.,  10.])
torch.linspace(start=-10, end=10, steps=5)
# tensor([-10.,  -5.,   0.,   5.,  10.])
torch.linspace(start=-10, end=10, steps=1)
# tensor([-10.])

Dimensional Operation

For indexing:

# Einsum is all you need
# https://www.youtube.com/watch?v=pkVwUVEHmfI
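# a small einsum sketch (my own example, not from the video): batched matrix multiplication
# A: (b, i, k), B: (b, k, j) -> C: (b, i, j); equivalent to torch.bmm(A, B)
A = torch.randn(2, 3, 4)
B = torch.randn(2, 4, 5)
C = torch.einsum('bik,bkj->bij', A, B)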
# torch.stack: given tensors of the same size, put them in a list [t, t, t] and stack them into a single tensor (the sizes must match; to join tensors of different sizes along an existing dimension, use torch.cat instead). dim specifies the position of the added dimension
t = torch.tensor([1, 1, 2])
stacked = torch.stack([t, t, t], dim=-1)
# if you torch.stack([x, y, z], dim=-1) where x, y, z are coordinate arrays, dim=-1 puts each point's coordinates in the innermost dimension
# t.shape, stacked.shape, stacked
# (torch.Size([3]),
#  torch.Size([3, 3]),
#  tensor([[1, 1, 1],
#          [1, 1, 1],
#          [2, 2, 2]]))
# note that slicing excludes the end index
a = np.array([1, 2, 3, 4])
a[:3]
# array([1, 2, 3])
a[2:3]
# array([3])
# create an axis of size 1 at the inner dimension
labels[:,None]
# rearrange order of dimensions
x = torch.randn(2, 3, 5)
x.size()
# torch.Size([2, 3, 5])
torch.permute(x, (2, 0, 1)).size()
# torch.Size([5, 2, 3])
# generate coordinates from possible values of X, Y, Z (or X, Y)
X, Y, Z = torch.meshgrid(X, Y, Z)

X = torch.tensor([1, 2, 3, 4])
Y = torch.tensor([4, 5, 6])
X, Y = torch.meshgrid(X, Y)
# X = tensor([
#   [1, 1, 1],
#   [2, 2, 2],
#   [3, 3, 3],
#   [4, 4, 4],
# ])
# Y = tensor([
#   [4, 5, 6],
#   [4, 5, 6],
#   [4, 5, 6],
#   [4, 5, 6],
# ])
# gather values along a dimension: each output element is read from input at the position specified by the index tensor along dim
torch.gather(input, dim, index, *, sparse_grad=False, out=None) → Tensor

# out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
# out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
# out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
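For example, gathering along dim=1 picks, for each row of input, the columns named by index:

t = torch.tensor([[1, 2], [3, 4]])
torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
# tensor([[1, 1],
#         [4, 3]])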

Arithmetic Operation

# out = input + element-wise-mult(tensor1, tensor2) * constant-value
t = torch.tensor([[1., 2., 3.]])
t1 = torch.tensor([[1., 1., 1.]])
t2 = torch.tensor([[3., 2., 1.]])
print(torch.addcmul(t, tensor1=t1, tensor2=t2, value=0.1))
# tensor([[1.3000, 2.2000, 3.1000]])
a = torch.randn(10)
# a
# tensor([-0.8286, -0.4890,  0.5155,  0.8443,  0.1865, -0.1752, -2.0595,
#          0.1850, -1.1571, -0.4243])
# cumulative sum
torch.cumsum(a, dim=0)
# tensor([-0.8286, -1.3175, -0.8020,  0.0423,  0.2289,  0.0537, -2.0058,
#         -1.8209, -2.9780, -3.4022])
# interpolate, see https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html
F.interpolate(...)
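A minimal sketch (assuming F is torch.nn.functional): upsample a 1-D signal by a factor of 2 with nearest-neighbor interpolation; interpolate expects explicit batch and channel dimensions.

import torch.nn.functional as F

x = torch.arange(4, dtype=torch.float32).view(1, 1, 4)  # (batch, channel, length)
F.interpolate(x, scale_factor=2, mode='nearest')
# tensor([[[0., 0., 1., 1., 2., 2., 3., 3.]]])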

Memory Operation

# register a buffer on a module: not a parameter (no gradient updates, not in .parameters()), but saved in the state_dict along with the parameters
nn.Module.register_buffer(name, buff, persistent=True)
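A minimal sketch of a module that registers a buffer (the class name and buffer shape are made up for illustration):

import torch
import torch.nn as nn

class RunningMean(nn.Module):
    def __init__(self, dim=10):
        super().__init__()
        # the buffer is saved in the state_dict and moved by .to(device)/.cuda(),
        # but it is not returned by .parameters() and receives no gradient updates
        self.register_buffer("mean", torch.zeros(dim))

    def forward(self, x):
        return x - self.mean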

C++ Functions

2018/12/09: PyTorch CFFI (C Foreign Function Interface) is deprecated in favor of C++ extensions as of PyTorch v1.0.

// See: https://pytorch.org/cppdocs/notes/tensor_creation.html
torch::Tensor tensor = torch::randint(/*high=*/10, {5, 5});
torch::Tensor tensor = torch::randint(/*low=*/3, /*high=*/10, {5, 5});

auto options =
  torch::TensorOptions()
    .dtype(torch::kFloat32)
    .layout(torch::kStrided)
    .device(torch::kCUDA, 1)
    .requires_grad(true);

torch::Tensor tensor = torch::full({3, 4}, /*value=*/123, options);
assert(tensor.dtype() == torch::kFloat32);
assert(tensor.layout() == torch::kStrided);
assert(tensor.device().type() == torch::kCUDA); // or device().is_cuda()
assert(tensor.device().index() == 1);
assert(tensor.requires_grad());

Torch Script

This Article is awesome!

Pytorch Lightning

rank: the global process id
local_rank: the GPU index within the current node
world_size: the total number of ranks (processes)
node: a physical machine

trainer = pl.Trainer(accelerator='ddp', gpus=8, num_nodes=10, max_epochs=20)
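To make these terms concrete, a minimal sketch that prints them at runtime with plain torch.distributed (assumes a torchrun launch, which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables; Lightning performs this setup internally):

import os
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()                      # global process id across all nodes
local_rank = int(os.environ["LOCAL_RANK"])  # GPU index on the current node
world_size = dist.get_world_size()          # total number of ranks
print(f"rank={rank} local_rank={local_rank} world_size={world_size}")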

See this article
