pytorch

- TorchMD-Net 2.0 (JCTC 2024): static `neighbor_list`, `torch.compile()` optimization
- RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
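
A minimal reproduction and the fix:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

# y.numpy() would raise the RuntimeError above, because y is part of
# the autograd graph; detach it from the graph first, then convert
y_np = y.detach().numpy()
```
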
- `torch.meshgrid(...)` behaves differently from `numpy.meshgrid(...)`: `torch.meshgrid(...)` is equivalent to `np.meshgrid(..., indexing='ij')`, while the default in NumPy is `indexing='xy'`.
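
A quick check of the equivalence (newer PyTorch versions warn when `indexing` is not passed explicitly, but the default is still `'ij'`):

```python
import numpy as np
import torch

x, y = torch.arange(3), torch.arange(4)

gx, gy = torch.meshgrid(x, y, indexing='ij')
nx, ny = np.meshgrid(x.numpy(), y.numpy(), indexing='ij')
assert (gx.numpy() == nx).all() and (gy.numpy() == ny).all()

# NumPy's default indexing='xy' transposes the grids instead
mx, my = np.meshgrid(x.numpy(), y.numpy())
assert (mx == nx.T).all() and (my == ny.T).all()
```
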

- `Tensor.repeat()` is similar to `np.tile()`.
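
For example, `Tensor.repeat()` tiles the whole tensor the way `np.tile()` does, while elementwise repetition is `torch.repeat_interleave()` / `np.repeat()`:

```python
import numpy as np
import torch

t = torch.tensor([1, 2, 3])
a = np.array([1, 2, 3])

print(t.repeat(2))                    # tensor([1, 2, 3, 1, 2, 3])
print(np.tile(a, 2))                  # [1 2 3 1 2 3]

# not to be confused with elementwise repetition
print(torch.repeat_interleave(t, 2))  # tensor([1, 1, 2, 2, 3, 3])
print(np.repeat(a, 2))                # [1 1 2 2 3 3]
```
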

`Tensor.scatter_add_(dim, index, src) -> Tensor`

- `dim` is an integer
- `self` can have larger sizes than `src`
- `self`, `index`, and `src` must all have the same number of dimensions

An example:

```python
import torch

values1 = torch.zeros((3, 5))
src = torch.ones((2, 5))
index = torch.tensor([[0, 1, 2, 0, 0]])

values1.scatter_add_(0, index, src)
# dim == 0: for each column j, src[0][j] is added to row index[0][j],
# i.e. rows 0, 1, 2, 0, 0 receive the ones from columns 0..4
'''
tensor([[1., 0., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
'''

values2 = torch.zeros((3, 5))
values2.scatter_add_(1, index, src)
# dim == 1: index.size(0) == 1, so only the first row [0][:] is updated;
# column index 0 appears three times and accumulates 3.
'''
tensor([[3., 1., 1., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
'''
```

For a 1-D tensor, `self` is updated as

```python
idx = index[i]
self[idx] += src[i]  # if dim == 0
```

For a 2-D tensor, `self` is updated as

```python
idx = index[i][j]
self[idx][j] += src[i][j]  # if dim == 0
self[i][idx] += src[i][j]  # if dim == 1
```

For a 3-D tensor, `self` is updated as

```python
idx = index[i][j][k]
self[idx][j][k] += src[i][j][k]  # if dim == 0
self[i][idx][k] += src[i][j][k]  # if dim == 1
self[i][j][idx] += src[i][j][k]  # if dim == 2
```
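
A quick 1-D check of the rule above:

```python
import torch

x = torch.zeros(5)
index = torch.tensor([0, 2, 2])
src = torch.tensor([1., 2., 3.])
x.scatter_add_(0, index, src)
# position 2 accumulates both 2. and 3.
print(x)  # tensor([1., 0., 5., 0., 0.])
```
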

- Add a feature flag to skip posterior covariance computation: avoid the covariance calculation during predictions.
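
A minimal sketch of what that looks like in use, assuming `gpytorch.settings.skip_posterior_variances()` is available in the installed GPyTorch version:

```python
import torch
import gpytorch

class SimpleGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(6 * train_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SimpleGP(train_x, train_y, likelihood)

model.eval()
likelihood.eval()
# mean-only prediction: the posterior covariance is never formed,
# saving time and memory when only point predictions are needed
with torch.no_grad(), gpytorch.settings.skip_posterior_variances():
    pred = likelihood(model(torch.tensor([0.5])))
    mean = pred.mean
```
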

Gradients of the predictive mean with respect to a test input can also be taken through the posterior:

```python
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# one point at a time
test_x = torch.tensor([1.8], requires_grad=True)
with gpytorch.settings.fast_pred_var():
    # Make predictions
    observed_pred = likelihood(model(test_x))
    mean = observed_pred.mean
    lower, upper = observed_pred.confidence_region()
mean.backward()
gradient = test_x.grad
```

I have tried to use Oganov global fingerprints as inputs (100, 270) and the total energy plus its derivatives w.r.t. the feature vector as outputs (100, 271) in `GPModelWithDerivatives`.
Training then runs out of CUDA memory on a `Tesla P100-PCIE-12GB`; this is not surprising, since the kernel matrix with derivative observations has shape (n·(d+1)) × (n·(d+1)) = 27100 × 27100 for n = 100, d = 270, or roughly 2.9 GB in float32 before any factorization.
If I reduce the number of training points from 100 to 50, training runs normally, but prediction runs out of CUDA memory.
Reducing further to 30 training points makes both training and prediction work fine.

When training on derivative observations such as atomic forces, the amplitude of the target values (energies) is much larger than that of their derivatives. It is therefore important to scale both the target values and the derivatives before training, and to inverse-transform the predictions back to the original amplitudes.
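
A minimal sketch of such a scaling, with hypothetical helper names. The key point is the chain rule: if the energies are standardized as e' = (e − μ)/σ, the derivatives must be divided by the same σ, while the shift μ drops out:

```python
import torch

def scale_targets(energy, denergy):
    # standardize energies; divide derivatives by the same sigma,
    # since d((e - mu) / sigma)/dx = (de/dx) / sigma
    mu, sigma = energy.mean(), energy.std()
    return (energy - mu) / sigma, denergy / sigma, (mu, sigma)

def unscale_predictions(energy_s, denergy_s, stats):
    # inverse transform predictions back to the original amplitudes
    mu, sigma = stats
    return energy_s * sigma + mu, denergy_s * sigma

# shapes matching the note above: 100 samples, 270-dimensional features
energy = 1000.0 + 50.0 * torch.randn(100)  # large-amplitude targets
denergy = 0.1 * torch.randn(100, 270)      # much smaller derivatives
e_s, de_s, stats = scale_targets(energy, denergy)
```
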