pytorch

- TorchMD-Net 2.0 (JCTC 2024): static `neighbor_list`, `torch.compile()` optimization
- RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
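
A minimal reproduction and the fix:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

# y.numpy() would raise the RuntimeError above, because y is part of
# the autograd graph; detach it from the graph first, then convert
y_np = y.detach().numpy()
```
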
- `torch.meshgrid(...)` behaves differently from `numpy.meshgrid(...)`: `torch.meshgrid(...)` is equivalent to `np.meshgrid(..., indexing='ij')`, while the default in NumPy is `indexing='xy'`.
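
A quick check of the equivalence (newer PyTorch versions warn when `indexing` is not passed explicitly, but the default is still `'ij'`):

```python
import numpy as np
import torch

x, y = torch.arange(3), torch.arange(4)

gx, gy = torch.meshgrid(x, y, indexing='ij')
nx, ny = np.meshgrid(x.numpy(), y.numpy(), indexing='ij')
assert (gx.numpy() == nx).all() and (gy.numpy() == ny).all()

# NumPy's default indexing='xy' transposes the grids instead
mx, my = np.meshgrid(x.numpy(), y.numpy())
assert (mx == nx.T).all() and (my == ny.T).all()
```
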

- `Tensor.repeat()` is similar to `np.tile()`.
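
For example, `Tensor.repeat()` tiles the whole tensor the way `np.tile()` does, while elementwise repetition is `torch.repeat_interleave()` / `np.repeat()`:

```python
import numpy as np
import torch

t = torch.tensor([1, 2, 3])
a = np.array([1, 2, 3])

print(t.repeat(2))                    # tensor([1, 2, 3, 1, 2, 3])
print(np.tile(a, 2))                  # [1 2 3 1 2 3]

# not to be confused with elementwise repetition
print(torch.repeat_interleave(t, 2))  # tensor([1, 1, 2, 2, 3, 3])
print(np.repeat(a, 2))                # [1 1 2 2 3 3]
```
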

`Tensor.scatter_add_(dim, index, src) -> Tensor`

- `dim` is an integer
- `self` can have larger sizes than `src`
- `self`, `index`, and `src` must all have the same number of dimensions

An example:

```python
import torch

values1 = torch.zeros((3, 5))
src = torch.ones((2, 5))
index = torch.tensor([[0, 1, 2, 0, 0]])

values1.scatter_add_(0, index, src)
# dim == 0: for each column j, src[0][j] is added to row index[0][j],
# i.e. rows 0, 1, 2, 0, 0 receive the ones from columns 0..4
'''
tensor([[1., 0., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
'''

values2 = torch.zeros((3, 5))
values2.scatter_add_(1, index, src)
# dim == 1: index.size(0) == 1, so only the first row [0][:] is updated;
# column index 0 appears three times and accumulates 3.
'''
tensor([[3., 1., 1., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
'''
```

For a 1-D tensor, `self` is updated as

```python
idx = index[i]
self[idx] += src[i]  # if dim == 0
```

For a 2-D tensor, `self` is updated as

```python
idx = index[i][j]
self[idx][j] += src[i][j]  # if dim == 0
self[i][idx] += src[i][j]  # if dim == 1
```

For a 3-D tensor, `self` is updated as

```python
idx = index[i][j][k]
self[idx][j][k] += src[i][j][k]  # if dim == 0
self[i][idx][k] += src[i][j][k]  # if dim == 1
self[i][j][idx] += src[i][j][k]  # if dim == 2
```
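
A quick 1-D check of the rule above:

```python
import torch

x = torch.zeros(5)
index = torch.tensor([0, 2, 2])
src = torch.tensor([1., 2., 3.])
x.scatter_add_(0, index, src)
# position 2 accumulates both 2. and 3.
print(x)  # tensor([1., 0., 5., 0., 0.])
```
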

- Add a feature flag to skip posterior covariance computation: avoid the covariance calculation during predictions.
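
A minimal sketch of what that looks like in use, assuming `gpytorch.settings.skip_posterior_variances()` is available in the installed GPyTorch version:

```python
import torch
import gpytorch

class SimpleGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(6 * train_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SimpleGP(train_x, train_y, likelihood)

model.eval()
likelihood.eval()
# mean-only prediction: the posterior covariance is never formed,
# saving time and memory when only point predictions are needed
with torch.no_grad(), gpytorch.settings.skip_posterior_variances():
    pred = likelihood(model(torch.tensor([0.5])))
    mean = pred.mean
```
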

Gradients of the predictive mean with respect to a test input can also be taken through the posterior:

```python
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# one point at a time
test_x = torch.tensor([1.8], requires_grad=True)
with gpytorch.settings.fast_pred_var():
    # Make predictions
    observed_pred = likelihood(model(test_x))
    mean = observed_pred.mean
    lower, upper = observed_pred.confidence_region()
mean.backward()
gradient = test_x.grad
```

I have tried to use Oganov global fingerprints as inputs (100, 270) and the total energy plus its derivatives w.r.t. the feature vector as outputs (100, 271) in `GPModelWithDerivatives`.
Training then runs out of CUDA memory on a `Tesla P100-PCIE-12GB`; this is not surprising, since the kernel matrix with derivative observations has shape (n·(d+1)) × (n·(d+1)) = 27100 × 27100 for n = 100, d = 270, or roughly 2.9 GB in float32 before any factorization.
If I reduce the number of training points from 100 to 50, training runs normally, but prediction runs out of CUDA memory.
Reducing further to 30 training points makes both training and prediction work fine.

When training on derivative observations such as atomic forces, the amplitude of the target values (energies) is much larger than that of their derivatives. It is therefore important to scale both the target values and the derivatives before training, and to inverse-transform the predictions back to the original amplitudes.
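
A minimal sketch of such a scaling, with hypothetical helper names. The key point is the chain rule: if the energies are standardized as e' = (e − μ)/σ, the derivatives must be divided by the same σ, while the shift μ drops out:

```python
import torch

def scale_targets(energy, denergy):
    # standardize energies; divide derivatives by the same sigma,
    # since d((e - mu) / sigma)/dx = (de/dx) / sigma
    mu, sigma = energy.mean(), energy.std()
    return (energy - mu) / sigma, denergy / sigma, (mu, sigma)

def unscale_predictions(energy_s, denergy_s, stats):
    # inverse transform predictions back to the original amplitudes
    mu, sigma = stats
    return energy_s * sigma + mu, denergy_s * sigma

# shapes matching the note above: 100 samples, 270-dimensional features
energy = 1000.0 + 50.0 * torch.randn(100)  # large-amplitude targets
denergy = 0.1 * torch.randn(100, 270)      # much smaller derivatives
e_s, de_s, stats = scale_targets(energy, denergy)
```
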