Cluster

Grendel

Sbatch

Request an interactive GPU allocation for testing: salloc -p qgpu --ntasks=4 --mem=40G --gres=gpu:1 --time=03:00:00
Check that the GPUs are visible
- python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
- python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
- python -c "import jax; print(jax.default_backend(), jax.devices())"
Remote jupyter
- Remote: salloc -p q28 -c 4 --x11 --mem 50GB --time 12:00:00 /bin/bash
- Compute node: jupyter notebook --port=40000 --ip=COMPUTE_NODE_NAME.grendel.cscaa.dk
- Local: ssh USER@MAIN_SERVER_IP -J USER@JUMP_SERVER_IP -g -L8081:COMPUTE_NODE_IP:40000 -N

Python environments

ml load python/3.9.4
cd python-virtualenv
virtualenv --system-site-packages spk2
alias spk2='source /home/tang/python-virtualenv/spk2/bin/activate'

Useful scripts

# change to the submission directory of a Slurm job (usage: scd JOBID)
function scd (){
    workdir=$(scontrol show jobid "$1" | grep WorkDir | cut -d = -f2-)
    echo "$workdir"
    cd "$workdir"
}

Copy a file from a compute node's scratch directory to the job's submission directory

#!/bin/sh
# usage: SCRIPT JOBID FILE
# copies /scratch/JOBID/FILE on the job's batch host to the job's WorkDir
workdir=$(scontrol show jobid "$1" | grep WorkDir | cut -d = -f2-)
node=$(scontrol show jobid "$1" | grep BatchHost | cut -d = -f2)
command="cp /scratch/$1/$2 $workdir"
#echo "$workdir"
#echo "$node"
echo "$command"

ssh "$node" "$command"
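
Both helpers above extract one field from scontrol show jobid with grep and cut. A minimal sketch of a reusable version is given below. job_field is a hypothetical helper name, the sample text is made up for illustration, and the sketch assumes the usual Key=Value layout with values that contain no spaces.

```shell
# job_field KEY: read "scontrol show job"-style text on stdin and print the
# value of the first KEY=value pair. Hypothetical helper; assumes values
# contain no spaces.
job_field() {
    key=$1
    # Put each Key=Value pair on its own line, then keep the value of KEY.
    tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Illustrative input; on the cluster you would pipe in: scontrol show jobid "$1"
sample='   JobId=42 JobName=test
   BatchHost=node001
   WorkDir=/home/user/run1'
printf '%s\n' "$sample" | job_field WorkDir
printf '%s\n' "$sample" | job_field BatchHost
```

Unlike grep followed by cut, this also works when several Key=Value pairs share one line, as they do for most fields in the scontrol output.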

Grid Engine

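Grid Engine was originally developed by Sun Microsystems. After Oracle acquired Sun, Sun Grid Engine was renamed Oracle Grid Engine. There are two main open-source versions: Open Grid Scheduler and Son of Grid Engine. These notes cover Open Grid Scheduler, most of whose code is based on Sun Grid Engine 6.2u5 (SGE 6.2 update 5, released in 2009). Its manual is available at the Grid Scheduler & Grid Engine man pages. The most common commands are qdel, qsub and qstat; the examples below were tested on the 64 cluster.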

qdel

qdel removes jobs from the queue. The most common usage is to delete a job by its job ID, for example qdel 60335. Run qdel -help for the help text.

qsub

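qsub submits a job script to the queue; in most cases qsub script.sh is all you need. Extra options can be given either on the command line or inside the job script.

For the first approach, to send the job to a specific queue instance, use qsub -q all.q@cn018.local script.sh.
For the second approach, the script vasp.sh below, which submits a VASP job, illustrates what each option means.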

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -N a001_t1_2
#$ -pe make 12

source /share/apps/intel/Compiler/11.1/073/bin/iccvars.sh intel64
source /share/apps/intel/Compiler/11.1/073/bin/ifortvars.sh intel64
source /share/apps/intel/impi/3.2.0.011/bin64/mpivars.sh

mpirun -r ssh -np 12 ~/bin/vasp

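#$: the Grid Engine directive prefix; the rest of the line is parsed as Grid Engine options
-S: the shell used to interpret the script; here /bin/bash interprets vasp.sh
-cwd: run the job in the current working directory, i.e. the directory from which qsub was invoked. For example, suppose you have a directory TiO2 in which you ran a structure relaxation, and you now want to run a static calculation. You create a subdirectory scf under TiO2 and put the input files and the script there. If you run qsub scf/vasp.sh from TiO2, the job reads all its files from TiO2 rather than from TiO2/scf, which is clearly wrong; watch out for this when writing scripts that submit jobs in bulk. If you run qsub vasp.sh from TiO2/scf, the job reads its input from TiO2/scf, writes its output there, and runs normally.
-j y: merge error messages into the output file, which makes failures easier to diagnose
-N a001_t1_2: the job name
-pe make 12: request 12 slots (cores) from the parallel environment named make

More options are documented in the official Grid Scheduler & Grid Engine man pages and in qsub -help.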
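
The -cwd pitfall matters most in bulk-submission scripts. A minimal sketch that avoids it by changing into each directory before submitting is given below. submit_each and the QSUB variable are hypothetical names introduced for illustration; QSUB defaults to qsub and can be set to echo for a dry run.

```shell
# submit_each DIR...: submit vasp.sh from inside each directory so that
# "-cwd" resolves to that directory. Hypothetical helper.
# Set QSUB=echo to print the submit command instead of running qsub.
submit_each() {
    for dir in "$@"; do
        # The subshell keeps the cd from leaking into the caller's shell.
        ( cd "$dir" && ${QSUB:-qsub} vasp.sh ) || echo "skipped $dir" >&2
    done
}

# Example (not run here): submit_each TiO2/scf TiO2/dos
```

Because each submission happens in a subshell, the caller stays in its original directory and relative paths in later arguments keep working.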

qstat

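qstat shows information about jobs in the queue (job ID, job name, state, nodes in use, and so on). Running qstat with no arguments lists the current user's jobs.

To list the jobs of all users, use qstat -u '*'; to list a specific user's jobs, use qstat -u 'username'.
To see how the nodes are occupied, use qstat -f.
Run qstat -help for the remaining options.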

Slurm

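Simple Linux Utility for Resource Management (Slurm) is an open-source job scheduler for Linux clusters; as of late 2015 the latest version was 15.08. The official documentation describes the common commands, and appending --help to any command prints its help text.

sbatch: submit a job script to Slurm
scancel: cancel a job
sinfo: show Slurm node and partition information
squeue: show information about jobs submitted to Slurm
srun: run a parallel job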

input

#SBATCH --nodes=4              # Number of nodes
#SBATCH --ntasks-per-node=18   # Number of MPI ranks per node
#SBATCH --cpus-per-task=2      # Number of OpenMP threads for each MPI process/rank
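
The three options above multiply. A small sketch of the arithmetic follows, together with the common convention of handing the --cpus-per-task value to OpenMP through OMP_NUM_THREADS. SLURM_CPUS_PER_TASK is set by Slurm inside a job when --cpus-per-task is given; the fallback to 1 is an assumption so that the snippet also runs outside a job.

```shell
# Values copied from the #SBATCH lines above.
nodes=4
ntasks_per_node=18
cpus_per_task=2

# Total MPI ranks and total CPU cores requested by the job.
total_tasks=$((nodes * ntasks_per_node))
total_cpus=$((total_tasks * cpus_per_task))
echo "MPI ranks: $total_tasks, CPU cores: $total_cpus"

# Inside the job script: one OpenMP thread per allocated core of each rank.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
```

With the values shown, the job asks for 72 MPI ranks on 144 cores, so each node must offer at least 36 cores.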

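Tianhe-1A (TH-1A) at the National Supercomputing Center in Tianjin (NSCC-TJ) uses a job scheduler modified from Slurm 2.6.9. The common commands are:

yhbatch: submit a job script to Slurm
yhcancel: cancel a job
yhi or yhinfo: show Slurm node and partition information
yhq or yhqueue: show information about jobs submitted to Slurm
yhrun: run a parallel job interactively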

yhbatch

yhbatch submits a job script to Slurm; it simply calls sbatch.

[username@ln2%tianhe ~]$ which yhbatch
/usr/bin/yhbatch
[username@ln2%tianhe ~]$ cat /usr/bin/yhbatch
#!/bin/sh
NAME=yhbatch
CMD=/usr/bin/sbatch

exec -a $NAME $CMD "$@"

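yhbatch therefore takes the same options as sbatch. Options can be given on the command line or inside the job script.

On the command line: yhbatch -n 12 -p TH_NET vasp.sh.
Inside the script, each option is prefixed with #SBATCH. The example below submits a VASP job; submit it with yhbatch vasp.sh.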

#!/bin/sh
#SBATCH -J tzy001-0
#SBATCH -n 12
#SBATCH -p TH_NET
export LD_LIBRARY_PATH=/vol-th/lib/mklem64t:$LD_LIBRARY_PATH
yhrun -pdebug -N1 -n12 ~/bin/vasp5.3.5-neb

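Commonly used yhbatch options:

-c ncpus: number of CPU cores per process; by default each process gets one CPU.
-J jobname: the job name, e.g. -J lc-test.
-n number: total number of processes (tasks) to run; by default one task per node.
-N minnodes[-maxnodes]: minimum number of nodes to allocate, optionally followed by a maximum. For example, -N 2 allocates at least two nodes; -N 2-4 allocates at least two and at most four; -N 2-2 allocates exactly two.
-p partition: request resources from the given partition, e.g. -p TH_NET
-w nodelist: request the listed nodes, e.g. -w cn1091 or -w cn[1091,1094]
-x nodelist: exclude the listed nodes; the job will not be placed on them.

TODO: compare -c and -n?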

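For example, suppose a job needs 4 processes with 3 CPU cores each. If the cluster consists of 4-core nodes and we simply request 12 cores, the scheduler may give us only 3 nodes. With -c 3, the scheduler knows that the 3 cores of each process must sit on the same node, so it allocates 4 nodes and runs one process on each.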
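
The node counts in this example can be checked with integer arithmetic. The sketch below only reproduces the reasoning above; it is not how Slurm itself computes allocations, and the variable names are chosen for illustration.

```shell
cores_per_node=4
ntasks=4
cpus_per_task=3

# Asking for 12 unrelated cores: ceil(12 / 4) nodes are enough.
nodes_plain=$(( (ntasks * cpus_per_task + cores_per_node - 1) / cores_per_node ))

# With -c 3 a task may not span nodes: floor(4 / 3) = 1 task fits per node.
tasks_per_node=$(( cores_per_node / cpus_per_task ))
nodes_with_c=$(( (ntasks + tasks_per_node - 1) / tasks_per_node ))

echo "without -c: $nodes_plain nodes, with -c $cpus_per_task: $nodes_with_c nodes"
```

The fourth core of every node stays idle in the second case, which is the price of keeping each process's cores together.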

yhcancel

yhcancel jobID cancels a job.

yhi

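yhi shows node usage, listing only the nodes the current user is allowed to use. The partitions available to our group's account are debug, TH_NET and TH_BM; each node has 12 cores, and the default partition is TH_NET.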

partition	Time-limit	Job-size	Total-nodes
debug	30:00	1-6	21
TH_NET	2-00:00:00	1-512	2428
TH_BM	infinite	1-infinite	122

node-state

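*: the node is not responding
alloc -> allocated: the node is running allocated jobs
down: the node is temporarily unavailable
drain -> drained: the node is not used by the scheduler, usually because an administrator is maintaining it
drng -> draining: the node is still running jobs and becomes drained once they finish
idle: the node is free and can accept jobs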
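
When scanning long node lists it can help to expand these abbreviations automatically. The sketch below covers only the states listed above; describe_state is a hypothetical helper name, and any other state is reported as unknown.

```shell
# describe_state STATE: expand a node-state abbreviation from the list above.
# A trailing "*" marks a node that is not responding. Hypothetical helper.
describe_state() {
    raw=$1
    state=${raw%\*}    # strip one trailing "*", if present
    case $state in
        alloc) desc="allocated: running jobs" ;;
        down)  desc="down: temporarily unavailable" ;;
        drain) desc="drained: not used by the scheduler" ;;
        drng)  desc="draining: becomes drained when its jobs finish" ;;
        idle)  desc="idle: free for new jobs" ;;
        *)     desc="unknown state: $state" ;;
    esac
    if [ "$raw" != "$state" ]; then
        desc="$desc (node not responding)"
    fi
    echo "$desc"
}

describe_state 'idle*'
describe_state drng
```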

slurm - What does the state 'drain' mean? - Stack Overflow

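Q (2015-12-06): A job on TH_NET shows a run time of 2-02:46:29, past the 2-00:00:00 time limit, and is still running?

A (2015-12-10): Some jobs stop after two days while others are still running after three; the cause is unknown.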

yhq

yhq shows the status of the jobs the current user has submitted.

yhrun

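yhrun submits jobs interactively and can largely replace mpirun; it takes the same options as yhbatch. For stability, prefer submitting a job script with yhbatch unless you have a specific need.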

Packaging smaller parallel jobs into one large parallel job

https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html#packaging-smaller-parallel-jobs-into-one-large-parallel-job

#!/bin/bash

#SBATCH --job-name=example
#SBATCH --ntasks=20
#SBATCH --time=0-00:05:00
#SBATCH --mem-per-cpu=500MB

cd ${SLURM_SUBMIT_DIR}

# first set of parallel runs
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &

wait

# here a post-processing step
# ...

# another set of parallel runs
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &

wait

exit 0
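
The repeated mpirun lines in the script above can be folded into a loop. A minimal sketch follows; run_batch is a hypothetical helper name, and the command after the count is launched unchanged that many times in the background.

```shell
# run_batch COUNT CMD [ARGS...]: start COUNT copies of CMD in the background
# and block until all of them have finished. Hypothetical helper.
run_batch() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    # A bare wait blocks until every background child has exited.
    wait
}

# In the job script above, each set of runs would become:
#   run_batch 5 mpirun -n 4 ./my-binary
```

Note that a bare wait does not report the exit status of the individual runs, so a failed mpirun still has to be detected from its own output.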

Using Variables in SLURM Jobs

Variables are not expanded in #SBATCH lines. For example, even if you submit with sbatch --export=ALL,index=2, you cannot write #SBATCH --job-name=runs-$index in the bash script. Pass the value on the command line instead:

index=2
sbatch --job-name=runs-$index --export=ALL,index=$index job.sh
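
The same pattern extends to a series of jobs. The sketch below wraps the submit command in a function; submit_indexed and the SBATCH variable are hypothetical names, and SBATCH=echo gives a dry run that prints the arguments instead of submitting.

```shell
# submit_indexed INDEX: submit job.sh with the index in both the job name and
# the job's environment. Hypothetical helper.
# Set SBATCH=echo to print the arguments instead of calling sbatch.
submit_indexed() {
    index=$1
    ${SBATCH:-sbatch} --job-name="runs-$index" --export="ALL,index=$index" job.sh
}

# Dry run for three indices.
for i in 1 2 3; do
    SBATCH=echo submit_indexed "$i"
done
```

Inside job.sh the value is then available as the ordinary environment variable $index.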

EasyBuild

Define the directory to store EasyBuild:

mkdir $HOME/modules
export EASYBUILD_PREFIX=$HOME/modules
export EASYBUILD_MODULES_TOOL=Lmod

Download and install EasyBuild:

curl -O https://raw.githubusercontent.com/hpcugent/easybuild-framework/develop/easybuild/scripts/bootstrap_eb.py
python bootstrap_eb.py $EASYBUILD_PREFIX

Add the new module directory to $MODULEPATH with module use, then load the EasyBuild module and check that EasyBuild works:

module use $EASYBUILD_PREFIX/modules/all
module load EasyBuild
module list
eb --version

Build Intel-2019b toolchain

Define the intel license file:

export INTEL_LICENSE_FILE=<file-path>

Download intel packages and move to specific directories.

mkdir -p $HOME/modules/sources/i/iccifort $HOME/modules/sources/i/imkl $HOME/modules/sources/i/impi
mv parallel_studio_xe_2019_update5_composer_edition.tgz $HOME/modules/sources/i/iccifort/
mv l_mkl_2019.5.281.tgz $HOME/modules/sources/i/imkl/
mv l_mpi_2018.5.288.tgz $HOME/modules/sources/i/impi/

Build the whole toolchain:

eb intel-2019b.eb -r

Build GPAW

Load intel-2019b toolchain

module load intel/2019b

Build the GPAW, GPAW-setups and ASE software modules plus all prerequisites with intel-2019b by:

eb GPAW-20.1.0-intel-2019b-Python-3.7.4.eb -r