**Note:** this tutorial is not valid for the present NSC clusters Tetralith and Sigma; new tutorials are in preparation.
In the meantime, have a look, for example, at the examples from the NSC VASP workshop.
This tutorial will get you started with the basics of running VASP on NSC’s clusters. Here, we will use the Triolith system, but the same steps work on Gamma as well.
In order to run VASP at NSC, you need to have a login account, membership in an active compute project with a time allocation, and a VASP license that you have told us about (required for using NSC’s preinstalled VASP binaries, see below).
The first step is to log in to Triolith. On a Mac or Linux machine, you start by opening a terminal window and initiating a connection with the ssh program. On Windows, you can use a program like “PuTTY” to connect using ssh.
$ ssh x_username@triolith.nsc.liu.se
The welcome message is then displayed and you are logged in to the so-called “login node” of the cluster. This is the place where you prepare your calculations and send jobs to the compute nodes. Please note that you share this node with all the other users logged in to Triolith at that moment, and it is only a single server, so you cannot (and should not try to) run any real calculations there.
Last login: Thu Jul 2 11:21:20 2015 from ...
Welcome to Triolith!
PLEASE READ THE USER GUIDE: http://www.nsc.liu.se/systems/triolith/
Note: Triolith has two login nodes, triolith1 and triolith2. If this
one is unavailable you can try the other one ("ssh triolith2.nsc.liu.se").
//NSC Support <support@nsc.liu.se>
[x_username@triolith1 ~]$
I would recommend that the first command you run is projinfo -C, which shows the status of the projects that you are a member of.
[x_username@triolith1 ~]$ projinfo -C
You are a member of 1 active project.
SNIC 2016/XX-YY
═══════════════
Principal Investigator (PI): Firstname Lastname
Project storage directory: /proj/directory
Slurm account: snic2016-XX-YY
Current core time allocation: ZZZZZ h/month
If you are not a member of any project, you cannot run any calculations on Triolith. projinfo will in that case tell you that you are not a member of any active project.
Before setting up a new calculation, you need to determine where to store the input and output files. You have several options:
/home/x_username (your home directory)
/proj/[projectname]/username/ (the shared project storage)
The shared project storage is located on a high-performance clustered file system, so that is where we typically recommend that you store your ongoing calculations. If you do not know where your project storage is located, you can try running the storagequota command; it will show how much you can store, and also the name of your project storage directory. The project storage is fully described in the storage documentation.
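For example, a minimal sequence for finding your project storage and creating a personal working directory in it might look like the following (the [projectname] and username parts are placeholders; use the directory that projinfo and storagequota report for your project):

$ storagequota
$ mkdir -p /proj/[projectname]/username
$ cd /proj/[projectname]/username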
In this tutorial, we will try an example from the VASP web page, a CO molecule on a Ni (111) surface. After having moved to the project directory, we download the archived input files and decompress them:
$ cd /proj/[projectname]/username
$ wget http://www.vasp.at/vasp-workshop/examples/3_5_COonNi111_rel.tgz
$ tar xzf 3_5_COonNi111_rel.tgz
$ cd 3_5_COonNi111_rel
$ ls
INCAR KPOINTS POSCAR POTCAR
You should find the four required input files for VASP: INCAR, KPOINTS, POSCAR, and POTCAR. Now, to run, we only need to request a compute node and invoke the actual VASP program.
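(As a small aside, not part of the original example: if you are curious which pseudopotentials the downloaded POTCAR contains, grepping for its TITEL lines is a quick way to check.)

$ grep TITEL POTCAR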
There are two ways to get hold of the VASP program: you either download the source code and compile it yourself on the cluster, or you use the preinstalled binaries that are available in the directory /software/apps/vasp/ on the clusters. Please note that in order to use NSC’s binaries of VASP, you need to tell us about your VASP license. We have a page describing how that procedure works.
Compiling the VASP program from source code is necessary if you need to make modifications or install extra add-on packages. It is quite straightforward, and NSC’s support staff can provide makefiles and instructions if you need help. There is a guide for how to compile VASP on Triolith, and you can often find makefiles in the /software/apps/vasp/ directory.
For this example, we will select the latest preinstalled version of VASP. An overview of all the VASP installations we have is available on the Triolith software page. When this tutorial was written, version 5.3.5-01Apr14 “build02” was the most recent standard version. The directory paths to the three standard binaries are:
/software/apps/vasp/5.3.5-01Apr14/build02/vasp
/software/apps/vasp/5.3.5-01Apr14/build02/vasp-gamma
/software/apps/vasp/5.3.5-01Apr14/build02/vasp-noncollinear
You do not need to copy the files from there into your home or working directory. The /software directory is available on all compute nodes, and the intention is that you start these binaries directly. Note that NSC’s software installation policy is to never remove old versions unless they are fundamentally broken, so you can rely on the binaries being there for the full lifetime of the cluster.
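To see which versions and builds are currently installed, you can simply list the installation directory (output omitted here, since the contents change over time):

$ ls /software/apps/vasp/
$ ls /software/apps/vasp/5.3.5-01Apr14/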
VASP is a parallel program meant to run on many processor cores (or compute nodes) simultaneously using MPI for communication, so you should start the program with the mpprun command in your job script or in the interactive shell (see below), for example:
mpprun /software/apps/vasp/5.3.5-01Apr14/build02/vasp
Usually, you do not need to give any flags to mpprun; it will automatically figure out how many cores it should use and how to connect to the other compute nodes. Please keep in mind that mpprun is a special command that only exists on NSC’s clusters. More information about mpprun is available in the NSC build environment description and on the mpprun software page.
It is not advisable to run directly on the machine where you logged in (the “login node”). If you want to test your calculation before running it for real in the batch queue, you allocate one or more compute nodes for interactive use with the interactive command instead. This example is a very small calculation (7 atoms / 48 bands / 12 k-points), so we only need a single compute node. After some, hopefully short, waiting time, you will get a command shell on a compute node, where you can run VASP. Here, we are using 1 compute node, so VASP will run on 16 processor cores in parallel.
interactive -N 1 -t 00:20:00
.......
[pla@n448 3_5_COonNi111_rel]$ mpprun /software/apps/vasp/5.3.5-01Apr14/default/vasp
mpprun INFO: Starting impi run on 1 node ( 16 ranks )
running on 16 total cores
distrk: each k-point on 16 cores, 1 groups
distr: one band on 1 cores, 16 groups
using from now: INCAR
vasp.5.3.5 31Mar14 (build Apr 08 2014 11:32:36) complex
POSCAR found : 3 types and 7 ions ...
Check that the calculation starts without errors. This one will take around 10-12 seconds to finish on Triolith, so you can actually wait for it to finish. For a real calculation, you would likely have to stop it after a few iterations. Afterwards, collect some parameters from the OUTCAR file, such as the time required for one SCF iteration and the number of electronic bands. Use the timing of the first iterations to extrapolate the run time for the whole calculation, as you will need to make such an estimate when you write the job script. The timings for each ionic step can be extracted from the OUTCAR file with the grep command, e.g.
$ grep LOOP+ OUTCAR
LOOP+: cpu time 3.70: real time 3.80
LOOP+: cpu time 1.88: real time 1.88
LOOP+: cpu time 1.59: real time 1.59
LOOP+: cpu time 0.95: real time 0.95
This should give you some idea of the amount of time required to run the full calculation. If it runs too slowly, you will have to either use more compute nodes or adjust the settings (for example, use fewer k-points or a smaller basis set).
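As a rough illustration, a one-liner like the one below averages the real time per ionic step from the LOOP+ lines and extrapolates to a hypothetical run of 100 ionic steps (the step count is just an assumed number for the sake of the example; the real time is the seventh field on each LOOP+ line):

$ grep LOOP+ OUTCAR | awk '{s+=$7; n++} END {printf "%.1f s/step, about %.0f s for 100 steps\n", s/n, 100*s/n}'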
Part of the fun with using a supercomputing centre is that you can run on many processors in parallel and thus speed up your calculations. Unfortunately, there is a limit on how many cores you can use efficiently. What is a good number? Some rough guidelines, and more information about this topic, can be found in the article “Selecting the Right Number of Cores for a VASP Calculation” by Peter Larsson at NSC.
To get good speed when running on many compute nodes, you also need to adjust some of the input parameters in the INCAR file which influence the parallelization scheme. The two most influential ones are NCORE and KPAR, for band and k-point parallelization, respectively. For NCORE, try setting it to the number of cores that you use per compute node; typically, that is 16 on Triolith.
NCORE = 16
If you have more than one k-point in the calculation, you can try parallelization over k-points. Try setting KPAR to the number of compute nodes or the number of k-points, whichever is smaller.
KPAR = min(number of compute nodes,NKPT)
K-point parallelization works best with medium-sized hybrid calculations, where you have a few k-points and a lot of computational work per k-point. It does not work as well for metals; for example, you cannot expect to run a small metallic cell with thousands of k-points efficiently on 1000 compute nodes.
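As a sketch of how you could pick KPAR in practice, the snippet below takes the number of irreducible k-points from the second line of an IBZKPT file produced by a previous run and combines it with an assumed node count (both the node count and the use of IBZKPT are assumptions made for this example; any way of counting your k-points works just as well):

NODES=4                                   # assumed number of compute nodes for the job
NKPT=$(awk 'NR==2 {print $1}' IBZKPT)     # number of irreducible k-points
KPAR=$(( NODES < NKPT ? NODES : NKPT ))   # KPAR = min(number of compute nodes, NKPT)
echo "KPAR = $KPAR"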
The best way to run many calculations is to prepare all of them at the same time, put them in Triolith’s job queue, and then let the system start the jobs as soon as enough compute nodes are available for you. To do this, you need to write a so-called job script for each job. It tells Triolith what resources you need (e.g. how many nodes and for how long) and what you want to run on the nodes. The job script is typically written in bash, and a minimal job script for VASP looks like this:
#!/bin/bash
#SBATCH -J jobname
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -t 12:00:00
#SBATCH -A snic2015-x-yyy
mpprun /software/apps/vasp/5.3.5-01Apr14/build02/vasp
It requests 1 compute node for exclusive use for 12 hours, with a specific job name. If you have computer time allocations in several projects, you also need to specify which project this job should be charged to.
Note that this script assumes that you send it to the job queue while standing in the same directory as the input files. Otherwise, you will need to add a cd command inside the script to move to the right directory with the input files before you start VASP with mpprun.
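For instance, a variant of the job script above with an explicit cd could look like this (the project path is just a placeholder; use your own working directory):

#!/bin/bash
#SBATCH -J jobname
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -t 12:00:00
#SBATCH -A snic2015-x-yyy

# Move to the directory that contains INCAR, KPOINTS, POSCAR and POTCAR
cd /proj/[projectname]/username/3_5_COonNi111_rel
mpprun /software/apps/vasp/5.3.5-01Apr14/build02/vasp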
To send it to the job queue, use the sbatch command.
[pla@n448 3_5_COonNi111_rel]$ sbatch job.sh
Submitted batch job 6944017
The sbatch command gives you a job number that you can use to track the job and see its status in the queue system. You can do this with the squeue command.
[pla@n448 3_5_COonNi111_rel]$ squeue -u x_username
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6944017 triolith COonNi pla PD 0:00 1 (None)
Here, the job is still waiting in the queue, but when it starts running, you will see that the status has changed to “R”.
[pla@n448 3_5_COonNi111_rel]$ squeue -u x_username
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6944017 triolith COonNi pla R 0:05 1 n122
Note that it is possible to inspect the output files and follow the progress of the job while it is running. Typically, people look at the OSZICAR file, which provides condensed output, one line per SCF iteration.
[pla@triolith1 3_5_COonNi111_rel]$ cat OSZICAR
N E dE d eps ncg rms rms(c)
DAV: 1 0.247881146959E+03 0.24788E+03 -0.24207E+04 1568 0.100E+03
DAV: 2 -0.349973712274E+02 -0.28288E+03 -0.25375E+03 1760 0.197E+02
DAV: 3 -0.473201701633E+02 -0.12323E+02 -0.12181E+02 1920 0.477E+01
When the job has finished, it disappears from the job list, and you will find the full set of output files in the job directory.
[pla@triolith1 3_5_COonNi111_rel]$ ls
CHG CONTCAR EIGENVAL INCAR KPOINTS OUTCAR POSCAR slurm-6944033.out WAVECAR
CHGCAR DOSCAR IBZKPT job.sh OSZICAR PCDAT POTCAR vasprun.xml XDATCAR
If there was some kind of problem with the job, e.g. if it crashed or terminated earlier than expected, you should look inside the output file from the queue system. It is called “slurm-jobid.out” and contains what you would normally see in the terminal window when you run the program manually. In this case, everything looks OK.
[pla@triolith1 3_5_COonNi111_rel]$ tail slurm-6944017.out
DAV: 1 -0.408333066044E+02 -0.63615E-04 -0.23970E-02 1504 0.686E-01 0.385E-02
RMM: 2 -0.408342800080E+02 -0.97340E-03 -0.26240E-04 852 0.123E-01 0.435E-01
RMM: 3 -0.408333994813E+02 0.88053E-03 -0.12356E-04 741 0.824E-02 0.162E-01
RMM: 4 -0.408333266195E+02 0.72862E-04 -0.19571E-05 686 0.292E-02
4 F= -.40833327E+02 E0= -.40828786E+02 d E =-.104236E-03
BRION: g(F)= 0.160E-03 g(S)= 0.000E+00 retain N= 2 mean eig= 0.27
eig: 0.281 0.255
reached required accuracy - stopping structural energy minimisation
writing wavefunctions
mpprun INFO: Elapsed time (h:m:s): 0:00:12.345511
This concludes the VASP tutorial.