Resurrection script

Setting up the calculation

First, put the three scripts listed below in your ~/bin/ folder on MareNostrum, and run chmod +x on each of them to make them executable.
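The installation step can be sketched as a small shell function. The function name install_resurrection_scripts and its two arguments (source and destination directory) are hypothetical and used only for illustration; the three script names are the ones used on this page.

```shell
# Hypothetical helper: copy the three scripts into a bin directory
# and make them executable.
# $1 directory where the scripts were saved
# $2 destination directory (for example ~/bin)
install_resurrection_scripts() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for s in resurrection rungen_resurrection resurrection_timecontrol; do
    cp "$src/$s" "$dest/$s" || return 1
    chmod +x "$dest/$s" || return 1
  done
}
```

If the scripts were saved in the current directory, this would be called as install_resurrection_scripts . ~/bin. The run.sh files call the scripts by bare name, so ~/bin needs to be in your PATH.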

Next, create the main directory for your dynamics and its first subfolder, for example:

you@login1:~> mkdir my_dynamics 
you@login1:~> mkdir my_dynamics/1
you@login1:~> cd my_dynamics/1

Put your basic input files (INCAR, KPOINTS, POTCAR and POSCAR) in folder 1. Your first run.sh script should look like this (here for the class_c queue):

#!/bin/bash
#BSUB -J name_of_job_1
#BSUB -q class_c 
#BSUB -n 64 
#BSUB -W 23:59
#BSUB -o o_name_of_job_1.%J
#BSUB -e e_name_of_job_1.%J
#BSUB -u youremail@iciq.es
#BSUB -R"span[ptile=16]"

### Load environment variables ###########
module load VASP/5.3.3

### Run job ##############################
resurrection_timecontrol 23 30 r_name_of_job_1 &
mpirun vasp.complex ; touch stopflag ; resurrection name_of_job_ 1 16c 64 23 30 ; echo the dynamics has been resurrected >> r_name_of_job_1 ; exit

Explanation:

  • In this example, you are running on 64 processors in the class_c queue.
  • You have three quickly accessible log files: o_* holds the standard output, e_* holds the errors, and r_* holds the information related to the resurrection process.
  • Your time limit is 23:59 hours; the maximum allowed by class_c is 24:00.
  • Before starting VASP, the script launches resurrection_timecontrol in the background. It stops the calculation after 23:30 hours by writing a STOPCAR file (LSTOP = .TRUE.), which makes VASP stop once the current ionic step is finished.
  • Then the script runs VASP in the current folder.
  • If VASP ends before that time limit, the script creates a flag file (stopflag). resurrection_timecontrol checks for this file every 5 seconds and exits as soon as it finds it, which avoids leaving a phantom job in the queue for hours.
  • The calculation is then resurrected under the name name_of_job_2, in folder 2 (see Script 1 for details), in the same queue, with the same number of processors and the same time control. The resurrection script calls rungen_resurrection internally, but you can merge the two scripts if you prefer.
  • This set of scripts is self-contained.
  • Tested and debugged.
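The timing in the list above leaves a safety margin between the moment STOPCAR is written and the wall-clock limit of the queue. A minimal sketch of that arithmetic follows; the function name stop_margin_seconds is hypothetical, and the limit is assumed to be set as hours:59, as in the run.sh above.

```shell
# Hypothetical helper: seconds between the STOPCAR time and the wall-clock limit.
# $1 hours and $2 minutes are the values passed to resurrection_timecontrol;
# the wall-clock limit is assumed to be "$1:59", as in the run.sh above.
stop_margin_seconds() {
  stop=$(( 3600 * $1 + 60 * $2 ))   # when STOPCAR is written
  limit=$(( 3600 * $1 + 60 * 59 ))  # when the queue kills the job
  echo $(( limit - stop ))
}

stop_margin_seconds 23 30   # prints 1740, i.e. 29 minutes
```

If a single ionic step of your system takes longer than this margin, the queue may kill the job before the resurrection command runs, and the chain stops. In that case, pass a smaller number of minutes.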

Now that you know how this works, start the calculation by typing:

you@login1:~/my_dynamics/1> bsub < run.sh 

Do not forget to check on your calculations every day, and verify that all your electronic cycles have converged.
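One possible way to automate part of this daily check is to look in OSZICAR for ionic steps whose electronic loop used all the allowed iterations. The sketch below is only an illustration: the function name check_scf_convergence is hypothetical, and it assumes that electronic iteration lines start with an algorithm tag such as "DAV:" or "RMM:" followed by the iteration number, that each ionic step ends with a summary line containing " F= ", and that the second argument is the NELM value of your INCAR (60 if you did not set it).

```shell
# Hypothetical helper: report ionic steps whose electronic loop reached NELM.
# $1 OSZICAR file
# $2 NELM (maximum number of electronic iterations)
check_scf_convergence() {
  awk -v nelm="$2" '
    # electronic iteration line: remember the latest iteration number
    /^[A-Z]+: +[0-9]+/ { last = $2 }
    # ionic summary line: compare the last iteration number with NELM
    / F= / { if (last + 0 >= nelm + 0) print "ionic step " $1 " reached NELM=" nelm }
  ' "$1"
}
```

For example, check_scf_convergence OSZICAR 60 prints nothing when no ionic step reached 60 electronic iterations. Reaching NELM only suggests that the step did not converge, so the flagged steps still need to be inspected by hand.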

Script 1: resurrection

#!/bin/bash
# Rodrigo García-Muelas
# 28/03/2013
# 
# Input:
# $1 Name of work
# $2 Suffix (number id)
# $3 Queue
# $4 Number of processors
# $5 Number of hours of runtime
# $6 Extra number of minutes of runtime
#
# Motivation: create a directory for the next step, then
# generate the new run.sh (which will call this script again)
# and submit it.
# run.sh has an internal time control.

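i=$(($2+1))   # suffix of the next step

mkdir "../$i"
cp ./INCAR   "../$i/INCAR"
cp ./KPOINTS "../$i/KPOINTS"
cp ./CONTCAR "../$i/POSCAR"   # the last geometry becomes the new starting point
cp ./POTCAR  "../$i/POTCAR"
mv ./WAVECAR "../$i/WAVECAR"  # moved rather than copied
mv ./CHGCAR  "../$i/CHGCAR"
rm -f ./CHG                   # not needed for the restart

cd "../$i/" || exit 1
rungen_resurrection "$1" "$i" "$3" "$4" "$5" "$6" # generate run.sh
bsub < run.sh                                     # submit run.sh
exit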
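Script 1 copies CONTCAR to the new POSCAR without checking it. If VASP stops before completing an ionic step, CONTCAR may be empty, and the next job would then start from an unusable POSCAR. A minimal sketch of a check that could be run before resurrection is shown here; the function name can_resurrect is hypothetical and is not part of the original scripts.

```shell
# Hypothetical guard: succeed only if the folder holds what the next step needs.
# $1 folder to check (defaults to the current folder)
can_resurrect() {
  dir=${1:-.}
  for f in INCAR KPOINTS POTCAR CONTCAR; do
    # -s is true only for files that exist and are not empty
    [ -s "$dir/$f" ] || { echo "missing or empty: $f"; return 1; }
  done
  return 0
}
```

In run.sh it could be used as can_resurrect . && resurrection name_of_job_ 1 16c 64 23 30, so that a failed step is not resubmitted.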

Script 2: rungen_resurrection

#!/bin/bash
# Rodrigo García-Muelas
# 28/03/2013
# 
# Input:
# $1 Name of work
# $2 Suffix (number id)
# $3 Queue
# $4 Number of processors
# $5 Runtime hours
# $6 Runtime minutes (add)
#
# Motivation: generate the run.sh for this step.
# The generated run.sh calls resurrection, which prepares the next step.

case $3 in
 16a)  queue=class_a  ; mar=1 ; procqueue=16 ; maxhours=47 ;;
 16b)  queue=class_b  ; mar=1 ; procqueue=16 ; maxhours=22 ;; # maybe shorter jobs get priority
 16c)  queue=class_c  ; mar=1 ; procqueue=16 ; maxhours=22 ;; # same as above
 *)    echo "Error in queue name!!! " ; exit 1 ;;
esac

# Check that the number of processors is a multiple of the processors per node
if [ $(( $4 % procqueue )) -ne 0 ] ; then echo "Error in number of processors!!! " ; exit 1 ; fi

# Generate the run.sh file (replace the -u e-mail address below with your own)
cat >run.sh<<!
#!/bin/bash
#BSUB -J $1$2
#BSUB -q $queue 
#BSUB -n $4 
#BSUB -W $5:59
#BSUB -o o_$1$2.%J
#BSUB -e e_$1$2.%J
#BSUB -u rgarcia@iciq.es
#BSUB -R"span[ptile=16]"

### Load environment variables ###########
module load VASP/5.3.3

### Run job ##############################
resurrection_timecontrol $5 $6 r_$1$2 &
mpirun vasp.complex ; touch stopflag ; resurrection $1 $2 $3 $4 $5 $6 ; echo the dynamics has been resurrected >> r_$1$2 ; exit

!
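Script 2 relies on a here-document with an unquoted delimiter: the shell substitutes $1 to $6 while writing run.sh, whereas %J is written literally and is later replaced by LSF with the job ID. The reduced sketch below illustrates this behaviour; the function name render_header is hypothetical, and it prints only two of the header lines.

```shell
# Hypothetical reduced version of the generator: prints two header lines.
# $1 name of work
# $2 suffix (number id)
render_header() {
  cat <<EOF
#BSUB -J $1$2
#BSUB -o o_$1$2.%J
EOF
}

render_header name_of_job_ 2
# prints:
# #BSUB -J name_of_job_2
# #BSUB -o o_name_of_job_2.%J
```

Any $ that should appear literally in the generated run.sh would have to be written as \$ inside the here-document.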

Script 3: resurrection_timecontrol

#!/bin/bash
#
# Rodrigo García-Muelas
# Improved on May 17th, 2013
#
# INPUT
#
# $1 number of hours   +
# $2 number of minutes
#    (before generating file STOPCAR)
# $3 name of file
# 
# INTERNAL
#
# timeini : Time at which the calculation starts (seconds since the epoch)
# timeend : Time at which STOPCAR is written
# timenow : Current time


timeini=`date +'%s'` 
timenow=$timeini
timeend=$(($timeini+3600*$1+60*$2))

echo resurrection flags are timeini $timeini timeend $timeend >> $3

# If VASP finishes before timeend, stop this process

while [ $timenow -lt $timeend ] ;  do 
 if [ -e stopflag ] ; then rm stopflag ; echo resurrection: VASP finished normally at $timenow >> $3 ; exit ; fi 
 sleep 5s  # Check the status every 5 seconds
 timenow=`date +'%s'` 
done

# If timeend is reached, write STOPCAR 

echo resurrection: writing STOPCAR at $timenow >> $3 
cat >STOPCAR<<!
 LSTOP = .TRUE.

!

exit
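To try the polling logic without waiting for hours, the same loop can be written with a deadline given in seconds. This is a sketch for testing only, not a replacement for Script 3; the function name watch_stopflag and its arguments are hypothetical.

```shell
# Hypothetical test version of the time control loop.
# $1 seconds to wait before writing STOPCAR
# $2 polling interval in seconds
watch_stopflag() {
  end=$(( $(date +%s) + $1 ))
  while [ "$(date +%s)" -lt "$end" ]; do
    # stopflag present: the main program has ended, so clean up and stop
    if [ -e stopflag ]; then rm stopflag; echo "finished"; return 0; fi
    sleep "$2"
  done
  # deadline reached: ask VASP to stop
  printf ' LSTOP = .TRUE.\n' > STOPCAR
  echo "stopcar"
}
```

Running watch_stopflag 10 1 in an empty folder should create STOPCAR after about ten seconds, while creating a file called stopflag in that folder during the wait should make it return early.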