Resurrection script
Latest revision as of 15:20, 9 August 2013
Setting up the calculation
First, put the three scripts listed on this page in your ~/bin/ folder on MareNostrum, and run chmod +x on them to make them executable.
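A quick way to confirm the installation is to loop over the three script names and test the executable bit. The sketch below assumes the scripts were saved under exactly the names used on this page; the check_scripts function and its directory argument are hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: report whether each script is present and executable.
# $1 = directory to inspect (defaults to ~/bin)
check_scripts() {
  dir=${1:-$HOME/bin}
  for s in resurrection rungen_resurrection resurrection_timecontrol; do
    if [ -x "$dir/$s" ]; then
      echo "$s: ok"
    else
      echo "$s: missing or not executable"
    fi
  done
}

check_scripts "$HOME/bin"
```

Any script reported as missing or not executable needs to be copied again or given chmod +x before you submit the first job.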
Now create the main directory for your dynamics and, inside it, the first folder, like this:
you@login1:~> mkdir my_dynamics
you@login1:~> mkdir my_dynamics/1
you@login1:~> cd my_dynamics/1
Now put your basic input files (INCAR, KPOINTS, POTCAR and POSCAR) in folder 1. Your first run.sh script should look like this (running in the class_c queue):
#!/bin/bash
#BSUB -J name_of_job_1
#BSUB -q class_c
#BSUB -n 64
#BSUB -W 23:59
#BSUB -o o_name_of_job_1.%J
#BSUB -e e_name_of_job_1.%J
#BSUB -u youremail@iciq.es
#BSUB -R"span[ptile=16]"
### Load environment variables ###########
module load VASP/5.3.3
### Run job ##############################
resurrection_timecontrol 23 30 r_name_of_job_1 &
mpirun vasp.complex ; touch stopflag ; resurrection name_of_job_ 1 16c 64 23 30 ; echo the dynamics has been resurrected >> r_name_of_job_1 ; exit
Explanation:
- In this example, you are running with 64 processors on the class_c queue.
- You have three easily accessible log files: o_* is the standard output, e_* contains the errors, and r_* contains the information related to the resurrection process.
- Your time limit will be 23:59 hours; the maximum allowed by class_c is 24:00.
- Before starting VASP, you launch resurrection_timecontrol in the background. It will stop the calculation after 23:30 hours by writing a STOPCAR file (LSTOP = .TRUE.), which makes VASP stop after the current ionic step.
- Then the script will execute VASP in your local folder.
- If the VASP calculation ends abruptly before the time limit, the run script creates a flag file (stopflag) that makes resurrection_timecontrol exit, so that no phantom process lingers for hours.
- The calculation will then be resurrected with the name name_of_job_2 in folder 2 (see Script 1 for more details), on the same queue, with the same number of processors and the same time control. The resurrection script calls rungen_resurrection internally, but you can merge the two if you prefer.
- This set of scripts is totally self-contained.
- Tested and debugged.
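The timing in this example can be checked with a little shell arithmetic. The sketch below uses the same formula as resurrection_timecontrol (3600*hours + 60*minutes) and compares it with the 23:59 wall-clock limit; the variable names are illustrative and do not appear in the scripts.

```shell
#!/bin/bash
# Illustrative arithmetic only; variable names are not taken from the scripts.
stop_h=23 ; stop_m=30      # arguments given to resurrection_timecontrol
wall_h=23 ; wall_m=59      # value given to #BSUB -W

stop_s=$((3600*stop_h + 60*stop_m))        # seconds until STOPCAR is written
wall_s=$((3600*wall_h + 60*wall_m))        # seconds until the queue kills the job
margin_min=$(( (wall_s - stop_s) / 60 ))   # time left after STOPCAR appears

echo "STOPCAR after $stop_s s, wall limit $wall_s s, margin $margin_min min"
```

With these values the margin is 29 minutes. The current ionic step must finish, and resurrection must copy the files and submit the next job, within that window, so a system with very long ionic steps may need a smaller value than 23 30.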
Now that you know how this works, start the calculation by typing:
you@login1:~/my_dynamics/1> bsub < run.sh
Do not forget to babysit your calculations every day, and verify that all your electronic cycles have converged.
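Part of the daily check can be automated by scanning OSZICAR for ionic steps whose electronic loop ran up to NELM iterations. The following is only a sketch under stated assumptions: it assumes that electronic-step lines start with an algorithm tag such as "DAV:" or "RMM:" followed by the step number, and that every ionic-step summary line contains " F=". The unconverged_steps function is hypothetical and not one of the three scripts on this page.

```shell
#!/bin/bash
# Hypothetical helper, not one of the three scripts on this page.
# Assumed OSZICAR layout: electronic-step lines look like "DAV:  12 ...",
# and each ionic-step summary line contains " F=".
# $1 = path to OSZICAR, $2 = NELM from your INCAR (60 is the VASP default)
unconverged_steps() {
  awk -v nelm="$2" '
    /^ *[A-Z]+: +[0-9]+/ { last = $2 + 0 }   # index of the latest electronic step
    / F=/ {
      ionic++                                # one summary line per ionic step
      if (last >= nelm) print "ionic step " ionic ": " last " electronic steps"
      last = 0
    }
  ' "$1"
}

# Example: list suspicious ionic steps in the current folder, if OSZICAR exists
if [ -f OSZICAR ]; then unconverged_steps OSZICAR 60; fi
```

An ionic step that is listed reached NELM and deserves a closer look; an empty output means no step hit the limit, which is not by itself a proof of good convergence.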
Script 1: resurrection
#!/bin/bash
# Rodrigo García-Muelas
# 28/03/2013
#
# Input:
# $1 Name of work
# $2 Suffix (number id)
# $3 Queue
# $4 Number of processors
# $5 Number of hours of runtime
# $6 Extra number of minutes of runtime
#
# Motivation: I create a directory for the next step.
# Then, I create the new run.sh, which shall call this script
# And send
# run.sh has an internal time control
i=$(($2+1))
mkdir ../$i
cp ./INCAR ../$i/INCAR
cp ./KPOINTS ../$i/KPOINTS
cp ./CONTCAR ../$i/POSCAR
cp ./POTCAR ../$i/POTCAR
mv ./WAVECAR ../$i/WAVECAR
mv ./CHGCAR ../$i/CHGCAR
rm ./CHG
cd ../$i/
rungen_resurrection $1 $i $3 $4 $5 $6 # generate run.sh
bsub < run.sh # submit run.sh
exit
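One failure mode worth guarding against: if VASP crashes before writing CONTCAR, the script still tries to copy it to POSCAR and submits the next job. A minimal sketch of a pre-check follows; the function name ready_to_resurrect is hypothetical, and the list of required files is an assumption you may want to adapt.

```shell
#!/bin/bash
# Hypothetical pre-check, not part of the original script.
# Succeeds only if the folder ($1, default: current one) holds a non-empty
# CONTCAR plus the input files that resurrection copies.
ready_to_resurrect() {
  dir=${1:-.}
  for f in CONTCAR INCAR KPOINTS POTCAR; do
    if [ ! -s "$dir/$f" ]; then
      echo "resurrection: $f is missing or empty in $dir" >&2
      return 1
    fi
  done
  return 0
}

# Possible use at the top of resurrection:
#   ready_to_resurrect . || exit 1
```

Stopping here breaks the chain of jobs, which is usually preferable to a series of resubmissions that all fail for the same reason.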
Script 2: rungen_resurrection
#!/bin/bash
# Rodrigo García-Muelas
# 28/03/2013
#
# Input:
# $1 Name of work
# $2 Suffix (number id)
# $3 Queue
# $4 Number of processors
# $5 Runtime hours
# $6 Runtime minutes (add)
#
# Motivation: I create a directory for the next step.
# Then, I create the new run.sh, which shall call this script
case $3 in
16a) queue=class_a ; mar=1 ; procqueue=16 ; maxhours=47 ;;
16b) queue=class_b ; mar=1 ; procqueue=16 ; maxhours=22 ;; # maybe they give priority to shorter jobs
16c) queue=class_c ; mar=1 ; procqueue=16 ; maxhours=22 ;; # idem
*) echo "Error in queue name!!! " ; exit ;;
esac
# Check that the number of processors is correct
let AAA=`expr $4 % $procqueue` ; if [ 0 != $AAA ] ; then exit 1 ; fi # is the number of processors a multiple of procqueue?
# Generating the run.sh file
cat >run.sh<<!
#!/bin/bash
#BSUB -J $1$2
#BSUB -q $queue
#BSUB -n $4
#BSUB -W $5:59
#BSUB -o o_$1$2.%J
#BSUB -e e_$1$2.%J
#BSUB -u rgarcia@iciq.es
#BSUB -R"span[ptile=16]"
### Load environment variables ###########
module load VASP/5.3.3
### Run job ##############################
resurrection_timecontrol $5 $6 r_$1$2 &
mpirun vasp.complex ; touch stopflag ; resurrection $1 $2 $3 $4 $5 $6 ; echo the dynamics has been resurrected >> r_$1$2 ; exit
!
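The processor check relies on the remainder of an integer division: a request is accepted only if it fills whole 16-core nodes. The sketch below isolates that test using shell arithmetic instead of expr; the function name nproc_is_valid is hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the check in rungen_resurrection.
# $1 = requested processors, $2 = processors per node (procqueue)
nproc_is_valid() {
  [ $(( $1 % $2 )) -eq 0 ]
}

for n in 64 48 40; do
  if nproc_is_valid "$n" 16; then
    echo "$n: ok ($(( n / 16 )) nodes)"
  else
    echo "$n: not a multiple of 16"
  fi
done
```

Here 64 and 48 pass (4 and 3 nodes), while 40 is rejected. Note that the original script exits silently in that case, so printing a message before exit 1 would make the failure easier to spot.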
Script 3: resurrection_timecontrol
#!/bin/bash
#
# Rodrigo García-Muelas
# Improved on May 17th, 2013
#
# INPUT
#
# $1 number of hours +
# $2 number of minutes
#    (before generating file STOPCAR)
# $3 name of file
#
# INTERNAL
#
# timeini : The calculation starts
# timeend : The calculation ends
# timenow : Current time
timeini=`date +'%s'`
timenow=$timeini
timeend=$(($timeini+3600*$1+60*$2))
echo resurrection flags are timeini $timeini timeend $timeend >> $3
# If VASP finishes before timeend, kill this process
while [ $timenow -lt $timeend ] ; do
if [ -e stopflag ] ; then rm stopflag ; echo resurrection: VASP finished normally at $timenow >> $3 ; exit ; fi
sleep 5s # Verify status every 5 seconds
timenow=`date +'%s'`
done
# If timeend is reached, write STOPCAR
echo resurrection: writing STOPCAR at $timenow >> $3
cat >STOPCAR<<!
LSTOP = .TRUE.
!
exit
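To try the time-control logic without waiting for hours, the same polling loop can be run with a limit given in seconds. This is a sketch for local testing only: the function name timecontrol_sketch, the seconds-based first argument and the optional poll interval are assumptions, not part of the original script.

```shell
#!/bin/bash
# Hypothetical test version of resurrection_timecontrol.
# $1 = limit in seconds, $2 = log file, $3 = poll interval in seconds (default 1)
timecontrol_sketch() {
  now=$(date +%s)
  end=$(( now + $1 ))
  while [ "$now" -lt "$end" ]; do
    if [ -e stopflag ]; then
      # The run script touched stopflag: VASP is done, nothing left to watch.
      rm stopflag
      echo "VASP finished before the limit at $now" >> "$2"
      return 0
    fi
    sleep "${3:-1}"
    now=$(date +%s)
  done
  # The limit was reached while VASP was still running: ask it to stop.
  echo "writing STOPCAR at $now" >> "$2"
  printf 'LSTOP = .TRUE.\n' > STOPCAR
}

# Try it in a scratch directory: the limit expires, so STOPCAR is written.
cd "$(mktemp -d)" || exit 1
timecontrol_sketch 2 r_test
cat STOPCAR
```

Creating a file named stopflag in the same directory before the limit expires makes the function return early without writing STOPCAR, which is the path taken when VASP ends on its own.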