Infi

Revision as of 10:43, 4 February 2009

11 × dual Intel Xeon processors at 3.06 GHz


Access

ssh -X 10.3.30.254

User details

/home is hosted on infi itself (not on sirius, as it is for kimik's and tekla's clusters). Only files smaller than 10 MB are backed up.

There is a quota limit:

  20 GB


Queues

There is a single queue called "n" (without quotes). This queue has 12 nodes (fewer when some nodes have hardware problems), with 2 processors each. You request slots (NCPUS), and the system allocates 2 slots per node; for example, if you ask for 8 slots, the system takes 4 nodes. Because each node has two processors, you should ideally request an even number of slots (NCPUS) to optimize performance.
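The slot-to-node arithmetic above can be sketched as a small shell function. This is only an illustration: `nodes_for_slots` is a hypothetical helper, not an SGE command, and it assumes exactly 2 slots per node as described in the text.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): estimate how many nodes a slot
# request occupies, assuming 2 slots per node as described above.
nodes_for_slots() {
    slots=$1
    slots_per_node=2
    # Integer division rounded up: an odd request spills onto one more node.
    echo $(( (slots + slots_per_node - 1) / slots_per_node ))
}

nodes_for_slots 8   # prints 4
nodes_for_slots 5   # prints 3
```

An even request fills every node it touches, which is why even slot counts are recommended.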

To submit jobs, use:

     qsub name_of_your_script



We run SGE 6.1, which has many new features, such as boolean expressions for hosts, queues, etc. A manual is available here: http://docs.sun.com/app/docs/doc/820-0699?l=en&q=sge+6.1

For example, to see your jobs:

  qstat -u my_username

To see all jobs (quote the asterisk so the shell does not expand it):

  qstat -u "*"

To see how many slots (quota) you are using and how many you have available:

 qquota -u my_username

IMPORTANT:

As we still have some problems, to kill or delete jobs please use "qd":

  qd job_id program

where "program" can be one of: adf, nwchem, dlpoly.

For example, to delete job_id 4000, which runs an ADF process (adf.exe), use:

  qd 4000 adf

For DLPOLY:

  qd 4000 dlpoly



Below you can see example scripts for ADF, NWChem, DL_POLY, etc.



Available Programs

DLPOLY 2

NWChem

ADF 2006

ADF 2006 Script

#! /bin/bash
# queue system setup:
# pe request
#$ -pe n 4
#
#MPI stuff
export MPIDIR=/opt/mpi
export PATH=$MPIDIR:$MPIDIR/bin:$PATH
export LD_LIBRARY_PATH=$MPIDIR/lib
export P4_RSHCOMMAND=rsh
export SCM_MACHINEFILE=$TMPDIR/machines
export SCMWISH=""
export NSCM=4
export P4_GLOBMEMSIZE=16000000
export GFORTRAN_UNBUFFERED_ALL=y
#
# ADF Stuff
# TMPDIR is redefined here; SCM_MACHINEFILE above still points at the TMPDIR set by the queue system
export TMPDIR=/scratch
export ADFHOME=/opt/adf2006.01
export ADFBIN=$ADFHOME/bin
export ADFRESOURCES=$ADFHOME/atomicdata
export SCMLICENSE=$ADFHOME/license
export SCM_TMPDIR=$TMPDIR
export SCM_USETMPDIR=yes
#
cd /home/ezuidema/test
#
$ADFBIN/adf -n $NSCM <  test.in  > test.out
#
mv TAPE21 test.t21
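The script above states the slot count twice (`-pe n 4` and `NSCM=4`), so the two can drift apart when one is edited. A possible way to avoid this, sketched below, is to read the count from `NSLOTS`, the environment variable SGE sets inside a parallel job to the number of granted slots. `pick_nscm` is a hypothetical helper name, and the fallback value of 1 is an assumption so the snippet also runs outside the queue.

```shell
#!/bin/bash
# Sketch: take the ADF process count from SGE's NSLOTS instead of
# repeating the number by hand.
# pick_nscm is a hypothetical helper; the fallback of 1 is an assumption
# for running this snippet outside a queued job.
pick_nscm() {
    echo "${NSLOTS:-1}"
}

NSCM=$(pick_nscm)
export NSCM
echo "ADF would run with $NSCM processes"
```

With this in place, only the `#$ -pe n ...` line would need to change when requesting a different number of slots.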

ADF 2006 Scaling

DLPOLY Script

#! /bin/bash
# queue system setup:
# pe request
#$ -pe n0 8

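# MPIDIR was not defined in this script; the value here is an assumption inferred from the mpirun path used below
export MPIDIR=/opt/mvapich-0.9.9
export LD_LIBRARY_PATH=$MPIDIR/lib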
export DLPOLYPATH=/home/pmiro/dlpoly_MPI/execute
export P4_RSHCOMMAND=rsh
export MACHINEFILE=$TMPDIR/machines
export NCPUS=8

cd /home/pmiro/Bola/SO4Simulation/DownTemp/02/

/opt/mvapich-0.9.9/bin/mpirun -np $NCPUS -machinefile $MACHINEFILE $DLPOLYPATH/DLPOLY.X

DLPOLY Scaling

System with 27336 Atoms

Shared Nodes

NCPUS   Speedup (%)   Time for 1 ns (days)
  1        100               52
  2        203               26
  3        268               19
  4        369               14
  5        428               12
  6        465               11
  7        499               10
  8        557                9
  9        565                9
 10        702                8
 11        732                9

Non-Shared Nodes

NCPUS   Speedup (%)   Time for 1 ns (days)
  1        100               52
  2        196               26
  4        368               14
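
Reading the percentage column as speedup relative to a single CPU (an interpretation: 100 means single-CPU speed), the parallel efficiency per CPU can be estimated by dividing that value by NCPUS. The sketch below uses a hypothetical helper, `efficiency_pct`, and integer arithmetic, so results are truncated to whole percent.

```shell
#!/bin/bash
# efficiency_pct SPEEDUP_PCT NCPUS
# Hypothetical helper: parallel efficiency in percent (truncated integer),
# computed as speedup divided by the number of CPUs.
efficiency_pct() {
    echo $(( $1 / $2 ))
}

efficiency_pct 203 2    # prints 101
efficiency_pct 557 8    # prints 69
efficiency_pct 732 11   # prints 66
```

On these figures, efficiency on shared nodes falls from about 100% at 2 CPUs to roughly two thirds at 8 or more.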

NWChem Script

#! /bin/bash
# queue system setup:
# pe request
#$ -pe n 4

#MPI INTEL
export MPIDIR=/opt/mpi/
export PATH=$MPIDIR:$MPIDIR/bin:$PATH
export LD_LIBRARY_PATH=$MPIDIR/lib
export P4_RSHCOMMAND=rsh
export MACHINEFILE=$TMPDIR/machines
export NCPUS=4
export NWCHEM_BASIS_LIBRARY=/opt/nwchem/data/libraries/

cd /home/abraga/TESTS

/opt/intel/mpi/bin/mpirun -np $NCPUS -machinefile $MACHINEFILE /opt/nwchem/bin/nwchem job.nw >& job.out