MPI Error message

Revision as of 09:18, 12 June 2009

Extracted from:

  http://www.hpcc.nectec.or.th/wiki/index.php/MPI_Error_messsage

Possible Error Messages

  • If you get the error message:
Could not find enough machines for architecture LINUX

One suggested fix is to run mpirun with the -nolocal flag, for example:

mpirun -np 2 -nolocal -machinefile <MACHINE_FILE> <PROGRAM>
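As a rough sketch of the invocation above, the helper below writes a small machine file and prints the matching mpirun command instead of running it. The function name make_mpirun_cmd, the host names node1 and node2, the file name machines.txt and the program name ./my_program are all placeholders, and the one-host-per-line layout is assumed to be what this mpirun expects.

```shell
# Hypothetical helper: write one host per line to a machine file and
# print the mpirun command that would use it (nothing is launched).
make_mpirun_cmd() {
    machine_file=$1
    program=$2
    shift 2                                 # remaining arguments are host names
    printf '%s\n' "$@" > "$machine_file"    # one host per line
    echo "mpirun -np $# -nolocal -machinefile $machine_file $program"
}

# Placeholder hosts: these should be compute nodes, because -nolocal
# keeps processes off the machine where mpirun itself is started.
make_mpirun_cmd machines.txt ./my_program node1 node2
```

Printing the command first makes it easy to check the process count and the machine file before starting a real run.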
  • What does this error message mean:
rm_10204:  p4_error: semget failed for setnum: 0
You are running on the interactive nodes using the MPICH module mpich/1.2.5.10-intel 
(as you should!) and have exhausted the shared memory resources. 
Kill all your MPI processes on the interactive nodes and run these commands 
on the interactive nodes and the login node: 
ipcs -m | awk '/^ *0x/ {print $2 }' | xargs -n 50 ipcrm shm
ipcs -s | awk '/^ *0x/ {print $2 }' | xargs -n 50 ipcrm sem
(The command ipcs -a lists the IPC resources currently in use.) 
If this does not help, contact pdc-staff. Other users may have exhausted the resources. 
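Because the commands above try to remove every entry that ipcs lists, it can help to first see which IDs actually belong to you. The sketch below filters ipcs-style output by owner. The function name own_ipc_ids is made up for illustration, and it assumes the usual Linux ipcs layout in which the ID is the second column and the owner is the third.

```shell
# Print the IDs (column 2) of IPC entries owned by the given user,
# reading `ipcs -m` or `ipcs -s` style output on standard input.
own_ipc_ids() {
    awk -v user="$1" '/^ *0x/ && $3 == user { print $2 }'
}

# Dry run: list your own shared memory and semaphore IDs, remove nothing.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -m | own_ipc_ids "$(id -un)"
    ipcs -s | own_ipc_ids "$(id -un)"
fi
```

If the listed IDs look right, they can be passed to the same xargs and ipcrm commands shown above.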
  • What does p4_error: alloc_p4_msg failed: 0 mean?
p0_6773: (7.828703) xx_shmalloc: returning NULL; requested 1048616 bytes
p0_6773: (7.828762) p4_shmalloc returning NULL; request = 1048616 bytes 
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 3048616 
p0_6773: p4_error: alloc_p4_msg failed: 0

The default P4_GLOBMEMSIZE has been set to the maximum size that the memory in the compute nodes will allow, but if you reset it to a smaller value, you may see errors like this. P4_GLOBMEMSIZE must be set much larger than the amount of memory the program is requesting. The current default size is 32000000 bytes. To restore it, type:

export P4_GLOBMEMSIZE=32000000 (for bash users) 
setenv P4_GLOBMEMSIZE 32000000 (for csh or tcsh users)
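To guard against a value that was lowered by accident, a job script could check the variable before launching. This is only a sketch for sh or bash users: the function name ensure_globmemsize is hypothetical, and the 32000000 threshold simply reuses the default quoted above.

```shell
# Raise P4_GLOBMEMSIZE to at least the given minimum (in bytes).
# Values that are already large enough are left untouched.
ensure_globmemsize() {
    min=${1:-32000000}
    case ${P4_GLOBMEMSIZE:-} in
        ''|*[!0-9]*) P4_GLOBMEMSIZE=$min ;;   # unset, empty or not a number
    esac
    if [ "$P4_GLOBMEMSIZE" -lt "$min" ]; then
        P4_GLOBMEMSIZE=$min
    fi
    export P4_GLOBMEMSIZE
}

P4_GLOBMEMSIZE=3048616    # the too-small size reported in the log above
ensure_globmemsize
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```

The function only ever raises the value to the stated minimum, so it does not push the setting beyond what the compute nodes are said to allow.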
  • What does libcprts.so.5: cannot open shared object file: No such file or directory mean?
/home/jbrandt/tests/test.exe: error while loading shared libraries:
libcprts.so.5: cannot open shared object file: No such file or directory
p0_792: p4_error: Child process exited while making connection to remote
process on compute-0-0.local: 0
/opt/mpich/intel/bin/mpirun: line 1: 792 Broken pipe /home/jbrandt/tests/test.exe -p4pg /home/jbrandt/tests/PI646 -p4wd /home/jbrandt/tests

This means the binary was not statically linked, so the shared library it needs could not be found when the program started on the compute node. Compile your programs using the -static flag.

  • Why do I get lots of errors trying to compile C++ programs using the Intel compilers?
If you get a large number of errors while compiling C++ code with the Intel mpiCC, although the same code compiles properly with the GNU mpiCC, use the Intel mpicc instead. It will compile both C and C++ code, and appears to work properly.
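For the libcprts.so.5 error earlier in this list, one way to see which libraries a dynamically linked binary cannot resolve is to look for "not found" lines in ldd output, ideally on the compute node where the program fails. The helper below is an illustration, not part of MPICH: the name missing_libs is invented, and the file names in the comments are placeholders.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on standard input.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use (placeholder names):
#   ldd ./test.exe | missing_libs
# If anything is listed, relinking with -static avoids the dependency, e.g.
#   mpicc -static -o test.exe test.c

# Demonstration on two sample lines of ldd output:
printf '\tlibcprts.so.5 => not found\n\tlibc.so.6 => /lib/libc.so.6 (0x40020000)\n' | missing_libs
```

On the sample input the helper prints libcprts.so.5, the library named in the error message.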

  • What does p4_error: semget failed for setnum: 0 mean?
p4_error: semget failed for setnum: 0

This means that the maximum number of allowed semaphores on the master node has been created, and the program you are trying to run cannot allocate a new semaphore for inter-process communication. This can happen when somebody has been testing software that does not exit properly, leaving semaphores and shared memory segments allocated. If the leftover semaphores are owned by you, it can be fixed by running the following command:


 /usr/local/mpich-1.2.7/sbin/cleanipcs
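
Before running the cleanup, it may be worth checking who owns the leftover semaphores. The following sketch counts semaphore sets per owner from ipcs -s style output. The name sem_owners is made up, and the column positions assume the usual Linux ipcs layout with the owner in the third column.

```shell
# Count semaphore sets per owner from `ipcs -s` style output on stdin.
# Prints one "owner count" line per user, sorted by owner name.
sem_owners() {
    awk '/^ *0x/ { n[$3]++ } END { for (o in n) print o, n[o] }' | sort
}

if command -v ipcs >/dev/null 2>&1; then
    ipcs -s | sem_owners
fi
```

If your user name appears in the list, the cleanup command above applies; if the semaphores belong to someone else, that user or an administrator has to remove them.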