MPI Error message
Extracted from:
http://www.hpcc.nectec.or.th/wiki/index.php/MPI_Error_messsage
Possible Error Messages
- If you get the error message:
Could not find enough machines for architecture LINUX
the advice I was given is to run mpirun with the -nolocal flag, for example:
mpirun -np 2 -nolocal -machinefile <MACHINE_FILE> <PROGRAM>
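As a minimal sketch of the command above: with MPICH 1.x, the machine file is a plain text file with one host name per line, and -nolocal asks mpirun not to place a process on the machine where it is started. The host names node01 and node02, the file name machines.txt, and the program name ./my_program are placeholders, not names from this cluster.

```shell
# Write one host name per line into a machine file.
# Usage: write_machinefile FILE HOST...
write_machinefile() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Start PROGRAM on NP processes without using the local machine.
# Usage: run_nolocal NP MACHINE_FILE PROGRAM
run_nolocal() {
    mpirun -np "$1" -nolocal -machinefile "$2" "$3"
}

# Example with placeholder names:
#   write_machinefile machines.txt node01 node02
#   run_nolocal 2 machines.txt ./my_program
```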
- What does this error message mean:
rm_10204: p4_error: semget failed for setnum: 0
You are running on the interactive nodes using the MPICH module mpich/1.2.5.10-intel
(as you should!) and have exhausted the shared memory resources.
Kill all your MPI processes on the interactive nodes and run these commands
on the interactive nodes and the login node:
ipcs -m | awk '/^ *0x/ {print $2 }' | xargs -n 50 ipcrm shm
ipcs -s | awk '/^ *0x/ {print $2 }' | xargs -n 50 ipcrm sem
(The command ipcs -a lists the IPC resources currently in use: shared memory segments, semaphores, and message queues.)
If this does not help, contact pdc-staff. Other users may have exhausted the resources.
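Before removing anything, it can help to see which segments and semaphores actually belong to you. The sketch below assumes the usual Linux ipcs column layout (key, id, owner, ...); the helper name owned_ipc_ids is made up for this example.

```shell
# Print the IDs of IPC objects owned by USER.
# Reads `ipcs -m` or `ipcs -s` output on stdin and assumes the Linux
# column layout: key, id, owner, ...
# Usage: owned_ipc_ids USER
owned_ipc_ids() {
    awk -v user="$1" '/^ *0x/ && $3 == user { print $2 }'
}

# Example: list your own shared memory segments and semaphore arrays.
#   ipcs -m | owned_ipc_ids "$USER"
#   ipcs -s | owned_ipc_ids "$USER"
```

If the lists come back empty while ipcs -a still shows many entries, the resources are most likely held by other users.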
- What does p4_error: alloc_p4_msg failed: 0 mean?
p0_6773: (7.828703) xx_shmalloc: returning NULL; requested 1048616 bytes
p0_6773: (7.828762) p4_shmalloc returning NULL; request = 1048616 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 3048616
p0_6773: p4_error: alloc_p4_msg failed: 0
The default P4_GLOBMEMSIZE has been set to the maximum size that the memory in the compute nodes allows, but if you change it, you may see errors like this. P4_GLOBMEMSIZE must be much larger than the amount of memory the program requests. The current default size is 32000000 bytes. To set it back to the default, type:
export P4_GLOBMEMSIZE=32000000 (for bash users)
setenv P4_GLOBMEMSIZE 32000000 (for csh or tcsh users)
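To compare what the program asked for with the current setting, the request size can be pulled out of the error output. This is only a sketch: the helper name largest_request and the log file name job.log are made up for this example, and it matches only lines of the form "requested N bytes" as in the message above.

```shell
# Print the largest size (in bytes) reported as "requested N bytes"
# in MPICH p4 output read from stdin; prints 0 if none is found.
largest_request() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "requested" && $(i + 1) + 0 > max)
                max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Example (job.log is a placeholder file name):
#   need=$(largest_request < job.log)
#   echo "requested: $need bytes, P4_GLOBMEMSIZE: ${P4_GLOBMEMSIZE:-unset}"
```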
- What does libcprts.so.5: cannot open shared object file: No such file or directory mean?
/home/jbrandt/tests/test.exe: error while loading shared libraries: libcprts.so.5: cannot open shared object file: No such file or directory
p0_792: p4_error: Child process exited while making connection to remote process on compute-0-0.local: 0
/opt/mpich/intel/bin/mpirun: line 1: 792 Broken pipe /home/jbrandt/tests/test.exe -p4pg /home/jbrandt/tests/PI646 -p4wd /home/jbrandt/tests
This means you did not statically link the binary, so the shared library libcprts.so.5 could not be found when the program started on the compute node. Compile your programs using the -static flag.
- Why do I get lots of errors trying to compile C++ programs using the Intel compilers?
If you get a large number of errors while compiling C++ code with the Intel mpiCC, although the same code compiles properly with the GNU mpiCC, use the Intel mpicc instead. It will compile both C and C++ code, and appears to work properly.
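For the libcprts.so.5 error, one way to check a binary before running it is to look at what ldd reports for it. The sketch below filters ldd output for libraries marked "not found"; the helper name missing_libs is made up for this example, and the result depends on the node where ldd is run, so a library present on the login node may still be missing on a compute node.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example, using the binary from the error message:
#   ldd /home/jbrandt/tests/test.exe | missing_libs
```

For a binary linked with -static, ldd typically reports that it is not a dynamic executable, and the filter prints nothing.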
- What does p4_error: semget failed for setnum: 0 mean?
p4_error: semget failed for setnum: 0
This means that the maximum number of semaphores allowed on the master node has been reached, and the program you are trying to run cannot allocate a new semaphore for inter-process communication. This can happen when somebody has been testing software that does not exit properly, leaving semaphores and shared memory segments allocated. If the leftover semaphores are owned by you, it can be fixed by running the following command:
/usr/local/mpich-1.2.7/sbin/cleanipcs
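To find out whether the leftover semaphores are yours or somebody else's, the ipcs -s listing can be tallied by owner. This sketch assumes the usual Linux ipcs layout, where the owner is the third column; the helper name sem_count_by_owner is made up for this example.

```shell
# Count semaphore arrays per owner.
# Reads `ipcs -s` output on stdin and assumes the Linux column layout:
# key, semid, owner, perms, nsems.
sem_count_by_owner() {
    awk '/^ *0x/ { n[$3]++ } END { for (u in n) print u, n[u] }' | sort
}

# Example:
#   ipcs -s | sem_count_by_owner
```

If most entries belong to another user, cleanipcs run under your account will probably not free them, and the owner or the system staff has to remove them.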