MPICH: Starting a Global MPD Ring
This is part four of a multi-part tutorial on installing and configuring MPICH2. The full tutorial includes:
- Installing MPICH
- MPICH: Pick Your Paradigm
- MPICH without Torque Functionality
- MPICH: Starting a Global MPD Ring
- MPICH: Troubleshooting the MPD
Before you set up a ring of root mpd daemons, make sure MPICH is working correctly on a single machine. See the MPICH without Torque Functionality page for more information.
Mpd.conf Files
You will need an mpd.conf file for root and for every user on each of the worker nodes in order for this to work. If you don't already have these set up, follow the instructions on the MPICH without Torque Functionality page to create them.
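Before moving on, it can help to confirm that each file is in place and private. The function below is a minimal sketch, not part of MPICH: check_mpd_conf is a hypothetical helper name, and it assumes mpd rejects a configuration file that group or other users can access, so it only checks that the file exists and that those permission bits are clear.

```shell
# Hypothetical helper: succeed only if the given mpd.conf exists
# and grants no permissions to group or other users.
check_mpd_conf() {
    conf=$1
    if [ ! -f "$conf" ]; then
        echo "missing: $conf" >&2
        return 1
    fi
    # Characters 5-10 of the `ls -l` mode string are the group and other bits.
    group_other=$(ls -ld "$conf" | cut -c5-10)
    if [ "$group_other" != "------" ]; then
        echo "too permissive: $conf" >&2
        return 1
    fi
    return 0
}
```

As root, this would be run as check_mpd_conf /etc/mpd.conf. For an ordinary user, pass the path to that user's own file (assumed here to be ~/.mpd.conf).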
Password-Less SSH
Password-less SSH also needs to be set up for all users. See the Password-less SSH for Users page for information on how to do this.
Starting the Mpd Ring
Starting the First Node
Once you have /etc/mpd.conf in place on all of your worker nodes, an mpd daemon needs to be started on each of them. These daemons manage any MPI processes. The first node started serves as a kind of focal point for all of the other mpds, so it's important to choose (and remember) a specific node as the head MPD node.
Start by ssh'ing into this special first node, and then running
mpd --daemon --ncpus=<# CPUs>
The --daemon part specifies that this should be run in the background, and that the process shouldn't be killed when the SSH session ends.
Next, the other daemons need to know exactly where this one is running so they can attach to it. To find out, run mpdtrace -l as shown below:
owl:~# mpdtrace -l
owl_60519 (192.168.1.202)
You'll need the value after the underscore (_), which is 60519 in this example. This is the randomly chosen port that the daemon is listening on.
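If you'd rather not copy the port by hand, it can be pulled out of that line with plain shell parameter expansion. This is only a sketch: mpd_port is a hypothetical helper name, and it assumes the output has the hostname_port (address) shape shown above.

```shell
# Hypothetical helper: extract the port from one line of `mpdtrace -l`
# output, such as "owl_60519 (192.168.1.202)".
mpd_port() {
    entry=${1%% *}                 # drop everything from the first space on
    printf '%s\n' "${entry##*_}"   # keep the text after the last underscore
}
```

On the first node, where the ring has only one member, it could be used as mpd_port "$(mpdtrace -l)".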
Starting the Other Nodes
Then, on the other nodes, a slightly more complicated mpd command is needed:
mpd --daemon --host=<your first host> --port=<port found with mpdtrace> --ncpus=<# CPUs>
Do this one at a time on each of the other worker nodes, or see the Cluster Time-saving Tricks page to learn how to script it up. If you have any trouble, the MPICH: Troubleshooting the MPD page might help.
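As a rough idea of what such a script might look like, the sketch below only prints the ssh command for each node, so the commands can be reviewed before anything runs. mpd_join_commands is a hypothetical helper name, the node names in the usage note are made up, and the sketch assumes every node has the same number of CPUs and that mpd is on the PATH for non-interactive SSH sessions.

```shell
# Hypothetical helper: print (do not run) the ssh command that would
# start an mpd on each listed node and attach it to the head node.
# Arguments: head host, port, CPUs per node, then the worker hostnames.
mpd_join_commands() {
    head_host=$1
    port=$2
    ncpus=$3
    shift 3
    for node in "$@"; do
        printf 'ssh %s mpd --daemon --host=%s --port=%s --ncpus=%s\n' \
            "$node" "$head_host" "$port" "$ncpus"
    done
}
```

For example, mpd_join_commands owl 60519 4 eagle hawk prints two commands. Once they look right, the same output can be piped to sh to run them.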
Checking the MPD Ring
Once you've started up an mpd daemon on each one of the worker nodes, ssh into one of the worker nodes and run
mpdtrace
This will show you all of the hosts currently hooked up as part of the ring. All of the worker nodes should be listed here. To get a quick count, run
mpdtrace | wc -l
If any are missing, investigate those nodes and try again to start an mpd daemon on them.
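A count shows that something is missing, but not which node. One way to find out is to compare the mpdtrace output against a list of the nodes you expect. This is a sketch under some assumptions: missing_nodes is a hypothetical helper name, the expected-hosts file is one you maintain yourself (one hostname per line), and the names in it must be written in the same form that mpdtrace prints.

```shell
# Hypothetical helper: print every expected host that is absent from the ring.
# $1:    file listing the expected hostnames, one per line
# stdin: output of `mpdtrace`, one hostname per line
missing_nodes() {
    # -f reads the ring members as patterns, -F -x matches them as whole
    # literal lines, and -v keeps only the expected hosts that did not match.
    grep -v -x -F -f /dev/stdin "$1"
}
```

It could be run as mpdtrace | missing_nodes expected_hosts.txt, where expected_hosts.txt is a made-up file name. No output means every expected node is in the ring.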
Sanity Check: Running an MPI Program on Multiple Nodes
After the ring has been set up, it's finally time to try running an MPI job on multiple nodes. SSH into one of the worker nodes, become one of your user accounts, and follow the instructions at Creating and Compiling an MPI Program.
As when running multiple processes on the same machine, run the program with mpiexec. First, specify a number of processes smaller than or equal to the number of CPUs you specified for this worker node. In my case, that's four or fewer.
mpiexec -np 4 ./hello.out
You should see the same hostname listed for all of the processes. This is because the mpd daemon uses all available CPUs on the host you're running on before branching out to CPUs on other hosts. To see the job spread beyond one machine, raise the number of processes above the number of CPUs on this host.
mpiexec -np 7 ./hello.out
You should now see different hostnames appearing in the list. The mpd on this machine automatically contacts other mpds in the ring when the host it's running on runs out of CPUs. (In MPICH1, you would have needed to specify this with a machinefile.) Pretty cool!