Setup Grid at University of Canterbury
From BeSTGRID
The Grid virtual machine has the client part of the Globus Toolkit installed, which users can use to submit jobs to the grid. It also has the PBS client tools installed, which allow users to submit jobs locally to the BeSTGRID prototype cluster. The general setup follows the rules in Vladimir:Bootstrapping a virtual machine.
Setup client PBS
- Get Torque client binaries - follow Vladimir:Setup NGCompute#Setup PBS
- add to /etc/services
pbs 15000/tcp # added by Vladimir Mencl
pbs_dis 15001/tcp # added by Vladimir Mencl
pbs_dis 15001/udp # added by Vladimir Mencl
pbs_mom 15002/tcp # added by Vladimir Mencl
pbs_mom 15003/udp # added by Vladimir Mencl
pbs_mom 15003/tcp # added by Vladimir Mencl
pbs_sched 15004/tcp # added by Vladimir Mencl
- compile PBS
./configure --disable-mom --disable-server
make
make install
- configure server name: /var/spool/torque/server_name
ngcompute.canterbury.ac.nz
- allow remote job submission at ngcompute
- allow ngcompute to scp results back - see Vladimir:Setup NGCompute#Permit grid and ng2 to submit jobs to ngcompute
- allow incoming mail: In /etc/mail/sendmail.cf change the following:
O DaemonPortOptions=Port=smtp,Addr=0.0.0.0, Name=MTA
# O DaemonPortOptions=Port=smtp,Addr=127.0.0.1, Name=MTA
- Reason: PBS occasionally sends email back to the submitting user.
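The /etc/services additions listed above can be applied so that rerunning the setup does not duplicate entries. The following is a minimal sketch, not the procedure used on this page: `add_line` is a hypothetical helper, and the example writes to a scratch file rather than to /etc/services.

```shell
#!/bin/sh
# add_line FILE LINE: append LINE to FILE unless an identical line is already there.
# add_line is a hypothetical helper, not part of Torque.
add_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Try it on a scratch file first; point it at /etc/services once satisfied.
services_file=$(mktemp)
add_line "$services_file" "pbs 15000/tcp # added by Vladimir Mencl"
add_line "$services_file" "pbs_sched 15004/tcp # added by Vladimir Mencl"
add_line "$services_file" "pbs 15000/tcp # added by Vladimir Mencl"   # repeated call is a no-op
cat "$services_file"
```

Because the helper compares whole lines, an entry with a different comment or different spacing counts as a new line and is appended.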
Install LAM (to compile)
http_proxy=http://gridws1:3128 yum install lam
Setup NFS shared homes
Configuration
- /etc/exports:
/export grid.canterbury.ac.nz(rw,async,no_root_squash) ngcompute.canterbury.ac.nz(rw,async,no_root_squash) ng2.canterbury.ac.nz(rw,async,no_root_squash)
- /etc/fstab (and same on ngcompute and ng2):
grid.canterbury.ac.nz:/export/home /home nfs fg,retry=20,hard 0 0
grid.canterbury.ac.nz:/export/opt/shared /opt/shared nfs fg,retry=20,hard 0 0
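To double-check which NFS shares a host will mount at boot, the `nfs` entries can be listed from an fstab-format file. This is a small sketch under assumptions: `list_nfs_mounts` is a hypothetical function, and the `/dev/sda1` line in the sample file is made up for illustration only.

```shell
#!/bin/sh
# list_nfs_mounts FILE: print "source mountpoint" for every nfs entry in an fstab-format FILE.
list_nfs_mounts() {
    # skip comment lines; field 3 of an fstab entry is the filesystem type
    awk '!/^[[:space:]]*#/ && $3 == "nfs" { print $1, $2 }' "$1"
}

# Scratch copy holding the two entries from this page plus one illustrative local entry.
fstab_copy=$(mktemp)
cat > "$fstab_copy" <<'EOF'
grid.canterbury.ac.nz:/export/home /home nfs fg,retry=20,hard 0 0
grid.canterbury.ac.nz:/export/opt/shared /opt/shared nfs fg,retry=20,hard 0 0
/dev/sda1 / ext3 defaults 1 1
EOF
list_nfs_mounts "$fstab_copy"
```

Running the same function against /etc/fstab on grid, ngcompute and ng2 should list the same two shares on each host.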
Services
When the virtual machine was bootstrapped, many services were turned off. The following services must be turned on for an NFS server:
chkconfig portmap on
chkconfig nfs on
chkconfig nfslock on
chkconfig rpcidmapd on
chkconfig netfs on
service portmap start
service nfs start
service nfslock start
service rpcidmapd start
service netfs start
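The same sequence can be written as a loop over the service names. The sketch below is an illustration, not the commands as they were run: `run` and `enable_services` are hypothetical helpers, and `DRY_RUN` defaults to 1 so that the commands are only printed until it is set to 0.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

# run COMMAND...: print or execute COMMAND depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enable_services NAME...: enable every service at boot, then start them in the given order.
enable_services() {
    for svc in "$@"; do
        run chkconfig "$svc" on
    done
    for svc in "$@"; do
        run service "$svc" start
    done
}

enable_services portmap nfs nfslock rpcidmapd netfs
```

The argument order matters for the start phase: portmap comes first and netfs last, as in the list above.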
Because some of the exported directories are also mounted locally, the NFS server has to start before netfs. In /etc/rc.d/init.d/nfs, change the start priority from 60 to 20 (it must start before netfs at 25) and the kill priority from 20 to 80 (it must be stopped after netfs at 75):
# chkconfig: - 20 80
### chkOLDconfig: - 60 20
To put these changes into effect, do:
chkconfig nfs reset
chkconfig nfs on
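Whether the new priorities give the intended order can be checked by reading the `# chkconfig:` header of both init scripts. The following is a sketch under assumptions: `start_priority` is a hypothetical helper, the example uses scratch files instead of the real scripts in /etc/rc.d/init.d, and the runlevel field `345` in the netfs header is assumed (only the priorities 25 and 75 come from this page).

```shell
#!/bin/sh
# start_priority FILE: print the start priority from an init script's "# chkconfig:" header.
start_priority() {
    # header format: "# chkconfig: <runlevels> <start> <kill>", so the start priority is field 4
    awk '/^# chkconfig:/ { print $4; exit }' "$1"
}

# Scratch stand-ins for /etc/rc.d/init.d/nfs and /etc/rc.d/init.d/netfs.
nfs_script=$(mktemp)
netfs_script=$(mktemp)
printf '%s\n' '#!/bin/sh' '# chkconfig: - 20 80' '### chkOLDconfig: - 60 20' > "$nfs_script"
printf '%s\n' '#!/bin/sh' '# chkconfig: 345 25 75' > "$netfs_script"

if [ "$(start_priority "$nfs_script")" -lt "$(start_priority "$netfs_script")" ]; then
    echo "nfs starts before netfs"
else
    echo "nfs starts too late"
fi
```

The renamed `### chkOLDconfig:` line is ignored because the pattern only matches a line starting with `# chkconfig:`.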
Machine startup dependencies
Note that in order for these shares to be available, grid must be started before ng2 and ngcompute. This has been achieved with lexicographical ordering of virtual machine (config file) names, and xendomains was modified to use reverse shutdown order.
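The effect of lexicographical ordering can be previewed by sorting the names. This is only an illustration: the real config file names on the Xen host are not given on this page, so the sketch assumes they match the host names, and `start_order` is a hypothetical helper.

```shell
#!/bin/sh
# start_order NAME...: print the names in byte-wise lexicographical order.
# The names passed below are assumed; the real config file names may differ.
start_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

start_order ngcompute ng2 grid
```

With these assumed names, grid sorts first, so it would be started before ng2 and ngcompute.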
For an unknown reason, even
mount -o fg,retry=999,retrans=200 grid.canterbury.ac.nz:/export/home /home/
fails with
mount: mount to NFS server 'grid.canterbury.ac.nz' failed: System Error: No route to host.
(instead of waiting indefinitely).
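Since the mount command gives up immediately in this situation, one possible workaround is to retry it from a wrapper until the server becomes reachable. This is a sketch of that idea, not something this page reports as tested: `retry` is a hypothetical helper, and the attempt count and delay in the commented usage line are arbitrary.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # sleep only if another attempt follows
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Intended use on ng2 or ngcompute (not executed here):
#   retry 30 10 mount -o fg,hard grid.canterbury.ac.nz:/export/home /home
retry 3 0 true && echo "command succeeded"
```

The wrapper does not distinguish between error causes, so a permanent failure such as a mistyped export path is retried just like a temporarily unreachable server.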
