Running Jobs via Globus
From BeSTGRID
Getting a Grid Certificate
If you have never used BeSTGRID before, you need to do these steps once only:
- Submit a request for a certificate
- Meet an RAO (Request Authority Operator) in your area and show them a photo ID.
- Retrieve the public key of your certificate and apply for VO membership using the Grix tool (par. 5-7).
[edit] Grid Client Host
Initially, we will provide accounts on a machine with the Globus clients and other necessary software installed. To run jobs from this client host, users need to do the following.
Every time you wish to log in and submit jobs, you need to do these steps:
- Activate MyProxy (par. 8).
- Log in to the grid client host: SSH to gridclient.auckland.ac.nz (you will need to ask Yuriy for a login).
- Activate your MyProxy credentials, replacing "user.name" with your username:
[user@gridclient ~]$ myproxy-logon -l user.name -s myproxy.arcs.org.au
Enter MyProxy pass phrase:
A credential has been received for user user.name in /tmp/x509up_u514.
- Transfer any required files. (See below.)
- Submit your job. (See below.)
- ...
- Collect output.
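Before you transfer files or submit jobs, it can help to confirm that myproxy-logon actually left a proxy behind. The sketch below relies on the Globus convention that the proxy is stored in /tmp/x509up_u<uid> (as in the myproxy-logon output above) unless the X509_USER_PROXY environment variable points elsewhere. The function names proxy_path and have_proxy are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: locate the proxy written by myproxy-logon.
# Assumption: Globus tools honour X509_USER_PROXY and otherwise
# default to /tmp/x509up_u<numeric uid>.
proxy_path() {
    echo "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# Returns 0 if a non-empty proxy file exists, 1 otherwise.
have_proxy() {
    p=$(proxy_path)
    if [ -s "$p" ]; then
        echo "proxy found: $p"
        return 0
    fi
    echo "no proxy at $p; run myproxy-logon first" >&2
    return 1
}

have_proxy || true   # report the status without aborting the calling script
```

This only checks that the file exists. It does not check whether the credential has expired.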
Various useful tools for interfacing with the grid are described here: Grid Tools
Examples
Transferring Files
globus-url-copy is a command-line client for transferring files via GridFTP to the gateway (and to the cluster, through the NFS-shared /home/grid-bestgrid directory).
globus-url-copy file:///home/yhal003/something.tar.gz gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/something.tar.gz
The local location is specified with a full path and the file:// protocol. The remote location on the gateway needs the gsiftp:// prefix, the gateway's domain name (currently ng2.auckland.ac.nz, though this may change later) and the path. Files can also be transferred to and from other GridFTP servers, and both arguments to globus-url-copy can be remote locations.
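The rules above (full local path with file://, and gsiftp:// plus gateway name plus remote path) can be wrapped in a small helper. This is a sketch under stated assumptions: GATEWAY and REMOTE_DIR default to the values used on this page, the function names are hypothetical, and DRY_RUN=1 is an illustrative switch that prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around globus-url-copy: copy a local file into the
# shared /home/grid-bestgrid directory on the gateway.
GATEWAY=${GATEWAY:-ng2.auckland.ac.nz}
REMOTE_DIR=${REMOTE_DIR:-/home/grid-bestgrid}

# Turn a (possibly relative) local path into a file:// URL with a full path.
local_url() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    case $dir in /) dir= ;; esac   # avoid a doubled slash for files in /
    echo "file://$dir/$(basename "$1")"
}

# Build the gsiftp:// URL for the same file name on the gateway.
remote_url() {
    echo "gsiftp://$GATEWAY$REMOTE_DIR/$(basename "$1")"
}

upload() {
    src=$(local_url "$1") || return 1
    dst=$(remote_url "$1")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo globus-url-copy "$src" "$dst"
    else
        globus-url-copy "$src" "$dst"
    fi
}

# Usage: upload something.tar.gz
```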
GUI GridFTP clients also exist, for example: http://www.cs.virginia.edu/~gsw2c/GridToolsDir/Documentation/GridFtpClients.htm
Submitting Jobs via Command Line
Simple tasks that are not CPU-intensive may be easier to run directly on the gateway. Because the gateway and the cluster share the home directory, this is a convenient way to unpack files uploaded via GridFTP, for example:
globus-url-copy file:///home/yhal003/something.tar.gz gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/something.tar.gz
globusrun-ws -submit -s -J -S -F ng2.auckland.ac.nz -Ft Fork -c /bin/tar -xzvf something.tar.gz
The first command transfers something.tar.gz to /home/grid-bestgrid (the directory shared by the gateway and the cluster). The second command untars the file. The command and its arguments are specified after the -c option. Here:
- -F specifies the gateway machine. For the Auckland cluster this is ng2.auckland.ac.nz.
- -Ft specifies the type of task:
- PBS for jobs on the cluster.
- Fork for jobs running directly on the gateway. This suits "interactive" commands like ls, because Fork jobs do not go through the cluster queue and start almost immediately. It is not suitable for CPU-intensive jobs.
- nothing (no flag at all) for multijobs.
Additional details of the globusrun-ws options can be found in the globusrun-ws manual.
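Quick commands such as ls can be run as Fork jobs on the gateway without writing RSL each time. The sketch below reuses exactly the globusrun-ws flags from the tar example above. The function name gateway_run and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a short command on the gateway as a Fork job.
# GATEWAY defaults to the Auckland gateway; DRY_RUN=1 only prints the command.
GATEWAY=${GATEWAY:-ng2.auckland.ac.nz}

gateway_run() {
    # Fork jobs bypass the PBS queue, so keep them short and light
    # (ls, tar, cat, ...); CPU-heavy work should use -Ft PBS instead.
    set -- globusrun-ws -submit -s -J -S -F "$GATEWAY" -Ft Fork -c "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example: list the shared directory on the gateway.
# gateway_run /bin/ls -l /home/grid-bestgrid
```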
To submit jobs to the cluster, describe them in XML format (RSL) and submit them with the globusrun-ws command.
More examples can be found here: http://wiki.arcs.org.au/bin/view/APACgrid/TestSuite
RSL documentation:
Simple Job Submission
The following RSL describes a run of /bin/hostname on a single machine. Save it in a file named test1.rsl and execute:
globusrun-ws -submit -s -J -S -F ng2.auckland.ac.nz:8443 -Ft PBS -f test1.rsl
<job>
<executable>/bin/hostname</executable>
<jobType>single</jobType>
</job>
Use of Environment Variables, Arguments, and Standard Input/Output
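To specify files for standard output and standard error, add the -so and -se options to the globusrun-ws command line. The following job prints its environment, including variables set via <environment> and via a command-line argument: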
<job>
<executable>/bin/env</executable>
<argument>TEST3='This variable was passed via command line arguments'</argument>
<environment>
<name>TEST1</name>
<value>This value should appear on standard output </value>
</environment>
<environment>
<name>TEST2</name>
<value>And this one too</value>
</environment>
<jobType>single</jobType>
</job>
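As a sketch, the job above can be written to a file and submitted with its streamed output captured locally. The file names test3.rsl, env.out and env.err are arbitrary examples. The -so and -se options are the ones named above, and the other flags are copied from the simple-job command. The submission is skipped when globusrun-ws is not installed.

```shell
#!/bin/sh
# Save the environment-variable example as test3.rsl.
# The quoted 'EOF' keeps the single quotes in <argument> literal.
cat > test3.rsl <<'EOF'
<job>
  <executable>/bin/env</executable>
  <argument>TEST3='This variable was passed via command line arguments'</argument>
  <environment>
    <name>TEST1</name>
    <value>This value should appear on standard output </value>
  </environment>
  <environment>
    <name>TEST2</name>
    <value>And this one too</value>
  </environment>
  <jobType>single</jobType>
</job>
EOF

if command -v globusrun-ws >/dev/null 2>&1; then
    # Streamed stdout/stderr go to local files via -so and -se.
    globusrun-ws -submit -s -J -S -F ng2.auckland.ac.nz:8443 -Ft PBS \
        -so env.out -se env.err -f test3.rsl
    # TEST1, TEST2 and TEST3 should all appear in the job's environment.
    grep TEST env.out
else
    echo "globusrun-ws not found; wrote test3.rsl only" >&2
fi
```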
Submitting Multiple Jobs
You can easily submit more than one job in the same file by putting the <job> descriptions inside a <multiJob> tag:
<multiJob>
<job> ... </job>
<job> ... </job>
</multiJob>
The only differences are that the submission command line does not have the -Ft flag, and each individual job must start with the following:
<job>
<factoryEndpoint
xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<wsa:Address>
https://ng2.auckland.ac.nz:8443/wsrf/services/ManagedJobFactoryService
</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>PBS</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
...
A multiJob is useful if you want to wait for a whole set of jobs to complete, because globusrun-ws returns only when all of the jobs have completed.
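Writing the factoryEndpoint header by hand for many sub-jobs is error-prone, so a short generator can help. This sketch copies the header shown above into N identical sub-jobs. The executable (/bin/hostname), the default N=3, the function names and the output file multi.rsl are illustrative assumptions. The submission line is left commented out and, as described above, omits -Ft.

```shell
#!/bin/sh
# Hypothetical generator: a multiJob with N copies of one sub-job,
# each carrying the factoryEndpoint header required for multiJobs.
sub_job() {
    cat <<'EOF'
  <job>
    <factoryEndpoint
        xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
      <wsa:Address>
        https://ng2.auckland.ac.nz:8443/wsrf/services/ManagedJobFactoryService
      </wsa:Address>
      <wsa:ReferenceProperties>
        <gram:ResourceID>PBS</gram:ResourceID>
      </wsa:ReferenceProperties>
    </factoryEndpoint>
    <executable>/bin/hostname</executable>
    <jobType>single</jobType>
  </job>
EOF
}

write_multijob() {
    n=${1:-3}
    echo "<multiJob>"
    i=0
    while [ "$i" -lt "$n" ]; do
        sub_job
        i=$((i + 1))
    done
    echo "</multiJob>"
}

write_multijob 3 > multi.rsl
# Submit without -Ft, for example:
# globusrun-ws -submit -s -J -S -F ng2.auckland.ac.nz:8443 -f multi.rsl
```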
Another method
Note: with this mechanism, the submitted processes cannot identify themselves or locate their peers. Communication has to be arranged by other means: a central server that knows about all the jobs, middleware such as MPI (see the next example), or some other mechanism. Also see the limits section. In most cases it is better to use multiJob or MPI jobs.
- hostCount determines the number of nodes, and count determines the number of processes. For best performance, the number of processes should be no more than the number of cores requested. If the process count is significantly larger, problems can occur (see below).
<job>
<executable>/bin/hostname</executable>
<count>2</count>
<hostCount>2</hostCount>
<queue>default@hpc-bestgrid.auckland.ac.nz</queue>
<jobType>multiple</jobType>
</job>
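The advice to keep the process count at or below the number of requested cores can be checked when the RSL is generated. In this sketch, CORES_PER_NODE is an assumption (default 1) that you should set to the real core count of the cluster nodes. The function name multiple_rsl and the output file multiple.rsl are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for jobType "multiple": print the RSL and warn on
# stderr when count exceeds hostCount * CORES_PER_NODE.
CORES_PER_NODE=${CORES_PER_NODE:-1}

multiple_rsl() {
    exe=$1 count=$2 hosts=$3
    cores=$((hosts * CORES_PER_NODE))
    if [ "$count" -gt "$cores" ]; then
        echo "warning: $count processes requested on $cores cores" >&2
    fi
    cat <<EOF
<job>
  <executable>$exe</executable>
  <count>$count</count>
  <hostCount>$hosts</hostCount>
  <queue>default@hpc-bestgrid.auckland.ac.nz</queue>
  <jobType>multiple</jobType>
</job>
EOF
}

# Reproduces the example above (2 processes on 2 nodes).
multiple_rsl /bin/hostname 2 2 > multiple.rsl
```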
Submitting MPI jobs
The MPI environment takes care of process communication and identification. Also, because the internal submission mechanism differs from the one used for normal multiple jobs, the process limits are higher (although performance still suffers).
<job>
<executable>test</executable>
<count>4</count>
<hostCount>4</hostCount>
<directory>/home/grid-bestgrid/MPI/</directory>
<queue>default@hpc-bestgrid.auckland.ac.nz</queue>
<jobType>mpi</jobType>
</job>
How To Monitor Job Execution
This link shows Auckland cluster statistics: UoA Rocks cluster statistics
A list of executed and queued jobs can be found here: List Of Jobs
The "name" column can be set from the job description file by adding the following at the end, just before the closing </job> tag:
<extensions>
<jobname>Simple-Job-Name</jobname>
</extensions>
For example:
<job>
<executable>sleep</executable>
<directory>/tmp</directory>
<argument>10000</argument>
<jobType>single</jobType>
<extensions>
<jobname>simplename</jobname>
</extensions>
</job>
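To add a name to an existing RSL file without editing it by hand, you can insert the extensions block before the closing tag with sed. This is a sketch with stated assumptions. The file must hold exactly one <job> (not a multiJob), and the name must not contain characters that are special to sed or XML (|, &, \, <). The function name set_jobname is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: insert <extensions><jobname>NAME</jobname></extensions>
# just before the closing </job> of a single-job RSL file, in place.
set_jobname() {
    file=$1 name=$2
    sed "s|</job>|<extensions><jobname>$name</jobname></extensions></job>|" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage: set_jobname test1.rsl simplename
```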
How To Query Job State From Command Line
You can find the job status by saving the "job handle" to a file during submission and then using this handle in various queries.
To save the job handle, add "-o test.epr" to the globusrun-ws command, for example:
globusrun-ws -submit -s -F ng2.auckland.ac.nz:8443 -Ft PBS \
-o test.epr -c echo "hello world"
test.epr can be any filename. You can use this file to query the job status from a different terminal. You can also add the "-b" flag for batch submission, so that globusrun-ws returns immediately.
- To find the job state:
wsrf-query -e test.epr '//*[local-name()="state"]/text()'
- This returns "active" for a running job and "pending" for a job in the queue. Once the job has finished, this command returns an error.
- You can substitute "state" with other properties such as executable, count or userSubject.
- To get all job parameters, just run
wsrf-query -e test.epr
- This prints a large amount of hard-to-read XML while the job exists (and an error once it has finished).
- To find the queue status:
wsrf-query -s https://ng2.auckland.ac.nz:8443/wsrf/services/DefaultIndexService |grep Jobs
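Because the state query fails once a job has finished, a simple way to wait for a job is to poll until the query stops succeeding. The sketch below builds on the wsrf-query state call shown above. WSRF_QUERY and POLL_INTERVAL are illustrative variables (not Globus settings). WSRF_QUERY lets you substitute the query command, for example with a stub when testing. The function name wait_for_job is hypothetical.

```shell
#!/bin/sh
# Hypothetical polling loop: print the job state every POLL_INTERVAL seconds
# until the query fails, which is what happens once the job has finished.
WSRF_QUERY=${WSRF_QUERY:-wsrf-query}
POLL_INTERVAL=${POLL_INTERVAL:-30}

wait_for_job() {
    epr=$1
    while state=$($WSRF_QUERY -e "$epr" '//*[local-name()="state"]/text()' 2>/dev/null); do
        echo "$(date '+%H:%M:%S') $state"
        sleep "$POLL_INTERVAL"
    done
    echo "job no longer reported; it has probably finished"
}

# Usage: wait_for_job test.epr
```

A failing query can also mean the service is unreachable or the EPR file is wrong, so the final message is deliberately hedged.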
