APAC Test Suite For Auckland Gateway

From BeSTGRID

Jump to: navigation, search

Not all of the tests are going to work out of the box, so here are the modifications and notes.


[edit] Test 1

Pass

The exact command for test gateway:

globusrun-ws  -submit -s   -S -F ng2test.auckland.ac.nz:9443 -Ft Fork -c /usr/bin/whoami

Have to set valid gateway name and port number (the default 8443 is used by apache).

[edit] Test 2

Pass

Exact command:

 globusrun-ws -submit -s -J -S -F ng2test.auckland.ac.nz:9443 -Ft PBS -f gt1.rsl


Passing standard input and error between compute nodes and gateway requires either passwordless access between compute nodes and gateway, or using shared file system and configuring PBS to use cp instead of scp. We selected second option.

Shared file system is NFS mount from frontnode in /home/bestgrid-grid Relevant PBS configuration file on cluster is /opt/torque/mom_priv/config with following entry:

$usecp *:/home /home

Without the changes above I see the following error in /var/log/messages on compute nodes:

 May  7 17:15:37 compute-0-9 pbs_mom: sys_copy, command '/usr/bin/scp -rpB /opt/torque/spool/52.cluster.hpc.org.ER yhal003@ng2test.auckland.ac.nz:/  
home/yhal003/NG2Tests/STDIN.e52' failed with status=1, giving up after 4 attempts

The RSL file that works in our case would look like this:

<job>
    <factoryEndpoint
            xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
            xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
        <wsa:Address>
            https://ng2test.auckland.ac.nz:9443/wsrf/services/ManagedJobFactoryService
        </wsa:Address>
        <wsa:ReferenceProperties>
            <gram:ResourceID>PBS</gram:ResourceID>
        </wsa:ReferenceProperties>
    </factoryEndpoint>
 <executable>/bin/hostname</executable> 
<stdout>test.out</stdout>
<count>10</count> 
<jobType>multiple</jobType> 
   <extensions>
       <resourceAllocationGroup>
               <hostCount>10</hostCount>
               <cpusPerHost>8</cpusPerHost>
               <processCount>80</processCount>
       </resourceAllocationGroup>
   </extensions>
</job>

i.e. the extensions section specifies hostCount and other stuff as well as using count and jobType=multiple tags.

If the extensions section is not used, the following error shows up:

-bash-3.00$ globusrun-ws -submit -s -J -S -F ng2test.auckland.ac.nz:9443 -Ft PBS -f gt1.rsl
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:ce4e34ce-1ca8-11dd-acae-00163e000004
Termination time: 05/09/2008 02:45 GMT
Current job state: Pending
Current job state: Active
Current job state: CleanUp-Hold
/bin/sh: /home/yhal003/.globus/ce988b50-1ca8-11dd-9baf-c04f6affd4b6/scheduler_pbs_cmd_script: No such file or directory
bash: /home/yhal003/.globus/ce988b50-1ca8-11dd-9baf-c04f6affd4b6/exit.0: No such file or directory
/bin/touch: cannot touch `/home/yhal003/.globus/ce988b50-1ca8-11dd-9baf-c04f6affd4b6/exit.0': No such file or directory
/opt/torque/mom_priv/jobs/110.cluster.hpc.org.SC: line 52: /home/yhal003/.globus/ce988b50-1ca8-11dd-9baf-c04f6affd4b6/exit.0: No such file or  directory 
/opt/torque/mom_priv/jobs/110.cluster.hpc.org.SC: line 53: [: too many arguments
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.


[edit] Test 3