From tskim at sc.tamu.edu Tue Jan 22 15:13:27 2008
From: tskim at sc.tamu.edu (Taesung Kim)
Date: Tue, 22 Jan 2008 16:13:27 -0600
Subject: [Ccsm-users] [q]installation test of ccsm 3.0 on ibm
Message-ID: <47966A87.8090109@sc.tamu.edu>

Hello,

I have been trying to install CCSM 3.0 on an IBM machine and am now working through the 'testing' part of the user manual, page 54. I followed steps 1 through 5, but I can't find the batch file for step 6. Where is that file?

--------
1. Create a new directory for testing:
   > mkdir -p /ptmp/$USER/tstinstall
2. Go to the CCSM scripts directory $CCSMROOT/ccsm3/scripts/
   > cd $CCSMROOT/ccsm3/scripts/
3. Execute create_test for each test case:
   > ./create_test -testname TER.01a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
   > ./create_test -testname TDB.01a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
   > ./create_test -testname TBR.01a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
   > ./create_test -testname TBR.02a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
   > ./create_test -testname THY.01a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
   > ./create_test -testname THY.02a.T31_gx3v5.B.bluesky -testroot /ptmp/$USER/tstinstall
4. Go to the test directory:
   > cd /ptmp/$USER/tstinstall/
5. Edit the test scripts to modify the default batch queue (optional):
   > vi ./T*/*test
6. Build and submit all of the tests:
   > ./batch.bluesky
----------

I also tried what the manual says on page 52 and submitted the job to LoadLeveler, but it didn't run through. The error file from LoadLeveler is:

-----------
INFO: DEBUG_LEVEL changed from 0 to 4
D1: ./host.list file did not exist
D1: node allocation strategy = 1
INFO: 0031-364 Contacting LoadLeveler to query information for batch job
D1: 01/22 14:56:05.783896 Calling ll_init_job.
D1: 01/22 14:56:05.891872 ll_init_job returned.
D1: 01/22 14:56:05.891934 Calling ll_get_job.
D1: 01/22 14:56:05.893427 ll_get_job returned.
D4: 01/22 14:56:05.968643 LL_StepTaskInstanceCount returned by LL was 1
D1: 01/22 14:56:05.968727 Job key assigned by LoadLeveler = 7121
D4: 01/22 14:56:05.968780 LL_StepBulkXfer returned by LL was 0
D4: 01/22 14:56:05.968831 LL_StepNodeCount returned by LL was 1
D4: 01/22 14:56:05.968883 LL_NodeTaskCount returned by LL was 1
D4: 01/22 14:56:05.968934 LL_TaskTaskInstanceCount returned by LL was 1
INFO: 0031-119 Host f1n2 allocated for task 0
INFO: 0031-120 Host address 192.168.100.2 allocated for task 0
D4: 01/22 14:56:05.969074 LL_TaskInstanceAdapterCount returned by LL was 1
D4: 01/22 14:56:05.969133 Device returned...
INFO: 0031-377 Using sn1 for MPI euidevice for task 0
D4: 01/22 14:56:05.969234 AdapterUsageRcxtBlocks returned value: 0
D4: 01/22 14:56:05.969286 Adapter info for task 0: proto=MPI, mode=us, window=13, name=sn1, addr=192.168.201.2, network ID=2, tag=0
D4: 01/22 14:56:05.969412 MP_RCXT_BLKS set to: 0
INFO: 0031-373 Using MPI for messaging API
D4: 01/22 14:56:05.969503 Return for get_job_info() in batch
D1: Entering pm_contact, jobid is 0
D1: Jobid = 7121
D1: Spawning /etc/pmdv4 on all nodes
D1: 1 master nodes
D4: LoadLeveler Version 3 Release 4
D1: 01/22 14:56:05.984148 Calling ll_spawn_connect for node 0, host name f1n2
D1: 01/22 14:56:05.984454 ll_spawn_connect returned for node 0, socket fd 4, host name f1n2
D4: ll_spawn_connect successful with f1n2 (task 0)
D4: Calling pm_spawn_ready (number of nodes 1)
D1: 01/22 14:56:05.984639 Calling pm_spawn_ready.
D4: Return from ll_spawn_write (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = 1
D1: 01/22 14:56:05.985464 returned from pm_spawn_ready.
D1: Socket file descriptor for master 0 (f1n2) is 4
D1: pm_contact(): Preemption enabled, ignoring timeout
D1: devnum flag is 1
D1: SSM_read on socket 4, source = 0, task id: 0, nread: 12, type:3.
D1: Leaving pm_contact, jobid is 7121
D1: attempting to bind socket to /tmp/s.pedb.38900.13533
D4: Command args:<>
D2: pm_respond: Input file ready!
D3: EOF detected on stdin. Compiled pm_respond.c Jul 31 2007 13:07:17
D3: Message type 34 from source 0
D4: Task 0 pulse received,count is 0 curr_time is 1201035366
D4: Task 0 pulse acknowledged, count is 0 curr_time is 1201035366
D3: Message type 21 from source 0
0:INFO: 0031-724 Executing program:
0:D1: child 0: waiting for ll_task_inst_pid_update to complete
0:
0:D1: child 0: ll_task_inst_pid_update complete...
0:
D3: Message type 21 from source 0
0:cpl: A file or directory in the path name does not exist.
D3: Message type 22 from source 0
INFO: 0031-656 I/O file STDOUT closed by task 0
D3: Message type 22 from source 0
INFO: 0031-656 I/O file STDERR closed by task 0
D3: Message type 15 from source 0
D1: Accounting data from task 0 for source 0:
D3: Message type 1 from source 0
INFO: 0031-251 task 0 exited: rc=127
D1: All remote tasks have exited: maxx_errcode = 127
INFO: 0031-639 Exit status from pm_respond = 0
D1: Maximum return code from user = 127
D2: In pm_exit... About to call pm_remote_shutdown
D2: Sending PMD_EXIT to task 0
D2: Elapsed time for pm_remote_shutdown: 0 seconds
D2: In pm_exit... Calling exit with status = 127 at Tue Jan 22 14:56:06 2008
real 0.52
user 0.18
sys 0.03
No match.
No match.
INFO: DEBUG_LEVEL changed from 0 to 4
D1: ./host.list file did not exist
D1: node allocation strategy = 1
INFO: 0031-364 Contacting LoadLeveler to query information for batch job
D1: 01/22 14:56:09.924245 Calling ll_init_job.
D1: 01/22 14:56:10.032041 ll_init_job returned.
D1: 01/22 14:56:10.032102 Calling ll_get_job.
D1: 01/22 14:56:10.033390 ll_get_job returned.
D4: 01/22 14:56:10.109404 LL_StepTaskInstanceCount returned by LL was 1
D1: 01/22 14:56:10.109488 Job key assigned by LoadLeveler = 7121
D4: 01/22 14:56:10.109541 LL_StepBulkXfer returned by LL was 0
D4: 01/22 14:56:10.109591 LL_StepNodeCount returned by LL was 1
D4: 01/22 14:56:10.109642 LL_NodeTaskCount returned by LL was 1
D4: 01/22 14:56:10.109692 LL_TaskTaskInstanceCount returned by LL was 1
INFO: 0031-119 Host f1n2 allocated for task 0
INFO: 0031-120 Host address 192.168.100.2 allocated for task 0
D4: 01/22 14:56:10.109831 LL_TaskInstanceAdapterCount returned by LL was 1
D4: 01/22 14:56:10.109888 Device returned...
INFO: 0031-377 Using sn1 for MPI euidevice for task 0
D4: 01/22 14:56:10.109986 AdapterUsageRcxtBlocks returned value: 0
D4: 01/22 14:56:10.110037 Adapter info for task 0: proto=MPI, mode=us, window=13, name=sn1, addr=192.168.201.2, network ID=2, tag=0
D4: 01/22 14:56:10.110161 MP_RCXT_BLKS set to: 0
INFO: 0031-373 Using MPI for messaging API
D4: 01/22 14:56:10.110249 Return for get_job_info() in batch
D1: Entering pm_contact, jobid is 0
D1: Jobid = 7121
D1: Spawning /etc/pmdv4 on all nodes
D1: 1 master nodes
D4: LoadLeveler Version 3 Release 4
D1: 01/22 14:56:10.124865 Calling ll_spawn_connect for node 0, host name f1n2
D1: 01/22 14:56:10.125172 ll_spawn_connect returned for node 0, socket fd 4, host name f1n2
D4: ll_spawn_connect successful with f1n2 (task 0)
D4: Calling pm_spawn_ready (number of nodes 1)
D1: 01/22 14:56:10.125347 Calling pm_spawn_ready.
D4: Return from ll_spawn_write (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = -7
D1: 01/22 14:56:10.126187 returned from pm_spawn_ready.
D1: 01/22 14:56:10.126187 Closing socket 4 for host name f1n2. (index is 0)
D1: 01/22 14:56:15.126314 Retrying ll_spawn_connect for host name f1n2.
D1: 01/22 14:56:15.126494 ll_spawn_connect returned for node (taskid) 0, host name f1n2, socket fd 4.
D4: ll_spawn_connect successful after retry with f1n2 (task 0)
D4: Return from ll_spawn_write (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = 0
D4: Return from ll_spawn_read (socket file descriptor 4), rc = 1
D1: Socket file descriptor for master 0 (f1n2) is 4
D1: pm_contact(): Preemption enabled, ignoring timeout
D1: devnum flag is 1
D1: SSM_read on socket 4, source = 0, task id: 0, nread: 12, type:3.
D1: Leaving pm_contact, jobid is 7121
D1: attempting to bind socket to /tmp/s.pedb.38778.13533
D4: Command args:<>
D2: pm_respond: Input file ready!
D3: EOF detected on stdin. Compiled pm_respond.c Jul 31 2007 13:07:17
D3: Message type 34 from source 0
D4: Task 0 pulse received,count is 0 curr_time is 1201035375
D4: Task 0 pulse acknowledged, count is 0 curr_time is 1201035375
D3: Message type 21 from source 0
0:INFO: 0031-724 Executing program:
0:D1: child 0: waiting for ll_task_inst_pid_update to complete
0:
0:D1: child 0: ll_task_inst_pid_update complete...
0:
D3: Message type 21 from source 0
0:cpl: A file or directory in the path name does not exist.
D3: Message type 22 from source 0
INFO: 0031-656 I/O file STDOUT closed by task 0
D3: Message type 22 from source 0
INFO: 0031-656 I/O file STDERR closed by task 0
D3: Message type 15 from source 0
D1: Accounting data from task 0 for source 0:
D3: Message type 1 from source 0
INFO: 0031-251 task 0 exited: rc=127
D1: All remote tasks have exited: maxx_errcode = 127
INFO: 0031-639 Exit status from pm_respond = 0
D1: Maximum return code from user = 127
D2: In pm_exit... About to call pm_remote_shutdown
D2: Sending PMD_EXIT to task 0
D2: Elapsed time for pm_remote_shutdown: 0 seconds
D2: In pm_exit... Calling exit with status = 127 at Tue Jan 22 14:56:15 2008
real 5.63
user 0.18
sys 0.03
No match.
No match.
-----------

Among these, I noticed '0:cpl: A file or directory in the path name does not exist.' How can I locate it in the script file?

Thanks,
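
For reference, here is a rough sketch of how one might search for this. The exit status 127 in the log is the conventional shell status for a command that could not be found, so it may help to list every script line that mentions cpl and then check whether the executable it points to exists. The helper names find_component_refs and check_executable and the example paths are hypothetical, not part of CCSM, and the sketch assumes a grep that supports -r:

```shell
#!/bin/sh
# Sketch only: helper names and example paths are hypothetical, not CCSM tools.

# List every line under a directory that mentions a component name as a word.
# $1 = directory to search, $2 = component name (for example: cpl)
find_component_refs() {
    grep -rn -w -- "$2" "$1" 2>/dev/null
}

# Report whether a path exists and is executable; returns 1 if it is not.
check_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1"
        return 1
    fi
}

# Example usage (placeholder paths; substitute your own case and run directories):
# find_component_refs /path/to/case/scripts cpl
# check_executable /path/to/run/dir/cpl
```

Each match is printed as file:line:text, which shows where in the script the coupler executable is invoked and which path it is expected to be at.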