Manual:Run Processing
Scripts are available to help with extensive processing of data at the analysis clusters.
Processing a Single Raw Data File
To process a single raw data file, use the option '--run-nuwa' and specify the run number and file sequence number. The following command runs the job 'adBasicFigs' on sequence 1 of run 4378.
% runProcess.py --cluster=pdsf --run-nuwa -j adBasicFigs -r 4378 -s 1
You can also directly specify the raw data filename:
% runProcess.py --cluster=pdsf --run-nuwa -j adBasicFigs -f /eliza7/dayabay/data/exp/dayabay/2010/TestDAQ/NoTag/0810/daq.NoTag.0004378.Physics.SAB-AD1.SFO-1._0001.data
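The run and sequence numbers passed with '-r' and '-s' also appear in the raw data filename. The following is a minimal sketch of how they could be recovered from a filename; the pattern (a 7-digit run field and a 4-digit '_NNNN' sequence field before '.data') is inferred only from the single example filename above, and the function name parse_raw_filename is hypothetical, not part of runProcess.py.

```python
import os
import re

# Pattern inferred from the one example filename in this manual.
# The field widths (7-digit run, 4-digit sequence) are assumptions.
RAW_NAME = re.compile(r"\.(?P<run>\d{7})\..*\._(?P<seq>\d{4})\.data$")


def parse_raw_filename(path):
    """Return (run, sequence) as ints, or None if the name does not match."""
    match = RAW_NAME.search(os.path.basename(path))
    if match is None:
        return None
    return int(match.group("run")), int(match.group("seq"))
```

Called on the filename above, this would give run 4378 and sequence 1, the same values used in the '-r 4378 -s 1' form of the command.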
If your job generates a ROOT output file of histograms/trees/etc, it can be downloaded from the run summary webpage after the job completes.
Summing the Histograms for a Run
The option '--add-stats' will add your job output histograms to the total histogram file for the run.
% runProcess.py --cluster=pdsf --add-stats -j adBasicFigs -r 4378 -s 1
The total file for the run can also be downloaded from the run summary webpage.
Automatic Web Display of your Histograms
If you save your histograms in these standard output locations, they can be automatically indexed and displayed on the run summary webpage.
/file1/diagnostics/run_0004378/
/file1/diagnostics/run_0004378/detector_SABAD1/
/file1/diagnostics/run_0004378/detector_SABAD1/channel_board10_connector08
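The three locations follow a run / detector / channel pattern. As a rough sketch, a job could build these paths with a small helper like the one below. The helper name diagnostics_path and the zero-padding widths (7 digits for the run, 2 digits for board and connector) are assumptions read off the example paths, not a documented interface.

```python
def diagnostics_path(run, detector=None, board=None, connector=None):
    """Build a histogram output location following the example paths.

    Padding widths are inferred from the examples in this manual.
    """
    path = "/file1/diagnostics/run_%07d" % run
    if detector is not None:
        path += "/detector_%s" % detector
        # A channel location only makes sense below a detector location.
        if board is not None and connector is not None:
            path += "/channel_board%02d_connector%02d" % (board, connector)
    return path
```

For example, diagnostics_path(4378, "SABAD1", 10, 8) reproduces the third location listed above.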
The option '--summarize-run' will print each histogram and add it to the run summary webpage.
% runProcess.py --cluster=pdsf --summarize-run -j adBasicFigs -r 4378
Batch Processing of a Job or an Entire Run
The '--batch' option will submit a job to the batch queue, instead of running it directly.
% runProcess.py --cluster=pdsf --batch --run-nuwa -j adBasicFigs -r 4378 -s 1
An entire run can be submitted with the option '--all-sequences'. If a sequence has already been processed or is in a 'failed' state, it will be skipped.
% runProcess.py --cluster=pdsf --batch --run-nuwa --all-sequences -j adBasicFigs -r 4378
After all the sequences have been processed (check the job state as described in 'Printing the Job Processing Status of a Run'), you can submit one job to add all the sequence histograms and generate figures:
% runProcess.py --cluster=pdsf --batch --add-stats --summarize-run -j adBasicFigs -r 4378
Testing a Command
The '--dry-run' option will show you what commands will be run, but will not actually execute them. It is good practice to run your command with this option before actually submitting it.
% runProcess.py --cluster=pdsf --dry-run --batch --run-nuwa --add-stats --all-sequences -j adBasicFigs -r 4378
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0001_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0001_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 1'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0002_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0002_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 2'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0003_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0003_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 3'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0004_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0004_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 4'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0005_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0005_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 5'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0006_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0006_14336.err $PROCESSMANAGERROOT/scripts/batchNuWa.sh /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt 'runProcess.py --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 6'
Process status: 0
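The dry-run output also shows where the batch log files for each sequence would be written. The sketch below reconstructs those log paths from the run and sequence numbers, which may help when looking for the log of a failed job. The directory bucketing (run number rounded down to the thousand, then to the hundred) is inferred from this one example, the function name batch_log_paths is hypothetical, and the trailing number (14336 here) is treated as an opaque tag because this manual does not say what it is.

```python
def batch_log_paths(job, run, seq, tag,
                    log_root="/project/projectdirs/dayabay/www/dybprod/logs"):
    """Return the (.out, .err) batch log paths for one sequence.

    The layout is an assumption read off the dry-run example output.
    """
    thousands = (run // 1000) * 1000  # e.g. 4378 -> 4000
    hundreds = (run // 100) * 100     # e.g. 4378 -> 4300
    stem = "%s/runs_%07d/runs_%07d/batch_%s_run%07d_seq%04d_%s" % (
        log_root, thousands, hundreds, job, run, seq, tag)
    return stem + ".out", stem + ".err"
```

With job 'adBasicFigs', run 4378, sequence 1 and tag '14336', this gives the same '-o' and '-e' paths as the first qsub command in the output above.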
Printing the Job Processing Status of a Run
The option '--print-state' will display the processing status of a job.
% runProcess.py --cluster=pdsf --print-state -j adBasicFigs -r 4378
Run 0004378: SUMMARY_DONE
Seq 0001: RUN_DONE STATS_DONE
Seq 0002: RUN_DONE STATS_DONE
Seq 0003: RUN_DONE STATS_DONE
Seq 0004: RUN_DONE STATS_DONE
Seq 0005: RUN_DONE STATS_DONE
Seq 0006: RUN_DONE STATS_DONE
Process status: 0
The run status "SUMMARY_DONE" means the figures have been generated and indexed for the diagnostics webpage. The sequence status "RUN_DONE" means the nuwa.py job has finished processing the sequence. The sequence status "STATS_DONE" means the sequence histograms have been added to the total for the run.
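If you want to check the state from a script, for instance before submitting the '--add-stats' job for a whole run, the printed status can be parsed. This is only a sketch: it assumes the output has one 'Run NNNN: ...' or 'Seq NNNN: ...' entry per line, and it knows only the three state names described above. The function names are hypothetical.

```python
def parse_print_state(text):
    """Parse '--print-state' output into (run_states, {sequence: states})."""
    run_states = []
    seq_states = {}
    for line in text.splitlines():
        head, sep, tail = line.strip().partition(":")
        parts = head.split()
        # Skip anything that is not 'Run <number>:' or 'Seq <number>:'.
        if not sep or len(parts) != 2 or not parts[1].isdigit():
            continue
        if parts[0] == "Run":
            run_states = tail.split()
        elif parts[0] == "Seq":
            seq_states[int(parts[1])] = tail.split()
    return run_states, seq_states


def ready_for_add_stats(seq_states):
    """True if at least one sequence is listed and all are RUN_DONE."""
    return bool(seq_states) and all(
        "RUN_DONE" in states for states in seq_states.values())
```

Lines such as the echoed command and 'Process status: 0' are ignored because they do not match the 'Run'/'Seq' entry form.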
Clearing a Failed Job
If a job fails, the job must be cleared before it can be resubmitted. Use the '--clear-sequence' option to clear the state and delete the ROOT output from a single sequence:
% runProcess.py --cluster=pdsf --clear-sequence -j adBasicFigs -r 4378 -s 1
% runProcess.py --batch --cluster=pdsf --run-nuwa -j adBasicFigs -r 4378 -s 1
Use the '--clear-run-total' option to delete the total histograms from the run, and clear the 'add-stats' state for remaking the run total:
% runProcess.py --cluster=pdsf --clear-run-total -j adBasicFigs -r 4378
% runProcess.py --cluster=pdsf --add-stats --summarize-run -j adBasicFigs -r 4378
Use the '--clear-all' option to completely clear the NuWa job results for each file, and clear the run total histograms:
% runProcess.py --cluster=pdsf --clear-all -j adBasicFigs -r 4378
% runProcess.py --batch --cluster=pdsf --run-nuwa --all-sequences -j adBasicFigs -r 4378
// After all NuWa jobs complete successfully, regenerate run total:
% runProcess.py --batch --cluster=pdsf --add-stats --summarize-run -j adBasicFigs -r 4378
Auto-Cataloging Files
By default, the spade/ingest data transfer catalog (DybPython.Catalog) is used to find the data files that exist for a given run number. This currently only works for real data files on PDSF. If you want to process simulated data, or data on other clusters, you can auto-generate a catalog by using the '--catalog=<path>' option. You can repeat the option to provide multiple directories, and all will be indexed. Files that do not match the data filename convention will be ignored.
% runProcess.py --catalog=/path/to/data/run4378 --catalog=/another/path/to/data/run4378 --batch --cluster=pdsf --run-nuwa --all-sequences -j adBasicFigs -r 4378
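To illustrate the idea of an auto-generated catalog, here is a small sketch that indexes several directories by run and sequence number and silently ignores files that do not match the naming convention. It is not the DybPython.Catalog implementation. The filename pattern is inferred from the single example filename in this manual, the function name index_data_files is hypothetical, and scanning only the top level of each directory (no recursion) is an assumption.

```python
import os
import re

# Assumed convention: '.<7-digit run>.' ... '._<4-digit sequence>.data'
DATA_NAME = re.compile(r"\.(\d{7})\..*\._(\d{4})\.data$")


def index_data_files(directories):
    """Return {run: {sequence: path}} for matching files in the directories."""
    catalog = {}
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            match = DATA_NAME.search(name)
            if match is None:
                continue  # not a data file by this convention: ignore it
            run, seq = int(match.group(1)), int(match.group(2))
            catalog.setdefault(run, {})[seq] = os.path.join(directory, name)
    return catalog
```

If the same run and sequence appear in more than one directory, this sketch keeps the file from the directory listed last; how the real option resolves such duplicates is not stated here.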