Manual:Run Processing

From Daya Bay

Scripts are available to help with extensive processing of data at the analysis clusters.

Processing a Single Raw Data File

To process a single raw data file, use the option '--run-nuwa' and specify the job (-j), the run number (-r) and the file sequence number (-s). The following command runs the job 'adBasicFigs' on sequence 1 of run 4378.

% --cluster=pdsf --run-nuwa -j adBasicFigs -r 4378 -s 1

You can also specify the raw data file directly with '-f':

% --cluster=pdsf --run-nuwa -j adBasicFigs -f /eliza7/dayabay/data/exp/dayabay/2010/TestDAQ/NoTag/0810/

If your job writes a ROOT output file (histograms, trees, etc.), you can download it from the run summary webpage after the job completes.

Summing the Histograms for a Run

The option '--add-stats' will add your job output histograms to the total histogram file for the run.

% --cluster=pdsf --add-stats -j adBasicFigs -r 4378 -s 1
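Conceptually, adding sequence histograms to the run total is a bin-by-bin sum. The Python sketch below illustrates only that idea, assuming a histogram is a plain list of bin contents keyed by name; the actual script works on ROOT files, and the names `add_to_total`, `adEnergy` and `nHits` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: histogram name -> list of bin contents.
Histograms = Dict[str, List[float]]


def add_to_total(total: Histograms, seq_hists: Histograms) -> Histograms:
    """Add one sequence's histograms into the run total, bin by bin.

    Histograms not yet in the total are copied in; histograms present in
    both must have the same number of bins.
    """
    for name, bins in seq_hists.items():
        if name not in total:
            total[name] = list(bins)  # copy so later sums don't alias the input
            continue
        if len(total[name]) != len(bins):
            raise ValueError(f"binning mismatch for histogram {name!r}")
        total[name] = [t + b for t, b in zip(total[name], bins)]
    return total


run_total: Histograms = {}
add_to_total(run_total, {"adEnergy": [1.0, 2.0, 0.0]})
add_to_total(run_total, {"adEnergy": [0.0, 1.0, 3.0], "nHits": [5.0]})
print(run_total)  # {'adEnergy': [1.0, 3.0, 3.0], 'nHits': [5.0]}
```

For real ROOT files, ROOT's `hadd` utility performs this kind of merge; this page does not say whether '--add-stats' uses it.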

The total file for the run can also be downloaded from the run summary webpage.

Automatic Web Display of your Histograms

If you save your histograms in the standard output locations, they can be automatically indexed and displayed on the run summary webpage.


The '--summarize-run' option prints each histogram and adds it to the run summary webpage.

% --cluster=pdsf --summarize-run -j adBasicFigs -r 4378

Batch Processing of a Job or an Entire Run

The '--batch' option will submit a job to the batch queue, instead of running it directly.

% --cluster=pdsf --batch --run-nuwa -j adBasicFigs -r 4378 -s 1

An entire run can be submitted with the option '--all-sequences'. If a sequence has already been processed or is in a 'failed' state, it will be skipped.

% --cluster=pdsf --batch --run-nuwa --all-sequences -j adBasicFigs -r 4378
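The skipping rule for '--all-sequences' can be pictured as a filter over the per-sequence states. The sketch below assumes those states are available as a Python dictionary from sequence number to state string (None for a sequence that was never processed). 'RUN_DONE' and 'STATS_DONE' are the sequence states described on this page; the literal 'FAILED' spelling of the failed state and the function name `sequences_to_submit` are assumptions.

```python
from typing import Dict, List, Optional

# Sequence states described on this page; the "FAILED" spelling is assumed.
PROCESSED_STATES = {"RUN_DONE", "STATS_DONE"}
FAILED_STATE = "FAILED"


def sequences_to_submit(states: Dict[int, Optional[str]]) -> List[int]:
    """Return the sequence numbers that --all-sequences would still submit.

    Sequences that are already processed or in the failed state are skipped.
    """
    return sorted(
        seq
        for seq, state in states.items()
        if state not in PROCESSED_STATES and state != FAILED_STATE
    )


run_states = {1: "RUN_DONE", 2: None, 3: "FAILED", 4: "STATS_DONE", 5: None}
print(sequences_to_submit(run_states))  # [2, 5]
```

Because failed sequences are skipped, they have to be cleared first (see Clearing a Failed Job) before a resubmission picks them up.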

After all the sequences have been processed (check the job state as described under Printing the Job Processing Status of a Run), you can submit a single job to add all the sequence histograms and generate the figures:

% --cluster=pdsf --batch --add-stats --summarize-run -j adBasicFigs -r 4378

Testing a Command

The '--dry-run' option shows which commands would be run without actually executing them. It is good practice to run your command with this option before submitting it for real.

% --cluster=pdsf --dry-run --batch --run-nuwa --add-stats --all-sequences -j adBasicFigs -r 4378
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0001_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0001_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 1'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0002_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0002_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 2'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0003_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0003_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 3'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0004_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0004_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 4'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0005_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0005_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 5'
[Running: ] qsub -o /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0006_14336.out -e /project/projectdirs/dayabay/www/dybprod/logs/runs_0004000/runs_0004300/batch_adBasicFigs_run0004378_seq0006_14336.err $PROCESSMANAGERROOT/scripts/ /eliza7/dayabay/scratch/dandwyer/NuWa-trunk-opt x86_64-slc4-gcc34-opt ' --cluster=pdsf --dry-run --run-nuwa --add-stats -j adBasicFigs -r 4378 -s 6'
Process status:  0
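The output above suggests how each batch submission is assembled: log files are grouped by run into per-thousand and per-hundred directories, and each sequence becomes one `qsub` call whose last argument carries the per-sequence options. The following Python sketch reproduces that shape and the print-instead-of-execute behavior of '--dry-run'. It is an illustration, not the script's actual code: the functions are hypothetical, the trailing number in the log names (14336 above) is treated as an arbitrary tag because its meaning is not documented here, and the wrapper and NuWa paths are placeholders since the wrapper name is not shown above. `shlex.join` requires Python 3.8 or later.

```python
import shlex
import subprocess
from typing import List

LOG_ROOT = "/project/projectdirs/dayabay/www/dybprod/logs"


def log_base(job: str, run: int, seq: int, tag: int) -> str:
    """Build a log path (without .out/.err) following the pattern seen above.

    The run is zero-padded to 7 digits and the sequence to 4; `tag` is the
    trailing number, whose meaning is assumed rather than documented.
    """
    thousands = f"runs_{run // 1000 * 1000:07d}"
    hundreds = f"runs_{run // 100 * 100:07d}"
    name = f"batch_{job}_run{run:07d}_seq{seq:04d}_{tag}"
    return f"{LOG_ROOT}/{thousands}/{hundreds}/{name}"


def qsub_command(job: str, run: int, seq: int, tag: int, wrapper: str,
                 nuwa_dir: str, platform: str, options: List[str]) -> List[str]:
    """Assemble a qsub argument list shaped like the dry-run output above."""
    base = log_base(job, run, seq, tag)
    # The per-sequence options are passed to the wrapper as a single argument.
    job_options = " ".join(options + ["-j", job, "-r", str(run), "-s", str(seq)])
    return ["qsub", "-o", base + ".out", "-e", base + ".err",
            wrapper, nuwa_dir, platform, job_options]


def run_command(cmd: List[str], dry_run: bool) -> int:
    """Print the command; execute it only when dry_run is False."""
    print("[Running: ]", shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.call(cmd)  # no shell, so $VARIABLES are not expanded


for seq in (1, 2):
    cmd = qsub_command(
        job="adBasicFigs", run=4378, seq=seq, tag=14336,
        wrapper="/path/to/batch_wrapper",  # hypothetical: real name not shown
        nuwa_dir="/path/to/NuWa-trunk-opt",  # hypothetical install location
        platform="x86_64-slc4-gcc34-opt",
        options=["--cluster=pdsf", "--dry-run", "--run-nuwa", "--add-stats"],
    )
    run_command(cmd, dry_run=True)
```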

Printing the Job Processing Status of a Run

The option '--print-state' will display the processing status of a job.

% --cluster=pdsf --print-state -j adBasicFigs -r 4378
Run 0004378: SUMMARY_DONE
Process status:  0

The run status "SUMMARY_DONE" means the figures have been generated and indexed for the diagnostics webpage. The sequence status "RUN_DONE" means the job has finished processing the sequence. The sequence status "STATS_DONE" means the sequence histograms have been added to the total for the run.
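If you want to check these states from your own scripts, the sketch below parses the run line printed by '--print-state' and tests whether a run is ready for '--add-stats'. It assumes the per-sequence states have already been gathered into a dictionary (the format in which '--print-state' lists sequences is not shown here); the 'FAILED' spelling in the example and the function names are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# Meanings of the states described on this page.
STATE_MEANINGS = {
    "SUMMARY_DONE": "run figures generated and indexed for the diagnostics webpage",
    "RUN_DONE": "job finished processing the sequence",
    "STATS_DONE": "sequence histograms added to the run total",
}

RUN_LINE = re.compile(r"^Run (\d+): (\w+)$")


def parse_run_state(line: str) -> Optional[Tuple[int, str]]:
    """Parse a line like 'Run 0004378: SUMMARY_DONE' into (run, state)."""
    match = RUN_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def ready_for_add_stats(seq_states: Dict[int, str]) -> bool:
    """True once every sequence has at least finished its NuWa job."""
    return bool(seq_states) and all(
        state in ("RUN_DONE", "STATS_DONE") for state in seq_states.values()
    )


print(parse_run_state("Run 0004378: SUMMARY_DONE"))  # (4378, 'SUMMARY_DONE')
print(ready_for_add_stats({1: "RUN_DONE", 2: "STATS_DONE"}))  # True
print(ready_for_add_stats({1: "RUN_DONE", 2: "FAILED"}))  # False
```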

Clearing a Failed Job

If a job fails, the job must be cleared before it can be resubmitted. Use the '--clear-sequence' option to clear the state and delete the ROOT output from a single sequence:

% --cluster=pdsf --clear-sequence -j adBasicFigs -r 4378 -s 1
% --batch --cluster=pdsf --run-nuwa -j adBasicFigs -r 4378 -s 1

Use the '--clear-run-total' option to delete the run's total histograms and clear the 'add-stats' state so that the run total can be remade:

% --cluster=pdsf --clear-run-total -j adBasicFigs -r 4378
% --cluster=pdsf --add-stats --summarize-run -j adBasicFigs -r 4378

Use the '--clear-all' option to completely clear the NuWa job results for each file and the run total histograms:

% --cluster=pdsf --clear-all -j adBasicFigs -r 4378
% --batch --cluster=pdsf --run-nuwa --all-sequences -j adBasicFigs -r 4378
# After all NuWa jobs complete successfully, regenerate the run total:
% --batch --cluster=pdsf --add-stats --summarize-run -j adBasicFigs -r 4378
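The three clearing procedures above can be summarized as ordered lists of options to pass to the processing script. This Python sketch (the function `recovery_steps` and its scope names are hypothetical) only builds the option lists shown on this page and does not run anything. The order of options within each list is assumed not to matter, since the examples above place '--batch' in different positions.

```python
from typing import List, Optional


def recovery_steps(scope: str, job: str, run: int,
                   seq: Optional[int] = None) -> List[List[str]]:
    """Return the option lists for clearing and reprocessing, in order.

    scope is "sequence", "run-total" or "all", matching --clear-sequence,
    --clear-run-total and --clear-all on this page.
    """
    common = ["--cluster=pdsf", "-j", job, "-r", str(run)]
    if scope == "sequence":
        if seq is None:
            raise ValueError("clearing a sequence needs a sequence number")
        seq_opt = ["-s", str(seq)]
        return [
            ["--clear-sequence"] + common + seq_opt,
            ["--batch", "--run-nuwa"] + common + seq_opt,
        ]
    if scope == "run-total":
        return [
            ["--clear-run-total"] + common,
            ["--add-stats", "--summarize-run"] + common,
        ]
    if scope == "all":
        return [
            ["--clear-all"] + common,
            ["--batch", "--run-nuwa", "--all-sequences"] + common,
            # Submit this only after every NuWa job has completed successfully.
            ["--batch", "--add-stats", "--summarize-run"] + common,
        ]
    raise ValueError(f"unknown scope: {scope!r}")


for step in recovery_steps("sequence", "adBasicFigs", 4378, seq=1):
    print(" ".join(step))
```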

Auto-Cataloging Files

By default, the spade/ingest data transfer catalog (DybPython.Catalog) is used to find the data files that exist for a given run number. This currently works only for real data files on PDSF. If you want to process simulated data, or data on other clusters, you can auto-generate a catalog with the '--catalog=<path>' option. You can give the option several times to index multiple directories. Files that do not match the data filename convention are ignored.

% --catalog=/path/to/data/run4378 --catalog=/another/path/to/data/run4378 --batch --cluster=pdsf --run-nuwa --all-sequences -j adBasicFigs -r 4378
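Auto-cataloging amounts to scanning directories and keeping only files whose names follow the data filename convention. The Python sketch below shows that idea; the regular expression is an ASSUMED approximation of the convention (a 7-digit run field and a `._<4-digit sequence>.data` ending), the example filename is illustrative only, and the sketch walks subdirectories, which may differ from what '--catalog' actually does. `build_catalog` is a hypothetical name.

```python
import os
import re
import tempfile
from typing import Dict, Iterable, Tuple

# ASSUMED filename convention: "...<7-digit run>..._<4-digit sequence>.data",
# e.g. "daq.NoTag.0004378.Physics.SAB-AD1.SFO-1._0001.data" (illustrative).
# Check this pattern against real file names before relying on it.
DATA_FILE = re.compile(r"\.(\d{7})\..*\._(\d{4})\.data$")


def build_catalog(directories: Iterable[str]) -> Dict[Tuple[int, int], str]:
    """Index data files under `directories` by (run, sequence).

    Files whose names do not match DATA_FILE are ignored. A later directory
    overwrites an earlier match for the same (run, sequence).
    """
    catalog: Dict[Tuple[int, int], str] = {}
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for filename in filenames:
                match = DATA_FILE.search(filename)
                if match is None:
                    continue
                key = (int(match.group(1)), int(match.group(2)))
                catalog[key] = os.path.join(dirpath, filename)
    return catalog


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("daq.NoTag.0004378.Physics.SAB-AD1.SFO-1._0001.data", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    print(sorted(build_catalog([demo_dir])))  # [(4378, 1)]
```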