Nuwa.py

From Daya Bay

The main program used for all offline jobs is nuwa.py.

Basic usage

One runs nuwa.py like:

nuwa.py [options] [-m Python.Module] [-m "Module.WithOptions opt1 opt2"] [-o output.root] [input1.root [input2.root]]

A brief help screen showing the options is available by running:

nuwa.py -h

Handling Long Command Lines

When a job requires a very long command line, typing it out is inconvenient. nuwa.py supports reading command line options from a text file. Such files are called "at" or "plus" files because they are passed to nuwa.py by prefixing the file name with an "@" or "+" sign.

The rules for writing such files are simple:

  1. One main command line option and any arguments per line
  2. No blank lines (last line may end in a newline)
  3. No quoting unless expected by the module

For example:

shell> cat <<EOF > cmdline.txt
-n 10
-m DybPython.TestMod1 'single string option with spaces'
-m DybPython.TestMod1 three individual options
-m DybPython.TestMod2
-m DybPython.TestMod3
EOF

This example might then be run like:

nuwa.py +cmdline.txt

or:

nuwa.py @cmdline.txt
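
When the option list is produced by a script rather than typed by hand, the file can also be written programmatically. The sketch below is a hypothetical helper and not part of nuwa.py: the name write_cmdline_file is an assumption, and it enforces only the "no blank lines" rule from the list above.

```python
import os
import tempfile


def write_cmdline_file(path, option_lines):
    """Write one option (plus its arguments) per line, rejecting blank lines.

    Hypothetical helper, not part of nuwa.py.
    """
    cleaned = []
    for line in option_lines:
        line = line.rstrip("\n")
        if not line.strip():
            raise ValueError("blank lines are not allowed in an at/plus file")
        cleaned.append(line)
    with open(path, "w") as handle:
        # A single trailing newline after the last line is permitted.
        handle.write("\n".join(cleaned) + "\n")
    return len(cleaned)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "cmdline.txt")
    count = write_cmdline_file(target, [
        "-n 10",
        "-m DybPython.TestMod1 three individual options",
        "-m DybPython.TestMod2",
    ])
    print("wrote", count, "lines to", target)
```

The resulting file would then be passed to nuwa.py with the "@" or "+" prefix exactly as shown above.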

Usage from inside a job module

This is somewhat advanced, but a Job Option Module can access the main NuWa object that is the heart of nuwa.py. See the topic on Accessing the NuWa Object from a Job Module.

I/O

You can specify input and output files in several ways, or specify none at all. Examples are below. The file extension is important because it determines how the file is handled.

Input format file extensions
root 
standard offline format using RootIO
data 
online byte stream using RawDataIO
rraw 
online byte stream packed into Root RAW format using RawDataIO (obsolete; raw data can be converted to the standard .root format)
list 
a list of files with the above extensions, one per line. Blank lines or lines starting with "#" are ignored.
Output format file extensions
root 
standard offline format using RootIO
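
The extension rules above can be sketched in a few lines of Python. This is an illustration only, not the code nuwa.py uses: the names INPUT_HANDLERS, parse_list_text and handler_for are hypothetical, and stripping leading whitespace before testing for "#" is an assumption.

```python
import os

# Extension -> I/O package, as given in the input format list above.
INPUT_HANDLERS = {
    ".root": "RootIO",
    ".data": "RawDataIO",
    ".rraw": "RawDataIO",
}


def parse_list_text(text):
    """Return the file names in a .list file body, skipping blank and '#' lines."""
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        names.append(stripped)
    return names


def handler_for(filename):
    """Look up the I/O package for a file name by its extension."""
    extension = os.path.splitext(filename)[1]
    if extension not in INPUT_HANDLERS:
        raise ValueError("unrecognized input extension: %r" % extension)
    return INPUT_HANDLERS[extension]


if __name__ == "__main__":
    body = "# first run\nrun1.root\n\nrun2.data\n"
    for name in parse_list_text(body):
        print(name, "->", handler_for(name))
```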

Single file output

To put all data into a single output file, use:

nuwa.py -o output.root [...]

Single or series of input files

One can read in a single file or a series of files by simply specifying them on the command line.

nuwa.py [...] input1.root input2.root

Input and Output Stream Maps

Stream maps are available for RootIO (.root) files only. RootIO maps locations in the TES to trees in files. These are called "streams". Normally, all streams are associated with a single file (or series of input files). However, it is possible to explicitly associate some streams with specific files. This is done with a Stream Map. One can specify stream maps to nuwa.py like:

nuwa.py [...] [-O "{<OUTPUT STREAM MAP>}"] [-I "{<INPUT STREAM MAP>}"]

Note, the options are capitalized. The stream maps themselves are written as Python dictionaries and map a TES location to a file. An example (output) stream map is:

-O '{"/Event/Gen/GenEvent":"genevents.root", \
     "/Event/Sim/SimEvent":"simevents.root", \
     "default"            :"all-the-rest.root"}'

An input stream map looks the same but uses the "-I" flag. Note there is a special "stream" name called "default". When a stream map is specified, any stream not listed is associated with the "default" file. You can also specify this "default" with "-o default.root" for output. For input, listing one or more files on the command line serves as the default.
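
Because a stream map is an ordinary Python dictionary, it can be built and inspected in Python before being placed on the command line. The following is a minimal sketch under stated assumptions: resolve_stream and stream_map_option are hypothetical helper names, and the lookup only mimics the "default" fallback described above rather than reproducing RootIO's own logic.

```python
import json
import shlex


def resolve_stream(stream_map, tes_path):
    """Return the file for a TES path, falling back to the "default" entry."""
    if tes_path in stream_map:
        return stream_map[tes_path]
    # Raises KeyError if the map has no "default" entry either.
    return stream_map["default"]


def stream_map_option(flag, stream_map):
    """Format a stream map as a shell-quoted command line fragment.

    json.dumps output for a dict of plain strings is also a valid Python
    dictionary literal, which is the form the option expects.
    """
    return "%s %s" % (flag, shlex.quote(json.dumps(stream_map)))


if __name__ == "__main__":
    output_map = {
        "/Event/Gen/GenEvent": "genevents.root",
        "/Event/Sim/SimEvent": "simevents.root",
        "default": "all-the-rest.root",
    }
    print(resolve_stream(output_map, "/Event/Gen/GenEvent"))
    print(resolve_stream(output_map, "/Event/Some/OtherStream"))
    print(stream_map_option("-O", output_map))
```

Generating the option text this way avoids hand-balancing quotes and braces when the map grows large.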

A caution about the RegistrationSequence stream

The I/O system keeps the various streams in sync by using a special RegistrationSequence stream. Currently you must choose one and only one file to which the RegistrationSequence stream is associated. You will not be able to read streams if there is no RegistrationSequence stream! So, if one wants to split streams into separate files, keep this in mind.

One strategy is to put the RegistrationSequence stream in its own file so it can be used with any other files that have been split.


-O '{"/Event/Gen/GenEvent":"genevents.root", \
     "/Event/Sim/SimEvent":"simevents.root", \
     "/Event/RegistrationSequence":"rs.root", \
     "default"            :"all-the-rest.root"}'

Then, you can read back, say, just the GenEvent stream with:

-I '{"/Event/Gen/GenEvent":["genevents.root"], \
     "/Event/RegistrationSequence":["rs.root"]}'

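The caution above can be turned into a quick pre-flight check on an input stream map. This is a hedged sketch, not part of nuwa.py: the helper names are hypothetical, and the precedence assumed here (an explicit RegistrationSequence entry first, then a "default" entry, then files listed on the command line) is an assumption based on the fallback rules described earlier.

```python
RS_STREAM = "/Event/RegistrationSequence"


def registration_sequence_files(input_map, default_files=()):
    """Return the files that would supply the RegistrationSequence stream.

    input_map maps TES paths to lists of files, as in the -I example above.
    default_files stands for input files listed directly on the command line.
    """
    if RS_STREAM in input_map:
        return list(input_map[RS_STREAM])
    if "default" in input_map:
        return list(input_map["default"])
    return list(default_files)


def check_input_map(input_map, default_files=()):
    """Raise ValueError if no file would supply the RegistrationSequence stream."""
    if not registration_sequence_files(input_map, default_files):
        raise ValueError("no file provides %s" % RS_STREAM)


if __name__ == "__main__":
    input_map = {
        "/Event/Gen/GenEvent": ["genevents.root"],
        RS_STREAM: ["rs.root"],
    }
    check_input_map(input_map)
    print(registration_sequence_files(input_map))
```
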
Input and Output of histogram files

We use the StatisticsSvc to manage our histograms and ntuples.

Configure from a Job Option Module

See this topic.

Configure from command line

Alternatively you can tell the StatisticsSvc what file to use via the command line.

nuwa.py --output-stats='{"file1":"myhists.root"}' ...

Again, "file1" must match what your algorithm code has assumed.

Input and Output

StatisticsSvc can load preexisting files so that your algorithm code can read in, and possibly update, their objects. For input, the two methods above are used, with "output" changed to "input".
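
When jobs are launched from a script, the statistics option can be assembled from a dictionary. The sketch below is illustrative only: stats_option is a hypothetical helper, and the "--input-stats" spelling is inferred from the statement above that "output" is changed to "input", so check it against "nuwa.py -h" before relying on it.

```python
import json


def stats_option(direction, files):
    """Build a --input-stats=... or --output-stats=... argument.

    files maps the name assumed by the algorithm code (for example
    "file1") to a ROOT file name.
    """
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    return "--%s-stats=%s" % (direction, json.dumps(files))


if __name__ == "__main__":
    # Suitable as one element of an argument list, where no shell quoting is needed.
    print(stats_option("output", {"file1": "myhists.root"}))
```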

Filtering, Pruning and Stripping

There are several methods to reduce the amount of data in a file.

Filtering
filtering means removing individual objects on a case-by-case basis (e.g., removing any ReadoutHeader that does not form a delayed coincidence with another). See the topic on Filtering Data.
Pruning
pruning means removing some sub-objects from a HeaderObject. For example, you can remove all SimHits from a SimHeader, but leave the header object and its particle history and unobservable statistics in place. See the topic on Pruning Data.
Stripping
stripping means removing all header objects of some type. For example, one may want to throw away all MC truth and only save the final ReadoutHeaders. See the topic on Stripping Data.
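
The difference between the three methods can be illustrated with a toy model. This is not NuWa code: header objects are represented here as plain Python dictionaries with hypothetical "type" and "hits" keys, purely to show which part of the data each method removes.

```python
def filter_objects(headers, keep):
    """Filtering: drop individual objects for which keep(header) is false."""
    return [h for h in headers if keep(h)]


def prune(headers, header_type, sub_object):
    """Pruning: empty one kind of sub-object but keep the header itself."""
    pruned = []
    for h in headers:
        if h["type"] == header_type:
            # Copy the header so the caller's data is left untouched.
            h = dict(h, **{sub_object: []})
        pruned.append(h)
    return pruned


def strip(headers, header_type):
    """Stripping: remove every header of the given type."""
    return [h for h in headers if h["type"] != header_type]


if __name__ == "__main__":
    headers = [
        {"type": "SimHeader", "hits": [1, 2, 3]},
        {"type": "ReadoutHeader", "hits": []},
        {"type": "ReadoutHeader", "hits": [], "coincident": True},
    ]
    print(filter_objects(headers, lambda h: h["type"] != "ReadoutHeader"
                         or h.get("coincident", False)))
    print(prune(headers, "SimHeader", "hits"))
    print(strip(headers, "SimHeader"))
```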

More Information

DocDb:
