During the past decade, many HEP experiments have based their interactive data analysis on the following steps (see the top half of Figure 2.1):
1. Raw and reconstructed data are stored in banks in an experiment-specific hierarchical format. Most of the time, the data are spread over many different files on several distinct hosts.
2. These data are distilled and reclustered to obtain a more compact and thus more efficient representation. This permits a significant speed-up of the downstream analysis compared with using the data in their raw form as described in point 1. This format corresponds to the so-called HBOOK Ntuples. One drawback of this method is that the direct relation to the raw event data is lost.
3. Ntuple files are analysed interactively with programs like PAW. Plots of physics variables are produced by extracting information contained in one or more of the Ntuple rows or columns, binning it in HBOOK histograms, and then operating on these histograms to obtain the best representation.
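The last step above can be illustrated with a minimal Python sketch. It is not PAW or HBOOK code: it only shows the idea of extracting one column from a small table and binning it into a fixed-width histogram. The column names (`pt`, `eta`), the sample values, and the helper `fill_histogram` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def fill_histogram(values: List[float], nbins: int, lo: float, hi: float) -> List[int]:
    """Count values into nbins equal-width bins on [lo, hi).

    Values outside the range are ignored in this sketch.
    """
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            index = int((v - lo) / width)
            # Guard against floating-point rounding at the upper edge.
            counts[min(index, nbins - 1)] += 1
    return counts


# Hypothetical Ntuple: one row per event, one column per variable.
rows: List[Dict[str, float]] = [
    {"pt": 0.5, "eta": -1.2},
    {"pt": 1.5, "eta": 0.3},
    {"pt": 1.7, "eta": 0.9},
    {"pt": 3.2, "eta": -0.4},
    {"pt": 4.5, "eta": 2.1},  # outside the histogram range below
]

# Extract one column, then bin it.
pt_column = [row["pt"] for row in rows]
hist = fill_histogram(pt_column, nbins=4, lo=0.0, hi=4.0)
print(hist)
```

Because the histogram is built only from the extracted column, nothing in `hist` points back to the event each entry came from. This mirrors, on a small scale, the loss of the link to the raw event data noted above.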
The advantage of Ntuples is that their format is known and simple enough that a general-purpose analysis tool such as PAW can cope with data coming from any experiment. On the other hand, since no link to the original data exists, Ntuples limit the data structures physicists can use in their analysis. On top of that, since the data were copied from the original files into a dedicated Ntuple file, most Ntuple files had to be regenerated each time the original dataset changed.
Two kinds of Ntuples exist. Row-Wise Ntuples transform a complex data structure into a simple tabular form. Column-Wise Ntuples, on the other hand, improve the flexibility of the Ntuple data model by allowing the definition of variable-length items, but they are still difficult to use for describing complex data structures such as those of the reconstructed data. Moreover, the Ntuple query language is rather non-intuitive and complex to master.
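The difference between the two data models can be sketched in plain Python. This is only an illustration of the layouts described above, not the HBOOK storage format or its API. The variable names (`energy`, `ntrack`, `track_pt`), the values, and the padding to a fixed number of track slots are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Row-wise sketch: every event is one fixed-length row of numbers.
# Assumption for illustration: a variable number of tracks is padded
# with zeros up to three slots so that all rows have the same length.
# Columns: energy, ntrack, pt1, pt2, pt3
row_wise: List[Tuple[float, ...]] = [
    (10.0, 2.0, 1.1, 0.7, 0.0),
    (25.0, 3.0, 2.4, 1.9, 0.3),
    (7.5, 0.0, 0.0, 0.0, 0.0),
]


def rows_to_columns(rows: List[Tuple[float, ...]]) -> Dict[str, list]:
    """Regroup fixed-length rows into per-variable columns.

    The track pt column holds one variable-length list per event,
    so the zero padding of the row-wise layout is dropped.
    """
    columns: Dict[str, list] = {"energy": [], "ntrack": [], "track_pt": []}
    for energy, ntrack, *pts in rows:
        n = int(ntrack)
        columns["energy"].append(energy)
        columns["ntrack"].append(n)
        columns["track_pt"].append(list(pts[:n]))
    return columns


# Column-wise sketch: one column per variable, variable-length items allowed.
column_wise = rows_to_columns(row_wise)
print(column_wise["track_pt"])
```

Even in the column-wise form, each event is still a flat set of variables. Nested structures, such as tracks that own their own lists of hits, do not fit naturally, which is the limitation noted above for reconstructed data.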