EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
CERN/LHCC 98-11
LCB Status Report/RD45
6 April, 1998
The RD45 collaboration
CERN, Geneva, Switzerland
This document has been produced for the April 1998 LCB review of the RD45 project. In this paper, we present the status of the project, including a summary of the responses to the milestones set at the 1997 review by the LCB, suggestions for future activities and a revised risk analysis of the current RD45 strategy.
In addition, we describe activities undertaken within various experiments and projects, including NA45, COMPASS, ATLAS, CMS, ALICE, LHCb, BaBar, BELLE and Zeus.
RD45 documents may be obtained through the Web (see http://wwwinfo.cern.ch/asd/cernlib/rd45/index.html) or via e-mail request to the spokesman.
The RD45 Collaboration
David Malon
Argonne National Laboratory, Argonne, Illinois, USA
Martin Purschke
Brookhaven National Laboratory, USA
Julian Bunn, Harvey Newman, Rick Wilkinson
Caltech, USA
Eva Arderiu Ribera, Antoni Baranski, Javier Conde, Dirk Düllmann, Bernardino Ferrero Merlino, Gunter Folger, Marcin Nowak, Jamie Shiers (spokesman)
CERN/IT
Geneva, Switzerland
Pavel Binko, Koen Holtman, Vincenzo Innocente, Arther Schaffer, Lucia Silvestris, Lassi Tuura, Ian Willers
CERN/EP
Geneva, Switzerland
Martin Gasthuber
DESY/Hamburg, Germany
Andreas Pfeiffer
University of Heidelberg, Germany
David Quarrie
Lawrence Berkeley National Laboratory
Berkeley, CA, USA
Youhei Morita
KEK, Oho,
Tsukuba, Ibaraki, 305 Japan
Stansislaw Jagielski
Faculty of Physics and Nuclear Techniques,
UMM Krakow, Poland
Alexei Klimentov
MIT, USA
Christian Arnault
Laboratoire de l'Accelerateur Lineaire
Orsay, France
Jacek Becla, Gabriele Cosmo
Stanford Linear Accelerator Center, CA, USA
Sunanda Banerjee
Tata Institute of Fundamental Research
Bombay, India
Simona Rolli, Krzysztof Sliwa
Tufts University, USA
Anwarul Hasan
ETH Zurich, Switzerland and University of Cyprus, Cyprus
Elisa Cargnel
University of Venice, Italy
TABLE OF CONTENTS
1 Executive Summary
1.1 Summary of Activities During Third Year
1.2 Conclusions
2 Overview of Past Activities
2.1 Activities of the First Year
2.2 Activities of the Second Year
2.3 Activities of the Third Year
2.4 Conclusions
3 Milestones from the March 1997 LCB Review
4 The Project Execution Plan (PEP)
4.1 ATLAS Addendum
4.2 CMS Addendum
5 Risk Analysis
5.1 Introduction
5.2 Key Risk Factors
5.3 Conclusions from Risk Analysis
5.4 The Use of Commodity Solutions
6 Work on Referees' Recommendations
6.1 Data Replication
6.1.1 Tests Across Heterogeneous Platforms
6.1.2 Tests With Large Numbers of Images
6.1.3 Wide-Area Tests
6.1.4 Enhancement Requests
6.1.5 Conclusions on Replication
6.2 Mass Storage Interface
6.2.1 Requirements for Proof-of-Concept Prototype
6.2.2 Requirements for Production Version of Objectivity/DB - HPSS Interface
6.2.3 Conclusions on MSS Interface
6.3 Evaluation of an Alternative ODBMS
6.3.1 Scalability Issues
6.3.2 Architectural Issues
6.3.3 Performance Comparisons
6.3.4 Porting Issues
6.3.5 Conclusions on an Alternative ODBMS
6.4 ODBMS-based Data Analysis
6.5 Novice Guide for End Users
6.6 Objectivity/DB Training
6.7 ODBMS-based Applications
6.8 Conclusions
7 Milestone 1
7.1 Requirements
7.2 Scalability
7.3 Simulation
7.4 Reconstruction
7.5 Analysis
7.6 Simulation of Multi-User Analysis
7.7 Data Reclustering
7.8 Data Import/Export
7.9 Production Database Services
7.10 CMS
7.11 NA45
7.12 BaBar
7.12.1 Introduction
7.12.2 Reprocessing
7.12.3 Data Distribution
7.13 Conclusions
8 Milestone 2
8.1 Introduction
8.2 Naming
8.3 Collections
8.3.1 Introduction
8.3.2 A single class for the user interface
8.3.3 STL-like interface, including a forward iterator
8.3.4 Support for collections of up to 10^9 - 10^11 events
8.3.5 Support for a "description" of the collection
8.3.6 Set-style operations based on a unique event identifier
8.3.7 Conclusions on Collections
8.4 Physics Analysis by End Users
8.4.1 The Traditional Analysis Model
8.4.2 The LHC++ Analysis model
8.4.3 User Defined Analysis Attributes
8.5 Data Analysis - a Physicist's Perspective
8.6 Experience at ZEUS
8.7 NA48
8.8 Schema Handling Issues
8.8.1 Schema Consistency between Separated Federations
8.8.2 Named Schema
8.8.3 Private User Schema
8.8.4 Dynamic Schema Binding
8.8.5 Conclusions on Schema Handling
8.9 Conclusions
9 Milestone 3
9.1 ODBMS - MSS Interface
9.2 The Objectivity/DB - HPSS Interface
9.2.1 Introduction to HPSS
9.2.2 Control and Data Flow in HPSS
9.2.3 The Objectivity/DB - HPSS Interface
9.2.4 The Objectivity/DB - HPSS Installation at CERN
9.2.5 The Objectivity/DB - HPSS Configuration at SLAC (BaBar)
9.2.6 Functionality Tests
9.3 Trace of Objectivity/DB I/O Operations
9.3.1 Performance Measurements
9.3.2 Alternative Interfaces
9.3.3 Conclusions on MSS Interface
9.4 CMS Test Beam Experiences
9.4.1 Introduction
9.4.2 The H2 Test-Beam
9.4.3 The X5B Test-Beam
9.4.4 Conclusions on CMS Test Beam Activities
9.5 Requirements for 1998 and Beyond
9.5.1 COMPASS
9.6 Conclusions
10 Extensions to ODMG-compliant Databases
10.1 HepODBMS Extensions
10.2 HepExplorer Modules
10.3 Calibration Database Prototypes
10.4 Database Administration Issues
11 Tests of the Objectivity/DB Java Binding
11.1 Introduction
11.2 Impact on application development
11.3 Impact on DB/types
11.4 Tests of Java Agents
11.5 Summary
12 Objectivity/DB Enhancement Requests
12.1 Support for STL-based Collection Classes
12.2 ODBMS to MSS Coupling
12.3 Architectural Changes to Support VLDBs
12.4 Schema Handling Enhancements
12.5 Access Control Support
12.6 ODMG Compliance
12.7 Support for the Linux Operating System
13 Standards Activities
13.1 ODMG-related Activities
14 General Database Activities
14.1 Objectivity/DB Workshops
14.2 Objectivity/DB User Meeting
14.3 Licensing Issues
14.4 Objectivity/DB Support
15 Other Database Developments
16 Use Of Objectivity/DB in HEP
16.1 AMS
16.2 ALEPH
16.3 ALICE
16.4 ATLAS
16.5 BaBar
16.6 BELLE
16.7 CDF
16.8 CHORUS
16.9 CMS
16.10 COMPASS
16.11 LHCb
16.12 LEP Data Archive
16.13 NA45
16.14 NA48
16.15 RHIC Experiments
16.16 ZEUS
17 Future Activities
17.1 Executive Summary
17.2 Introduction
17.3 Production Services
17.4 Research Activities
17.5 Summary
18 Proposed Milestones for 1998-1999
19 Conclusions
20 Previous Milestones and Recommendations
20.1 Milestones at the end of the first year (1996)
20.2 Initial Milestones and Recommendations (1995)
21 Glossary
22 References
Executive Summary
Since 1995, the RD45 project has been investigating solutions to the problem of providing persistency to physics data of the LHC experiments, assumed to be in the form of (collections of) objects. At the end of the first year, a potential solution, based on standards-conforming products, was presented. During the second year, this possible solution was studied further - performance comparisons with existing systems and tests of functionality and scalability were carried out and production demonstrations were made.
The proposed solution is based on a combination of two commercial products, namely Objectivity/DB and HPSS, together with a small amount of HEP-specific code, distributed via LHC++.
In this report, we summarise the activities of the RD45 project during the past year, including progress on the milestones set by the LCB, experience with experiments such as NA45 that have already adopted elements of the solution and other projects such as GEANT-4, together with proposals for future activities.
Objectivity/DB based solutions are being used in production by a growing number of experiments at CERN and outside, and have been selected for a number of high-data volume experiments starting around the year 1999 (BaBar, COMPASS etc.). Based on the results of our research and the adoption of the proposed solution by the community, it is our conclusion, therefore, that the primary goals of the RD45 project have been reached. Further work is clearly required, such as the setting up of full-scale, general purpose, production services. We propose that such services be established in close collaboration with the HPSS-based services offered by IT/PDP group. Associated with such services will be on-going discussions with Objectivity concerning future enhancements, and the provision of HEP-specific class libraries and utilities. In addition, a number of further R&D activities will be required. The precise activities and their respective timescales are currently being discussed with the LHC experiments.
During the past two years, the RD45 collaboration has focused on ODMG-compliant solutions, and has demonstrated the use of a standard, off-the-shelf ODBMS product for storing and managing HEP event data in a production environment.
Despite the focus on ODBMSs, RD45 continues to follow progress in other areas, such as persistent object managers, Object-Relational Databases, including object-oriented offerings from the traditional relational (RDBMS) vendors and so forth.
RD45 continues to participate in the Object Database Management Group (ODMG) - the standards body that defines and maintains the various standards for ODBMSs, as well as the Object Management Group (OMG), and the IEEE Computer Society Executive Committee on Mass Storage Systems (IEEE MSS EC).
During the past year the RD45 collaboration has:
The RD45 collaboration has evaluated and compared a number of potential solutions to the problem of providing object persistency services for the event data of the LHC experiments. The preferred solution, based on the use of Objectivity/DB and HPSS, together with a small number of HEP-specific extensions, has been adopted by a number of experiments, both at CERN and at other laboratories. Objectivity/DB-based solutions have been used in production by a number of experiments at various laboratories. The next logical step is to offer production services based upon Objectivity/DB and HPSS, and to use the experience gained with these services as input to a further round of research and development prior to the final decisions regarding the initial LHC computing models. It is proposed that further HEP-specific developments and enhancements to Objectivity/DB itself be handled as part of the day-to-day operation of these production services.
The RD45 project has now reached the end of its third year of research. The bulk of its activities have been oriented towards the potential use of standards-conforming, widely-used solutions. The key activities of the past 3 years are listed below.
During its first year, the RD45 collaboration investigated several different approaches to solving the object persistency problem, including language extensions, persistent object managers and ODMG-compliant ODBMS products.
Using the definition of an ODBMS from the "Object-Oriented Database Manifesto" [15], it was our conclusion that HEP requires a system offering all of the facilities listed as mandatory in this manifesto, all of the features listed as optional, and indeed several others besides!
RD45 was also able to identify an ODBMS product with an architecture offering the required scalability, namely Objectivity/DB. This product has been used for the majority of the prototypes built since, as well as for NA45 physics production runs.
The activities of 1995 can thus be summarised as follows:
Further details of the activities during the first year can be found in the RD45 status report for 1995 [11].
During the second year, the activities of which are described in [4], RD45 continued to focus on ODBMS-based solutions, whilst maintaining a watch on potentially alternative solutions, such as light-weight object managers and object-enhancements to relational databases. The major milestones during the 2nd year were:
During the past year, the RD45 collaboration has concentrated on demonstrating the feasibility of using an ODBMS-based solution for object persistency in typical production scenarios. This includes the complete chain from data acquisition or simulation through to end-user analysis. Although the data volumes involved have clearly not been at the level anticipated at the LHC, the basic functionality has been shown to work successfully in a production environment.
In addition to these activities, there have been a significant number of changes that are worthy of note. Most importantly, several key experiments that are due to take data in 1999 and beyond have formally decided to use Objectivity/DB (and HPSS) as key elements of their offline strategy. These experiments include COMPASS and NA45 at CERN, BaBar at SLAC, and the RHIC experiments at Brookhaven. All of these experiments will take very significant quantities of data - typically in the 100-500TB/year range - at data rates of around 35MB/second. In other words, they present a challenge at least as significant as that posed by the LHC experiments - similar data volumes and data rates, but much earlier in time. Consequently, they will not be able to benefit from the evolution in technology that one can expect between now and the startup of the LHC, but will have to develop solutions based on today's technology. Clearly, the lessons learnt from these experiments will be of great relevance to the preparation for the LHC.
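As a rough consistency check, the quoted data rate of around 35MB/second can be related to the quoted yearly volumes with a short calculation. The sketch below assumes about 10^7 seconds of effective data taking per year, a common rule of thumb that is not stated in this report; the function and constant names are purely illustrative.

```python
# Back-of-envelope check: is ~35 MB/s compatible with 100-500 TB/year?
MB = 10**6   # bytes, decimal units assumed
TB = 10**12  # bytes, decimal units assumed

# Assumption (not from this report): ~1e7 s of effective data taking per year.
LIVE_SECONDS_PER_YEAR = 1.0e7


def yearly_volume_tb(rate_mb_per_s: float, live_seconds: float) -> float:
    """Data volume in TB accumulated at a constant rate over live_seconds."""
    return rate_mb_per_s * MB * live_seconds / TB


volume = yearly_volume_tb(35, LIVE_SECONDS_PER_YEAR)
print(f"{volume:.0f} TB/year")
```

Under this assumption the result falls inside the 100-500TB/year range quoted above.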
In addition to these "future" experiments, there is an ever-growing number of existing experiments that have already deployed production applications on Objectivity/DB. Today, this list includes AMS, CHORUS, NA45, NA48, OPAL and ZEUS, with more expected. In particular, the recently proposed LEP data archive project currently plans to store some tens of TB of LEP data in the assumed "LHC solution", namely Objectivity/DB. At least two of the LEP experiments, ALEPH and DELPHI, have already started projects to investigate the use of Objectivity/DB and the other LHC++ components for current analysis.
Hence, it is our conclusion that the major goals of the RD45 project have been achieved. A significant amount of work clearly remains - not least, the delivery of the main outstanding enhancement requests and the setting up of full production services. In addition, one would expect many new and exciting developments in the areas of optimisation, distribution, parallelism and so forth. These topics, very worthy of detailed research, could form the basis of a follow-on project, but may be best addressed in the light of the initial experience of experiments such as BaBar and COMPASS.
The RD45 project was reviewed by the LCB in March 1997, and recommended for continuation for a further year, with the following milestones and comments:
"The RD45 project has continued to make excellent progress in identifying and applying solutions for object persistence for HEP and the LHC collaborations have shown great interest in their work.
RD45 has successfully addressed the milestones set by the LCRB for 1996 and the LCB recommends that the project be approved for a further year during which the details of the work-plan should be defined in conjunction with the LHC collaborations.
The LCB agrees with the program of work outlined in the RD45 status report and sets the following milestones for the third year of the project:"
In addition, the project is asked to include the following activities in its work-plan:
Work on these milestones and recommendations is covered in detail below.
The Project Execution Plan (PEP)
At the request of the project referees, a detailed project execution plan [3] for the current year was prepared. This PEP describes the scope of the project, the work-plan, methodology, assumptions, detailed sub-tasks and resources. We repeat below the proposed schedule that was established, together with the actual completion date of the various tasks.
| TASK | SCHEDULED COMPLETION DATE | ACTUAL COMPLETION DATE | COMMENTS |
|---|---|---|---|
| Install Objectivity/DB V4.0.2 | April 1997 | April | Installed in LHC++ tree in AFS |
| Test Replication (DRO) across heterogeneous platforms in LAN | May 1997 | April | |
| Setup problem tracking system for bug reports and requirements | May 1997 | June - a common system was setup for all LHC++ components | Based on GNATS |
| Joint meeting with HPSS consortium and Objectivity to define implementation schedule for Objy/HPSS interface | May 1997 | May + follow up meetings in September and November. Proof of concept prototype delivered prior to SuperComputing 97. | Including representatives from other interested HEP laboratories (Caltech, LBL, SLAC etc.) |
| Participate in Objectivity/DB user meeting | May 1997 | May | |
| Test schema evolution in scenarios typical of HEP | June 1997 | July | |
| Feed HEP requirements into ODMG meeting | July 1997 | July | Set post-V2.0 directions |
| Test WAN DRO | July 1997 | July | Requires installation of DRO at remote sites plus local Objectivity/DB expertise |
| RD45 workshop on HepODBMS | July 1997 | July | Define contents/schedule of alpha release. 2 day workshop including working sessions |
| PEP regarding CMS CDR project | July 1997 | December | ATLAS & CMS PEPs presented to December LCB |
| Participate in Kyushu field test | August 1997 | July - October | Assumed that Kyushu field test begins in May 1997. Start delayed until July. |
| Investigate porting of applications from Objectivity/DB to Versant | September 1997 (V5.0 release date + 2 months) | Delayed one month due to unavailability of Versant consultant | Assumes Versant V5.0 is released in July/August 1997 |
| PEP regarding ATLAS project(s) | September 1997 | December | See above |
| Alpha release of HepODBMS | October 1997 | Alpha releases made in July and November (LHC++ roadshow) | |
| Perform performance and scalability tests with Versant | November 1997 | Versant training postponed until October. First results presented in December | Versant V5.0 release + 4 months |
| Test the use of multiple federations and offline partitions for handling (semi-) private schema and data | December 1997 | July | Initial results already at July workshop |
| Participate in Lassen field test | December 1997 | Lassen merged with V5 - beta did not take place. | Assumes that Lassen field test begins in September 1997 |
| Deliver prototype meta-data browsers and search engines | December 1997 | Available in Java | |
| Demonstration of prototype Objy/HPSS interface | December 1997 | November 1997, IBM only | Restricted to certain platforms, e.g. IBM? |
| Beta release of HepODBMS | March 1998 | | |
| Meet LCB milestones | March 1998 | | Requires appropriate input from experiments |
Table 1 - detailed project execution schedule
As indicated in the above table, an addendum to the RD45 PEP was produced, covering the milestones related specifically to activities within the ATLAS collaboration. These milestones are given below.
| Milestone | Description | Date |
|---|---|---|
| 1 | Geant-3 digits for all detector systems in ODBMS | January 1998 |
| 2 | First version of a detector description database available. This should be interfaced to the current simulation and reconstruction as well as ATLAS Geant-simulation and OO reconstruction | June 1998 |
| 3 | Geant-4 event data available for parts of the ATLAS detector | December 1998 |
Table 2 - Summary of ATLAS milestones for ODBMS-related work
| Milestone | Description | Date |
|---|---|---|
| Data retrieval | Prototype reclustering algorithm based on HepODBMS | June 1998 |
| As above | Prototype of strategies for data organisation and access | December 1998 |
| Event data model, reconstruction and analysis framework | Prototype of a reconstruction framework for test beam and simulated data | June 1998 |
| As above | Release of a "User analysis environment" | December 1998 |
| Calibration database prototype | Integration of the present prototype with Objectivity Version 5 | March 1998 |
| As above | Release of a class library and documentation | June 1998 |
Table 3 - Summary of CMS milestones for ODBMS-related work
The main risks of the RD45 strategy were analysed as part of the work reported on at the March 1997 review and are described in the associated status report [4]. The main risk factors identified at that time are listed below.
The above risk analysis suggests that there are three main areas of concern that need to be addressed. These are:
We consider the provision of a suitable MSS to be outside the mandate of the RD45 project and rely on an appropriate system being identified, acquired and run by IT/PDP group. Coordination between IT/PDP group and the RD45 project is ensured by regular meetings, which currently focus on Objectivity/DB - HPSS issues. Nevertheless, it is worth pointing out that the mass storage area continues to evolve. As was discussed at CHEP '97 in Berlin, HPSS is not the only MSS to be successfully deployed in HEP. A number of other sites make production use of the Lachman/Legent/Computer Associates Open Storage Manager (OSM) product, although there are concerns as to whether this will scale to the PB region. SGI is also known to be working on a high-end MSS, which presumably re-uses some components from the Cray Data Migration Facility. Most important of all, however, are analyses that suggest that PB stores will simply be common-place in the year 2005.
To address these issues, we have built a number of database administration (DBA) tools (see section 10.3) and are studying closely the issues involved in an eventual migration from the product of one vendor to another, including an evaluation of the currently preferred fall-back solution. These topics are covered in more detail below.
The adoption of commodity solutions is perhaps the best protection against the risks that are described above. As an example, we cite the case of a Web browser.
A Web browser is clearly a tool that is expected on every machine; this is not limited to the HEP community, but is valid world-wide. As such, we can be confident that such functionality will be provided, perhaps in a different guise, in the future. Whether a particular browser dominates the market is, personal preferences aside, irrelevant. The feature set of the different browsers is now so similar that there is little to choose between them.
Clearly, the same cannot yet be said of object databases - they are perhaps in the situation that browsers were in prior to release 3 of Internet Explorer. However, the ODBMS market is continuing to grow, and includes segments such as the telecoms industry, which provides the demand for performant, distributed solutions. Areas such as digital libraries will create the demand for highly scalable solutions. Finally, their match to C++ and Java - and the Web itself - should help to maintain demand.
In addition to the milestones set for the past year, the RD45 project was asked to:
These issues are covered in more detail below.
During the second year of RD45, detailed studies of the data replication capabilities of Objectivity/DB were made. However, these tests, which are described in detail in [6], were performed with pre-release software on a single platform (NT), and were restricted to LAN tests. We have since extended these tests to include large numbers of heterogeneous hosts in both the LAN and the WAN. We report on these results below, together with an analysis of how replication, and other data distribution tools in Objectivity/DB, could be used to solve data distribution scenarios typical of HEP.
User-data replication has been supported in Objectivity/DB since release 4.0 in early 1997. Replication works at the level of a physical database - that is, a database is either replicated or not, each replica being termed an image. To use the replication feature, one must first partition the federation into a number of autonomous partitions, using the Objectivity/DB Fault Tolerant Option (FTO). Each partition has its own lock server, replica of the system information (schema and catalog) as well as one or more database servers and associated databases. For example, one could imagine that CERN and each regional centre would operate as separate partitions. In such a scenario, each database could be replicated to one or more regional centres.
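The relationship between federation, partitions, databases and images described above can be pictured with a small data model. This is a minimal sketch only: the class names, method names and host names are hypothetical and do not correspond to the Objectivity/DB API.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An autonomous partition with its own lock server and system data."""
    name: str
    lock_server: str


@dataclass
class Database:
    """A physical database; replication is all-or-nothing at this level."""
    name: str
    images: list = field(default_factory=list)  # names of partitions holding an image

    def is_replicated(self) -> bool:
        return len(self.images) > 1


@dataclass
class Federation:
    partitions: dict = field(default_factory=dict)
    databases: dict = field(default_factory=dict)

    def add_partition(self, name: str, lock_server: str) -> None:
        self.partitions[name] = Partition(name, lock_server)

    def add_image(self, db_name: str, partition_name: str) -> None:
        """Place an image (replica) of a database in an existing partition."""
        if partition_name not in self.partitions:
            raise KeyError(f"unknown partition: {partition_name}")
        db = self.databases.setdefault(db_name, Database(db_name))
        if partition_name not in db.images:
            db.images.append(partition_name)


# Hypothetical layout: CERN plus one regional centre, each a separate partition.
fed = Federation()
fed.add_partition("CERN", "lock.cern.example")
fed.add_partition("RegionalCentreA", "lock.rc-a.example")
fed.add_image("calibration", "CERN")
fed.add_image("calibration", "RegionalCentreA")
fed.add_image("raw_events", "CERN")

for db in fed.databases.values():
    print(db.name, db.images, "replicated" if db.is_replicated() else "single image")
```

In this layout the calibration database has one image per partition, while the raw event database exists as a single image at CERN.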
A full description of the replication option of Objectivity/DB is provided in [6] and will not, for space reasons, be repeated here.
Since the official release of Objectivity/DB version 4, user data replication has been available across all supported platforms. It has been tested across 7 different operating system/platform combinations, namely Digital Unix, HP/UX, IBM, SGI, Solaris and NT running on both Alpha and Intel architectures. These tests, which were largely confined to the local area, functioned correctly.
In certain scenarios, such as the replication of calibration or tag data, it is likely that large numbers of replicas (images) would be involved. Comparisons with HEPDB [29] suggest that some 10-15 images might be required for calibration data, but many more for tag data - perhaps as many as the number of institutes involved in an LHC collaboration. We have therefore tested replication up to 100 images - the limit arising from the number of nodes that could conveniently be used for this purpose and not from any limitation in Objectivity/DB.
As shown in Figure 3, the time taken both to create persistent objects and to commit the corresponding transaction increases with the number of images involved. The latter is expected - not only does the transaction not complete until the data involved has been safely written to disk on all servers, but more network traffic is involved. By default, Objectivity/DB will send the data to 4 servers in parallel, although this may be increased by a configuration parameter. When an attempt to create new objects is made, the database will dynamically contact all servers involved and only permit the operation to continue if a sufficient quorum is obtained. This technique, similar to that deployed in VMSClusters, ensures that database consistency is maintained.
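The quorum check and the parallel distribution of data can be illustrated as follows. This is a sketch under stated assumptions: the exact quorum rule used by Objectivity/DB is not given in this report, so a simple majority of images is assumed, and the function names are hypothetical.

```python
import math


def has_quorum(reachable: int, total: int) -> bool:
    """Simple-majority rule (an assumption): more than half of all images respond."""
    if not 0 <= reachable <= total:
        raise ValueError("reachable must be between 0 and total")
    return 2 * reachable > total


def send_rounds(total_images: int, parallel: int = 4) -> int:
    """Number of rounds needed if data is sent to `parallel` servers at a time."""
    return math.ceil(total_images / parallel)


for total, reachable in [(3, 2), (3, 1), (100, 50), (100, 51)]:
    verdict = "permit update" if has_quorum(reachable, total) else "refuse update"
    print(f"{reachable}/{total} images reachable: {verdict}")

print("rounds for 100 images, 4 in parallel:", send_rounds(100))
```

With 100 images and the default of 4 parallel transfers, 25 rounds are needed, which is consistent with commit time growing with the number of images.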
Figure 1 - An Objectivity/DB Federation with a Single Partition

Figure 2 - An Objectivity/DB Federation with 2 Partitions
The tests described above were initially made using only systems at CERN. They were later extended to include machines in the wide area, including nodes at Caltech. In these tests just 3 images were involved: two at CERN and the third at Caltech. The tests, made to simulate the update frequency and data volume of a calibration database, involved updating 1KB of data every 5 minutes. As the following figure shows, the data rate was strongly correlated with the hour of the day. During peak hours, when the link is essentially saturated, a relatively low data rate of around 2Kbit/second was achieved. However, during off-peak hours, data rates of 20Kbit/second were observed. Under such conditions, remote replicas behave essentially identically to local ones.
Figure 3 - Replication using Large Numbers of Images

Figure 4 - Wide-area Test Configuration
Figure 5 - Tests of Wide-Area Data Replication
A number of the enhancement requests made by CERN have already been addressed by Objectivity. For example, improvements have been made in the area of software control of offline images and in the re-synchronisation of images. Other enhancement requests remain pending. These include the ability to replicate complete databases by offline media, i.e. tape, rather than the network, and to provide more flexibility in the selection of which image is used - currently the product differentiates only between local and remote, but has no concept of nearest or least loaded server. In addition, improvements have been requested to the underlying Fault Tolerant Option (FTO). In the current version, catalog changes, such as modifications or additions to the schema or the addition of new databases, are only possible if all partitions are online. We have requested that the same quorum mechanism be utilised to permit updates to "system" data even if one or more partitions are unavailable, as is currently the case with user data. Appropriate enhancements will be made to a future version of Objectivity/DB.
We have verified that the basic functionality offered by the Data Replication Option (DRO) of Objectivity/DB behaves as documented. A number of bugs exist in the version that was tested (4.0.2) which are scheduled to be fixed in the production release of version 5. We believe that, once the outstanding enhancement requests have been satisfied and we have been able to verify that the outstanding bugs have been corrected, this functionality satisfies our key requirements for data distribution. However, it is important to stress that the required network bandwidth must be made available - it is not realistic to replicate large data volumes, e.g. in the TB range, over the networks that are typically in use in HEP today. Thus, in the short term, "offline replication" remains the most appropriate option for large data volumes. Replication remains a viable solution, however, for smallish data volumes, such as in the case of calibration data.
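The bandwidth argument can be made concrete using the figures from the wide-area tests. The sketch below takes the observed rates of 2Kbit/second and 20Kbit/second at face value as the sustainable replication throughput, which is an assumption, and uses decimal units throughout.

```python
KBIT = 1000                # bits; decimal units assumed
KB_BITS = 1000 * 8         # bits in 1 KB
TB_BITS = 10**12 * 8       # bits in 1 TB
SECONDS_PER_DAY = 86400


def transfer_days(volume_bits: float, rate_bits_per_s: float) -> float:
    """Days needed to move volume_bits at a constant rate."""
    return volume_bits / rate_bits_per_s / SECONDS_PER_DAY


# Calibration-style load from the wide-area test: 1 KB every 5 minutes.
calib_rate = KB_BITS / (5 * 60)
print(f"calibration load: {calib_rate:.1f} bit/s")

# Observed wide-area rates, taken at face value as sustained throughput.
for label, rate in [("peak", 2 * KBIT), ("off-peak", 20 * KBIT)]:
    days = transfer_days(TB_BITS, rate)
    print(f"{label}: 1 TB would take about {days:,.0f} days")
```

The calibration load is well below even the peak-hour rate, whereas a single TB would take several thousand days at the off-peak rate, which is why offline replication is preferred for large volumes.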
As described in the March 1997 status report [4], an interface between Objectivity/DB and HPSS at the level of the Objectivity/DB server is the preferred method of coupling these two systems. The Objectivity/DB server, which deals with database pages and has no knowledge of objects, uses standard I/O calls to read/write data from databases - exposed to the operating system as files. This matches well to the HPSS client API, which offers a similar interface: I/O calls such as open(), read(), seek(), etc. have equivalents such as hpss_open(), hpss_read(), hpss_seek(). A series of meetings between representatives of the HEP laboratories planning to use both HPSS and Objectivity/DB and Objectivity themselves resulted in an agreed timescale for the delivery of a production interface between the two products. The basic milestones were:
The requirements for both prototype and full production versions are listed below.
The interface has been designed such that an alternative Mass Storage System can be used, if required. The interface layer has been defined - Objectivity will deliver a linkable version of their server, which can then be interfaced at the client site to the MSS of choice. This also allows for additional monitoring code or, for example, access control checks to be inserted at the level of database (file) open/close. The proof-of-concept prototype, currently limited to IBM platforms (the only platform on which the HPSS client is currently supported, although ports to Sun and DEC are under way), will be stress-tested at SLAC, CERN and other laboratories throughout 1998.
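The idea of an interface layer with a replaceable storage back end and hooks at open/close can be sketched as below. This is an illustration only, not the interface delivered by Objectivity: all class and method names are hypothetical, and a local-file back end stands in for an MSS back end, which would forward the same operations to its own client API.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """File-level operations a page server needs from any storage system."""

    @abstractmethod
    def open(self, path: str, mode: str): ...

    @abstractmethod
    def read(self, handle, offset: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, handle, offset: int, data: bytes) -> int: ...

    @abstractmethod
    def close(self, handle) -> None: ...


class LocalFileBackend(StorageBackend):
    """Plain-filesystem back end, standing in for an MSS client."""

    def open(self, path, mode):
        return open(path, mode)

    def read(self, handle, offset, size):
        handle.seek(offset)
        return handle.read(size)

    def write(self, handle, offset, data):
        handle.seek(offset)
        return handle.write(data)

    def close(self, handle):
        handle.close()


class HookedBackend(StorageBackend):
    """Wraps a back end with an access check on open and a log of open/close."""

    def __init__(self, inner, allow):
        self.inner = inner
        self.allow = allow  # callable(path) -> bool; hypothetical access policy
        self.log = []

    def open(self, path, mode):
        if not self.allow(path):
            raise PermissionError(path)
        self.log.append(("open", path))
        return self.inner.open(path, mode)

    def read(self, handle, offset, size):
        return self.inner.read(handle, offset, size)

    def write(self, handle, offset, data):
        return self.inner.write(handle, offset, data)

    def close(self, handle):
        self.log.append(("close", handle.name))
        self.inner.close(handle)


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "events.db")
backend = HookedBackend(LocalFileBackend(), allow=lambda p: p.endswith(".db"))

handle = backend.open(db_path, "w+b")
backend.write(handle, 0, b"page-0")
page = backend.read(handle, 0, 6)
backend.close(handle)
print(page, backend.log)
```

Because the server code only sees `StorageBackend`, swapping the storage system or adding monitoring does not require changes to the callers.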
Justification: CERN plans to use this interface during 1999 for production with some 300-400TB of data for two experiments: COMPASS and NA45. In addition, we anticipate pre-production usage by several other experiments, including the LHC experiments themselves. Other HEP laboratories, such as SLAC, also have plans for full-scale production in the same time frame.
Justification: the choice of data management software should not restrict customers to one or even a few hardware vendors. It is essential that we are able to obtain commodity hardware offering the best price/performance at any given time.
Justification: Extensive data-intensive testing will be required to ensure that the product is production-ready prior to committing many hundreds of TB of data to the system.
Justification: HEP experiments plan to use this software for up to 25 years! The longevity of the software must be guaranteed.
Justification: The operation of the modified AMS server should be totally transparent to Objectivity client applications, apart from timing and throughput. Specification of the federated database configuration and location should be through a normal Objectivity boot file.
Justification: The Objectivity client should be unaware of the location of databases in a distributed environment. Accommodating the possibly long time-scales involved in HPSS operation should not adversely impact the timeouts and hence security of non-HPSS operations.
Justification: Some fraction of the federation, such as meta-data, should never be migrated off disk, as it will be required for fast access to other parts of the data. In addition, some databases may reside on private workstations/PCs, or be distributed in the WAN. A solution that required all data to be managed by one or more HPSS-instances, and hence ruled out e.g. databases/partitions on the desk-top, would not be acceptable.
Justification: The purpose of providing the interface to HPSS is to increase the size of federations that are possible. No reduction of functionality should occur.
Justification: The Objectivity/DB - HPSS interface should not introduce bottlenecks or adversely affect performance.
Justification: Objectivity/DB will be used together with HPSS to store many tens of PB of data - some 5PB/year for the LHC experiments. A demonstration of a 1TB federation is an acceptance test for the interface.
Although the system configuration will clearly have a strong impact on the feasibility of satisfying these requirements, the actual interface between Objectivity/DB and HPSS should be written in such a way as to minimise overhead. A possible example would be the use of the "advanced" HPSS API, rather than the basic API, which avoids routing all I/O calls through the HPSS server.
Justification: access to HPSS-resident data will only be supported via the AMS. To reduce network load, these data are expected to be stored on the same machine as the AMS.
Justification: certain characteristics for HPSS-resident data should or must be specified at file-creation time. In addition, to optimise performance, e.g. to allow the server to pre-fetch data blocks, a mechanism must be provided whereby such information, known only to the client, can be passed to the server.
The need for an interface between Objectivity/DB and HPSS has been identified. Agreement has been reached between members of the HEP community, Objectivity and the HPSS consortium as to how such an interface could be provided, namely via interfacing the Objectivity/DB server to the HPSS client API. A prototype of such an interface has been developed and is currently under test at a number of HEP institutes, including Caltech, CERN and SLAC. A production version of the interface - with enhancements as identified during the test period - is scheduled for delivery by the end of 1998. Tests of this prototype interface are described in section 9.
One of the largest challenges facing today's very long-lived HEP experiments is that of adapting to the highly dynamic computing environment. The lifetime of a typical LHC experiment is very long compared to the timescales of the computing industry. This is clearly demonstrated by looking back some 20-25 years. During this period, we have seen dramatic changes, and changes no less significant must be expected between now and the end of LHC data taking. At CERN, we have seen the introduction of general-purpose networks, of interactive computing and of distributed computing. However, collaborations were largely able to choose a computing environment that remained relatively stable throughout the entire lifetime of the experiment. This was no longer true with LEP, which had to face major changes such as the migration from CERNVM to Unix.
Thus, irrespective of issues relating to the long-term survival of any hardware or software vendor, it is essential to be prepared for change. As such, we have investigated the issues related to migrating between the products of two ODBMS vendors, namely Objectivity/DB and Versant. This evaluation comprised several steps:
In Versant, a distributed database may consist of up to 2^16 databases. Each of these databases may consist of up to 2^16 volumes, which in turn map to files. The primary, or system, volume contains the schema for that database - as noted below, there is no built-in mechanism whereby consistent schema across the set of databases can be ensured. The current version of Versant appears to be limited to 2GB/volume. However, a more likely limit is the available disk space on a given server, assuming that all volumes belonging to a given database are stored on the same server.
Tests have been made up to 35GB databases and up to 1300 volumes per database. Using multiple databases, a distributed database of roughly 0.5TB has been created.
| Limit | Objectivity/DB | Versant | Comments |
|---|---|---|---|
| Object ID | 64 bits | 64 bits | |
| # of Databases | 2^16 - 1 | 2^16 | |
| Maximum DB size | Filesystem limit | N/A | Versant maps volumes to files |
| # containers/DB | 2^15 - 1 | 2^16 | "volumes" in Versant (≤ 2GB) |
| # pages/container | 2^16 - 1 | 2^16 | |
| Page size | ≤ 64K | 16K (fixed) | |

Figure 6 - Limits of Objectivity/DB and Versant Architectures
Versant is built upon a very different architecture to Objectivity/DB. For example, whereas Objectivity/DB implements a "fat-client/thin-server" model, Versant provides the opposite. Again, whereas the object identifier (OID) in Objectivity/DB has a direct physical mapping, Versant has a logical object identifier (LOID). From the HEP point of view, we believe the most important issues to be the following:
Of course, there are areas, such as the provision of access control, where Versant is superior to Objectivity/DB. However, the issue is not so much to compare the features of the two products, which clearly must be done in the light of the specific requirements of the applications that will be built on top. Rather, it is to understand whether Versant can be considered a viable alternative, e.g. as the basis of a fall-back strategy. It is clear that these products - and the other ODMG-compliant offerings - are not different implementations of an identical specification. Many features are common, but others, such as the above-mentioned clustering strategies, differ significantly.
A number of basic performance measurements were performed using Versant and compared with Objectivity/DB. In particular, the read and write performance with varying object size was measured. Both products show very similar dependencies on the database page size, with a dip in performance around the page size. The tests were done with local data which, in the case of Objectivity/DB, results in direct I/O from the client to the database. Hence, although the results show consistently better performance in the case of Objectivity/DB, we would expect the advantage to be less in the case of remote data. On the other hand, Versant's finer grain locking mechanism - object rather than container level - results in more lock traffic and hence reduced performance.
A number of relatively small applications, such as the RD45 limits and scalability tests and the Caltech "stars" benchmark, have been reimplemented and/or ported to Versant. Whilst these activities have confirmed that it is indeed feasible to move applications, the difficulties in moving a large user community and/or large data volumes should not be underestimated.
Given the architectural differences described above, class libraries such as HepODBMS, which implement clustering strategies, would have to be redesigned to work with Versant, rather than Objectivity/DB. There would also be implications for applications such as HepExplorer, as the various modules would have to explicitly open the required databases and not simply connect to the federation. Moreover, the issue of data conversion between the two systems would need to be studied. We believe that these issues would be best addressed by taking a "non-trivial" application and moving both the data and code to a different ODBMS. A possible example of such an application and data could be the CMS H2 test-beam analysis applications. The porting of these applications would also require the conversion of some 80GB of associated data, currently spread over more than 1000 databases. However, the number of classes involved (of the order of 10) is significantly lower than expected for the full object model of an LHC experiment, where of the order of 1000 classes are anticipated.
We believe that the performance and scalability measurements of Versant confirm our primary choice of Objectivity/DB, whilst nevertheless demonstrating that Versant is a possible alternative should, for example, Objectivity/DB cease to exist. However, it should be stressed that migrating from one solution to another is a significant undertaking that is likely to have implications on the physical location of data, if not also the logical object model.
In the context of LHC++, a data analysis framework based on industry-standard tools, together with HEP-extensions, has been built up. These tools, which have been demonstrated in a series of roadshows at CERN and outside, are largely based on the use of IRIS Explorer [32], together with HEP-specific extensions (see chapter 10).
A more detailed description of ODBMS-based data analysis is given in the discussion on milestone 2 (see section 8).
An introduction to the overall LHC++ framework has been prepared as part of the initial LHC++ roadshows, which concentrated on end-user tasks such as histogramming and data analysis. This guide, available both in printed form through the CERN Program Library office and via the LHC++ Web pages, covers all aspects of Objectivity/DB that need to be exposed to end users. In fact, as this guide was prepared prior to the setting up of production servers - scheduled for the first half of 1998 - some additional complications, related to the use of Objectivity/DB with AFS, have had to be exposed. In the medium term, these issues will be greatly simplified and users will find that the default environment is such that they can use Objectivity/DB-based applications with no additional setup. However, a general knowledge of some of the ODBMS terms is likely to be useful - just as an overview of the basic ZEBRA terminology (banks, stores, divisions etc.) was in the past.
Clearly, the use of an Objectivity/DB federated database for the storage of physics data, calibration data and histograms has an impact on the environment that users see. Notwithstanding the differences in functionality, it is perhaps useful to compare some key features of the new environment with that required for existing CERNLIB packages, such as FATMEN and/or HEPDB.
In the case of Objectivity/DB, an environment variable, OO_FD_BOOT, is used to locate a so-called boot-file, which contains information on the federated database, such as the lockserver host, the location for journal files and the federated database catalogue. This is very similar to HEPDB: here, the CDSERV environment variable points to a directory which contains a file hepdb.names, which gives the location of journal files for the various calibration (or other) databases, the locations of these databases, the names of remote servers, and so on.
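The lookup described above can be sketched as follows. Only the environment variable name OO_FD_BOOT is taken from the text; the `key=value` file layout and the key names `lockserver_host`, `journal_dir` and `catalogue` are hypothetical placeholders for illustration and are not the actual Objectivity/DB boot-file format.

```python
import os
from typing import Dict, Mapping, Optional


def boot_file_path(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the boot-file location named by OO_FD_BOOT."""
    environ = os.environ if environ is None else environ
    path = environ.get("OO_FD_BOOT")
    if not path:
        raise RuntimeError("OO_FD_BOOT is not set")
    return path


def parse_boot_file(text: str) -> Dict[str, str]:
    """Parse a boot file, ASSUMING a simple key=value layout (hypothetical)."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Hypothetical boot-file content with placeholder key names.
    sample = """
    # federation description (illustrative keys only)
    lockserver_host = lockhost.example.org
    journal_dir     = /data/fd/journal
    catalogue       = /data/fd/physics.FDB
    """
    print(parse_boot_file(sample))
```

A user at a remote institute would simply point OO_FD_BOOT at the local copy of the boot-file; the application code is unchanged.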
Users of Objectivity/DB may work with a copy of the boot-file, e.g. if they are at an outside institute - this copy is kept in sync with other copies by the database system. Again, similar functionality is to be found in both FATMEN and HEPDB. In Objectivity/DB, such a "remote institute" is referred to as a partition - it has its own consistent copy of the database catalogue and boot-file, its own lockserver, and its own set of database servers.
In contrast to existing systems in HEP, however, a user need not manipulate or concern him/herself with files - the files used by Objectivity/DB to store persistent objects are managed automatically and transparently by the system.
Training on the C++ interface to Objectivity/DB has been offered through the CERN training programme for some time. As this course is best suited to those people who will develop persistent C++ applications, and is less oriented towards end-users, we will supplement this training with an end-user course, based upon the lectures being developed for the 1998 CERN School of Computing.
As described above, numerous experiments at CERN and outside have begun to build applications using Objectivity/DB. The RD45 project has assisted these efforts via informal consultancy (user support) and by providing a centralised installation of Objectivity/DB, information on installation and testing and documentation. New developers are recommended to follow the Objectivity/DB C++ Developers' Course, offered as part of the CERN training programme. A calibration database prototype has been developed within the CMS collaboration. A similar package has also been developed within the BaBar collaboration. During the coming months, we will evaluate both of these tools with the aim of offering an experiment-independent tool with HepODBMS.
The recommendations of the LCB referees of the RD45 project have been addressed and the related work described above. In several areas, such as the issue of the ODBMS - MSS interface, there are follow-on activities for the coming year. These will be addressed primarily in conjunction with the proposed production services based upon Objectivity/DB and HPSS.
The first milestone set at the March 1997 review of the RD45 project was as follows:
"Demonstrate that an ODBMS can satisfy the requirements of typical simulation, reconstruction and analysis scenarios with data volumes of up to 1TB."
The general requirements for this milestone are based upon those stated in the ATLAS [27] and CMS [28] Computing Technical Proposals, and on discussions at RD45 workshops.
The main requirement for reconstruction is that the ODBMS should be able to keep up with the rate at which data is acquired: 100MB/second for ATLAS and CMS and 1.5GB/second for ALICE. It was agreed that it was not necessary to demonstrate these data rates now, but rather show how such rates could be supported in the future, assuming appropriate hardware resources. Furthermore, it was agreed that an acceptable fall-back solution for ALICE would be to reconstruct in "play-back" mode. In other words, the period when data was not being taken would be used to perform the reconstruction. This results in an effective data rate for reconstruction similar to that required for ATLAS and CMS.
All experiments, as now, plan to perform production reconstruction using a farm. The estimated number of nodes required varies from around 100 to 500. It has already been demonstrated in NA45 that up to 32 streams can write into a single Objectivity/DB federation using a lock-free strategy. An extrapolation to 100 streams by 2005, if not much before, seems not unreasonable and hence requires a data rate of just 1MB/second per stream for ATLAS and CMS. Even in the case of ALICE, a data rate of only 15MB/second per stream would be required, assuming a farm of 100 nodes. Even if the raw data is entirely rewritten, e.g. for reclustering purposes, rather than added to, the data rates appear rather modest. Hence, the I/O rates for reconstruction purposes are not considered to be a problem.
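The per-stream figures quoted above follow from dividing the acquisition rate by the number of farm streams. The short sketch below reproduces this arithmetic; the function name is illustrative, and 1GB is taken as 1000MB.

```python
def per_stream_rate_mb(total_rate_mb_s: float, streams: int) -> float:
    """Data rate each stream must sustain, in MB/second."""
    if streams <= 0:
        raise ValueError("number of streams must be positive")
    return total_rate_mb_s / streams


if __name__ == "__main__":
    rates = {"ATLAS/CMS": 100.0, "ALICE": 1500.0}  # MB/second, from the text
    for name, rate in rates.items():
        for streams in (100, 500):
            print(f"{name}: {streams} streams -> "
                  f"{per_stream_rate_mb(rate, streams):.1f} MB/s per stream")
```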
For analysis, it is assumed that some 150 users will be performing analysis concurrently at any one time per experiment. The data volumes that will be read per analysis are harder to estimate. However, we should clearly exploit the ability of reading just the needed data, and hence minimise I/O. It is also assumed that the data will be distributed over multiple servers - perhaps 100 servers each with 1TB of disk cache and a non-negligible amount of memory. It will be important to exploit the natural parallelism of the database and both disk and memory caching. Simulations of multi-user analysis loads are described in more detail below.
Simulation is not believed to have any special requirements that are not automatically provided for by meeting the requirements of reconstruction and analysis.
The scalability of Objectivity/DB's architecture was covered in detail in the March 1997 status report [4]. At that time, it was not possible to create individual databases (DB) - of which up to 2^16 are permitted in a federation - larger than 2GB. The current release permits databases up to the limit imposed by the underlying filesystem - essentially unlimited on 64-bit filesystems. We have verified that the 2GB limit no longer exists, by creating databases up to 25GB in size. We have also verified that it is possible to generate federations up to the current limit of 2^16 databases. The largest test federation that has been created to date has been of the order of 0.5TB - limited by the available diskspace. Plans to build a disk-resident federation of 1TB have been shelved, as the required resources (disks, servers) could not be purchased out of the funds made available. Attempts to build very large federations using HPSS-managed storage are currently under way, and are described further in section 9.
Although we have not built a federation larger than 0.5TB, many federations containing over 1000 DBs have been built. Using 25GB DBs, just 40 are required to build a federation of 1TB. Just as, in today's environment, the number of entries in a FATMEN catalogue is entirely independent of the size of the files that are catalogued, the number of databases in an Objectivity/DB federation is totally decoupled from the size of these databases. Thus, the number of databases per federation demonstrated in the various production federations that were built during the last year, such as those generated in CMS test-beam activities (see section 9.4 below), show that federations as large as some 25TB are easily achievable. In practice, building federations much larger than a few hundred GB requires an interface to a mass storage system, as discussed in section 6.2 above. Performance and functionality tests of the current prototype are discussed in section 9.1 below.
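The extrapolation above is simple arithmetic, reproduced in the sketch below. The function names are illustrative, decimal units are assumed (1TB = 1000GB), and the database limit is taken from Figure 6.

```python
import math

MAX_DATABASES = 2**16 - 1  # Objectivity/DB limit per federation (Figure 6)


def databases_needed(federation_gb: float, database_gb: float) -> int:
    """Number of equally sized databases needed to hold a federation."""
    n = math.ceil(federation_gb / database_gb)
    if n > MAX_DATABASES:
        raise ValueError("exceeds the number of databases per federation")
    return n


def federation_size_tb(n_databases: int, database_gb: float) -> float:
    """Total federation size in TB for n databases of the given size."""
    return n_databases * database_gb / 1000


if __name__ == "__main__":
    print(databases_needed(1000, 25))    # databases of 25GB for 1TB
    print(federation_size_tb(1000, 25))  # 1000 databases of 25GB each, in TB
```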
We are therefore confident that the requirement of providing federated databases up to 1TB for reconstruction, simulation and analysis can be met.
Neither event nor detector simulation programs generate high I/O rates, nor are these multi-user applications (although multiple analysis jobs may be running in parallel). Hence, the requirements from the point of view of simulation are rather straightforward - the ODBMS must simply support the persistent object model of the application. Although an effort to rewrite the Lund family of event generators has been recently approved, the only simulation program that currently uses an Object Database is GEANT-4. Experience in adding persistency to GEANT-4 was reported on at the last LCB review of RD45. Since that time, further progress has been made, as described below.
The strategy for persistency in the GEANT-4 project splits the persistent data model into two parts: the run based persistent objects, which have to be stored for each run, and the event based persistent objects, which are kept for each event.
The run based information includes run conditions, the geometry definition and the physics process conditions during a simulation run.
The event based part stores information about the primary physics process, a persistent representation of particle trajectories, simulated hits and the resulting digitisation data. An overview of the event based object model is shown in the figure below.

Figure 7 - GEANT-4 Persistent Classes
It should be noted that the storage size of classes like trajectory points or single digitisations and hits is rather small in comparison to the overhead for making an object persistent on its own. In the current object model one has therefore chosen to make instances of these relatively small classes persistent by containment in a persistent container. For example, the trajectory points themselves are transient objects stored in a persistent trajectory object. Since the object model in the area of event based persistency is relatively simple and the performance penalty introduced by direct persistency has been shown to be negligible, the simulation code will directly make use of persistent objects.
In the area of run based persistency, the GEANT-4 object model is more complex and makes use of many cross-references between the involved objects. The naive approach of simply making some of the classes in the transient model persistent would lead to problems with respect to object lifetimes. In some cases persistent objects would contain references to transient objects that are invalid when retrieved later by another process. In other cases the persistent objects would contain (and therefore store) attributes which are only valid during the actual simulation processing.
At the GEANT-4 mini-workshop on persistency issues held in January 1998, it was therefore agreed that a strategy using a parallel hierarchy of persistent objects, storing only the relevant part of the attributes of the original transient model, is more appropriate. In this approach a persistent store manager object, which can be extended by the GEANT-4 user, is responsible for mapping a complete tree of transient objects onto an appropriate persistent object tree.
The weaker binding between the transient objects used by the core simulation code and their persistent counterparts will help to decouple the transient and persistent object models and will reduce the dependency of the core simulation code on the persistency mechanism.
The main drawback of this strategy is the need for a manual synchronisation of both models. If a new attribute is introduced in the transient object model, which also needs to be stored, then the persistent data model has to be extended as well and the mapping code which converts both models into each other has to be updated.
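The parallel-hierarchy strategy can be illustrated with a minimal sketch. The class names below (`TransientVolume`, `PersistentVolume`, `PersistentStoreManager`) and their attributes are hypothetical and are not GEANT-4 classes; the persistent classes here are plain objects standing in for database-resident ones. The sketch shows a store manager mapping a tree of transient objects onto a persistent tree that keeps only the attributes worth storing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransientVolume:
    """Transient geometry node used by the simulation code (hypothetical)."""
    name: str
    material: str
    children: List["TransientVolume"] = field(default_factory=list)
    # Only meaningful while the simulation is running: must not be stored.
    navigation_cache: Optional[dict] = None


@dataclass
class PersistentVolume:
    """Persistent counterpart holding only the attributes to be stored."""
    name: str
    material: str
    children: List["PersistentVolume"] = field(default_factory=list)


class PersistentStoreManager:
    """Maps a complete transient tree to a persistent tree and back."""

    def to_persistent(self, node: TransientVolume) -> PersistentVolume:
        return PersistentVolume(
            node.name, node.material,
            [self.to_persistent(child) for child in node.children])

    def to_transient(self, node: PersistentVolume) -> TransientVolume:
        # Run-time-only attributes are rebuilt later, not restored.
        return TransientVolume(
            node.name, node.material,
            [self.to_transient(child) for child in node.children])
```

The drawback noted above is visible here: adding a stored attribute to `TransientVolume` requires a matching change in `PersistentVolume` and in both mapping methods.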
In summary, it is the users of GEANT-4 who will actually design the persistent-capable object model. GEANT-4 itself will provide a skeleton and guidelines, based on those of RD45, on how to transfer data from the GEANT-4 transient objects to persistent objects.
Production demonstrations of using an ODBMS to provide object persistency for event reconstruction have been made since 1996, by the NA45 experiment. These demonstrations have shown not only the feasibility of using an ODBMS for this purpose, but also some of the advantages of such a system, such as the ability to support multiple concurrent writers, whilst maintaining full consistency. Demonstrations have also been made by CMS, as part of their H2 and X5 test-beam activities in 1997. This work will be continued in 1998, when both ATLAS and CMS plan further test-beam activities based on Objectivity/DB.
Other important activities planned for 1998 include tests by NA45 and COMPASS, in preparation for their high-volume runs in 1999, and the "mock data challenge II", planned by BaBar to run during the second half of 1998.
These activities are described in more detail below.
The issue of data analysis is very closely related to milestone 2, where we address all end-user issues (see section 8). In this section, we address the requirements given in the ATLAS [27] and CMS [28] Computing Technical Proposals. The main requirements stated in these proposals are:
These general requirements are somewhat harder to quantify, as they all depend heavily on the computing model adopted. We therefore make a number of assumptions, based upon current thinking.
It is assumed that some 150 physicists will be actively performing analysis at any time of day or night. Here, "active" is taken to mean the number of users who are reading or writing data to the database within a given time interval - say one hour. Given Objectivity/DB's federated database architecture, it is clearly possible to create a situation whereby many more concurrent users are supported. Theoretically, a total of 2^16 databases could be stored on separate servers, handling just one user. As the locking granularity is at the level of a container, each single database could support 2^16 parallel writers without any lock conflicts! In reality, many of the users are likely to be accessing the same data - the "hot" events. Clearly, an important issue will be designing a computing environment such that the individual database servers can provide sufficient bandwidth and whereby the data is efficiently clustered. Current thinking at CERN suggests that data servers will serve a few hundred GB of disk, so as not to limit the I/O throughput. The first such data servers will be installed during 1998. Plans at BaBar are for somewhat more powerful servers, perhaps supporting 1-2TB of diskspace. Nevertheless, the filesystems on these servers should be capable of sustaining data rates in excess of 10MB/second.
Assuming that the primary mechanism by which events are selected is via an event "tag", as described in section 8.4, and that an event tag is of the order of 100 bytes, 100GB is then sufficient to store 10^9 event tags. Clearly, this will not be a problem in terms of overall data storage, but both efficient clustering and caching will be needed, to support large numbers of concurrent users. A further option is that of data replication, which could be used both in the local and wide area. Rather than access a remote collection, a user could access a replicated tag database. This would not only reduce the load on any central servers, but would also minimise network traffic. As, in general, we assume that the event data would not be replicated, it would only be necessary to access remote data for the small fraction of events that passed the initial cut on the event tags. Clearly, for certain rare and interesting channels, it would be possible to replicate the full event data too, further reducing the overhead on the wide-area network. Other options, which clearly need to be studied further, include the possibility of using mobile "agents" to perform the analysis close to the data: a classic example of moving the query to the data and not vice-versa. Preliminary studies of using agents in combination with Objectivity/DB have been made, and this area is clearly worthy of further study.
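The tag-based selection described above can be sketched as follows. The tag layout (a dictionary with an `event_id` and a few summary quantities), the function names and the in-memory "event store" are all hypothetical; the point is only that the cut runs over the small (possibly replicated) tags and that full event data is fetched solely for the events that pass.

```python
from typing import Callable, Dict, List


def tag_store_gb(n_events: int, tag_bytes: int = 100) -> float:
    """Storage needed for the event tags, in GB (1GB = 10^9 bytes)."""
    return n_events * tag_bytes / 1e9


def select_event_ids(tags: List[dict], cut: Callable[[dict], bool]) -> List[int]:
    """Apply the cut to the tags only; no event data is touched."""
    return [tag["event_id"] for tag in tags if cut(tag)]


def fetch_events(event_ids: List[int], event_store: Dict[int, bytes]) -> List[bytes]:
    """Fetch full event data, possibly remote, for the selected events only."""
    return [event_store[event_id] for event_id in event_ids]


if __name__ == "__main__":
    print(tag_store_gb(10**9))  # storage in GB for 10^9 tags of 100 bytes

    # Toy data standing in for a tag database and an event database.
    tags = [{"event_id": i, "n_leptons": i % 3} for i in range(9)]
    events = {i: f"event-{i}".encode() for i in range(9)}
    selected = select_event_ids(tags, lambda tag: tag["n_leptons"] == 2)
    print(selected, len(fetch_events(selected, events)))
```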
Tests of the ability of Objectivity/DB to support multiple concurrent users were made using the 256-processor HP Exemplar system at Caltech. This system is built out of 16 nodes, each of which is a 16-processor SMP. Each node is connected to a striped disk array, capable of delivering some 22MB/second. In the tests, the data was stored on two of the nodes (10GB each). The I/O-intensive clients were run on the nodes where the data resided and the CPU-intensive clients were run on the other nodes.
The load comprised the following:
Using the above mix, it was possible to run more than 100 concurrent clients on the system with no degradation in I/O performance. The combined throughput of all clients was essentially constant, at around 18MB/second, up to 100 concurrent clients. These initial results suggest that scaling to 150 concurrent analyses - even without resorting to multiple database partitions and replicas - is achievable today. Support for many more active users will clearly be possible by the time of the LHC production phase.
When using an ODBMS based on a page-server architecture, as is the case with Objectivity/DB, efficient data clustering is important to maximise data rates. In other words, it is important that each data page that is transferred to the client contains a high percentage of objects of interest. The same is true on a coarser scale, when data are cached from tape to disk, by the mass storage system. Not only is it inefficient use of the available disk space to cache unneeded data, but it also results in wasted bandwidth when transferring the data from tape.
Although it is clearly important to cluster data efficiently when it is first stored - both at the page level and also at the level at which data are transferred to/from tape - there will be times when re-clustering is required, to re-establish efficient data access.
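The effect of clustering on a page server can be made concrete with a small sketch. The numbers (1000 objects, 10 objects per page, 100 objects of interest) and the function names are purely illustrative; objects are identified by their position in the database, so that `position // objects_per_page` gives the page on which an object lives.

```python
from typing import Iterable


def pages_touched(wanted_positions: Iterable[int], objects_per_page: int) -> int:
    """Number of distinct pages that must be transferred to the client."""
    return len({pos // objects_per_page for pos in wanted_positions})


def useful_fraction(wanted_positions: Iterable[int], objects_per_page: int) -> float:
    """Fraction of transferred objects that the client actually wanted."""
    wanted = set(wanted_positions)
    transferred = pages_touched(wanted, objects_per_page) * objects_per_page
    return len(wanted) / transferred


if __name__ == "__main__":
    clustered = range(100)          # wanted objects stored contiguously
    scattered = range(0, 1000, 10)  # one wanted object on every page
    for label, wanted in (("clustered", clustered), ("scattered", scattered)):
        print(label, pages_touched(wanted, 10), useful_fraction(wanted, 10))
```

In this toy case the same 100 objects cost 10 page transfers when well clustered and 100 when scattered, which is the kind of gain that re-clustering aims to recover.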
The issue of data re-clustering has been studied in both ATLAS and CMS. To study the potential performance gains of re-clustering, a prototype has been developed in CMS. This prototype is based on a mechanism for clustering data into collections, and accessing these collections with read-ahead optimisation. The read-ahead optimisation allows the clustering of different types of objects to be managed in an independent way, and also makes it possible for the "batch reclustering" operation to conserve the database size while preserving optimal throughput. The objects are retrieved through a fast access engine, which uses a schedule to optimise throughput. The schedule only needs to be computed once for every job, and this allows the optimiser to use fairly complex computations.
Subjects for future research are to increase the scalability in the number of access patterns, and an extension of the optimiser to cover migration of collections between tape and disk store.
Objectivity/DB provides a number of facilities for data import/export. For example, a database may be moved from one server to another, within the same federation. This may be performed over the network, if sufficient network bandwidth is available, or offline, e.g. via tape. One application where this may be useful is the centralisation of the output of simulation runs. As described above (see section 6.1), Objectivity/DB also permits data to be replicated, which could be applicable to calibration data, event tags and so on. However, there may be occasions when the use of a single federation may not be appropriate or the network bandwidth inadequate.
For example, it is likely that developers will wish to work with a private federation, containing a subset of the data in the production federation. This could be required to insulate the production federation from changes made in the development environment, or to enable the developer to work disconnected from the network.
In Objectivity/DB, it is possible to copy a database and then attach it to another federation. It is required that the target federation be compatible, i.e. have consistent schema for at least the subset of objects in the databases that are copied, and share database parameters such as the database page size. A copied database may be (re-)attached with a new database ID, in which case the object identifiers of all contained objects are automatically updated.
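The effect of re-attaching a database under a new ID can be illustrated with a sketch. The real update is performed by the database system itself; the code below only assumes, for illustration, that the 64-bit object identifier of Figure 6 is composed of four 16-bit fields (database, container, page, slot), and all function names are hypothetical.

```python
from typing import List, Tuple

FIELD_MASK = 0xFFFF  # each OID field is ASSUMED to be 16 bits wide


def pack_oid(db: int, container: int, page: int, slot: int) -> int:
    """Pack four 16-bit fields into one 64-bit object identifier."""
    for value in (db, container, page, slot):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("OID field out of 16-bit range")
    return (db << 48) | (container << 32) | (page << 16) | slot


def unpack_oid(oid: int) -> Tuple[int, int, int, int]:
    """Split a 64-bit OID into (database, container, page, slot)."""
    return ((oid >> 48) & FIELD_MASK, (oid >> 32) & FIELD_MASK,
            (oid >> 16) & FIELD_MASK, oid & FIELD_MASK)


def reattach(oids: List[int], old_db: int, new_db: int) -> List[int]:
    """Rewrite the database field of every OID that lives in old_db."""
    updated = []
    for oid in oids:
        db, container, page, slot = unpack_oid(oid)
        if db == old_db:
            db = new_db  # objects inside the copied database move with it
        updated.append(pack_oid(db, container, page, slot))
    return updated
```

Identifiers that point into other databases are left unchanged by `reattach`, which is why associations crossing database boundaries need the separate treatment discussed next.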
However, should it be necessary to copy objects which have associations to objects stored in different databases, a more complex strategy is required. The external associations can be simply dropped. This could be appropriate if, for example, the analysis objects are being copied. In this case, it would be possible for any associations to, say, the raw data to be dropped and tested for by analysis applications.
A more complete solution, however, is to provide a "deep copy" utility, which copies objects and any objects that are referenced. Such a tool has been developed by BaBar to assist in their data import/export, under the assumption that existing networks do not offer sufficient bandwidth to support wide-area data replication (at least not for raw data!). An associated problem is that of maintaining consistency between federations. Here, BaBar intends to use a simple database-ID allocation scheme, which ensures that the database IDs used by different federations are compatible.
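The two ideas in this paragraph can be sketched as follows. Neither function reproduces the BaBar tools, whose details are not given here: `deep_copy_closure` simply shows what "an object plus everything it references" means for a graph of associations, and `allocate_db_id_ranges` assumes, as one possible allocation scheme, that each federation is given a disjoint block of database IDs. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def deep_copy_closure(root: str, references: Dict[str, Iterable[str]]) -> Set[str]:
    """Identifiers of the root object and every object reachable from it."""
    seen: Set[str] = set()
    stack: List[str] = [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already scheduled for copying; also guards cycles
        seen.add(obj)
        stack.extend(references.get(obj, ()))
    return seen


def allocate_db_id_ranges(federations: List[str],
                          ids_per_federation: int) -> Dict[str, range]:
    """Give each federation a disjoint block of database IDs (assumed scheme)."""
    max_id = 2**16 - 1  # databases per federation, from Figure 6
    ranges: Dict[str, range] = {}
    start = 1
    for name in federations:
        end = start + ids_per_federation
        if end - 1 > max_id:
            raise ValueError("not enough database IDs for all federations")
        ranges[name] = range(start, end)
        start = end
    return ranges
```

With disjoint ID blocks, a database copied from one federation can be attached to another without colliding with an ID already in use there.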
In today's environment, most of the prototyping and testing using Objectivity/DB is performed against private federations - there is little coordination of issues such as federation identifier, lock-server and so on. Although developers are likely to continue to work with their own federations, this is clearly not appropriate for end users. As part of the 1997 COCOTIME review of computing resources at CERN, it was agreed to establish a number of production database services during 1998. These services would be based around the following components:
Initially, such services will be established for ALICE (NA45), ATLAS, CMS and COMPASS. These services will help us to gain experience of running Objectivity/DB together with HPSS in a production environment.
As described in section 9.4 on page *, CMS have evaluated the use of an ODBMS at all stages of production including data taking, reconstruction and analysis. Federations of some 1000 databases have been generated at two test beams in 1997, and further work with larger data volumes is planned for 1998.
The NA45/CERES experiment is looking for low-mass e+e- pairs in heavy-ion collisions at the CERN SPS. At the end of 1995, they took the decision to move to C++ and began to redesign and rewrite their offline software. At the same time, they began to use Objectivity/DB for the storage of some of their data, and are hence the first HEP experiment to have used an ODBMS in production. Although their main production was performed on the CERN CS-2 system, a platform not officially supported by Objectivity, they were able to complete successfully a series of production runs. These runs demonstrated a number of important features of Objectivity/DB, including the ability to support multiple parallel writers. A total of 32 processors were allocated to the NA45 production runs, and each processor was able to write into the database in parallel. This is considered to be an important demonstration of a capability which will be fundamental to supporting the reconstruction farms of the LHC experiments.
Based on the 1995 data, a federated database of approximately 20GB was created, from some 0.5TB of raw data. The 1996 data was similarly processed, resulting in some 50GB stored in Objectivity/DB.
The CERES detector is currently being upgraded to include a TPC, which will improve momentum resolution. A further data taking run is foreseen for 1999, at which stage some 50TB of data will be acquired. It is currently planned that these data will be stored directly in Objectivity/DB, with the data store managed by HPSS. A test run is planned for the end of 1998, by which stage a reliable version of the interface between Objectivity/DB and HPSS is required.
The BaBar experiment at SLAC will take some 200TB of data per year, starting in 1999. They intend to use a combination of Objectivity/DB and HPSS on which to base their event store.
Events are represented as a hierarchy of objects based upon an event header. This header contains references to multiple child objects, each corresponding to a particular processing stage. The information for each processing stage will itself typically be organized as a stage header with references to further child objects. In addition, summary or tag objects are designed to allow for rapid but simple tests to be made on a small subset of the attributes of the child objects without having to access the latter directly. A single event tag is proposed, together with stage tag objects at each of the stages. There is a trade-off between the performance and the complexity of such queries. Selections based on the event tag alone are extremely fast; queries that also access the stage tags, and finally the data from each stage, can be more complex but are correspondingly slower. This strategy can result in dramatic performance gains if appropriate tags are created.

Figure 8 - the BaBar Event Data Model
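The layered use of tags can be sketched as follows. This is an illustration only, not the BaBar event model: the type names, the attributes (totalEnergy, nTracks, trackPt) and the select function are assumptions made for the example. The cheapest test is applied first, so that the bulk data of a stage is touched only for events that pass both the event tag and the stage tag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative layout only; all attribute names are assumptions.
struct StageData { std::vector<float> trackPt; };   // bulk data of a stage
struct StageTag  { int nTracks; };                  // summary of that stage
struct Stage     { StageTag tag; StageData data; };
struct EventTag  { float totalEnergy; };            // summary of the event
struct Event     { EventTag tag; Stage reco; };

struct SelectionStats {
    std::size_t selected;    // events passing all cuts
    std::size_t dataReads;   // how often the bulk data was accessed
};

// Cheapest test first: event tag, then stage tag, then the data itself.
SelectionStats select(const std::vector<Event>& events,
                      float minEnergy, int minTracks, float minPt) {
    SelectionStats s = {0, 0};
    for (const Event& e : events) {
        if (e.tag.totalEnergy < minEnergy) continue;    // event tag only
        if (e.reco.tag.nTracks < minTracks) continue;   // stage tag only
        ++s.dataReads;                                  // full stage data
        for (float pt : e.reco.data.trackPt) {
            if (pt >= minPt) { ++s.selected; break; }
        }
    }
    return s;
}
```

In a real store the two early exits would avoid I/O on the databases holding the stage data, which is where the performance gain would come from.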
The data for each event is typically stored across multiple databases, corresponding to the different components. For example, the raw data, reconstructed data, ESD, AOD and simulated truth would be stored in 5 sets of databases, with the corresponding headers stored in a further 5.
The event header, the stage headers and the various tag objects are located separately from the bulk of the data that makes up the event in order to take advantage of clustering and prefetching. In order to allow for separate migration to the mass store, these should be in separate database files rather than separate containers within the same databases. In addition, mapping between groups of file systems (e.g. slow, fast, HPSS) and components gives the possibility of distributing different parts of events in the most efficient way for each site.
The processing framework consists of an input module that iterates over an input event collection and acts as the head of a chain of processing modules. At the end of this chain, the output module is responsible for the output of the processed information. Conceptually, the output module can be associated with the same or another event collection as the input module.
Processing and reprocessing depend on the prior presence of processed data in the output collection.

Figure 9 - Reprocessing: same collections

Figure 10 - Reprocessing: different collections
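The structure of such a framework can be sketched in a few lines of C++. The sketch below is not BaBar code: Event, Collection, Module and run are hypothetical names, and the rule used to distinguish processing from reprocessing (output already present for the event) is one possible reading of the description above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical types: an event carries named processing results and
// a collection maps event ids to events.
struct Event {
    int id;
    std::map<std::string, double> results;   // filled by processing modules
};
using Collection = std::map<int, Event>;
using Module = std::function<void(Event&)>;

struct RunSummary {
    int processed;     // events with no prior output
    int reprocessed;   // events whose output already existed
};

// Input module: iterate over `input`; processing modules: `chain`;
// output module: store into `output`, which may be the same collection.
RunSummary run(const Collection& input, const std::vector<Module>& chain,
               Collection& output) {
    RunSummary summary = {0, 0};
    std::vector<int> ids;   // snapshot, so input and output may coincide
    for (const auto& entry : input) ids.push_back(entry.first);
    for (int id : ids) {
        Event evt = input.at(id);
        auto prior = output.find(id);
        const bool hadOutput =
            prior != output.end() && !prior->second.results.empty();
        for (const Module& step : chain) step(evt);
        output[id] = evt;
        if (hadOutput) ++summary.reprocessed; else ++summary.processed;
    }
    return summary;
}
```

Calling run with the same object as input and output corresponds to the "same collections" case, and with two distinct objects to the "different collections" case.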
The BaBar collaboration is establishing a number of regional centres. It is planned that IN2P3 in Lyon will have a complete copy of all of the raw data. Other regional centres, such as RAL and INFN, will have partial copies of the data. Given that some 200TB of raw data are expected per year, the current network connections between the various sites do not permit the data to be distributed over the network. Thus, it is planned that bulk data transfers will be performed by tape, as in the past.

Figure 11 - Data Distribution Strategy
An ODBMS has been used in production for data taking, reconstruction and preliminary analysis by a number of experiments with federated database sizes ranging from 1-100GB. Additional experience has been gained with the use of Objectivity/DB for providing persistence in GEANT-4, although clearly full production tests cannot be made until at least the open-beta, if not the first production version, of GEANT-4 is made available. Both the open-beta and first release are scheduled for 1998. Production federations containing over 1000 databases have been demonstrated, and the scalability of individual databases up to 25GB has been shown. Thus, we believe that an Objectivity/DB federation can today easily scale to at least 25TB (1000 databases of 25GB each), if not well beyond. Scheduled enhancements, including the interface to HPSS, should remove the practical limitations associated with building such large federations. Performance tests, described in section 9 below, have shown that data can be written into Objectivity/DB at close to raw disk speeds per stream. Production tests with NA45 have demonstrated the use of up to 32 parallel streams, making data rates in excess of 100MB/second fully achievable today.
The second milestone set at the March 1997 review of the RD45 project was as follows:
"Investigate the impact on the every-day work of the end-user physicist when using an ODBMS for event data storage. The work should address issues related to individual developers' schema and collections for simulation, reconstruction and analysis."
In the current model, the ODBMS that is used for the event data storage is just one component of the overall offline environment. To a large extent, end-users should be unaware of the details of this storage - just as they will be unaware of the storage management software, presumably HPSS, upon which Objectivity/DB will be built. As described in more detail below (see section 6.5 on page *), access to a given Objectivity/DB federation is determined by the use of an environment variable. By default, a user could even be automatically connected to the production database of the appropriate experiment. Helper classes, distributed as part of the HepODBMS class libraries (see section 10.1 on page *), reduce the amount of code needed to perform frequently used operations, such as initialising a database session. In the case of interactive analysis, browsers are being developed to permit users to navigate through the database, find an appropriate collection of events, either by name or by characteristics, and use these collections as a starting point for further analysis. Given the power and flexibility of the system, a user will need to know much less (for example, nothing about the tape staging system, the run book-keeping system and so forth) and will nevertheless be able to work with considerably larger data volumes with greater ease.
Users wishing to write or modify persistent applications will require additional knowledge, but again this is much less than in the past. The ODMG language bindings form a natural extension to the language in question - only a few basic principles need to be learnt. This is far from true today, when expert knowledge is required to create and manipulate ZEBRA data stores and data structures.
Using an ODBMS, the main access method for the end user to retrieve a persistent object is navigation. In other words, the application code directly follows a persistent reference from an event collection object that points to an associated event. Afterwards the user might follow another association from the event object to the raw data for this event. Obviously this approach needs some starting point - in other words, the first persistent object has to be found using a technique other than navigation. For example, when a user wants to retrieve a particular collection of events from the database, this could be done by a collection name, which is maintained by the system for all persistent collections. Traditional systems use such a naming technique implicitly through file names for event directories. Since a single flat naming space tends to be difficult to maintain for a larger set of entities, the hierarchical name space typically provided by filesystems has often been used. In other cases a similar hierarchical name space has been implemented on the application level (e.g. ZEBRA directories for histograms).
ODMG compliant databases typically provide naming facilities on the object level. These facilities allow key objects to be located in the system by a simple name lookup. The implementation of these naming facilities in Objectivity/DB allows another persistent object, a container, a database or the whole federation to be used as the scope of such a name. The concept of a scope provides a lot of flexibility in object naming. For example, nested and/or overlapping naming schemes are possible. In other words, a single object may be referenced by different names in different scopes. Implementing a tree-like structured naming space - like the filenames in a filesystem - is relatively straightforward. A first prototype of such a naming tree is included in the HepODBMS class libraries (see the classes NamedTree and NamedNode). This prototype implements a heterogeneous tree of named objects. Users can create "directories" in their private databases and populate them with named objects like histograms or event collections.
The directory structure of the naming tree facilitates the organisation of analysis results from different job runs and simplifies subsequent browsing or visualisation.
Although it is clear that such a simple naming scheme cannot solve the problem of how to structure the various meta-data items relevant to analysis, it has the advantage of being familiar to end users.
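A directory-like naming tree of this kind can be sketched in plain C++. The sketch is not the HepODBMS NamedTree/NamedNode interface: NameNode, resolve, bindName and lookupName are hypothetical names, the tree is transient rather than persistent, and a named object is represented simply by a string handle. Each root node plays the role of one naming scope.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical node: a "directory" (has children) or a leaf naming an
// object, represented here by an opaque string handle.
struct NameNode {
    std::string object;   // empty for pure directories
    std::map<std::string, std::unique_ptr<NameNode>> children;
};

// Walk the path "a/b/c" below `root`, optionally creating missing nodes.
NameNode* resolve(NameNode& root, const std::string& path, bool create) {
    NameNode* node = &root;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        if (part.empty()) continue;   // tolerate leading or doubled '/'
        auto it = node->children.find(part);
        if (it == node->children.end()) {
            if (!create) return nullptr;
            it = node->children
                     .emplace(part, std::make_unique<NameNode>()).first;
        }
        node = it->second.get();
    }
    return node;
}

// Give `object` the name `path` within the scope `root`.
void bindName(NameNode& root, const std::string& path,
              const std::string& object) {
    resolve(root, path, true)->object = object;
}

// Return the named object, or nullptr if the name is unknown or
// refers to a directory.
const std::string* lookupName(NameNode& root, const std::string& path) {
    NameNode* n = resolve(root, path, false);
    return (n && !n->object.empty()) ? &n->object : nullptr;
}
```

Using two separate roots, for instance one per user and one per working group, shows how the same object can carry different names in different scopes.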
Collections of persistent objects, especially event collections, are of central importance to both the end user and the database administrator.
End users will typically specify their analysis tasks as algorithms applied to a set of input events, such as a named event collection, using a scheme such as that described above. In order not to perform the same selection over and over again, they may also produce a new event collection as the result of an analysis job.
Database administrators will try to optimize the overall system performance by redefining the physical clustering of large event collections shared by one or more working groups.
The functional requirements for collections are expected to vary significantly between different tasks. For example, some analysis jobs will process relatively small collections (less than a few thousand events) but will require maximum speed for the access to individual events. Larger production jobs may need to deal with very large collections (up to 10^9 to 10^12 events) but have less stringent requirements in terms of speed of access.
In most end user cases collections will have to support "overlapping collections". In other words, the same event may "belong" to many collections. These collections will typically be implemented by "reference". In other words, the collection will store persistent references to the events rather than complete copies of events. Very large collections, such as "the collection of all events gathered this year" may have to use a different collection implementation, such as by containment, in order to avoid unnecessarily large reference lists. The collection in these cases could be defined as all event objects that are contained in a set of database files.
In order to permit multiple different implementations whilst preserving the same user interface, the use of the "strategy" pattern is proposed [36]. The choice of actual implementation is then transparent to the user, who does not need to know about the internal details in order to use a collection. On the other hand, the creator of new collections will have sufficient flexibility to guide the system by suggesting the most appropriate implementation to be used for a particular application.
The collection interface is designed to closely resemble that of the standard C++ collections or STL (see for example [37].) This will permit algorithms developed for STL containers to also function on event collections. As such, each implementation of an event collection would have to provide at least a basic iterator and a constant iterator, which as a minimum implement the interface of an STL forward iterator.
The "class extent" - the collection of all objects of a given class in the federation - could easily be modelled using a collection by containment.

Figure 12 - Class Diagram for Event Collection Prototype
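The combination of a single user-visible collection class, a strategy object for the physical implementation and an STL-style forward iterator can be sketched as follows. This is not the prototype shown in the class diagram: the names Storage, ByReference, ByContainment and EventCollection are assumptions, the objects are transient, a fixed Event type is used instead of a template parameter, and only a constant iterator is provided to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct Event { int id; double et; };

// Strategy interface: how the members of a collection are located.
struct Storage {
    virtual ~Storage() = default;
    virtual std::size_t size() const = 0;
    virtual const Event& at(std::size_t i) const = 0;
};

// By reference: pointers to events stored elsewhere, so the same event
// may belong to several collections.
struct ByReference : Storage {
    std::vector<const Event*> refs;
    std::size_t size() const override { return refs.size(); }
    const Event& at(std::size_t i) const override { return *refs.at(i); }
};

// By containment: everything held in a set of containers (standing in
// for database files); no per-event reference list is kept.
struct ByContainment : Storage {
    std::vector<const std::vector<Event>*> files;
    std::size_t size() const override {
        std::size_t n = 0;
        for (auto* f : files) n += f->size();
        return n;
    }
    const Event& at(std::size_t i) const override {
        for (auto* f : files) {
            if (i < f->size()) return (*f)[i];
            i -= f->size();
        }
        throw std::out_of_range("index beyond collection");
    }
};

// The single class seen by the user, whatever the strategy.
class EventCollection {
public:
    explicit EventCollection(std::shared_ptr<const Storage> s)
        : store_(std::move(s)) {}

    class const_iterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = Event;
        using difference_type = std::ptrdiff_t;
        using pointer = const Event*;
        using reference = const Event&;
        const_iterator() = default;
        const_iterator(const Storage* s, std::size_t i) : s_(s), i_(i) {}
        reference operator*() const { return s_->at(i_); }
        pointer operator->() const { return &s_->at(i_); }
        const_iterator& operator++() { ++i_; return *this; }
        const_iterator operator++(int) { auto t = *this; ++i_; return t; }
        bool operator==(const const_iterator& o) const {
            return s_ == o.s_ && i_ == o.i_;
        }
        bool operator!=(const const_iterator& o) const { return !(*this == o); }
    private:
        const Storage* s_ = nullptr;
        std::size_t i_ = 0;
    };

    const_iterator begin() const { return const_iterator(store_.get(), 0); }
    const_iterator end() const {
        return const_iterator(store_.get(), store_->size());
    }
private:
    std::shared_ptr<const Storage> store_;
};
```

Because the iterator satisfies the forward-iterator interface, standard algorithms such as std::count_if or std::distance work unchanged on either implementation.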
A workshop on event collections was held at CERN during February 1998. The goals of the workshop were to agree on the interface to event collections and on which physical implementations should be made. The target date for an implementation is the 98A release of LHC++, scheduled for June 1998.
The key requirements for event collections identified at the workshop are given below.
For the purposes of the following we will use the term event to refer to the persistent object class on which the analysis is based. For some experiments this might actually be a pair or track object. The intent is to implement a collection template which allows the concrete class to be supplied as a template parameter, e.g. List<CMSEvent>.
We assume that collections of transient objects, collection of objects within one event as well as transient collections of events are handled adequately by the normal STL and persistent STL implementations. The aim of this implementation is therefore to provide persistent collections of events.
The user should only have to use one single class for both the collection itself and the associated iterator. Different implementations of the physical store will be handled using a strategy object. At collection creation time a hint may optionally be provided, to choose the appropriate implementation with respect to e.g. the expected size.
The collection and the iterator should conform to the collection interface as defined by the STL. At least forward iterators (mutable and const) have to be provided.
The complete set of events of a given experiment might be very large. For example, the number of events that will be collected by the BaBar experiment will be this order of magnitude. It might not be possible to implement all operations (such as sorting) for very large collections.
It should be possible to decorate the persistent collections with a set of attributes, which further describe it, e.g. a collection name or the selection that was performed to produce the collection.
Each event provides a unique identifier for which a less-than operator is defined. This operator will be used, for example, to sort a collection in order to perform set-style operations. To reach acceptable performance we assume that large collections will obey weak set insertion semantics.
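The role of the ordered identifier can be illustrated with the standard C++ algorithms on sorted ranges. The EventId type (a run and event number), the helper names and the choice of operations shown are assumptions made for this sketch. Duplicates are tolerated on insertion and removed only when the list is normalised; this is one possible reading of weak set insertion semantics, not a definition taken from the workshop.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical unique event identifier, ordered by run, then event.
struct EventId {
    int run;
    int event;
};
inline bool operator<(const EventId& a, const EventId& b) {
    return a.run != b.run ? a.run < b.run : a.event < b.event;
}
inline bool operator==(const EventId& a, const EventId& b) {
    return a.run == b.run && a.event == b.event;
}

using IdList = std::vector<EventId>;

// Sort and drop duplicates so that the list behaves as a set.
IdList normalise(IdList ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// The functions below require both inputs to be normalised already.
IdList intersection(const IdList& a, const IdList& b) {
    IdList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

IdList unite(const IdList& a, const IdList& b) {
    IdList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

IdList difference(const IdList& a, const IdList& b) {
    IdList out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

Each operation is a single linear pass over the two sorted lists, which is why sorting by identifier is the enabling step.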
The following set operations have to be provided:
A first prototype of event collection classes, based on the requirements agreed at the February 1998 workshop, will be developed in time for the April 1998 workshop. The goal is to incorporate a version of these classes in the 98A release of LHC++, scheduled for June/July 1998.
Analysis of physics data is traditionally performed in one of two processing modes loosely labelled as "batch processing" and "interactive analysis". Batch processing is typically used in an initial pass over a large input data set to extract or derive the quantities that will be used for the subsequent physics analysis. This stage may well involve re-running parts of the reconstruction program, possibly with different or improved algorithms or constants. Often the analysis code at this stage selects only a subset of all input events for further analysis. In order to keep the results of such a selection, many experiments employ so-called "event directories", which allow collections of events to be defined and offer facilities for further sub-selection according to a small fixed set of criteria. These criteria are often represented as a bit pattern - the meaning of the bits may even change with time. The main output of batch processing programs is typically one or more Ntuples, which are subsequently used in the "interactive" stage with PAW.
Although both concepts - event directories and Ntuples - offer considerable advantages over what was previously available, they have limitations that a new approach can overcome if the data are stored in an ODBMS. For example, if it is realised that an additional variable is required in the Ntuple, the entire Ntuple-generation stage must be rerun. This can be extremely time-consuming, and is somewhat inflexible. Similarly, it is non-trivial to navigate from the data in the Ntuple to the full event data. This can be needed if one wishes to perform a full event display of certain anomalous events.
In summary the main services that have to be provided by an analysis system are:
In addition, to allow the end user to conveniently access predefined sets of events and to store private event selections for repeated use, such a system needs to support private and common event collections.
A scheme for performing interactive data analysis has been developed in the context of LHC++. The overall LHC++ strategy, which includes a number of widely-used commercial components, supplemented by HEP-specific extensions as necessary, offers functionality similar to that of today's CERNLIB. As such, it contains a set of histogram classes, which may be purely transient, written to files or, more conveniently, stored in the same federation as the associated event data. Tools, both batch and interactive, for manipulating these histograms are provided, including mechanisms for fitting the data using Minuit and other minimisation programs. Data can be visualised using industry-standard graphics packages, such as OpenInventor, or manipulated interactively by means of IRIS Explorer, a modular visualisation framework built upon OpenInventor and using components of the NAG numerical libraries.
The following figure shows schematically the difference between the PAW model based on Ntuples and the LHC++ model based on a tagDB. In the latter case, the tag information is still connected to the main event data, and it is possible to navigate from one side to the other.

Figure 13 - Tag vs Ntuple Model
The support for user defined attributes and for efficient selection of data from an ODBMS is based on the tagDB model. The HepODBMS package in LHC++ currently provides two prototype implementations of tag classes, the so-called "GenericTag" and "ConcreteTag".
Both implementations share a common interface to the interactive visualisation framework (a set of IRIS Explorer modules), allowing the end user to interactively produce distributions on sub-selected data.
Generic tags are aimed at moderately sized collections owned by single end users. They provide a simple user interface for the creation of tag quantities and spare the end user from formally defining a new persistent class whenever the set of user quantities changes. Generic tags may contain attributes of type float, double, short, long, char. Additional attributes may be added after the initial definition. This flexibility makes the generic tag especially useful during the first development phase of an analysis, when the set of quantities used in the analysis tends to change more frequently. Although they carry a slight performance penalty with respect to concrete tags, they are more convenient for end users and avoid many of the schema handling issues described in section 8.8 on page *.
Concrete tags have their own schema and are oriented towards large, shared collections, such as collaboration-wide or work group-wide event collections.
Both implementations share a common interface, which is entirely decoupled from the physical storage model. This permits the implementation of different clustering strategies, such as attribute-based clustering - as in column-wise Ntuples - without affecting the user interface.
In the LHC++ model, before one can start to visualise data, one has to define and fill a collection of tags, as shown below. In this example, the generic tags are used.
    // create a new tag collection
    GenericTag simTag("simulation tag");
    // define all attributes of my tags
    TagAttribute<long>  evtNo(simTag, "event number");
    TagAttribute<float> et   (simTag, "Et particle1");
    TagAttribute<float> theta(simTag, "theta particle1");
    TagAttribute<short> pid  (simTag, "id particle1");
Figure 14 - Creation and Definition of a New Event Tag
These tags are then filled in a typical event loop, as shown below. It is important to note that the tag attributes are handled just like normal C++ variables, adhering to the ODMG philosophy.
    while ( evt = geant->nextEvent() ) {
        simTag.newTag();   // create a new tag
        et    = evt->getPart(1).et;
        theta = evt->getPart(1).theta;
        pid   = evt->getPart(1).pdg_code;
    }
Figure 15 - Filling a Previously Defined Tag
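How an attribute object can behave like an ordinary C++ variable while the values are stored elsewhere can be sketched as below. This is not the HepODBMS implementation: the classes reuse the names GenericTag and TagAttribute from the figures purely for readability, the storage is transient, all values are held as double (so very large long values would lose precision), and the column and current member functions are assumptions. Values are kept column-wise, one vector per attribute, to show that the storage layout is hidden behind the attribute interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generic tag collection.
class GenericTag {
public:
    explicit GenericTag(std::string name) : name_(std::move(name)) {}

    // Attributes may be added at any time; tags created earlier read as 0.
    std::vector<double>& column(const std::string& attr) {
        std::vector<double>& col = columns_[attr];
        col.resize(rows_, 0.0);
        return col;
    }
    // Append one row (one tag) to every column.
    void newTag() {
        ++rows_;
        for (auto& entry : columns_) entry.second.resize(rows_, 0.0);
    }
    std::size_t size() const { return rows_; }
    std::size_t current() const { return rows_ - 1; }
private:
    std::string name_;
    std::size_t rows_ = 0;
    std::map<std::string, std::vector<double>> columns_;
};

// Proxy that lets an attribute of the current tag be used like a variable.
template <typename T>
class TagAttribute {
public:
    TagAttribute(GenericTag& tag, const std::string& name)
        : tag_(tag), name_(name) { tag_.column(name_); }
    // Assignment writes into the current tag.
    TagAttribute& operator=(T value) {
        assert(tag_.size() > 0 && "call newTag() first");
        tag_.column(name_)[tag_.current()] = static_cast<double>(value);
        return *this;
    }
    // Conversion reads from the current tag.
    operator T() const {
        assert(tag_.size() > 0 && "call newTag() first");
        return static_cast<T>(tag_.column(name_)[tag_.current()]);
    }
private:
    GenericTag& tag_;
    std::string name_;
};
```

With these definitions, statements such as `et = 4.5f;` or `float e = et;` act on the most recently created tag, and a new attribute can be declared after some tags have already been filled.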
As has been described above, a fundamental feature of this strategy is the ease with which the full event data can be accessed. This is a significant piece of new functionality that was not possible using PAW+Ntuples.
    while (atlasTags->next()) {
        if (et > 4.5 && sin(theta) > .5)   // for selected events…
        {
            // … fill histograms from the tag…
            cout << "event: " << eventNo << endl;
            etHisto->fill(et);
            thetaHisto->fill(theta);

            // … but also using data from the event.
            nTracks = atlasTags->event->tracking->trackList.size();
            nTracksHisto->fill(nTracks);
        }
    }
Figure 16 - Accessing the Event Data from the Tag
Once a collection of tags has been populated (typically, but not necessarily, in batch), the data can be visualised using IRIS Explorer.
As described above, IRIS Explorer is a modular tool-kit. An application can be built visually, or can be predefined, out of the basic building blocks, referred to as modules. These modules can be those that are provided with the system, HEP-specific modules, or those from other IRIS Explorer user communities.
A simple application, or map, to provide similar functionality to that offered by PAW's Ntuple "Plot" and "Project" commands, requires three separate modules. The first module, a database browser, allows a user to select a previously defined collection of tags. Data are then "passed" to a second module, where further cuts are made, and further data can be derived, or associated data retrieved from the database. Finally, the needed quantities are stored in a histogram, for subsequent visualisation. In fact, the data do not flow from one module to another - the Objectivity/DB object identifier (OID) is passed between modules, using shared memory or TCP/IP sockets, depending on the nodes on which the modules are run - minimising data copies.

Figure 17 - Interactive Analysis using Explorable Collections of Tags
A paper presented at the 1996 HepVIS workshop on data analysis and visualisation in HEP identified many of the key problems of today's systems and attempted to define requirements for a future analysis environment. From a physicist's point of view, the requirements were seen as:
Although it cannot be claimed that the use of an ODBMS addresses all of these issues, it can certainly have a major impact on the issue of homogeneity. In today's environment, there is a wide variety of data formats in common use. Even when the underlying system is the same, there can be significant difficulties in accessing the data stored using different packages. ZEBRA alone has both "sequential" (FZ) and "random access" (RZ) formats, although both of these have their own variations (FZ native or exchange file format, binary or ASCII data etc.). In addition, the many packages built on top of ZEBRA (DBL3, FATMEN, HEPDB, HBOOK, OPCAL etc.) have inconsistent interfaces. It is not possible to "make a link" (ZEBRA terminology) between data in say a ZEBRA FZ file, associated calibration constants stored in HEPDB, histograms stored in an HBOOK file and data stored in an Ntuple (even if in the same HBOOK file as the histograms). Attempting to scale such "confusion" to data volumes several orders of magnitude greater is clearly unlikely to succeed.
On the other hand, an ODBMS permits all of the above data to be stored in a consistent manner, even if physically located in separate containers and/or databases on different servers. The user is exposed to the logical, not physical, view.
Like many existing experiments, the ZEUS collaboration at DESY uses an event directory - in their case based on ADAMO - to speed up event selections. A standard program, called EAZE, is provided to access the event data. Users have to provide 3 user-routines and control cards. Each event has an associated header, which includes the run and event number, and the offset within the mini-DST file to the event. In addition, there are a total of 128 bits for event selection.
Experience has shown that the use of individual bits is somewhat inflexible. As a result, the cuts implied by a given bit tend to be rather loose, and hence many jobs read more events than are required, and perform a tighter cut in their program. The ability to perform selections based upon variables, rather than bits, would clearly help, but at the cost of increased storage.
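The difference between the two selection styles can be made concrete with a small sketch. Everything in it is illustrative: the Header type, the association of bit 9 with a loose cut on a variable yjb, and the threshold values are assumptions, not the ZEUS definitions. It counts how many events a job would have to read under each scheme.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event header: 128 selection bits plus, in the tag
// database variant, the variable from which one bit was derived.
struct Header {
    std::bitset<128> bits;
    double yjb;
};

// Assumed for illustration: bit 9 was set at production time for
// events with yjb > 0.04, a deliberately loose cut.
const std::size_t kLooseYjbBit = 9;

Header makeHeader(double yjb) {
    Header h;
    h.yjb = yjb;
    h.bits.set(kLooseYjbBit, yjb > 0.04);
    return h;
}

// Selection by bit: every event passing the loose cut is read, and the
// tighter cut is applied afterwards inside the user program.
std::size_t eventsReadUsingBit(const std::vector<Header>& headers) {
    std::size_t n = 0;
    for (const Header& h : headers)
        if (h.bits.test(kLooseYjbBit)) ++n;
    return n;
}

// Selection by variable: the tight cut is applied before any event
// data is read, at the price of storing the variable per event.
std::size_t eventsReadUsingVariable(const std::vector<Header>& headers,
                                    double minYjb) {
    std::size_t n = 0;
    for (const Header& h : headers)
        if (h.yjb > minYjb) ++n;
    return n;
}
```

The extra storage is one value per variable per event, whereas the saving is in event data that no longer has to be read and then discarded.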
To test these ideas, ZEUS built a prototype "tag database", based upon Objectivity/DB. The system design goals included:
The main difference seen by the users is in the control cards used to steer the standard analysis job, EAZE. Examples of steering cards for selection of events from the ADAMO-based event directory and the Objectivity/DB tag database are shown below. As these examples show, the new data card format is more comprehensible. In addition, as the selection of data can be more precise, less data is read, resulting in improved performance. As shown in the table below, a job reading some 2750 events from a total of 45000 executes over 7 times faster using the tagDB implementation.
    C
    ZEUSIO-INFI /zeus/data/mini95/r011539.z
    .
    .
    ZEUSIO-INFI /zeus/data/mini95/r012208.z
    C
    ZeusIO-IOPT DRIVER=IE,ZED
    C
    ZeusIO-ZCLASS .and. b9
    ZeusIO-ZCLASS .and. b10
Figure 18 - Example of EAZE Control Cards Without TagDB
    ZeusIO-INFI ZeusEventStore
    ZeusIO-Run ((RunNR=>11539)AND(RunNr=<12208))
    C
    ZeusIO-IOPT DRIVER=OBJY
    C
    ZeusIO-Variable (
    ZeusIO-Variable (Ee>5) and
    ZeusIO-Variable ((Zvtx>-50)and(Zvtx<50))and
    ZeusIO-Variable ((Eminpz>35)and(Eminpz<65))and
    ZeusIO-Variable (Yjb>0.04)
    ZeusIO-Variable )
Figure 19 - Example of Control Cards Using TagDB
| | User Time (seconds) | System Time (seconds) |
| ZED (old system) | 2756.5 | 91.7 |
| ZES (new system) | 362.7 | 32.0 |
Table 4 - Performance Comparison: Run 12075, ET > 30
A number of other tools are provided by the system, including a standard program to generate an Ntuple. This can be run as shown below.
    ZeusIO-Run (RunNr=11543)
    C
    ZeusIO-Bit (DST27)
    C
    ZeusIO-Variable (Yjb>0.7)
Figure 20 - control cards (sel.txt) for above Ntuple extraction
Currently, all of the data from 1995 are stored in the database, occupying some 14GB of disk space. 92 physics variables are stored per event, and some 10-30% of analyses are performed using the new system. A second phase is currently under study, whereby some 200 variables would be stored per event. This would include data from 1994 until the present, and require some 150-200GB of disk space.
Future plans include storing the physics data itself in Objectivity/DB, rather than just the tags. At this stage, physics analyses directly from C++ would be supported.
NA48 is an experiment at the CERN SPS that studies CP violation. It has recently initiated a project with similar goals to that of the ZEUS experiment, described above. In other words, they plan to implement a database using Objectivity/DB to optimise access to physics data. So far, some 20TB of data have been acquired. This will increase by a further 100TB by the end of the year 2000. Although these data will not be stored in a database, the volume involved dictates that efficient access is required. In particular, it is essential that only the needed data is cached to disk and read into memory. Although the initial proposal calls for a query language slightly different from that employed by ZEUS, and uses bit information to minimise the storage requirements, it is likely that there will be cross-fertilisation between the two projects, and that a common strategy will be evolved.
    Burst.microCompactBurstMetaInfo bit [5..10] as integer within [-10..10]
    (event.microCompareEventData bit [1..8] as integer within [0..100] ) AND
    ((event.microCompactEventData bit 20=true) OR (event.microCompactEventData bit 21=true))
Figure 21 - Examples of a Possible Query Language for NA48
Before an object can be stored in an ODMG-compliant database, its definition or schema must be defined. This is done using the Object Definition Language or, in the case of Objectivity/DB, using Objectivity's DDL, and is shown schematically in the figure below. (For a more detailed description, see [11].)

Figure 22 - Database Development Procedure
In the current version of Objectivity/DB, each persistent-capable C++ class is given a type number, which is allocated sequentially. In other words, the type number given to a specific persistent-capable class depends on the order in which the corresponding DDL file is processed. In the current C++ binding this type number is placed as a class variable in the generated code for each persistent class. During the startup phase of an application this number is used to associate the application class with the schema definition for this class stored in the federated database. Maintaining the type-numbering scheme of an application (or library) in agreement with the target federated database schema is therefore an essential requirement to allow the correct functioning of Objectivity/DB.
In a single developer environment, the Objectivity schema pre-processor guarantees the synchronisation of type numbers, since type number allocation and schema generation are performed against a single federated database. In a larger scale development project with many software packages and many distributed developers, the constraint to use a single federated database is not practical.
Any schema change requires the developer to have write access to the federated database file. In order to keep the risk of interference with other users of the federation minimal, we assume that any development will be done against a separate development federation. Only the deployment of stable, released packages should be done against the shared production federation.
To allow developers to set up private development federations, the development environment must provide a mechanism to create federations containing a copy of the production schema using the same type numbering scheme. This allows production versions of the binary libraries of other packages to be used against the development federation. It also simplifies the preparation of input data needed for program testing: one can simply copy test data from the production federation into a database, which is then attached to the development federation.
To simplify the preparation of the development federation schema we have requested a tool that directly exchanges (parts of) the schema information between two separate federations, without the need to repeat the schema pre-processing step. This functionality will be provided in one of the next releases of Objectivity/DB.
To remove the coupling between different packages that is introduced by sequential type number allocation, Objectivity/DB provides the so-called "named schema" feature. This feature allows the type number space of a federated database to be divided into named subsets, by specifying the -schema [name] option when running the DDL processor. Each named schema reserves a range of 64K type numbers, permitting the individual developer to reorganise the schema within a package without compromising the type numbering of other packages. Some 16 schema names have been allocated for the various LHC++ packages (HepODBMS, HistOOgrams, CLHEP, Geant-4 etc.). The named schema feature has been successfully used to de-couple the development of the different LHC++ packages. We recommend that each experiment register additional named schemata for all experiment-specific packages that define persistent classes.
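The effect of reserving a separate range per named schema can be sketched as follows. Only the 64K range size and the package names come from the text. The class, its methods and the rule that ranges are handed out in registration order are assumptions, and do not describe how the DDL processor works internally.

```python
RANGE_SIZE = 64 * 1024  # type numbers reserved per named schema (from the text)


class NamedSchemaAllocator:
    """Toy allocator: one fixed-size type-number range per schema name."""

    def __init__(self):
        self._base = {}  # schema name -> first type number in its range
        self._used = {}  # schema name -> how many numbers are already taken

    def register(self, schema):
        # Assumption: ranges are handed out in registration order.
        if schema not in self._base:
            self._base[schema] = len(self._base) * RANGE_SIZE
            self._used[schema] = 0
        return self._base[schema]

    def allocate(self, schema):
        base = self.register(schema)
        if self._used[schema] >= RANGE_SIZE:
            raise OverflowError(f"named schema {schema!r} is full")
        number = base + self._used[schema]
        self._used[schema] += 1
        return number


alloc = NamedSchemaAllocator()
for name in ("HepODBMS", "CLHEP"):
    alloc.register(name)

clhep_first = alloc.allocate("CLHEP")
for _ in range(3):
    alloc.allocate("HepODBMS")  # HepODBMS gains three classes
clhep_second = alloc.allocate("CLHEP")
print(clhep_first, clhep_second)
```

In this model, adding classes to one package leaves the numbers allocated in another package's range untouched, provided that the schema names themselves are registered consistently in every federation.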
Requirements for private user schema include the following:
Although not a hard requirement, an implementation that kept the user schema in a separate file outside of the central federated database would be preferred, for security reasons.
As discussed above, the static binding of C++ application classes to database schema, using type numbers compiled into binary libraries and applications, has some disadvantages. It complicates the development of federation-independent class-libraries and requires a rather complicated schema preparation procedure if the number of database developers and packages becomes large.
The Java binding to Objectivity/DB provides a more flexible solution. In this case the binding of application classes to the federation schema is done at application runtime using the class name. Using this dynamic binding technique, the same application or library can be used against different federations, independent of the sequence in which the schema has been defined. We expect that a similar implementation for the C++ binding would greatly simplify the development cycle. A more dynamic schema binding has therefore been requested as a longer-term solution (see section 12.4).
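The idea of binding by class name at runtime can be sketched in a few lines. This is a toy model under stated assumptions: the Federation class and the bind_at_runtime function are hypothetical names, and nothing here reproduces the Objectivity/DB Java API.

```python
class Federation:
    """Toy federation: class names mapped to locally allocated type numbers."""

    def __init__(self, definition_order):
        self.type_number = {
            name: i for i, name in enumerate(definition_order, start=1)
        }


def bind_at_runtime(federation, class_names):
    """Resolve type numbers by class name when the application starts."""
    missing = [n for n in class_names if n not in federation.type_number]
    if missing:
        raise LookupError(f"schema not defined for: {missing}")
    return {n: federation.type_number[n] for n in class_names}


application_classes = ["Event", "Track"]
production = Federation(["Event", "Track", "Cluster"])
development = Federation(["Cluster", "Track", "Event"])

# The same application binds against both federations without recompilation.
bound_prod = bind_at_runtime(production, application_classes)
bound_dev = bind_at_runtime(development, application_classes)
print(bound_prod, bound_dev)
```

The type numbers differ between the two federations, but since they are looked up by name rather than compiled in, the application does not depend on the order in which the schema was defined.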
A strategy for handling both developers' and end-users' schema has been developed and tested. Although received too late for inclusion in this report, new developments in Objectivity/DB for schema exchange and for run-time access to schema appear to largely meet our requirements. Further enhancements in this area are expected from Objectivity/DB and will be discussed at future RD45 workshops.
The use of an ODBMS as the basis for a consistent, experiment-wide, data management scheme has clear advantages, which have already been demonstrated in production in a number of experiments. These advantages address a number of the requirements, such as homogeneity and ease of use, listed in the CMS Computing Technical Proposal [28]. Further developments of the interactive data analysis environment are clearly required and are already underway. Enhancements to the way that schema are handled, particularly for large projects, where the schema must be shared between multiple federations, have been requested. More prototyping of collections and naming schemes needs to be performed, for which realistic use cases are required. We believe that these activities are best covered as part of the production services that are currently being established.
Additional research needs to be performed in order to understand issues related to the distributed environment of HEP, including data import/export, networking issues, the possible use of technologies such as mobile Java agents, and so forth.
Milestone 3
The third milestone set at the March 1997 review of the RD45 project was as follows:
"Demonstrate the feasibility of using an ODBMS and MSS at data rates sufficient for ATLAS and CMS 1997 test-beam requirements."
Since this milestone was set, ATLAS postponed their plans to evaluate Objectivity/DB in a test beam environment until 1998, and hence we only report below on the experience gained in CMS. Furthermore, as the data volumes planned for CMS were of the order of 100GB, it was agreed that these tests would concentrate on the use of Objectivity/DB alone and not address its integration with a mass storage system. Finally, the data rates involved in the CMS test beam activities were rather modest - well below 1MB/second - and hence did not pose any difficulty to Objectivity/DB. Thus, the main challenge posed by the CMS test beam activities was a production demonstration of the overall LHC++ environment, from data taking to analysis - a somewhat different focus to that described in the milestone above.
In addition to the CMS test beam activities, we describe progress on the interface between Objectivity/DB and HPSS, including performance and functionality tests. Plans for test beam activities in 1998 are also included.
The need for an interface between the object manager layer and a mass storage system was identified as part of RD45's activities during its first year and is described in [11] and [12]. In summary, although one can expect significant advances in disk capacity/unit price between now and the startup of LHC, it is unlikely that one will be able to afford, or even manage, disk farms capable of storing the entire LHC data volume - a total of some 100PB. More reasonably, one could expect to cache some tens to hundreds of TB of active data on disk, whilst keeping the bulk of the data on cheaper storage media.
Objectivity/DB and HPSS are emerging as the de-facto standard solutions for the HEP community in their respective areas. Plans to use both of these systems in production exist at BNL, CERN and SLAC. As a community, we have requested an interface between these two products, as described in section 6.2.
We describe below tests of the prototype version of the interface between Objectivity/DB and HPSS and discuss possible enhancements for the production version of this interface, scheduled for delivery by the end of 1998.
The High Performance Storage System (HPSS) is a software system that provides hierarchical storage management and services for very large storage environments. HPSS is the result of a collaborative effort by leading US Government supercomputer laboratories and industry to address very real, very urgent high-end storage requirements. HPSS is offered commercially by IBM Global Government Industry, Houston, Texas and is built upon the IEEE Reference Model for Open Storage Systems Interconnection, more commonly known by its previous name of IEEE MSS Reference Model, shown below.

Figure 23 - The IEEE Reference Model for Open Storage Systems Interconnection
HPSS is designed to be scalable in terms of data capacity (up to the level of petabytes), data transfer rates (gigabytes per second), number of files (billions), maximum file size (2^64 bytes), and geographic distribution of both software components and storage devices.
HPSS achieves these scalability features by supporting both direct-attached and network-attached disk and tape storage devices from multiple vendors, as well as by enabling distributed, parallel I/O through software striping.
HPSS is currently in production at a number of sites, including Maui High Performance Computing Center, Cornell Theory Center, Sandia National Laboratory, Caltech, Fermilab, Lawrence Livermore and Lawrence Berkeley National Laboratories, University of Washington, Los Alamos National Laboratory, San Diego SuperComputing Center, Oak Ridge National Laboratory, NASA Langley Research Center, Rechenzentrum der Universität Stuttgart, SLAC and CERN.
As such, it has clearly established itself as the mass storage system of choice for sites with high-end requirements. A long list of enhancements is planned, which can be viewed at the HPSS web-site.
The figure below shows the flow of control and data of a read operation for a file stored in HPSS-managed storage. HPSS consists of the following software components:
In the following diagram, the first step is performed upon file open. Should the file be disk resident, a read request will execute steps 2, 3, 4 and 7. In the case that the file is offline, steps 5 and 6 are performed, followed by the normal read loop.

Figure 24 - Control and Data Flow in HPSS
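The step sequence described for the figure can be written down as a small sketch. Only the step numbers and their ordering are taken from the text. What each step does is not restated here, and the function name and the n_reads parameter are assumptions made for illustration.

```python
def read_steps(disk_resident, n_reads=1):
    """Return the numbered steps executed for a file, following the text."""
    steps = [1]  # step 1 is performed once, upon file open
    if not disk_resident:
        steps += [5, 6]  # offline file: steps 5 and 6 precede the read loop
    for _ in range(n_reads):
        steps += [2, 3, 4, 7]  # normal read loop
    return steps


print(read_steps(disk_resident=True))
print(read_steps(disk_resident=False, n_reads=2))
```

In this reading, the cost of steps 5 and 6 is paid once per offline file, whereas steps 2, 3, 4 and 7 are repeated for every read request.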
As described in section 6.2, an interface between Objectivity/DB and HPSS has been requested by a number of HEP laboratories. A prototype of such an interface has been produced for IBM AIX systems - the only system on which HPSS is currently officially supported. This interface combines the Objectivity/DB server with the HPSS client, and was built by Andy Hanuchevsky/SLAC and Urs Bertschinger/Objectivity. The prototype consists of two parts:
By providing the interface in this way, end-user sites are able to optimise the I/O layer, or even substitute a different mass storage system, provided that a compatible interface is written. Objectivity/DB applications will be unaware that the associated data resides in HPSS managed storage. When an object is accessed, it will be returned immediately if the corresponding database is already disk resident. If not, the client will block on the implicit database open whilst the server, through HPSS, causes the necessary file to be reloaded from tape.
The current interface, which permits one block to be read at a time, is likely to be sub-optimal, but was provided for convenience. A better strategy would be to read multiple blocks at a time, and hence minimise the interaction with the HPSS server. However, the performance implications of the current prototype are not yet well understood, and it is expected that stress testing over the coming months will suggest areas where improvements are required.
Areas where enhancements are expected include:
In the current HPSS test configuration at CERN, the various HPSS components are distributed across multiple systems. For example, the tape mover(s), disk mover(s) and HPSS nameserver all run on different systems. In addition, an IBM system is currently being used to evaluate the Objectivity/DB - HPSS prototype interface. As such, this system runs both the Objectivity/DB server (AMS) and the HPSS disk mover, together with the rest of the environment required by HPSS, such as DCE. It is anticipated that one and eventually several/many disk servers will be run for each experiment, each supporting a few hundred GB of disk space managed by HPSS and the Objectivity/DB server.

Figure 25 - Objectivity/DB - HPSS Configuration at CERN
Unlike at CERN, SLAC currently plans to run the various HPSS components and the Objectivity/DB server on a single, powerful system. Although such a scenario has the advantage of reducing the network overhead involved in the inter-module communication, it is inherently a less scalable scenario, but nevertheless well-suited to the environment at SLAC, where the system will be used to support a single experiment (BaBar).

Figure 26 - Objectivity/DB - HPSS Configuration at SLAC
The basic functionality required of the proof-of-concept prototype, as described in section 6.2.1, has been demonstrated. It should be noted, however, that as HPSS uses DCE security, the Objectivity/DB server has to have the appropriate DCE credentials. As such, the familiar problem of token expiry must be handled.
    [rshpss01] ~ dce_login                        # Acquire DCE token
    Enter Principal Name: mnowak
    Enter Password:
    [rshpss01] ~ % cd ~/objectivity/bin           # Directory containing modules linked with HPSS API
    [rshpss01] ~/objectivity/bin % ./oostartams   # Start the modified AMS
    Objectivity/DB (TM) Start AMS Utility, Version 4.0.10
    Copyright (c) Objectivity, Inc 1989, 1996. All rights reserved.
    The AMS has been started (process ID = 52260).
Figure 27 - Starting the HPSS version of the Objectivity/DB server
Once the modified version of the Objectivity/DB server has been started, standard Objectivity/DB tools or applications can be used. For example, the oonewdb tool is used below to create a new database in the federation "BIG", whose bootfile is also given below.
    [cernsp] ~/amstest % more BIG
    ooFDNumber=1452
    ooLFDNumber=65535
    ooPageSize=8192
    ooLockServerName=rsobjy01
    ooFDDBHost=f-rsobjy01
    ooFDDBFileName=/objy01/BIG.FDDB
    ooJNLHost=rsobjy01
    ooJNLPath=/objy01
Figure 28 - Bootfile for the "BIG" Federation
    [cernsp] ~/amstest % oonewdb -db test5 -host f-rshpss01 -filepath . BIG
    Objectivity/DB (TM) Create Database Utility, Version 4.0.2
    Copyright (c) Objectivity, Inc 1992, 1996. All rights reserved.
    Created Database test5 [DBID = 16].
Figure 29 - Creating a New Database in HPSS-managed Storage
    [cernsp] ~/amstest %oodumpcatalog BIG
    Objectivity/DB (TM) List Database Files Utility, Version 4.0.2
    Copyright (c) Objectivity, Inc 1990, 1996. All rights reserved.
    FD Name   = BIG
    FD ID     = 1452
    FD File   = f-rsobjy01::/objy01/BIG.FDDB
    Boot File = rsobjy01::/objy01/BIG
    Jnl Dir   = rsobjy01::/objy01
    Lock Host = rsobjy01
    ...
    DB Name   = test5
    DB ID     = 16
    DB Image  = f-rshpss01::./test5.BIG.DB
Figure 30 - Output of oodumpcatalog
Tests have also been made of access to tape-resident databases. To perform these tests, a federation of two databases was created. Using HPSS administration commands, the two database files were forced to tape. A simple application was then run against the federation. When the application attempted to access the databases in question, they were transparently recalled to disk by HPSS, during which time the application was blocked. As soon as they were disk-resident, the application continued as normal.

Figure 31 - Tested Objectivity/DB - HPSS Configurations
Figure 32 - Objectivity/DB - HPSS Configuration
The modified Objectivity/DB server permits us to introduce additional code at the I/O level. For example, this permits an interface to an alternative MSS to be built, provides an exit for access control, and permits I/O operations to be traced.
In the following figure, an application first initialises the federation and opens a database for write access.
    Start of the transaction
    oofs: opening file /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 1 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 3 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 5 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 7 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 13 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 1 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 3 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 59 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 3168 /user/test.DB...
Figure 33 - Trace of opening a Database for Write Access
As a second step, the application loops, creating objects of 100KB. Initially, no I/O is performed as the objects are stored in the client cache. Once the cache limit has been reached, data is written to disk. As the objects are large, the I/O is performed in multiple transfers. The first object appears to use a free database page, whereas the second and subsequent objects are written to adjacent pages in a regular pattern. In the case of objects larger than a single database page, the last page contains the page map for the object and is hence written separately.
    Cache is full, 3 objects are forced to disk:
    (first updating some internal information)
    oofs: Reading 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: synching /user/test.DB...
    (writing data)
    first object:
    oofs: Writing 8 pages (65536 bytes), from page 3169 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3177 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 2778 /user/test.DB...
    oofs: Writing 2 pages (16384 bytes), from page 3178 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3180 /user/test.DB...
    second object:
    oofs: Writing 8 pages (65536 bytes), from page 3181 /user/test.DB...
    oofs: Writing 4 pages (32768 bytes), from page 3189 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3193 /user/test.DB...
    third object:
    oofs: Writing 8 pages (65536 bytes), from page 3194 /user/test.DB...
    oofs: Writing 4 pages (32768 bytes), from page 3202 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3206 /user/test.DB...
Figure 34 - I/O Log
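Traces of this kind are easy to summarise mechanically. The following sketch tallies the transfers for the second object in the I/O log. The regular expression, the function name and the assumption that every trace line has exactly the layout shown in the figures are ours, and the 8192-byte page size is taken from the bootfile of the test federation.

```python
import re
from collections import Counter

# Matches lines such as:
#   oofs: Writing 8 pages (65536 bytes), from page 3181 /user/test.DB...
LINE = re.compile(
    r"oofs: (Reading|Writing) (\d+) pages \((\d+) bytes\), ?from page (\d+)"
)


def summarise(trace):
    """Count I/O calls and pages transferred, per operation type."""
    calls, pages = Counter(), Counter()
    for operation, n_pages, _n_bytes, _first_page in LINE.findall(trace):
        calls[operation] += 1
        pages[operation] += int(n_pages)
    return calls, pages


second_object = """\
oofs: Writing 8 pages (65536 bytes), from page 3181 /user/test.DB...
oofs: Writing 4 pages (32768 bytes), from page 3189 /user/test.DB...
oofs: Writing 1 pages (8192 bytes), from page 3193 /user/test.DB...
"""

calls, pages = summarise(second_object)
print(calls["Writing"], pages["Writing"], pages["Writing"] * 8192)
# prints: 3 13 106496
```

For this object the log shows three transfers covering 13 adjacent pages, with no transfer larger than 8 pages.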
Finally, at transaction commit time, the remaining objects in the client cache are flushed to disk and a sync operation is performed.
    Commit of the transaction, flushing the cache:
    oofs: Reading 1 pages (8192 bytes), from page 6 /user/test.DB...
    oofs: synching /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 0 /user/test.DB...
    oofs: synching /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3428 /user/test.DB...
    oofs: Writing 8 pages (65536 bytes), from page 3415 /user/test.DB...
    oofs: Writing 4 pages (32768 bytes), from page 3423 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 3427 /user/test.DB...
    oofs: Writing 8 pages (65536 bytes), from page 3207 /user/test.DB...
    oofs: Writing 4 pages (32768 bytes), from page 3215 /user/test.DB...
    ...
    oofs: Writing 1 pages (8192 bytes), from page 3414 /user/test.DB...
    oofs: Reading 1 pages (8192 bytes), from page 59 /user/test.DB...
    oofs: Writing 1 pages (8192 bytes), from page 59 /user/test.DB...
    oofs: synching /user/test.DB...
    oofs: closing file /user/test.DB...
Figure 35 - Transaction Commit Log
In the following figure, we show the performance of HPSS using the so-called "simple API" as a function of blocksize. This API closely resembles the POSIX filesystem interface. In other words, for each POSIX I/O call, there is a corresponding HPSS function.
As the figure shows, HPSS works most efficiently for very large blocksizes - between 1 and 10 MB. Unfortunately, databases typically transfer much smaller amounts of data. In the case of Objectivity/DB, this is a database page, which is limited to a maximum of 64KB. Thus, unless large data volumes were cached on the server side, which brings with it problems with respect to data integrity, such an interface is unlikely to deliver the required performance.
Figure 36 - HPSS Write Performance as a Function of Block Size
Figure 37 - HPSS Read Performance as a Function of Block Size
In the current prototype, each I/O request to a database residing in HPSS-managed storage involves a significant overhead. Before a data block is transferred, the HPSS client, in this case the Objectivity/DB server, must first contact the HPSS nameserver to obtain the "bitfile ID" of the corresponding file. Having obtained the bitfile ID, it must then communicate with the bitfile server and the data mover to read/write the data. This communication overhead results in a significant performance degradation that suggests that the current interface could not, as anticipated, be used in production.
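The effect of a fixed per-request overhead on small transfers can be illustrated with a simple model. The formula is a generic one and the numbers are hypothetical: the 10 ms overhead and the 10 MB/s streaming rate are assumptions chosen for illustration, not measurements of HPSS or of the prototype interface.

```python
def effective_rate(block_bytes, overhead_s, bandwidth_bytes_per_s):
    """Bytes per second when every request pays a fixed control overhead."""
    return block_bytes / (overhead_s + block_bytes / bandwidth_bytes_per_s)


OVERHEAD_S = 0.010       # hypothetical: 10 ms of control traffic per request
BANDWIDTH = 10 * 2**20   # hypothetical: 10 MB/s streaming rate

sizes = [
    ("8KB page", 8 * 2**10),
    ("64KB page", 64 * 2**10),
    ("1MB block", 2**20),
    ("10MB block", 10 * 2**20),
]
rates = [effective_rate(size, OVERHEAD_S, BANDWIDTH) for _, size in sizes]

for (label, _), rate in zip(sizes, rates):
    print(f"{label:>10}: {rate / 2**20:6.2f} MB/s")
```

Under these assumptions the achieved rate only approaches the streaming rate once the block size is large compared with the product of overhead and bandwidth, which is consistent with the qualitative behaviour described for block sizes of 1 to 10 MB.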
We list below the possible interfaces between Objectivity/DB and HPSS.
The HPSS-NFS option and the "simple API" both suffer from poor performance and can be ruled out for production systems. The future interface to DFS or DMIG can also be ruled out as a short-term alternative.
The "advanced API" permits clients to transfer multiple blocks without the additional control information being passed to the nameserver and bitfile server. However, in a random-access environment, it is hard to predict which blocks will be read in the future. Furthermore, it is essential that any optimisation does not compromise data integrity. For example, any cached data must be kept consistent across multiple Objectivity/DB server processes or threads. Although the Objectivity/DB client, or rather client application, may have more information about which blocks are likely to be requested in the future, e.g. via the object identifiers in the current event collection, it is unclear whether the use of HPSS for small data transfers is desirable.
An alternative solution would be to use HPSS as a conventional staging system and let Objectivity/DB read/write directly to standard Unix filesystems. This would avoid the performance overheads associated with reading/writing to HPSS-managed disk storage, but would require some space management of the disk pools. However, the existing CERN tape staging software already provides such a capability and is currently being interfaced to HPSS. This can be implemented using the interfaces developed for the Objectivity/DB - HPSS proof of concept prototype and is currently considered the most viable short term solution.
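The space management that a staging approach requires can be sketched as a disk pool that recalls whole files and evicts old ones when full. This is a minimal model under stated assumptions: the StagePool class is hypothetical, the least-recently-used eviction policy is our choice for illustration, and nothing here reflects the actual CERN staging software or the HPSS API.

```python
from collections import OrderedDict


class StagePool:
    """Toy disk pool: whole files are staged in, least recently used go first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()  # path -> size, least recently used first
        self.recalls = 0            # number of files brought back from tape

    def open(self, path, size):
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            return "disk hit"
        if size > self.capacity:
            raise ValueError(f"{path} does not fit in the pool")
        # Evict least recently used files until the new file fits.
        while sum(self.files.values()) + size > self.capacity:
            self.files.popitem(last=False)
        self.files[path] = size  # the caller would block during the recall
        self.recalls += 1
        return "staged from tape"


pool = StagePool(capacity_bytes=100)
results = [
    pool.open("a.DB", 40),
    pool.open("b.DB", 40),
    pool.open("a.DB", 40),  # already on disk
    pool.open("c.DB", 40),  # forces eviction of b.DB
]
print(results, list(pool.files))
```

Once a file is in the pool, the database server reads and writes it through the ordinary filesystem, so the per-request overhead is paid only once per recall rather than once per page.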
It is clear that this area needs a significant amount of further study and will be the subject of much attention during the coming year. The activities planned in this area include workshops between IT/ASD and PDP groups and Objectivity experts, visits to SLAC to coordinate activities and compare results and possible implementations, follow-up meetings with Objectivity and the HPSS consortium and so on. The target for a production-quality interface remains the end of 1998 and is scheduled for inclusion in Objectivity/DB V6.0. The use of an interface to a staging system reduces the amount of work required on the Objectivity side, although it is expected that enhancements, such as the ability to pass "hints" from client to server, will be requested in the future.
CMS Test Beam Experiences
A prototype analysis chain was developed in CMS to test Objectivity/DB and other LHC++ components. This software was tested in the H2 test beam for a period of approximately 2 months (August 6th - September 29th). After a few days of running in, the system operated unattended without major problems. A federation of over 60GB was created, with a total of 1250 database files.
This software was also used in the X5 tracker test beam. In this case, a federation of some 25GB in 200 databases was created.
The prototype analysis chain tested in the H2 test-beam consisted of the following components:

Figure 38 - CMS H2 Test Beam Raw Data Class Diagram

Figure 39 - CMS H2 Test Beam Clustering

Figure 40 - Test Beam Configuration
After a few days of running in, the system ran essentially unattended without major problems. The only manual operation was to change the output disk every 9GB. Although the system was CPU-bound on object and association creation, a federation of over 60GB (1250 database files) was created and a first analysis performed. The software was reused in the X5 test beam, described below, and is to form part of the common framework developed in 1998.
A number of further developments are planned for 1998, including:
A framework for batch analysis of test-beam events was developed, which allowed:
Histogramming is based on the new histOOgram classes from LHC++. User-friendly management of persistent histograms and the usage of generic tags as a potential Ntuple replacement are yet to be tested.
These test-beam activities also allowed testing of development and test federations, based upon the production federation. This was performed using a small script which built a new federation from a reference federation containing the schema, plus a copy of some sample databases from the H2 federation.
The CMS X5 OO project is described in detail in [35]. In common with the H2 OO project, described above, the goal was to create a general framework that could be used in all CMS test-beam areas. This was to include the complete chain from data acquisition to analysis, and hence tested many of the elements of the overall LHC++ strategy.
The X5 Analysis Tool consists of the following components:
The Objectivity/DB reformatter performs the following operations:
The results from both of these test-beam activities are considered successful by the CMS collaboration. A number of enhancements have been identified for the future, including a scheme for user-friendly management of histograms, and the adoption of the event-tag concept, as an "Ntuple-replacement". Both of these issues are being addressed in the context of LHC++ on the timescale of the 1998 test-beam runs.
The COMPASS collaboration expects to begin full data taking in the year 2000, with a preliminary run in 1999. The expected aggregate figure for the raw-events sample is 300TB per year.
These data will be processed in parallel with the data acquisition. In this stage the following main steps will be performed for all events: consistency checks of the data sample, a full reconstruction of the tracking systems (and momentum measurement in the two spectrometers), and part of the particle identification. The event information will be combined with the output of calibration runs (such as alignment files) and with monitoring data. All the data, both from physics and test triggers, and monitor data from the slow-control system, will be stored in the same federated database. The output of this first processing stage will be some 60TB of new information (DST). Typically one full reprocessing stage can be foreseen. The number of physicists involved in this stage will be relatively small - of the order of 10.
Different DST sub-samples will be extracted using only the DST information; these data will be stored on disk, requiring from 3 up to 20TB of disk space, depending on the physics programme. Some 50 concurrent users and many passes through the data are expected.
Due to the aggregate size of the data and the complexity of the analyses, the integration between the database technology and data mining tools is of primary interest for COMPASS.
The use of Objectivity/DB in a test-beam activity has been successfully demonstrated, although the data rates involved were clearly much lower than expected at the LHC or for NA45 and COMPASS. Further tests are planned during 1998, including data rates of around 3-4MB/second for NA45.
A proof-of-concept interface between Objectivity/DB and HPSS has been successfully demonstrated and all of the requirements for this version have been met. However, it appears unlikely that the current interface can satisfy the requirements for the production version, particularly in the area of data rates. Current thinking suggests that a simple staging interface is the most appropriate short-term solution to address these performance problems, and such an interface will be developed and tested shortly.
Although the language bindings defined by the Object Database Management Group offer a significant amount of functionality, it is clear that a general purpose standard cannot - by definition - address the specific needs of a given community. Examples include distributed database administration tools - where database administration is in any case outside the scope of the ODMG - site management tools and application-specific extensions.
In order to facilitate the use of an ODBMS in the HEP environment, a small amount of HEP-specific code has been developed. This code largely falls into two categories:
These elements are described in more detail below and in the LHC++ web pages.
A good introduction to C++ programming using ODMG-compliant databases can be found in [33]. The standard itself is described in [23].
In order to facilitate the development and support of persistent applications, a small number of helper classes have been developed. These classes, which are distributed as a set of class libraries as part of the overall LHC++ strategy, are referred to as HepODBMS.
The main goals of these classes are to:
HEPExplorer is a set of HEP-specific IRIS Explorer modules, developed in the context of LHC++, which help a physicist set up an environment to analyse experimental data, produce histograms, fit models and prepare data presentation plots using the IRIS Explorer framework. IRIS Explorer itself is a toolkit for visualisation of scientific data, built on top of industry standards such as OpenGL [30] and OpenInventor [31].
HEPExplorer consists of extensions to IRIS Explorer as follows:

Figure 41 - Prototype of a Database Browser in IRIS Explorer

Figure 42 - Prototype Persistent Histogram Browser

Figure 43 - Example Map Providing "Ntuple/PLOT"-like Functionality
ODBMS-based calibration databases have been developed both in BaBar and CMS. The basic functionality offered by the two systems is similar, and allows information to be retrieved based upon a "validity time". Calibrations that are stored in the database have a start and end validity time, as shown in the diagram below. The information that is typically stored in such a database includes:

Figure 44 - Calibration Validity Time
It is often the case that improved calibrations are found later, often for a sub-interval of the initial calibration validity range. Thus, one typically retrieves the most recently-inserted constants for the time instant specified.

Figure 45 - Multiple Calibrations
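The retrieval rule described above, namely the most recently inserted constants among those valid at a given time, can be sketched as follows. The Calibration class, its field names and the half-open validity interval are assumptions made for illustration, and are not taken from the BaBar or CMS implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    start: float       # start of validity
    end: float         # end of validity (assumed exclusive)
    inserted: int      # insertion order; larger means inserted later
    constants: tuple   # the calibration payload


def lookup(calibrations, t):
    """Return the most recently inserted calibration valid at time t."""
    valid = [c for c in calibrations if c.start <= t < c.end]
    if not valid:
        raise LookupError(f"no calibration valid at t={t}")
    return max(valid, key=lambda c: c.inserted)


calibrations = [
    Calibration(start=0, end=100, inserted=1, constants=(1.00,)),
    # An improved calibration, found later, for a sub-interval of the first.
    Calibration(start=40, end=60, inserted=2, constants=(1.02,)),
]

print(lookup(calibrations, 50).constants)  # inside the improved sub-interval
print(lookup(calibrations, 80).constants)  # only the initial calibration applies
```

A linear scan is used here for clarity. A production system would presumably index the validity intervals, but the selection rule would be the same.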
Although the ODMG standard defines a number of language bindings, it does not attempt to define database administration tools or interfaces. Databases such as Objectivity/DB provide both command-line tools and the equivalent programming language interfaces. However, these are typically not well adapted to a fully distributed environment. Hence, a tool for monitoring and administering an Objectivity/DB federated database has been developed.
A first version of such a tool, named DRO_TOOL, has been built using the Objectivity/DB Java binding. Using this tool, the database administrator is able to observe, control, and manage the basic federated database functionality as well as the autonomous partition and data replication options.
The functionality of this tool is divided into three major groups:
The configuration group handles the functionality of the autonomous partition and data replication options. In other words, it allows administrators to create or delete partitions, replicate database images, vary partitions on/offline, resynchronise images and so on.

Figure 46 - The Data Replication Management Tool
The control group allows an administrator to monitor and control the database servers. This is performed using the ObjectSpace Voyager product, which is also used to permit the tool to run both as a stand-alone application and as an applet in a web browser.
The statistics group offers the possibility to run a number of tests to check data transfer throughput of a given autonomous partition.

Figure 47 - Performance Statistics from the Management Tool
We note that the current version is very much a prototype, and was initially built as a test of the beta release of the Java binding. Once the Java binding has been officially released, we would expect to design a more powerful tool, based upon the requirements of the experiments and institutes involved. We also plan to evaluate any tools released with future versions of Objectivity/DB, including Java-based data browsers, such as the Hudson package, developed by the distributors of Objectivity/DB in Germany.
Along with many other ODBMS vendors, Objectivity announced a Java binding to their database product during the past year. Although not scheduled for release until February-March 1998, we have made a number of tests of the beta version of the binding, including tests of language heterogeneity, i.e. the possibility to access persistent C++ objects from Java applications and vice-versa.
The Java binding not only makes possible tools such as those described above, but also opens up many ways to build flexible distributed architectures. Under the assumption that both C++ and Java analysis applications might exist in the future, it is important to understand any constraints imposed by the database binding and issues such as shared, cross-language schema.
As opposed to the C++ binding which relies on the application checking on the status code returned by any database operations, the Java binding uses exception handling. In addition, it supports garbage collection of objects that are no longer reachable.
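The difference between the two error-handling styles can be illustrated with a minimal sketch. It is written in Python purely for brevity and does not use the Objectivity/DB API: the functions `open_db_status` and `open_db_raising`, the class `DatabaseError` and the status constants are all hypothetical stand-ins introduced for illustration.

```python
# Hypothetical stand-ins; none of these names come from Objectivity/DB.
STATUS_OK = 0
STATUS_ERROR = 1

KNOWN_DATABASES = {"events"}


class DatabaseError(Exception):
    """Raised by the exception-style call when an operation fails."""


def open_db_status(name):
    """Status-code style: the caller must inspect the returned code."""
    return STATUS_OK if name in KNOWN_DATABASES else STATUS_ERROR


def open_db_raising(name):
    """Exception style: a failure cannot be silently ignored."""
    if name not in KNOWN_DATABASES:
        raise DatabaseError(f"cannot open database {name!r}")


def try_status_style(name):
    # Forgetting this check would let the program continue after a failure.
    if open_db_status(name) != STATUS_OK:
        return "failed"
    return "opened"


def try_exception_style(name):
    # The failure propagates as an exception until some caller handles it.
    try:
        open_db_raising(name)
    except DatabaseError:
        return "failed"
    return "opened"
```

In the status-code style, correctness depends on every call site remembering to test the result; in the exception style, an unhandled failure stops the application rather than passing unnoticed.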
Persistent Java objects can be clustered in two types of containers:
In a mixed-language environment, it is clearly necessary to use features, such as non-garbage collected containers, which are supported by all languages concerned.
The Java binding also provides a number of clustering classes. Two types of clustering are supported:
As opposed to the C++ binding, where the schema of persistence-capable classes must first be defined, the Java binding permits the definition and creation of persistent classes at runtime. The type number allocation is therefore necessarily dynamic, solving the schema handling problems described in section 8.8.
The following figure shows the comparison in application development between the two bindings:
Figure 48 - Schema Definition in C++ and Java
As mentioned in the Object Database Standard ODMG 2.0, the Java Binding does not introduce new constructs specific to the database: the binding is perceived as part of the Java language according to the following principle.
"The ODMG Java Binding is based on one fundamental principle: the programmer should perceive the binding as a single language for expressing both database and programming operations, not two separate languages with arbitrary boundaries between them. " [23]
Even though the binding fully meets this requirement of the standard, mixed-language applications still need to take into account the fact that not all types in Java have a one-to-one mapping to those in C++.
In the beta version of the binding, the oojVarray is not correctly mapped to its C++ equivalent, ooVarray. This should be corrected in the production release. In addition, there is currently no mapping between e.g. STL-based collection classes in C++ and Java collections. We understand that this issue will be addressed in a future release of Objectivity/DB.
The use of Java offers a number of interesting opportunities that extend the traditional client-server architecture of Objectivity/DB. Not only can multi-tier applications be readily implemented, but the use of Java agents provides a simple mechanism whereby the query can be moved to the data, executed there, and the results moved back to the host from which the user issued the query. For example, one could communicate from an applet activated from a Web browser to a server that in turn communicates with the database. The applet itself would not need to be linked against Objectivity/DB, nor would this software need to be installed on the client computer.
The recently announced Java binding to Objectivity/DB appears to be well suited to the development of tools for database management and configuration. The potential offered by mobile Java agents and Java's in-built network support is clearly worthy of detailed investigation. Our main requirement with respect to the Java binding is that of full inter-language operability with C++, which in turn requires a convenient mapping of the data types of the two languages.
Our experience with Objectivity/DB has, as predicted, resulted in a number of enhancement requests. These requests are fed back to Objectivity by means of the regular RD45 workshops and the Objectivity user meetings. The list of outstanding enhancements is regularly reviewed, enabling us to follow up on these issues. A number of key enhancements have already been addressed, such as the need for STL-based persistent collection classes. Others are being worked on, such as an interface between Objectivity/DB and HPSS. It is our understanding that our main enhancement requests will all be addressed on an appropriate timescale; most, if not all, should be delivered in time for BaBar/COMPASS production in 1999. Clearly, we will continue to come up with new requirements, which we will prioritise and feed back to Objectivity. The main enhancement requests are discussed in more detail below.
Previous versions of Objectivity/DB supported - in common with a number of other ODBMS products - a persistent version of the Rogue Wave Tools.h++ collection classes. However, with the emergence of the STL, the need for STL-compliant persistent collections became clear. Such collections have been added to the C++ binding of version 2.0 of the ODMG specification. We therefore requested that Objectivity support such collection classes and suggested that no further releases of their persistent version of the Rogue Wave classes were required. As of the 5.0 release of Objectivity/DB, STL-based collection classes, using the ObjectSpace implementation, are supported as part of the product. The previous Rogue Wave classes have been dropped.
Although, as described in [12], there is reason to be optimistic concerning the evolution of storage capacity versus cost, it is still unlikely that multi-PB disk farms will be either affordable or practical at the time of LHC startup. To solve this problem, the RD45 collaboration has studied a number of possible ways whereby an ODBMS, and Objectivity/DB in particular, could be interfaced to a Mass Storage System. The MSS of choice at CERN for the foreseeable future is the High Performance Storage System, HPSS. A training course on HPSS was held at CERN during October 1996, in which an Objectivity engineer participated. As a result of this course, a proposal for integrating the Objectivity/DB server (AMS) with HPSS via the HPSS client API was made. This proposal was further discussed between members of the HEP community, representatives of the HPSS consortium and Objectivity at a meeting at Objectivity's headquarters in May 1997. As a result of this meeting, Objectivity committed to producing a proof-of-concept prototype by the time of SuperComputing 97, held in San Jose in November 1997. The requirements for the prototype were as follows:
Production requirements for the Objectivity/DB - HPSS coupling (see section 6.2.2) come from a number of experiments, including BaBar at SLAC, NA45 and COMPASS at CERN, and of course the LHC experiments themselves. However, it is the timescales of the pre-LHC experiments that dictate when a production version must be ready. We have requested that a product be shipped no later than Q4 1998 - it being understood that extensive testing would be performed at a number of HEP sites during the latter half of 1998.
The current architecture of Objectivity/DB comprises:
The database page size is a constant for the entire federation, and is limited to 2^16 bytes.
Theoretically, this architecture permits federations of up to 10^19 bytes. However, a number of practical limitations mean that such sizes will never be achieved. The most important limitation is that of the file size. Here we feel that 100GB per database (file) is probably an upper limit - today 1-10GB is perhaps more reasonable. As a rule-of-thumb, we feel that it should be possible to migrate/recall a complete file (database) in 10^2 - 10^3 seconds. A file of 100GB would require an overall, if parallelised, I/O bandwidth of 1GB/second to reload in 100 seconds, whereas a 10GB file would require only 10MB/second to reload in 1000 seconds.
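The rule-of-thumb figures above follow from dividing the file size by the target reload time. The short sketch below reproduces them; the function name `required_bandwidth` is introduced here for illustration only, and decimal units (GB = 10^9 bytes, MB = 10^6 bytes) are used, as in the glossary.

```python
GB = 10**9  # decimal units, as defined in the glossary
MB = 10**6


def required_bandwidth(file_size_bytes, reload_seconds):
    """Aggregate I/O bandwidth (bytes/second) needed to reload a file in the given time."""
    return file_size_bytes / reload_seconds


# A 100GB database reloaded in 100 seconds (lower end of the migrate/recall window).
large_file = required_bandwidth(100 * GB, 100)

# A 10GB database reloaded in 1000 seconds (upper end of the window).
small_file = required_bandwidth(10 * GB, 1000)

print(large_file / GB, "GB/second")
print(small_file / MB, "MB/second")
```

The two cases differ by a factor of 100 in required bandwidth, which is why the smaller file size is considered the more reasonable choice today.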
Using a maximum file/database size of 100GB - derived from the practical limits given above - federations of 6.5PB are then possible. This would not, however, be sufficient to store all data from a single LHC experiment. We have therefore asked for architectural changes that permit 100PB federations, without imposing arbitrary constraints, such as requiring containers or databases to be full to reach this limit.
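The 6.5PB figure can be checked with a few lines of arithmetic. The sketch below assumes that a federation can hold at most 2^16 databases; this limit is not stated explicitly in the text and is inferred here from the quoted figure (2^16 x 100GB is approximately 6.5PB). The variable names are illustrative only.

```python
GB = 10**9
PB = 10**15

MAX_DATABASES = 2**16      # assumption, inferred from the 6.5PB figure
MAX_DB_SIZE = 100 * GB     # practical per-file limit quoted in the text

# Largest federation reachable when every database is at the practical size limit.
federation_capacity = MAX_DATABASES * MAX_DB_SIZE
print(federation_capacity / PB, "PB")

# Number of 100GB databases needed for the requested 100PB federation,
# and the width of the database identifier needed to address them.
target = 100 * PB
databases_needed = -(-target // MAX_DB_SIZE)          # ceiling division
bits_needed = (databases_needed - 1).bit_length()
print(databases_needed, "databases,", bits_needed, "bits")
```

Under these assumptions, a 100PB federation built from 100GB files needs about one million databases, which is well beyond what a 16-bit database identifier can address.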
Enhancements to the way in which the schema for persistent C++ classes are handled are required such that it is easy and transparent to develop applications across multiple federations. In other words, the developer should be able to build an application using a test federation, and not the production federation of a given experiment. This would require, for example, that no type numbers are hard-coded into the header files produced by the DDL processor. An acceptable solution would be to adopt a similar mechanism to that employed in the Java binding, where the type number is determined at run-time. Such changes should be compatible with currently supported features, such as support for named schema, classes of the same name, but in different named schema and for schema evolution.
In the current version of Objectivity/DB, access control, based on client credentials, is not supported. It is a requirement that such support be added to a future version of Objectivity/DB. Such access control must work consistently across the entire federation, be supported by both language bindings and tools and support both rôle-based (e.g. DBA) and user-based activities. Given the difficulty of implementing a consistent authentication scheme on all relevant nodes in a federation, user exits, e.g. at database open time, from which site-specific code may be called, would be a valid, if not preferred, solution.
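The idea of resolving type numbers at run-time, rather than embedding them in generated headers, can be sketched as follows. This is a toy model and not the Objectivity/DB mechanism: the `Federation` class, its `type_number` method and the starting numbers are hypothetical and serve only to show why the same application code can then run against a test or a production federation.

```python
class Federation:
    """Toy model of a federation that assigns type numbers on first use."""

    def __init__(self, name, first_type_number=1000):
        self.name = name
        self._next = first_type_number
        self._type_numbers = {}

    def type_number(self, class_name):
        # Look the number up at run-time; allocate one if the class is new here.
        if class_name not in self._type_numbers:
            self._type_numbers[class_name] = self._next
            self._next += 1
        return self._type_numbers[class_name]


test_fd = Federation("test")
prod_fd = Federation("production", first_type_number=5000)

# The production federation already knows about another class.
prod_fd.type_number("Calibration")

# The application refers to the class by name only, so the same code works
# against either federation even though the numbers differ.
for federation in (test_fd, prod_fd):
    federation.type_number("Event")
```

With hard-coded numbers, an application built against the test federation would carry the test federation's number for `Event` and could not be pointed at the production federation without regenerating its headers.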
The C++ binding of Objectivity/DB is not fully ODMG-compliant in a number of areas. For example, the ODMG specifies methods d_activate() and d_deactivate(), which are called when an object enters or leaves scope. It is a requirement that fully ODMG-compliant bindings be provided for all of the languages of interest to HEP (C++, Java), although vendor extensions, for the purpose of performance, are acceptable if clearly marked as such. The Objectivity/DB documentation and training material should be based on the corresponding ODMG language binding.
Interest in running a version of Unix on cheap commodity processors - i.e. Intel Pentium and similar - has grown considerably over the past two years. The Linux operating system has clearly emerged as the preferred Unix for PCs within the HEP community. This has resulted in a number of informal requests to Objectivity to include Linux in the list of supported platforms. At the time of writing, it is our understanding that Objectivity will provide support for Linux in a future release for a well-defined operating system and compiler combination, such as Red Hat 4.1 and g++ 2.7.2.
In the context of RD45, CERN has associate membership of the Object Management Group (OMG) and is a reviewer member of the Object Database Management Group (ODMG). CERN is also represented in the IEEE Computer Society Executive Committee on Mass Storage, which is the body to which the various standards sub-groups report. As in previous years, CERN has only participated actively in the ODMG.
During the past year, the ODMG released V2.0 of its book, defining the object model for object databases and the various language bindings. This version of the standard is viewed as being a significant improvement over previous versions. The main changes are:
The release of version 2.0 marked a turning point in the ODMG. The working groups, which previously met 8 times per year, now meet less frequently: as few as three times a year for the C++ working group. Effort will continue on issues such as Java, but many of the other bindings can now be considered more or less stable. Other changes are being discussed, such as a broadening of the ODMG charter to include persistent objects for relational mappings and application servers, rather than just object databases.
Although most of the ODMG meetings are held in the US, a meeting was held in July 1997 in Annecy. It was at this meeting that the priorities for post-V2.0 work were discussed. Of the enhancements proposed by the voting members, the main priorities, from the CERN point of view, were:
Given the reduction in ODMG activities, it is felt that CERN should reduce its participation accordingly, attending approximately 1 meeting per year, rather than 2 out of 4, as has been the case so far. However, it should be noted that the benefits of involvement in the ODMG go beyond the possibility of influencing the standard itself. Firstly, participation gives us access to information concerning new ODMG features before they appear in the published standard, which allows us to put pressure on the suppliers to implement the new features in a timely manner. In addition, the ODMG meetings offer an excellent opportunity to meet developers from the various database vendors and also allow the work at CERN to be more widely exposed. For example, the joint ODMG/JavaSoft press release concerning the ODMG binding for Java contained statements from many database vendors, but only a single end-user site, namely CERN.
In addition to the work-items related to the LCB milestones and recommendations, the following activities are worthy of note.
As in previous years, we have held a series of RD45 workshops at CERN. These have been well attended by members of experiments both at CERN and outside, and also by people from other (non-HEP) ODBMS-based projects. These workshops have been extremely useful for discussing and sharing ideas and experiences between different groups, and for feeding back information on enhancement requests to Objectivity. However, as the number of participants has grown, the workshops have evolved from informal working sessions to more formal presentations. We have therefore started a series of mini-workshops, focussed on very specific issues. The first such workshop, discussing event collections and related issues, took place at CERN from February 19-25 1998.
As in previous years, an Objectivity/DB Developers' Conference was held in Santa Clara in May 1997. This conference offers an excellent opportunity to meet Objectivity engineers and other developers, e.g. those working on the MOTOROLA Iridium project or the Sloan Digital Sky Survey. As in the past, a paper on RD45 was presented, giving both a brief status of the project and a list of the main outstanding enhancement requests. At the conference, Objectivity announced their plans to support an interface to the HPSS system, which had only formally been requested a few days previously, at a joint meeting between various HEP laboratories and representatives from the HPSS consortium and Objectivity.
Based on larger volumes, we have now been able to obtain even better discounts than in the past. Similar discounts are also available for other HEP laboratories, several of which have acquired licenses for their own research programme (BNL, DESY, KEK, SLAC etc.). It has recently been agreed that the funding of license acquisition and associated maintenance costs for CERN experiments and collaborating institutes will be handled by the annual COCOTIME allocation of computing infrastructure. Experiments will be asked to estimate their requirements for the coming year and the necessary funds provided centrally through CERN. An amendment to the CERN contract with Objectivity has been negotiated, such that licenses may float across all institutes collaborating in the CERN research programme, within the limit of the total number of licenses available. The possibility of unlimited usage on the CERN site is being investigated.
During the past year, Objectivity introduced Web-based access to their support team. This allows registered users - a few people per experiment and members of the IT/ASD group - to query the internal problem database and see much more detailed information on the status of problem reports than was previously possible. This web site complements the standard e-mail support and the on-site consultancy organised principally via the RD45 workshops.
The Object Database market continued to evolve over the past 12 months. Of the events that have occurred, we believe the following to be of most significance:
During the second half of 1996 and 1997, Objectivity was apparently gearing up to go public. This manifested itself by a rapid increase in personnel and a new management structure. At the same time, a new division, AZIZA, was formed. The AZIZA division focussed on an Objectivity/DB-based web management tool of the same name. Although this tool had many interesting features in its own right and had the additional benefit of stress-testing many of the newer features in Objectivity/DB, it appeared to be diverting too many resources from the base product. During the second half of the year, the company split off the AZIZA division and restructured to focus on its database product. We believe that this restructuring was both important for CERN/HEP and necessary for the company. We have seen a marked improvement in response to our requests for enhancements since the change - the company policy is now clearly to focus on the high-end and become the Object Database of choice for this market segment. Clearly, the company will have a challenging year or two ahead, to truly establish itself as the leader in this market. Here, Objectivity as a company understands that satisfying the requirements of the HEP community - such as for an interface to HPSS - gives it a significant advantage over its competitors.
We list below the experiments that are currently using or testing Objectivity/DB-based solutions.
The Alpha Magnetic Spectrometer is an experiment that will take data first on the NASA space shuttle - launch date May 1998 - and later on the International Space Station. The physics goals of the experiment are to perform an antimatter and dark matter search. The AMS collaboration has been using Objectivity/DB in tests and plans to use it to store their production data, slow control parameters and NASA auxiliary data.
The Aleph collaboration has recently started an exercise whereby their mini-DST will be copied from its existing ADAMO-based format to Objectivity/DB. The purpose of the exercise is to gain experience with more "modern" analysis tools, i.e. those currently proposed as part of the LHC++ strategy.
The ALICE offline team is currently focussing on GEANT-4. For the time being, it has no explicit activities in the context of RD45. During 1998, it will begin to study Objectivity/DB-based solutions, but again only in the context of GEANT-4.
The ATLAS collaboration is developing a number of prototype applications using Objectivity/DB in both on- and off-line communities. These include providing access to GEANT-3 simulation data stored in Objectivity/DB and will naturally evolve to storing the hits, digits, etc. from ATLAS GEANT-4 simulation and the results of the reconstruction in Objectivity/DB. In addition, attempts are being made to store the detector description in Objectivity/DB, and to develop various online applications, such as a run booking system and calibration database on top of the system.
The BaBar experiment at SLAC, due to start taking data in 1999, plan to use a combination of Objectivity/DB and HPSS in which to store their data. They currently expect to record some 200TB of data per year, all of which will be stored as persistent objects in an Objectivity/DB federated database. The majority of the associated storage would be managed by HPSS.
The Belle experiment at KEK starts taking data in fall of 1998 and plans to use Objectivity/DB to store the detector constants. They also hope to store mini/micro DST using Objectivity for rapid data analysis. Future mass storage plans are currently being developed in conjunction with the KEK computing centre.
The CDF Collaboration is currently evaluating Objectivity/DB as a possible data management system for their RUN-II, which is expected to begin in 2000 and collect about 450 TB of data within a two year period. Prototype databases containing the RUN-I data have been created, and valuable experience has been gained in the area of optimisation of the database parameters depending on the data model used. The data model proposed for RUN-II is similar to that of ATLAS or BaBar, with all data (or, initially, just the reconstructed information) stored as persistent objects in an Objectivity/DB federated database. It is anticipated that an HSM (HPSS is proposed at present) will be used to manage the tape storage. An interface between Objectivity/DB objects and the currently used YBOS (TRYBOS) records has been developed and tested, as many reconstruction algorithms will still be FORTRAN based. At present, a prototype database that realises this new data model is being developed. The CDF Collaboration is expected to make the final decision in May 1998.
The CHORUS collaboration is using Objectivity/DB for an online emulsion scanning database. There are plans to use the same application at a number of collaborating institutes. As a by-product of this activity, they will also evaluate Objectivity/DB as a potential solution for the proposed TOSCA experiment.
The CMS collaboration is using Objectivity/DB for a number of prototype applications, including the test beam activities, discussed earlier in this report. As with ATLAS, the current baseline assumption is that the event and associated data will be stored as persistent objects in an object database, combined with a mass storage system. Currently, it is assumed that this will be Objectivity/DB together with HPSS.
The COMPASS collaboration plans to use Objectivity/DB to store all of its experimental data (300TB raw data per year). In the framework of the prototyping of the COMPASS computing farm, a test at full data rate (35 MB/s) will be performed at the end of 1998. Data from the central data recording will be sent to a federated database and the integration with HPSS will also be tested.
The LHCb collaboration are currently writing their technical proposal and notes. During the current year, the main emphasis will be on GEANT-4 related issues, including the design of the geometry database, the implementation of GEANT-4 within LHCb and designing high-level event classes. General topics for 1998 include event collections, replication and remote access to data and C++ and Objectivity/DB training.
A project has recently been proposed whereby existing LEP data would be archived for a period of some 30 years. Over such a long period of time, it is assumed that little of today's computing environment would remain. In particular, it has been assumed that the current CERN Program Library, existing operating systems and the Fortran programming language will no longer exist. It is estimated that a few TB of data per experiment will need to be converted and stored, giving a total data volume of perhaps 20TB. It is currently assumed that the data will be stored in Objectivity/DB and that a demonstration of an analysis against the database will be made.
NA45 have been using Objectivity/DB in production since early 1996. A number of production runs have been performed, with a total data volume of some 30GB. For 1998, their plans are to make tests of Objectivity/DB together with central data recording and HPSS, in preparation for their 1999 data taking run, where some 30-50TB of data are expected.
NA48 have a detector configuration management application based on Objectivity/DB. Recently, a new project has been initiated to optimise access to physics data by storing compact micro-DST information and perhaps more in Objectivity/DB. This project is similar in conception to that undertaken by ZEUS.
The RHIC experiments at Brookhaven plan to adopt a common strategy for their data storage. The current plan is that this will be based, as at other laboratories, on a combination of Objectivity/DB and HPSS. Experiments involved include BRAHMS, PHENIX, PHOBOS and STAR. Data volumes for both PHENIX and STAR are expected to be around 200-300TB/year.
The ZEUS experiment has built a tag database on Objectivity/DB, which has been used in production since the end of 1997. This database has been built from the physics data in the ADAMO [24] database. The new system offers considerably more flexibility than was possible in the past. For example, instead of selecting on a combination of 128 bits, the definition of which changed with time, the user is now able to select using a more meaningful predicate string. In addition, predefined "named" samples can be defined, convenient for frequently used samples. The new system has proved popular with the physicists, not least as it offers improved performance - a direct result of reading only the required data.
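The contrast between selecting on tag bits and selecting with a predicate string can be illustrated with a small sketch. It does not reproduce the ZEUS tag database: the attribute names (`q2`, `n_tracks`), the bit assignment, the sample name and the tiny predicate grammar are all hypothetical and chosen only for illustration.

```python
import operator

# Comparison operators supported by this toy predicate grammar.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def parse_predicate(text):
    """Turn 'attribute op value' clauses joined by ' and ' into a test function."""
    clauses = []
    for clause in text.split(" and "):
        attribute, op, value = clause.split()
        clauses.append((attribute, OPS[op], float(value)))
    return lambda tag: all(op(tag[attr], value) for attr, op, value in clauses)


# Hypothetical tag records: a few attributes per event plus the old-style bit word.
tags = [
    {"event": 1, "bits": 0b0101, "q2": 12.0, "n_tracks": 3},
    {"event": 2, "bits": 0b0001, "q2": 150.0, "n_tracks": 8},
    {"event": 3, "bits": 0b0100, "q2": 220.0, "n_tracks": 2},
]

# Old style: the user must know which bit means what, and that meaning may change.
SELECTION_BIT = 0b0100
by_bits = [t["event"] for t in tags if t["bits"] & SELECTION_BIT]

# New style: the selection is stated directly in terms of physics quantities.
NAMED_SAMPLES = {"high_q2_multitrack": "q2 > 100 and n_tracks > 4"}
selector = parse_predicate(NAMED_SAMPLES["high_q2_multitrack"])
by_predicate = [t["event"] for t in tags if selector(t)]
```

Because the predicate touches only the attributes it names, a selection of this kind needs to read only the corresponding tag data, which is consistent with the performance improvement reported above.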
In the original RD45 project proposal (CERN/DRDC 94-50), three phases were anticipated. The first phase, which lasted approximately 6 months and at the end of which an interim status report was made to the LCRB, was devoted to obtaining a better understanding of the problem and a preliminary list of requirements. These requirements are listed in the status report for the first year [11]. This was followed by a second phase, in which detailed prototyping and performance comparisons were undertaken [4] [5] [6] [7]. It is the opinion of the RD45 collaboration that both of these phases have now been completed. Furthermore, we believe that the project has achieved its stated goals of identifying a solution to the object persistence problems of the LHC experiments. We suggest that the project now enter the third phase foreseen at the time of the project proposal, namely that of implementation. This phase is expected to consist of two elements:
The LCB/RD45 project has been studying issues related to the problems of object persistency since 1995. As an LCB project, the focus has clearly been on the needs of the LHC experiments. As described in the various RD45 status reports, a strategy has been built up, in close collaboration with the experiments, based on the use of two commercial components - namely Objectivity/DB and HPSS - with a small number of HEP-specific extensions. This strategy has been adopted by a number of pre-LHC experiments, including COMPASS and NA45 at CERN, BaBar at SLAC and the RHIC experiments at Brookhaven. In addition, there are numerous projects, both at CERN and outside, based upon Objectivity/DB alone. These include activities in CHORUS, NA45, NA48 and the LEP experiments. Finally, as agreed at the 1997 COCOTIME review, there is a need for production Objectivity/DB federations in 1998 for both ATLAS and CMS: systems on which to run these services are being acquired now by IT/PDP.
In summary, there are many requests for general-purpose production services at CERN, based upon Objectivity/DB. We discuss below how these services could be established and how the research issues related to the object persistency services for the LHC experiments could be addressed.
We propose that IT division begin to offer data management services, based upon Objectivity/DB and HPSS. IT/ASD group offer ODBMS services, similar to the current services based on ORACLE for RDBMS applications. IT/PDP would be responsible for issues related to HPSS and the data servers on which the Objectivity/DB server and HPSS client would run. Clearly, a detailed service definition needs to be drawn up by IT/ASD and PDP groups. For the purposes of this document, we focus on those issues that will be handled by IT/ASD group.
IT/ASD group would:
We believe that the RD45 project has attained its primary goal of identifying a solution to the problem of object persistency. Clearly, there are many issues that still need to be pursued, such as the outstanding list of enhancement requests, including the production version of the Objectivity/DB - HPSS interface. In addition, there are many associated issues that have not yet been addressed, such as methods of optimising data access and management in a fully-distributed environment, the impact on networking costs, new technologies such as agents, and so forth. Furthermore, neither Objectivity/DB nor HPSS has yet proven itself in production environments with data volumes of the order of many hundreds of TB. In this respect, we anticipate that much will be learnt from the experiences of high-data volume, pre-LHC experiments such as BaBar, COMPASS, CDF, D0, PHENIX, STAR and others.
Many of these issues are best addressed as part of the general Objectivity/DB services, such as those proposed. Others require further research and development, but perhaps on somewhat longer timescales than in the past.
We believe that the RD45 project will have attained its primary goal of identifying a solution to the problem of providing object persistence services for event and other data for the LHC experiments by the time of the April 1998 LCB review. We recommend the setting up of production, ODBMS-based services, using the technologies and solutions identified in this work. Further investigation into topics identified by the LHC experiments and/or the LCB would continue to be addressed on a timescale compatible with the needs of the experiments and of the available resources.
As described above, we believe that the main research phase of the RD45 project, namely to research into and propose solutions to the "object persistency" problems of the LHC experiments, has been completed. A logical next phase would be the setting up and running of production services for general CERN use. Even though this phase would be labelled as "production", there would clearly be on-going developments, continued discussion with Objectivity on enhancement requests, further prototyping of ODBMS-based applications and class libraries and so forth.
The key achievements of the RD45 project over the past years have been as follows:
We see the natural evolution of this trend as being:
1998: together with IT/PDP group, set up production services up to 10TB
1999: production usage by BaBar, BELLE, COMPASS, NA45, BRAHMS, PHENIX, PHOBOS, STAR, ZEUS …
Over the past years, the fraction of RD45's activities devoted to production-related work has increased significantly. There are now several experiments at CERN using Objectivity/DB in production, and both NA45 and COMPASS intend to use it, together with HPSS, for their production runs in 1999. Preparing for these runs will be an important component of the activities of the coming year. However, we believe that these are best covered outside of an R&D project, e.g. as part of a standard service offered by IT division, in collaboration with the experiments.
We therefore suggest the following activities for the next 18 months:
The RD45 project was approved in 1995 to investigate and propose solutions to the problems of handling the persistent objects of the LHC experiments: event data, calibration data, histograms and so forth. Strong emphasis has been placed on the potential use of standards-conforming, widely-used (commodity) solutions. At an early stage of the project, a potential solution, based upon an Object Database Management Group (ODMG)-compliant Object Database (ODBMS), coupled with a Mass Storage System (MSS) built according to the IEEE Computer Society's Reference Model for Mass Storage Systems, was identified. This potential solution has been the primary focus of our activities, although we have continued to monitor and evaluate alternatives. The preferred components of this solution are built on top of Objectivity/DB and HPSS, coupled with a small quantity of HEP-specific code. This solution has been adopted in part (i.e. Objectivity/DB only) or in its entirety by a growing number of experiments at CERN and outside. Objectivity/DB is used for production purposes by several experiments and will, together with HPSS, form the basis of the event storage for many of the experiments that are due to start taking data in or around 1999.
The manpower savings that have been possible by adopting such a solution are already significant. In addition, the functionality provided is much greater than that offered by previous, HEP-specific solutions. The combination of these factors will help us to cope, with increased flexibility, with the much greater volumes of data that we will have to handle, whilst remaining within foreseen manpower constraints.
We list below the milestones and recommendations from previous reviews of the RD45 project.
Initial Milestones and Recommendations (1995)
RD45 (P59) should be approved for an initial period of one year. The following milestones should be reached by the end of the first year.
It should be noted that the milestones concentrate on event data. Studies or prototypes based on other HEP data should not be excluded, especially if they are valuable to gain experience in the initial months.
Glossary
ADAMO - a system, developed in the ALEPH collaboration, based on the Entity-Relationship (ER) model.
ADSM - A storage management product from IBM
AFS - the Andrew (distributed) filesystem
CASE - Computer Aided Software Engineering
CORBA - the Common Object Request Broker Architecture, from the OMG
CORE - Centrally Operated RISC Environment
CWN - Column-wise Ntuple
CTP - Computing Technical Proposal
DFS - the OSF/DCE distributed filesystem, based upon AFS
DMIG - the Data Management Interface Group
EDMS - Engineering Data Management System
GB - 10^9 bytes
HPSS - High Performance Storage System - a high-end mass storage system developed by a consortium consisting of end-user sites and commercial companies
IEEE - the Institute of Electrical and Electronics Engineers
KB - 2^10 (1024) bytes - normally referred to as 10^3 bytes
LCB - LHC Computing Board
LCRB - LHC Computing Review Board
LIGHT - Life Cycle Global Hypertext
MB - 10^6 bytes
MSS - a Mass Storage System
NFS - the Network Filesystem, developed by Sun
ODBMS - an Object Database Management System
ODMG - the Object Database Management Group, a group of database vendors and users that develops standards for ODBMSs
OID - Object Identifier
OMG - the Object Management Group
OOFS - the Objectivity/DB Open FileSystem
OQL - the Object Query Language defined by the ODMG
ORB - an Object Request Broker
OSM - Open Storage Manager: a commercial MSS
PAW - the Physics Analysis Workstation
PETASERVE - an MSS based upon OSM
PB - 10^15 bytes
RWN - Row-wise Ntuple
SHORE - Scalable Heterogeneous Object REpository
SQL - Structured Query Language: the language used for issuing queries against databases
SSSWG - the Storage System Standards Working Group
STL - the Standard Template Library: part of the draft C++ standard albeit in a modified form
TB - 10^12 bytes
TOOLS.H++ - a former de-facto standard container/collection class library, largely made redundant by the collections provided in the standard C++ library
VLDB - Very Large Database
VLM - Very Large Memory
VMLDB - Very Many Large Databases
XBSA - the draft X/Open Backup Services Application Program Interface