EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH

CERN/LHCC 97-6

LCB Status Report/RD45

7 February, 1997













STATUS REPORT OF THE RD45 PROJECT




















The RD45 collaboration

CERN, Geneva, Switzerland

This document has been produced for the March 1997 LCB review of the RD45 project. In this paper, we present the status of the project, including a summary of the responses to the milestones set at the 1996 review by the LCRB, suggestions for future activities and a risk analysis of the current RD45 strategy.

In addition, we describe activities undertaken within various experiments and projects, including NA45, ATLAS, CMS, ALICE, BaBar, BELLE and Zeus.

This document is complemented by more detailed reports covering the individual milestones.

RD45 documents may be obtained through the Web (see http://wwwinfo.cern.ch/asd/cernlib/rd45/index.html) or via e-mail request to the spokesman.






























The RD45 Collaboration

David Malon, Edward May

Argonne National Laboratory,

Argonne, Illinois, USA

Ryszard Zybert

School of Physics and Space Research

University of Birmingham, UK

Martin Purschke

Brookhaven National Laboratory, USA

Eva Arderiu Ribera, Jacek Becla, Pavel Binko, Gabriele Cosmo, Olivier Couet, Dirk Duellmann, Bernardino Ferrero Merlino, Gunter Folger, Fabrizio Gagliardi, Simone Giani, Marcin Nowak, Jamie Shiers (spokesman)

CERN/CN

Geneva, Switzerland

Elisa Cargnel, Michel Hansroul, Vincenzo Innocente, Lassi Tuura, Ian Willers

CERN/ECP

Geneva, Switzerland

Martin Gasthuber

DESY/Hamburg

Boris Khomenko

JINR, Dubna

D.J.Candlin

University of Edinburgh, UK

Otto Schaile

Freiburg University

Freiburg, Germany

Predrag Buncic

GSI, Darmstadt, Germany

Andreas Pfeiffer, Carsten Voigt

University of Heidelberg

Boris Klochkov, Viatcheslav Ivanovich Klyukhin

Institute for High Energy Physics

IHEP, Protvino, Moscow Region, Russia

Chris Day, David Quarrie, Craig E. Tull

Lawrence Berkeley National Laboratory

Berkeley, CA, USA

Nobuhiko Katayama, Youhei Morita

KEK, Oho, Tsukuba, Ibaraki, 305 Japan

Aneta Baran, Staszek Jagielski, Piotr Malecki, Andrzej Sobala, Witold Wajda

Institute of Nuclear Physics

Krakow, Poland

Christian Arnault, RD Schaffer

Laboratoire de l'Accelerateur Lineaire

Orsay, France

Kors Bos, Patrick J. Hendriks

NIKHEF, Amsterdam, The Netherlands

Giovanni Organtini

Universita' di Roma "La Sapienza" and Istituto Nazionale

di Fisica Nucleare - Sez. di Roma.

P.le Aldo Moro, 2 - 00185 ROMA, Italy

S.M.Fisher

Rutherford Appleton Laboratory, UK

Julius Hrivnac

Institute of Physics, Academy of Sciences of the Czech Republic, Prague

Traudl Hansl-Kozanecka

CEA/Dapnia, Saclay, France

Sunanda Banerjee

Tata Institute of Fundamental Research, Bombay, India

Anwarul Hasan

ETH Zurich, Switzerland and University of Cyprus, Cyprus

Thomas S. Ullrich

Yale University, USA

TABLE OF CONTENTS

1. Executive Summary
1.1 Summary of Activities During Second Year
1.2 Conclusions
2. Introduction
3. Overview of the First Year's Activities
4. Overview of Activities During the Second Year
5. Milestones Set at the March 1996 LCRB Review
6. Collaboration with ATLAS and CMS on Their CTPs
7. Provision of Persistence Service for GEANT-4
8. Milestone 1 - the Impact of Using an ODBMS
8.1 Object Model Issues
8.1.1 The ODMG Object Model
8.1.2 Differences Between the ODMG Object Model and the C++ Object Model
8.1.3 Impact on Existing Object Models and Modelling Guidelines
8.1.4 Conclusions
8.2 Issues Related to the ODMG and Objectivity/DB C++ Binding
8.2.1 Impact on Existing Code
8.2.2 Conclusions
8.3 The Use of an ODBMS with Third-Party Class Libraries
8.4 CASE Tools, Object Databases and Persistent Applications
8.4.1 Classify/DB
8.4.2 Rational/ROSE
8.4.3 StP
8.4.4 Conclusions
8.5 The Impact of an ODBMS on Object Granularity
8.6 End-User Issues
8.6.1 Access to the Database Run-time Environment
8.6.2 Access to Database Catalogue
8.6.3 Summary
8.7 Requested Enhancements
8.8 Conclusions from Milestone 1
9. Milestone 2 - Object Database Features
9.1 Schema Evolution
9.1.1 Areas of Potential Use in HEP
9.1.2 Prototype Investigations
9.1.3 Requested Enhancements for Schema Evolution
9.2 Object Versioning
9.2.1 Areas of Potential Use in HEP
9.2.2 Prototype Investigations
9.2.3 Requested Enhancements for Object Versioning
9.3 Data Replication
9.3.1 Areas of Potential Use in HEP
9.3.2 Prototype Investigations
9.3.3 Requested Enhancements for Data Replication
9.4 Conclusions from Milestone 2
10. Milestone 3 - Performance Comparison with PAW+Ntuples
10.1 Current Practice
10.2 ODBMS Capabilities
10.3 ODBMS versus Ntuples
10.4 Raw Performance Measurements
10.4.1 Read/Write Performance of Test-Bed System
10.5 Comparisons with PAW and Ntuples
10.5.1 Full Tag Comparisons
10.5.2 Reduced Tag
10.5.3 Queries Using Indices
10.6 The Effectiveness of Using an ODBMS
10.7 Conclusions
11. Risk Analysis
11.1 Support for Multiple Federations
11.2 Number of Databases per Federation
11.3 Number of Containers per DB, Size of Containers
11.4 Navigation Across Multiple Containers and Databases
11.5 Very Large Numbers of Associations
11.6 Very Large Collections
11.7 Re-clustering and the Effect on Existing Collections
11.8 Handling Multiple Containers and/or Databases
11.9 Database Administration Issues
11.10 Alternative ODBMS Products
11.11 Alternative Mass Storage Systems
11.12 Conclusions
12. Use of Objectivity/DB in HEP and Related Disciplines
13. Collaboration with Other Projects
13.1 ALICE
13.2 ATLAS
13.3 CMS
13.4 CERES/NA45
13.5 GEANT-4
13.6 AMY
13.7 BaBar
13.8 BELLE
13.9 ZEUS
14. Standards Activities
14.1 ODMG-related Activities
15. Objectivity/DB Workshops
16. Objectivity/DB User Meeting
17. ODBMS to MSS Coupling
17.1 Integrating an ODBMS with an MSS at the Filesystem Level
17.2 Integrating Objectivity/DB with HPSS
17.3 Conclusions
18. Other Database Developments
19. Future Activities
20. Proposed Milestones for 1997-1998
21. Conclusions
22. Glossary
23. References

1. Executive Summary

The RD45 project is investigating solutions to the problem of providing persistency to physics data of the LHC experiments, assumed to be in the form of (collections of) objects. At the end of the first year, a potential solution, based on standards-conforming products, was presented. This solution proposes the use of an Object Database Management System (ODBMS) that conforms to the Object Database Management Group (ODMG) standards [20], together with a Mass Storage System (MSS) that is built according to the Reference Model for Mass Storage Systems developed by the IEEE Computer Society (IEEE MSS). Key elements of the solution have already been used in production for the storage and management of High Energy Physics data for the NA45 experiment, as well as in the GEANT-4 (RD44) project. Prototyping is also going on at other HEP laboratories with the same or similar technology, including at DESY (ZEUS), KEK (BELLE) and LBL (BaBar).

In this report, we summarise the activities of the RD45 project during the past year, including progress on the milestones set by the LCRB, experience with NA45 and other projects such as GEANT-4, together with proposals for future activities.

1.1 Summary of Activities During Second Year

During the past year the RD45 collaboration has:

1.2 Conclusions

The proposed solution to object persistence for LHC event data, based upon a commercial ODBMS and MSS, has been accepted as the baseline solution for both ATLAS and CMS, pending the final results of the RD45 investigations. The performance and scalability of such a solution, together with the associated risks, have been analysed. Whilst more work needs to be done, particularly in the area of MSS integration and efficient data access, we believe that this solution is still by far the most promising of those considered, offering both the functionality and scalability that is required.


























2. Introduction

The RD45 project, which was approved in February 1995, is investigating solutions to the problems related to providing persistent object services for the LHC experiments. This includes, but is not limited to, fully distributed heterogeneous architectures capable of scaling, at least architecturally, to the multi-PB region. Various potential solutions to this problem were investigated as part of the first year's activities, including language extensions, object managers and full-blown Object Databases (ODBMS). It was the conclusion of the first year that only full ODBMSs provide sufficient functionality to satisfy a preliminary list of HEP requirements, and that only a few of the currently available ODBMS products have an architecture that is scalable enough to meet our needs.

During the past year, the RD45 collaboration has focused on ODMG-compliant solutions, and has demonstrated the use of a standard, off-the-shelf ODBMS product for storing and managing HEP event data in a production environment.

Despite the focus on ODBMSs, RD45 continues to follow progress in other areas, such as persistent object managers (e.g. SHORE) and Object-Relational Databases, including the object-oriented offerings from the traditional relational (RDBMS) vendors.

RD45 continues to participate in the Object Database Management Group (ODMG) - the standards body that defines and maintains the various standards for ODBMSs, as well as the OMG, and the IEEE Computer Society Executive Committee on Mass Storage Systems (IEEE MSS EC).

3. Overview of the First Year's Activities

During its first year, the RD45 collaboration investigated several different approaches to solving the object persistency problem, including language extensions, persistent object managers and ODMG-compliant ODBMS products.

Using the definition of an ODBMS from the "Object-Oriented Database Manifesto" [12], it was our conclusion that HEP requires a system offering all of the facilities listed as mandatory in this manifesto, all of the features listed as optional, and indeed several others besides!

On the other hand, language extensions and known persistent object managers, whether HEP-specific or not (see Cattell [10] for a list), impose major restrictions, such as a lack of support for platform or language heterogeneity, no support for the full C++ language model (e.g. no virtual functions), lack of ODMG compliance, lack of scalability and so forth. In addition, it is the conclusion of the RD45 collaboration that such systems would require considerably more man-power to extend and maintain than an existing solution, already deployed to many hundreds of thousands of end-users.

RD45 was also able to identify an ODBMS product with an architecture offering the required scalability, and this product has been used for all of the prototypes built during the past two years, as well as for the NA45 physics production run.

Although the choice of a system for the current prototyping is clearly de-coupled from the eventual choice of a system for the production phase of the LHC experiments, long-term support issues are extremely important and should not be under-estimated. The lifetime of the LHC experiments will probably be some 10-15 years, perhaps more, and thus we need guaranteed support until 2020/2025. Existing object managers, such as SHORE, are research projects which are unlikely to last more than a few years before a follow-on project is launched - long-term support is not under consideration by these groups. On the other hand, products such as commercial ODBMSs are used to build production systems, such as telecoms applications, for which long-term support is mandatory. Nevertheless, the RD45 collaboration continues to monitor fully-featured ODBMSs, the object extensions being added by the traditional RDBMS vendors, and simpler approaches, including persistent object managers, and it places great emphasis on avoiding dependence on a single product.

The activities of 1995 can thus be summarised as follows:

Further details can be found in the RD45 status report for 1995, CERN/LHCC 96-15 [1], also available via

4. Overview of Activities During the Second Year

During the past year, and in addition to the work on the milestones and recommendations described below, the RD45 project has performed an initial risk analysis of the current strategy, and established a joint project with Digital, aimed at providing a test-bed where detailed performance and scalability measurements can be made. This test-bed has been used to make measurements directed at milestone 3 [6], described in section 10 on page 21, but will also be used to understand issues related to parallel filesystems, very large memories, and so forth.

Numerous presentations of the project have been made both at CERN and outside, including at external laboratories such as DESY and KEK.

The number of full-time equivalents at CERN working on the project has approximately doubled.

We have continued to work within the framework of the ODMG to ensure that the future evolution of the ODMG standard satisfies HEP requirements. Additional features that have been requested as part of the V2.0 version of the ODMG standard include support for distributed databases, read/write access to the database schema, schema evolution and user data replication. In addition to the standards-related activities within the ODMG, we have continued to make our requirements available to the ODBMS vendors, and to Objectivity in particular. The latter has been achieved through regular workshops at CERN, and through the Objectivity user group. Requested enhancements to Objectivity/DB include an extended Object Identifier (OID), an interface to the High Performance Storage System (HPSS) - a Mass Storage System - and support for Parallel Query and Very Large Memories.

In addition to this status report, and three supporting documents [4,5,6], each corresponding to one of the three milestones set by the LCRB, we have produced a draft set of guidelines for Objectivity/DB Database Administrators [9], available via

and two internal documents, which can be obtained upon request:

  1. "Where are Object Databases Heading?" [7] - a prediction of the future of various O(R)DBMS products,
  2. "Why Objectivity/DB?" [8] - a justification of the choice of Objectivity/DB for the current prototyping.

5. Milestones Set at the March 1996 LCRB Review

The RD45 project was reviewed by the LCRB in March 1996, and recommended for continuation for a further year, with the following milestones and comments:

The project has made excellent progress in identifying and applying solutions for object persistence for HEP based on standards and commercial products. The milestones set (as revised by the LCRB in November 1995) have been met.

The LCRB agrees with the program of future work outlined in the RD45 report (CERN/LHCC 96-15) [1] [ and sets the following milestones for the second year of the project: ]

  1. Identify and analyse the impact of using an ODBMS for event data on the Object Model, the physical organisation of the data, coding guidelines and the use of third party class libraries
  2. Investigate and report on ways that Objectivity/DB features for replication, schema evolution and object versions can be used to solve data management problems typical of the HEP environment
  3. Make an evaluation of the effectiveness of an ODBMS and MSS as the query and access method for physics analysis. The evaluation should include performance comparisons with PAW and Ntuples

In addition, the project was asked to:

Detailed reports [4], [5], [6] describing the work on the LCRB milestones are available in printed form from the spokesman or via e-mail request to Heplib.Support@cern.ch.

Web versions of these documents can be found via the web address listed below.




6. Collaboration with ATLAS and CMS on Their CTPs

The RD45 status report and predictions concerning Object Databases and Mass Storage Systems ("Object Databases and Mass Storage Systems - The Prognosis", CERN/LHCC 96-16 [3]) have been used by both ATLAS and CMS in the preparation of their Computing Technical Proposals [24] [25], as have the reports of the two technology tracking teams in which RD45 is involved. In addition, we have participated in most of the regular meetings of these working groups, made numerous presentations and commented on the draft documents. It is expected that the work on the current and future RD45 milestones will be referenced in future updates of the CTPs and that outstanding questions from these working groups will strongly influence the future activities of the project.

7. Provision of Persistence Service for GEANT-4

We continue to work closely with the GEANT-4 (RD44) collaboration, with whom regular meetings are held, on the persistence aspects of GEANT-4. An Objectivity/DB course was arranged for RD44 members earlier this year, and several sessions at Objectivity workshops have been devoted to understanding the impact of using an ODBMS for persistence in GEANT-4, the object model and performance. We have provided technical assistance in introducing persistence to the GEANT-4 "Hits" class. We have also investigated the suitability of an Express/ODL converter, available from Micram, the distributors of Objectivity/DB in Germany. As part of this support activity, we have acquired sufficient run-time licenses for Objectivity/DB for the users of the first prototype of GEANT-4.

Further details are given in section 13.5 on page 35.











8. Milestone 1 - the Impact of Using an ODBMS

The work on this milestone, namely to "Identify and analyse the impact of using an ODBMS for event data on the Object Model, the physical organisation of the data, coding guidelines and the use of third party class libraries", has been divided into issues related to the following:

As the physical organisation of the data has a strong impact on performance, the bulk of the work on this issue has been covered in the context of milestone 3 [6]. The work on milestone 1 [4] has been limited to high-level issues, such as object granularity.

Although this milestone is largely oriented towards developer issues, we comment briefly on the impact on end-users of using an ODBMS for production applications.

The work on this milestone is covered in more detail in [4].

8.1 Object Model Issues

In this section, we describe the main features of the ODMG Object Model and compare it with the C++ object model.

8.1.1 The ODMG Object Model

The ODMG object model defines persistence (for C++) to be by inheritance. That is, for a class to be persistence-capable, it must derive from the ODMG base class d_Object. Instances of such classes may be either persistent, i.e. stored in the database, or transient, i.e. deleted either explicitly or automatically when they go out of scope. Transient instances are in any case limited to the lifetime of the creating process. Whether an instance is persistent or transient is decided at object-creation time, through an overloaded new operator.

The model also includes fixed-length implementations of the basic types, such as int, float and double (e.g. d_Short, d_Long, d_Float, d_Double). These are required to provide support for heterogeneity, as implementations of the basic C++ types can vary from platform to platform. Persistent object references are provided through a type-safe smart pointer, d_Ref<T>. Associations, both uni- and bi-directional, are provided, as are container classes and various utility classes, such as date, timestamp and interval.
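The ideas in this section can be illustrated with a small, self-contained sketch. It deliberately does not use the ODMG or Objectivity/DB headers: PersistentBase, Store, Ref<T> and make_persistent are hypothetical stand-ins for d_Object, a database, d_Ref<T> and the overloaded new operator, and the fixed-width aliases play the role of d_Long and d_Float. The sketch only shows the shape of the model (persistence-capability by inheritance, the persistent/transient choice made at creation time, and access through a type-safe reference); it says nothing about how a real product implements these.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names belong to the ODMG binding.
using FixedLong  = std::int32_t;  // role of d_Long: same width on every platform
using FixedFloat = float;         // role of d_Float (assumes a 32-bit float)
static_assert(sizeof(FixedLong) == 4, "fixed-width integer must be 4 bytes");

// Role of d_Object: deriving from it makes a class persistence-capable.
class PersistentBase {
public:
    virtual ~PersistentBase() = default;
};

// Toy "database" that simply owns the objects handed to it.
class Store {
public:
    std::size_t add(std::unique_ptr<PersistentBase> obj) {
        objects_.push_back(std::move(obj));
        return objects_.size() - 1;  // index doubles as a toy object identifier
    }
    PersistentBase* get(std::size_t id) const { return objects_.at(id).get(); }
    std::size_t size() const { return objects_.size(); }
private:
    std::vector<std::unique_ptr<PersistentBase>> objects_;
};

// Role of d_Ref<T>: a type-safe handle resolved through the store.
template <typename T>
class Ref {
public:
    Ref(const Store& store, std::size_t id) : store_(&store), id_(id) {}
    T* operator->() const {
        T* target = dynamic_cast<T*>(store_->get(id_));
        assert(target != nullptr);  // the stored object must really be a T
        return target;
    }
private:
    const Store* store_;
    std::size_t id_;
};

// A persistence-capable class: it inherits from the special base class.
class Hit : public PersistentBase {
public:
    Hit(FixedLong channel, FixedFloat energy) : channel_(channel), energy_(energy) {}
    FixedLong channel() const { return channel_; }
    FixedFloat energy() const { return energy_; }
private:
    FixedLong channel_;
    FixedFloat energy_;
};

// Persistent or transient is decided when the object is created. A factory
// function is used here only so the sketch compiles without vendor headers.
template <typename T, typename... Args>
Ref<T> make_persistent(Store& store, Args&&... args) {
    std::unique_ptr<PersistentBase> obj(new T(std::forward<Args>(args)...));
    return Ref<T>(store, store.add(std::move(obj)));
}
```

In this sketch, `make_persistent<Hit>(store, 7, 1.5f)` places a Hit in the toy store and returns a reference to it, whereas a plain `Hit h(3, 0.5f);` remains an ordinary transient object that disappears at the end of its scope.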

8.1.2 Differences Between the ODMG Object Model and the C++ Object Model

The ODMG object model extends the standard C++ object model in a number of respects:

It also imposes a number of minor constraints over the standard model, namely:

8.1.3 Impact on Existing Object Models and Modelling Guidelines

A number of applications, which, for historical reasons were not designed from the start with persistence in mind, have been ported to an ODBMS without major problems. These include applications from NA45 and GEANT-4, as well as the histogram classes being developed in the context of LHC++.

The following guidelines are recommended for creating persistent object models:

Essentially, these guidelines may be condensed into a single rule, namely:

8.1.4 Conclusions

The ODMG object model can be considered to extend the C++ object model in a very natural way. It has the advantage of being language-independent, and provides additional (required) functionality, such as associations, which would otherwise have to be implemented by 3rd-party class libraries. Implementing persistence by inheriting from a special base class does not pose any major problem for the definition of an object model typical of HEP event data.

8.2 Issues Related to the ODMG and Objectivity/DB C++ Binding
8.2.1 Impact on Existing Code

There are a number of code changes that need to be made to transient applications to make them persistent using an ODMG-compliant ODBMS. By far the most significant of these relates to the use of C++ pointers, which must be avoided in the case of persistent classes. References to persistent-capable objects must be made using the ODMG-defined d_Ref<T> smart pointer. In most cases, however, it is sufficient to change only the type definitions of the pointers concerned - the user code remains largely unaffected. Pending the support for standard C++ (STL) containers, changes also need to be made to switch from the transient, typically Rogue Wave, containers, to those provided by the ODBMS. In addition, one must also design and implement appropriate clustering and locking strategies and handle the database session.
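The remark that it is usually enough to change the pointer type definitions can be sketched as follows. SmartRef<T> is a hypothetical stand-in for d_Ref<T> (it merely wraps a raw pointer and is not the ODMG class), and Track, TransientTrackRef, PersistentTrackRef and sum_pt are names invented for this illustration. The function is written as a template only so that both variants can be exercised side by side; in an application one would normally have a single typedef that is switched from one form to the other.

```cpp
#include <cassert>
#include <vector>

// Toy event-data class, invented for this sketch.
struct Track {
    double pt;  // transverse momentum, arbitrary units
};

// Hypothetical stand-in for d_Ref<T>: it offers the same pointer-like
// syntax (-> and *) as a raw pointer, which is why user code can stay as is.
template <typename T>
class SmartRef {
public:
    explicit SmartRef(T* target = nullptr) : target_(target) {}
    T* operator->() const { return target_; }
    T& operator*() const { return *target_; }
private:
    T* target_;
};

// Before porting: references are raw C++ pointers.
typedef Track* TransientTrackRef;
// After porting: only the type definition changes.
typedef SmartRef<Track> PersistentTrackRef;

// "User code": identical source text works with either reference type,
// because it only relies on the -> operator.
template <typename TrackRefT>
double sum_pt(const std::vector<TrackRefT>& tracks) {
    double total = 0.0;
    for (const TrackRefT& track : tracks) {
        total += track->pt;
    }
    return total;
}
```

Code that performs pointer arithmetic, or that stores raw pointers inside persistent objects, would not survive such a substitution unchanged, which is consistent with the caveat that pointers must be avoided in persistent classes.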

A prototype of a small package to both simplify the porting of new applications, and also to insulate applications from vendor-specific features, is in use within the RD45 collaboration, and a production version of this software will eventually be made available to the HEP community, as part of the LHC++ framework that is currently being built up.

8.2.2 Conclusions

The implementation of persistence provided by an ODMG-compliant ODBMS is a very natural extension of the normal heap allocation performed by C++. The impact of introducing an ODBMS to existing C++ applications is very small compared to traditional I/O systems, which require explicit I/O calls to be coded. A small set of design rules, described in detail in the supporting document for milestone 1, are sufficient to port existing applications or to design new ones that will use an ODBMS for persistence. The provision of a small layer of middle-ware allows a high-level interface to the database to be developed, isolating the application from vendor-specific details whilst also simplifying locking and clustering strategies.

8.3 The Use of an ODBMS with Third-Party Class Libraries

In most cases, there is no incompatibility between 3rd party class libraries, such as graphics and GUI libraries, and an ODBMS. We have built a number of prototypes that use libraries such as OpenInventor, or work in frameworks such as IRIS Explorer, where the introduction of a database has been transparent.

The exception to this rule is libraries such as RogueWave's Tools.h++, the de-facto standard for collection classes, and the forthcoming standard C++ library - in other words, collection and container classes.

Today, persistent versions of Tools.h++ are provided for a number of databases, including Objectivity/DB. However, given the emergence of the Standard Template Library (STL), adopted into the draft C++ library, albeit with a number of changes, the long-term strategy should clearly be to use this library, rather than Tools.h++.

V1.2 of the ODMG standard has made some initial steps in migrating towards full STL compliance, and V2.0 will introduce significant enhancements in this respect.

Until STL-support is provided by Objectivity/DB, we have provided two container classes to assist in the migration of applications from transient to persistent.

We have requested that Objectivity support the ODMG-defined STL subset in the next point release of the product, expected in mid-'97.

8.4 CASE Tools, Object Databases and Persistent Applications
8.4.1 Classify/DB

Classify/DB is a product of Micram Technology GmbH, the distributors of Objectivity/DB in Germany. It is the only CASE product designed explicitly to work with Objectivity/DB - or indeed any ODBMS - and was the first product to support any of the ODMG bindings.

Classify/DB is based upon the OMT notation, and is capable of generating the ODMG Object Definition Language (ODL), as well as the DDL used by Objectivity/DB. It also supports reverse engineering, and can handle a variety of other formats in addition to those mentioned above, including Step/EXPRESS.

The fact that Classify/DB uses an Objectivity database to store the model information is a strong plus, and it demonstrates that CASE tools capable of generating ODBMS schema can indeed be produced. However, it is our opinion that the same tool should be used for both persistent and transient applications, and that Micram do not have the resources to compete with larger companies such as Rational.

8.4.2 Rational/ROSE

ROSE is a CASE tool produced by Rational, a company that currently employs many of the leading authorities on object-oriented analysis and design. It is the tool used by the CMS and GEANT-4 collaborations. Although ROSE does not directly support the generation of ODMG ODL, we are aware of several customisations of the product that do enable ODL to be produced indirectly. Two of these customisations are available commercially - through the distributors of Objectivity/DB in Japan and through the distributors in Italy. We have looked at both of these customisations, and both appear somewhat baroque for something that should be relatively straightforward. Indeed, some preliminary studies within CMS suggest that the necessary customisation of the ROSE output filter to produce ODL would be simple to perform, although it would clearly be desirable to have support from the product directly.

8.4.3 StP

StP is the CASE tool used by the ATLAS collaboration. StP includes support for requirements definition, object modeling, information modeling, structured development and testing, with code generation support for C++, Smalltalk, OMG IDL, Forte TOOL, Ada, C and SQL for the main relational database management systems (RDBMS). StP stores model information in Sybase (an RDBMS) and, like ROSE and other CASE tools, can be customised. Given that the ODMG's ODL is a superset of the OMG's IDL, which in turn is based upon C++ syntax, it should be relatively easy to customise StP to produce ODL. This step, however, has not yet been taken.

8.4.4 Conclusions

With the exception of Classify/DB, there is no CASE tool that currently supports the generation of ODL directly. Experience from NA45 and GEANT-4 suggests that this is not a major impediment to producing good persistent object models, although, as market penetration of ODBMS products increases, we would expect to see ODL generation directly supported in future releases of the major tools. As the existing output filters have demonstrated, there is no conceptual reason why this should be impossible, or even difficult. It is our recommendation that direct support for ODL generation and reverse engineering be raised as a requirement with the appropriate vendors.

8.5 The Impact of an ODBMS on Object Granularity

Although the ODMG standard permits implementations where the persistent base class, d_Object, is dummy, existing implementations typically incur a fixed overhead. In the case of Objectivity/DB, this overhead is currently 14 bytes. In other words, an object that contained a single float would increase in size by the equivalent of 3.5 additional floats as a result of becoming persistence-capable. In addition, associations involve a storage overhead. As a rule of thumb, objects that are less than about 10 words in size should not be made persistent directly - smaller objects can be made persistent by containment in a persistent object.
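The arithmetic behind this rule of thumb can be written out as a short sketch. It assumes 4-byte floats and 4-byte words (which is what the figure of 3.5 floats for 14 bytes implies), ignores the additional storage cost of associations, and ignores any bookkeeping a real container class would add; the function and constant names are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Fixed per-object overhead quoted above for Objectivity/DB.
constexpr std::size_t kOverheadBytes = 14;
// Assumption: a float and a "word" both occupy 4 bytes.
constexpr std::size_t kWordBytes = 4;

// Overhead relative to the payload: 14 bytes on a 4-byte float gives 3.5.
constexpr double overhead_fraction(std::size_t payload_bytes) {
    return static_cast<double>(kOverheadBytes) / static_cast<double>(payload_bytes);
}

// Storage for n small objects when each is made persistent on its own.
constexpr std::size_t size_individual(std::size_t n, std::size_t payload_bytes) {
    return n * (payload_bytes + kOverheadBytes);
}

// Storage for n small objects held by value inside one persistent object;
// the overhead is then paid once (container bookkeeping is ignored here).
constexpr std::size_t size_contained(std::size_t n, std::size_t payload_bytes) {
    return n * payload_bytes + kOverheadBytes;
}
```

Under these assumptions a single persistent float carries an overhead of 3.5 times its own size, while an object of 10 words (40 bytes) carries 14/40, i.e. 35%. Storing 1000 floats individually would take 18000 bytes against 4014 bytes when they are contained in one persistent object.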

Reasons for choosing separate, rather than contained, objects include database support for:

Both of these scenarios, i.e. individual objects and containers of small objects, have been tested using the GEANT-4 "hits" object, as described below.

8.6 End-User Issues

The use of an ODBMS for storing and managing physics data clearly has implications for end-users. We describe here the main issues related to running applications that use an ODBMS for persistence. Those related to using an ODBMS as input to the analysis stage are covered further under milestone 3.

8.6.1 Access to the Database Run-time Environment

It is clear that, no matter what system is used for object persistency, access to the run-time environment is required. In the case of Objectivity/DB, access to a single library, available in static, shareable and debug versions, is required. As mentioned earlier in this report, in static form, this library is approximately 40% of the size of the "PACKLIB" component of the CERN Program Library, and slightly less than twice the size of the "KERNLIB" component. The Objectivity/DB server is more than 5 times smaller than the existing ZEBRA-server, but is in any case not required to access data on local or NFS-served disks.

8.6.2 Access to Database Catalogue

In addition to the database run-time library, persistent applications need to access the database catalogue, which contains the location of the various physical databases that make up a given federated database, as well as the schema, i.e. class definitions, of the objects that are stored in the database. The federated database catalogue is automatically replicated by the database system, so that it is not necessary to access a single, central server. Subsets of the catalogue can also be extracted and stored on mobile computers, e.g. lap-tops, although clearly these catalogue subsets can only be automatically updated with new schema and database information when the host on which they reside is connected to the network.

8.6.3 Summary

The ODBMS that is currently being used for prototyping within the RD45 collaboration, namely Objectivity/DB, imposes minimal and acceptable restrictions on the run-time environment. It would not be possible to reduce these restrictions further without, for example, embedding the database schema and/or catalogue location into persistent applications, which are clearly highly undesirable strategies.

8.7 Requested Enhancements

Based upon the work for this milestone, the following enhancements to Objectivity/DB have been requested:

8.8 Conclusions from Milestone 1

The use of an ODBMS to provide object persistence for HEP applications implies minimal changes to existing applications, and these changes can be further reduced by the provision of a small layer of software. A prototype version of such a layer is under development by the RD45 collaboration, and will eventually be made available through the LHC++ framework.

With the exception that it is clearly necessary to have consistent object models - both transient and persistent, an ODBMS imposes no restrictions on the object model. The impact on physical data organisation is limited to performance - optimal data clustering will reduce redundant I/Os and result in improved performance. Similarly, the storage overhead imposed by ODBMSs means that very small objects should be avoided. However, this overhead is smaller than for existing, Fortran-based systems and is hence not a new constraint.

With the exception of class libraries providing collections or containers, 3rd-party class libraries can be used freely with applications that use an ODBMS for persistence. In the case of collection/container libraries, changes must be made to avoid storing raw C++ pointers in such collections. However, ODBMS-capable versions of the principal class libraries involved are available and the ODMG is following the C++ standard in this respect.








  1. Milestone 2 - Object Database Features

We have evaluated the support in existing products for schema evolution, object versioning and data replication and analysed the usefulness of these features in solving data management problems typical of HEP event data. It is important to point out that none of these features are currently defined by the ODMG standard. Schema evolution and object versioning, including support for configurations, have both been on the list of future enhancements since V1.2 was finalised, and it has been requested that replication be added to the list for post-V2.0 developments.

As these features are not yet standardised, any reliance on these capabilities currently implies the use of vendor-specific enhancements. We have, therefore, chosen to compare the implementation of these features in at least two products.

The work on this milestone is covered in more detail in [5].

  1. Schema Evolution

Schema evolution refers to the process of changing the definition of a class - its schema - and typically also to the ability to migrate objects created using previous versions of the schema to the new representation. This latter capability is known as object (instance) migration.

Schema evolution operations vary from simple changes, such as adding or renaming a data member in a class definition, to complex operations like adding a non-leaf base class or changing the class of origin of a data member. Schema evolution operations may require modifications to the affected objects. There are several ways, which can be used in combination, that the affected objects can be modified:

In the case of deferred mode conversion, no special steps need to be taken by the user. The first access to the objects in question will trigger the conversion, although only update transactions will result in these changes being stored persistently in the database.

Examples of immediate and on-demand mode conversion are shown below.


IMMEDIATE MODE

    ooTrans trans ;
    ooHandle(ooFDObj) fdH ;
    trans.upgrade() ;
    trans.start() ;
    fdH.open("TstFB", oocUpdate) ;
    fdH.upgradeObjects() ;
    trans.commit() ;

ON-DEMAND MODE

    ooTrans trans ;
    // declare fdH, dbH, contH
    trans.start() ;
    fdH.open("TstFD", oocUpdate) ;
    dbH.open(fdH, "tstDB", oocUpdate) ;
    contH.open(dbH, "tstC", oocUpdate) ;
    contH.convertObjects() ;
    // or dbH.convertObjects() ;
    trans.commit() ;

Additionally, for every changed class a conversion function can be registered extending standard object conversion according to user requirements.

Finally, schema evolution operations may require that applications are rebuilt in order that the changed objects can be seen.

  1. Areas of Potential Use in HEP

Support for schema evolution is typically not provided by existing HEP data management packages, but is clearly required. Schemas are subject to change throughout the lifetime of an experiment, and a flexible mechanism that permits changes to be made not only to the schema, but also to the affected objects, must be provided. Given the volumes of data involved, flexibility in object instance migration is also mandatory - it would be inconceivable to migrate the entire event store synchronously each time a schema change was made.

We consider support for schema evolution to be a mandatory requirement that is not tied to a specific area - it must be supported across the full range of applications, ranging from production to end-user.

  1. Prototype Investigations

A number of prototypes have been built to help understand the impact of schema evolution on persistently stored objects, as well as on existing applications. A major goal in this work was to identify scenarios that permitted existing applications to continue to access data without having to be rebuilt. Although, by adhering to the guidelines listed below, this can sometimes be achieved, it is not always possible.

To minimise the impact on existing applications, one should consider the following:

  1. Instead of adding a new member directly, the new member can be stored in an additional (new) class and accessed through a non-inline association. This increases the number of objects and access time to the new member, which must be compared with the benefits of avoiding changes to existing applications.
  2. In case of applications which require only read access to affected objects, conversion can be performed "on the fly". Each time that an affected object is accessed, it is automatically and transparently converted, without the changes being stored in the database. This approach also influences access performance, but again avoids making changes to existing applications.

To perform object conversion effectively, especially when large databases are involved, one should use a combination of deferred and on-demand conversions: deferred mode could be applied to convert objects as they are accessed until it is convenient to finish converting all objects within a part of a database, file, or the entire database using the on-demand mode. An immediate conversion is efficient for performing conversion on a small subset of the data.
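The combination of deferred and on-demand conversion can be pictured with a small, database-independent sketch. It does not use the Objectivity/DB API: the StoredTrack class, its schemaVersion member and the functions convert, access and convertAll are hypothetical names chosen for illustration, and a std::vector stands in for a container of persistent objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for a stored object; not Objectivity/DB API.
struct StoredTrack {
    int schemaVersion;  // 1 = old layout, 2 = new layout
    double px, py;      // members present in both layouts
    double pt2;         // member added in version 2 (unset in version 1)
};

const int kCurrentSchema = 2;

// Conversion function: brings one object to the current layout.
// Returns true if a conversion actually took place.
inline bool convert(StoredTrack& t) {
    if (t.schemaVersion == kCurrentSchema) return false;
    t.pt2 = t.px * t.px + t.py * t.py;  // initialise the new member
    t.schemaVersion = kCurrentSchema;
    return true;
}

// Deferred mode: only the object that is being accessed is converted.
inline const StoredTrack& access(std::vector<StoredTrack>& store, std::size_t i) {
    assert(i < store.size());
    convert(store[i]);
    return store[i];
}

// On-demand mode: sweep a whole "container" and convert what is left.
// Returns the number of objects converted by this sweep.
inline std::size_t convertAll(std::vector<StoredTrack>& store) {
    std::size_t converted = 0;
    for (StoredTrack& t : store) {
        if (convert(t)) ++converted;
    }
    return converted;
}
```

In this picture, objects touched by normal processing are converted one at a time by access, and a later call to convertAll finishes the job at a convenient moment, converting only those objects that have not yet been accessed.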

  1. Requested Enhancements for Schema Evolution

Based upon our experience with schema evolution support in the current release of Objectivity/DB, a number of enhancement requests have been made. These include:

  1. Object Versioning

Object versioning is the capability to manage more than one version of the same logical entity. This implies objects created using the same schema and not different versions of the same schema. Support for versions is often very similar to that offered by code management systems, e.g. CVS, for revision control, including both branch and linear versioning. In ODBMS systems, each version of an object is stored separately, although typically using the same clustering strategy. It is possible to define a default version, in which case one object per "genealogy" (set of all versions of an object) is marked as such and returned unless a specific version is requested. Versioning features include the possibility of accessing all versions as a single object, as well as of merging multiple versions into one. Navigation from a given version in a genealogy to any other version is trivial. Versioning can be customised, simply by inheriting from the class that implements versioning, and customising as appropriate.
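As an illustration of the concepts of genealogy, default version and branch versus linear versioning, a minimal sketch in plain C++ is given here. It is not the versioning interface of any ODBMS product: the classes Version and Genealogy and their member functions are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One version of a logical entity; 'parent' records where it was derived from.
struct Version {
    std::string payload;
    int parent;  // index of the version it was derived from, -1 for the first
};

// A genealogy: the set of all versions of one object, plus a default version.
class Genealogy {
public:
    explicit Genealogy(const std::string& first) : defaultIndex_(0) {
        versions_.push_back(Version{first, -1});
    }

    // Derive a new version from an existing one. Deriving twice from the same
    // parent gives branch versioning; always deriving from the most recent
    // version gives linear versioning.
    int derive(int parent, const std::string& payload) {
        assert(parent >= 0 && static_cast<std::size_t>(parent) < versions_.size());
        versions_.push_back(Version{payload, parent});
        return static_cast<int>(versions_.size()) - 1;
    }

    void setDefault(int index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < versions_.size());
        defaultIndex_ = index;
    }

    // Returned when no specific version is requested.
    const Version& get() const { return versions_[defaultIndex_]; }
    // Explicit request for a specific version (throws if out of range).
    const Version& get(int index) const { return versions_.at(index); }

    std::size_t size() const { return versions_.size(); }

private:
    std::vector<Version> versions_;
    int defaultIndex_;
};
```

Because every version records its parent, navigation from any version to any other within the genealogy reduces to following these indices.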

  1. Areas of Potential Use in HEP

There are numerous areas of HEP data management to which the concept of versions could be applied. These include:

Although all of these areas merit further investigation, manpower constraints have limited us to an initial investigation of user-level versioning, i.e. management of selections in an analysis environment.

  1. Prototype Investigations

A prototype application has been built to investigate the usefulness of versioning features for managing event selections. The prototype supports versioning of both selections and their associated cuts, which are versioned separately. The user must first specify a set of cuts (predicate), including the names and types of the individual cuts, and is then provided with a powerful and convenient interface for managing an essentially unlimited number of versions, including retrieval of the full history of the cuts, the possibility to name selections and/or predicates, the ability to change the default version of a selection or predicate and to set the values to be used in the individual cuts. Cuts may be combined using logical AND or OR. Event collections built using such selections may have associations between them and the selections and cuts used to build the collection - an important feature that helps ensure reproducibility.
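The way in which named cuts can be combined into a predicate using logical AND and OR can be sketched as follows. This is not the code of the prototype: the EventTag attributes, the Cut structure and the helper functions minPt, minTracks, allOf and anyOf are hypothetical, and versioning of the cuts is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical tag attributes on which cuts are made.
struct EventTag {
    double pt;
    int nTracks;
};

// A named cut: the name under which it is recorded plus the test itself.
struct Cut {
    std::string name;
    std::function<bool(const EventTag&)> test;
};

inline Cut minPt(double threshold) {
    return Cut{"pt", [threshold](const EventTag& t) { return t.pt > threshold; }};
}

inline Cut minTracks(int n) {
    return Cut{"nTracks", [n](const EventTag& t) { return t.nTracks >= n; }};
}

// Logical AND of two cuts; the combined name records how it was built.
inline Cut allOf(const Cut& a, const Cut& b) {
    assert(a.test && b.test);
    return Cut{"(" + a.name + " AND " + b.name + ")",
               [a, b](const EventTag& t) { return a.test(t) && b.test(t); }};
}

// Logical OR of two cuts.
inline Cut anyOf(const Cut& a, const Cut& b) {
    assert(a.test && b.test);
    return Cut{"(" + a.name + " OR " + b.name + ")",
               [a, b](const EventTag& t) { return a.test(t) || b.test(t); }};
}
```

Keeping the name of the combined cut alongside the test is one simple way of recording which predicate was used to build a given event collection.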

  1. Requested Enhancements for Object Versioning

The following enhancements to the support for object versioning in Objectivity/DB have been requested:

  1. Data Replication

Replication refers to the case when more than one "copy" of an object or set of objects is maintained by the system - typically in different locations - and is a technique that is important for both reliability and performance. Users may continue to access data from a local image ("copy") of a database, even if some parts of the network are down. Certain implementations provide a "voting" mechanism, which guarantees that data integrity is maintained - only the partition with the majority of votes may continue to modify data in a replicated database. Users in those partitions which have a minority of votes may still read data from local images, or may choose to wait for the connection to be restored.
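The majority rule behind such a voting mechanism can be written down in a few lines. The sketch below is a generic illustration and does not reproduce the implementation of any product; the function name mayWrite and the representation of votes as integers per image are assumptions.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// allVotes:       votes held by every image of the replicated database
// reachableVotes: votes held by the images reachable from this partition
// Returns true if this partition may continue to modify the data.
inline bool mayWrite(const std::vector<int>& allVotes,
                     const std::vector<int>& reachableVotes) {
    const int total = std::accumulate(allVotes.begin(), allVotes.end(), 0);
    const int reachable =
        std::accumulate(reachableVotes.begin(), reachableVotes.end(), 0);
    assert(reachable <= total);
    // Strict majority: two disjoint partitions can never both satisfy this,
    // so at most one partition is allowed to update the data.
    return 2 * reachable > total;
}
```

With three images holding one vote each, a partition that sees two images may write, while the isolated image may only read. With an even split of votes, neither side obtains a strict majority.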

  1. Areas of Potential Use in HEP

Replication is potentially useful in HEP for a number of areas:

  1. To increase reliability, by replicating critical resources,
  2. To improve performance, by replicating frequently accessed data, perhaps across a wide-area network,
  3. To distribute appropriate sub-samples of the data to remote sites, whether by network or demountable media,
  4. To collect simulated data from multiple sites, for possible redistribution as above.

Local area replication can be achieved by other means, e.g. by mirroring disks or using RAID systems. However, database replication is more flexible in that it permits data to be replicated to different physical servers, which may even be distributed in the wide area. It may nevertheless be combined with e.g. disk mirroring to give even better performance and/or resilience to hardware failure.

Data distribution and collection has traditionally been performed using magnetic tape. Although affordable network bandwidth may require that at least a portion of the data continues to be distributed using tape or other media, considerable savings in the area of book-keeping and general data management can be made if this distribution is performed under the control of the database.

  1. Prototype Investigations

Data replication is a relatively new feature in ODBMSs and is still far from mature. Objectivity/DB offers data replication as from V4.0 of the product, which was only released in early 1997. We have participated in the field-test of this product, for NT systems only, since mid-96, and were able to make some preliminary tests of its functionality. These tests show that replication is indeed transparent to the user application and that the automatic fail-over from one "image" to another works. However, we were unable to perform more extensive tests, such as evaluation of wide area replication, including over both relatively slow and unreliable connections.

Nevertheless, we intend to pursue this area actively, and will start wide area tests as soon as possible. Tests are planned between CERN, KEK, Krakow and LBL.

  1. Requested Enhancements for Data Replication

Based upon our early evaluation of the replication support in Objectivity/DB, a number of enhancement requests have been raised. These are as follows:

  1. Conclusions from Milestone 2

All of the features described above are clearly important techniques for solving HEP data management problems. Their implementation in ODBMS products is relatively recent, and it is clear that a number of important enhancements need to be made to existing systems in these areas.

With the possible exception of object versioning, which can be managed at the application level, it is our opinion that these features should form part of the list of mandatory requirements that must be satisfied by a HEP persistent object manager. Although versioning could be handled by the application, it would clearly be an advantage if this too was directly supported by the system.







  1. Milestone 3 - Performance Comparison with PAW+Ntuples

The work on this milestone has been divided into two parts: strict performance measurements, using both PAW+Ntuples and the raw performance of the underlying storage systems for reference, and an evaluation of the effectiveness of using an ODBMS+MSS as input to physics analysis.

As no appropriate MSS has been available at CERN for these tests, we report below on the use of an ODBMS with secondary (disk) storage only, although we have analysed the impact of using different storage strategies, including striping and parallel filesystems for performance.

The performance and effectiveness evaluations described below have been performed using NA45 data - both a standard NA45 Ntuple and the corresponding data stored in an ODBMS.

An Objectivity performance expert is scheduled to come to CERN for two weeks in March 1997, during which time a workshop, focussing on performance and availability will be held. We intend to use the results of this workshop to finalise the supporting document [6], which will be completed around the end of March. We report below on the results that were available at the time that this document was submitted to the LCB - more recent results will be presented to the LCB open session in March.

  1. Current Practice

The current Physics Analysis Workstation system (PAW) [23] requires that the input data be converted to a special format, namely HBOOK [21] Ntuples. Two types of Ntuple are supported - the original "row-wise" Ntuple, which consisted of a table of single-precision floating point numbers, and the more recent "column-wise" Ntuple (CWN). CWNs support all Fortran data types, and allow variable length blocks to be used. In these performance comparisons, we have tested both column and row-wise Ntuples, and also the approximate equivalent using an ODBMS, namely storing each row as a separate tag object (RWN), or each attribute of a tag object separately (CWN).

In principle, one of the main performance advantages of CWNs over RWNs is that only those columns that are referenced by a given query are read in, offering corresponding performance improvements when only a few columns are needed. Studies have shown that many queries only use a small fraction - say 20% - of the columns present in a given Ntuple, and hence significant gains are to be expected from using such a strategy. However, this may merely reflect one of the known weaknesses of the current Ntuples. Creating an Ntuple is typically a lengthy process, requiring an ad-hoc batch job which processes a large subset of the data. If it is discovered that more information than is present in the Ntuple is required, or if one or more columns needs to be recalculated, then this lengthy process must be repeated. Hence, the observation that only 20% of the columns are referenced in typical queries may simply indicate that users are trying to minimise the number of times that the Ntuple must be recreated, and store extra information "just in case".

In addition, the Ntuple stores both the information that is used for queries and the information that will be analysed - in principle, the selection of events can be based upon a small subset of the event characteristics and should not force a common clustering strategy for the data used for selection and the data that is to be e.g. histogrammed. Using an Ntuple from NA45 for comparison, we examine below the benefits of separating the data used for queries from that needed for analysis.

It is our opinion that the analysis framework should not impose a particular data model or format, and that converting data to such a format is a major inconvenience which should be avoided in future systems. This is particularly important given the volume of data involved in an LHC experiment - redundant copies must be avoided at all costs.

  1. ODBMS Capabilities

In principle, an ODMG-compliant ODBMS supports the full C++ object model. Whilst this is essentially true, there are a number of important considerations that need to be borne in mind, if an efficient physical model is to be implemented. That having been said, any C++ object model can be implemented using an ODBMS, with the proviso that associations between objects are implemented using ODMG smart-pointer classes. Thus, the logical object model is unconstrained, whilst, for performance reasons, some basic guidelines, such as those outlined in section 8.5 on page 12, should be followed.

  1. ODBMS versus Ntuples

Unlike HBOOK CWNs, ODBMSs support significantly more general and/or complex data models. Although some minor constraints are likely to be imposed by performance considerations, such as avoiding the use of very small objects (less than 10 words or so), an ODBMS provides access to all of the data of an experiment, and not just that subset that has been extracted into a format dictated by the analysis tool.

Although it would theoretically be possible to encode enough navigational information into a CWN to permit an application to reference the complete event data from such an Ntuple, this is not supported by the current analysis tools, such as today's de-facto standard, namely PAW. On the other hand, this is directly supported by an ODBMS - transparent navigation from one element of the data, e.g. the event tag, to another, e.g. the raw data or "analysis objects", is provided by the ODBMS software itself.

For performance reasons, efficient data clustering is always likely to be important. However, it would be perfectly feasible to recluster a small subsample of the data - sufficient to develop the necessary cuts etc., and then run a "production analysis" on the full, unclustered, dataset. This is an inherently more scalable solution than one that forces all data to be converted into a special format, i.e. copied, which becomes unworkable when very large volumes of data, such as those expected at the LHC, are involved.

  1. Raw Performance Measurements

Ideally, the performance overhead introduced by the ODBMS software should be less than a few per cent. In other words, one should be able to read and write data at approximately the speed of the underlying storage system, although this will typically be an upper limit for best-case scenarios. As is the case with all existing ODBMS products, Objectivity/DB uses the standard filesystem in which to store databases - each database appears to the operating system as a normal file. This means that standard techniques, such as parallel filesystems, file caching etc., should translate directly into improved database performance. The maximum throughput obtainable would thus depend only on the hardware resources made available.

  1. Read/Write Performance of Test-Bed System

Read/write performance of up to 100MB/second has been measured on a Digital Alpha 4100 server. These figures exceed the initial goal of 90MB/second. To achieve these results, the following configuration was used:

  1. Comparisons with PAW and Ntuples

Below, we describe performance comparisons between PAW and Ntuples and simple TagDB implementations. The approach has been to perform one to one comparisons between Ntuples and TagDB implementations using Objectivity/DB. To this end, we have used a standard NA45/CERES Ntuple from their 1995 production data. The same analysis has been performed using both PAW+Ntuples and Objectivity/DB, under a variety of different cache conditions. The benchmark environment used in both cases was identical, using the following:

In all cases, we have measured both the first-pass ("cold") and second-pass ("hot") cases. At the time of writing, we have more confidence in the hot measurements - issues such as the filesystem cache can strongly influence these performance measurements and, short of rebooting a CS-2 node between each measurement, it is hard to be certain that no caching is taking place for the first-pass measurements.

The NA45 Ntuple used contains 302 columns (all floats) and some 21K rows, giving a total size of around 25MB. This is seen to be a somewhat typical size for Ntuples today, although they are often combined into larger logical units using the PAW chaining facility.

The main time during the PAW-based analysis is spent in a single command, namely:

ntuple/loop [ntuple] ana.f

ana.f is a single, compiled, Fortran function that performs all selection cuts and histogramming. In the current analysis, some 15 columns are typically used to make selections, although this is expected to rise so that eventually 80 columns are used. Both the original Fortran version of this function and the equivalent C++ code are reproduced in an appendix of the milestone 3 document [6].
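The figures quoted for the NA45 Ntuple can be cross-checked with a short calculation. The sketch below assumes exactly 21,000 rows (the text says "some 21K"), 4 bytes per single-precision float, and ignores any header or padding overhead; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t kBytesPerFloat = 4;  // single-precision floating point

// Payload size of an Ntuple of floats, ignoring headers and padding.
inline std::int64_t ntupleBytes(std::int64_t rows, std::int64_t columns) {
    assert(rows >= 0 && columns >= 0);
    return rows * columns * kBytesPerFloat;
}

// Fraction of the payload touched if only 'used' of 'total' columns are read.
inline double readFraction(std::int64_t used, std::int64_t total) {
    assert(total > 0 && used >= 0 && used <= total);
    return static_cast<double>(used) / static_cast<double>(total);
}
```

Under these assumptions, ntupleBytes(21000, 302) gives 25,368,000 bytes, in agreement with the quoted size of around 25MB, whereas the 15 columns used for selection amount to 1,260,000 bytes, i.e. just under 5% of the total.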

In the case of the TagDB, the selection code is as follows:


    tagItr.scan(tagCont,oocRead);

    Timer t("simple scan");

    long total = 0, matched = 0;
    t.Start();
    while(tagItr.next()) {
        total++;
        if (tagItr->Match())   // the Match function accesses the
                               // attributes used for histogramming
            matched++;
    }
    t.Stop();

In the tables below, the time shown is that spent on performing the event selection, including the time to access the attributes used in the selection and those attributes used for the histogramming. The time spent in filling and displaying histograms has not been measured, as this is independent of the database.

  1. Full Tag Comparisons

The first tests were based upon an implementation whereby each row in the NA45 Ntuple was converted to a separate "tag" object, i.e. an object with one attribute corresponding to each column in the Ntuple. This means that no traversals are required to access additional objects to perform the query or to fill the histograms. All the tags were stored in a single container in an Objectivity/DB database, and clustered according to insertion time. A compiled, user-written selection function was used in both cases. The time taken to compile and load these functions, and the time taken to fill the histograms, has been subtracted from the values shown.

| Implementation | Time | Comments |
|---|---|---|
| PAW + RWN | 11.3s cold, 2.5s hot | First pass - 0% cache efficiency; second pass - 100% cache efficiency |
| PAW + CWN | 16.4s cold, 2.6s hot | Converted using htonew |
| TagDB | 6.2s cold, 1.4s hot | |

Comparison of NA45 Ntuple with TagDB Implementation

At the time of writing, the reasons why the column-wise Ntuples show worse performance in the first-pass case are not fully understood, but are being investigated in collaboration with the PAW support team in IT/ASD. Although, in this particular case, the TagDB implementation based upon Objectivity/DB shows better performance, we interpret these results as showing comparable performance, pending further, more detailed, investigations. It is fair to say, however, that these first, unoptimised results are encouraging, particularly when one considers the significant amount of effort that has been spent on optimising the performance of PAW.

  1. Reduced Tag

A slightly more complex implementation than the case described above is one where the subset of the data used for the selection is stored separately from that used in the analysis of the selected events. In this case, two objects are used - one containing the 15 "columns" used in the selections and the other containing the remaining data. These objects were clustered separately and were stored in different physical databases in the same federation. Only in the case that an event is selected is a traversal made from the reduced tag object to the remaining event data.

| Tag Implementation | Time | Comments |
|---|---|---|
| Full Tag | 6.2s cold, 1.4s hot | No traversal |
| Reduced Tag - 0% selectivity | 1.3s cold, 1.0s hot | No traversal |
| Reduced Tag - 2% selectivity | 5.5s cold, 1.2s hot | One traversal per selected tag |
| Reduced Tag - 20% selectivity | 7.5s cold, 1.5s hot | One traversal per selected tag |

Comparison of Different TagDB Implementations

In the above table, the performance of the reduced tag, in the case of low selectivity, improves with respect to that of the full tag simply due to the decrease in I/O that is required. As no events are selected, only the tag objects are read in. As the selectivity rises, we see the effect of object clustering. The objects corresponding to a selected event may well have been brought into the client cache as a result of a previous I/O. The potential benefit is very dependent on the object size and page size. Tests using different object and page sizes have not yet been made, but will be included in the supporting document [6].
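The access pattern of the reduced tag can be illustrated with a small stand-alone sketch. It is not the benchmark code: the ReducedTag and EventData structures, the single selection attribute pt and the use of an array index in place of a database association are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Remaining event data, stored (and clustered) separately from the tags.
struct EventData {
    double mass;
};

// Reduced tag: only the attributes needed for selection, plus a link to the rest.
struct ReducedTag {
    double pt;              // hypothetical selection attribute
    std::size_t dataIndex;  // stand-in for an association to EventData
};

struct ScanResult {
    std::size_t total;
    std::size_t selected;
    std::size_t traversals;
    double massSum;
};

inline ScanResult scan(const std::vector<ReducedTag>& tags,
                       const std::vector<EventData>& data, double ptCut) {
    ScanResult r{0, 0, 0, 0.0};
    for (const ReducedTag& tag : tags) {
        ++r.total;
        if (tag.pt <= ptCut) continue;  // rejected: EventData is never touched
        assert(tag.dataIndex < data.size());
        r.massSum += data[tag.dataIndex].mass;  // one traversal per selected tag
        ++r.selected;
        ++r.traversals;
    }
    return r;
}
```

At 0% selectivity no traversal is made at all, which corresponds to the case in the table where only the tag objects are read; as the selectivity rises, the number of traversals grows in proportion to the number of selected tags.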

  1. Queries Using Indices

Further optimisations can be achieved by introducing indices on the tag objects, either for the full or reduced tags. Objectivity/DB uses a B-tree with short OIDs, which are 4 bytes rather than the usual 8 bytes. This implies that they can only refer to objects within the same container, although Objectivity/DB V4 also provides federation-wide indices, based upon normal OIDs.

The storage overhead of a single index entry is the size of the attribute together with a 4 byte overhead. Thus, a single 8KB database page can store 1000 object references indexed on a single 32 bit field.
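The figure of 1000 references per page follows from simple arithmetic, sketched below. The calculation assumes that 8KB means 8192 bytes and ignores any space taken by page headers or by the internal structure of the B-tree; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of index entries that fit on one database page, where each entry
// consists of the indexed attribute plus a fixed per-entry overhead.
inline std::size_t indexEntriesPerPage(std::size_t pageBytes,
                                       std::size_t attributeBytes,
                                       std::size_t overheadBytes) {
    assert(attributeBytes + overheadBytes > 0);
    return pageBytes / (attributeBytes + overheadBytes);
}
```

For a 32-bit (4 byte) attribute and the 4 byte overhead, indexEntriesPerPage(8192, 4, 4) gives 1024 entries, i.e. roughly 1000 per page.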

Unfortunately, it was not possible to include performance measurements using indices in this report. However, detailed measurements can be found in the supporting document [6], which will be available via the Web in draft form from early March and submitted to the LCB in final form by the end of March 1997.

  1. The Effectiveness of Using an ODBMS

In principle, storing all of the data of a given experiment under a consistent scheme offers significant benefits at the analysis stage. Today's techniques of data reduction, which have evolved over many years, have been driven by necessity. The cost of random access storage relative to sequential media (tape) was so high that successive data reductions were imperative. Even at the startup of LEP, the idea of providing a mere 100GB of staging space per experiment on the central mainframe was simply unaffordable.

Today, the situation has changed dramatically, and trends suggest that this will continue into the future. Even tape media now offer some degree of random access - typically fast block addressing - and the amount of disk space that can be afforded has increased enormously.

Thus, it is important that the next generation of experiments are not constrained by the technological limits that inhibited previous ones. Nowhere is this more true than in the area of data management.

Today's experiments use a wide variety of data formats for rawdata, DSTs, Ntuples, calibration, meta-data and so forth. This gives rise to extreme difficulty in navigating e.g. from a histogram to the rawdata of the events corresponding to the entries in the histogram. Despite many man-years of effort, both centrally and within the experiments, this is still largely an unsolved problem and results in considerable inefficiency in extracting physics results from the data - in other words, a waste of extremely valuable resources.

  1. Conclusions

An ODBMS approach offers the possibility of revolutionising our approach to physics analysis - offering not only more efficient access to the data, but also permitting more complicated analyses to take place. Initial tests show that comparable performance to today's systems can be achieved by even naïve approaches, and that separation of data into the part required for selection and that used for analysis (i.e. reduced tag) offers performance improvements for sufficiently selective queries. Only minor performance enhancements have been made so far and it is expected that significant performance optimisations can be made with time. The ease of access and transparency to all of the event data has been demonstrated, and the cost of traversing associations to the event objects shown to be small. The above results have been based on initial measurements and only preliminary interpretations can be made at this time. More complete information will be made available in the supporting document for milestone 3 [6], presented at the LCB open session in March 1997 and discussed at the RD45 workshop to be held at CERN from March 12-14.

























  1. Risk Analysis

Extrapolating from the current prototyping activities, which are at a scale of GB to hundreds of GB, to a production system capable of scaling to 100PB - an increase of some 6 orders of magnitude - requires detailed and careful analysis of the risks involved. We present here the main risks that we have identified, and outline ways that these issues may be better understood in the short-term and possible fall-back scenarios.

Details of investigations concerning the limits and issues listed below can be found in [6].

  1. Support for Multiple Federations

The current RD45 model is that each experiment would use a single logical database in which all of their data would be stored. Such a single logical view is implemented in Objectivity/DB as a so-called federated database, consisting of multiple physical databases, which may be stored on different servers across the network.

The databases of each experiment are expected to be independent, and there would thus be no need for a single application to access multiple logical databases, e.g. those of ATLAS and CMS, concurrently.

Indeed, neither the current version of the ODMG standard nor Objectivity/DB supports simultaneous access to multiple (federated) databases.

Due to the current architecture of Objectivity/DB, multiple federations may be required as part of a fall-back solution, e.g. if the extended object identifiers (OIDs) described in section 11.2 on page 28 are not implemented, in which case a separate federation would be required for each year of data taking.

One could also consider the use of multiple federations to handle user data. However, it is our conclusion that multiple federations, and heterogeneous federations in particular, should be avoided, and that alternative approaches to these problems be investigated.

  1. Number of Databases per Federation

In the Objectivity/DB architecture, a single logical database is composed of many physical databases. Currently, each physical database is mapped to a file and the logical database is termed a federated database.

The current RD45 model, using multiple physical databases limited to some 100GB, requires the use of many thousands of physical databases, spread across multiple servers. Indeed, the current 64-bit OID used by Objectivity/DB implies a maximum size of a federated database of only 6.5PB (2^16 - 1 databases of 100GB each).

To understand whether such large numbers of databases can really be handled by the current architecture, we have used the existing test-bed to build a federation containing the maximum number of databases, but have limited their size to 1MB. This has allowed us to understand issues relating to the number of databases, without requiring a massive storage system in which to place them.

We were able to create a federation containing 13,000 databases, which would limit the federation to around 1PB, if each database were allowed to grow to 100GB, as foreseen. However, we observed some performance problems when creating many (more than 500) databases in the same process, or when adding new databases to an already large federation. These issues are being pursued with Objectivity.
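The capacity figures in this section can be reproduced with the following calculation. Decimal units are assumed (1PB = 10^6 GB), and the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t kGBPerPB = 1000000;  // decimal units: 1PB = 10^6 GB
// Maximum number of databases per federation quoted in the text: 2^16 - 1.
const std::int64_t kMaxDatabases = (std::int64_t(1) << 16) - 1;

// Capacity of a federation in GB, for a given number and size of databases.
inline std::int64_t federationGB(std::int64_t databases,
                                 std::int64_t gbPerDatabase) {
    assert(databases >= 0 && gbPerDatabase >= 0);
    return databases * gbPerDatabase;
}

// Number of databases of a given size needed to hold totalGB (rounded up).
inline std::int64_t databasesNeeded(std::int64_t totalGB,
                                    std::int64_t gbPerDatabase) {
    assert(totalGB >= 0 && gbPerDatabase > 0);
    return (totalGB + gbPerDatabase - 1) / gbPerDatabase;
}
```

With 100GB databases, the limit of 65,535 databases gives 6,553,500GB (about 6.5PB) and the 13,000 databases created in the test give 1,300,000GB (about 1.3PB). A store of 100PB would need 1,000,000 such databases, far beyond the current limit, which is why a change to the OID or to the logical-to-physical mapping is required.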

As it would seem unwise to plan on a physical database size of much more than 100GB, there is a clear requirement to increase the current 64-bit OID, or at least change the mapping from logical to physical model to circumvent this problem.

This has been raised as a requirement with Objectivity.

  1. Number of Containers per DB, Size of Containers

The number of containers per physical database is similarly limited to 32K (2^15 - 1). Although we have no obvious requirement for a greater number of containers per database, we have nevertheless tested building a database with such a large number of containers. We were able to reach this limit without problems. Attempting to exceed the limit results in an error, as expected.

The current Objectivity/DB architecture limits the maximum container size to a multiple of the database page size. Using database pages of 8K, the maximum container size is 2^29 bytes, or 0.5GB. Using the maximum database page size, containers are limited to 4GB. As a physical limitation, this is not considered to be a significant problem, although there is a clear need for logical containers, which group together multiple physical containers. A prototype of such logical containers has been built, although it currently lacks support for appropriate iterators, which can iterate over the entire logical container.
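The kind of iterator that a logical container requires can be sketched in standard C++. This is not the RD45 prototype: std::vector stands in for a physical container, and the LogicalContainer class, its Cursor and all member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a physical container holding objects of type T.
template <typename T>
using PhysicalContainer = std::vector<T>;

// Logical container: an ordered group of physical containers, with a forward
// cursor that walks all of them as if they were a single container.
template <typename T>
class LogicalContainer {
public:
    void addPhysical(const PhysicalContainer<T>& c) { parts_.push_back(c); }

    class Cursor {
    public:
        explicit Cursor(const LogicalContainer& lc)
            : lc_(lc), part_(0), pos_(0) { skipExhausted(); }
        bool valid() const { return part_ < lc_.parts_.size(); }
        const T& get() const { assert(valid()); return lc_.parts_[part_][pos_]; }
        void next() { assert(valid()); ++pos_; skipExhausted(); }

    private:
        // Move to the next physical container once the current one is
        // exhausted; empty physical containers are skipped as well.
        void skipExhausted() {
            while (part_ < lc_.parts_.size() &&
                   pos_ >= lc_.parts_[part_].size()) {
                ++part_;
                pos_ = 0;
            }
        }
        const LogicalContainer& lc_;
        std::size_t part_;
        std::size_t pos_;
    };

    Cursor cursor() const { return Cursor(*this); }

private:
    std::vector<PhysicalContainer<T> > parts_;
};
```

A client would then loop with for (auto c = lc.cursor(); c.valid(); c.next()) and never need to know where one physical container ends and the next begins.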

  1. Navigation Across Multiple Containers and Databases

Objectivity/DB permits associations to be established between different objects, regardless of where they are stored in the federation. To test cross-container and cross-DB navigation, we have built a number of prototypes, varying the physical implementation from a single container in a single physical database to multiple containers in multiple databases distributed across many servers.

  1. Very Large Numbers of Associations

Some object models, such as the current prototype for the ALICE raw data, require very large numbers of associations. However, in the case of individual events, we expect that the number of associations that will be required will be of the order of 10-100, or at the most 1000. The theoretical limit on numbers of associations in Objectivity/DB is 2^32, due to their implementation based on VArrays, and thus this is not considered a risk area.

In tests, we have been able to build up to 5 million associations for a given object without problems. As the current implementation is based upon VArrays, the actual limit depends on the resources available on the database client. The use of a "paged VArray", which would load only the required pages into client memory, would circumvent this problem.
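
The idea of a paged array can be sketched as follows. This is a minimal illustration, not the Objectivity/DB VArray interface: the class `PagedArray`, the `load_page` callback and the page and cache sizes are all hypothetical, and the "database" is simulated by a function that builds each page on demand.

```python
from collections import OrderedDict


class PagedArray:
    """Read-only array view that keeps at most `max_pages` pages in memory."""

    def __init__(self, length, page_size, load_page, max_pages=4):
        self.length = length
        self.page_size = page_size
        self.load_page = load_page     # callable: page number -> list of items
        self.max_pages = max_pages
        self._cache = OrderedDict()    # page number -> items, in LRU order
        self.loads = 0                 # number of page loads, for illustration

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        page_no, offset = divmod(index, self.page_size)
        if page_no in self._cache:
            self._cache.move_to_end(page_no)      # mark as recently used
        else:
            self._cache[page_no] = self.load_page(page_no)
            self.loads += 1
            if len(self._cache) > self.max_pages:
                self._cache.popitem(last=False)   # evict least recently used
        return self._cache[page_no][offset]


# Stand-in for the database: 5 million references, never held in memory at once.
N, PAGE = 5_000_000, 1024


def load_page(page_no):
    start = page_no * PAGE
    return list(range(start, min(start + PAGE, N)))


refs = PagedArray(N, PAGE, load_page)
first, last = refs[0], refs[N - 1]   # touches only two pages
```

With this arrangement the client memory required depends on the page size and cache depth rather than on the total number of associations.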

  1. Very Large Collections

Some physics channels at the LHC are estimated to include very large numbers of events - perhaps 10^9 or even more. It is highly unlikely that collections are a viable approach for managing such large numbers of objects, and an approach based on containment, i.e. where all events corresponding to a certain channel would be stored in a given (set of) containers and/or databases, is more appropriate.

Solutions to this problem include "collections of collections", e.g. where a given physics channel is divided into multiple collections, each corresponding to a data taking period, or direct support from the database, in a manner that does not require that the entire collection is loaded into the client cache.

Collections are implemented using a VArray of object references, and hence the limits and comments described in section 11.5 above are also applicable here.
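
A "collection of collections" can be sketched as below, assuming that each data-taking period is held in its own sub-collection that is opened only when iteration reaches it. The class `ChannelCollection`, the helper `make_opener` and the period labels are hypothetical and serve only to illustrate the idea.

```python
from itertools import chain


class ChannelCollection:
    """A physics channel split into one sub-collection per data-taking period."""

    def __init__(self):
        self._periods = {}   # period label -> callable returning an iterator

    def add_period(self, label, open_collection):
        self._periods[label] = open_collection

    def __iter__(self):
        # chain.from_iterable is lazy: each sub-collection is opened only
        # when the previous one has been exhausted.
        return chain.from_iterable(
            open_collection() for open_collection in self._periods.values()
        )


opened = []   # records which sub-collections have actually been opened


def make_opener(label, n_events):
    def open_collection():
        opened.append(label)
        return (f"{label}/event{i}" for i in range(n_events))
    return open_collection


channel = ChannelCollection()
channel.add_period("period-A", make_opener("period-A", 3))
channel.add_period("period-B", make_opener("period-B", 2))

events = iter(channel)
first_event = next(events)
opened_after_first = list(opened)   # only the first period is open so far
remaining = list(events)
```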

  1. Re-clustering and the Effect on Existing Collections

The ODMG does not define the implementation of an OID, and various different strategies have been adopted by the vendors. That taken by Objectivity is to use an OID that has a direct physical mapping. This has significant advantages in terms of performance over logical OID implementations, but implies that object re-clustering is likely to render existing collections invalid. If bi-directional associations are used between the collections and the objects, then re-clustering can be performed at any time without rendering these collections invalid. However, if uni-directional associations are used, which is expected to be the case for user collections, then re-clustering will render such collections invalid.

A number of scenarios exist which minimize, or even hide, the effect of re-clustering on user collections. For example, a validity stamp could be used to determine automatically whether collections were still valid, and even update the collections if required. However, it is clear that further investigation is required to fully understand the issues involved.
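
One possible form of the validity stamp is sketched below. It assumes that the database maintains a counter that is incremented by every re-clustering operation, and that each user collection retains a logical key per object from which physical references can be rebuilt. Both assumptions, and the classes `Store` and `UserCollection`, are hypothetical and do not describe any existing product feature.

```python
class Store:
    """Stand-in for a database whose objects may be moved by re-clustering."""

    def __init__(self, keys):
        self.clustering_epoch = 0
        # logical key -> physical slot
        self._location = {key: slot for slot, key in enumerate(keys)}

    def locate(self, key):
        return self._location[key]

    def recluster(self):
        # Reverse the physical order and advance the epoch.
        n = len(self._location)
        self._location = {k: n - 1 - s for k, s in self._location.items()}
        self.clustering_epoch += 1


class UserCollection:
    """Uni-directional collection of physical references with a validity stamp."""

    def __init__(self, store, keys):
        self.store = store
        self.keys = list(keys)   # logical keys, kept so references can be rebuilt
        self._refresh()

    def _refresh(self):
        self.refs = [self.store.locate(k) for k in self.keys]
        self.stamp = self.store.clustering_epoch

    def is_valid(self):
        return self.stamp == self.store.clustering_epoch

    def references(self):
        if not self.is_valid():
            self._refresh()      # automatic update of a stale collection
        return self.refs


store = Store(["a", "b", "c", "d"])
selection = UserCollection(store, ["a", "c"])
before = list(selection.references())
store.recluster()
stale = not selection.is_valid()
after = list(selection.references())
```

The cost of this approach is the extra logical key stored per reference, which partly offsets the performance advantage of physical OIDs.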

  1. Handling Multiple Containers and/or Databases

Independent of the current limits on container and database sizes, there is a clear requirement for a facility whereby containers and databases can be limited to a given size, with new containers/databases created automatically as required. In addition, facilities to iterate over the multiple containers/databases must be provided. Such a facility could be provided either by the database vendor or by HEP-specific application code. The preferred solution would be for the vendor to provide such libraries, although it is highly unlikely that such implementation-specific areas will ever be standardised, and hence the usual caveats concerning vendor-specific features apply.
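
Such a facility, if provided by application code, might look like the following sketch. The class `RollingContainers` is hypothetical: containers are modelled as in-memory lists of byte strings, and the size limit is a constructor parameter rather than a product-defined value.

```python
class RollingContainers:
    """Append-only store that starts a new container when the current one is full."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.containers = [[]]   # each container is a list of byte strings
        self._used = 0           # bytes used in the current container

    def append(self, obj):
        size = len(obj)
        if size > self.max_bytes:
            raise ValueError("object larger than a container")
        if self._used + size > self.max_bytes:
            self.containers.append([])   # create the next container automatically
            self._used = 0
        self.containers[-1].append(obj)
        self._used += size

    def __iter__(self):
        # Iterate over every object in every container, in insertion order.
        for container in self.containers:
            yield from container


store = RollingContainers(max_bytes=10)
for _ in range(7):
    store.append(b"abcd")        # 4 bytes each, so two objects fit per container

n_containers = len(store.containers)
n_objects = len(list(store))
```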

  1. Database Administration Issues

Deploying a fully-distributed database system will clearly involve a certain amount of administration. Many issues need to be better understood, including the real tolerance of the system to prolonged network failures, the propagation of database catalogue and schema changes over faulty networks, and the possibility of applying "rolling-upgrades", i.e. upgrading the database software on the various servers in turn, whilst keeping the database available to users.

Although initial tests can be made with test configurations at CERN, much more exhaustive studies will need to be made in the wide area with remote sites, requiring careful coordination. A number of projects concerning regional centres and wide-area replication are currently being discussed, and it is expected that these issues will be further researched in joint collaboration between RD45 and these projects.

Data management for multiple PB of data in a fully distributed environment will always involve a non-negligible amount of overhead. This overhead must nevertheless be kept as low as possible, preferably requiring less manpower than today's ad-hoc solutions whilst providing considerably more functionality.

Further investigations will be made in this area, particularly related to issues concerned with wide-area distribution.

  1. Alternative ODBMS Products

Although the current RD45 prototyping activities are being performed using Objectivity/DB, great care is taken to avoid using vendor-specific features. Certain important features, such as schema evolution, are not part of the current ODMG standard and we are therefore working with the ODMG to extend the standard to ensure that it is sufficiently complete to satisfy our requirements.

Some features, including DBA-related functionality, are unlikely ever to become standardised, and hence migration from one product to another will always require work. Nevertheless, by adhering closely to the ODMG standard, we are able to protect ourselves as much as possible. In addition to the portability of application code between different vendors, the ODMG also provides an interchange format, so that the associated data can also be moved. However, migrating many TB or PB of data will never be a task that can be undertaken lightly.

A recent IDC report on the ODMG estimates that the ODBMS market is currently worth $115M per year and growing at 24% per annum. Object Design International (ODI), one of the two ODBMS vendors that went public in 1996, announced total earnings for the 3-month period that ended in September 1996, of nearly $10M. This growth is expected to accelerate such that the total market is estimated to reach $1.6B by the year 2000. Like other analyses, the IDC report predicts that the Web and the Java binding in particular will be important markets for ODBMSs.

The ODMG standard is widely accepted as being "the" standard for ODBMSs, and all major products already offer partial conformance. We can confidently expect that new products in this market will conform to this standard, and hence that standards-conforming ODBMS products will continue to be marketed for the foreseeable future.

Today, the market for very large databases is small (but non-zero), although this is predicted by many analysts to grow considerably in the coming years. We are aware of a number of projects which call for databases of several hundred GB to a few TB in the immediate future, scaling to tens to hundreds of TB by the end of the decade, and believe that many more such projects exist.

Several ODBMS products, including Objectivity/DB and Versant, are currently targeting the telecoms market, which requires distributed databases, scalability and performance. This market is sufficiently large to sustain at least one, if not both, of these vendors. It is therefore probably safe to say that a product capable of satisfying the requirements of the telecoms industry, and hence at least a minimum set of HEP requirements, will continue to exist.

Nevertheless, fallback strategies need to be considered, including the use of a "commodity" ODBMS, should an appropriate high-end system cease to be available.

  1. Alternative Mass Storage Systems

The only known MSS that - even theoretically - offers the scalability and functionality required for LHC is HPSS. The absence of alternatives is indicative of the fact that this is very much a niche market. Other MSS products exist, but are typically targeted at much more modest volumes of data, and almost certainly could not satisfy our requirements.

The US National Laboratories, such as Lawrence Livermore, Los Alamos, Sandia, etc. are all involved in the HPSS consortium and are all expected to use HPSS. Several HEP sites (CERN, DESY, FNAL, IN2P3, etc.) are considering or planning to use HPSS, which could provide the critical mass needed to ensure HPSS's survival.

It is clear that the effort to produce a system as powerful as HPSS is simply not available within HEP, and so the absence of a suitable product in this area would be a major inconvenience. However, this is also true for the US National Labs - by adopting a common strategy, there is much more chance that such a strategy will survive than if we pursued separate paths. In other words, should HPSS fail, the wisest strategy would be to combine forces with other sites facing similar problems to build, or preferably commission, a replacement system. The design phase for such a system could be considerably shortened by basing the system upon the IEEE Reference Model for Mass Storage Systems, and even using the standard APIs that are currently being developed.

As with the ODBMS, a fallback solution needs to be considered. Unlike the ODBMS case, no clear alternative currently exists. Commodity products typically target the backup market, and today have no clear way of scaling sufficiently to meet the requirements of LHC. It is possible, however, that as backup volumes increase, it will be feasible to use a small number of such systems, e.g. one per year per experiment, to manage LHC event data volumes.

This is clearly an area of risk which needs to be studied further.

  1. Conclusions

Many of the risk factors associated with the current strategy can be both identified and tested today. Work over the next months will allow us to better understand the precise risks involved, and develop work-arounds and/or alternative strategies as appropriate.

Initial investigations of the limits and scalability of the current Objectivity/DB architecture lead us to the following requirements:

Further work needs to be done in the area of distributed database management, and the ODBMS and MSS markets need to be followed closely, so that alternative strategies can be developed in time should, for example, the HPSS project or the current ODBMS supplier fail.

  1. Use of Objectivity/DB in HEP and Related Disciplines

Over the past year, several projects have been started within HEP that use an ODBMS and Objectivity/DB in particular. In addition to the work at BaBar, mentioned in the previous status report, Objectivity/DB is now installed at DESY, for some prototyping activities on ZEUS, and at KEK, for work related to the BELLE experiment, and is under consideration at FNAL for studies related to its use for run 2 physics data. More details of these activities can be found below.

In addition to the activities described in this report, there are several other prototypes at CERN using Objectivity/DB, most particularly in CMS, including test-beam and calibration database studies, as well as the CRISTAL project. Objectivity/DB is also the database system used by one of the EDMS systems currently under study at CERN.

  1. Collaboration with Other Projects

In addition to the activities described above, directly related to the LCRB milestones and referees' recommendations, RD45 has worked with the LHC experiments at CERN as well as experiments at other laboratories, on issues related to data management and object persistence. More details are given below.

  1. ALICE

In the context of the ALICE experiment, a first version of an object model describing the raw data has been developed. Persistent classes describing the raw data for the 7 main detectors have been defined, consistent with the typical raw event size of 40MB.

In addition, a pseudo-event generator has been built, which generates events according to this object model, but without physical content - the data members of the objects involved are simply numbers generated randomly within the defined limits for each quantity.
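
The principle of such a generator can be sketched as follows. The detector names, channel counts and value ranges below are invented for illustration and bear no relation to the actual ALICE object model; only the number of detectors (7) is taken from the text.

```python
import random

# Hypothetical limits: detector name -> (number of channels, min value, max value).
DETECTOR_LIMITS = {
    f"detector_{i}": (100 * (i + 1), 0, 4095) for i in range(7)
}


def generate_event(rng, limits=DETECTOR_LIMITS):
    """Return one pseudo-event: random values within the limits of each detector."""
    return {
        name: [rng.randint(lo, hi) for _ in range(n_channels)]
        for name, (n_channels, lo, hi) in limits.items()
    }


rng = random.Random(1997)   # fixed seed so that runs are reproducible
event = generate_event(rng)
```

Because the content is random but the structure is fixed, events of this kind are sufficient to exercise the object model and the storage limits without requiring simulated physics.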

This generator has been used to test the feasibility and consistency of such a model, as well as to investigate various alternatives in the design. Finally, it has been used to test some of the ODBMS limits, corresponding to the "risk analysis" described earlier in this report.

  1. ATLAS

Collaboration with ATLAS has increased during the last year, and a new ATLAS sub-group, which will work closely with RD45, has recently been set up. Amongst other activities, this group will study issues such as wide-area replication, by network or tape, of physics data. In addition, members of ATLAS are proposing using RD45-like solutions on CDF for run II of the Fermilab collider.

  1. CMS

There are a number of prototyping activities exploiting Objectivity/DB within the CMS collaboration. Perhaps the most significant is the plan to store some 50GB of test-beam data in an Objectivity/DB database during 1997, and use the elements of the LHC++ environment to analyse this data. In a possible future extension, this project may exploit the data replication option and HPSS interface of Objectivity/DB, but in the short term will probably be limited to disk storage and manual copying of database files from the test beam area to the computer centre.

This activity is clearly strongly related to the proposed milestones for 1997, listed in section 20 on page 41.

  1. CERES/NA45

Starting in late 1995, the NA45 collaboration completely redesigned their reconstruction and filtering software and re-implemented it in C++. The total package consists of some 30K lines of code, and has been ported to use an ODBMS for persistence. The system has been used to write to a single logical store from multiple (16) processing nodes in parallel. So far, some 20GB of data have been stored. A reprocessing is planned which will store 60GB of data in the ODBMS.

More details concerning RD45 collaboration with NA45 can be found in [4] and [5].

  1. GEANT-4

Persistence for calorimeter and tracker "hits" objects has been introduced in GEANT-4 using Objectivity/DB.

Two different implementations have been tested: individual persistent objects, and persistence by containment in a VArray.

In both cases, as can be seen from the tables below, the overhead introduced by making the objects persistent is very small. In the case of the calorimeter hits, two collections are created, of 19 and 17 objects respectively. The objects are accessed 100 times, as the energy deposition is accumulated. In the case of the tracker hits, collections of 1900 and 1700 objects are created. However, each object is accessed only once (at construction time). The tests were performed on the SP-2 at CERN and the times shown below are in seconds.

Individual Persistent Objects (times in seconds)

             Calorimeter Hits           Tracker Hits
             Transient   Persistent     Transient   Persistent
User time    7.96        9.63           8.80        13.09
Real time    12.2        14.22          9.63        26.33

Persistence by Containment in a VArray (times in seconds)

             Calorimeter Hits           Tracker Hits
             Transient   Persistent     Transient   Persistent
User time    8.66        8.37           9.66        8.89
Real time    10.96       15.87          11.28       14.41

The slightly better user time in the case of "persistence by containment" comes from improved optimisation in the persistent collection class, which is based upon the Objectivity-supplied VArray, whereas the transient case uses a Rogue Wave collection class. Further optimisations to the persistent versions are possible, for example, the performance of the "individual objects" implementation should improve if multiple persistent objects were created at the same time, rather than individually, as shown above.

The small overhead introduced by the database is striking, and can be compared with that incurred by storing ZEBRA objects in an RZ file. In the case of a very simple test, e.g. using a linear chain of 1900 banks, each containing 10 data words, the I/O overhead represents a small factor, i.e. the performance is several times worse in the persistent case, rather than a fractional increase, as is seen to be the case when using an ODBMS for persistence in the GEANT-4 prototype.
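
The relative cost of persistence in user time can be computed directly from the timings quoted above. The sketch below does this for the user times only; the values are as read from the tables, and the dictionary keys are illustrative labels.

```python
# (transient, persistent) user times in seconds, as read from the tables above.
user_times = {
    ("individual objects", "calorimeter"): (7.96, 9.63),
    ("individual objects", "tracker"): (8.80, 13.09),
    ("containment in VArray", "calorimeter"): (8.66, 8.37),
    ("containment in VArray", "tracker"): (9.66, 8.89),
}


def overhead_percent(transient, persistent):
    """Percentage change in time when going from transient to persistent."""
    return 100.0 * (persistent - transient) / transient


overheads = {key: overhead_percent(*pair) for key, pair in user_times.items()}

for (implementation, hits), value in overheads.items():
    print(f"{implementation:22s} {hits:12s} {value:+.1f}%")
```

A negative value indicates that the persistent version used less user time than the transient one, as is the case for persistence by containment.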

  1. AMY

The AMY experiment at Tristan, KEK, originally used a Fortran-based bank system, known as the Tristan Bank System (TBS). DST-level data has been converted from TBS format and stored in an Objectivity/DB database using a variety of different object models, and performance comparisons made of the different approaches as well as with the original TBS-based system.

  1. BaBar

The BaBar collaboration are currently planning to use an ODBMS both for calibration data and also for physics events. An evaluation of two commercial ODBMS products recommends the use of Objectivity/DB, based upon its superior performance and scalability characteristics in a HEP environment. Work is progressing on the design of an ODBMS-based event store.

  1. BELLE

Objectivity/DB is currently being evaluated at KEK for the BELLE collaboration. A system is being built up based on 7 28-node UltraSparc servers with nearly 4TB of disk space and 4 Sony tape robots attached. The system will use the Petaserve MSS from Sony, which is based on the Lachman Open Storage Manager (OSM) that is currently in use at DESY.

  1. ZEUS

The ZEUS experiment have built a prototype event directory based upon Objectivity/DB. The philosophy has been to follow an evolutionary approach - first to reproduce more or less the functionality provided by the existing, ADAMO [21]-based, event directories, but with more flexibility, and then to use an ODBMS for micro-DST-level data, as input to physics analysis. Using a sample of 10^6 events, corresponding to about 100MB of data, the prototype demonstrated about the same performance as the existing, highly-optimised, solution. Work on ODBMS-based event directories continues, hopefully leading to a production system in 1997.

  1. Standards Activities

In the context of RD45, CERN has associate membership of the Object Management Group (OMG) and is a reviewer member of the Object Database Management Group (ODMG). CERN is also represented in the IEEE Computer Society Executive Committee on Mass Storage, which is the body to which the various standards sub-groups report. During the past year, the only significant involvement of CERN has been with the ODMG, although a workshop focussing on the current theory and practice of high-end data management, to be held near CERN, is planned for 1997.

  1. ODMG-related Activities

Version 1.2 of the ODMG-93 standard, finalised in 1995, was published in early 1996. During the past year, the ODMG has concentrated on version 2.0 of the standard, for which CERN helped to set the priorities. This version of the standard should be finalised in the February/March 1997 timeframe, after which work will start on the next release of the standard. The first meeting after finishing V2.0 is scheduled for July 1997, to be organised by CERN.

The bindings defined by the ODMG are intended as portability bindings. That is, an application built on top of one ODMG-compliant database should port without source code changes to another compliant product. In addition to providing application portability, the ODMG have defined an interchange format, permitting data portability between the various conforming products.

In reality, the current standard is insufficient to satisfy all of our requirements - it does not, for example, include schema evolution, distributed databases, replication and so forth - and hence it is inevitable that current prototypes exploit vendor extensions. However, RD45 places strong emphasis on working within the ODMG to ensure that the standard is enhanced to minimise, and perhaps eliminate, the need for reliance on such features. This work will inevitably span numerous updates to the standard and so cannot be considered a short-term goal.

Important new features expected in V2.0 of the standard include the data interchange format mentioned above, access to schema meta-objects and an ORB adaptor.

Post-V2.0 options for the ODMG include merging with the OMG, which would decrease the control that the ODBMS vendors have over the direction of the standard, but would give a corresponding increase in the amount of user participation that would be possible.

  1. Objectivity/DB Workshops

A number of workshops, focusing on Objectivity/DB, were held at CERN over the past year. The first two workshops, held in February and May 1996 respectively, were largely devoted to discussions of initial prototypes and modelling experiences. They were extremely useful in helping us to better understand the current Objectivity/DB product and future enhancements, and in deciding the implications of various implementation choices, such as object granularity.

The final workshop of 1996 focussed on the results of the work in meeting the current LCRB milestones, plans for performance and scalability measurements, the risk analysis described above, and discussions on the requested Objectivity/HPSS interface. In addition, presentations were made on requirements from ATLAS and CMS, prototyping activities in ALICE, AMS, ATLAS, CMS, GEANT-4, NA45 and ZEUS, and discussions of problems encountered in a number of these prototypes.

This workshop was attended by some 20-30 people and provided important feedback on the progress on the LCRB milestones, as well as highlighting a number of areas where product enhancements are required.

These workshops have been attended by consultants and/or architects from Objectivity, and have proved extremely profitable. It is our intention to continue regular workshops as required. For 1997, three workshops are currently planned, two of which will be held at CERN:

  1. March 12-14, focussing on the results of milestone 3,
  2. May, at LBL or SLAC, focussing on the integration of Objectivity/DB and HPSS,
  3. November, focussing on the interim results of the milestones set at the March 97 review.

  1. Objectivity/DB User Meeting

The annual Objectivity/DB Developers' Conference was held in Santa Clara - close to Objectivity's headquarters in Mountain View - on April 26-27 1996. This meeting included sessions on new features of Objectivity/DB, including schema evolution, user data replication, the Java JDBC interface, Objectivity/DB-based Web servers, performance tuning etc., as well as presentations from the user community, such as RD45 and the Sloan Digital Sky Survey (SDSS). This meeting provided ample opportunity to hold discussions with other users of the product, as well as to meet the developers and support staff.

It is our recommendation that CERN participate regularly in these meetings, using them to provide feedback on CERN's (HEP's) requirements.

  1. ODBMS to MSS Coupling

The RD45 collaboration has identified a number of ways that an ODBMS could be coupled to an MSS. The most promising of these are:

  1. Integrating an ODBMS with an MSS at the Filesystem Level

There are currently two investigations of integrating Objectivity/DB at the filesystem level:

  1. Using the Open Storage Manager, from Computer Associates (ex-Lachman/Legent), with Sony tape robots, at KEK, Japan,
  2. Using Unitree, at the Institute of Nuclear Physics in Krakow, Poland.

Although neither of these Mass Storage Systems is under active consideration for production deployment at CERN, these activities offer a useful existence proof of a transparent interface between an ODBMS and MSS.

  1. Integrating Objectivity/DB with HPSS

A course on HPSS was held at CERN during October 1996, and was attended by an engineer from Objectivity. During this course, a powerful new mechanism whereby Objectivity/DB could be interfaced to HPSS was identified.

The Objectivity/DB server - a light-weight page server - uses basic I/O calls such as lseek(), read(), write(). The HPSS client API provides equivalents for these routines, e.g. hpss_read(). It would thus be possible to interface the Objectivity/DB server to HPSS without even making code changes - simply by providing jacket routines to the HPSS library - and relinking the server with the HPSS library. This would have the considerable advantage that client applications would remain the same. HPSS would be responsible for managing the disk space, and also the tertiary storage, and would move entire databases (bit-files) to/from tertiary storage as required.
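
The jacket-routine approach can be illustrated with the sketch below. It is a conceptual model only: the real interface would be written against the C-level I/O calls of the server and the HPSS client library, whereas here the mass store is an in-memory stand-in and the method names `mss_seek`, `mss_read` and `mss_write` are deliberately hypothetical and are not the HPSS API.

```python
import io


class FakeMassStore:
    """In-memory stand-in for a mass storage client library (not HPSS)."""

    def __init__(self):
        self._bitfile = io.BytesIO()

    def mss_seek(self, offset):
        return self._bitfile.seek(offset)

    def mss_read(self, nbytes):
        return self._bitfile.read(nbytes)

    def mss_write(self, data):
        return self._bitfile.write(data)


class Jacket:
    """Presents lseek/read/write to the server and forwards them to the store."""

    def __init__(self, store):
        self._store = store

    def lseek(self, offset):
        return self._store.mss_seek(offset)

    def read(self, nbytes):
        return self._store.mss_read(nbytes)

    def write(self, data):
        return self._store.mss_write(data)


def store_and_fetch_page(io_layer, page_no, page, page_size=8192):
    """Page-server logic written only in terms of lseek, read and write."""
    io_layer.lseek(page_no * page_size)
    io_layer.write(page)
    io_layer.lseek(page_no * page_size)
    return io_layer.read(page_size)


page = bytes([7]) * 8192
fetched = store_and_fetch_page(Jacket(FakeMassStore()), page_no=2, page=page)
```

The point of the design is that `store_and_fetch_page`, which stands for the server code, is unchanged whichever storage system lies behind the jacket.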

A requirement for HPSS support has been raised with Objectivity, and there are plans for a proof-of-concept prototype by the time of CHEP '97, and a full product by the end of 1997.

  1. Conclusions

A loose-coupling between ODBMS and MSS, as described above, offers an extremely simple yet powerful way of extending disk-based object management solutions into the tertiary storage region, as required by LHC experiments and others.

During the coming year, the most promising of these techniques will be investigated further, both at CERN and outside.

  1. Other Database Developments

Many people predicted that the ODBMS market would take off during the past year. Arguably, this did indeed occur, although in a somewhat more modest fashion than foreseen. Two ODBMS companies, ODI and Versant, went public in July 1996 and others are expected to follow.

Although the relational vendors have largely ignored the ODBMS market, all, except IBM, established some relationship with an ODBMS company. ORACLE, which had marketed the Omniscience product as ORACLE-Lite since early 1996, took over the company in November.

Almost without exception, the ODBMS vendors have pre-announced Java bindings and put significant emphasis on the Web. It is predicted that both of these areas will play a significant role in further developments of the ODBMS market.

Activity on various Internet newsgroups, e.g. comp.databases and comp.databases.object, has grown considerably, indicating that many more people are working with, and developing applications on, ODBMSs.

  1. Future Activities

The future activities of the RD45 project are driven almost exclusively by the needs of the LHC experiments. As part of the development of their CTPs, both ATLAS and CMS are developing a list of issues that require further study, which we expect to have a strong influence on the future milestones and activities of the project. Indeed, the proposed milestones, listed below, draw from the milestones from the Computing Model chapters of the CTPs.

In addition, we will continue to work with the GEANT-4 collaboration, the NA45 experiment and other groups both at CERN and outside who are investigating the same or similar technology.

  1. Proposed Milestones for 1997-1998

We propose the following activities to be considered for the milestones for the third year of the RD45 project. These suggestions have been prepared in consultation with ALICE, ATLAS, CMS and NA45.


By mid-1998, a proof-of-concept MSS interface should also be demonstrated. In addition, it is expected that further investigations of the areas covered by the current milestones will be required. Examples include further performance investigations, a study of the feasibility of wide-area data replication, and so on.

  1. Conclusions

We have identified and described the impact of using an ODBMS on physics applications, the potential benefits of ODBMS features such as schema evolution, object versioning and data replication on HEP data management, made an evaluation of the effectiveness of using an ODBMS as input to physics analysis as compared with traditional techniques and made an analysis of the key risks involved in an ODBMS+MSS based physics event store. In addition, we have worked closely with the ATLAS and CMS Computing Model working groups and with other projects, both at CERN and outside, that are using or considering the use of an ODBMS for object persistency.

The use of an ODBMS+MSS for storing and managing event data is currently the baseline assumption of both ATLAS and CMS Computing Technical Proposals [24] [25], pending further investigations into performance and scalability, and is also being considered by a number of pre-LHC experiments at other laboratories. We will continue to work with these groups to identify and investigate the key issues and propose a strategy of gradually scaling from the current 100GB-1TB region to the 100TB region in the years before LHC data.

  1. Glossary

ADAMO - a system, developed in the ALEPH collaboration, based on the Entity-Relationship (ER) model.

ADSM - A storage management product from IBM

AFS - the Andrew (distributed) filesystem

CASE - Computer Aided Software Engineering

CORBA - the Common Object Request Broker Architecture, from the OMG

CORE - Centrally Operated Risc Environment

CWN - Column-wise Ntuple

CTP - Computing Technical Proposal

DFS - the OSF/DCE distributed filesystem, based upon AFS

DMIG - the Data Management Interface Group

EDMS - Engineering Data Management System

GB - 10^9 bytes

HPSS - High Performance Storage System - a high-end mass storage system developed by a consortium consisting of end-user sites and commercial companies

IEEE - the Institute of Electrical and Electronics Engineers

KB - 2^10 (1024) bytes - normally referred to as 10^3 bytes

LCB - LHC Computing Board

LCRB - LHC Computing Review Board

LIGHT - Life Cycle Global Hypertext

MB - 10^6 bytes

MSS - a Mass Storage System

NFS - the Network Filesystem, developed by Sun

ODBMS - an Object Database Management System

ODMG - the Object Database Management Group, a group of database vendors and users that develops standards for ODBMSs

OID - Object Identifier

OMG - the Object Management Group

OQL - the Object Query Language defined by the ODMG

ORB - an Object Request Broker

OSM - Open Storage Manager: a commercial MSS

PAW - the Physics Analysis Workstation

PETASERVE - an MSS based upon OSM

PB - 10^15 bytes

RWN - Row-wise Ntuple

SHORE - Scalable Heterogeneous Object REpository

SQL - Structured Query Language: the language used for issuing queries against databases

SSSWG - the Storage System Standards Working Group

STL - the Standard Template Library: part of the draft C++ standard albeit in a modified form

TB - 10^12 bytes

TOOLS.H++ - the current de-facto standard container/collection class library, now based on the STL

VLDB - Very Large Database

VLM - Very Large Memory

VMLDB - Very Many Large Databases

XBSA - the draft X/Open Backup Services Application Program Interface

  1. References
    1. RD45 - A Persistent Object Manager for HEP, LCRB Status Report, March 1996, CERN/LHCC 96-15
    2. RD45 - A Persistent Object Manager for HEP, LCB Status Report, March 1997, CERN/LHCC 97-6
    3. Object Databases and Mass Storage Systems: The Prognosis, the RD45 collaboration, CERN/LHCC 96-17
    4. Object Databases and their Impact on Storage-Related Aspects of HEP Computing, the RD45 collaboration
    5. Object Database Features and HEP Data Management, the RD45 collaboration
    6. Using an Object Database and Mass Storage System for Physics Analysis, the RD45 collaboration
    7. Where are Object Databases Heading? CERN/RD45/1996/4
    8. Why Objectivity/DB? CERN/RD45/1996/6
    9. Objectivity/DB Database Administration Issues. CERN/RD45/1996/7
    10. Object Data Management. R.G.G. Cattell, Addison Wesley, ISBN 0-201-54748-1
    11. DBMS Needs Assessment for Objects, Barry and Associates (release 3)
    12. The Object-Oriented Database System Manifesto M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases, pages 223-40, Kyoto, Japan, December 1989. Also appears in [17].
    13. Object Oriented Databases: Technology, Applications and Products. Bindu R. Rao, McGraw Hill, ISBN 0-07-051279-5
    14. Object Databases - The Essentials, Mary E. S. Loomis, Addison Wesley, ISBN 0-201-56341-X
    15. An Evaluation of Object-Oriented Database Developments, Frank Manola, GTE Laboratories Incorporated
    16. Modern Database Systems - The Object Model, Interoperability and Beyond, Won Kim, Addison Wesley, ISBN 0-201-59098-0
    17. Objets et Bases de Donnees - le SGBD O2, Michel Adiba, Christine Collet, Hermes, ISBN 2-86601-368-9
    18. Object Management Group. The Common Object Request Broker: Architecture and Specification, Revision 1.1, OMG TC Document 91.12.1, 1991.
    19. Object Management Group. Persistent Object Service Specification, Revision 1.0, OMG Document numbers 94-1-1 and 94-10-7.
    20. The Object Database Standard, ODMG-93, Edited by R.G.G.Cattell, ISBN 1-55860-302-6, Morgan Kaufmann.
    21. ADAMO Reference Manual, CERN ECP
    22. HBOOK - Statistical Analysis and Histogramming Package - CERN Program Library Long Writeup, Y250
    23. PAW - the Physics Analysis Workstation - CERN Program Library Long Writeup, Q121
    24. ATLAS Computing Technical Proposal, CERN/LHCC 96-43
    25. CMS Computing Technical Proposal, CERN/LHCC 96-45
