Object Databases and Petabyte Storage – Dreams or Reality?

 

Dirk Düllmann, Jamie Shiers

CERN

1211 Geneva 23

Switzerland

Abstract

The European Laboratory for Particle Physics (CERN) is located on the border between France and Switzerland, just outside Geneva. Much of the on-going activity of the laboratory is focussed on a new accelerator – the Large Hadron Collider (LHC) – that is currently under construction. Scheduled to enter operation in 2005, experiments at this facility will generate enormous amounts of data. Over an estimated 20 year running period, some 100PB – 10¹⁷ bytes – of data will be acquired at rates ranging from 100MB/s to 1.5GB/s. A number of research and development projects have been initiated to find solutions to the many challenges that the LHC will pose. Amongst these, the RD45 project has focussed on the problems of providing persistent storage for these vast quantities of data. Starting in 1995, this project has concentrated exclusively on object-oriented solutions and object database management systems (ODBMS) in particular. In this paper, we describe the criteria by which we chose this technology, issues related to product selection and our experience in using ODBMSs in production. We discuss the risks involved with the current strategy and outline future directions for the project.

Introduction

Although CERN – like many scientific institutes – has traditionally used Fortran as a programming language, the High Energy Physics (HEP) community is moving rapidly in the direction of object-oriented languages – mainly C++ and Java. Along with the well-known benefits of object orientation, these languages offer significant advantages over Fortran (77) in their support for data structures and memory management. However, they do not provide solutions for the storage and manipulation of massive volumes of data in a distributed, heterogeneous, multi-user environment. In the "Fortran era", HEP tended to develop its own systems to deal with such problems, resulting in a significant development and maintenance burden. Could commercial, "standard" solutions, for example those based on the Object Data(base) Management Group's specifications, be a viable way of tackling these problems? It was with this background that the RD45 project was born in early 1995: with the goal of finding – or developing – solutions to the data management problems of the LHC experiments.

In the HEP world, "experiment" is used rather loosely to describe the large, international collaborations that design, build and operate a particle detector at a facility like CERN, plus the various physics studies that are performed with this equipment. There will be four experiments at the LHC, each consisting of well over a thousand physicists from some hundreds of institutes in tens of countries. Each of these experiments will need to store of the order of 1PB – 1000TB – of new data per year. The rate at which data will be acquired ranges from around 100MB/s to 1.5GB/s. However, the total data rate that must be supported, including data processing, analysis, and import and export to collaborating institutes, will be significantly higher than this.

This paper is intended to be an account of our experiences – both successes and failures – in attempting to meet these challenges. As such, it necessarily includes both positive and negative results. However, it should be stressed that the overall message concerning the technology and specific products mentioned is indeed positive – even if multi-PB databases do not actually exist today, they appear to be within our grasp.

Methodology

As stated above, the RD45 [1] project was established to investigate the problem of providing object persistency to the event data of the LHC experiments. A baseline assumption of the project was that these experiments would adopt object-oriented approaches and C++ as the implementation language. Consequently, there was a need to provide support for persistent objects in a fully distributed, heterogeneous environment. Potential solutions obviously needed to be able to scale sufficiently so as to be able to handle the vast volumes of data that were expected. Although initially only C++ was considered, Java is now also of interest, bringing the need for support for language heterogeneity. In addition, it was felt to be highly desirable that a consistent interface be offered, regardless of the nature of the objects stored.

In evaluating different approaches to object data management, we were strongly influenced by Cattell’s book on the subject [2]. Essentially, all of the different strategies described and many of the specific products or implementations covered in this book were evaluated – some on paper, some by the development of prototypes.

Why an ODBMS?

After investigating various alternatives, it became evident that a number of our requirements could only be met by full object database management systems – as defined by the "Object Database Manifesto" [3] – and perhaps not even by these. Many of the "simpler" alternatives, such as language extensions or light-weight storage managers, lacked essential functionality – support for heterogeneity, for example. In addition, it was clear that our needs in terms of scalability went far beyond what had actually been achieved by any system at the time. The largest object databases then discussed were a few GB in size – approximately the volume of data that a single experiment would create every minute and at least one million times smaller than what we would store every year. As a result of these arguments, we did not expect to find a concrete solution to all of our needs off the shelf. It was rather our intention to first evaluate the technology with a view to understanding if there were one or more products that could eventually be extended sufficiently as to satisfy our requirements.

With this in mind, we attempted to contact a number of ODBMS vendors. Here, we encountered our first surprise. Having expected a great deal of interest in our project, we were disappointed that some vendors simply did not react – in one case, it took three years of persistent badgering to get a reply! Of those that did reply, a number seemed interested only in discussing the amount of money that might be involved, rather than in addressing the technical issues.

Which ODBMS?

Before describing our initial selection of an ODBMS, it should be stressed that we were not making a "final" choice for the production phase of the LHC. Rather, we were intending to make a "light-weight" selection of the most promising products for the R&D phase, with the clear intention to perform a much more complete evaluation closer to the startup of the LHC. It is now foreseen that such a choice will be made around the end of 2001, following extensive testing in the first half of that year. The requirements and other metrics by which such a choice will be made will be established during 2000, in conjunction with the experiments and the main external sites that will be involved in LHC computing.

Nevertheless, we followed the procedure described in Barry's "Object Database Handbook" [5] in making this preliminary choice. As such, we attempted to identify our key requirements and match the products available at that time against them. Not surprisingly, our most important need was scalability. An architecture that – on paper at least – offered the possibility of scaling to many TB and perhaps even PB was considered extremely important. However, other factors, such as having a flexible partner that was willing to consider enhancements and able to provide good support, were also felt to be of high priority.

Initially, we contacted all known ODBMS vendors, inviting them to consider our project and explain how their product could help. Given the nature of the problem – surely unique at the time – we expected significant interest. In fact, some vendors simply did not respond. In these days of the World Wide Web, it is sometimes surprising how little the name "CERN" appears to be known, even in technical markets. However, the fact that these database vendors appeared uninterested in a database challenge was even more of a mystery.

In order to further our understanding of the technology, we invited a number of companies to present their products at CERN. Presentations of the O2, ObjectStore and Objectivity/DB databases were made, and on-site training was held on both O2 and Objectivity/DB, the plan being that both of these systems would be used for further studies. However, primarily due to a lack of manpower, it was decided to concentrate our investigations on a single product, with the clear understanding that a decision on a solution for the production phase of the LHC would occur much later. As stated above, this is currently foreseen for the end of 2001.

As a European research institute, CERN actively supports and promotes European products. However, it was clear that the architecture of Objectivity/DB – described in more detail below – made it considerably more attractive as a basis on which to build a multi-PB database. In addition, the company (at that time) offered excellent support to CERN and appeared to be extremely interested in meeting our requirements. Therefore, the bulk of the work since that time has focussed on this tool, although we continue to watch and evaluate alternatives. In summary, Objectivity/DB was selected for the initial R&D phase as a result of its uniquely scalable architecture and due to the enthusiasm of the company for our project. None of the other products surveyed (including O2, ObjectStore and POET) appeared to be sufficiently scalable as to meet our needs. The only possible alternative that has since been identified is Versant, which – although built upon a very different architecture – offers similar scalability characteristics.

Of course, other projects may well have different priorities and hence massive scalability may be of less importance. However, for CERN it is clearly key.

The Case for Standards

There are strong motivations for the use of standards, from both customer and vendor points of view. Indeed, feeling that the lack of standards was one of the reasons why the technology was not taking off, a group of ODBMS vendors founded the Object Database (now Data) Management Group (ODMG). This body has produced a number of revisions of its "standard" (it is not, in fact, a de jure standard: it is published as a book and has not been ratified by any official standards organisation), which include a number of language bindings. The ODMG set itself a number of ambitious goals: in theory, it should be possible to port an application from one compliant product to another simply by recompiling. There is even a standard data exchange format, so that the data can be moved too. Unfortunately, conformance to the standard is far from complete – no two implementations are fully conformant – so the goal expressed above is moot. In addition, it is unclear whether the standard – which by design does not attempt to address implementation issues – is sufficiently complete to be usable for any realistic application. Nevertheless, particularly in the case of long-lived projects such as ours, the need for standards remains and we continue to push for vendor compliance.

Indeed, the ODMG principle that the application developer should see the database as a seamless extension of the programming language, rather than as two distinct environments, is felt to be an important and attractive feature of today's ODBMS products:

"The ODMG [ …] binding is based on one fundamental principle: the programmer should perceive the binding as a single language for expressing both database and programming operations, not two separate languages with arbitrary boundaries between them." [4]

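As a concrete, if simplified, illustration of this principle, the sketch below expresses persistence in the style of the ODMG C++ binding. The Event class is hypothetical, the header name is assumed, and the exact form of the persistent new-expression varies between products (some bindings also require a type-name string); the point is simply that no second, separate data-manipulation language appears.

// Illustrative sketch only: ODMG-style persistence, expressed entirely in C++.
// The Event class and the header name are assumptions; the precise overloads
// of the persistent new-expression differ between products.
#include <odmg.h>                 // assumed name for the vendor's ODMG binding header

class Event : public d_Object {   // persistence-capable classes derive from d_Object
public:
    Event(int run, int number) : runNumber(run), eventNumber(number) {}
    int runNumber;
    int eventNumber;
};

void storeOneEvent(d_Database& db)
{
    d_Transaction txn;
    txn.begin();

    // Creating a persistent object looks just like creating a transient one;
    // the database argument to new() requests persistent allocation.
    d_Ref<Event> evt = new(&db) Event(42, 1001);

    txn.commit();                 // on commit, the object and its state persist
}
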
Objectivity/DB Architecture

As described above, the initial choice of Objectivity/DB as primary candidate for the R&D phase was based upon its scalability characteristics, as well as the enthusiastic support of the company.

The storage hierarchy of Objectivity/DB is based upon several layers: pages, containers, databases and a federation. Objectivity/DB uses a 64-bit object identifier (OID), which is divided into four 16-bit fields.

An Objectivity/DB "federated database", or federation – in fact a distributed database with a centralised schema and database catalogue – can contain up to 65536 databases. Each database maps to a file on a specified host; a federation may therefore be distributed over an entire wide-area network. The limits at each level of the storage hierarchy are as follows:

Field               Limit
Database page       64KB maximum
Container           64K logical database pages
Database (file)     32K containers
Federation          64K databases

Thus, the maximum size of a federation is 64KB × 64K × 32K × 64K, or 2⁶³ bytes. In fact, even larger federations would be possible in the case of "large objects" – those larger than a single physical page. Each large object is stored on a separate logical page, so that truly humongous federations – well in excess of 10,000 PB – could, in principle, be created.
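
These limits multiply out to the figure quoted above; the following self-contained sketch simply verifies the arithmetic. The interpretation of the four 16-bit OID fields as database, container, page and slot numbers is the commonly quoted one and is given here for illustration only.

// Sketch of the 64-bit object identifier and the resulting maximum federation
// size. The field names reflect the commonly quoted database/container/page/
// slot interpretation, which is an assumption here rather than vendor fact.
#include <cstdint>
#include <iostream>

struct OID {
    std::uint16_t database;   // up to 64K databases per federation
    std::uint16_t container;  // up to 32K containers per database (15 bits used)
    std::uint16_t page;       // up to 64K logical pages per container
    std::uint16_t slot;       // object slot within a page
};                            // sizeof(OID) == 8 bytes, i.e. a 64-bit OID

int main()
{
    constexpr std::uint64_t pageSize     = 1ULL << 16;  // 64KB maximum page
    constexpr std::uint64_t pagesPerCont = 1ULL << 16;  // 64K pages per container
    constexpr std::uint64_t contPerDb    = 1ULL << 15;  // 32K containers per database
    constexpr std::uint64_t dbPerFed     = 1ULL << 16;  // 64K databases per federation

    constexpr std::uint64_t maxFederation =
        pageSize * pagesPerCont * contPerDb * dbPerFed;  // 2^63 bytes, about 9,200 PB

    std::cout << "Maximum federation size: " << maxFederation << " bytes\n";
    return 0;
}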

However, practical considerations, such as the maximum database size (filesize), apply. Even assuming 64-bit filesystems, files in the TB region are not likely to be practical. It is our assumption that databases should be of a "manageable" size: that is, it should be possible to store a complete database on a single disk or tape volume, transfer it over the network or to/from tape in 10²–10³ seconds, and so forth. Such practical constraints result in a maximum recommended filesize of perhaps 10GB – maybe more by the time of the LHC. As a result, the current limits imposed by Objectivity's architecture are too restrictive for our needs. The maximum "practical" size of a federation is currently some 65TB (almost certainly rather less), whereas our requirements call for federations perhaps 3 orders of magnitude larger! This was clearly an area where enhancements would be required and it is consequently treated in more detail below.

A drawback of the rather physical OID of Objectivity/DB is that an object cannot typically be moved – e.g. as a result of re-clustering – without its OID changing. As is well known, efficient clustering is important for performance reasons, particularly in systems such as Objectivity/DB that implement a page-server architecture. Re-clustering may be required if the initial clustering turns out to be sub-optimal – perhaps due to a change in access patterns, or simply because the predominant patterns were not well known at the time that the data was stored.

The access patterns that will occur when the LHC data is analysed are not yet understood. However, it is clear that more than one pattern will occur and that re-clustering will be required to maintain good performance. It is expected that an important access method will be via "named collections". That is, users will select a set of the data that is interesting to them by name – e.g. "Higgs candidates", and then iterate through this collection, performing further selection. Although, using bi-directional associations, it would be possible to automatically update these collections as a result of re-clustering, it is unlikely to be practical to maintain such associations between the data and all "user-defined" collections. Hence, a mechanism for maintaining at least some degree of transparency to re-clustering is required.
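
The expected style of access can be sketched as follows; all of the types and names are purely illustrative. The difficulty is that such collections hold references to objects and are therefore affected if re-clustering changes the underlying OIDs.

// Sketch of access via a "named collection": the user looks up a collection
// of event references by name (e.g. "Higgs candidates") and iterates over
// it, applying a further selection. All types and names are hypothetical.
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct EventRef { std::uint64_t oid; };   // stand-in for a persistent reference

using EventCollection = std::vector<EventRef>;

// Registry of user-defined, named collections.
std::map<std::string, EventCollection> namedCollections;

void analyse(const std::string& collectionName,
             const std::function<bool(const EventRef&)>& select,
             const std::function<void(const EventRef&)>& process)
{
    for (const EventRef& evt : namedCollections.at(collectionName)) {
        if (select(evt)) {    // further, user-defined selection
            process(evt);     // e.g. dereference the object and fill histograms
        }
    }
}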

Prototypes

Initial prototypes with Objectivity/DB were encouraging and are discussed in considerably more detail in the numerous RD45 reports that are available through the Web [1]. In brief, these prototypes allowed us to confirm that the data definition language (DDL) of Objectivity/DB was sufficient to describe HEP data models (although it is not strictly ODMG compliant) and that the performance of the system was comparable to that of "legacy" in-house systems. In short, we were able to confirm that this technology was worthy of further investigation.
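
For orientation, the fragment below gives a flavour of the kind of data model concerned; it is greatly simplified and entirely hypothetical, real experiment models being far richer and expressed in the persistent DDL rather than in plain C++.

// Highly simplified sketch of a HEP event data model of the kind the DDL must
// be able to describe; all class and member names are hypothetical.
#include <cstdint>
#include <vector>

struct Track {                       // one reconstructed charged particle
    double px, py, pz;               // momentum components
    double chi2;                     // fit quality
};

struct Event {                       // one beam-crossing's worth of data
    std::uint32_t runNumber;
    std::uint32_t eventNumber;
    std::vector<Track> tracks;       // in the persistent DDL, a variable-size
                                     // array or association would be used here
};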

Not surprisingly, an area that needed study was that of scalability. Based on the architecture described above, we tested every explicit limit and searched for other, arbitrary restrictions in the product. For example, we tested the creation of the maximum number of databases, the limits on the maximum sizes of pages, containers and the databases themselves, object sizes, cache issues and so forth. These tests uncovered a small number of bugs – most, but not all, of which have since been fixed – but also confirmed that the product scales in practice as it does on paper. However, there were clearly a number of issues that needed to be addressed if the product were to be used in the PB region.

Detailed results of these tests, including the individual scalability measurements, can be found via the RD45 Web page.

In general, these results suggest that federations as large as 1PB are feasible – say 50,000 databases of 20GB each. Multi-PB federations do not look realistic unless a number of the enhancement requests discussed below are satisfied.

Enhancement Requests

Essentially, the bulk of our enhancement requests can be grouped under one heading – VLDB support. As described above, we needed extensions to permit larger federations than were – and still are – possible. In addition, under the assumption that multi-PB disk farms are not affordable, manageable, or perhaps both, we felt that it was necessary to provide an interface between Objectivity/DB and a Mass Storage System – a hardware and software system supporting essentially unlimited storage.

Objectivity/DB supports a "fat-client / thin-server" model. The server is essentially an I/O server – performing I/O operations on databases (files) for one or more remote clients. As such, it knows nothing about objects, and simply handles page I/O to files. The Mass Storage System (MSS) currently in use at CERN is called HPSS [7], and was developed by a consortium of industry, government and end-user sites, primarily in the US. It offers a number of interfaces, including (parallel) ftp, NFS and a POSIX-like filesystem interface. Given the nature of the Objectivity/DB server, a convenient interface between the two systems appeared to be via the latter – simply replacing standard I/O calls with their HPSS equivalents. A proof-of-concept prototype, based on this approach, was developed and made available at the time of SuperComputing '97 – an important internal deadline. Unfortunately, the performance of this prototype was poor and led to the need for a different approach. In brief, HPSS is optimised for the transfer of very large data volumes – far in excess of those typically involved in database I/O. Thus, it was necessary to rethink the interface and allow both components – the database and the MSS – to operate optimally. The solution that has been adopted is to copy complete database files from the MSS to a local Unix file system at file-open time, if they are not already in the "file cache". Free space is managed by a daemon that moves unused databases to tape based on an LRU algorithm. This approach has been shown to deliver good performance, and it is hoped that it will lead to a production version of the interface in the immediate future [8].
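
The essence of the adopted approach can be summarised by the following sketch. The helper and path names are illustrative assumptions; the actual implementation is described in [8].

// Sketch of the adopted staging approach: at file-open time, a database that
// is not already in the local disk cache is copied in its entirety from the
// MSS; a separate daemon later frees space by migrating the least-recently-
// used databases back to tape. All helper and path names are illustrative.
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

const fs::path cacheDir = "/objy/cache";   // hypothetical local Unix file cache

// Stand-in for the MSS transfer step; in reality the HPSS interfaces
// (e.g. parallel ftp) would be used to move the complete file.
void copyFromMSS(const std::string& mssName, const fs::path& localPath)
{
    (void)mssName;
    (void)localPath;
    // ... transfer the whole database file at the MSS's optimal rate ...
}

// Called by the server when asked to open a database file.
fs::path openDatabase(const std::string& mssName)
{
    const fs::path local = cacheDir / fs::path(mssName).filename();
    if (!fs::exists(local)) {
        copyFromMSS(mssName, local);          // whole-file staging lets both the
    }                                         // MSS and the ODBMS operate at their
                                              // own optimal transfer granularity
    fs::last_write_time(local, fs::file_time_type::clock::now());  // touch for LRU
    return local;   // the server now performs normal page I/O on the local copy
}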

In its current form, Objectivity/DB uses one single-threaded server per host. As some operations – such as opening a database that is tape-resident – may take a long time, the existing RPC mechanism is liable to time out. Although it is possible to increase this time-out globally, such an approach is clearly far from optimal and will result in poor overall performance, as all clients will be blocked by a single request for a tape-resident object. Assuming a likely 10%/90% split between disk and tape storage capacity, the overall system is likely to spend the bulk of its time waiting for tape mounts.

Two strategies are being adopted to address this issue: firstly, a multi-threaded implementation of the server is being developed. Secondly, the client-server protocol will be extended to allow the server to ask clients to retry after a specified period, useful if a lengthy operation is encountered. Again, production versions of these features are eagerly awaited.
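
The intended client-side behaviour can be sketched as follows. The message and field names are assumptions, the actual protocol extension being internal to the product.

// Sketch of a client-side retry loop for a "retry after N seconds" reply,
// allowing the server to defer lengthy operations (e.g. staging a
// tape-resident database) without blocking other clients. Names are hypothetical.
#include <chrono>
#include <thread>

enum class ReplyStatus { Ok, RetryLater, Error };

struct Reply {
    ReplyStatus status;
    int retryAfterSeconds;      // meaningful only when status == RetryLater
};

// Stub stand-in for issuing an open request to the database server.
Reply requestOpen(const char* /*databaseName*/)
{
    return {ReplyStatus::Ok, 0};   // real implementation: RPC to the server
}

bool openWithRetry(const char* databaseName)
{
    for (;;) {
        const Reply r = requestOpen(databaseName);
        switch (r.status) {
        case ReplyStatus::Ok:
            return true;                        // file is staged and open
        case ReplyStatus::RetryLater:           // server is e.g. waiting for a
            std::this_thread::sleep_for(        // tape mount; come back later
                std::chrono::seconds(r.retryAfterSeconds));
            break;
        case ReplyStatus::Error:
            return false;
        }
    }
}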

An additional architectural change that needed to be addressed, and indeed the last that is covered in this paper, is that of the federation size. As described above, practical limitations on the filesize lead to a need for more files per federation. This could be implemented in a variety of ways, including via an extended OID or via changes to the OID mapping. The approach that is currently being followed is the latter – a future release of Objectivity/DB should permit containers to map to files, essentially increasing the maximum number of files per federation from 2¹⁶ to 2³². Keeping a 10GB limit on the filesize, such a change should be sufficient to permit federations in excess of 100PB and hence satisfy our needs in terms of storage capacity. However, given the experience with the proof-of-concept HPSS interface, we feel that the initial implementation will need to be carefully evaluated before drawing any such conclusion.

All of the above features, as well as support for the Linux operating system, were scheduled for delivery by the end of 1998. None are currently available. If this situation is not remedied in the very near future, it will inevitably have a highly negative impact on our ability to make production deployment of this product and will force us to consider alternatives.

Successes

Although the primary focus of our project has been R&D towards the needs of the LHC experiments, there is clearly much to be learnt from early real-life use of any potential solutions. As such, it is important to note that the proposed strategy – primarily Objectivity/DB, but more recently both Objectivity/DB and HPSS – has been used for a number of production purposes, both at CERN and at other HEP institutes, for a number of years. Whilst the databases produced have been rather modest in size – in the 10-100GB region – tests have also been made of databases up to 1TB, of write rates up to 150MB/s and of some 200 concurrent readers. These results confirm the overall strategy and represent significant progress towards the needs of the LHC. Further multi-TB databases – both test and production – will be created this year, leading to databases of several hundred TB around 2000. Thus, on a technical basis, we believe that things look rather positive, although there are many additional areas that need to be studied. It may be that changes to the product could successfully address these concerns. However, the time to market for changes has proven to be significant: the enhancements for an interface to an MSS, as yet unreleased in a production version, have been in development for over two years and under discussion for at least one more. Given that our requirements are seen as somewhat specific, it is unclear whether we can expect any further such changes. Therefore, our working assumption must be that we develop techniques to solve these issues using the product as it exists today.

Problem Areas

Issues that are not yet satisfactorily addressed include those related to deploying a distributed database in a multi-user, multi-site environment. In our field, there is no clear distinction between "user" and "developer". Furthermore, there is not a defined point at which development ceases and production begins. All users need the ability to define their own schema and associated data, and the data needs to be accessible at all participating institutes around the world. Neither of these issues is satisfactorily addressed in the current version of Objectivity/DB: a federation is a tightly-coupled system that does not map well to the loosely-coupled, heterogeneous environment that exists in HEP. As an example, the current version of Objectivity/DB requires that all participating nodes in a federation be accessible for certain operations to be performed, such as catalogue updates or schema changes. This requirement effectively rules out the use of a single, collaboration-wide federation. It may be possible to deploy a single federation across a small number of well-managed, tightly-coupled sites, but the need for data import/export between multiple federations and the multi-user issues described above remain unresolved.

In addition, the continued delays to V5.2 of Objectivity/DB, in which the main enhancements listed above are scheduled to appear, are a significant cause for concern.

Release Issues

Related to our specific choice of ODBMS, there have been a number of issues concerning releases of Objectivity/DB and other commercial packages. For some time now, it has been hard to obtain realistic release information from the company: release schedules – if announced – are rarely kept. Release features – that is, the new and changed features in a given release – also remain a mystery. This situation, which has persisted throughout the last two major releases of the product, is an area of considerable concern. In addition – as will often be the case – we need to obtain versions of Objectivity/DB and other packages for a compatible set of operating systems and compilers. Finally, as the schedule at CERN is driven by that of the laboratory's accelerators, we need to make a consistent release of all commercial products, and of the HEP-specific packages based thereon, early in the year. Once the accelerators start, typically in early spring, experiments normally freeze their software environment until the end of the year. Thus, if we are unable to provide a consistent release early enough for the experiments to migrate to it in time for the start of data-taking, we essentially miss an entire year. Although the details may differ from other disciplines, the constraints tend to be largely the same: realistic schedules are fundamental to success in production deployment. Many software companies take a rather relaxed attitude to delivery dates, pointing the finger at other, typically much larger, firms that also have a poor track record in this respect. To the customer, this is not acceptable – do not tolerate it: link payment to delivery schedules. Whilst this might not guarantee on-time delivery, it certainly provides "motivation".

Object Databases and Mass Storage Systems: the Prognosis

A few years ago, there were a number of predictions that ODBMSs would experience growth similar to that experienced by RDBMSs in the 1980s and perhaps even become the dominant technology. Estimates of the ODBMS market have been consistently over-optimistic. Despite the fact that object technology has been increasingly adopted and used together with database systems, the market penetration of ODBMSs remains negligible. This is no doubt due, at least in part, to the maturity of the RDBMS market – and the relative immaturity of the ODBMS one. Nowadays, few companies can exist without the fairly widespread use of one or more RDBMSs. Object databases, on the other hand, face an uphill struggle and need to identify precisely where and when they provide advantages over more conventional solutions. The traditional, somewhat overused, arguments of complex data models, performance and scalability might all be valid, but have demonstrably failed to convince a sufficiently large number of users to enable the technology to take off. An area where it seems that RDBMS technology really is insufficient is that of very (very) large databases, such as those that we are attempting to construct at CERN. Disregarding the usual arguments of "impedance mismatch", relational technology does not today seem applicable to the multi-PB region. This is not felt to be a technical issue – at least one (very) large database vendor has told us that they have no intention of addressing such large volumes of storage until there is sufficient commercial demand. Thus, whilst we can be reasonably sure that this will happen at some stage during the lifetime of the LHC (2005 – 2025), it will probably not happen in time for the decision on an LHC data management system around the end of 2001.

Similarly, the market for high-end Mass Storage Systems seems to be confined to a handful of research and government laboratories around the world. Indeed, given current trends, disks may well render tapes (or equivalent) totally redundant for data storage (as opposed to backup – for distribution, tape has long since lost out to CDs). It is not the purpose of this paper to speculate on the future of storage in general, but the perceived lack of a market for high-end storage and the associated software is inevitably a matter of concern that must be addressed when evaluating the overall risks of our current strategy.

The Need for a Fall-Back Solution

The current baseline assumption of the LHC experiments is that a combination of a commercial ODBMS and MSS will be used as the foundation on which a data management solution is built. Unfortunately, as described above, the market for these systems is small and may be insufficient to sustain even a single product capable of satisfying our key requirements. In addition, given the extremely long timescales involved – in excess of 20 years – the survival of any given system or vendor cannot be guaranteed. Thus, we need to consider alternatives – including, once again, home-grown solutions – and be prepared for a migration between systems, which we must assume will take place at least once during the running period of the LHC. In some senses, what is needed is probably only a stop-gap solution. Whereas multi-PB problems are rare today, it is likely that the needs of industry will outpace those of HEP in this area, as they have already done in many others. Thus, the large database giants will eventually turn their gaze to this market and produce sufficiently scalable solutions. However, this cannot be assumed on our timescale, and hence work on a fall-back solution is necessary.

Previously, our fall-back strategy was based upon Versant – an alternative ODBMS with a somewhat different architecture to Objectivity/DB (e.g. thin-client, fat-server), but with similar on-paper scalability. As with Objectivity/DB, we studied its architectural limits and scalability, which in some cases demonstrated remarkably similar characteristics. Unfortunately, Versant's financial situation does not appear to be encouraging, and hence we no longer consider it to be a viable alternative.

Thus, it now appears that no existing system – commercial or otherwise – other than those already identified offers sufficient scalability to be considered a serious potential solution. In addition, even were such a candidate identified, it is highly likely that enhancements would need to be made – as has been the case with Objectivity/DB. Given the long lead times in obtaining such enhancements in a released product, the chance of obtaining an alternative solution to our needs in this way is considered very low.

Faced with such a dilemma, the inescapable conclusion is that we need to build a system ourselves – or at least undertake sufficient feasibility studies as to convince the experiments that such a system could be built on an appropriate timescale, if required.

Manpower Requirements

"Conventional wisdom" suggests that some 150 man-years are required to bring an ODBMS product to market and perhaps the same again – or even more – to deliver a mature product. This figure was consistently quoted to us at the start of our project by a number of ODBMS vendors.

The O2 database system, for example, was the result of the Altaïr project – a 5-year effort that began in 1986. This project is described in "Building an Object-Oriented Database System – The Story of O2" [6]. At the end of the project, a first evaluation system was available on Sun systems only. A total of 39 people are acknowledged in this project, although it is unlikely that they all participated full-time for its complete duration. Assuming 50% efficiency, we arrive at 100 man-years – consistent with the above estimate. In addition, it is clear that a significant amount of additional work was required to bring the product to market.

To the best of our knowledge, work on Objectivity/DB started slightly later – around 1988. Unfortunately, there is no equivalent book documenting the history of the product. However, it is estimated that some 80 – 100 man-years of engineering effort had been invested by the time that CERN acquired a license (for V3.5) in 1995. In addition, a realistic estimate must include a considerable amount of additional effort to cover documentation, training and marketing activities.

We feel that these estimates, whilst valid for products developed in the late 1980s and early 1990s, are rather pessimistic. Today's C++ environment is considerably more mature than was the case some ten years ago. The availability of libraries such as the STL, the standard interface defined by the ODMG and the existence proof provided by the numerous ODBMSs that have been developed all contribute to the manpower savings that a project initiated today would enjoy. It is our conclusion that 50 man-years would be more than sufficient for such a project – markedly less if a specific community were targeted. For example, an SQL interface and an ODBC driver are of little concern to our users and could safely be dropped. It is too early to make detailed manpower estimates, but we believe that it would indeed be possible for such a system to be developed with the limited resources available in our field.

Global Requirements

The preparation of a detailed list of requirements, leading up to a choice of system for the production phase, is scheduled for 2000. However, it is clear that some of the key elements of the system can already be identified. For example, given our needs in terms of scalability, it seems inevitable that support for distributed databases with consistent schema is required. In addition, heterogeneity must also be catered for, even if, today, the number of platforms used in our community appears to be dwindling. (PC hardware is predominantly used for all except high-end fileservers.) Support for typical production use cases must clearly be provided. In addition, a mechanism whereby all users can have their own private data and schema with somewhat looser coupling to the main shared data than is possible today seems highly desirable.

Similarly, we need to identify the features of ODBMSs that have been particularly beneficial and those that we can safely ignore – at least initially. For example, do we need to implement object versioning in the system itself? Could such a capability be adequately handled by the provision of an add-on class library?
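
As an indication of what the add-on approach might look like, the following is a minimal, entirely hypothetical sketch of object versioning provided as a class library layered on top of an existing persistent reference type, rather than as a core database feature.

// Hypothetical sketch of object versioning as an add-on class library: a
// handle object keeps the chain of successive versions and hands out the
// latest one by default. Ref stands for any persistent reference type.
#include <cstddef>
#include <vector>

template <typename Ref>
class VersionedHandle {
public:
    void addVersion(const Ref& r) { versions_.push_back(r); }

    // Precondition: at least one version has been added.
    const Ref& latest() const { return versions_.back(); }

    const Ref& version(std::size_t n) const { return versions_.at(n); }

    std::size_t versionCount() const { return versions_.size(); }

private:
    std::vector<Ref> versions_;      // oldest first; in a real library this
                                     // container would itself be persistent
};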

A HEP ODBMS Test-bed

A potential first step towards the development of a fully-fledged alternative is an ODBMS test-bed. Such a test-bed is felt to be needed for numerous reasons, not least to obtain rapid feedback on proposed designs.

For example, the ODBMS/MSS interface will have taken several years from first thoughts to final release. As described above, the proof-of-concept prototype suffered from poor performance and hence a new design and implementation was required. Such an approach is not only wasteful of effort but can also result in unacceptably long delivery times. By using a simple "ODBMS emulator", we should be able to obtain rapid feedback on potential design problems, without the overhead of waiting for a pre-release of a commercial product.
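
As a purely hypothetical sketch, the core of such an emulator might be little more than an abstract page-store interface, against which alternative back-ends – a plain Unix file, an MSS-staged file, or an in-memory store for tests – can be exercised without any change to the client code or the measurement harness.

// Hypothetical core interface for a small "ODBMS emulator": just enough of a
// page-server abstraction to prototype, for example, an MSS interface or a new
// file-to-container mapping without waiting for a commercial pre-release.
#include <cstdint>
#include <string>
#include <vector>

using Page = std::vector<std::uint8_t>;          // fixed-size page of bytes

class PageStore {                                // abstract storage back-end
public:
    virtual ~PageStore() = default;
    virtual void open(const std::string& fileName) = 0;
    virtual void close() = 0;
    virtual Page readPage(std::uint32_t pageNumber) = 0;
    virtual void writePage(std::uint32_t pageNumber, const Page& page) = 0;
};

// Alternative back-ends implement the same interface, so client code and
// measurements are unchanged when the storage layer is swapped.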

Production Choices

As stated above, the production phase of the LHC is scheduled to start in 2005. The various software solutions that will be deployed at this time will be chosen somewhat earlier, to permit integration and deployment. As such, it is currently foreseen that a choice of database vendor – assuming a commercial solution is adopted – will be made in 2001. It is understood that a choice will be from production systems that exist at that time, although a replacement system cannot be ruled out, and should indeed be assumed, over the lifetime of the LHC. Given that the systems currently under study – Objectivity/DB and HPSS – have already been used for demonstrations at a scale similar to that of the LHC, it is likely that the acceptance criteria will be somewhat stringent. For example, one can imagine scalability tests up to at least 10TB, if not 100TB, I/O rates of 100MB/s and support for hundreds of concurrent clients – both readers and writers. In addition, areas that are currently less well understood, such as multi-user, multi-site support, will need to be adequately handled.

Conclusions

We have described our attempts to build a powerful data management system out of industry-standard components, such as commercial object databases and mass storage systems. The system, which has been used in production over a number of years, has already met a number of the very challenging requirements of the LHC experiments. Scalability to hundreds of clients, data rates in excess of 150MB/s and the storage of over 1TB of data have been demonstrated in practice. Tests of the architectural limits of Objectivity/DB lead us to believe that multi-PB databases are very close to being possible today and are likely to become feasible in the near future. Extending the successes described in this paper to the level of the LHC would appear to be well within our grasp. Unfortunately, the ODBMS market as a whole remains small, and hence the risks associated with this strategy lead us to conclude that at least one alternative solution must be identified or, if necessary, developed. The considerable experience of object database technology that has been built up over the past five years will be key to the successful development of such alternatives.

Acknowledgements

The work described in this paper is the result of the effort of numerous people over several years, both within the RD45 project and outside. In particular, the interface between Objectivity/DB and HPSS was implemented by Objectivity and Andy Hanushevsky, SLAC, together with Marcin Nowak, CERN. The HPSS system at CERN is run by members of the CERN IT/PDP group, who are also responsible for running the production Objectivity/DB servers. Finally, we owe much to the experiments at CERN and outside that have entrusted their data to the system described above.

References

  1. RD45 - A Persistent Object Manager for HEP – see http://wwwinfo.cern.ch/.
  2. Object Data Management, R.G.G. Cattell, Addison Wesley, ISBN 0-201-54748-1.
  3. The Object Database Manifesto – see http://www.cs.cmu.edu/People/clamen/OODBMS/Manifesto/index.html.
  4. The Object Data Management Group (ODMG), see http://www.odmg.org/ and "The Object Database Standard, ODMG-93", edited by R.G.G. Cattell, ISBN 1-55860-302-6, Morgan Kaufmann.
  5. The Object Database Handbook, Douglas K. Barry, ISBN 0-471-14718-4.
  6. Building an Object-Oriented Database System – The Story of O2, F. Bancilhon, C. Delobel and P. Kanellakis (eds.), Morgan Kaufmann, ISBN 1-55860-169-4.
  7. High Performance Storage System (HPSS) – see http://www.sdsc.edu/hpss/.
  8. Coupling a Mass Storage System With a Database for Scalable High Performance, Andrew Hanushevsky (SLAC), Marcin Nowak (CERN). In proceedings of the 16th IEEE Symposium on Mass Storage Systems.