Introduction to RD45
RD45 is a CERN R&D project established in 1995 to investigate
the question of object persistency for HEP.
A recent presentation describing the goals and status of the project can be found
here.
Status reports and the original project proposal can be found
here.
RD45 - Basic Concepts
-
Persistent objects are those which continue to exist upon
process termination and may then be accessed by other processes.
-
Transient objects are typically created by a process
and cease to exist when that process terminates (or before).
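The distinction can be illustrated with a minimal C++ sketch. It assumes that a plain binary file stands in for a persistent store; the `Histogram` type and the `save` and `load` functions are hypothetical names chosen for illustration and are not part of any RD45 or ODBMS interface.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical example type; not an RD45 or ODBMS class.
// An instance held only in memory is transient: it disappears
// when the process that created it terminates.
struct Histogram {
    std::vector<double> bins;
};

// Write the histogram to a file so that its state outlives the process.
// A raw binary file is only a stand-in for a real persistent store, and
// this layout is not portable across machines with different byte order.
bool save(const Histogram& h, const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::size_t n = h.bins.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(h.bins.data()),
              static_cast<std::streamsize>(n * sizeof(double)));
    return static_cast<bool>(out);
}

// Read the histogram back, possibly from a different process.
bool load(Histogram& h, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::size_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    if (!in) return false;
    h.bins.resize(n);
    in.read(reinterpret_cast<char*>(h.bins.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    return static_cast<bool>(in);
}
```

In this sketch the application must call `save` and `load` explicitly. An object database aims to remove that step, so that persistent objects are used much like transient ones.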
RD45 handles objects of all kinds
-
Histograms
-
Detector calibration and geometry
-
Production control
-
Event data (the main target)
Data volumes and rates
The following table shows the data volume accumulated per unit of time.
It assumes an average event size of 1 MB and an event rate of 100 Hz,
as expected for ATLAS and CMS.
| Time interval                  | Volume of data                   | Equivalent                      |
| 1 second                       | 100 MB                           | 1 linear metre of books         |
| 1 minute                       | 6 GB                             | 1 Exabyte 8500                  |
| 1 hour                         | 360 GB                           | 15000 trees-worth of paper      |
| 1 day                          | 8.6 TB                           | US Library of Congress          |
| 1 week                         | 60 TB                            | The NCAR MSS today              |
| 1 month                        | 260 TB                           | The ECMWF MSS in 2002           |
| 1 year                         | 1 PB (assuming 100 day operation) | 3 years EOS data (2001)        |
| 1 millennium (all experiments) | 5 EB                             | All words ever spoken by humans |
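The arithmetic behind the table can be checked with a short C++ sketch. It uses the event size and rate stated above and assumes decimal units (1 MB = 10^6 bytes), which the text does not specify; the names `kEventSizeBytes`, `kEventRateHz` and `bytes_recorded` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Figures taken from the text: 1 MB per event at 100 Hz.
// Decimal units are an assumption (1 MB = 10^6 bytes).
constexpr std::uint64_t kEventSizeBytes = 1000000ULL;
constexpr std::uint64_t kEventRateHz = 100ULL;

// Bytes recorded during `seconds` of continuous data taking.
std::uint64_t bytes_recorded(std::uint64_t seconds) {
    constexpr std::uint64_t bytes_per_second = kEventSizeBytes * kEventRateHz;
    // Guard against overflow of the 64-bit result.
    assert(seconds <= std::numeric_limits<std::uint64_t>::max() / bytes_per_second);
    return bytes_per_second * seconds;
}
```

For example, `bytes_recorded(86400)` gives 8.64 x 10^12 bytes, which the table rounds to 8.6 TB, and 100 days of operation gives 8.64 x 10^14 bytes, or roughly 1 PB.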
Non-goals
-
Not an investigation of whether OO is good or not
-
Not a comparison of OO programming languages (C++ versus Smalltalk)
Preferred solutions
-
Direct support from C++ (e.g. classes)
-
Class libraries - for containers, iterators
-
Standard Template Library (STL) - low level
-
Tools.h++ - higher level
-
Standard solutions
-
ODMG compliant Object Database (Objectivity, O2, ObjectStore)
-
X/Open compliant Mass Storage System (HPSS, ADSM, OSM)
We want single, global, scalable solutions
(or at least, a single interface)
C++
-
supports OO programming
-
is widely supported by the industry (class libraries, object databases etc.)
-
is interoperable with C (still needed for some time to come)
-
is interoperable with Fortran, perhaps with some pain (ditto, unfortunately...)
Standard Template Library
(by Alex Stepanov and Meng Lee of Hewlett-Packard Labs)
-
a set of easily composable C++ container classes
-
vectors, lists, sets, multisets,
maps, multimaps, stacks, queues and priority queues
-
generic algorithms
(searching, sorting, merging, copying, and transforming)
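As a brief illustration of how containers and generic algorithms compose, the following sketch uses `std::vector`, `std::map`, `std::sort` and `std::binary_search`. It is written in present-day C++ syntax rather than that of 1996 compilers, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sorting: return an ascending copy of the input using std::sort.
std::vector<double> sorted_copy(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Searching: std::binary_search requires a sorted range.
bool contains_sorted(const std::vector<double>& sorted, double x) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::binary_search(sorted.begin(), sorted.end(), x);
}

// Associative container: count how often each label occurs.
std::map<std::string, int> count_labels(const std::vector<std::string>& labels) {
    std::map<std::string, int> counts;
    for (const std::string& label : labels) {
        ++counts[label];  // operator[] inserts a zero count on first use
    }
    return counts;
}
```

The same algorithms work on any container that provides suitable iterators, which is what makes the library easy to compose.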
Tools.h++
(from Rogue Wave Software, Inc.)
-
More user-friendly, more complete, and safer than the STL
Features in addition to STL:
-
IOstream facility
-
templatized string class and a class for representing complex numbers
-
class for numeric limits
-
memory management
-
exception handling
Object Database Management System (ODBMS) requirements
-
Standards compliance (ODMG)
-
Scalability (100GB-1TB per database file)
-
Heterogeneity
-
WAN support (distribution, replication, caching, recovery etc.)
-
Schema evolution
-
Many others ...
Object Database Management Group (ODMG)
The ODMG is a consortium of ODBMS vendors
and interested parties working on standards to allow portability
of customer software across ODBMS products.
Some of the ODMG standards
-
Object Definition Language (ODL) - a programming-language-independent
mechanism to express user object models
-
Object Query Language (OQL) - for interactive
and programmatic queries (an extension of SQL)
-
C++ Binding
-
Smalltalk Binding
-
Java Binding
ODMG compliant Object Databases
-
Objectivity - installed at CERN - preferred solution
-
Technical reasons: stringent requirements, including scalability,
robustness, performance, replication, versioning (schema evolution),
federated database, etc.
-
Practical reasons: good support, flexible company, responsive
to our requests etc.
-
Versant
-
O2 - French company (member state), persistence by reachability
-
ObjectStore - single-level store (the database is mapped into virtual
memory using very low-level OS routines, which makes debugging difficult)
-
...
Mass Storage System (MSS) requirements
Scalability issues:
-
Total size: multi-PB region
-
Bitfile size: 1TB (requires 64-bit filesystems; 300GB files exist today)
-
Data rates: 100 MB/second/stream (may be implemented using parallelism)
-
Overall bandwidth: multi-GB/second (low??)
Obvious choices include
-
HPSS - CERN considering joining HPSS collaboration - preferred solution
-
ADSM - installed at CERN
-
OSM - installed at CERN, but its use will stop soon
Very Many Large Databases
-
Split databases at the 100GB level (a new database every 20 minutes)
-
1995: 10-30 GB databases are possible
-
Fits well with expected tape capacity (one DB per volume)
-
1995: NTP 10 GB, Redwood 50GB
-
2004: 500GB-1TB, 50-100MB/sec
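The split interval and the resulting number of databases follow from simple division, sketched below. The sketch assumes the 100 MB/second rate from the data volumes table and decimal units; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Whole seconds needed to fill one database at a sustained write rate.
std::uint64_t seconds_to_fill(std::uint64_t db_bytes, std::uint64_t bytes_per_second) {
    assert(bytes_per_second > 0);
    return db_bytes / bytes_per_second;
}

// Number of databases needed to hold `total_bytes`, rounded up.
std::uint64_t databases_needed(std::uint64_t total_bytes, std::uint64_t db_bytes) {
    assert(db_bytes > 0);
    return (total_bytes + db_bytes - 1) / db_bytes;
}
```

At 100 MB/second a 100 GB database fills in 1000 seconds, or about 17 minutes, which is consistent with the 20 minute figure above. A 100 day run at that rate (864 TB) would need 8640 such databases.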
The Physicists' dream
Give me all events that will let me find the
Higgs and win a Nobel prize
We believe we can build such a system using
-
standard, off the shelf ODBMS technology
-
standard, off the shelf MSS technology
-
Provide transparent, (sub-)event level access
using standard language and database features
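As a rough illustration of selecting events with standard language features, the sketch below filters an in-memory collection with `std::copy_if`. The `Event` fields and the `select_events` function are hypothetical; in a real system the collection would be held in the database rather than in a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical event summary; the field names are illustrative only.
struct Event {
    int id;
    int n_leptons;
    double invariant_mass_gev;
};

// Return the events that satisfy a user-supplied selection predicate.
template <typename Predicate>
std::vector<Event> select_events(const std::vector<Event>& events, Predicate keep) {
    std::vector<Event> selected;
    std::copy_if(events.begin(), events.end(), std::back_inserter(selected), keep);
    return selected;
}
```

A caller supplies the physics selection as a predicate, for example a lambda that keeps events with four leptons and an invariant mass inside a chosen window.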

Last update: 3rd April 1996 by
Jamie.Shiers@cern.ch,
Pavel.Binko@cern.ch