Oracle Activity for the SAM Database Servers

(Document Number: PPDG-14)


The intent of this web page is to provide some statistical data on the SAM database servers. It will initially focus on the level of Oracle activity generated by these servers.

The original request for this type of information came from Koen Holtman as a part of the HEP Particle Physics Data Grid (PPDG) project initiatives.  It is hopefully useful in estimating access and transaction rate for metadata within data grid systems.  If this data is maintained on an on-going basis, it could be useful in the monitoring of server capacity and bottlenecks and if trended over time, it might be useful in identifying if/how data needs vary (or don’t vary) over the lifetime of an experiment.

The page provides:

·        some background on sam

·        the physical architecture of SAM to understand where the statistics are derived from,

·        a description of the statistics being collected and

·        some observations  regarding the data collected along with daily and hourly data queries and bar charts.

Background

The DZero (D0) experiment's data and job management system software, SAM (Sequential data Access via Metadata), is currently being used to provide a worldwide system of shareable computing and storage resources. It is currently providing file storage and retrieval for Monte Carlo simulations and detector data for reconstruction and analysis programs.

The SAM system relies heavily on locally deployed servers, called stations, with the following responsibilities:

·        Storage and retrieval of data files to and from the available mass storage systems.

·        Management of data stored on cache disk over which it has exclusive control.

·        Submitting to the underlying batch systems and monitoring the user processes which request data files from the system

The SAM system can be accessed by several mechanism which, in many cases, utilize these SAM stations:

·        UNIX command line interfaces - files can be accessed or stored via these commands. Examples of these would be 'sam translate constraints...' and 'sam store' commands.

·        WEB based interfaces - provides general information about the number of files, size and number of events based on user specified criteria.

·        Python APIs

·        C++ interfaces providing access through a standard D0 framework package

More detailed information in terms of architecture and functionality can be found in the following:

SAM Overview and Operation at the D0 Experiment (MS/Word format)

SAM and the Particle Physics Data Grid (pdf format)

SAM Home Web Page

The SAM system has been used for all of the D0 data management beginning with Monte Carlo data in 1999 and extending through the current on-line acquisition of detector data since March 2001. Nearly all the Monte Carlo data is produced at sites other than Fermilab and the files and metadata are sent to FNAL through the use of remote SAM stations.

top

Sam's Physical Architecture

The purpose of this section is to provide a brief explanation of the physical architecture of SAM in order to understand how the database statistics where gathered.

SAM is designed as a 3-tier architecture:

·        Oracle DBMS server,

·        CORBA compliant application level database servers

·        Client software including the SAM stations

The middle tier, application level database servers, provides the only access to all data maintained in the Oracle DBMS. All SAM client functions access the Oracle DBMS via the middle tier.  This 3 tier physical architecture provides flexibility in managing and balancing resources and demands on the Oracle DBMS. There are currently 5 database servers handling the access to and from Oracle.

    prd

    Access via the SAM station managers for both Monte Carlo and
    D0 detector data. These can be local or remotely deployed stations.

    dlsam_prd

    Online storage to SAM of D0 detector data.

    user_prd

    Access via SAM user commands entered from the UNIX command
    line for both Monte Carlo and D0 detector data
    (e.g. - 'sam translate constraints....' commands ).

    web_prd

    Access via web based applications for general queries of both
    Monte Carlo and D0 detector data.

    farm_prd

    Access of D0 detector data for reconstruction to a network-distributed
    system of processing nodes referred to as the farms.

 

Each database server generates a log file identifying when a particular method is invoked and also identifies each call made to the Oracle DBMS.

top

Database Server Statistics

When the initial request for this data was received, the first place we looked was into the Oracle DBMS.   The only data available in Oracle is some average value based on periodic samplings that are taken.   The term ‘some average value’ is used because it has been difficult to determine just what the value means even after several explanations from Oracle.  Hopefully, based on the data derived here, we can validate the numbers Oracle maintains.   There is also no means within Oracle of identifying activity for specific applications.  The D0 Oracle database contains tables for SAM, several types of  calibration data, runs summary information and other application data.  It would not have possible to distinguish between any of these within Oracle

So, the database server logs files were chosen for this reason.  A python program was written to parse these logs files and from all observations it appears that all Oracle commands are included in these logs. Tallies of the number of Oracle database rows updated by the inserts, updates and deletes is not available in the log files. While number of rows retrieved on queries (selects) is available, it was not captured at this time. If it is considered important, it can be made available.

The data was collected beginning 1/24/02, providing daily and hourly summaries of the following information:

    Methods

    # of times a method was called

    Selects

    # of times an Oracle select statement was issued

    Updates

    # of times an Oracle update statement was issued

    Inserts

    # of times an Oracle insert statement was issued

    Deletes

    # of times an Oracle delete statement was issued

 

This data is exclusively SAM’s.  Based on how the log files are currently being populated, it appears to be all-inclusive. Initially we considered separating application level activity from monitoring activity and, at this point, have identified 3 types:

·        DBMS monitoring (the 1st tier)

There are many queries made daily against the Oracles database that are used to determine the state of the database for performance and maintenance purposes.  These numbers will not be reflected in the statistics collected.

·        Application level database server monitoring (the middle tier)

There are several methods built in the database servers to determine the state of the servers, such as, ‘ping’ or a ‘hello’ type method that are invoked by the clients to determine if the server is alive.  From reviewing the log files, it appears that none of these methods send any statements to the Oracle DBMS.  So there is no monitoring activity of this type in the statistics.

·        Client level monitoring (the 3rd tier)

In this very complex architecture of SAM stations, there may be data maintained in the Oracle RDMS that allows the stations to maintain their current state for both recovery purposes and for inter-station communication.  Monitoring at this level is not yet deployed..

The data is currently being stored in Oracle tables and has detailed information down to the individual methods invoked. Hourly and Daily summary tables were then created to facilitate access. The detail data will be deleted once any additional summary tables are identified and populated.

top

Observations

There are 2 means of viewing the data:

·        Data Queries - provides visibility to the summarized daily and hourly tallies for data collected since 1/24/02

·        Daily and Hourly Charts - provides averages and a graphical display of the activity for February

Approximately 90% of the Oracle activity generated by SAM is in the form of data queries (selects).  The deletes (0.2%) represent primarily maintenance of data file and cache locations.

From charts showing the activity based on the database servers, we can identify the relative level of activity associated with online data taking (.8%) and with reconstruction on the farms (18.4%) as these servers are used exclusively for these respective purposes.  For the other servers, user_prd (14.0%), web_prd (3.6%) and prd (63.2%), we cannot easily identify the specific activity they are involved in.

To provide some context in terms of the type of data being written or accessed via the SAM system, we performed a cursory analysis of the metadata within SAM looking at the number of files created new in SAM for the month of February 2002 and the number of files being accessed in analysis projects during the same month.  The intent is to determine the relative level of activity associated with Monte Carlo data, Raw data and Reconstructed data.   We used the metadata tag called Data Tier (the definition can be found in the Sam Data Model)  to produce this chart showing SAM Usage Based on Data Tier Categorization.  At the bottom of the chart is tab called usageDetails that shows the categorization we applied to each data tier.

Monte Carlo files account for 45% of the files created within the SAM system, but only represent 0.3% of the files used by analysis projects.   As mentioned earlier, a large amount of Monte Carlo data from around the world is sent to and stored at Fermilab.  However, the reconstruction of that data is performed elsewhere.

Raw Data (10.2% created / 22.6% used) and Reconstructed Data (25.7% created / 35.7% used) are both created and analyzed at Fermilab through SAM   There is a rather significant number of files in the Other category (19.1% created / 41.5% used) that is mainly physics analysis performed on the highest level of reconstructed data (root-tuple). 

 

top


Questions / comments?  Contact Gabriele Garzoglio or John Weigand