Management of
multidimensional Aggregates for
efficient Online Analytical Processing

Jens Albrecht, Andreas Bauer, Oliver Deyerling, Holger Guenzel, Wolfgang Hümmer, Wolfgang Lehner, Lutz Schlesinger, Oliver Deyerling

Abstract

Proper management of multidimensional aggregates is a fundamental prerequisite for efficient OLAP. The experimental OLAP server CubeStar , which concepts are described in this paper, was designed exactly for that purpose. All logical query processing is based solely on a specific algebra for multidimensional data. However, a relational database system is used for the physical storage of the data. Therefore, in popular terms CubeStar can be classified as a ROLAP system. In comparison to commercially available systems, CubeStar is superior in two aspects: First, the implemented multidimensional data model allows more adequate modeling of hierarchical dimensions, because properties which apply only to certain dimensional elements can be modeled context-sensitively. This fact is reflected by an extended star schema on the relational side. Second, CubeStar supports multidimensional query optimization by caching multidimensional aggregates. Since summary tables are not created in advance but as needed, hot spots can be adequately represented. The dynamic and partition-oriented caching method allows cost reductions of up to 60% with space requirements of less than 10% of the size of the fact table.

Introduction

The area of statistical analysis of empiric or business data has been boosted significantly in recent years by the concepts of data warehousing and online analytical processing (OLAP, See CoCS93 ). This paper focuses on the concepts of the experimental OLAP server CubeStar , which differs essentially in two aspects from commercial systems and research prototypes. First, the used multidimensional data model provides a more very flexible modeling of classification hierarchies resulting in an extended star schema on the relational side. Second, the system implements a dynamic and adaptive aggregate cache which allows much more efficient multidimensional query processing.

The structure of the paper is as follows: The next section gives a short introduction to the multidimensional data model and the query language. Section three outlines the fundamental ideas of the dynamic and partition-oriented aggregation cache. The fourth section explains the relational representation of the multidimensional structures and queries. The general architecture of CubeStar is outlined in section five. The last section illustrates the performance potential of the aggregate cache by some simulation results. The paper concludes with a summary and an outlook of the current research activities.

Data Model and Query Language

A structural deficiency of many multidimensional modeling approaches like See AgGS97 , See CaTo97 or See LiWa96 is the missing distinction of classifying and characterizing information. Consider the following SQL statement for the statistical table in figure 1 assuming a two-dimensional scenario:

SELECT Product_Family, Product_Brand,
Shop_Region, Shop_Type,
SUM(Sales)
FROM ...
WHERE Product_Group = `Video' AND
Shop_Country = `Germany'
GROUP BY Product_Family, Product_Brand,
Shop_Region, Shop_Type

Both the relational mapping and the multidimensional equivalent know only grouping attributes resulting in an actually four-dimensional output table, respectively data cube. However, like the prefixes of the attributes demonstrate, the example is about a two dimensional problem where Product_Family is a classification attribute ( CA ) in a hierarchical classification structure of the product dimension (figure 2). The attribute Brand however is not part of the classification but serves as a property attribute for an additional characterization of single items. These in turn - generally called dimensional elements - are instances of the primary attribute ( PA=CA0 ) of a dimension, which functionally determines all other attributes.

At the instance level, each class of dimensional elements inherits some property attributes from its parent class where others are specific to that node ( See SmSm77 ). For example, only Video products show a property attribute VideoSystem . A property VideoSystem for the product group Washers , which is also element of the product dimension, would result in a structural schema defect ( See LeAW98 ). Other multidimensional data models are not able to express this fact. The modeling paradigm of CubeStar distinguishes between classification attributes (figure 2, upper part) and property attributes (figure 2, lower part). The classification hierarchy defines the overall framework for the analysis, specific property attributes are used for detailed evaluations.

Multidimensional Objects

The implementation of a multidimensional query processor requires a formal notion for multidimensional data cubes. Like relations in the relational model, such a formalism should be able to express descriptive queries and stored data as well as intermediate results. The basic data structure of CubeStar is the multidimensional object ( See Lehn98 ).

A multidimensional object describes a partition of a possibly aggregated multidimensional data cube. The example of figure 1, the sum of the Sales of Video recorders in Germany for each (product) family and (shop) region further detailed by Brand and ShopType may be specified as a multidimensional object in the following manner:

M  = ( [ Sales, SUM, FLOW ],
( Product.Group = `Video',  Shop.Country = `Germany' ),
[ (Product.Family, Shop.Region), { Brand, ShopType } ] )

Formally a multidimensional object (MO) is defined as a triple ( M, S, G ) . The measure M  =  [ N, O, A ] consists of a name N (e.g. Sales ), the applied operation O { NONE, SUM, AVG, COUNT, MIN, MAX, ... } and the aggregation type A { FLOW, STOCK, VALUE_PER_UNIT } specifying the permitted aggregation operations for that measure ( See LeSh97 ). The second component S is a tuple of one classification node per dimension describing scope of the represented data partition ( (Product.Group = `Video', Shop.Country = `Germany') ). The restriction to single classification nodes instead of arbitrary predicates results in simple conditions for the derivability of MOs ( See The Patch Working Principle ). The last component of a MO describes the granularity G = [ L, P ] , consisting of a tuple of hierarchy levels L ( (Product.Family, Shop.Region) ) and a set of property attributes P ( {Brand, ShopType} ).

Operations on Multidimensional Objects

The application of typical interactive OLAP operators on a multidimensional object results in a simple manipulation of its components. The classification oriented operators `drill-down' and `roll-up' result in a change of the hierarchy level part of the granularity. The equivalent of the operators for property attributes are `split' and 'merge' which add, respectively remove, a property attribute. `Slice' and `unslice' operations change the scope of the MO. For example, the data cube from figure 1 can be obtained from the total sum of all sales by a "drill-down" to Product.Family and Shop.Country , a "split" on Brand and ShopType and a following "slice" to Shop.Country='Germany' and Product.Group='Video' .

Besides these navigational operators, operations like explicit aggregations (e.g. AVG(G')MO or binary operations (  MOTurnover = MOSales*MOPrice  ) are defined ( See Lehn98 ).

Query Language

Descriptive queries can be specified in the Cube Query Language (CQL, See BaLe97 ). The query for See Statistical table would be expressed in CQL as follows:

SELECT SUM(Sales)
FROM Product, Shop
WHERE Product.Group = `Video',
Shop.Country = `Germany'
UPTO Product.Family, Shop.Region
SPLIT BY Product.Brand, Shop.ShopType

Despite the (intended) similarity, CQL shows some substantial differences to SQL. On the one hand the FROM -clause contains no relations, but a set of dimensions spanning the actual analysis context. Queries are internally represented by multidimensional objects. The translation of such a simple query into a MO is straightforward. Note that the distinction of classification and property attributes is represented on the syntactical level by the UPTO and the SPLIT-BY clause.

But CQL has much more expressive power than the previous example suggests. Nested statements and arbitrary restrictions on the dimensional structures can easily be formulated. For example, the computation of the percental sales of each article based on its product group (range-to-total) can be expressed in the following manner:

SELECT Num  / Denum * 100 AS PercentalSales
FROM Products P, Shops S, Time T
WHERE P.Group = ´Video´,
S.Region = ´South´,
T.Month = ´07/99´
WITH (SELECT SUM(Sales) AS Num
 UPTO P.Article, S.Region, Time.Month),
(SELECT SUM(Sales) AS Denum
 UPTO P.Family, S.Region, Time.Month)

Therefore, in general a query is represented by a tree. Its leaves represent multidimensional objects which can be computed by a simple aggregation, the inner nodes represent binary operators, like the division and the multiplication in the example above, or complex restrictions predicates.

Processing Multidimensional Aggregates

In the application domain of OLAP/data warehousing an adequate and very common optimization method is the use of summary tables thus pre-aggregating the enormous quantity of data. One way to do so, is the a-priori-approach, where aggregates are precalculated for all possible aggregation levels, so that queries may easily be answered during run-time through a simple lookup operation. Some commercial products use this strategy ( See Cogn98 , See Hype98 ). However, since the number of possible aggregates grows exponentially with the number of grouping attributes, this method is not feasible in application scenarios where there are 20 or more property attributes describing a product or a customer.

The only alternative is to select some of the most beneficial aggregates for materialization. Some queries then may be derived from the aggregates where others need to access the raw data. The problem is to select those aggregates which give the highest overall speed-up by allocating the least amount of disk space. Based on the relational data model See Gupt97 and See YaKL97 proposed algorithms to select views in a data warehouse. Of fundamental importance in the multidimensional context was the work See HaRU96 where a greedy algorithm based on an aggregation lattice was presented. The work was later extended by See GHRU97 and See BaPT97 .

The fundamental lack of this technique is the missing support of analysis hot spots. The algorithm only investigates aggregates at different granularities but does not include the scope. For example it is not possible to pre-aggregate only the sales for the actual month; each aggregate contains data for all months.

Another central criticism of this method is that it does not adapt to the dynamics of the interactive query behavior ( See ThSe97 ). The selected set of aggregates is statically materialized after each data warehouse refresh and does not adapt the actual query set during the analysis phase. In order to overcome this problem, See ScSV96 presented a rather restrictive data warehouse cache based on the relational data model. Recently, See DRSN98 implemented a more flexible caching method for partitions of multidimensional aggregates. They address the same problem as the CubeStar approach. In contrast to CubeStar , their solution is based on aggregate partitions of a fixed size, like buffer pages.

The strategy of CubeStar is to store precalculated queries for reuse in an aggregate cache. The typical draw back of query caching is that a new query must be contained in the old one in order to utilize it. In CubeStar a patch working algorithm is used to combine cached multidimensional objects. The result of this algorithm in combination with a sophisticated cache strategy are substantial performance gains even for very small cache sizes ( See Experimental Results ). For a detailed description of the algorithms see See LeAl99 .

The Patch Working Principle

If a multidimensional query is to be answered by cached aggregates which only overlap but do not contain the query, a composition method (patch working) is necessary. Figure See Patch-Working illustrates the fundamental idea. Several MOs or fragments of MOs are selected (dark gray shaded areas) in order to answer the query. The goal of the patch working algorithm is to compose those fragments of multidimensional objects in the cache which allow the computation of the query at the least cost.

The restriction of convex scopes for multidimensional objects allows a very simple algorithm to determine the set of candidates to answer the query. One condition a candidate must fulfill is that its granularity must be smaller or equal than the granularity of the query. The other condition is that the query and the candidate are overlapping. Since the scope of a MO is defined by a tuple of classification nodes, this problem can be solved by simply checking intersections of nodes in the respective hierarchy trees.

The problem of selecting the candidates with the least cost, however, is in general NP complete and can be reduced to the 0/1 knapsack problem. In CubeStar a greedy algorithm is used to find an approximate solution. The algorithm successively chooses for each part of the query which is not yet covered that multidimensional object with the least (estimated) access cost per resulting data cell. If the cost function to access a fragment of a MO is monotonous 3 , the problem degenerates to the fractional knapsack and the greedy algorithm yields the optimal solution.

The Replacement Strategy

The fundamental part of a caching method is the replacement strategy which in this case is necessary to select multidimensional objects for removal from the cache if the cache is full and new aggregates are to be added. Due to non-uniform MO-sizes a set of several smaller MOs might be removed simultaneously in order to admit one large MO to the cache.

The knowledge about the data schema and specific properties of the data model in the OLAP application domain allows a much more sophisticated estimation of the benefit of a cached object as it is possible in traditional database systems. A simple LRU strategy can not make use of the available information about the dependencies and relationships between multidimensional objects. Therefore, the replacement strategy used for dynamic aggregate caching involves several factors some of which are strongly using the knowledge about the underlying data schema.

In the experiments ( See Experimental Results ) the influence of the following parameters for aggregate replacement is investigated:

The overall benefit per unit space B M ( MQ , C ) of a multidimensional object M for a cache configuration C after query MQ is a linear combination of these factors 5 divided by the size of M :

In the experiments, a, b, g, d are used to enable or disable these factors for the investigation of their specific influences. Only those MOs with the highest benefit are kept in the cache.

Relational Mapping of Multidimensional Structures

Of fundamental importance for the structured and efficient relational processing of multidimensional operators is a suitable relational representation of multidimensional objects. Today, most relational data warehouses are based on the star schema ( See Sample Star Schema ), where the data cube containing the measures is represented by a central fact table. Its primary key is a composition of foreign keys to the according dimension tables. The dimension tables contain - in a denormalized fashion - the classification structures as well as the property attributes for each dimension.

The use of a single dimension table for each dimension turns out to be problematic in the context of class-specific property attributes since different classes, like Camcorder and Washers , have also different schemata with regard to their properties (e.g. attributes VideoSystem and MaxLoad ). In order to cope with these problems, the global dimension tables of the traditional star schema are replaced (basically partitioned) by class-specific dimension tables (scope tables), each of which having a possibly different schema.

The following example illustrates this method for the product dimension. Foundation for the hierarchy are the base classes, i.e. the classes of the lowest level in the hierarchy like Camcorder and HomeVCR , with their specific property schemata:

Products_Camcorder ( ArticleId, Brand, VideoSystem, AccuType)
TR-75 Sony HI8 Li
TS-78 Sony HI8 NiCd
A200 JVC N8 NiCd

Products_HomeVCR  ( ArticleId, Brand, VideoSystem, RemoteControl)
V-201 JVC VHS Yes
ClassicI Grundig VHS-C No

All classes at higher levels are then recursively defined as views to the lower classes. The names of the child classes as well as all common attributes are propagated to the parent. For the product group Video the resulting scope table (view) is defined as follows:

create view Products_Video (ArticleId, Brand, VideoSystem, Family) as
select ArticleId, Brand, VideoSystem, ` Camcorder '
from Products_Camcorder
union all
select ArticleId, Brand, VideoSystem, ` HomeVCR '
from Products_HomeVCR

Products_Video    ( ArticleId, Brand, VideoSystem, Family)
TR-75 Sony HI8 Camcorder
TS-78 Sony HI8 Camcorder
A200 JVC N8 Camcorder
V-201 JVC VHS HomeVCR
ClassicI Grundig VHS-C HomeVCR

Hence, the information for each class node of a dimension is stored in a class-specific relation or view, containing all dimensional elements below that class node with information about their parents in the classification hierarchy and the subset of property attributes visible at that node ( See LeTW97 ).

This construction has a positive side effect. Since the scope of a MO is defined by a classification node ( See Multidimensional Objects ) which is represented by a specific scope table, the SQL queries for the computation of a multidimensional object do not need any restrictions beside the join conditions. The following example illustrates this approach. Suppose for the computation of the multidimensional object M from section 2, the patch worker picks up the following aggregates from the cache:

M 1 = ( [ Sales, FLOW, NONE ],
(Products.Group = `Video', Shops.Country = `Germany'),
[ (Products.ArticleId, Shops.ShopId), {} ] ) // raw data

M 2 = ( [ Sales, FLOW, SUM ],
(Products.Group = `Video', Shops.Region = `South'),
[ (Products.Family, Shops.State), { Brand, ShopType} ] )

M 3 = ( [ Sales, FLOW, SUM ],
(Products.Family = `Camcorder', Shops.Country = `Germany'),
[ (Produkte.Family, Shops.Region), { Brand
, VideoSystem , ShopType } ] )

In order to compose M based on these MOs CubeStar generates the following SQL query:

CREATE TABLE M AS
SELECT Family, Region, Brand, ShopType, SUM(Sales) AS Sales
FROM M 1, Products_HomeVCR P, Shops_North S
WHERE M 1.ArticleId = P.ArticleId AND M 1.ShopId = S.ShopId
GROUP BY Family, Region, Brand, ShopType
UNION ALL
SELECT Family, Region, Brand, ShopType, SUM(Sales) AS Sales
FROM M 2,
(SELECT DISTINCT(State, Region) FROM Shops_Germany) S
WHERE M 2.State = S.State
GROUP BY Family, S.Region, Brand, ShopType
UNION ALL
SELECT Family, Region, Brand, ShopType, Sales
FROM M 3
WHERE Family = `HomeVCR'

The example clearly shows that in the context of multidimensional objects all restrictions are implicitly applied by joins with the according scope tables. The view mechanism selects with the scope tables Products_HomeVCR and Shops_North in the case of MO M 1 only those elements which are requested in the query avoiding to scan a possibly large dimension table (like customer). In the case of using the cached aggregate MO M 2 , the summary data must be further aggregated from the state to the regional level. Therefore, the according scope table containing only the State-Region relationships must be dynamically generated by a SELECT DISTINCT statement applied to scope table Shops_Germany . No aggregation is necessary for the MO M 3 , because its granularity is already at the requested level. Moreover no join with a scope table is needed, because the selection attribute corresponds to the group-by attribute.

The Architecture of CubeStar

Like typical ROLAP-systems, the CubeStar prototype is designed according to a three-tier-architecture. CubeStar consists of a client tier, the multidimensional Cube­Star server and a relational backend (figure 5).

All query processing and aggregate management is based on the data model introduced in See Multidimensional Objects , i.e. queries as well as aggregates are internally represented by multidimensional objects. The task of the CubeStar server is twofold: On the one hand, based on the patch-working algorithm a set of cached MOs to accelerate the execution of the query must be determined and reflected in an SQL statement. On the other hand, these MOs must be created and maintained by the server as well. Therefore, the CubeStar server itself has a three-layer architecture consisting of the query processor, the aggregate cache management and the partition directory service.

Another component which is used by all three layers is the classification information service. Basically, the classification information service is an API providing easy access to the dimensions and classification hierarchies. Calls to the classification directory request for example the children or properties of a certain class node or relationships between class nodes in the hierarchy.

The task of the multidimensional query optimizer is to find for each leaf of the tree of a set of physically existing MOs. All relevant informations about materialized multidimensional objects are stored and managed by the partition directory. Based on the decision of the patch-worker, in the first phase, an SQL query is generated and executed for each leaf of the query tree. The intermediate result, a newly materialized MO, is added to a temporary storage area. In a second phase, an SQL statement is generated recursively for the whole query tree in a bottom up manner. The intermediate results for inner nodes are not materialized, because the operators applied during the computation of a nested query may become very complex. Thus, the compatibility checks necessary to determine the reusability of such an aggregate are very complicated. The final result, however, is materialized and asynchronously delivered to the client.

After the complete computation of the query, each of the newly materialized MOs for the leaves is handed over to the aggregate cache. Based on the replacement strategy ( See The Replacement Strategy ) one or more cached MOs might be chosen for replacement. Depending on the outcome of the benefit computation, either the new multidimensional object or one of those in the cache might be deleted in order to satisfy the space constraint.

Experimental Results

In order to simulate a typical OLAP query behavior we implemented a query generator, which constructs a new query in modifying the previous one based on the decision path in See Query generation . Each alternative can be parametrized with a certain probability. Since it is very hard to get actual query statistics from real world OLAP applications, we experimented with different parametrizations and got very good results for most of the query sequences. For simplicity, all simulations in this chapter are based on query sequences generated with the annotated probabilities.

The scenario consists of three dimensions with a four-, five- and six-level hierarchy, respectively. Each class at the base level of the hierarchies had between 10 and 15 property attributes. With a sparsity of 3% the data cube consisted of approximately 250.000 tuples.

The relational database management system used for the physical storage of the multidimensional objects was Oracle Enterprise Server 8.0.4. on a DEC Alpha 3000/800 server (1 CPU, 192 MByte RAM). The average query execution time for queries which could not utilize the cache was between 10 and 20 seconds. Therefore, the time for multidimensional processing can be neglected.

The following figures show the cost saving ratio (CSR; See LeAl99 , See ScSV96 , See DRSN98 ), i.e. the relative savings resulting from accessing the cache instead of raw data, for a set of simulations. The values of the abscissa are determined by the step counter.

See Development of the cost saving ratio (CSR) for 1.000 simulated single user queries; Comparison of the the dynamic aggregate and static preaggregation compares the dynamic aggregate cache as explained in See Processing Multidimensional Aggregates to the static preaggregation strategy from See HaRU96 . The different graphs in each diagram show the cost saving ratio for a number of different cache sizes. With the dynamic caching method cost reductions of more than 50% are possible with a cache size of only 10.000 tuples ( See Development of the cost saving ratio (CSR) for 1.000 simulated single user queries; Comparison of the the dynamic aggregate and static preaggregation a). Note, that this is only about 4% of the fact table size, i.e. much less than a simple B-Tree index on the fact table would require. With a cache size of 25.000 tuples the speed up can be increased to about 60%. Adding more space does not yield any benefit, since there are always queries which can not be answered using the cache.

Static versus Dynamic Preaggregation

See Development of the cost saving ratio (CSR) for 1.000 simulated single user queries; Comparison of the the dynamic aggregate and static preaggregation b shows the cost savings if a fix set of aggregates was computed a priori using the algorithm proposed in See HaRU96 . Although the time to compute the aggregates was not even taken into account, the cost saving ratio is much lower. An additional space of already 25.000 tuples yields only cost savings of about 5%. This is because each aggregate contains the full scope of the data cube. Therefore, only aggregates which are summarized to very high aggregation levels are smaller than 10.000 tuples. For cost savings comparable to the dynamic method an additional space for summary tables of more than 200% of the size of the fact table would be necessary. Note that especially if there are many independent grouping attributes, like the 10 or 15 property attributes for each dimension, static algorithms are not able to select a proper set of summary tables because of the exponentially growing number of possibilities.

Influence of the Cache Parameters

In See Development of the cost saving ratio (CSR); Influence of the cache parameters and multiuser simulation a the influence of each of the cache parameters on the cost saving ratio was investigated separately. The cache size was 50.000 tuples. It turns out that the reference density D M and the degree of relationship R M had a more positive effect than the reconstruction cost C M and the particularly bad absolute benefit A M , which is the only parameter being absolutely independent of the query behavior and current cache content. The best result, cost savings of over 60%, were obtained using the combination of R M and D M . Simulations with other query sets confirmed that only these two parameters yields the highest cost savings.

Influence of Multiple Users

See Development of the cost saving ratio (CSR); Influence of the cache parameters and multiuser simulation b shows the cost saving ratio in a multi-user simulation with five independent users working with the same cache. Each user issued about 300 queries. The parameter configuration was the same as in See Development of the cost saving ratio (CSR) for 1.000 simulated single user queries; Comparison of the the dynamic aggregate and static preaggregation a. The cost saving ratio is only about 10% less than in the single user simulation. The reason for the high cost savings of the small cache even for multiple users is twofold. First, each user needs only a very small cache. Second, some queries may access aggregates in the cache which were requested by another user.

Summary and Conclusion

This article introduces the general concepts of the multidimensional OLAP server CubeStar . Both, the multidimensional data model and the dynamic management and composition of aggregates based on the patch-working algorithm are an extension of previously published work. It was demonstrated that an adaptive strategy for preaggregation in general yields much better results than static algorithms. It was shown that performance increases of up to 60% are possible with cache sizes of only 5-10% of the size of the fact table. Except for the patch working, no additional pre- or post-processing is necessary. The cache can be used in conjunction with other optimizations like indexes or also static aggregates. In order to make the results comparable to other systems we will adapt the data of the APB-1 Benchmark of the OLAP Council ( See OLAP98 ).

All components of the CubeStar server are completely implemented in Java. Current work is focussed on good parametrizations of the replacement strategy and an extension according to of the dynamic creation of index structures supporting access to multidimensional aggregates. This can be seen as the dynamic adaption of See GHRU97 . Other research activities involves the exploitation of parallelization and distribution ( See AlLe98 , See AlGL98 ). On the data modeling side we are currently exploring object relational techniques in order to reflect the classification structures, or classes, in a more natural manner.

References

AgGS97

Agrawal, R.; Gupta, A.; Sarawagi, S.: Modeling Multidimensional Databases, in: 13th International Conference on Data Engineering (ICDE'97, Birmingham, U.K., April 7-1), 1997

AlLe98

Albrecht, J., Lehner, W.: On-line Analytical Processing in Distributed Data Warehouses, in: International Database Engineering and Applications Symposium (IDEAS'98, Cardiff, Wales, U.K., July 8-10), 1998

AlGL98

Albrecht, J., Günzel, H., Lehner, W.: An Architecture for Distributed OLAP, in: International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98, Las Vegas (NA), USA, July 13-16), 1998

BaLe97

Bauer, A.; Lehner, W.: The Cube-Query-Language for Multidimensional Statistical and Scientific Database Systems, in: 5th International Conference on Database Systems For Advanced Applications (DASFAA'97, Melbourne, Australia, April 1-4), 1997

BaPT97

Baralis, E.; Paraboschi, S.; Teniente, E.: Materialized Views Selection in a Multidimensional Database, in: 23rd International Conference on Very Large Data Bases (VLDB'97, Athens, Greece, August     25-29), 1997

CaTo97

Cabibbo, L.; Torlone, R.: Querying Multidimensional Databases, in: 6th International Workshop on Database Programming Languages (DBPL'97, Estes Park (CO), USA, August 18-20), 1997

Cogn98

N.N.: Powerplay, Product Information, Cognos Incorporated, http://www.cognos.com/busintell/products/powerplay_overview.html, 1998

CoCS93

Codd, E.; Codd, S.; Salley, C.: Providing OLAP (On-line Analytical Processing) to User Analysts: An IT Mandate, White Paper, Arbor Software Corporation, 1993

DRSN98

Deshpande, P.; Ramasamy, K.; Shukla, A.; Naughton, J.: Caching Multidimensional Queries Using Chunks, in: 27th International Conference on the Management of Data (SIGMOD `98, Seattle, USA, June 2-4), 1998

GHRU97

Gupta, H.; Harinarayan, V.; Rajaraman, A.; Ullman, J.D.: Index Selection for OLAP, in: 13th International Conference on Data Engineering (ICDE'97, Birmingham, U.K., April 7-11), 1997

Gupt97

Gupta, H.: Selection of Views to Materialize in a Data Warehouse, in: 6th International Conference on Database Theory (ICDT'97, Delphi, Greece, January 8-10), 1997

GyLa97

Gyssens, M.; Lakshmanan, L.V.S.: A Foundation for Multi-Dimensional Databases, in: 23th International Conference on Very Large Data Bases (VLDB'97, Athens, Greece, August 25-29), 1997

HaRU96

Harinarayan, V.; Rajaraman, A.; Ullman, J.D.: Implementing Data Cubes Efficiently, in: 25th International Conference on Management of Data , (SIGMOD'96, Montreal, Canada, June 4-6), 1996

Hype98

N.N.: Hyperion Essbase OLAP Server, Product Information, Hyperion Solutions Corporation, http://www.hyperion.com/essbaseolapprodInfo.cfm, 1998

LeAl99

Lehner, W.; Albrecht, J.: Divide and Aggregate: Caching Multidimensional Objects, Technical Report, University of Erlangen, 1999 (for access of the reviewer only, please do not distribute: http://www6.informatik.uni-erlangen.de/~albrecht/LeAl99.html)

LeAW98

Lehner, W.; Albrecht, J.; Wedekind, H.: Multidimensional Normal Forms, in: 10th International Conference on Scientific and Statistical Data Management (SSDBM'98, Capri, Italy, July 1-3), 1998

Lehn98

Lehner, W.: Modeling Large Scale OLAP Scenarios, in: 6th International Conference on Extending Database Technology (EDBT'98, Valencia, Spain, March 23-27), 1998

LeSh97

Lenz, H.-J.; Shoshani, A.: Summarizability in OLAP and Statistical Data Bases, in: 9th International Conference on Statistical and Scientific Database Management (SSDBM'97, Olympia (WA), August 11-13), 1997

LeTW97

Lehner, W.; Teschke, M.; Wedekind, H.: Über Aufbau und Auswertung multidimensionaler Daten. In: Tagungsband der Konferenz über Datenbanken in Büro, Technik und Wissenschaft (BTW'97, Ulm, Germany, March 5-7), 1997

LiWa96

Li, C; Wang, X.S.: A Data Model for Supporting On-Line Analytical Processing, in: 5th International Conference on Information and Knowledge Management, (CIKM'96, Rockville (MD), November 12-16), 1996

OLAP98

N.N.: APB-1 OLAP Benchmark, Release II, OLAP Council, http://www.olapcouncil.org/research/bmarkly.htm, November 1998

ScSV96

Scheuermann, P.; Shim, J.; Vingralek, R.: WatchMan: A Data Warehouse Intelligent Cache Manager, in: 22nd International Conference on Very Large Data Bases (VLDB ´96, Bombay, India, September 3-6), 1996

SDNR96

Shukla, A.; Deshpande, P.; Naughton, J.F.; Ramasamy, K.: Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. In: International Conference on Very Large Data Bases (VLDB'96, Bombay, India, September 3-6), 1996

SmSm77

Smith, J.M.; Smith, D.C.P.: Database Abstractions: Aggregation and Generalization, ACM Transactions on Database Systems, Vol. 2, No. 2, 1977

ThSe97

Theodoratos, D.; Sellis, T.: Data Warehouse Configuration, in: Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB´97, Athens, Greece, August 25-29), 1997

YaKL97

Yang, J.; Karlapalem, K.; Li, Q.: Algorithms for Materialized View Design in Data Warehousing Environment, in: 23rd International Conference on Very Large Data Bases (VLDB'97, Athens, Greece, August 25-29), 1997


3. Here, monotony basically means that the cost to access a fragment of an object de- or increases with the size of the fragment. Unfortunately, this is in general not the case in relational databases.

4. This formula corresponds to the weighted reference density as used in the classical least reference density (LRD) replacement strategy.

5. The parameters are normalized by their averages (indicated by the dot).