This chapter contains:
The motivation for using materialized views is to improve performance, but the overhead associated with materialized view management can become a significant system management problem. Materialized view management activities include:
This chapter groups these tasks into two areas: warehouse refresh and the warehouse advisor. Warehouse refresh is concerned with ensuring that the materialized views contain correct and up-to-date data, while the warehouse advisor recommends which materialized views to create, retain, and drop.
After the initial effort of creating and populating the data warehouse or data mart, the major administration overhead is the update process, which involves the periodic extraction of incremental changes from the operational systems; transforming the data; verification that the incremental changes are correct, consistent, and complete; bulk-loading the data into the warehouse; and refreshing indexes and materialized views so that they are consistent with the detail data.
The update process must generally be performed within a limited period of time known as the update window. The update window depends on the update frequency (such as daily or weekly) and is business-dependent. For a daily update frequency, an update window of two to six hours might be typical.
The update window must usually accommodate the time for the following activities:
A popular and efficient way to load data into a warehouse or data mart is to use SQL*Loader with the PARALLEL option, or to use another loader tool that uses the Oracle direct path API. See Oracle8i Utilities for the restrictions and considerations that apply when using SQL*Loader with the PARALLEL option.
Loading strategies can be classified as one-phase or two-phase. In one-phase loading, data is loaded directly into the target table, quality assurance tests are performed, and errors are resolved by performing DML operations prior to refreshing materialized views. If a large number of deletions are possible, then storage utilization may be adversely affected, but temporary space requirements and load time are minimized. The DML that may be required after one-phase loading causes multi-table aggregate materialized views to become unusable in the safest rewrite integrity level.
In a two-phase loading process:
Immediately after loading the detail data and updating the indexes on the detail data, the database can be opened for operation, if desired. Query rewrite can be disabled by default (with QUERY_REWRITE_ENABLED = FALSE) until all the materialized views are refreshed, but enabled at the session level (with QUERY_REWRITE_ENABLED = TRUE) for any users who do not require the materialized views to reflect the data from the latest load. However, as long as QUERY_REWRITE_INTEGRITY is set to ENFORCED or TRUSTED, this is not necessary, because the system ensures that only materialized views with updated data participate in a query rewrite. The procedures in DBMS_MVIEW can be used to refresh any type of materialized view: ones containing joins only, joins and aggregates, or aggregates on a single table.
When creating a materialized view, you have the option of specifying whether the refresh occurs ON DEMAND or ON COMMIT. To use the fast warehouse refresh facility, the ON DEMAND mode must be specified, then the materialized view can be refreshed by calling one of the procedures in DBMS_MVIEW.
The DBMS_MVIEW package provides three different types of refresh operations.
Refresh one or more materialized views.
Refresh all materialized views.
Refresh all table-based materialized views that depend on a specified detail table or list of detail tables.
See "Manual Refresh Using the DBMS_MVIEW Package" for more information about this package.
Performing a refresh operation requires temporary space to rebuild the indexes, and can require additional space for performing the refresh operation itself. Fast refresh may also require temporary tables to be created in the user's temporary tablespace.
Some sites may prefer not to refresh all of their materialized views at the same time. If you defer refreshing your materialized views, you can temporarily disable query rewrite with QUERY_REWRITE_ENABLED = FALSE. Users who still want access to the stale materialized views can override this default with ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE. After refreshing the materialized views, you can re-enable query rewrite as the default for all sessions in the current database instance by setting QUERY_REWRITE_ENABLED back to TRUE.
Refreshing a materialized view automatically updates all of its indexes; in the case of full refresh, this requires temporary sort space. If insufficient temporary space is available to rebuild the indexes, then you must explicitly drop each index or mark it unusable prior to performing the refresh operation.
When a materialized view is refreshed, one of four refresh methods can be specified:

COMPLETE: Refreshes by recalculating the materialized view's defining query.

FAST: Refreshes by incrementally applying changes to the detail tables.

FORCE: Tries to do a fast refresh; if that is not possible, it does a complete refresh. This is the default refresh method.

ALWAYS: Unconditionally does a complete refresh.
A complete refresh occurs when the materialized view is initially defined (unless it references a prebuilt table), and a complete refresh may be requested at any time during the life of the materialized view. Because the refresh involves reading the detail tables to compute the results for the materialized view, it can be a very time-consuming process, especially if huge amounts of data must be read and processed. Therefore, always consider the time required to process a complete refresh before requesting it.
However, there are cases when the only refresh method available is complete refresh because the materialized view does not satisfy the conditions specified in the following section for a fast refresh.
Most data warehouses require periodic incremental updates to their detail data. As described in "Schema Design Guidelines for Materialized Views", you can use the SQL*Loader direct path option, or any bulk load utility that uses Oracle's direct path interface, to perform incremental loads of detail data. Use of Oracle's direct path interface makes fast refresh of your materialized views efficient because, instead of having to recompute the entire materialized view, the changes are added to the existing data. Thus, applying only the changes can result in a very fast refresh time.
The time required to perform incremental refresh is sensitive to several factors:
KEY declaration in a CREATE DIMENSION statement
The first two factors can be addressed by partitioning the materialized view container by time, like the fact tables, and by creating a local concatenated index on the materialized view keys. The third factor can be addressed by creating dimensions and hierarchies for your schema, and by ensuring that all materialized view inner joins are strict 1:n relationships whenever possible, as described below.
If an incremental load was performed, it is typically much faster to perform a fast refresh than a complete refresh. Warehouse fast refresh is supported in all but the following cases:
Note that incremental refresh may perform both inserts and updates to the materialized view. If a new row is inserted, any columns in the materialized view, other than key or measure columns, are set to their default values.
If you want to have a materialized view that can be fast refreshable, even when DML operations such as UPDATE and DELETE are applied to the referenced table, refer to Chapter 29, "Materialized Views", which describes the types of materialized views that allow DML operations, provided a materialized view log exists.
Three different refresh procedures are available in the DBMS_MVIEW package for performing ON DEMAND refresh, and each has its own unique set of parameters. To use this package, Oracle job queues must be available, which means that the following parameters must be set in the initialization parameter file. If queues are unavailable, refresh will fail with an appropriate message.
See Oracle8i Supplied Packages Reference for detailed information about the DBMS_MVIEW package.
JOB_QUEUE_PROCESSES: The number of background job queue processes. Determines how many materialized views can be refreshed concurrently.

JOB_QUEUE_INTERVAL: The interval, in seconds, at which the job queue scheduler checks whether a new job has been submitted to the job queue.

UTL_FILE_DIR: Determines the directory where the refresh log is written. If unspecified, no refresh log is created.
These procedures also create a log which, by default, is called refresh.log and is useful in diagnosing problems during the refresh process. The log file can be renamed by calling the procedure DBMS_OLAP.SET_LOGFILE_NAME ('log filename').
The DBMS_MVIEW.REFRESH procedure is used to refresh one or more materialized views that are explicitly named in the call. This procedure can also be used to refresh materialized views used by replication, so not all of the parameters are required. The parameters required for warehouse refresh are:
When refreshing multiple materialized views, if one of them fails while being refreshed and this parameter is set to TRUE, the job continues with the remaining materialized views.
These are the values required by warehouse refresh, since these parameters are used by the replication process.
If set to TRUE, warehouse refresh is not used; the snapshot/replication refresh mechanism is used instead. If set to FALSE, the warehouse refresh method is used, and each refresh operation is performed within its own transaction.
Therefore, to perform a fast refresh on the materialized view store_mv, the package would be called as follows:
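A minimal sketch of such a call, with 'F' requesting a fast refresh (the remaining replication-related parameters are left at their defaults):

```sql
EXECUTE DBMS_MVIEW.REFRESH('STORE_MV', 'F');
```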
Multiple materialized views can be refreshed at the same time and they don't all have to use the same refresh method. To give them different refresh methods, specify multiple method codes in the same order as the list of materialized views (without commas). For example, the following specifies that store_mv will be completely refreshed and product_mv will receive a fast refresh.
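The combined call might look like the following sketch, where the first method code applies to the first view in the list and so on:

```sql
-- 'CF': complete refresh for STORE_MV, fast refresh for PRODUCT_MV
EXECUTE DBMS_MVIEW.REFRESH('STORE_MV,PRODUCT_MV', 'CF');
```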
An alternative to specifying the materialized views to refresh is to use the procedure DBMS_MVIEW.REFRESH_ALL_MVIEWS. This will result in all materialized views being refreshed. If any of the materialized views fails to refresh, then the number of failures is reported.
The parameters for this procedure are:
An example of refreshing all materialized views is:
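A sketch of such a call, assuming the common parameter order of an output failure count followed by the refresh method (check the package reference for the exact signature):

```sql
VARIABLE failures NUMBER
-- Complete refresh of every materialized view; the failure count is returned
EXECUTE DBMS_MVIEW.REFRESH_ALL_MVIEWS(:failures, 'C', '', TRUE, FALSE);
PRINT failures
```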
The third option is the ability to refresh only those materialized views that depend on a specific table, using the procedure DBMS_MVIEW.REFRESH_DEPENDENT. For example, suppose changes have been received for the ORDERS table but not for customer payments. The refresh dependent procedure can be called to refresh only those materialized views that reference the ORDERS table.
The parameters for this procedure are:
A Boolean parameter. If set to TRUE, the number_of_failures output parameter is set to the number of refreshes that failed, and a generic error message indicates that failures occurred. The refresh log gives details of each error, as does the alert log for the instance. If set to FALSE (the default), refresh stops after it encounters the first error, and any remaining materialized views in the list are not refreshed.
A Boolean parameter.
In order to perform a full refresh on all materialized views that reference the ORDERS table, use:
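A sketch of that call (parameter order assumed to be failure count, table list, then method):

```sql
VARIABLE failures NUMBER
-- Refresh only materialized views that depend on ORDERS, using complete refresh
EXECUTE DBMS_MVIEW.REFRESH_DEPENDENT(:failures, 'ORDERS', 'C');
PRINT failures
```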
If the process that is executing DBMS_MVIEW.REFRESH is interrupted or the instance is shut down, any refresh jobs that were executing in job queue processes will be requeued and will continue running. To remove these jobs, use the DBMS_JOB.REMOVE procedure.
Here are some guidelines for using the refresh mechanism for materialized views with joins and aggregates.
A materialized view that contains aggregates and is based on a single table may be fast refreshable, provided it adheres to the rules in Table 29-1, "Requirements for Fast Refresh of Materialized Views", when data changes are made using either direct path or SQL DML statements. At refresh time, Oracle detects the type of DML that has been done (direct-load or SQL DML) and uses either the materialized view log or information available from the direct path load to determine the new data. If changes will be made to your data using both methods, then refresh should be performed after each type of data change, rather than issuing one refresh at the end, because Oracle can perform significant optimizations if it detects that only one type of DML was done. It is therefore recommended that scenario 2 be followed rather than scenario 1.
To improve fast refresh performance, it is highly recommended that indexes be created on the columns which contain the rowids.
Furthermore, for ON COMMIT refresh, Oracle keeps track of the type of DML done in the committed transaction. It is therefore recommended that you do not perform direct-path loads and SQL DML against other tables in the same transaction, because Oracle may not be able to optimize the refresh phase.
If the user has a lot of updates to the table, it is better to bunch them in one transaction, so that refresh of the materialized view will be performed just once at commit time rather than after each update. In the warehouse, after a bulk load, the user should enable parallel DML in the session and perform the refresh. Oracle will use parallel DML to do the refresh, which will enhance performance tremendously. There is more to gain if the materialized view is partitioned.
As an example, assume that a materialized view is partitioned and has a parallel clause. The following sequence is recommended in a data warehouse:
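A sketch of that sequence, assuming a direct-path bulk load has just completed (the view name is illustrative):

```sql
-- 1. Bulk load the detail data with SQL*Loader direct path (outside SQL*Plus)
-- 2. Enable parallel DML for the session
ALTER SESSION ENABLE PARALLEL DML;
-- 3. Refresh the partitioned materialized view; Oracle can use parallel DML here
EXECUTE DBMS_MVIEW.REFRESH('STORE_MV', 'F');
```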
If a materialized view contains joins but no aggregates, then having an index on each of the join column rowids in the detail table will enhance refresh performance greatly since this type of materialized view tends to be much larger than materialized views containing aggregates. For example, referring to the following materialized view:
CREATE MATERIALIZED VIEW detail_fact_mv
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT AS
SELECT f.rowid "fact_rid", t.rowid "time_rid", s.rowid "store_rid",
       s.store_key, s.store_name, f.dollar_sales, f.unit_sales, f.time_key
FROM   fact f, time t, store s
WHERE  f.store_key = s.store_key(+)
AND    f.time_key  = t.time_key(+);
Indexes should be created on the columns FACT_RID, TIME_RID, and STORE_RID. Partitioning is highly recommended, as is enabling parallel DML in the session before invoking refresh, because these will greatly enhance refresh performance.
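For the detail_fact_mv example above, those indexes might be created as follows (index names are illustrative; the quoted column names match the aliases in the defining query):

```sql
CREATE INDEX fact_rid_ix  ON detail_fact_mv ("fact_rid");
CREATE INDEX time_rid_ix  ON detail_fact_mv ("time_rid");
CREATE INDEX store_rid_ix ON detail_fact_mv ("store_rid");
```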
This type of materialized view can also be fast refreshed if DML is performed on the detail table. It is therefore recommended that the same procedure be applied to this type of materialized view as for a single table aggregate. That is, perform one type of change (direct-path load or DML) and then refresh the materialized view. This is because Oracle can perform significant optimizations if it detects that only one type of change has been done.
Also, it is recommended that the refresh be invoked after each table is loaded, rather than load all the tables and then perform the refresh. Therefore, try to use scenario 2 below for your refresh procedures.
For refresh ON-COMMIT, Oracle keeps track of the type of DML done in the committed transaction. It is therefore recommended that you do not perform direct-path and conventional DML to other tables in the same transaction because Oracle may not be able to optimize the refresh phase. For example, the following is not recommended:
Also try not to mix different types of conventional DML statements if possible, because doing so again prevents various optimizations during fast refresh. For example, try to avoid:
If many updates are needed, try to group them all into one transaction because refresh will be performed just once at commit time, rather than after each update.
Note that if you use the DBMS_MVIEW package to refresh a number of materialized views containing only joins with the atomic parameter set to TRUE, parallel DML is disabled, which can lead to poor refresh performance.
In a data warehousing environment, assuming that the materialized view has a parallel clause, the following sequence of steps is recommended:
While a job is running, a SELECT * FROM V$SESSION_LONGOPS statement will tell you the progress of each materialized view being refreshed.
To see which jobs are on which queue, use a SELECT * FROM DBA_JOBS_RUNNING statement.
The table ALL_MVIEW_ANALYSIS contains the time of the most recent refresh and, as a moving average, the average time to refresh using both full and incremental methods.
Refresh will schedule the long running jobs first. Use the refresh log to check what each refresh did.
After you have performed a load or incremental load and rebuilt the detail table indexes, you need to re-enable integrity constraints (if any) and refresh the materialized views and materialized view indexes that are derived from that detail data. In a data warehouse environment, referential integrity constraints are normally enabled with the NOVALIDATE and RELY options. An important decision to make before performing a refresh operation is whether the refresh needs to be recoverable. Because materialized view data is redundant and can always be reconstructed from the detail tables, it may be preferable to disable logging on the materialized view. To disable logging and run incremental refresh non-recoverably, use the ALTER MATERIALIZED VIEW...NOLOGGING statement prior to refreshing.
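For example, to make refresh of the detail_fact_mv materialized view non-recoverable (the view name is taken from the earlier example):

```sql
-- Disable redo logging for the materialized view before refreshing it
ALTER MATERIALIZED VIEW detail_fact_mv NOLOGGING;
```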
If the materialized view is being refreshed using the ON COMMIT method, then, following refresh operations, the alert log (alert_<SID>.log) and the trace file (ora_<SID>_number.trc) should be consulted to check that no errors have occurred.
To help you select from among the many materialized views that are possible in your schema, Oracle provides a collection of materialized view analysis and advisory functions in the DBMS_OLAP package. These functions are callable from any PL/SQL program.
From within the DBMS_OLAP package, several facilities are available to:
Whenever the summary advisor is run (with the exception of reporting the size of a materialized view), the results are placed in a table in the database, so they can be queried later without re-running the advisor process.
The advisory functions of the DBMS_OLAP package require you to gather structural statistics about fact table cardinalities, dimension table cardinalities, and the distinct cardinalities of every dimension KEY column and fact table key column. This can be accomplished by loading your data warehouse and then gathering either exact or estimated statistics with the DBMS_STATS package or the ANALYZE TABLE statement. Because gathering statistics is time-consuming and extreme statistical accuracy is not required, it is generally preferable to estimate statistics. The advisor cannot be used if no dimensions have been defined, which is a good reason to spend some time creating them.
Optionally, if you have purchased the Oracle Enterprise Manager Performance Pack, then you may also run Oracle Trace to gather dynamic information about your query workload, which can then be used by an advisory function. If Oracle Trace is available, serious consideration should be given to collecting materialized view usage. Not only does it enable the DBA to see which materialized views are in use, but it also means that the advisor may detect some unusual query requests from the users, which would result in recommending some different materialized views.
Oracle Trace gathers the following work load statistics for the analysis of materialized views:
Oracle Trace includes two new "point events" for collecting runtime statistics about materialized views: one event that records the selected materialized view names at request execution time, and another event that records the estimated benefit and ideal materialized view at compile time. You can log just these two events for materialized view analysis if desired, or you can join this information with other information collected by Oracle Trace, such as the SQL text or the execution time of the request, if other Trace events are also collected. A collection option in the Oracle Trace Manager GUI provides a way to collect materialized view management statistics.
To collect and analyze the summary event set, you must do the following:
For further information on these parameters, refer to the Oracle Trace Users Guide.
SUMMARY_EVENT set. Oracle Trace Manager reads information from the associated configuration file and registers the events to be logged with Oracle. While collection is enabled, the workload information defined in the event set is written to a flat log file.
V_192216243_F_5_E_14_8_1 and V_192216243_F_5_E_15_8_1. The workload tables should be located in the same schema in which the subsequent workload analysis will be performed. Alternatively, the collection file, which usually has an extension of .CDF, can be formatted manually using the otrcfmt utility.
GATHER_TABLE_STATS procedure of the DBMS_STATS package, or ANALYZE ... ESTIMATE STATISTICS, to collect cardinality statistics on all fact tables, dimension tables, and key columns (any column that appears in a dimension KEY clause of a CREATE DIMENSION statement).
Once these four steps have been completed, you will be ready to make recommendations about your materialized views.
The analysis and advisory functions for materialized views are RECOMMEND_MV and RECOMMEND_MV_W in the DBMS_OLAP package. These functions automatically recommend which materialized views to create, retain, or drop.
RECOMMEND_MV uses structural statistics, but not workload statistics, to generate recommendations.

RECOMMEND_MV_W uses both workload statistics and structural statistics.
You can call these functions to obtain a list of materialized view recommendations that you can select, modify, or reject. Alternatively, you can use the DBMS_OLAP package directly in your PL/SQL programs for the same purpose.
The summary advisor will not be able to recommend summaries if the following conditions are not met:
Four parameters are required to use these functions:
A typical call to the package, where the main fact table is called FACT, would be:
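A sketch of that call, assuming the four parameters are the fact table name, a storage budget in bytes, a comma-separated retention list, and a retention percentage (the values shown are illustrative):

```sql
-- Recommend materialized views for fact table FACT, with a 100,000-byte
-- storage budget and a 10 percent retention threshold
EXECUTE DBMS_OLAP.RECOMMEND_MV('FACT', 100000, '', 10);
```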
No workload statistics are used in this example.
The results from calling this package are placed in the table MVIEWS$_RECOMMENDATIONS. The contents of this table can be queried, or they can be displayed using the SQL file sadvdemo.sql. The output from calling this procedure is the same irrespective of whether workload statistics are used.
The recommendations can be viewed by calling the procedure DEMO_SUMADV.PRETTYPRINT_RECOMMENDATIONS, but first you must run sadvdemo.sql. Use SET SERVEROUTPUT ON SIZE 900000 to ensure that all the information can be displayed. A sample recommendation that resulted from calling this package is shown below.
Recommended Action is DROP existing summary GROCERY.QTR_STORE_PROMO_SUM
Storage in bytes is 196020
Percent performance gain is null
Benefit-to-cost ratio is null
Recommended Action is RETAIN existing summary GROCERY.STORE_SUM
Storage in bytes is 21
Percent performance gain is null
Benefit-to-cost ratio is null
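In SQL*Plus, the viewing steps described above might be sketched as:

```sql
@sadvdemo.sql
SET SERVEROUTPUT ON SIZE 900000
EXECUTE DEMO_SUMADV.PRETTYPRINT_RECOMMENDATIONS;
```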
To call the package using the workload statistics, the only difference is the procedure name: call RECOMMEND_MV_W instead of RECOMMEND_MV.
Recommendation Number = 3
Recommended Action is CREATE new summary:
SELECT PROMOTION.PROMOTION_KEY, STORE.STORE_KEY, STORE.STORE_NAME, STORE.DISTRICT,
       STORE.REGION, COUNT(*),
       SUM(FACT.CUSTOMER_COUNT), COUNT(FACT.CUSTOMER_COUNT),
       SUM(FACT.DOLLAR_COST), COUNT(FACT.DOLLAR_COST),
       SUM(FACT.DOLLAR_SALES), COUNT(FACT.DOLLAR_SALES),
       MIN(FACT.DOLLAR_SALES), MAX(FACT.DOLLAR_SALES),
       SUM(FACT.RANDOM1), COUNT(FACT.RANDOM1),
       SUM(FACT.RANDOM2), COUNT(FACT.RANDOM2),
       SUM(FACT.RANDOM3), COUNT(FACT.RANDOM3),
       SUM(FACT.UNIT_SALES), COUNT(FACT.UNIT_SALES)
FROM   GROCERY.FACT, GROCERY.PROMOTION, GROCERY.STORE
WHERE  FACT.PROMOTION_KEY = PROMOTION.PROMOTION_KEY
AND    FACT.STORE_KEY = STORE.STORE_KEY
GROUP BY PROMOTION.PROMOTION_KEY, STORE.STORE_KEY, STORE.STORE_NAME,
       STORE.DISTRICT, STORE.REGION
Storage in bytes is 257999.999999976
Percent performance gain is .533948057298649
Benefit-to-cost ratio is .00000206956611356085
Recommended Action is CREATE new summary:
SELECT STORE.REGION, TIME.QUARTER, TIME.YEAR, COUNT(*),
       SUM(FACT.CUSTOMER_COUNT), COUNT(FACT.CUSTOMER_COUNT),
       SUM(FACT.DOLLAR_COST), COUNT(FACT.DOLLAR_COST),
       SUM(FACT.DOLLAR_SALES), COUNT(FACT.DOLLAR_SALES),
       MIN(FACT.DOLLAR_SALES), MAX(FACT.DOLLAR_SALES),
       SUM(FACT.RANDOM1), COUNT(FACT.RANDOM1),
       SUM(FACT.RANDOM2), COUNT(FACT.RANDOM2),
       SUM(FACT.RANDOM3), COUNT(FACT.RANDOM3),
       SUM(FACT.UNIT_SALES), COUNT(FACT.UNIT_SALES)
FROM   GROCERY.FACT, GROCERY.STORE, GROCERY.TIME
WHERE  FACT.STORE_KEY = STORE.STORE_KEY
AND    FACT.TIME_KEY = TIME.TIME_KEY
GROUP BY STORE.REGION, TIME.QUARTER, TIME.YEAR
Storage in bytes is 86
Percent performance gain is .523360688578368
Benefit-to-cost ratio is .00608558940207405
Because a materialized view occupies storage space in the database, it is helpful to know how much space will be required before the view is created. Rather than guess, or wait until it has been created and then discover that insufficient space is available in the tablespace, use the procedure DBMS_OLAP.ESTIMATE_SUMMARY_SIZE. Calling this procedure returns an estimate of the size in bytes that the materialized view is likely to occupy.
The parameters to this procedure are:
and the package returns:
In the example shown below, the query that will be specified in the materialized view is passed into the ESTIMATE_SUMMARY_SIZE procedure. Note that the SQL statement is passed in without a trailing ";".
DBMS_OLAP.estimate_summary_size(
   'simple_store',
   'SELECT product_key1, product_key2,
           SUM(dollar_sales) AS sum_dollar_sales,
           SUM(unit_sales) AS sum_unit_sales,
           SUM(dollar_cost) AS sum_dollar_cost,
           SUM(customer_count) AS no_of_customers
    FROM fact
    GROUP BY product_key1, product_key2',
   no_of_rows,
   mv_size);
The procedure returns two values, an estimate for the number of rows and the size of the materialized view in bytes, as shown below.
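To capture and display the two OUT values, the call can be wrapped in an anonymous PL/SQL block (a sketch; the DBMS_OUTPUT formatting is illustrative):

```sql
SET SERVEROUTPUT ON
DECLARE
  no_of_rows NUMBER;
  mv_size    NUMBER;
BEGIN
  DBMS_OLAP.ESTIMATE_SUMMARY_SIZE(
    'simple_store',
    'SELECT product_key1, product_key2,
            SUM(dollar_sales) AS sum_dollar_sales,
            SUM(unit_sales) AS sum_unit_sales,
            SUM(dollar_cost) AS sum_dollar_cost,
            SUM(customer_count) AS no_of_customers
     FROM fact
     GROUP BY product_key1, product_key2',
    no_of_rows, mv_size);
  -- Display the estimated row count and size in bytes
  DBMS_OUTPUT.PUT_LINE('Estimated rows:  ' || no_of_rows);
  DBMS_OUTPUT.PUT_LINE('Estimated bytes: ' || mv_size);
END;
/
```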
One of the major administrative problems with materialized views is knowing whether they are being used. Materialized views could be in regular use, or they could have been created for a one-time problem that has since been resolved. However, the user group that requested this level of analysis might never tell the DBA that it is no longer required, so the materialized view remains in the database, occupying storage space and possibly being regularly refreshed.
If the Oracle Trace option is available, it can advise the DBA which materialized views are in use, using exactly the same procedure as for collecting workload statistics. Trace collection is enabled, and in this case the collection period is likely to be longer than for query collection, because Trace reports only on materialized views that were used while it was collecting statistics. Therefore, if too small a window is chosen, not all the materialized views that are in use will be reported.
Once you are satisfied that you have collected sufficient data, the data is formatted by Oracle Trace, just as if it were workload information, and the procedure DBMS_OLAP.EVALUATE_UTILIZATION_W is called. It analyzes the data and places the results in the table MVIEWS$_EVALUATIONS.
In the example below, the utilization of materialized views is analyzed and the results are displayed.
Note that no parameters are passed into the package.
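A sketch of that sequence:

```sql
-- Analyze the collected usage data; results go to MVIEWS$_EVALUATIONS
EXECUTE DBMS_OLAP.EVALUATE_UTILIZATION_W;

-- Display the evaluation results
SELECT * FROM MVIEWS$_EVALUATIONS;
```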
Shown below is a sample output obtained by querying the table MVIEWS$_EVALUATIONS, which provides the following information:
MVIEW_OWNER MVIEW_NAME          RANK   SIZE FREQ CUMULATIVE BENEFIT
----------- ------------------- ---- ------ ---- ---------- ----------
GROCERY     STORE_MIN_SUM          1    340    1       9001 26.4735294
GROCERY     STORE_MAX_SUM          2    380    1       9001 23.6868421
GROCERY     STORE_STDCNT_SUM       3   3120    1 3000.38333 .961661325
GROCERY     QTR_STORE_PROMO_SUM    4 196020    2          0          0
GROCERY     STORE_SALES_SUM        5    340    1          0          0
GROCERY     STORE_SUM              6     21   10          0          0